
Leo Marco Corpuz
18,975 Points

Email groups

Checking to see if my code makes sense for the email group. Any feedback? Thanks!

emails.py
import re

string = '''Love, Kenneth, kenneth+challenge@teamtreehouse.com, 555-555-5555, @kennethlove
Chalkley, Andrew, andrew@teamtreehouse.co.uk, 555-555-5556, @chalkers
McFarland, Dave, dave.mcfarland@teamtreehouse.com, 555-555-5557, @davemcfarland
Kesten, Joy, joy@teamtreehouse.com, 555-555-5558, @joykesten'''
contacts=re.search(r'^(?P<email>[\w+\W?\w?]@[\w+\W\w+\W?\w?])?',string)

You can use https://pythex.org/ to test various emails. Sometimes an email will contain numbers, like lillie1974@aol.com. I do not think your regex will find those kinds of emails.

2 Answers

Chris Freeman
Treehouse Moderator 68,457 Points

I think you may be confused about how to define character sets using square brackets [ ].

Just as \w, \w?, \w*, and \w+ mean a single word character, an optional word character, any number of word characters, and one or more word characters, when using square brackets [ ] to define a character set, the modifiers go outside: [ ], [ ]?, [ ]*, [ ]+.

When a + or ? is used inside a character set, it means a literal plus sign or question mark.
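
The difference between a quantifier outside a set and a plus sign inside one can be checked directly. This is only a minimal sketch; the sample text is made up for illustration:

```python
import re

sample = "a+b aaa"  # hypothetical sample text, not from the challenge

# Quantifier outside the set: one or more characters from the set {a}.
outside = re.findall(r"[a]+", sample)

# Plus sign inside the set: exactly one character, either 'a' or a literal '+'.
inside = re.findall(r"[a+]", sample)

print(outside)  # ['a', 'aaa']
print(inside)   # ['a', '+', 'a', 'a', 'a']
```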

Reading your regex, it says:

^  # Starting anchored at the beginning of the string
(?P<email>  # Start a group named email
[  # Start a character set
\w+\W?\w?  # Set contains any word character,
           # a + sign, a non-word character, a question mark
]  # Look for exactly one character matching this set
@  # Set is followed by an @sign
[\w+\W\w+\W?\w?]  # Define another character set
                  # exactly the same as the first one
)  # End named group
?  # This group may optionally be present
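
Running the original pattern against the challenge string is one way to confirm this reading. The sketch below also tries a variant without the caret and the trailing ?, which is my own addition for illustration:

```python
import re

string = '''Love, Kenneth, kenneth+challenge@teamtreehouse.com, 555-555-5555, @kennethlove
Chalkley, Andrew, andrew@teamtreehouse.co.uk, 555-555-5556, @chalkers
McFarland, Dave, dave.mcfarland@teamtreehouse.com, 555-555-5557, @davemcfarland
Kesten, Joy, joy@teamtreehouse.com, 555-555-5558, @joykesten'''

# The original pattern: the optional group lets it succeed while matching nothing.
match = re.search(r'^(?P<email>[\w+\W?\w?]@[\w+\W\w+\W?\w?])?', string)
print(repr(match.group(0)))   # ''
print(match.group('email'))   # None

# Illustration only: without ^ and the trailing ?, each set still matches
# exactly one character, so only three characters are captured.
three_chars = re.search(r'[\w+\W?\w?]@[\w+\W\w+\W?\w?]', string).group(0)
print(three_chars)            # e@t
```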

Because the set includes \W (any non-word character), it could match the space preceding the email address.

The characters listed within a set should be as explicit as necessary to not get false matches. For example, [\w+.]+ says “Match one or more word characters, plus signs, or periods.”
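
As a rough illustration of why the explicit set is safer, compare it with a set that also contains \W. The two usernames come from the challenge string; joining them with a space is an assumption made for the demo:

```python
import re

text = 'kenneth+challenge dave.mcfarland'

# Explicit set: word characters, plus signs, and periods only.
explicit = re.findall(r'[\w+.]+', text)

# A set holding both \w and \W matches every character, including the space.
too_broad = re.findall(r'[\w\W]+', text)

print(explicit)   # ['kenneth+challenge', 'dave.mcfarland']
print(too_broad)  # ['kenneth+challenge dave.mcfarland']
```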

You should remove the leading caret ^ since the email is not at the start of the line. You’ll also need to use re.M if you anchor to line starts or ends with ^ or $, since there are multiple lines within the string.
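
The effect of re.M on the ^ anchor can be seen with a short sketch. The line-anchored pattern ^\w+ is only for illustration and is not the challenge answer:

```python
import re

string = '''Love, Kenneth, kenneth+challenge@teamtreehouse.com, 555-555-5555, @kennethlove
Chalkley, Andrew, andrew@teamtreehouse.co.uk, 555-555-5556, @chalkers
McFarland, Dave, dave.mcfarland@teamtreehouse.com, 555-555-5557, @davemcfarland
Kesten, Joy, joy@teamtreehouse.com, 555-555-5558, @joykesten'''

# Without re.M, ^ only matches at the very start of the whole string.
no_flag = re.findall(r'^\w+', string)

# With re.M, ^ also matches right after each newline.
with_flag = re.findall(r'^\w+', string, re.M)

print(no_flag)    # ['Love']
print(with_flag)  # ['Love', 'Chalkley', 'McFarland', 'Kesten']
```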

Your two character sets are the same for the same reason that [banana] and [nba] are the same: repeated characters do not change the set.
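
A small check of that claim, using a made-up sample string:

```python
import re

text = 'banana cabana nab'  # hypothetical sample text

# Repeating characters inside the brackets does not change what the set matches.
repeated = re.findall(r'[banana]+', text)
minimal = re.findall(r'[nba]+', text)

print(repeated)  # ['banana', 'abana', 'nab']
print(minimal)   # ['banana', 'abana', 'nab']
```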

Post back if you need more help. Good luck!!!

Leo Marco Corpuz
18,975 Points

So here's my revised code:

contacts=re.search(r'(?P<email>[\w]+[\W]?[\w]?@[\w]+[\W][\w]+[\W]?[\w]?)',string)

[\w]+[\W]?[\w]? (example kenneth+challenge, dave.mcfarland)

[\w]+[\W][\w]+[\W]?[\w]? (example teamtreehouse.co.uk)

Chris Freeman
Treehouse Moderator 68,457 Points

Getting closer. I see your intentions. The email group would pass with one change. The character set just before the @ sign should be followed by a * instead of a ?. This allows more than one character to follow the plus or period in the username:

r'(?P<email>[\w]+[\W]?[\w]*@[\w]+[\W][\w]+[\W]?[\w]?)'
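
To see what the change does, here is a small comparison run on a single address taken from the first contact line. Testing the address on its own, rather than the whole string, is a simplification for the demo:

```python
import re

address = 'kenneth+challenge@teamtreehouse.com'

with_question = r'(?P<email>[\w]+[\W]?[\w]?@[\w]+[\W][\w]+[\W]?[\w]?)'
with_star = r'(?P<email>[\w]+[\W]?[\w]*@[\w]+[\W][\w]+[\W]?[\w]?)'

# With ?, at most one word character may follow the plus sign, so the
# search only succeeds by starting later in the username.
short = re.search(with_question, address).group('email')

# With *, any number of word characters may follow the plus sign.
full = re.search(with_star, address).group('email')

print(short)  # challenge@teamtreehouse.com
print(full)   # kenneth+challenge@teamtreehouse.com
```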

That said, if a character set only has one item in the set, then square brackets aren’t needed:

r'(?P<email>\w+\W?\w*@\w+\W\w+\W?\w?)'

A simplified version would be to put the optional characters inside a set:

r'(?P<email>[+.\w]+@[.\w]+)'

Note that the period and plus inside the set are literal, and the plus outside the set means “one or more”.
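
As a rough check, the simplified pattern can be run over the whole challenge string. Using re.finditer to collect every address is my own choice for the demo; the challenge itself uses re.search:

```python
import re

string = '''Love, Kenneth, kenneth+challenge@teamtreehouse.com, 555-555-5555, @kennethlove
Chalkley, Andrew, andrew@teamtreehouse.co.uk, 555-555-5556, @chalkers
McFarland, Dave, dave.mcfarland@teamtreehouse.com, 555-555-5557, @davemcfarland
Kesten, Joy, joy@teamtreehouse.com, 555-555-5558, @joykesten'''

pattern = r'(?P<email>[+.\w]+@[.\w]+)'

# Collect the named group from every match in the string.
emails = [m.group('email') for m in re.finditer(pattern, string)]

for email in emails:
    print(email)
```

The Twitter handles are skipped because the character before each of their @ signs is a space, which is not in the first set.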

Post back if you need more help. Good luck!!!

Chris Freeman
Treehouse Moderator 68,457 Points

Getting closer still. Your latest attempt:

contacts=re.search(r'(?P<email>[\w]+[\W]?[\w]*@[\w]+[\W][\w]+[\W]?[\w]?) \t (?P<phone>\d{3}\-\d{3}\-\d{4})' ,string)

This will match if you fix the characters between the groups. You have space, tab, space, which is not in the target string. You need comma, space. This can be written as ,\s in the pattern.

Once this passes, try to simplify the regex character sets using my comment above.
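
A minimal sketch of the separator fix, comparing the pattern as posted with one that uses ,\s between the groups. The variable names are made up for illustration:

```python
import re

string = '''Love, Kenneth, kenneth+challenge@teamtreehouse.com, 555-555-5555, @kennethlove
Chalkley, Andrew, andrew@teamtreehouse.co.uk, 555-555-5556, @chalkers
McFarland, Dave, dave.mcfarland@teamtreehouse.com, 555-555-5557, @davemcfarland
Kesten, Joy, joy@teamtreehouse.com, 555-555-5558, @joykesten'''

email = r'(?P<email>[\w]+[\W]?[\w]*@[\w]+[\W][\w]+[\W]?[\w]?)'
phone = r'(?P<phone>\d{3}\-\d{3}\-\d{4})'

# As posted: space, tab, space between the groups. Nothing in the string fits.
broken = re.search(email + r' \t ' + phone, string)

# Fixed: a comma followed by one whitespace character between the groups.
fixed = re.search(email + r',\s' + phone, string)

print(broken)                # None
print(fixed.group('email'))  # kenneth+challenge@teamtreehouse.com
print(fixed.group('phone'))  # 555-555-5555
```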

Leo Marco Corpuz
18,975 Points

So I made the correction before the @ sign. How's my phone group? It's not included in the string if it's on a new line. Is that where \t comes in?

contacts=re.search(r'(?P<email>[\w]+[\W]?[\w]*@[\w]+[\W][\w]+[\W]?[\w]?) \t (?P<phone>\d{3}\-\d{3}\-\d{4})' ,string)

I placed a '\' before '-' but it's not showing when I pasted this code.

[MOD: added ```python formatting -cf]