nicole lumpkin
Courses Plus Student 5,328 Points

Using the + character at the end of a set

This has really been bugging me. It was my understanding that the + character, when used after a set, means any character within the set must appear at least once. My question is: why does the following pattern match my email variable (defined below) even though the '+' character, among others, is not present in the string?

import re

email = 'some_body@gmail.com'

print(re.findall(r'[-\w\d+.]+@[-\w\d.]+', email, re.I))

>>>['some_body@gmail.com']

Why does this match? There are no digits, hyphens, or plus characters in 'email'.

2 Answers

Steven Parker
229,732 Points

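The plus sign after the closing bracket is a quantifier, so it is not actually part of the allowed set. (The plus inside the first set of your pattern is a literal character that set allows, not one it requires.)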

The regex "[-\w\d.]+" means "one or more characters, each of which is a hyphen, a word character, a digit, or a period". The digit (\d) metacharacter is not actually needed here, since a word character (\w) can be any alphanumeric character or _ (underscore).
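As a quick check of the point about \d being redundant, here is a minimal sketch that runs the set with and without \d against the email from the question. The variable names are just for illustration.

```python
import re

email = 'some_body@gmail.com'

# The same set, once with the explicit \d and once without it.
with_digit_class = re.findall(r'[-\w\d.]+', email)
without_digit_class = re.findall(r'[-\w.]+', email)

print(with_digit_class)                         # ['some_body', 'gmail.com']
print(with_digit_class == without_digit_class)  # True

# \w on its own already covers letters, digits, and the underscore.
word_chars = re.findall(r'\w', 'a_1-')
print(word_chars)                               # ['a', '_', '1']
```

Both versions split the address at the @ (which is not in the set) and return the same two runs.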

nicole lumpkin
Courses Plus Student 5,328 Points

Thanks Steven Parker, perhaps my question was poorly worded. The reason I added unnecessary characters is that that's what happened in the lecture in the attempt to catch all the emails. I just don't understand why the email is able to match the regex "[-\w\d.]+", since the plus character means at least one instance of each character in the set, and clearly 'some_body@gmail.com' does not contain a hyphen or the numbers 0-9. Wouldn't the asterisk character be more appropriate, since the characters in the set could appear any number of times, including zero times?

[-\w\d.]*

Hopefully this was worded better than my first attempt.

Steven Parker
229,732 Points

I think I understand your confusion. You're thinking the character class means you must have each character in the class, but what it actually matches is any one character in the class.

So "[-\w.]" will match a hyphen OR any letter OR any digit OR an underscore OR a period. Adding a plus sign after it means it will match any number of characters from that group (but at least one).
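To see the "any one of these" behaviour in practice, here is a small sketch. The test strings are made up for illustration. Every character has to come from the set, but nothing requires every member of the set to show up.

```python
import re

pattern = re.compile(r'[-\w.]+')

# Letters only: still a full match, even with no hyphen, digit, or period.
letters_only = pattern.fullmatch('gmail') is not None
print(letters_only)   # True

# A mix of members of the set matches as well.
mixed = pattern.fullmatch('my-mail.example1') is not None
print(mixed)          # True

# '!' is not in the set, so the string as a whole does not match...
with_bang = pattern.fullmatch('gm!ail')
print(with_bang)      # None

# ...and findall returns the runs on either side of it.
runs = pattern.findall('gm!ail')
print(runs)           # ['gm', 'ail']
```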

And you don't want to accept "zero times", because that would allow the address to have nothing at all after the @ symbol.
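Here is a rough comparison of the two quantifiers on an address with nothing after the @. The `incomplete` string and the pattern variable names are assumptions made up for this example; the patterns themselves are the one from the question and the same one with the final + swapped for *.

```python
import re

plus_pattern = r'[-\w\d+.]+@[-\w\d.]+'   # pattern from the question
star_pattern = r'[-\w\d+.]+@[-\w\d.]*'   # same, but the domain may be empty

incomplete = 'some_body@'

# With +, at least one character is required after the @, so nothing matches.
plus_result = re.findall(plus_pattern, incomplete)
print(plus_result)   # []

# With *, zero characters after the @ is acceptable, so the fragment matches.
star_result = re.findall(star_pattern, incomplete)
print(star_result)   # ['some_body@']
```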

nicole lumpkin
Courses Plus Student 5,328 Points

Excellent, thank you Steven Parker, that's what has been tripping me up! Have a great weekend :)