Welcome to the Treehouse Community
Lingjian Kong 6,330 Points
Don't understand word boundaries
Could someone explain the concept of a word boundary?
print(re.findall(r'@[-\w\d.]*[^gov\t]', data))
print(re.findall(r'\b@[-\w\d.]*[^gov\t]\b', data))
These are two different results I got.
>> ['@teamtreehouse.com', '@kennethlove\n', '@teamtreehouse.com', '@camelot.co.uk', '@norrbotten.co.se', '@sverik\n', '@killerrabbit.com', '@teamtreehouse.com', '@ryancarson\n', '@tardis.co.uk', '@example.com', '@example\n', '@us.', '@potus44\n', '@teamtreehouse.com', '@chalkers\n', '@empire.', '@darthvader\n', '@spain.']
>> ['@teamtreehouse.com', '@teamtreehouse.com', '@camelot.co.uk', '@norrbotten.co.se', '@killerrabbit.com', '@teamtreehouse.com', '@tardis.co.uk', '@example.com', '@us.', '@teamtreehouse.com', '@empire.', '@spain.']
Could someone explain why we have to use \b in the front and the back?
Chris Freeman Treehouse Moderator 68,064 Points
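A word boundary \b matches a position, not a character: the spot where a word character (\w) sits on one side and a non-word character, or the start or end of the string, sits on the other. In your second example, "@" is not a word character, so the leading \b means a word character must come immediately before the "@". The trailing \b means the last matched character and the character that follows it must be of different kinds.
The matches in the first group that end in a newline character "\n" are missing entirely from the second group, which tells you they fail the leading \b: no word character comes directly before those "@" signs. An email address has a word character before its "@", so it passes.
The boundary is an anchor: it checks the characters around it, but it doesn't "consume" them into the results. You may think of it like a word match character "\w" that doesn't hold on to the match results.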
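Here is a minimal sketch of the difference between the two patterns. The sample string is made up for illustration (tab-separated fields with a handle at the end of the first line); it is an assumption about the general shape of the data, not the course's actual file.

```python
import re

# Hypothetical sample data: tab-separated fields, newline-terminated lines.
# The handle "@janedoe" comes right after a tab, not after a word character.
data = (
    "Doe, Jane\tjane@example.com\tTeacher\t@janedoe\n"
    "Roe, Rich\trich@example.co.uk\tWriter\n"
)

# Without boundaries: every "@" can start a match, and a newline can end one.
without_boundaries = re.findall(r'@[-\w\d.]*[^gov\t]', data)
print(without_boundaries)  # ['@example.com', '@janedoe\n', '@example.co.uk']

# With boundaries: the leading \b needs a word character right before "@",
# so the handle that follows a tab is skipped.
with_boundaries = re.findall(r'\b@[-\w\d.]*[^gov\t]\b', data)
print(with_boundaries)  # ['@example.com', '@example.co.uk']

# \b is zero-width: it constrains the match but adds nothing to it.
zero_width = re.search(r'\b@', 'jane@example.com').group()
print(repr(zero_width))  # '@'
```

In this sample the handle drops out because of the leading \b alone; the trailing \b only decides where an accepted match is allowed to end.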
Bronson Avila 4,160 Points
For anyone else reading this question, I can understand how the code shown in this exercise appears confusing. When Kenneth defined a word boundary in the Escape Hatches video, he specifically said a word boundary is, quote, "It's the edges of a word, defined by white space or the edges of a screen."
This definition may be misleading because it suggests that a word boundary cannot exist between two non-whitespace characters in a string. However, a word boundary can in fact exist under such circumstances, as one source notes that a word boundary can occur "between two characters in the string, where one is a word character and the other is not a word character."
So in the case of an email address such as "firstname.lastname@example.org", the character immediately before the "@" symbol is a word character, while the "@" symbol itself is not. Thus, the "gap" between "lastname" and "@" constitutes a word boundary.
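To see this for yourself, a short sketch like the following lists every position in that address where \b matches. The variable names are just for illustration.

```python
import re

address = "firstname.lastname@example.org"

# \b is zero-width, so each match is an empty string;
# m.start() tells us the position where the boundary sits.
boundaries = [m.start() for m in re.finditer(r'\b', address)]
print(boundaries)  # [0, 9, 10, 18, 19, 26, 27, 30]

at_index = address.index("@")      # 18
print(at_index in boundaries)      # True: boundary between "e" and "@"
print(at_index + 1 in boundaries)  # True: boundary between "@" and "e"
```

None of these boundaries involves whitespace. Each one sits next to a ".", next to the "@", or at the start or end of the string.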
Ohhhhh, ok- I was wrong... thanks for steering me to the correct answer, Chris!!