Confused with the $ and ^ for finding Twitter handles in emails.py

Question

As a reminder of this code challenge, we have:

import re

string = '''Love, Kenneth, kenneth+challenge@teamtreehouse.com, 555-555-5555, @kennethlove
Chalkley, Andrew, andrew@teamtreehouse.co.uk, 555-555-5556, @chalkers
McFarland, Dave, dave.mcfarland@teamtreehouse.com, 555-555-5557, @davemcfarland
Kesten, Joy, joy@teamtreehouse.com, 555-555-5558, @joykesten'''

The second part of the challenge asks us to use re.search to find the Twitter handle.

I had tried the first two lines of code below, which did not work, and then I saw the solution (the third option) in responses to others' questions. However, I don't understand why my first two attempts didn't find the Twitter handle and why the third one does. Can you help me?

Code 1:

twitters = re.search(r'@[\w]+', string)
<_sre.SRE_Match object; span=(32,46), match='@teamtreehouse'>

Code 2: trying to find the Twitter handle inside the contact details, but it doesn't find the Twitter handle.

contacts = re.search(r'''
    (?P<email>[-+.\w]+@[-.\w]+),\s
    (?P<phone>\d{3}-\d{3}-\d{4}),\s
    (?P<twitter>@[-\w]+)
''', string, re.X)
<_sre.SRE_Match object; span=(15,78), match='kenneth+challenge@teamtreehouse.com, 555-555-5555>

Code 3: the right one. But why do we only mark the end of the string with $ and not the start of the string with ^?

twitters = re.search(r'(?P<twitter>@[-\w]+)$', string, re.M)
<_sre.SRE_Match object; span=(66,78), match='@kennethlove'>

Answer 1 · 2018-03-08T21:14:53Z


Good question! The subtleties can be hard to grasp.

The caret character ^ is a pattern anchor: it indicates that this point in the pattern must align with the beginning of the string or, when the re.M (re.MULTILINE) flag is used, the beginning of any line. The dollar sign character $ is an anchor indicating that this point in the pattern must align with the end of the string or, with re.M, the end of any line.
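Here is a minimal sketch of both anchors on a small made-up two-line string (the text "alpha beta / gamma delta" and the name `text` are illustrative only, not part of the challenge), with and without re.M:

```python
import re

# Hypothetical two-line sample, separated by a newline.
text = "alpha beta\ngamma delta"

# Without re.M, ^ and $ anchor only to the start and end of the whole string.
print(re.findall(r'^\w+', text))        # ['alpha']
print(re.findall(r'\w+$', text))        # ['delta']

# With re.M (re.MULTILINE), they also anchor at the start and end of every line.
print(re.findall(r'^\w+', text, re.M))  # ['alpha', 'gamma']
print(re.findall(r'\w+$', text, re.M))  # ['beta', 'delta']
```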

The re.search method returns the first successful match found. Also notice the span information in the output. This indicates the range of characters of the input string that match the pattern.
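As a quick illustration of what span means: it is a (start, end) pair of indexes into the input string, with the end excluded, so slicing the string with it gives back the matched text. The variable names below are just for this sketch:

```python
import re

string = '''Love, Kenneth, kenneth+challenge@teamtreehouse.com, 555-555-5555, @kennethlove
Chalkley, Andrew, andrew@teamtreehouse.co.uk, 555-555-5556, @chalkers
McFarland, Dave, dave.mcfarland@teamtreehouse.com, 555-555-5557, @davemcfarland
Kesten, Joy, joy@teamtreehouse.com, 555-555-5558, @joykesten'''

match = re.search(r'@[\w]+', string)
start, end = match.span()   # start index, end index (end is exclusive)
print(start, end)           # 32 46
print(string[start:end])    # slicing with the span gives the matched text: @teamtreehouse
print(match.group())        # same text via group(): @teamtreehouse
```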

Your first code:

twitters = re.search(r'@[\w]+', string)

says, "match one or more word characters immediately following an at sign @."

Since every email address contains an at sign, the pattern first matches the domain name of the first email address, up to the dot (the first non-word character), before it ever reaches a Twitter handle. The span=(32,46), match='@teamtreehouse' output shows this.
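To see every place this unanchored pattern can match, you could swap re.search for re.findall (only as a diagnostic; the challenge itself asks for re.search). The email domains show up ahead of the Twitter handle on each line, which is why re.search stops at the first domain:

```python
import re

string = '''Love, Kenneth, kenneth+challenge@teamtreehouse.com, 555-555-5555, @kennethlove
Chalkley, Andrew, andrew@teamtreehouse.co.uk, 555-555-5556, @chalkers
McFarland, Dave, dave.mcfarland@teamtreehouse.com, 555-555-5557, @davemcfarland
Kesten, Joy, joy@teamtreehouse.com, 555-555-5558, @joykesten'''

# findall returns every non-overlapping match, in order of position.
all_at_matches = re.findall(r'@[\w]+', string)
print(all_at_matches)
# ['@teamtreehouse', '@kennethlove', '@teamtreehouse', '@chalkers',
#  '@teamtreehouse', '@davemcfarland', '@teamtreehouse', '@joykesten']
```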

If the end anchor $ is added to the pattern, the email addresses are skipped, since an email domain never ends the string or a line. The next candidate is the @ in a Twitter handle. Adding the re.M (re.MULTILINE) flag as well makes this pass the challenge. With re.M, the input string is treated as multiple lines and $ also matches at the end of each line, so "@kennethlove", at the end of the first line, is the first match and the desired solution. Without re.M, $ matches only at the end of the whole string, so "@joykesten", the last Twitter handle in the string, is the only possible match.
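A short side-by-side of the anchored pattern with and without re.M (the names no_flag and with_flag are only for this sketch):

```python
import re

string = '''Love, Kenneth, kenneth+challenge@teamtreehouse.com, 555-555-5555, @kennethlove
Chalkley, Andrew, andrew@teamtreehouse.co.uk, 555-555-5556, @chalkers
McFarland, Dave, dave.mcfarland@teamtreehouse.com, 555-555-5557, @davemcfarland
Kesten, Joy, joy@teamtreehouse.com, 555-555-5558, @joykesten'''

# Without re.M, $ only matches at the very end of the string.
no_flag = re.search(r'@[\w]+$', string)
print(no_flag.group())    # @joykesten

# With re.M, $ also matches just before each newline, so line 1 wins.
with_flag = re.search(r'@[\w]+$', string, re.M)
print(with_flag.group())  # @kennethlove
print(with_flag.span())   # (66, 78)
```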

Your second code sample does find the Twitter handle, but that's not what the challenge is looking for: it does not expect the Twitter information to be part of a contacts match object.

>>> contacts = re.search(r'''
    (?P<email>[-+.\w]+@[-.\w]+),\s
    (?P<phone>\d{3}-\d{3}-\d{4}),\s
    (?P<twitter>@[-\w]+)
''', string, re.X)
>>> contacts.groups()
('kenneth+challenge@teamtreehouse.com', '555-555-5555', '@kennethlove')
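For reference, the handle is already there in that match object; it is just tucked inside a larger match. A sketch of pulling it out by its group name (this only shows where the handle lives and is not claimed to satisfy the challenge checker):

```python
import re

string = '''Love, Kenneth, kenneth+challenge@teamtreehouse.com, 555-555-5555, @kennethlove
Chalkley, Andrew, andrew@teamtreehouse.co.uk, 555-555-5556, @chalkers
McFarland, Dave, dave.mcfarland@teamtreehouse.com, 555-555-5557, @davemcfarland
Kesten, Joy, joy@teamtreehouse.com, 555-555-5558, @joykesten'''

# re.X (re.VERBOSE) ignores the layout whitespace inside the pattern.
contacts = re.search(r'''
    (?P<email>[-+.\w]+@[-.\w]+),\s
    (?P<phone>\d{3}-\d{3}-\d{4}),\s
    (?P<twitter>@[-\w]+)
''', string, re.X)

print(contacts.group('twitter'))  # @kennethlove
print(contacts.span())            # (15, 78): the whole match covers email through handle
print(contacts.groupdict())
```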

As you've said, your code 3 works. By using the end-of-line anchor $ together with re.M, the pattern now says "match an at sign followed by one or more hyphens or word characters that are the last characters of a line". Note the hyphen in the pattern is not needed, since none of the Twitter handles contain a hyphen.
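Two quick checks of that explanation, as an illustration beyond what the challenge asks: switching re.search to re.findall returns the handle at the end of every line, and dropping the hyphen from the character class gives the same result on this data (with_hyphen and without_hyphen are names made up for this sketch):

```python
import re

string = '''Love, Kenneth, kenneth+challenge@teamtreehouse.com, 555-555-5555, @kennethlove
Chalkley, Andrew, andrew@teamtreehouse.co.uk, 555-555-5556, @chalkers
McFarland, Dave, dave.mcfarland@teamtreehouse.com, 555-555-5557, @davemcfarland
Kesten, Joy, joy@teamtreehouse.com, 555-555-5558, @joykesten'''

# With a single group, findall returns that group's text for each match.
with_hyphen = re.findall(r'(?P<twitter>@[-\w]+)$', string, re.M)
without_hyphen = re.findall(r'(?P<twitter>@\w+)$', string, re.M)

print(with_hyphen)                    # ['@kennethlove', '@chalkers', '@davemcfarland', '@joykesten']
print(with_hyphen == without_hyphen)  # True
```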

If the beginning-of-line anchor ^ were also used, the pattern would need to consume every character from the beginning of the line up to the start of the target pattern. The following pattern does this by adding the caret, a dot (for any character except a newline), and an asterisk (for zero or more of them):

>>> twitters = re.search(r'^.*(?P<twitter>@[-\w]+$)', string, re.M)
>>> twitters
<_sre.SRE_Match object; span=(0, 78), match='Love, Kenneth, kenneth+challenge@teamtreehouse.co>
# The repr above is truncated by the console. This is the full match:
>>> twitters.string[twitters.start():twitters.end()]
'Love, Kenneth, kenneth+challenge@teamtreehouse.com, 555-555-5555, @kennethlove'
>>> twitters.groups()
('@kennethlove',)
>>> twitters.groupdict()
{'twitter': '@kennethlove'}

However, the challenge checker does not check the matching group closely enough and does not accept the above solution. This will be reported as a checker bug! Tagging Craig Dennis


Flore W
Chris Freeman