Welcome to the Treehouse Community

Python

Regex practice

I was practicing regex after almost completing the Regular Expressions course. I made my own custom version of the example Kenneth built (the one where a pattern matches a big chunk of messy text and Kenneth turns the match into a dictionary). Here's my Python code (it should be very similar to Kenneth's):

import re


with open('addresses.txt') as file:
    data = file.read()


#    \w+ \w+                         Search for full names
#    [\w+]+@[\w+.]+                  Search for email addresses
#    \(?\d{3}\)?\s?-?\d{3}-\d{4}     Search for phone numbers

results = re.compile(r"""
    (?P<name>\w+\s\w+\t)    # Search for full names
    (?P<email>[\w+]+@[\w+.]+\t)    # Search for email addresses
    (?P<phone>\(?\d{3}\)?\s?-?\d{3}-\d{4}) # Search for phone numbers
""", re.X|re.I)

print(re.match(results, data).groupdict())

And here's the custom blob(ish) file I made, addresses.txt (the fields on each line are separated by tabs):

Alexander Davison   alexanderdavison888@gmail.com   (123) 123-1234
Kenneth Love    kennethlove@teamtreehouse.com   (555) 555-5555
Jason Seifer    jasonseifer@teamtreehouse.com   777-777-7777
Nick Pettit nickpettit@teamtreehouse.com    (222) 222-2222
Joseph Davison  jdavison@google.com 321-321-4321
Darth Vader darth+vader@galaxy.xyz  999-999-9999

This works 100% OK. Here's the output:

{'name': 'Alexander Davison\t', 'email': 'alexanderdavison888@gmail.com\t', 'phone': '(123) 123-1234'}

I knew that would be the output, 'cuz I already saw Kenneth's video about how to change the list of tuples into a dictionary. However, I want to take this one step further: my plan is to get every person's name, phone number and email. How would I proceed? I tried changing "match" to "findall", but I got this error:

Traceback (most recent call last):
  File "regular_expression.py", line 18, in <module>
    print(re.findall(results, data).groupdict())
AttributeError: 'list' object has no attribute 'groupdict'

So Python is complaining that I can't call groupdict() on what findall() returns. How would I solve this problem? I looked everywhere in the docs, but I still couldn't find a decent answer...

I appreciate any help :)

Thank you! ~Alex

Tagging Steven Parker and Jason Anders

1 Answer

Chris Freeman (MOD), Treehouse Moderator, 68,468 Points

Hi Alexander, there isn't a direct path to get all of the results in dictionary format. As you found, findall() returns a list of the matches (with several groups in the pattern, that is one tuple of group strings per match), but those tuples don't carry the group names you want as dictionary keys. Calling groupdict() on the result of match() does give you the keys, but match() only returns the first result. What you need is a way to iterate over the data string, producing a match object for each match and calling groupdict() on each one as you go. For this, use finditer():

# finditer() returns an iterator object
>>> re.finditer(results, data)
<callable_iterator object at 0x000000E95B5195F8>

# use list to see the full results
>>> list(re.finditer(results, data))
[<_sre.SRE_Match object; span=(0, 62), match='Alexander Davison\talexanderdavison888@gmail.com\>, <_sre.SRE_Match object; span=(63, 120), match='Kenneth Love\tkennethlove@teamtreehouse.com\t(555>, <_sre.SRE_Match object; span=(121, 176), match='Jason Seifer\tjasonseifer@teamtreehouse.com\t777->, <_sre.SRE_Match object; span=(177, 232), match='Nick Pettit\tnickpettit@teamtreehouse.com\t(222) >, <_sre.SRE_Match object; span=(233, 280), match='Joseph Davison\tjdavison@google.com\t321-321-4321>, <_sre.SRE_Match object; span=(281, 328), match='Darth Vader\tdarth+vader@galaxy.xyz\t999-999-9999>]

# use a list comprehension (or a for-loop) to get the groupdict from each match object
>>> [x.groupdict() for x in re.finditer(results, data)]
[{'phone': '(123) 123-1234', 'name': 'Alexander Davison\t', 'email': 'alexanderdavison888@gmail.com\t'}, {'phone': '(555) 555-5555', 'name': 'Kenneth Love\t', 'email': 'kennethlove@teamtreehouse.com\t'}, {'phone': '777-777-7777', 'name': 'Jason Seifer\t', 'email': 'jasonseifer@teamtreehouse.com\t'}, {'phone': '(222) 222-2222', 'name': 'Nick Pettit\t', 'email': 'nickpettit@teamtreehouse.com\t'}, {'phone': '321-321-4321', 'name': 'Joseph Davison\t', 'email': 'jdavison@google.com\t'}, {'phone': '999-999-9999', 'name': 'Darth Vader\t', 'email': 'darth+vader@galaxy.xyz\t'}]
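
The list comprehension above can also be written as the for-loop it mentions. Below is a minimal sketch, assuming the question's pattern and two of the sample lines inlined as a tab-separated string instead of being read from addresses.txt (the names pattern, sample and people are just illustrative). It also prints what findall() returns, which shows why there is no groupdict() to call on it:

```python
import re

# Same pattern as in the question, with each \t still inside its group
pattern = re.compile(r"""
    (?P<name>\w+\s\w+\t)                    # full name, followed by a tab
    (?P<email>[\w+]+@[\w+.]+\t)             # email address, followed by a tab
    (?P<phone>\(?\d{3}\)?\s?-?\d{3}-\d{4})  # phone number
""", re.X | re.I)

# Illustrative sample data: two tab-separated lines like those in addresses.txt
sample = ("Kenneth Love\tkennethlove@teamtreehouse.com\t(555) 555-5555\n"
          "Darth Vader\tdarth+vader@galaxy.xyz\t999-999-9999\n")

# findall() returns plain tuples (one per match), so there is no groupdict()
print(pattern.findall(sample))

# finditer() yields match objects; collect each one's groupdict()
people = []
for match in pattern.finditer(sample):
    people.append(match.groupdict())

print(people)
```

Either form works; the for-loop is handy if you later want to clean up or filter each dictionary before keeping it.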

Notice the TAB ("\t") at the end of each name and email in the results. This is because the \t is inside the parentheses that define those groups. Move each \t outside its group's parentheses to eliminate it from the results.
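
For example, here is a sketch of the question's pattern with each \t moved just outside its group, so the tab is still matched but no longer captured (the sample string and variable names are illustrative; in the real script you would pass the data read from addresses.txt):

```python
import re

# The \t separators now sit outside the named groups: matched, not captured
pattern = re.compile(r"""
    (?P<name>\w+\s\w+)\t                    # full name, then a tab
    (?P<email>[\w+]+@[\w+.]+)\t             # email address, then a tab
    (?P<phone>\(?\d{3}\)?\s?-?\d{3}-\d{4})  # phone number
""", re.X | re.I)

# Illustrative sample data in the same tab-separated layout as addresses.txt
sample = ("Alexander Davison\talexanderdavison888@gmail.com\t(123) 123-1234\n"
          "Jason Seifer\tjasonseifer@teamtreehouse.com\t777-777-7777\n")

# One dictionary per person, with no trailing tabs in the values
people = [m.groupdict() for m in pattern.finditer(sample)]
for person in people:
    print(person)
```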

Thank you! :smile:

This helped a lot :)