Regex practice

Question

I was practicing regex after almost completing the Regular Expressions course. I made my own version of the example Kenneth built (the one where a big pattern matches a record and he uses groupdict() to turn the match into a dictionary). Here's my Python code (it should be very similar to Kenneth's):

import re


with open('addresses.txt') as file:
    data = file.read()


#    \w+ \w+                         Search for full names
#    [\w+]+@[\w+.]+                  Search for email addresses
#    \(?\d{3}\)?\s?-?\d{3}-\d{4}     Search for phone numbers

results = re.compile(r"""
    (?P<name>\w+\s\w+\t)    # Search for full names
    (?P<email>[\w+]+@[\w+.]+\t)    # Search for email addresses
    (?P<phone>\(?\d{3}\)?\s?-?\d{3}-\d{4}) # Search for phone numbers
""", re.X|re.I)

print(re.match(results, data).groupdict())

And here's my custom blob(ish) file I made:

Alexander Davison   alexanderdavison888@gmail.com   (123) 123-1234
Kenneth Love    kennethlove@teamtreehouse.com   (555) 555-5555
Jason Seifer    jasonseifer@teamtreehouse.com   777-777-7777
Nick Pettit nickpettit@teamtreehouse.com    (222) 222-2222
Joseph Davison  jdavison@google.com 321-321-4321
Darth Vader darth+vader@galaxy.xyz  999-999-9999

This works 100% OK. However, this is the output:

{'name': 'Alexander Davison\t', 'email': 'alexanderdavison888@gmail.com\t', 'phone': '(123) 123-1234'}

And I knew that would be the output, 'cuz I already saw Kenneth's video about how to change the list of tuples into a dictionary. However, I want to take this one step further. My plan is to get the names, phone numbers and emails of all the people. How would I proceed? I tried changing "match" to "findall", but it failed with this error:

Traceback (most recent call last):
  File "regular_expression.py", line 18, in <module>
    print(re.findall(results, data).groupdict())
AttributeError: 'list' object has no attribute 'groupdict'

And this is just Python complaining that I can't use the "findall" function. How would I solve this problem? I looked everywhere in the docs but I still couldn't find a decent answer...

I appreciate any help :)

Thank you! ~Alex

Tagging Steven Parker and Jason Anders

Answer 1 · 2016-10-31T23:49:40Z

Hi Alexander, there isn't a direct way to get all of the results as dictionaries. As you found, findall() returns a list of the matched results, but it does not include the dictionary keys you want. match.groupdict() gives you the keys, but only for the first result. What you need is a way to iterate over the data string, producing a match object for each record and calling groupdict() on each one as you go. For this, use finditer():

# finditer() returns an iterator object
>>> re.finditer(results, data)
<callable_iterator object at 0x000000E95B5195F8>

# use list to see the full results
>>> list(re.finditer(results, data))
[<_sre.SRE_Match object; span=(0, 62), match='Alexander Davison\talexanderdavison888@gmail.com\>, <_sre.SRE_Match object; span=(63, 120), match='Kenneth Love\tkennethlove@teamtreehouse.com\t(555>, <_sre.SRE_Match object; span=(121, 176), match='Jason Seifer\tjasonseifer@teamtreehouse.com\t777->, <_sre.SRE_Match object; span=(177, 232), match='Nick Pettit\tnickpettit@teamtreehouse.com\t(222) >, <_sre.SRE_Match object; span=(233, 280), match='Joseph Davison\tjdavison@google.com\t321-321-4321>, <_sre.SRE_Match object; span=(281, 328), match='Darth Vader\tdarth+vader@galaxy.xyz\t999-999-9999>]

# use a list comprehension (or a for-loop) to get the groupdict from each match object
>>> [x.groupdict() for x in re.finditer(results, data)]
[{'phone': '(123) 123-1234', 'name': 'Alexander Davison\t', 'email': 'alexanderdavison888@gmail.com\t'}, {'phone': '(555) 555-5555', 'name': 'Kenneth Love\t', 'email': 'kennethlove@teamtreehouse.com\t'}, {'phone': '777-777-7777', 'name': 'Jason Seifer\t', 'email': 'jasonseifer@teamtreehouse.com\t'}, {'phone': '(222) 222-2222', 'name': 'Nick Pettit\t', 'email': 'nickpettit@teamtreehouse.com\t'}, {'phone': '321-321-4321', 'name': 'Joseph Davison\t', 'email': 'jdavison@google.com\t'}, {'phone': '999-999-9999', 'name': 'Darth Vader\t', 'email': 'darth+vader@galaxy.xyz\t'}]
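The for-loop form mentioned above might look like the sketch below. To keep it self-contained, the sample data is inlined instead of read from addresses.txt, and it assumes the fields are separated by TAB characters, as the output above suggests. The name people is just an illustrative choice.

```python
import re

# Assumption: fields are TAB-separated, one person per line.
# Inlined here in place of reading addresses.txt.
data = (
    "Alexander Davison\talexanderdavison888@gmail.com\t(123) 123-1234\n"
    "Kenneth Love\tkennethlove@teamtreehouse.com\t(555) 555-5555\n"
    "Jason Seifer\tjasonseifer@teamtreehouse.com\t777-777-7777\n"
)

# Same pattern as in the question (TAB still inside the groups).
results = re.compile(r"""
    (?P<name>\w+\s\w+\t)                     # full name
    (?P<email>[\w+]+@[\w+.]+\t)              # email address
    (?P<phone>\(?\d{3}\)?\s?-?\d{3}-\d{4})   # phone number
""", re.X | re.I)

# Collect one dictionary per match object.
people = []
for match in results.finditer(data):
    people.append(match.groupdict())

for person in people:
    print(person)
```

Calling results.finditer(data) on the compiled pattern is equivalent to re.finditer(results, data); either spelling works here.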

Notice the TAB ("\t") at the end of the name and email values. It is there because the TAB character sits inside the parentheses that define those groups. Move the TAB outside the parentheses to drop it from the results.
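As a rough sketch of that change, here is the same pattern with each "\t" moved outside its group. The sample data is inlined and assumed to be TAB-separated, and the names pattern and people are only illustrative.

```python
import re

# Assumption: fields are TAB-separated, one person per line.
data = (
    "Nick Pettit\tnickpettit@teamtreehouse.com\t(222) 222-2222\n"
    "Joseph Davison\tjdavison@google.com\t321-321-4321\n"
    "Darth Vader\tdarth+vader@galaxy.xyz\t999-999-9999\n"
)

# The TAB is still required between fields, but it now sits outside
# the parentheses, so it is matched without being captured.
pattern = re.compile(r"""
    (?P<name>\w+\s\w+)\t                     # full name, then a TAB
    (?P<email>[\w+]+@[\w+.]+)\t              # email address, then a TAB
    (?P<phone>\(?\d{3}\)?\s?-?\d{3}-\d{4})   # phone number
""", re.X | re.I)

people = [m.groupdict() for m in pattern.finditer(data)]

for person in people:
    print(person["name"], "|", person["email"], "|", person["phone"])
```

The TAB still has to be present in the data for a line to match; it just no longer ends up in the captured values.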


Chris Freeman
