Yuda Leh
7,618 Points

I am not getting the same output as Kenneth when removing the .gov addresses.

Here is my code:

print(re.findall(r'''
    \b@[-\w\d.]*  # First a word boundary, an @, and then any number of characters
    [^gov\t]+     # Ignore 1+ instances of the letters 'g', 'o', or 'v' and a tab
    \b            #Match another word boundary
''', data, re.VERBOSE|re.I))

My output: ['@teamtreehouse.com (555) 555-5555 Teacher, ', '@teamtreehouse.com (555) 555-5554 Teacher, ', '@camelot.co.uk ', '@norrbotten.co.se ', '@killerrabbit.com Enchanter, Killer Rabbit ', '@teamtreehouse.com (555) 555-5543 ', '@tardis.co.uk Time ', '@example.com 555-555-5552 Example, Example ', '@us.gov 555 555-5551 President, United States ', '@teamtreehouse.com (555) 555-5553 Teacher, ', '@empire.gov (555) 555-4444 Sith ', '@spain.gov First Deputy Prime Minister, Spanish ']

I am getting phone numbers, job titles, and the .gov addresses. I am not sure what the problem is. I am running Python 3.5.2, so maybe something changed?

-Thanks

Another problem I ran into:

print(re.findall(r"""
                 \b[-\w]+, # Find a word boundary, 1+ hyphens or characters, and a comma
                 \s  # Find 1 whitespace
                 [-\w ]+  # 1+ hyphens and characters and explicit spaces
                 [^\t\n]  # Ignores tabs and newline
                 """, data, re.X))

output: ['Love, Kenneth kenneth@', 'Teacher, Treehouse @', 'McFarland, Dave dave@', 'Teacher, Treehouse', 'Arthur, King king_arthur@', 'King, Camelot', 'Österberg, Sven-Erik governor@', 'Governor, Norrbotten @', 'Enchanter, Killer Rabbit Cave', 'Carson, Ryan ryan@', 'CEO, Treehouse @', 'Doctor, The doctor+', 'Lord, Gallifrey', 'Exampleson, Example me@', 'Example, Example Co.', 'Obama, Barack president.', 'President, United States of America @', 'Chalkley, Andrew andrew@', 'Teacher, Treehouse @', 'Vader, Darth darth-vader@', 'Lord, Galactic Empire @', 'Sanz, María Teresa mtfvs@', 'Minister, Spanish Govt.']

I am receiving the @ and some portion of the email address. I am confused and not sure what to do.

Kenneth Love

(P.S. I downloaded the file from the first video to do some debugging. I copied Kenneth's code from the file, and it gave me the same output as the ones I posted above.)

3 Answers

Kenneth Love
Treehouse Guest Teacher

Your first pattern matches the correct email addresses in my tests.

And the second doesn't contain any part of the email addresses.

Nothing changed in re in 3.5.2 as far as I can tell.

Yuda Leh
7,618 Points

Oh, then I am not sure why I am running into this problem.

Jonathan Mitten
Courses Plus Student 11,173 Points

Actually, I think I've solved this issue for those of us running the code in our own console rather than in the workspace. I suspect the problem comes from copying and pasting the names.txt file into a text editor that replaces tabs with spaces. My setup for editing Python in Sublime Text 3 swaps each tab for four spaces, so the regex rules that rely on tabs stop working as intended.
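To illustrate the tab-versus-space theory, here is a minimal sketch that runs both patterns from the question on one made-up line, once with tabs and once with each tab replaced by four spaces. The sample line is only an assumption about how names.txt is laid out, not a copy of the real file.

```python
import re

# The two patterns from the question, unchanged apart from comments.
EMAIL_TAIL = re.compile(r'''
    \b@[-\w\d.]*  # a word boundary, an @, then any number of these characters
    [^gov\t]+     # 1+ characters that are not g, o, v, or a tab
    \b            # another word boundary
''', re.VERBOSE | re.I)

NAME_PAIR = re.compile(r"""
    \b[-\w]+,  # a word boundary, 1+ hyphens or word characters, and a comma
    \s         # one whitespace character
    [-\w ]+    # 1+ hyphens, word characters, or explicit spaces
    [^\t\n]    # one character that is not a tab or a newline
""", re.X)

# Hypothetical sample line (an assumption, not the real file contents).
tabbed = ("Love, Kenneth\tkenneth@teamtreehouse.com\t(555) 555-5555"
          "\tTeacher, Treehouse\t@kennethlove")
# What an editor that swaps each tab for four spaces would save instead.
spaced = tabbed.replace("\t", "    ")

email_tabbed = EMAIL_TAIL.findall(tabbed)
email_spaced = EMAIL_TAIL.findall(spaced)
names_tabbed = NAME_PAIR.findall(tabbed)
names_spaced = NAME_PAIR.findall(spaced)

print(email_tabbed)  # ['@teamtreehouse.com']
print(email_spaced)
print(names_tabbed)  # ['Love, Kenneth', 'Teacher, Treehouse']
print(names_spaced)
```

With tabs, the negated sets stop each match at the end of the field. With spaces, nothing stops `[^gov\t]+` or `[-\w ]+` from running into the next field, which looks a lot like the output posted in the question.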

Instead of copying the text from the workspace, download the workspace and move the names.txt file into your working directory (overwrite the current file if it's still there).
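A quick way to check whether a local copy still has its tabs is to count them. This is only a small sketch: `tab_report` is a hypothetical helper name, and the two sample strings are made up rather than taken from the real file.

```python
def tab_report(text):
    """Return (lines_containing_a_tab, total_lines) for a block of text."""
    lines = text.splitlines()
    return sum("\t" in line for line in lines), len(lines)


# Made-up samples: one line as downloaded, one as pasted through an editor
# that converts tabs to spaces.
sample_tabbed = "Love, Kenneth\tkenneth@teamtreehouse.com\n"
sample_spaced = "Love, Kenneth    kenneth@teamtreehouse.com\n"

tabbed_report = tab_report(sample_tabbed)
spaced_report = tab_report(sample_spaced)

print(tabbed_report)  # (1, 1)
print(spaced_report)  # (0, 1)
```

For the real file, passing in the result of `open("names.txt", encoding="utf-8").read()` should show whether the copy kept its tabs, assuming the original file separates its fields with tabs.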

Try your regex as the videos have you do it, and see if it works as Kenneth Love says it should.

Jonathan Mitten
Courses Plus Student 11,173 Points

I'm also running into this issue. Checking with an online regex validator, I get the same results as our "problem" output rather than the course's expected output: https://regex101.com/r/Bf4Xz4/1 However, when I put a space between gov and \t in the set, I get the same results as Kenneth Love.

I don't understand why, though. And now that I've messed about a bit, the workspace is giving me a different output than it did before: https://w.trhou.se/qs8fllit1m
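A possible explanation for why the space makes a difference: under re.VERBOSE, Python ignores whitespace in the pattern except inside a character class, so a space typed inside [^gov \t] is a literal space that the set then excludes. If the data was pasted with spaces where the tabs used to be, that space stops the match where the tab would have. A minimal sketch, using a made-up line with four-space separators (an assumption, not the real file):

```python
import re

# Hypothetical line in which every tab has already been replaced by spaces.
spaced = ("Love, Kenneth    kenneth@teamtreehouse.com    "
          "(555) 555-5555    Teacher, Treehouse    @kennethlove")

without_space = re.compile(r'''
    \b@[-\w\d.]*
    [^gov\t]+     # excludes g, o, v, and tab only
    \b
''', re.VERBOSE | re.I)

with_space = re.compile(r'''
    \b@[-\w\d.]*
    [^gov \t]+    # the space inside the brackets is kept, even in verbose mode
    \b
''', re.VERBOSE | re.I)

loose = without_space.findall(spaced)
tight = with_space.findall(spaced)

print(loose)
print(tight)  # ['@teamtreehouse.com']
```

So the extra space does not fix the pattern so much as compensate for data that has lost its tabs.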