Quinton Dobbs
5,149 Points

Workspaces and my terminal are giving different output for the same code

When I run the code below in Workspaces (which is running Python 3.5.0), I get the same output that Kenneth gets in the video. But when I run the same code in my terminal (which is running Python 3.6.3), I get the following output:

My output:

['@teamtreehouse.com (555) 555-5555 Teacher, ', '@teamtreehouse.com (555) 555-5554 Teacher, ', '@camelot.co.uk ', '@norrbotten.co.se ', '@killerrabbit.com Enchanter, Killer Rabbit ', '@teamtreehouse.com (555) 555-5543 ', '@tardis.co.uk Time ', '@example.com 555-555-5552 Example, Example ', '@us.gov 555 555-5551 President, United States ', '@teamtreehouse.com (555) 555-5553 Teacher, ', '@empire.gov (555) 555-4444 Sith ', '@spain.gov First Deputy Prime Minister, Spanish ']

Kenneth's output:

['@teamtreehouse.com', '@teamtreehouse.com', '@camelot.co.uk', '@norrbotten.co.se', '@killerrabbit.com', '@teamtreehouse.com', '@tardis.co.uk', '@example.com', '@us.', '@teamtreehouse.com', '@empire.', '@spain.']

Could this be because of the different versions of Python? If so, how would you change it to get the same output as Kenneth?

import re

name_file = open("names.txt", encoding="utf-8")
data = name_file.read()
name_file.close()

#print(re.match(r"Love", data))
#print(re.search(r"Kenneth", data))
#print(re.findall(r"\(?\d{3}\)?-?\s?\d{3}-\d{4}", data))
#print(re.findall(r"\w*, \w+", data))
#print(re.findall(r"[-\w\d+.]+@[-\w\d.]+", data))
#print(re.findall(r"\b[trehous]{9}\b", data, re.I))
print(re.findall(r'''
    \b@[-\w\d.]*   # word boundary, then @, then any number of word characters, hyphens, or dots
    [^gov\t]+   # one or more characters that are not g, o, v, or a tab
    \b # closing word boundary
''', data, re.VERBOSE | re.I))

2 Answers

Jonathan Mitten
Courses Plus Student 11,173 Points

Actually, I think I've solved this, along with the issues others have had when running the code in a local console. I suspect the problem comes from copying and pasting the contents of names.txt into a text editor that replaces tabs with spaces. My Sublime Text 3 setup for editing Python swaps each tab for four spaces, so some of the regex rules no longer behave as intended.

Instead of copying the text from the workspace, try downloading the workspace and moving the names.txt file into your working directory (overwriting the current file if it's still in there).

Then try your regex as the videos have you do it, and see if it works as Kenneth Love says it should.
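To check whether tabs are really the culprit, here is a minimal sketch that runs the pattern from the question against one row written two ways. The row is a made-up example in the style of names.txt, not a line copied from the course file, and the names EMAIL_DOMAIN, tab_row, and space_row are only for illustration.

```python
import re

# Same pattern as in the question.
EMAIL_DOMAIN = re.compile(r'''
    \b@[-\w\d.]*
    [^gov\t]+
    \b
''', re.VERBOSE | re.I)

# Hypothetical row in the style of names.txt, with tab-separated columns.
tab_row = "Love, Kenneth\tkenneth@teamtreehouse.com\t(555) 555-5555\tTeacher, Treehouse"
# The same row after an editor has swapped each tab for four spaces.
space_row = tab_row.replace("\t", "    ")

for label, row in (("tabs", tab_row), ("spaces", space_row)):
    # repr() shows tabs as \t, so the two rows are easy to tell apart.
    print(label, "tab count:", row.count("\t"), repr(row))
    print(label, "matches:", EMAIL_DOMAIN.findall(row))

tab_matches = EMAIL_DOMAIN.findall(tab_row)
space_matches = EMAIL_DOMAIN.findall(space_row)
```

With tabs, the negated set stops at the first tab, so the match ends at the domain. With spaces, nothing stops it until it reaches a g, o, or v, so the match runs on into the phone number and job title. Running data.count("\t") on your own copy of the file should tell you which case you are in.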

Jonathan Mitten
Courses Plus Student 11,173 Points

I'm experiencing the same issue. However, my local Python version is 3.5.4. Workspace version is 3.5.0.

Before you read further, check my other answer. I'm curious whether it resolves most of these issues. It did for me.

At first, I thought it could be the way I was opening and reading the file, which I was originally doing with the with open technique. I changed it to how @KennithLove does it in the course video:

names_file = open("names.txt", encoding="utf-8")
data = names_file.read()
names_file.close()

Locally,

print(re.findall(r'''
    \b@[-\w\d.]*
    [^gov\t]+
    \b
''', data, re.VERBOSE|re.I))

results in:

['@teamtreehouse.com   (555) 555-5555  Teacher, ', '@teamtreehouse.com  (555) 555-5554  Teacher, ', '@camelot.co.uk       ', '@norrbotten.co.se       ', '@killerrabbit.com        Enchanter, Killer Rabbit ', '@teamtreehouse.com  (555) 555-5543  ', '@tardis.co.uk       Time ', '@example.com  555-555-5552    Example, Example ', '@us.gov 555 555-5551    President, United States ', '@teamtreehouse.com    (555) 555-5553  Teacher, ', '@empire.gov  (555) 555-4444  Sith ', '@spain.gov     First Deputy Prime Minister, Spanish ']

but in the workspace,

>>> print(re.findall(r'''                                                            
... \b@[-\w\d.]*                                                                     
... [^gov\t]+                                                                        
... \b                                                                               
... ''', data, re.VERBOSE|re.I))   

results in:

['@teamtreehouse.com', '@teamtreehouse.com', '@camelot.co.uk', '@norrbotten.co.se', '@killerrabbit.com', '@teamtreehouse.com', '@tardis.co.uk', '@example.com', '@us.', '@teamtreehouse.com', '@empire.', '@spain.']

After poking around these community boards and seeing at least one other student with the same problem, I took the issue to the regex101 website, and it agrees with what our regex engines produce:

https://regex101.com/r/Bf4Xz4/1

Playing with the regex options (far right of the regular expression field), I changed the options to include "ungreedy" and looked that up in the Python docs at https://docs.python.org/3/library/re.html, which say:

*?, +?, ??
The '*', '+', and '?' qualifiers are all greedy; they match as much text as possible. Sometimes this behaviour isn’t desired; if the RE <.*> is matched against '<a> b <c>', it will match the entire string, and not just '<a>'. Adding ? after the qualifier makes it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched. Using the RE <.*?> will match only '<a>'.
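The behaviour described in that passage can be tried directly. This is just a small sketch using the docs' own example string; the variable names are mine.

```python
import re

text = "<a> b <c>"

# Greedy: .* grabs as much as it can, so the match runs to the last '>'.
greedy = re.match(r"<.*>", text).group()

# Non-greedy: .*? grabs as little as it can, so the match stops at the first '>'.
non_greedy = re.match(r"<.*?>", text).group()

print(greedy)
print(non_greedy)
```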

I put a space between gov and \t, so it reads [^gov \t]+, and now I'm getting the same result as Kenneth Love: https://regex101.com/r/9F8wTZ/1. But the output in the workspace is different! https://w.trhou.se/qs8fllit1m
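For anyone who wants to try the space-in-the-set variant locally, here is a rough sketch. It relies on the documented rule that re.VERBOSE keeps whitespace inside a character class, so the space in the set is a real excluded character. The rows are made-up examples in the style of names.txt, not lines from the course file, and the names PATTERN_WITH_SPACE and rows are only for illustration.

```python
import re

# The question's pattern with a literal space added to the negated set.
# In re.VERBOSE mode, whitespace inside a character class is kept.
PATTERN_WITH_SPACE = re.compile(r'''
    \b@[-\w\d.]*
    [^gov \t]+
    \b
''', re.VERBOSE | re.I)

# Hypothetical rows: tab-separated, space-separated, and a .gov address.
rows = [
    "kenneth@teamtreehouse.com\t(555) 555-5555",
    "kenneth@teamtreehouse.com    (555) 555-5555",
    "someone@example.gov    555 555-5551",
]

results = [PATTERN_WITH_SPACE.findall(row) for row in rows]
for row, found in zip(rows, results):
    print(repr(row), "->", found)
```

In this sketch the match stops at the end of the domain whether the columns are separated by tabs or spaces, and the .gov address is still cut off after the dot, the same way the '@us.' entry is in Kenneth's output.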