Regex for teamtreehouse or treehouse?

Question

Using the names.txt file for this video which contains several strings of "treehouse" as well as several strings containing "teamtreehouse", how would I write a regex that gave me either?

I guess I'm confused about when and whether to use string literals inside the regex.

I can find all occurrences of 'treehouse' with this:

print(re.findall(r'\b[trehous]{9}\b', data, re.I))

as given in the video. But why can't I do this:

print(re.findall(r'(team)*[trehous]{9}\b', data, re.I))

to find either treehouse or teamtreehouse? The asterisk means zero or more occurrences, so I thought (team)* would match zero or more occurrences of 'team'.

I could do something cumbersome like:

print(re.findall(r'(teamtreehouse)|(treehouse)', data, re.I))

but that returns a series of tuples which include a space character, why?

 keith@ada:~/code/py$ python address_book.py
[('teamtreehouse', ''), ('', 'Treehouse'), ('teamtreehouse', ''), ('', 'Treehouse'), ('teamtreehouse', ''), ('', 'Treehouse'), ('teamtreehouse', ''), ('', 'Treehouse')]

What's the correct syntax?

Thanks, Keith

Answer 1 · 2017-08-02T06:06:37Z

I like this question. You made me work for it. This took some digging, but of course, I found the answer in the documentation.

If you spend enough time in there, you'll find the section on this:

(?:...)

A non-capturing version of regular parentheses. Matches whatever regular expression is inside the parentheses, but the substring matched by the group cannot be retrieved after performing a match or referenced later in the pattern.

This is what we want. We want to group (team) without actually capturing it and returning it with the match. Also one other nitpick - you should probably use ? instead of *, since we want 0 or 1 instance of "team", not 0 or more (we shouldn't match teamteamteamtreehouse, for example).

solution.py

print(re.findall(r'\b(?:team)?[trehous]{9}\b', data, re.I))
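A minimal sketch of why the non-capturing group changes what comes back. It assumes a made-up `data` string in place of names.txt, which isn't shown in this thread. When the pattern has one capturing group, `re.findall` returns what the group captured rather than the whole match. With no capturing groups it returns the whole match.

```python
import re

# Hypothetical sample standing in for names.txt (not shown in the thread).
data = "kenneth@teamtreehouse.com, Teacher, Treehouse"

# One capturing group: findall returns the group's text, not the whole match.
capturing = re.findall(r'\b(team)?treehouse\b', data, re.I)

# Non-capturing group: the pattern has no groups, so findall returns whole matches.
non_capturing = re.findall(r'\b(?:team)?treehouse\b', data, re.I)

print(capturing)      # ['team', '']
print(non_capturing)  # ['teamtreehouse', 'Treehouse']
```

The empty string in the first result is the group not taking part in the match for the plain "Treehouse".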

Of course, now that I've discovered this, it seems ridiculous to use the pattern \b[trehous]{9}\b to match "treehouse". We can use our new tool!

cleaner.py

print(re.findall(r'\b(?:team)?(?:treehouse)\b', data, re.I))

or if we just wanted treehouses:

original_search.py

print(re.findall(r'\b(?:treehouse)\b', data, re.I))

So much better!
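For anyone wondering what was wrong with the character-class version: [trehous]{9} accepts any nine characters drawn from that set, in any order. A quick sketch with a made-up `text` string (an assumption, not from names.txt):

```python
import re

# Hypothetical input: "housetree" uses only letters from the set t, r, e, h, o, u, s.
text = "treehouse housetree"

# Character class with a count: any 9 of those letters, in any order.
loose = re.findall(r'\b[trehous]{9}\b', text, re.I)

# Literal text: only the exact word.
strict = re.findall(r'\btreehouse\b', text, re.I)

print(loose)   # ['treehouse', 'housetree']
print(strict)  # ['treehouse']
```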

Cheers

-Greg

Answer 2 · 2017-07-29T20:17:28Z

I think you are close with the capture group (parens), but don't put treehouse in brackets and don't use a quantifier (curly braces). (team)*treehouse ought to work. Then, to also pull in caps, you would use brackets and a pipe, like ([t|T]eam)*, which should get team or Team etc. Same with the treehouse part: [t|T]reehouse.
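A side note on the brackets-and-pipe idea, as a small sketch with a made-up `text` string (an assumption for illustration). Inside square brackets the pipe is an ordinary character, so [t|T] matches 't', 'T' or a literal '|'. Since the calls in this thread already pass re.I, the case variants shouldn't need spelling out at all.

```python
import re

# Hypothetical input that includes a word starting with a literal pipe.
text = "team Team |eam"

# Inside a character class, | is just another character to match.
with_pipe = re.findall(r'[t|T]eam', text)

# re.I alone covers both cases without any brackets.
ignore_case = re.findall(r'team', text, re.I)

print(with_pipe)    # ['team', 'Team', '|eam']
print(ignore_case)  # ['team', 'Team']
```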

Answer 3 · 2017-07-29T21:10:50Z

print(re.findall(r'(team)treehouse', data, re.I))

returns

(py) keith@ada:~/code/py$ python address_book.py
['team', 'team', 'team', 'team']

Answer 4 · 2017-07-29T22:06:47Z

Did you try with an asterisk after the capture group? I edited my answer because in Markdown an asterisk denotes the beginning of an italicized section, so it originally showed no asterisks, just a sentence in italics between the two asterisks you see after my edit, where I escaped them. With /(team)*treehouse/gi I get treehouse and teamtreehouse but not team.
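The /(team)*treehouse/gi form looks like a JavaScript-style regex, where a global match typically hands back whole matches. That is probably why the results differ from Python's `re.findall`, which returns group contents when the pattern has groups. A sketch of getting whole matches in Python with the same pattern, assuming a made-up `data` string in place of names.txt:

```python
import re

# Hypothetical sample standing in for names.txt (not shown in the thread).
data = "kenneth@teamtreehouse.com, Teacher, Treehouse"

# finditer yields match objects; group(0) is always the entire match,
# no matter how many capturing groups the pattern contains.
full_matches = [m.group(0) for m in re.finditer(r'(team)*treehouse', data, re.I)]

print(full_matches)  # ['teamtreehouse', 'Treehouse']
```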

Answer 5 · 2017-07-29T22:19:47Z

Thanks for helping out, James. I have tried using the asterisk before (as shown above), but it doesn't give me the intended result:

print(re.findall(r'([T|t]eam)*[T|t]reehouse', data, re.I))

returns

keith@ada:~/code/py$ python address_book.py
['team', '', 'team', '', 'team', '', 'team', '']

I've also tried:

print(re.findall(r'([T|t]eam)*([T|t]reehouse)', data, re.I))

which returns:

keith@ada:~/code/py$ python address_book.py
[('team', 'treehouse'), ('', 'Treehouse'), ('team', 'treehouse'), ('', 'Treehouse'), ('team', 'treehouse'), ('', 'Treehouse'), ('team', 'treehouse'), ('', 'Treehouse')]
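On the tuples: with two or more capturing groups, `re.findall` returns one tuple per match with one slot per group, and a group that took no part in the match shows up as an empty string (not a space). A sketch using a made-up `data` string, since names.txt isn't shown here:

```python
import re

# Hypothetical sample standing in for names.txt (not shown in the thread).
data = "kenneth@teamtreehouse.com, Teacher, Treehouse"

# Two capturing groups: each result is a tuple with one slot per group.
# Only one side of the alternation matches each time, so the other slot is ''.
pairs = re.findall(r'(teamtreehouse)|(treehouse)', data, re.I)

# Same alternation without parentheses: no groups, so whole matches come back.
flat = re.findall(r'teamtreehouse|treehouse', data, re.I)

print(pairs)  # [('teamtreehouse', ''), ('', 'Treehouse')]
print(flat)   # ['teamtreehouse', 'Treehouse']
```

The order of the alternatives matters in the second pattern: the longer "teamtreehouse" is listed first so it is tried before the shorter word.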

Keith Ostertag

5 Answers

Greg Kaleka

james south