Python

"+" in regular expression in python

>>> import re
>>> re.findall(r'[\w]', "aa,ab,abc,abcd")
['a', 'a', 'a', 'b', 'a', 'b', 'c', 'a', 'b', 'c', 'd']

>>> re.findall(r'[\w]+', "aa,ab,abc,abcd")
['aa', 'ab', 'abc', 'abcd']

Why doesn't the one with "+" include the single-character matches? Doesn't "+" mean "one or more"?

2 Answers

Kenneth Love
STAFF
Treehouse Guest Teacher

I'm not sure I see the issue. The regular expression is doing exactly what you asked it to do: find runs of one or more word characters.

So "+" doesn't include the case of exactly one character, and the match has to be longer than 1? Why are there no single characters ('a', 'b', 'c', 'd') in the second result?

Kenneth Love
Treehouse Guest Teacher

Because the "+" quantifier is greedy. A single word character would be enough to satisfy it, but when more word characters follow, it keeps consuming them, so each match is the longest run it can find. You can make it non-greedy by adding a question mark after the "+".

>>> re.findall(r'[\w]+?', "aa,ab,abc,abcd")
['a', 'a', 'a', 'b', 'a', 'b', 'c', 'a', 'b', 'c', 'd']
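
To see the two behaviours side by side, here is a minimal sketch, assuming Python 3 (where print is a function). The variable names are only for illustration. It also drops the square brackets, since a character class holding only \w matches the same thing as \w by itself.

```python
import re

text = "aa,ab,abc,abcd"

# Greedy: each match grows until the next character is not a word character.
greedy = re.findall(r'\w+', text)

# Non-greedy: each match stops as soon as "one or more" is satisfied,
# which here means after a single character.
lazy = re.findall(r'\w+?', text)

# No quantifier at all: exactly one word character per match.
single = re.findall(r'\w', text)

print(greedy)  # ['aa', 'ab', 'abc', 'abcd']
print(lazy)    # ['a', 'a', 'a', 'b', 'a', 'b', 'c', 'a', 'b', 'c', 'd']
print(lazy == single)  # True
```

With nothing after it in the pattern, the non-greedy version has no reason to take more than one character, so it gives the same list as the pattern without "+".
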
Jeff Jacobson-Swartfager
15,419 Points

The data that you're providing has at least 2 characters for every item in the comma-separated string. Your second regular expression is looking for runs of one or more word characters. Since commas aren't word characters, each match ends at the next comma. Basically, you are asking for whole words in the second one instead of individual characters.

In the first regular expression, you are asking for every word character to be returned on its own, and that is what you get. Again, it doesn't return the commas, just each word character.
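
A quick way to check that "+" really does accept a single character is to try input where some items are only one character long. This is a minimal sketch, assuming Python 3. The string in mixed is made up for the illustration and is not from the question.

```python
import re

# Hypothetical input: some items have one character, some have more.
mixed = "a,bb,c,ddd"

words = re.findall(r'\w+', mixed)   # runs of one or more word characters
chars = re.findall(r'\w', mixed)    # one word character at a time

print(words)                    # ['a', 'bb', 'c', 'ddd']
print([len(w) for w in words])  # [1, 2, 1, 3]
print(chars)                    # ['a', 'b', 'b', 'c', 'd', 'd', 'd']
```

The one-character items 'a' and 'c' show up in the "+" result, so "one or more" does include one. They were missing from the original output only because every item in that string was at least two characters long.
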