Python

"+" in regular expressions in Python

print(re.findall(r'[\w]', "aa,ab,abc,abcd"))  # => ['a', 'a', 'a', 'b', 'a', 'b', 'c', 'a', 'b', 'c', 'd']

print(re.findall(r'[\w]+', "aa,ab,abc,abcd"))  # => ['aa', 'ab', 'abc', 'abcd']

Why doesn't the one with "+" include the single-character matches? Doesn't "+" mean "one or more"?

2 Answers

Kenneth Love
STAFF
Treehouse Guest Teacher

I'm not sure I see the issue. The regular expression is doing exactly what you asked it to do: find runs of one or more word characters.

So it doesn't include "1" itself, and the count must be greater than 1? Why are there no single characters in the second result ('a', 'b', 'c', 'd')?

Kenneth Love
Treehouse Guest Teacher

Because the match is greedy. A single character would be enough to satisfy "+", but since each item has more than one word character in a row, it grabs all of them. You can make it non-greedy with an extra question mark, so each match stops after one character.

>>> re.findall(r'[\w]+?', "aa,ab,abc,abcd")
['a', 'a', 'a', 'b', 'a', 'b', 'c', 'a', 'b', 'c', 'd']
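
To see the greedy and non-greedy behaviour side by side, here is a minimal sketch using the same input string as the question. The variable names (text, greedy, lazy, single) are only illustrative, and the pattern is written as \w rather than [\w], which matches the same characters.

```python
import re

text = "aa,ab,abc,abcd"

# Greedy: each match takes the longest run of word characters it can.
greedy = re.findall(r'\w+', text)

# Non-greedy: each match stops as soon as the minimum (one character) is met.
lazy = re.findall(r'\w+?', text)

# No quantifier at all: exactly one word character per match.
single = re.findall(r'\w', text)

print(greedy)          # ['aa', 'ab', 'abc', 'abcd']
print(lazy == single)  # True
```

With findall, the non-greedy pattern ends up returning the same list as the pattern with no quantifier, because nothing after "+?" forces a match to grow past one character.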

Jeff Jacobson-Swartfager
15,417 Points

The data that you're providing has at least 2 characters for every item in the comma-separated string. Your second regular expression is looking for any run of 1 or more word characters. Since commas aren't word characters, each match ends at the next comma. Basically, you are asking for whole words in the second one instead of individual characters.

In the first regular expression, you are asking it to return every word character on its own, and it does. Again, it doesn't return commas; it just returns each word character individually.
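
As a quick check that "+" really does accept a single character, here is a small sketch with a hypothetical input whose first item is one character long. That input is not from the original question; it is only there to show that a one-character item comes back as a match.

```python
import re

# Hypothetical input: the first item is a single word character.
text = "a,ab,abc"

# "+" means one or more, so the lone "a" is matched as its own item.
words = re.findall(r'\w+', text)

print(words)  # ['a', 'ab', 'abc']
```

The original string simply had no one-character items, which is why none appeared in the second result.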