Python

Regex Confusion, Good Numbers

I was struggling with the "Negated Numbers" code challenge. Spoiler alert: I solved the problem. But on my first attempt, it seemed like I should be using the following code:

import re

string = '1234567890'

def good_numbers(s):
    return re.findall(r'\d[^567]', s)  # intended: all the digits except 5, 6, or 7

print(good_numbers(string))

...but when I tested it in the workspace, I kept getting weird results: ['12', '34', '78', '90']

It got even weirder when I tried omitting just a couple of numbers. [^56] for example yielded: ['12', '34', '67', '89']

If I take the [^567] out altogether, it makes a list of the individual numbers in the string, which is what I'd expect: ['1', '2', '3', '4', '5', '6', '7', '8', '9', '0']

I got the correct answer. I understand that I need to >>>SPOILER<<< omit the \d altogether. But I am at a loss to understand the behavior I described above. Can anyone clear that up for me? What am I asking python to do here?
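For anyone reading along, here is a minimal sketch that reproduces the results described above. The variable names (pairs, pairs_56, singles) are just illustrative. The key point, as far as I can tell, is that \d and [^567] are two separate tokens that each consume one character, so the pattern asks for "a digit, followed by one more character that is not 5, 6, or 7", and re.findall returns non-overlapping two-character matches.

```python
import re

string = '1234567890'

# \d consumes one digit, then [^567] consumes the NEXT character,
# so every match is two characters long.
pairs = re.findall(r'\d[^567]', string)
print(pairs)     # ['12', '34', '78', '90']

# Excluding only 5 and 6 lets '67' match; the final '0' has no
# following character, so it cannot start a match.
pairs_56 = re.findall(r'\d[^56]', string)
print(pairs_56)  # ['12', '34', '67', '89']

# A single character class consumes one character per match.
singles = re.findall(r'[^567]', string)
print(singles)   # ['1', '2', '3', '4', '8', '9', '0']
```

In the first call, '5' cannot start a match because the next character is '6', and '6' cannot because the next character is '7'; the scan resumes at '7', which pairs with '8'.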

Update: Now I'm super confused. The code worked wonderfully in the test workspace but Won't Solve The Challenge. For the love of all that is beautiful and holy in this dark dark world, would someone please tear the wool from mine eyes and show me the path to freedom and enlightenment.

2 Answers

Jonathan Grieve
Treehouse Moderator 91,252 Points

I have a few theories but it's tricky for me to test without a link to the code challenge.

The different results suggest to me that it's a matter of inclusivity and exclusivity, that is, where regex draws the line as to which of the digits in the range to negate. But it's been a while since I did anything with regex.

As I understand it, removing the \d and using only the square brackets omits just the numbers listed in them, as opposed to using something like a range such as [5-7].

Update 2: Challenge solved. BUT

What am I telling python when I say r'\d[^567]' that is so different from r'[^567]'?

Jonathan Grieve
Treehouse Moderator 91,252 Points

So the difference is the \d digit operator. So I think the answer lies there.

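I did a bit of digging, and when you're using \d in a regex, it is shorthand for a character class like [0-9]: it matches any one digit in that range, not one specific digit. So \d is itself a token that consumes a character, and anything written after it applies to the next character.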
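A small sketch to illustrate the point about \d, assuming the same test string as in the question. It uses re.finditer to show which slice of the string each match covers, checks that [0-9] behaves the same as \d for this input, and shows one possible way (a negative lookahead, which is my own suggestion and not necessarily what the challenge expects) to apply both conditions to the same character.

```python
import re

string = '1234567890'

# Each match spans two positions: one for \d and one for [^567].
spans = [(m.span(), m.group()) for m in re.finditer(r'\d[^567]', string)]
for span, text in spans:
    print(span, text)

# For this ASCII string, [0-9] and \d give identical results.
same = re.findall(r'[0-9][^567]', string) == re.findall(r'\d[^567]', string)
print(same)

# (?![567]) checks the upcoming character WITHOUT consuming it,
# so \d then matches that same character: "a digit that is not 5, 6, or 7".
lookahead = re.findall(r'(?![567])\d', string)
print(lookahead)
```

For a string that contains only digits, the lookahead version returns the same list as the plain r'[^567]' pattern; the difference would only show up if the input also contained non-digit characters.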

Have a look for regex cheat sheets that'll teach you at a glance the different behaviours of regex characters! :-)

OK, so a year after this was posted, I'm now struggling with the same thing. I was able to solve the challenge with

good_numbers = re.findall(r'\s*[^567]', string)

However, I have no idea why \s would work instead of \d. If \d tells re.findall to find all the digits between 0 and 9, with the exception of [^567], why did it keep finding the entire string 1234567890?

Using re.findall(r'[\d][^890]', string), for example, I was able to ignore the latter part of the string, but never the middle part of it. I've also tried with r'[\d][^890][\d]*', but it would find 1234567890 again.
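Here is a rough sketch of what I believe each of these patterns does on the same test string. The names a, b, c, and d are just for illustration, and the string '1 2 5 8' is a made-up input to show \s* actually matching something. In short: \s* means "zero or more whitespace characters", so on a string with no whitespace it matches nothing and only [^567] does any work; [\d] is the same as \d and still consumes its own character; and a trailing [\d]* greedily swallows every remaining digit.

```python
import re

string = '1234567890'

# \s* matches zero characters here, so this behaves like r'[^567]'.
a = re.findall(r'\s*[^567]', string)
print(a)  # ['1', '2', '3', '4', '8', '9', '0']

# Two-character matches again: a digit, then a character not in 8, 9, 0.
# '78', '89', and '90' are rejected because of their second character.
b = re.findall(r'[\d][^890]', string)
print(b)  # ['12', '34', '56']

# '1' and '2' satisfy the first two tokens, then [\d]* consumes
# all the remaining digits, including 8, 9, and 0.
c = re.findall(r'[\d][^890][\d]*', string)
print(c)  # ['1234567890']

# With whitespace present, \s* does consume it, and [^567] can
# also match a space because a space is not 5, 6, or 7.
d = re.findall(r'\s*[^567]', '1 2 5 8')
print(d)  # ['1', ' 2', ' ', ' 8']
```

So \s did not really "work instead of" \d; it was simply a no-op on this input, which left the negated set to match one character at a time.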

Can anyone explain what is going on here??