Jeremy Justus
2,910 Points

Python - Readable RegEx Variables

While I was able to complete the most recent Python RegEx challenge, I had trouble keeping the code itself in line with PEP 20. When I tried to insert int_count into my regular expression, the Python interpreter complained. Both of the ways I found to make it work are rather ugly and cumbersome. It seems like there should be a simpler, more elegant, and more Pythonic way to achieve this.

import re

# EXAMPLE:
# >>> find_words(4, "dog, cat, baby, balloon, me")
# ['baby', 'balloon']

def find_words(int_count, words_string):
    word_list1 = re.findall(
        r"\w{}".format("{" + str(int_count) + ",}"), words_string)
    # OR, alternatively:
    word_list2 = re.findall(
        (r"\w{" + str(int_count) + ",}"), words_string)
    print(word_list2)
    return word_list2

find_words(4, "dog, cat, baby, balloon, me")

In the code above, both word_list1 and word_list2 return the list that the challenge expects. But they're both so cumbersome, and would be a headache to decipher if I came back to the code a few days later. Is there a simpler way to achieve this that I'm overlooking?

2 Answers

One way might be to use string formatting rather than concatenating strings to form your regex pattern.

For example:

import re

# EXAMPLE:
# >>> find_words(4, "dog, cat, baby, balloon, me")
# ['baby', 'balloon']

def find_words(int_count, words_string):
    word_list3 = re.findall(r"\w{%d}" % int_count, words_string)
    return word_list3

print find_words(4, "dog, cat, baby, balloon, me")
Jeremy Justus
2,910 Points

Ahh, thank you. That is much, much more readable than either of my two examples above.

This works in Python 2.x but not in 3.x. My variant for 3.x looks like this:

def find_words(int_count, words_string):
    return re.findall(r'\w{' + str(int_count) + ',}', words_string)

It's the print statement that breaks in Py3; the actual find_words function still works in Python 3. See http://i.imgur.com/X5IXc96.png

You must be right, but Kenny didn't cover this kind of formatting in the lessons, or I missed it.

A variation is to make use of Python 3 string formatting with str.format:
"If you need to include a brace character in the literal text, it can be escaped by doubling: {{ and }}."

For example:

import re

def find_words(int_count, words_string):
    return re.findall(r"\w{{{},}}".format(int_count), words_string)

print(find_words(4, "dog, cat, baby, balloon, me"))  
Ron Chan
10,987 Points

Thanks Ben, that code helped a lot. I believe there should also be a comma after the '%d', like this: {%d,}.

Is there a name for the way you pass the argument into a string using the % sign? I would like to review that video again but I cannot recall which one it was that mentioned this.
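
For reference, the % operator on strings is usually called printf-style (or %-) string formatting. Below is a minimal sketch, assuming Python 3, of the pattern with the comma added. The name find_exact_chunks is hypothetical and is only there to show what the comma-less pattern returns for the same input.

```python
import re

def find_words(int_count, words_string):
    # "%d" is replaced by int_count; the trailing comma turns the
    # quantifier into "int_count or more" word characters.
    return re.findall(r"\w{%d,}" % int_count, words_string)

def find_exact_chunks(int_count, words_string):
    # Hypothetical helper for comparison: without the comma, the
    # quantifier matches exactly int_count characters at a time.
    return re.findall(r"\w{%d}" % int_count, words_string)

print(find_words(4, "dog, cat, baby, balloon, me"))         # ['baby', 'balloon']
print(find_exact_chunks(4, "dog, cat, baby, balloon, me"))  # ['baby', 'ball']
```

Without the comma, "balloon" is cut down to "ball", which is why the challenge expects {%d,}.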

justlevy
6,325 Points

Here's my solution using the newer f-string formatting combined with raw strings.

  • The triple curly braces stump me. Maybe someone can help me understand why they are needed?
import re

def find_words(digit, text):
    # "text" instead of "str" so the built-in str type is not shadowed
    result = re.findall(fr"\w{{{digit},}}", text)
    print(result)
    return result

find_words(3, "June, gloom, on, of, at, fog")
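
On the triple braces: in an f-string, a doubled brace is an escaped literal brace, while a single pair marks a replacement field. So {{{digit},}} reads as {{ (a literal {), then {digit} (the value), then a comma, then }} (a literal }). Here is a small sketch, assuming Python 3.6 or later. The name build_pattern is a hypothetical helper used only to make the resulting pattern visible.

```python
import re

def build_pattern(digit):
    # "{{" -> literal "{", "{digit}" -> the value of digit, "}}" -> literal "}"
    return fr"\w{{{digit},}}"

pattern = build_pattern(3)
print(pattern)  # \w{3,}
print(re.findall(pattern, "June, gloom, on, of, at, fog"))  # ['June', 'gloom', 'fog']
```

The regex engine only ever sees \w{3,}; the extra braces exist purely to get literal braces through the f-string.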