Sanjay Noronha
7,296 Points

Regex for treehouse using sets/character classes?

This video introduces the concept of a set or character class, but I am not sure the example selected is appropriate. If one wanted to search for 'treehouse', one could just use the pattern 'treehouse' instead of putting the characters in a set, i.e. [treehouse].

A better (IMHO) example could be finding the color grey which is also spelt as gray, by using the pattern gr[ea]y. In this case we really harness the capabilities of the set/character class.
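
A minimal Python sketch of the gr[ea]y idea, using the standard `re` module. The sample sentence is made up for illustration and is not from the video.

```python
import re

# Hypothetical sample text, invented for this illustration.
text = "The gray cat sat beside a grey wall near a green door."

# [ea] matches exactly one character: either 'e' or 'a'.
pattern = re.compile(r"gr[ea]y")

matches = pattern.findall(text)
print(matches)  # ['gray', 'grey']
```

Note that "green" is not matched, because the set stands for a single character and the pattern still requires a "y" right after it.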

Please share your thoughts.

2 Answers

Steven Parker
220,634 Points

Kenneth actually shows why "[treehouse]" is not a good way to search for the word "treehouse" in the video. He revises the expression several times and finally arrives at "[trehous]{9}" which does a better job by looking for words of exactly 9 characters containing only those letters.

Even so, the objective isn't to find the best way to match the word "treehouse" but just to demonstrate what character classes do.

If you feel different examples would convey the concept better, you can also make course suggestions directly to the Support staff.

Sanjay Noronha
7,296 Points

Thanks for your response, Steven. I watched the video again, a bit more 'attentively', and I have to agree with you. Cheers.

I don't think you were inattentive. Not acknowledging that a simpler solution is also better is remarkable. The problem addressed by {9} was created by introducing an unneeded character set in the first place. A student just learning this might come away with their mind scrambled by the awkwardness of '[trehous]{9}' and wonder what to trust:

  • their own sense that it's overly complicated
  • a more experienced instructor not questioning it

Supposing the instructor must have had their reasons leaves the student pondering doubts about their own understanding, which may have been fine.

Steven Parker
220,634 Points

Actually, the size limit addresses other potential issues not related to the unneeded character. For example, without the limit, undesired words like "the" and "us" were being accepted by the pattern.
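
A rough sketch of that effect in Python. The thread does not quote the exact unbounded pattern from the video, so `[trehous]+` is an assumed stand-in, and checking each word with `fullmatch` is a choice made for this illustration.

```python
import re

# Assumed stand-in for the pattern without a size limit; the exact
# expression used in the video is not quoted in this thread.
unbounded = re.compile(r"[trehous]+", re.IGNORECASE)

# The pattern with the {9} size limit, as quoted in the thread.
bounded = re.compile(r"[trehous]{9}", re.IGNORECASE)

words = ["the", "us", "treehouse"]

# fullmatch requires the whole word to match the pattern.
accepted_unbounded = [w for w in words if unbounded.fullmatch(w)]
accepted_bounded = [w for w in words if bounded.fullmatch(w)]

print(accepted_unbounded)  # ['the', 'us', 'treehouse']
print(accepted_bounded)    # ['treehouse']
```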

The task is to match 'treehouse' case-insensitively. Compare the case-insensitive regular expressions treehouse and [trehous]{9}. Notice that treehouse matches only case variations of the string 'treehouse', e.g.

  • treehouse
  • TREEHOUSE
  • Treehouse
  • TrEeHoUsE

Shorter strings like 'the' or 'us' don't match.
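
The literal pattern's behavior can be checked with a short Python sketch. Using `re.IGNORECASE` and testing each candidate with `fullmatch` are assumptions about how one might set this up, not something taken from the video.

```python
import re

# The plain literal pattern, made case-insensitive with a flag.
literal = re.compile(r"treehouse", re.IGNORECASE)

candidates = ["treehouse", "TREEHOUSE", "Treehouse", "TrEeHoUsE", "the", "us"]

# Keep only the candidates that the literal pattern matches in full.
literal_hits = [w for w in candidates if literal.fullmatch(w)]

print(literal_hits)  # ['treehouse', 'TREEHOUSE', 'Treehouse', 'TrEeHoUsE']
```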

Notice that [trehous]{9} matches far more: any string of 9 characters drawn from that set, e.g.

  • housetree
  • ttttttttt
  • hothouses
  • shouthere

This approach can match strings we don't want. It only works if we know for certain that no other 9-character string built from the same characters occurs in our data, which limits its usefulness. (Who would know that?) Even if we do know, there's nothing to gain from it, only trouble.
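
To make the false positives concrete, here is a small Python sketch that lists the strings accepted by the set-based pattern but rejected by the literal one. The candidate list and the use of `fullmatch` are choices made for this illustration.

```python
import re

literal = re.compile(r"treehouse", re.IGNORECASE)
char_set = re.compile(r"[trehous]{9}", re.IGNORECASE)

candidates = ["treehouse", "housetree", "ttttttttt", "shouthere"]

# Strings accepted by the set-based pattern.
set_hits = [w for w in candidates if char_set.fullmatch(w)]

# Strings the set-based pattern accepts but the literal pattern rejects.
false_positives = [w for w in set_hits if not literal.fullmatch(w)]

print(set_hits)         # ['treehouse', 'housetree', 'ttttttttt', 'shouthere']
print(false_positives)  # ['housetree', 'ttttttttt', 'shouthere']
```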

Is it reasonable to expect an inexperienced student to realize this right away from the video presentation? I doubt it.

Steven Parker
220,634 Points

Your examples do a great job of showing why a character class would not be the preferred way to find a specific string. But I think the lesson was just showing how character classes work, not necessarily when to use them.

Sure, the lesson could be understood that way: he's just showing what character sets and quantifiers do, and we see some results. Unless it's stated, meaning has to be interpreted. Could he also be saying those are good uses of character sets and quantifiers? Though an experienced person would easily conclude they're not, it wasn't directly stated. However, an exposition of a technique is reasonably expected to show good uses for it. That's why I wouldn't say a learner is inattentive for missing something unstated that runs contrary to reasonable expectations.

I'm just saying no one should fault themselves over this. It could have been clearer, and the point the poster raised has merit.