
"This saves us some very confusing repeated uses of the backslash character or the escape character"

Hey Kenneth Love,

Can you please expand on this comment you made in this video regarding raw strings v regular strings? Would love to know what you meant here.

Thanks! Graham

2 Answers

Chris Freeman
Treehouse Moderator 68,423 Points

First, working without raw strings

Given a basic string for a network path:

>>> s1 = "\\127.0.0.1\new\text.txt"
>>> s1
'\\127.0.0.1\new\text.txt'

But printing it shows that Python interpreted the '\n' and '\t' as a newline and a tab:

>>> print(s1)
\127.0.0.1
ew  ext.txt

The first solution is to escape each backslash (\) using a double-backslash (\\):

>>> s2 = "\\\\127.0.0.1\\new\\text.txt"
>>> s2
'\\\\127.0.0.1\\new\\text.txt'
>>> print(s2)
\\127.0.0.1\new\text.txt
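
To make the difference between s1 and s2 concrete, here is a small sketch that inspects both strings. It only reuses the two literals from above; the checks themselves are illustrative and not part of the original answer.

```python
# s1 relies on Python's string-literal escapes; s2 doubles every backslash.
s1 = "\\127.0.0.1\new\text.txt"
s2 = "\\\\127.0.0.1\\new\\text.txt"

# s1 holds a real newline and a real tab, so it is shorter than it looks.
print(len(s1))         # 21
print("\n" in s1)      # True
print("\t" in s1)      # True

# s2 holds only literal backslashes and ordinary characters.
print(len(s2))         # 24
print("\n" in s2)      # False
print(s2.count("\\"))  # 4
```

The lengths line up with the spans reported by the regex matches further down (21 for s1, 24 for s2).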

Now say we want to create a regex to match our first string:

>>> import re
>>> pat1 = re.compile('\\127.0.0.1\new\text.txt')
>>> pat1.match(s1)
# Does NOT match because the Regex *also* interprets the backslashes.
# Solution is to escape the regex backslashes with double-backslashes
>>> pat1escape = re.compile('\\\\127.0.0.1\\new\\text.txt')
>>> pat1escape.match(s1)
<_sre.SRE_Match object; span=(0, 21), match='\\127.0.0.1\new\text.txt'>
# A match !!
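
As an aside, when the goal is simply to match an existing string literally, the standard library's re.escape can build the escaped pattern for you. This is not something the thread mentions; the sketch below assumes you want an exact, literal match of s1.

```python
import re

s1 = "\\127.0.0.1\new\text.txt"

# re.escape backslash-escapes the regex metacharacters in s1,
# so the compiled pattern matches s1 literally.
pat = re.compile(re.escape(s1))
m = pat.match(s1)
print(m is not None)  # True
print(m.span())       # (0, 21)

# Unlike the hand-written pattern, the dots are escaped too,
# so "." no longer matches an arbitrary character.
print(pat.match("\\127x0.0.1\new\text.txt"))  # None
```

Note that the hand-written patterns in this answer leave the dots unescaped, so they would also accept other characters in those positions.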

Let's move on to matching the escaped string s2

>>> s2
'\\\\127.0.0.1\\new\\text.txt'
>>> print(s2)
\\127.0.0.1\new\text.txt

First we'll try without additional backslash escaping:

>>> pat2 = re.compile('\\\\127.0.0.1\\new\\text.txt')
>>> pat2.match(s2)
# Does not match!  
# As we've seen above, the regex needs to escape all backslashes
>>> pat2escape = re.compile('\\\\\\\\127.0.0.1\\\\new\\\\text.txt')
>>> pat2escape.match(s2)
<_sre.SRE_Match object; span=(0, 24), match='\\\\127.0.0.1\\new\\text.txt'>
# A match!

That's way too many backslashes! There's got to be a better way! There is: raw strings. Adding the 'r' prefix before the opening quote tells Python to treat backslashes in the literal as regular characters, so they do not need to be escaped a second time for Python. Only the escaping that the regex itself needs remains.

# matching string s1
>>> pat1raw = re.compile(r'\\127.0.0.1\new\text.txt')
>>> pat1raw.match(s1)
<_sre.SRE_Match object; span=(0, 21), match='\\127.0.0.1\new\text.txt'>

# matching string s2
>>> pat2raw = re.compile(r'\\\\127.0.0.1\\new\\text.txt')
>>> pat2raw.match(s2)
<_sre.SRE_Match object; span=(0, 24), match='\\\\127.0.0.1\\new\\text.txt'>
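
A quick way to convince yourself that the raw string is only a different spelling of the same pattern is to compare the two literals directly. This sketch reuses the s1 pattern from above; the variable names raw and escaped are just for illustration.

```python
import re

# The raw literal and the doubled-backslash literal produce the same pattern text.
raw = r'\\127.0.0.1\new\text.txt'
escaped = '\\\\127.0.0.1\\new\\text.txt'
print(raw == escaped)  # True

# The regex engine still interprets \\, \n and \t inside the pattern itself.
s1 = "\\127.0.0.1\new\text.txt"
print(re.match(raw, s1).span())  # (0, 21)
```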

That's what I'm talking about... Great explanation, Chris Freeman, thanks!

Ryan Cross
5,742 Points

Now tell me why we would skip something like this in the course of our studies on regex? Excellent explanation, by the way. Thanks for the clarity.

Seth Kroger
56,413 Points

The backslash character is an escape character that allows you to use special characters in regular strings. For instance "\n" for a new line or "\t" for a tab character. When you need an actual backslash in your string you have to double up on them like this, "\". So where a regex in a raw string might be \w+\s in a regular string is "\w+\s".

Thanks, Seth Kroger. But the raw string and regular string examples in your comparison differ only by the extra double quotes, whereas Kenneth's comment was about saving repeated uses of the backslash character...

Seth Kroger
56,413 Points

It's supposed to have doubled backslashes ("\\" and "\\w+\\s"), but apparently the site doesn't want to show them outside of a code block. :-/
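
For reference, the comparison Seth describes can be checked directly in Python. This is a minimal sketch; the sample text "hello world" and the variable names are made up for illustration.

```python
import re

# The same five-character pattern, written as a raw and as a regular string.
raw_pattern = r"\w+\s"
regular_pattern = "\\w+\\s"
print(raw_pattern == regular_pattern)  # True
print(len(raw_pattern))                # 5

# Both spellings match one or more word characters followed by whitespace.
match = re.match(raw_pattern, "hello world")
print(repr(match.group()))             # 'hello '
```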