
Rajesh Tupakula
1,927 Points

Need help with the regular expression.

When I use the script below, it complains about the string pattern:

import urllib.request
import re
from re import findall
page = urllib.request.urlopen('https://sonar.com/prod/drilldown/measures/1274530?metric=coverage')
data = page.read()
print(page.read())
html = page.read()
htmlStr = html.decode()
print(data)
print(re.findall(r'[coverag]{8}', data))
Traceback (most recent call last):
  File "C:\Users\a530614\Documents\pythons\script.py", line 10, in <module>
    print(re.findall(r'\b[coverag]\b', data))
  File "C:\Users\a530614\python\lib\re.py", line 213, in findall
    return _compile(pattern, flags).findall(string)
TypeError: cannot use a string pattern on a bytes-like object

But when I use a bytes pattern (the b prefix), it works for me:

import urllib.request
import re
from re import findall
page = urllib.request.urlopen('https://sonar.com/prod/drilldown/measures/1274530?metric=coverage')
data = page.read()
print(page.read())
html = page.read()
htmlStr = html.decode()
print(data)
print(re.findall(b'[coverag]{8}', data))

What am I missing here?

[MOD: added ``` markdown formatting -cf]

1 Answer

Chris Freeman
Treehouse Moderator 68,423 Points

urllib.request.urlopen returns an HTTPResponse object. Its .read() method returns a bytes object, not a str. Looking at it interactively:

$ python3
Python 3.4.3 (default, Oct 14 2015, 20:28:29) 
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib.request

# Get page
>>> page = urllib.request.urlopen('https://teamtreehouse.com/')
>>> type(page)
<class 'http.client.HTTPResponse'>

# Get data
>>> data = page.read()
>>> type(data)
<class 'bytes'>
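
One thing to watch for in the script from the question: the response body is a stream, so once .read() has consumed it, another .read() returns an empty bytes object. Here is a minimal sketch of that behavior. It uses io.BytesIO as a stand-in for the HTTP response (an assumption for illustration, so the example runs without a network connection), and the sample body is made up:

```python
import io

# io.BytesIO stands in for the HTTPResponse here (assumption for
# illustration); both are file-like objects whose read() consumes the stream.
page = io.BytesIO(b'<html>coverage 87.5%</html>')

data = page.read()    # first read returns the whole body
again = page.read()   # the stream is exhausted, so this is empty

print(data)
print(again)
```

So it is usually safest to call .read() once and reuse the result.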

A bytes object is different from a str object, and re requires the pattern and the data being searched to be the same type, so searching bytes requires a bytes pattern. The "b" prefix makes the literal a bytes object. You can also combine it with "r" (as in rb'...') to get a raw bytes pattern.

# Using data from above
>>> import re
>>> re.findall(b'Treehouse', data)
[b'Treehouse', b'Treehouse', b'Treehouse', b'Treehouse', b'Treehouse', b'Treehouse']

# Trying again with a regular string pattern fails
>>> re.findall('Treehouse', data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.4/re.py", line 210, in findall
    return _compile(pattern, flags).findall(string)
TypeError: can't use a string pattern on a bytes-like object
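
If you would rather work with str patterns, another option is to decode the bytes first. This is a minimal sketch. It assumes the page is UTF-8 encoded, and it uses a made-up sample in place of the downloaded data:

```python
import re

# Hypothetical sample standing in for page.read(); assumed to be UTF-8.
data = b'<span id="m_coverage">coverage</span>'

text = data.decode('utf-8')   # bytes -> str

# A str pattern searches str; a bytes pattern searches bytes.
str_matches = re.findall(r'coverage', text)
bytes_matches = re.findall(rb'coverage', data)

print(str_matches)
print(bytes_matches)
```

Also note that [coverag]{8} is a character class: it matches any run of eight characters drawn from c, o, v, e, r, a, g, not only the word coverage. Use the literal pattern, as in the sketch, if you want the word itself.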

Post back if you need more help.

Chris Freeman
Treehouse Moderator 68,423 Points

Why was this downvoted? Was there an issue with this answer?