
Jason Law

urllib2 is not supported in Python 3. How can the import script be adjusted to do the same thing without urllib2?

I would like to download the data directly, but I am having an issue because I am using Python 3 instead of Python 2. Specifically, my code breaks because urlopen(ties) does not return JSON; it returns an http.client.HTTPResponse object. How can I adjust this to make it work with Python 3?

from urllib.request import urlopen
import json
import pandas as pd

my_api_key = "xxxxxxx"

url = "http://api.shopstyle.com/api/v2/"
ties = "{}products?pid={}&cat=mens-ties&limit=100".format(url, my_api_key)
jsonResponse = urlopen(ties)
print(type(jsonResponse))  # prints <class 'http.client.HTTPResponse'>
data = json.load(jsonResponse)  # fails here: the response body is raw bytes, not decoded JSON text
Jason Law

I figured it out. The read() and decode() methods were needed to turn the HTTP response into JSON text. This is the script that worked for me in Python 3.
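In isolation, the core of the fix is just a read-then-decode step, roughly like this (a minimal sketch; `ties` is the request URL built the same way as in the question):

from urllib.request import urlopen
import json

response = urlopen(ties)                       # http.client.HTTPResponse
raw_bytes = response.read()                    # body as bytes
data = json.loads(raw_bytes.decode('utf-8'))   # decode to str, then parse the JSON

The full script: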

from urllib.request import urlopen
import json
import pandas as pd
import math

my_api_key = "xxxxxxxxxxxxx"

url = "http://api.shopstyle.com/api/v2/"
ties = "{}products?pid={}&cat=mens-ties&limit=100".format(url, my_api_key)

data = json.loads(urlopen(ties).read().decode(encoding='UTF-8'))

total = data['metadata']['total']
limit = data['metadata']['limit']
offset = data['metadata']['offset']
pages = math.ceil(total / limit)

print("{} total, {} per page. {} pages to process".format(total, limit, pages))

# tmp = pd.DataFrame(data['products'])

# dictionary to hold one DataFrame per page of results
dfs = {}

# connect to the API again, page by page, and store each page's products in the dictionary
for page in range(pages):
    # advance the offset by the page size so pages neither overlap nor skip items
    allTies = "{}products?pid={}&cat=mens-ties&limit={}&offset={}&sort=popular".format(url, my_api_key, limit, page * limit)
    data = json.loads(urlopen(allTies).read().decode(encoding='UTF-8'))
    dfs[page] = pd.DataFrame(data['products'])

df = pd.concat(dfs, ignore_index=True)

df = df.drop_duplicates('id')
# strip the literal '$' and ',' from the price label so it can be converted to a number
df['priceLabel'] = df['priceLabel'].str.replace('$', '', regex=False).str.replace(',', '', regex=False)
df['priceLabel'] = df['priceLabel'].astype(float)


# pull the 'id' field out of a brand dict; fall back to 0 when the value is missing or not a dict
def breakId(x, y=0):
    try:
        y = x["id"]
    except (TypeError, KeyError):
        pass
    return y


# pull the 'name' field out of a brand dict; fall back to "" when the value is missing or not a dict
def breakName(x, y=""):
    try:
        y = x["name"]
    except (TypeError, KeyError):
        pass
    return y


df['brandId'] = df['brand'].map(breakId)
df['brandName'] = df['brand'].map(breakName)


# get the first color's canonical color name; fall back to "" when colors are missing or malformed
def breakCanC(x, y=""):
    try:
        y = x[0]["canonicalColors"][0]["name"]
    except (TypeError, KeyError, IndexError):
        pass
    return y


# get the first color's own name; fall back to "" when colors are missing or malformed
def breakColorName(x, y=""):
    try:
        y = x[0]["name"]
    except (TypeError, KeyError, IndexError):
        pass
    return y


# get the first color's canonical color id; fall back to "" when colors are missing or malformed
def breakColorId(x, y=""):
    try:
        y = x[0]["canonicalColors"][0]["id"]
    except (TypeError, KeyError, IndexError):
        pass
    return y


df['colorId'] = df['colors'].map(breakColorId)
df['colorFamily'] = df['colors'].map(breakCanC)
df['colorNamed'] = df['colors'].map(breakColorName)

# save the selected columns as a tab-separated file
df.to_csv("data.csv", sep='\t', encoding='utf-8',
          columns=['id', 'priceLabel', 'name', 'brandId', 'colorId', 'colorFamily', 'colorNamed'])
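
One thing to keep in mind when using the file later: since the script writes it with a tab separator, the same separator has to be passed when reading it back, for example:

import pandas as pd

# index_col=0 picks up the unnamed index column that to_csv writes by default
df = pd.read_csv("data.csv", sep='\t', encoding='utf-8', index_col=0)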