UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 5815:

import csv
import numpy

def open_with_csv(filename, d='\t'):
    data = []
    with open(filename, encoding='utf-8') as tsvin:
        tie_reader = csv.reader(tsvin, delimiter=d)
        for line in tie_reader:
            data.append(line)
        return data

data_from_csv = open_with_csv('data.csv')


FIELDNAMES = ['', 'id', 'priceLabel', 'name', 'brandId', 'brandName', 'imageLink', 'desc', 'vendor', 'patterned', 'material']

DATATYPES = [('myint', 'i'), ('myid', 'i'), ('price', 'f8'), ('name', 'a200'), ('brandId', '<i8'), ('brandName', 'a200'), ('imageUrl', '|S500'), ('description', '|S900'), ('vendor', '|S100'), ('pattern', '|S50'), ('material', '|S50'), ]

def load_data(filename,d='\t'):
    my_csv = numpy.genfromtxt(filename, delimiter=d, skip_header=1,
                              names=FIELDNAMES, invalid_raise=False,
                              dtype=DATATYPES)
    return my_csv

my_csv = load_data('data.csv')

Running the code above produces the following error:

Traceback (most recent call last):
  File "D:\New Python\Data Science\testing\s2v1.py", line 25, in <module>
    my_csv = load_data('data.csv')
  File "D:\New Python\New Python\Data Science\testing\s2v1.py", line 22, in load_data
    dtype=DATATYPES)
  File "C:\Users\daniel\AppData\Local\Programs\Python\Python36-32\lib\site-packages\numpy\lib\npyio.py", line 1951, in genfromtxt
    for (i, line) in enumerate(itertools.chain([first_line, ], fhd)):
  File "C:\Users\daniel\AppData\Local\Programs\Python\Python36-32\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 5815: character maps to <undefined>

QUESTION: Why is this error occurring and how can I correct it so I can continue following the rest of the course?

2 Answers

OK, this really peeved me something awful so I got to work researching this issue:

For anyone else who may be getting this error, here is how I solved it. This was the original code:

def load_data(filename,d='\t'):
    my_csv = numpy.genfromtxt(filename, delimiter=d, skip_header=1, 
                              names=FIELDNAMES, invalid_raise=False,
                              dtype=DATATYPES)
    return my_csv

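The traceback shows numpy decoding the file with cp1252 (the Windows default) instead of UTF-8, and byte 0x81 has no character in cp1252. To make it work, I had to add encoding='utf-8' to the genfromtxt call in the original code above (the encoding argument needs NumPy 1.14 or newer):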

def load_data(filename,d='\t'):
    my_csv = numpy.genfromtxt(filename, delimiter=d, skip_header=1, encoding='utf-8',
                              names=FIELDNAMES, invalid_raise=False,
                              dtype=DATATYPES)
    return my_csv
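
If you want to see why the encoding matters, here is a minimal sketch that reproduces the same kind of failure without numpy. It is only an illustration: the read_text helper, the temporary file, and the sample row are all made up for this example and are not from the course data.

```python
import os
import tempfile


def read_text(path, encoding):
    """Read a whole file as text using the given encoding."""
    with open(path, encoding=encoding) as handle:
        return handle.read()


# Hypothetical sample row. "\u00c1" (capital A with acute accent) is stored
# in UTF-8 as the two bytes 0xC3 0x81, and 0x81 is undefined in cp1252.
sample = "id\tname\n1\t\u00c1lvaro\n"

# Write the sample to a temporary file as UTF-8.
fd, path = tempfile.mkstemp(suffix=".tsv")
os.close(fd)
with open(path, "w", encoding="utf-8", newline="") as handle:
    handle.write(sample)

# Reading with cp1252 (what Windows picks by default) fails on byte 0x81.
try:
    read_text(path, "cp1252")
    cp1252_failed = False
except UnicodeDecodeError as err:
    cp1252_failed = True
    print(err)

# Reading the same file as UTF-8 succeeds.
utf8_text = read_text(path, "utf-8")
os.remove(path)
```

The same thing happens inside genfromtxt when no encoding is given on Windows, which is why passing encoding='utf-8' fixes it, assuming the data file really is saved as UTF-8.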

Hope this helps everyone else to move further in the series.

Thank you sir!