Python Data Science Basics Describing Data Loading Raw Data

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 5815:

import csv
import numpy

def open_with_csv(filename, d='\t'):
    data = []
    with open(filename, encoding='utf-8') as tsvin:
        tie_reader = csv.reader(tsvin, delimiter=d)
        for line in tie_reader:
            data.append(line)
        return data

data_from_csv = open_with_csv('data.csv')


FIELDNAMES = ['', 'id', 'priceLabel', 'name', 'brandId', 'brandName', 'imageLink', 'desc', 'vendor', 'patterned', 'material']

DATATYPES = [('myint', 'i'), ('myid', 'i'), ('price', 'f8'), ('name', 'a200'), ('brandId', '<i8'), ('brandName', 'a200'), ('imageUrl', '|S500'), ('description', '|S900'), ('vendor', '|S100'), ('pattern', '|S50'), ('material', '|S50'), ]

def load_data(filename,d='\t'):
    my_csv = numpy.genfromtxt(filename, delimiter=d, skip_header=1,
                              names=FIELDNAMES, invalid_raise=False,
                              dtype=DATATYPES)
    return my_csv

my_csv = load_data('data.csv')

This produces the following error:

Traceback (most recent call last):
  File "D:\New Python\Data Science\testing\s2v1.py", line 25, in <module>
    my_csv = load_data('data.csv')
  File "D:\New Python\New Python\Data Science\testing\s2v1.py", line 22, in load_data
    dtype=DATATYPES)
  File "C:\Users\daniel\AppData\Local\Programs\Python\Python36-32\lib\site-packages\numpy\lib\npyio.py", line 1951, in genfromtxt
    for (i, line) in enumerate(itertools.chain([first_line, ], fhd)):
  File "C:\Users\daniel\AppData\Local\Programs\Python\Python36-32\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 5815: character maps to <undefined>
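The last frame of the traceback runs through cp1252.py, so the file was being decoded as cp1252 rather than in the encoding it was actually saved in. When no encoding argument is given, Python 3 falls back to a platform-dependent default. This short sketch (standard library only) prints that default on your own machine. On the Windows setup in the traceback it was presumably cp1252, but the value varies by system and locale.

```python
import locale

# With no explicit encoding, open() decodes text using this platform-dependent
# default. On Windows with a US or Western European locale it is usually
# 'cp1252', the codec that appears in the traceback.
default_encoding = locale.getpreferredencoding(False)
print("Default text encoding:", default_encoding)
```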

QUESTION: Why is this error occurring and how can I correct it so I can continue following the rest of the course?

2 Answers

OK, this really peeved me something awful, so I got to work researching the issue.

For anyone else who may have been getting this error, here is how I solved it:

def load_data(filename,d='\t'):
    my_csv = numpy.genfromtxt(filename, delimiter=d, skip_header=1, 
                              names=FIELDNAMES, invalid_raise=False,
                              dtype=DATATYPES)
    return my_csv

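In the original code above, I had to add encoding='utf-8' to the genfromtxt call to make it work. Without it, the file was decoded with the machine's default codec, cp1252 (the cp1252.py frame in the traceback), which has no character for byte 0x81. The open_with_csv function already opened the file with encoding='utf-8', which is likely why it did not hit the same error. Here is the corrected version: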

def load_data(filename,d='\t'):
    my_csv = numpy.genfromtxt(filename, delimiter=d, skip_header=1, encoding='utf-8',
                              names=FIELDNAMES, invalid_raise=False,
                              dtype=DATATYPES)
    return my_csv
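To see the failure and the fix in isolation, here is a minimal sketch using only the standard library rather than NumPy. It writes a small hypothetical tab-separated sample to a temporary file as UTF-8, then reads it back twice: once as cp1252, which fails on byte 0x81 just as in the traceback, and once with encoding='utf-8', the same argument the fix passes to genfromtxt. The sample text and the read_text helper are illustrative assumptions, not part of the course files.

```python
import os
import tempfile

# Hypothetical sample: a header plus one row containing "Á", which UTF-8
# stores as the two bytes C3 81 (0x81 is the byte named in the traceback).
sample = "id\tname\n1\tÁlvaro\n"

# Write the sample to a temporary file as UTF-8, like a UTF-8 data.csv.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w", encoding="utf-8") as f:
    f.write(sample)

def read_text(path, encoding):
    """Hypothetical helper: read the whole file with an explicit encoding."""
    with open(path, encoding=encoding) as f:
        return f.read()

try:
    # cp1252 has no character for byte 0x81, so this raises UnicodeDecodeError.
    read_text(path, "cp1252")
    cp1252_failed = False
except UnicodeDecodeError:
    cp1252_failed = True

# Naming the file's real encoding decodes it cleanly.
text = read_text(path, "utf-8")
os.remove(path)

print(cp1252_failed)   # True
print(text == sample)  # True
```

The general lesson is to pass the file's real encoding explicitly instead of relying on the platform default, which differs between machines.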

Hope this helps everyone else to move further in the series.

Thank you sir!