Brendan Whiting
Front End Web Development Techdegree Graduate 84,738 Points

A couple questions about this challenge

1) Isn't the index method already going to return an int? Why wrap this with the int() function?

2) I don't understand the purpose of this line of code: filtered_rows.append([str(x).encode('utf8') for x in row]) . Wasn't the data already utf-8?

See comments in code below:

import csv

def open_with_csv(filename, d='\t'):
  data = []
  with open(filename, encoding='utf-8') as tsvin:    # doesn't the data become utf-8 starting here?
    tie_reader = csv.reader(tsvin, delimiter=d)
    for row in tie_reader:
      data.append(row)
  return data

def filter_col_by_string(the_data, field, filter_condition):
    filtered_rows = []

    col = int(the_data[0].index(field))   # isn't the index method guaranteed to return an int (or an error)?
    filtered_rows.append(the_data[0])

    for row in the_data[1:]:
        if row[col] == filter_condition:
            filtered_rows.append([str(x).encode('utf8') for x in row]) # Why do we need to encode everything to utf-8 here, didn't we do that already?

    return filtered_rows

data_from_csv = open_with_csv('data.csv')
dkny_ties = filter_col_by_string(data_from_csv, "brandName", "DKNY")

2 Answers

Chris Freeman
Treehouse Moderator 68,454 Points

I agree: the_data[0].index(field) either returns an int or raises a ValueError, so the int() wrapper is unnecessary.
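A quick way to check this, as a minimal sketch. The header list below is made up for illustration and stands in for the_data[0]; it is not taken from the course data:

```python
# Hypothetical header row, standing in for the_data[0].
header = ["productId", "brandName", "price"]

col = header.index("brandName")
print(type(col), col)  # <class 'int'> 1

# A missing field raises ValueError rather than returning a sentinel value.
try:
    header.index("colour")
except ValueError as err:
    print("not found:", err)
```

Since col is already an int, int(col) just returns the same value.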

UTF-8 is an encoding for the raw bytes in a file. The open function, given encoding='utf-8', decodes those bytes as UTF-8 while reading and returns the data as regular (Unicode) strings.

The line filtered_rows.append([str(x).encode('utf8') for x in row]) uses a list comprehension to walk through each item in the list row and build a new list in which each string is encoded back into a byte string using UTF-8. It's not clear why this is necessary. Perhaps there is other code that expects filter_col_by_string to return byte strings. This is where a little docstring would have gone a long way.
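The decode-then-encode round trip can be sketched without a file on disk. This is only an illustration: the sample rows are invented, and io.TextIOWrapper over io.BytesIO is used here as a stand-in for open(filename, encoding='utf-8'):

```python
import csv
import io

# Raw bytes as they might sit on disk; the content is invented for this sketch.
raw = "brandName\tprice\nDKNY\t19.99\nCafé\t5.00\n".encode("utf-8")

# Decoding happens on read: TextIOWrapper plays the role of open(..., encoding='utf-8').
text_stream = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", newline="")
rows = list(csv.reader(text_stream, delimiter="\t"))
print(type(rows[1][0]))  # <class 'str'>

# The list comprehension from the challenge turns each str back into bytes.
encoded_row = [str(x).encode("utf8") for x in rows[2]]
print(encoded_row)  # [b'Caf\xc3\xa9', b'5.00']
```

So the values read from the file are str objects, and the comprehension converts them to bytes objects holding the UTF-8 representation.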

Looking through the other code from this course, every use of the output of filter_col_by_string, such as totaling the prices or finding the minimum price, calls float(row[col]) or something similar to convert the byte string into a float. This could just as easily have been done if the strings had been left as regular text. I am curious whether anyone else has more insight into this code.
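A small sketch of that point, using an invented price value: in Python 3, float() accepts both str and bytes, so the extra encode step does not change the numeric result.

```python
price_text = "19.99"
price_bytes = price_text.encode("utf8")

# float() parses ASCII digits from either type, so both give the same number.
print(float(price_text) == float(price_bytes))  # True

# The two types are not interchangeable elsewhere: bytes never compare equal to str.
print(price_bytes == price_text)  # False
```

One side effect worth noting in the question's version of filter_col_by_string: the header row is appended as-is, so the result mixes a list of str (the header) with lists of bytes (the matching rows).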

Sorry, but the answer is IDK! :-/

import csv  # from s2q1.py

# function from s2q2.py
def open_with_csv(filename, d='\t'):
    data = []
    with open(filename, encoding='utf-8') as tsvin:
        tie_reader = csv.reader(tsvin, delimiter=d)
        for row in tie_reader:
            data.append(row)
    return data

def filter_col_by_string(the_data, field, filter_condition):
    filtered_rows = []

    # find index of field in first row
    col = int(the_data[0].index(field))
    filtered_rows.append(the_data[0])

    for row in the_data[1:]:
        if row[col] == filter_condition:
            filtered_rows.append([x for x in row])

    return filtered_rows

data_from_csv = open_with_csv('data.csv')

# code above this line is included to make this file compile on its own.
# -------------------------------------------------------------------------
# here is the answer:

dkny_ties = filter_col_by_string(data_from_csv, "brandName", "DKNY")

# --------------------------------------------------------------------------
# for testing:

print(dkny_ties)