Python Data Science Basics Cleaning Data Clean Up Your Data

Sahar Nasiri
7,454 Points

Something Wrong!

In the s3v1.py file:

from s2v5 import *

def creat_bool_field_from_search_term(data_sample, search_term):
    new_array = []
    new_array.append(data_sample[0].append(search_term))

    for row in data_sample[1:]:
        new_bool_field = False
        if search_term in row[7]:
            new_bool_field = True

        row.append(new_bool_field)
        new_array.append(row)

    return new_array


def filter_by_bool(data_sample, col):
    matches_search_term = []

    for item in data_sample[1:]:
        if item[col]:
            matches_search_term.append(item)

    return matches_search_term

my_new_csv = creat_bool_field_from_search_term(data_from_csv, "cashmere")
number_of_cashmere_ties = number_of_records(filter_by_bool(my_new_csv, 11))
print("Length:", number_of_cashmere_ties)

In the filter_by_bool function she has written:

for item in data_sample[1:]:
    if item[col]:
        matches_search_term.append(item)

But she passed in my_new_csv, which is an array whose first row doesn't need to be skipped the way this line skips it:

for item in data_sample[1:]:

I think this function is not correct for filter_by_bool(my_new_csv, 0)!

1 Answer

Well, you're right that filter_by_bool(my_new_csv, 0) doesn't work if you're trying to figure out how many ties are made of cashmere. In the video we passed 11 to the function specifically because "cashmere" is at index 11 in the first row of the dataset, which serves as the header for all the other rows.

Just to show you, here is the header row at index 0:

['', 'id', 'priceLabel', 'name', 'brandId', 'brandName', 'imageLink', 'desc', 'vendor', 'print', 'material', 'cashmere']

So counting from zero you can see that 'cashmere' is number 11.
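
If you'd rather not count by hand, a quick check along these lines confirms the position. It is only a sketch: the header list is copied from the row above, and the variable names are mine, not from the course files.

```python
# Header row copied from the dataset shown above.
header = ['', 'id', 'priceLabel', 'name', 'brandId', 'brandName',
          'imageLink', 'desc', 'vendor', 'print', 'material', 'cashmere']

# list.index returns the zero-based position of the first match.
cashmere_col = header.index('cashmere')
print(cashmere_col)  # 11

# The header cell itself is a non-empty string, which Python treats as true.
print(bool(header[cashmere_col]))  # True
```

That last line is one reason the loop in filter_by_bool starts at data_sample[1:]: a header row holding the string 'cashmere' at index 11 would be counted as a match if it weren't skipped.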

While writing this answer I did discover that there is in fact an error in the code from the video, which may be the real source of your confusion. Instead of:

new_array.append(data_sample[0].append(search_term))

It should actually be:

new_array.append(data_sample[0][:])
new_array[0].append(search_term)

The original code had the unintended side effect of modifying the data_from_csv that we passed in, when really we only wanted to modify new_array. Also, because append returns None, the first element of new_array ended up as None rather than the header row. The new code selects the first row of data_sample with [0] and takes a slice of that entire row with [:], which effectively copies it.
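
To make the difference concrete, here is a small sketch using a made-up two-row dataset (the `sample` list and its values are hypothetical, not the course data). It relies on the fact that `list.append` changes the list in place and returns `None`.

```python
# Hypothetical miniature dataset: a header row plus one data row.
sample = [['id', 'desc'], ['1', 'soft cashmere tie']]

# Original approach: append() mutates the header in place and returns None,
# so None is what ends up in the new list.
buggy = []
buggy.append(sample[0].append('cashmere'))
print(buggy)      # [None]
print(sample[0])  # ['id', 'desc', 'cashmere']  (the input was modified)

# Corrected approach: copy the header with a slice, then extend the copy.
sample = [['id', 'desc'], ['1', 'soft cashmere tie']]  # reset the input
fixed = []
fixed.append(sample[0][:])
fixed[0].append('cashmere')
print(fixed)      # [['id', 'desc', 'cashmere']]
print(sample[0])  # ['id', 'desc']  (the input is untouched)
```

One thing to keep in mind: the loop in the function still calls row.append on the original data rows, so those rows of data_from_csv are modified as well. If that matters for you, copying each row the same way (row[:]) before appending to it should avoid it.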

It took me over an hour to figure out how to fix that :D

Sahar Nasiri
7,454 Points

:D Thanks for the answer. Yeah, I know that it is at index 11, but I just wanted to mention that the code is not right :). Wow! That was exactly one of my questions, but I was just ignoring it ;).