Python Data Science Basics Cleaning Data Clean Up Your Data

data_sample[0].append(search_term) - doesn't that change the original data set being passed in? Is that bad?

isn't data_sample just a pointer/reference to the original array being passed in?

If so, aren't we actually modifying the array being passed in (the first row to be exact, since data_sample[0] is a reference to the first row). It doesn't seem like this is a good idea or the intent.

Furthermore, if I print out the result of an append operation (e.g. ['a'].append('b')), I get None, so I don't understand the value or expected result of nested appends.

1 Answer

Chris Freeman
Treehouse Moderator 68,423 Points

You appear to have found a bug (or at least unintended results) in the lesson programming. Retyping the code from s3v1.py:

from s2v5 import *

def create_bool_field_from_scratch_term(data_sample, search_term):
    new_array = []
    new_array.append(data_sample[0].append(search_term))

    for row in data_sample[1:]:
        new_bool_field = False
        if search_term in row[7]:
            new_bool_field = True

        row.append(new_bool_field)
        new_array.append(row)

    return new_array

def filter_col_by_bool(data_sample, col):
    matches_search_term = []
    for item in data_sample[1:]:
        if item[col]:
            matches_search_term.append(item)

    return matches_search_term

my_new_csv = create_bool_field_from_scratch_term(data_from_csv, "cashmere")
number_of_cashmere_ties = number_of_records(filter_col_by_bool(my_new_csv, 11))
print("length:", number_of_cashmere_ties)

"isn't data_sample just a pointer/reference to the original array being passed in?" Correct.

"If so, aren't we actually modifying the array being passed in (the first row to be exact, since data_sample[0] is a reference to the first row)." Correct.

"It doesn't seem like this is a good idea or the intent." It usually is not a good idea. However, since the script reloads the data from the data.csv file each time it is run, the in-memory change never reaches the file, so it isn't fatal.

"Furthermore, if I print out the result of an append operation (e.g. ['a'].append('b')), I get None, so I don't understand the value or expected result of nested appends." Correct. The value of my_new_csv[0] is None, because the .append() method modifies the list in place and returns None. See help(list.append).
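A quick way to see this outside the lesson code is the short sketch below. The names header, result, and new_array are throwaway names chosen for illustration and are not part of s3v1.py.

```python
# Throwaway names for illustration only; not part of the lesson code.
header = ["name", "price"]

# list.append() mutates the list in place and returns None.
result = header.append("cashmere")
print(result)   # None
print(header)   # ['name', 'price', 'cashmere']

# Nesting the call therefore stores None, not the updated header.
new_array = []
new_array.append(header.append("silk"))
print(new_array)  # [None]
```

The inner append still runs, so header ends up with both extra items even though only None is stored in new_array.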

It turns out that this code modifies the rows of data_from_csv in place, then builds my_new_csv out of those same modified row objects.
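The effect can be reproduced on a tiny list. This is only a sketch: mark_rows and the sample data are hypothetical stand-ins for the lesson function and data_from_csv.

```python
def mark_rows(rows):
    # Hypothetical stand-in for the lesson function: it appends to each
    # row it was handed, then collects those same row objects.
    collected = []
    for row in rows:
        row.append(True)
        collected.append(row)
    return collected

original = [["tie A"], ["tie B"]]   # made-up data
returned = mark_rows(original)

print(returned is original)        # False: the outer list is new
print(returned[0] is original[0])  # True: the rows are shared
print(original)                    # [['tie A', True], ['tie B', True]]
```

The caller's rows were changed even though the function returned a different outer list.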

In Python, two references point to the same object if they have the same id() (which is what Python's is operator tests). Adding a loop to compare the object ids between my_new_csv and data_from_csv shows that the two lists contain the same row objects:

print("id(my_new_csv):", id(my_new_csv), " id(data_from_csv):", id(data_from_csv))
id_match = 0
for new_data, old_data in zip(my_new_csv, data_from_csv):
    if id(new_data) == id(old_data):
        id_match += 1

print("len my_new_csv: ", len(my_new_csv))
print("id_match: ", id_match)

We get:

id(my_new_csv): 140320757671112  id(data_from_csv): 140320730889160
len my_new_csv:  5051
id_match:  5050

The top-level list object ids are different. Inside the lists, the only object that does not match is row 0, because the first row of my_new_csv is None.
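The same comparison can be written with the is operator instead of comparing id() values by hand. The sketch below uses small stand-in lists (old_rows and new_rows are assumed names), since the real data_from_csv comes from the lesson's s2v5 module.

```python
# Small stand-in data; the real data_from_csv comes from the lesson's s2v5 module.
old_rows = [["header"], ["row 1"], ["row 2"]]

# Mimic the lesson result: first slot is None, the rest are the same row objects.
new_rows = [None] + old_rows[1:]

# `a is b` is true exactly when id(a) == id(b).
id_match = sum(1 for new, old in zip(new_rows, old_rows) if new is old)

print(new_rows is old_rows)  # False: different top-level lists
print(id_match)              # 2: every row except row 0 is shared
```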

A No Side-effect Alternative

Rewriting create_bool_field_from_scratch_term to create copies instead of changing data_from_csv:

def create_bool_field_from_scratch_term_copy(data_sample, search_term):
    # no need to init new_array if assigning in next statement
    # new_array = []
    # create copy of data_sample[0] as first item of new_array
    new_array = [data_sample[0][:]]
    # append search_term
    new_array[0].append(search_term)

    for row in data_sample[1:]:
        new_bool_field = False
        if search_term in row[7]:
            new_bool_field = True

        # create copy of row
        new_row = row[:]
        # append Boolean value
        new_row.append(new_bool_field)
        # append new_row to new_array
        new_array.append(new_row)

    return new_array
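One way to check that a copying version leaves its input alone is sketched below. It uses a condensed restatement of the function (create_bool_field_copy is a name made up here) and invented sample rows with the searched text in column 7, so that it runs on its own without the lesson's CSV file.

```python
def create_bool_field_copy(data_sample, search_term):
    # Condensed restatement of the copying approach, so this check is self-contained.
    header = data_sample[0][:]          # copy the header row
    header.append(search_term)
    new_array = [header]
    for row in data_sample[1:]:
        new_row = row[:]                # copy each data row
        new_row.append(search_term in row[7])
        new_array.append(new_row)
    return new_array

# Made-up sample: a header plus two rows, with the searched text in column 7.
sample = [
    ["c0", "c1", "c2", "c3", "c4", "c5", "c6", "description"],
    [0, 0, 0, 0, 0, 0, 0, "blue cashmere tie"],
    [0, 0, 0, 0, 0, 0, 0, "red silk tie"],
]

result = create_bool_field_copy(sample, "cashmere")

print(len(sample[0]), len(result[0]))  # 8 9: the input header is unchanged
print(result[1][-1], result[2][-1])    # True False
print(any(new is old for new, old in zip(result, sample)))  # False: no shared rows
```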

Tagging Kat Chuang for comment

Chris ty for this...saved a lot of frustration.

one quick question.. why doesn't: "new_row = row[:]" just point to the same thing..that is, why don't you have to use copy?

ty again

Chris Freeman
Treehouse Moderator 68,423 Points

For a list, the [:] slice notation always returns a new list object. The docs say:

All slice operations return a new list containing the requested elements. This means that the following slice returns a shallow copy of the list:

>>> squares = [1, 4, 9, 16, 25]
>>> squares
[1, 4, 9, 16, 25]
>>> squares[:]
[1, 4, 9, 16, 25]

Because of this feature, a slice has become shorthand for a copy.
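Note that the copy is shallow: the outer list is new, but any nested mutable objects are still shared. The sketch below uses a made-up row containing a nested list to show the difference between a slice, list.copy(), and copy.deepcopy(). If the rows hold only immutable values such as strings and numbers, a shallow copy should be enough.

```python
import copy

row = ["tie", ["cashmere", "blue"]]   # made-up row containing a nested list

by_slice = row[:]          # shallow copy via slice
by_method = row.copy()     # equivalent shallow copy (Python 3.3+)
deep = copy.deepcopy(row)  # also copies nested objects

print(by_slice is row)        # False: the outer list is new
print(by_method is row)       # False: same for list.copy()
print(by_slice[1] is row[1])  # True: the nested list is shared
print(deep[1] is row[1])      # False: deepcopy duplicated it
```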

ty!