data_sample[0].append(search_term) - doesn't that change the original data set being passed in? Is that bad?

Question

Isn't data_sample just a pointer/reference to the original array being passed in?

If so, aren't we actually modifying the array being passed in (the first row, to be exact, since data_sample[0] is a reference to the first row)? That doesn't seem like a good idea, or the intent.

Furthermore, if I print out the result of an append operation (e.g. ['a'].append('b')), I get None, so I don't understand the value or expected result of nested appends.

Answer 1 · 2015-11-17T11:17:11Z

You appear to have found a bug (or at least unintended behavior) in the lesson code. Retyping the code from s3v1.py:

from s2v5 import *

def create_bool_field_from_scratch_term(data_sample, search_term):
    new_array = []
    new_array.append(data_sample[0].append(search_term))

    for row in data_sample[1:]:
        new_bool_field = False
        if search_term in row[7]:
            new_bool_field = True

        row.append(new_bool_field)
        new_array.append(row)

    return new_array

def filter_col_by_bool(data_sample, col):
    matches_search_term = []
    for item in data_sample[1:]:
        if item[col]:
            matches_search_term.append(item)

    return matches_search_term

my_new_csv = create_bool_field_from_scratch_term(data_from_csv, "cashmere")
number_of_cashmere_ties = number_of_records(filter_col_by_bool(my_new_csv, 11))
print("length:", number_of_cashmere_ties)

"Isn't data_sample just a pointer/reference to the original array being passed in?" Correct.

"If so, aren't we actually modifying the array being passed in (the first row, to be exact, since data_sample[0] is a reference to the first row)?" Correct.

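"That doesn't seem like a good idea, or the intent." Modifying an argument in place is usually not a good idea, because the caller's data changes without the caller asking for it. Here it isn't fatal only because the script reloads the data from the data.csv file each time it is run.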
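To see why the side effect matters once the data is not reloaded between calls, here is a minimal sketch. The function add_flag_in_place, its col parameter, and the three-row data list are hypothetical stand-ins for the lesson code and data.csv, not part of the course files.

```python
def add_flag_in_place(data_sample, search_term, col):
    """Hypothetical, simplified stand-in for the lesson function.

    Like the original, it appends to the rows it was given.
    """
    data_sample[0].append(search_term)
    new_array = [data_sample[0]]
    for row in data_sample[1:]:
        row.append(search_term in row[col])
        new_array.append(row)
    return new_array


# Made-up sample data: a header row plus two records
data = [["id", "description"], ["1", "cashmere tie"], ["2", "silk tie"]]

# Calling the function twice on the same list keeps growing every row
add_flag_in_place(data, "cashmere", 1)
add_flag_in_place(data, "cashmere", 1)

print(data[0])  # ['id', 'description', 'cashmere', 'cashmere']
print(data[1])  # ['1', 'cashmere tie', True, True]
```

Each call adds another column to the caller's rows, which is the kind of surprise that in-place modification tends to cause.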

"Furthermore, if I print out the result of an append operation (e.g. ['a'].append('b')), I get None, so I don't understand the value or expected result of nested appends." Correct. The .append() method modifies the list in place and returns None, so new_array.append(data_sample[0].append(search_term)) appends None to new_array. As a result, the value of my_new_csv[0] is None rather than the header row. See help(list.append).
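The behavior can be reproduced in isolation. This is a small sketch; the names letters, outer, and header are made up for illustration and do not appear in the lesson.

```python
# list.append changes the list in place and returns None
letters = ['a']
result = letters.append('b')

print(letters)  # ['a', 'b']
print(result)   # None

# Nesting the call appends that None to the outer list,
# while the inner list is still modified as a side effect
outer = []
header = ['name', 'price']
outer.append(header.append('flag'))

print(outer)   # [None]
print(header)  # ['name', 'price', 'flag']
```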

It turns out that this code modifies data_from_csv, then builds my_new_csv out of those same, now modified, row objects.

In Python, two references point to the same object if they have the same id(). Adding a loop to compare the object ids between my_new_csv and data_from_csv shows that the two lists hold the same row objects:

print("id(my_new_csv):", id(my_new_csv), " id(data_from_csv):", id(data_from_csv))
id_match = 0
for new_data, old_data in zip(my_new_csv, data_from_csv):
    if id(new_data) == id(old_data):
        id_match += 1

print("len my_new_csv: ", len(my_new_csv))
print("id_match: ", id_match)

We get:

id(my_new_csv): 140320757671112  id(data_from_csv): 140320730889160
len my_new_csv:  5051
id_match:  5050

The top-level list object ids are different, so my_new_csv and data_from_csv are two separate lists. Inside the lists, the only item that does not match is row 0, because the first item of my_new_csv is None instead of the header row.
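The same check can be written with the is operator, which tests identity directly and gives the same answer as comparing id() values for objects that exist at the same time. Below is a minimal sketch on made-up data; original and shallow are illustrative names, not variables from the lesson.

```python
# Made-up nested list standing in for the CSV data
original = [["h1", "h2"], ["a", "b"], ["c", "d"]]

# list() builds a new outer list that still holds the same row objects
shallow = list(original)

print(shallow is original)  # False: different top-level lists
print(all(new is old for new, old in zip(shallow, original)))  # True: shared rows

# Because the rows are shared, a change made through one list
# is visible through the other
shallow[1].append("x")
print(original[1])  # ['a', 'b', 'x']
```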

A No Side-effect Alternative

Rewriting create_bool_field_from_scratch_term to create copies instead of changing data_from_csv:

def create_bool_field_from_scratch_term_copy(data_sample, search_term):
    # no need to init new_array if assigning in next statement
    # new_array = []
    # create copy of data_sample[0] as first item of new_array
    new_array = [data_sample[0][:]]
    # append search_term
    new_array[0].append(search_term)

    for row in data_sample[1:]:
        new_bool_field = False
        if search_term in row[7]:
            new_bool_field = True

        # create copy of row
        new_row = row[:]
        # append Boolean value
        new_row.append(new_bool_field)
        # append new_row to new_array
        new_array.append(new_row)

    return new_array
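One way to confirm that a copy-based version leaves its input alone is to compare the input against a snapshot taken before the call. The sketch below assumes a simplified, hypothetical variant named add_flag_copy that uses copy.deepcopy instead of slicing each row, plus made-up sample data; it is not the lesson's code.

```python
import copy


def add_flag_copy(data_sample, search_term, col):
    """Hypothetical simplified variant that works on a deep copy."""
    new_array = copy.deepcopy(data_sample)
    new_array[0].append(search_term)
    for row in new_array[1:]:
        row.append(search_term in row[col])
    return new_array


# Made-up sample data: a header row plus two records
data = [["id", "description"], ["1", "cashmere tie"], ["2", "silk tie"]]
snapshot = copy.deepcopy(data)

result = add_flag_copy(data, "cashmere", 1)

print(data == snapshot)  # True: the input is unchanged
print(result[0])         # ['id', 'description', 'cashmere']
print(result[1])         # ['1', 'cashmere tie', True]
```

For rows that contain only strings and Booleans, slicing with row[:] as in the function above is enough. copy.deepcopy is the safer choice if the rows could themselves hold lists or other mutable objects.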

Tagging Kat Chuang for comment

Shane Kercheval


Chris Freeman


daniel steinberg
