One extra record?

If I'm not mistaken, the record count increases by one because we append the header to the list and count it as a record!

Alex Peck
6,907 Points

I noticed this too when I typed "gucci" instead of "Gucci". My search, filter, and print returned one "gucci" tie instead of the 171 "Gucci" ties. I realized the search term was case-sensitive, but then I wondered whether one record in data_sample was lowercase and, if so, how to deal with mixed case in data sets.
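On the mixed-case question, one common approach is to normalize both sides of the comparison before matching. Here is a minimal sketch, assuming the data is a list of rows without a header. The function filter_ignoring_case and the sample rows are made up for illustration and are not part of the course code.

```python
# Hypothetical rows in the form [id, description, brand];
# this is not the course's actual data_sample.
rows = [
    ["1", "Silk tie", "Gucci"],
    ["2", "Wool tie", "gucci"],
    ["3", "Knit tie", "GUCCI"],
    ["4", "Silk tie", "Prada"],
]


def filter_ignoring_case(data, col, target):
    """Return rows whose value in column `col` equals `target`, ignoring case."""
    wanted = target.casefold()
    return [row for row in data if row[col].casefold() == wanted]


gucci_rows = filter_ignoring_case(rows, 2, "gucci")
print(len(gucci_rows))  # 3
```

str.casefold() works like str.lower() but is a bit more aggressive with some non-English characters, so it is usually the safer choice for caseless comparison.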

It turns out any search term returns one tie: "falfkafj" returns one tie, "supercalifragilisticexpialidocious" returns one tie, and so on.

Every result shows one more record than actually exists.

Why aren't we subtracting one from the results to remove the (extra) header record from the count?

Kat Chuang
Treehouse Guest Teacher

Excellent question! Notice that the function filter_col_by_string() is returning the header with the data.
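To make that concrete, here is a rough sketch of how a filter that seeds its result with the header ends up returning one row for any search term. The body, the parameter names, and the "brandName" column are assumptions for illustration; the course's actual filter_col_by_string() may be written differently.

```python
def filter_col_by_string(data_sample, field, filter_condition):
    """Hypothetical stand-in: keep the header, then every row whose
    `field` column equals `filter_condition`."""
    header = data_sample[0]
    col = header.index(field)
    filtered_rows = [header]  # the header is always included in the result
    for row in data_sample[1:]:
        if row[col] == filter_condition:
            filtered_rows.append(row)
    return filtered_rows


# Made-up sample: one header row followed by three data rows.
data_sample = [
    ["id", "brandName"],
    ["1", "Gucci"],
    ["2", "Prada"],
    ["3", "Gucci"],
]

print(len(filter_col_by_string(data_sample, "brandName", "Gucci")))     # 3: header + 2 matches
print(len(filter_col_by_string(data_sample, "brandName", "falfkafj")))  # 1: header only
```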

Now let me challenge you with this question: Given that the data is in a list structure, how would you remove the header row with Python code?

Alex Peck
6,907 Points

I can remove the header row by slicing the list with [1:], e.g.:

supercalifragilisticexpialidocious_ties[1:]
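As a quick check of the slice, assuming the filtered list holds only the header row (the header contents here are made up):

```python
# A filtered result with no matches: just the (hypothetical) header row.
supercalifragilisticexpialidocious_ties = [["id", "brandName"]]

# Slicing from index 1 skips the header and returns a new list.
data_rows = supercalifragilisticexpialidocious_ties[1:]

print(len(data_rows))  # 0
print(len(supercalifragilisticexpialidocious_ties))  # 1, the original list is unchanged
```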

But I think my preferred way to avoid this situation would be to change the default behavior of the number_of_records function to include a header check like the one we used in the find_average function. Perhaps something like this:

def number_of_records(data_sample, header=True):
    """Count the records in data_sample, skipping the header row by default."""
    if header:
        return len(data_sample) - 1
    else:
        return len(data_sample)

What do you think?
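For anyone who wants to try the idea, here is a small runnable version with a made-up sample. The function is repeated so the snippet stands on its own, and the sample rows are hypothetical.

```python
def number_of_records(data_sample, header=True):
    """Count the records in data_sample, skipping the header row by default."""
    if header:
        return len(data_sample) - 1
    else:
        return len(data_sample)


# Made-up filtered result: one header row plus two matching ties.
sample = [
    ["id", "brandName"],
    ["1", "Gucci"],
    ["3", "Gucci"],
]

print(number_of_records(sample))                # 2
print(number_of_records(sample, header=False))  # 3
print(number_of_records(sample[:1]))            # 0: header only, no matches
```

One edge case to keep in mind: calling it on a completely empty list with header=True returns -1, so it assumes the header row is always present.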