Welcome to the Treehouse Community

Want to collaborate on code errors? Have bugs you need feedback on? Looking for an extra set of eyes on your latest project? Get support with fellow developers, designers, and programmers of all backgrounds and skill levels here with the Treehouse Community! While you're at it, check out some resources Treehouse students have shared here.

Looking to learn something new?

Treehouse offers a seven day free trial for new students. Get access to thousands of hours of content and join thousands of Treehouse students and alumni in the community today.

Start your free trial

Python CSV

How to filter CSV rows by keywords from another CSV file?

Hi,

I have a CSV file with up to 1000 rows, and I want to filter out rows within the CSV file that meets specific criteria.

I have a list of keywords/strings in one file ('maths teacher', 'English teacher', 'receptionist', 'deputy head' etc) named titles.csv

In the other file, I have multiple columns and rows, and one of the column fields is occupation. I want to be able to filter out all rows in this CSV file that don't include the list of occupations in titles.csv

Can anyone help me with this problem? I've heard Pandas may be the way to solve this, but I am keen to try and do it using csv module.

1 Answer

Dave Faliskie
Dave Faliskie
17,793 Points

If you want to use pythons csv module you could basically open both of your csv files and store them as 2D lists [[],[],[]]. Then you could loop through both 2D list and compare the individual columns that would have matching keywords. Then only save the non-matching rows to a new 2D list and finally write that to a new csv file.

here is a start:

import csv
# function to read a csv file to a 2D list ex: [[col1,col2,col3],[col1,col,2col3]]
def read_from_file(csv_file):
    with open(csv_file, 'rb') as f:
        reader = csv.reader(f)
        data = list(reader)
    return data

data_from_first_csv = read_from_file(file1.csv)
data_from_second_csv = read_from_file(file2.csv)

output_data = []
for row in data_from_first_csv:
    for key in data_from_second_csv:
        # if col 1 in the first csv matches any key in col 2 of the second csv skip it
        if row[0] != key[1]:
            output_data.append(row)

// write the output to a new csv
with open("output.csv", "wb") as f:
    writer = csv.writer(f)
    writer.writerows(output_data)

Hope this helps

Thank you for the help!