CSV7:52 with Kenneth Love
CSVs, or comma-separated value files, are a common file-based database-like format. Python has a built-in `csv` module that makes working with these files quick and easy.
Python has great documentation for the
Sometimes you might have issues with the
csv module and files that contain a lot of Unicode, especially if you're using Python 2. If you run into that issue, check out unicodecsv.
For both CSV and JSON files, and others, the amazing tablib library is a great thing to check out.
[MUSIC] 0:00 In the data science and business software worlds, 0:04 it's hard to escape the dreaded spreadsheet. 0:07 Well, maybe you don't dread them as much as I do. 0:09 I just never learned to use Office software appropriately. 0:12 Okay, enough about me. 0:15 Anyway, in those two worlds and many others, 0:16 a lot of your work as a coder will be using CSV or comma separated value files. 0:19 These files usually have a header row and then multiple rows below them with each 0:24 columns of value separated from the previous column by a comma. 0:29 Yeah, it's a spreadsheet without all of the Excel excellence. 0:33 Python though has a pretty robust CSV module that lets us read and 0:37 write tabular data in this popular format. 0:40 Now if you've done the data science basics course you've already seen how to 0:43 use CSVs, but it's worth focusing on this for all of you that have to use CSVs but 0:46 don't need all of the information in that course. 0:51 I have a CSV here from the Canada Science and Technology Museums Corporation. 0:55 And has about a hundred of their artifacts in it but I 1:00 can't display CSVs in work spaces and even if I could I don't really wanna read it. 1:04 I wanna use it, so let's make a file for this. 1:09 I'm gonna say new file, artifacts.py, 1:12 Python's CSV module is where we wanna start, at least most of the time. 1:17 I'll put a link or two in the teacher's notes for some other CSV libraries. 1:21 The biggest problem that Python's built-in one has 1:24 is that it's slightly lacking when it comes to Unicode support. 1:27 But let's just see what we can do with it. 1:31 So we're gonna import csv. 1:33 And we're gonna, obviously the first thing I have to do is import the library. 1:35 And then I wanna see about reading from our file. 1:38 We need to open the file a little bit differently than we normally would do. 1:41 Normally we would just get with you know a flag or something. 1:45 And in this case we wanna give it a new line argument, so let's just say, 1:48 with open museum.csv, 1:51 and then when I say newline is equal to and just an empty thing here. 1:57 The reason I want to do that is because if we have a new line inside 2:00 of a quoted section then we want to be sure and 2:04 be able to differentiate that from an actual new line marking a new line. 2:08 So yeah. 2:13 And we're just gonna call this csvfile. 2:15 Oops, I forgot my colon. 2:17 And so now we need to make our CSV reader. 2:18 And we instantiate this reader so that we can tell it how to read from the file. 2:21 So I'm gonna call this, I don't know, artreader, and it will be csv.reader. 2:25 And it's gonna be a csv file. 2:32 And the delimiter, 2:34 I know this from looking at my, I think that's how you spell it, 2:35 they actually used a pipe character as the delimiter instead of a comma. 2:40 So really it's a pipe separated value file. 2:45 Whatever, the idea is that something separates each thing. 2:48 You kind of have to look at your csv files to see what they're using as a delimiter. 2:51 All right, so our artreader object here is a generator. 2:55 That means we can't index it. 2:59 We can't do like artreader two but I only wanna look at a few of the records. 3:00 So I'm gonna go ahead and turn this thing into a list. 3:05 So I'm gonna say rows = list (artreader). 3:07 And now I can loop over it. 3:12 So for row in rows up to two, let's print a comma and a space. 3:14 And we'll join whatever is in that row. 3:22 So okay, cool. 3:25 And now, let's go ahead and python artifacts.py. 3:27 So, cool. 3:33 There is the stuff that's in our file. 3:34 So you can see our first row that we printed out there, 3:37 those are our field names. 3:40 So, artifact number, object name, category three, and so on. 3:41 And then we get this thing here, that here's the number, it's a cover, 3:45 it's paper. 3:50 And so on. 3:51 So that's cool. 3:53 The problem with this is, I have to know, like, which column the thing I want is in. 3:53 And that's not always the easiest thing, right? 3:58 So let's turn these into dictionaries instead of just lists. 4:02 And we can do that by changing our reader. 4:07 So we have this reader here. 4:10 Think of this as like a list reader and 4:12 we can actually replace that with what's called DictReader. 4:14 And DictReader reads and gives us dictionaries. 4:18 So that's pretty cool, all right. 4:22 So we'll just take those first two. 4:24 Actually, let's skip that first one. 4:26 And let's print out row and 4:29 then let's print out group1, so whatever they have in their group1 field. 4:31 Let's go ahead and run that again. 4:37 And our two groups, or 4:40 our two rows rather, both have Aviation as their group1. 4:42 So that's pretty cool. 4:46 We can give a list of field names to the DictReader, 4:48 when we specify the DictReader right here. 4:51 And it'll use those as the field names in order, but since we didn't do that, 4:53 it actually reads in that first row, and uses those as the field names. 4:57 So it's kind of smart, which is pretty cool. 5:03 Let's change group one over there to manucity. 5:05 Which is the manufactured city. 5:09 If we run that, then we don't have a manufactured city for 5:12 either of those two items. 5:15 Alright, cool. 5:17 This is for working with a reader. 5:18 Just the thing that reads in the data and parses it. 5:21 Let's talk about making something that writes out to a CSV. 5:24 I'm actually going to do a new file for this. 5:28 Just because I kind of want to keep these a little bit separate, right? 5:32 So I'm gonna call this teachers.py, and again I'm gonna import CSV. 5:36 And then I'm gonna do with open teachers.csv with the a format as csvfile, 5:39 and I'm gonna say my field names are first name 5:48 and last name and topic. 5:55 You can, of course, put whatever fieldnames in here you want. 5:58 You can have hundreds of these, that's completely up to you. 6:03 I'm going to call this 'Teach Writer'. 6:05 And we'll do csv.DictWriter and we're going to write to our csv files. 6:08 And our field names are the field names variable. 6:14 So then let's do teachwriter. 6:20 Writeheader and so that writes out a row that has the header stuff in it. 6:23 And then we'll do teachwriter.writerow. 6:30 And we'll write first_name is Kenneth. 6:36 Last_name is Love and topic is Python. 6:43 And I'm gonna rack those up. 6:51 All right, cool. 6:55 I'm actually pretty good. 6:58 If I had a list of dictionaries that already had all my things in it, 6:59 then I could pass those in to the right rows method as well. 7:01 Or I could call right row multiple times. 7:04 So, like let's add in two rows here. 7:06 I'm gonna add in Alena Holligan, and she teaches PHP. 7:10 All right, so now let's run this. 7:16 Python teachers.py. 7:19 And then if I was to cat teachers.csv, 7:23 then we've got our header row, and then we have the row for me and the row for Alena. 7:31 So that's pretty cool, so 7:38 reading and writing CSVs is fairly straightforward in Python. 7:39 The main thing you have to remember to do is create these field names if you're 7:42 writing I recommend using the DictReader and DictWriter. 7:46 They're a little bit more useful. 7:49
You need to sign up for Treehouse in order to download course files.Sign up