Reading Files (7:00), with Kenneth Love
Before we can search through our text, we have to be able to open the file it's contained in. Then we can start with some super-specific searches of our text.
open() - Opens a file in Python. The object it returns doesn't contain the content of the file; it just points to the file.
.read() - Reads the entire contents of the file object it's called on.
.close() - Closes the file object it's called on. This releases the file, so Python is no longer holding it open.
r'string' - A raw string, in which backslashes are not treated as escape characters. This makes writing regular expressions easier.
re.match(pattern, text, flags) - Tries to match a pattern against the beginning of the text.
re.search(pattern, text, flags) - Tries to match a pattern anywhere in the text. Returns the first match.
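As a rough sketch of how the entries above fit together, the following script opens, reads, and closes a file the long way and then tries match and search on its contents. The file contents and the temporary folder are made up for illustration (the course's real names.txt is not reproduced here), and re.IGNORECASE is just one example of a value for the optional flags argument.

```python
import os
import re
import tempfile

# Stand-in for the course's names.txt so the sketch runs anywhere.
# The contents are invented for this example.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "names.txt")
sample = open(path, "w", encoding="utf-8")
sample.write("Love, Kenneth\nDoe, Zo\u00eb\n")  # \u00eb is a non-ASCII letter
sample.close()

names_file = open(path, encoding="utf-8")  # a file object, not the contents
data = names_file.read()                   # the whole file as one string
names_file.close()                         # release the file; data is still usable

print(names_file.closed)            # True
print(re.match(r'Love', data))      # match object: data starts with "Love"
print(re.match(r'Kenneth', data))   # None: "Kenneth" is not at the beginning
print(re.search(r'Kenneth', data))  # match object: found later in the text
print(re.search(r'kenneth', data, re.IGNORECASE))  # flags: ignore case

# Clean up the stand-in file.
os.remove(path)
os.rmdir(folder)
```

Note that data stays available after close(), because the string was already read into its own variable.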
A better way to read files
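If you don't know the size of a file, it's better to open it in a with block so that it's closed automatically. The following snippet does that, although .read() with no argument still loads the entire file at once; to read a chunk at a time, pass a size to .read() and call it repeatedly: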
with open("some_file.txt") as open_file:
    data = open_file.read()
with causes the file to automatically close once the action inside of it finishes. And the action inside, the .read(), will finish when there are no more bytes to read from the file.
Why didn't I cover this in the video? There's more going on here, behind-the-scenes, and I think it's better to know the long way first.
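For a file that really might be large, one possible way to read a chunk at a time is sketched below. The helper name read_in_chunks, the chunk size, and the throwaway demo file are all assumptions made for this example; the only standard pieces are with, open(), and .read(size), which returns an empty string at the end of the file.

```python
import os
import tempfile


def read_in_chunks(path, chunk_size=1024, encoding="utf-8"):
    """Yield the text of the file at `path`, at most `chunk_size` characters at a time."""
    with open(path, encoding=encoding) as open_file:
        while True:
            chunk = open_file.read(chunk_size)
            if not chunk:  # an empty string means end of file
                break
            yield chunk
    # Leaving the with block closes the file.


# Demo on a throwaway file so the sketch runs anywhere.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("Love, Kenneth\n" * 3)
    demo_path = tmp.name

chunks = list(read_in_chunks(demo_path, chunk_size=10))
data = "".join(chunks)
os.remove(demo_path)

print(len(chunks), "chunks,", len(data), "characters")
```

Joining the chunks, as the demo does, puts the whole file back in memory; in practice each chunk would be processed as it arrives.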
[MUSIC] 0:00 Regular expressions let us match patterns against text. 0:04 For example, if I had a pattern that said, 0:08 I want to find the first time the word ghost is used in Charles Dickens's 0:10 A Christmas Carol, I could do re.search(r'ghost', christmas_carol). 0:13 But knowing that I can do it doesn't help you, does it? 0:19
Before we get into 0:22 Workspaces though, let's talk about the problem we're going to solve. 0:23 I have a text file full of names, phone numbers, email addresses, etc. 0:26 The problem is that it's kind of garbled and 0:30 some of the people don't have all of their information. 0:32 I'd like to get this sorted out, to where I can turn them into classes and 0:35 make a nice interface for looking at my contacts. 0:38 Regular expressions are made for 0:40 processing text, so this should be really doable. 0:42 Let's get to it. 0:44
Well, I guess, first things first. 0:46 We need to read the file. 0:47 The file we want to read is this names.txt one. 0:50 And let's go down here to our Python shell and see about doing that. 0:53 So, we can read in a file using a really handy function that 0:59 Python gives us called open. 1:04 And we give it the name of the file that we wanna open. 1:07 Just in case, because this file does have some UTF-8 characters, 1:12 let's actually give it an encoding of utf-8. 1:15 That way it knows that it is utf-8. 1:19 So we do that and we get the file open. 1:23
I mean that's great, but you know, 1:26 actually I don't wanna do this in the shell. 1:28 I wanna do this in a, in an actual script. 1:30 So we've used open. 1:32 Let's just, let's get out of here and let's do this as an actual script. 1:34 So let's go up here and we'll make a new file. 1:39 Let's call this address_book.py, cuz we're making an address book. 1:41 Okay. 1:48 So inside here let's go ahead and do just what we did before. 1:49 So names_file, we're gonna open up names.txt, 1:54 and we're gonna say that the encoding is utf-8 just in case.
2:00 And so what we're doing here, 2:05 names_file isn't the file, like the, the contents of the file. 2:06 names_file is a pointer to the file on the file system, 2:10 which we can then do things to. 2:13 We can read from it or close it or whatever. 2:15 And in fact, we're gonna do that. 2:18 We're gonna read from it. 2:19 So let's do data = names_file.read(). 2:20 So that puts all the contents of names_file into data. 2:24
Now I know that names.txt isn't really that big of a file, it's fairly small. 2:30 But if I knew it was a really big file or 2:35 I didn't know how big it was, there's a slightly better way of handling this. 2:37 And I'm gonna post that in the teacher's notes. 2:43 I wanna do a more standard, less magical and fancy version here in the course. 2:45
So, now we have the file opened and 2:51 we've read the file, we have all the contents of it. 2:54 We don't need that file anymore. 2:56 So what we're gonna do now is we want to close it so 2:58 that we're no longer pointing to it and it's erased out of memory. 3:00 So that's it. 3:06 That's opening the file, reading the file, and then closing the file. 3:08 I love when all of these actions are just really simple. 3:12
So let's print out what we have in data, just to make sure that it's what we want. 3:16 All right. 3:23 So let's come down here to our console and run this. 3:24 python address_book.py. 3:29 And there we go. We've got all of our names. 3:31 So that's great. 3:34 That's everything. 3:35
All right, so now we wanna start on our regex stuff. 3:37 I'm gonna get rid of that print. 3:41 And so, everything that we're gonna do with regex, 3:43 we need the regex library, which is called re for regular expressions. 3:46 So, we're gonna do a match. 3:52 Let's try and match. If we look back at names.txt, the first name here is my name. 3:55 So let's see if we can match part of my name. 4:01 So let's do print re.match and Love. 4:04 And we're gonna match that against data.
4:10 And then, you know what, let's do another one here. 4:13 Let's do re.match Kenneth and that's also against data. 4:15
Now, why do our strings in here, in our matches, have an r in front of them? 4:22 Well, that tells Python that it's a raw string as opposed to 4:27 a regular string, a well done string. 4:31 This saves us some very confusing repeated uses of the backslash character or 4:35 the escape character. 4:42 We'll talk about those in a later video if, if it comes up. 4:43 After the r, we're just looking for the two major parts of my name. 4:47
Let's run this and see what we get. 4:52 So we come down here, python address_book.py, and 4:55 we get back a, a match object for the first one. 5:00 See, we've got match object, and it matched Love. 5:03 Cool. 5:07 But we got None for the second one. 5:08 Why did we get None? 5:10 Well, if you remember I said that match, match is from the beginning of the string. 5:12 Our string starts with Love, but it doesn't start with Kenneth. 5:20 Kenneth comes after Love comma space. 5:25 So the second match can't ever, doesn't ever happen. 5:27
But if we come back and we change this match to search, 5:31 then now we should be able to do it. 5:38 And look, we've got two match objects now. 5:40 We've got one matching Love and one matching Kenneth. 5:44 So if you're matching at the beginning of a string, use match. 5:47 If you're matching somewhere in the string, use search. 5:50
Now you may have already guessed this, but 5:54 if you didn't, we can do these as variables. 5:57 They're just strings. 6:00 So we could say last_name equals Love, first_name equals Kenneth. 6:02 And then here, do last_name and first_name. 6:10 And this would still work the same way and we still get our two matches. 6:17 Sometimes that's a whole lot easier than writing them inside 6:24 the re.match and re.search areas. 6:27 Toward the end of this course, 6:31 we're going to look at another way of making patterns as variables.
6:32 For now though, I'm just gonna stick with doing it all in one step as we, 6:36 as we first did. 6:40 Feel free to make the variables if you want, though. 6:41 All right, we can open, close, and read files and match very exact patterns. 6:44 It might not seem like much, but 6:49 these are definitely the first steps on a long path to regular expression magic. 6:51 In our next video, we're going to look at a more common search method and 6:55 some more forgiving pattern options. 6:58
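The raw-string and pattern-variable ideas from the video can be sketched roughly as follows. The data string here is a made-up stand-in for the contents of names.txt, assumed to begin with "Love, Kenneth" as the video describes; the variable names last_name and first_name follow the ones mentioned in the transcript.

```python
import re

# A regular string turns \n into a single newline character;
# a raw string keeps the backslash and the n as two characters.
print(len('\n'))   # 1
print(len(r'\n'))  # 2

# Hypothetical stand-in for the data read from names.txt.
data = "Love, Kenneth\n"

# Patterns are just strings, so they can live in variables.
last_name = r'Love'
first_name = r'Kenneth'

last_match = re.search(last_name, data)
first_match = re.search(first_name, data)

# .start() reports where in the text each match begins.
print(last_match.start(), first_match.start())  # 0 6

# match only succeeds at the very beginning of the text.
print(re.match(first_name, data))  # None
```

Because "Kenneth" begins at position 6 rather than 0, re.match returns None for it while re.search finds it.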