Compiling and Loops (6:51) with Kenneth Love
Patterns, sets, groups, oh my! Now that we have so much control over what our patterns find and what we can create from the
matches we get back, let's look at a way to compile a regular expression into a reusable variable and then loop over our results.
re.compile(pattern, flags) - method to pre-compile and save a regular expression pattern, and any associated flags, for later use.
.groupdict() - method to generate a dictionary from a Match object's groups. The keys will be the group names. The values will be the results of the patterns in the group.
re.finditer() - method to generate an iterable from the non-overlapping matches of a regular expression. Very handy for
.group() - method to access the content of a group.
0 or none is the entire match.
1 through however many groups you have will get that group. Or use a group's name to get it if you're using named groups.
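The notes on .group() and .groupdict() can be illustrated with a minimal sketch. The sample string and the pattern here are assumptions made up for illustration; they are not the course's data file or its exact regex.

```python
import re

# Hypothetical sample line (tab-separated); not the course's actual data.
data = "Love, Kenneth\tkenneth@example.com"

# Two named groups: "name" and "email".
match = re.search(
    r"(?P<name>[\w ]+, [\w ]+)\t(?P<email>[\w.+-]+@[\w.-]+)",
    data,
)

whole = match.group()            # no argument: the entire match
also_whole = match.group(0)      # 0 is the entire match as well
by_index = match.group(1)        # first group, by position
by_name = match.group("email")   # a named group, by name
as_dict = match.groupdict()      # {group name: matched text}

print(by_index)
print(by_name)
print(as_dict)
```

Numbered access works for every group, named or not, while .groupdict() only includes the named ones.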
Eventually, you're going to be in a situation where you need to 0:00 use a pattern on 100 different items in a big loop. 0:02 Or maybe it'll be 500, or 1000. 0:05 You don't wanna have to build and match the pattern every single time. 0:08 So the regular expression library has given us a way to compile a pattern into 0:11 an object that we can match against. 0:15 Let's look at how to do that. 0:17 We've already seen that we can make patterns as strings. 0:18 We stored them as a variable way, way back in the first video. 0:21 But that's not really the most useful thing in the world. 0:25 I mean, if we have, say, this string, yeah, it's handy having this as 0:27 a variable, but it doesn't save us a lot of time or trouble. 0:32 It'd be better if we could save that pattern in a state where it 0:36 was ready to go. 0:40 It's ready to be used as a regular expression. 0:41 And we can do that. 0:44 That's what the compile method lets us do. 0:46 So let's actually change line here. 0:49 Instead of search, we'll wanna say compile. 0:52 And what that does is this is gonna take the regular expression and compile it, 0:55 get it ready for use. 0:59 Now the one other thing we have to change is we have to take out where it says data. 1:01 Because when we compile a regular expression, 1:06 we don't compile it with the data it's gonna be run against. 1:08 It's like making the regular expression a bit more generic. 1:11 We can now run it against a lot of different things. 1:13 Not just the one thing that we did the search against, or the match or whatever. 1:16 So, I'm gonna take out this line one and let's look at this one. 1:21 So we've got line.groupdict and 1:24 that no longer makes sense because line isn't a match. 1:26 So let's go ahead and do a match. 1:29 We'll do re.search, line, data, and then groupdict. 1:31 So what we did here is we created a regular expression search, 1:39 just like before, and we said, okay, your pattern is this compiled one, it's line. 
1:43 And the string you're gonna match it against is data. 1:49 And now notice, we didn't specify any flags. 1:51 You specify the flags when you do the compile, not when you do the search. 1:54 So let's save that and let's run. 1:59 And there we go, we got that same thing. 2:03 So, that's great. 2:06 But what's really, really, really cool, 2:07 at least I think so, is we don't have to do this re.search stuff. 2:10 What we can do, instead, is we can just use line directly. 2:14 So we can just say, okay, take line. 2:23 It's a pattern. We know it's a pattern. 2:25 Do a search with it against data. 2:27 So let's take that out. 2:31 Run this again, and we should get the exact same content, and we do. 2:33 So, that's pretty cool. 2:37 But we've only got one thing. 2:39 We're only getting my line. 2:41 How do we get the rest of the lines? 2:42 Well, this is the last part of the re library that I wanna go over with you. 2:44 But there's still more in there to explore. 2:49 Go check the docs. 2:51 They're awesome. 2:52 It's a method that's named finditer. 2:53 And it gives us back an iterable of each non-overlapping match. 2:55 It's kind of like giving us back a list, but it's not exactly a list. 2:59 It's also kinda like using findall, but instead of getting back tuples, 3:02 we get back a match object, like when we use re.match or re.search. 3:07 So let's try this out. 3:11 Let's come back up here and let's say for 3:13 match in line.search, oops, sorry. 3:18 Not line.search. 3:22 line.finditer against data. 3:24 We want to print match.group, 3:29 name. 3:32 So, this .group method, 3:33 when you have a match object, says, show me whatever is inside of the group. 3:36 Now, I can say group with nothing, and 3:41 it'll show me the whole thing that it captures. 3:44 I can do group with a number, and it'll show me the group at that index. 3:47 So, 0, 1, 2, so on. 3:50 Or I can give it a name and it'll show me the group at that name. 
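The steps just described (compile once with the flags, search with the compiled pattern, then loop with finditer) can be sketched roughly as follows. The variable names line and data follow the video, but the address-book text and the pattern itself are simplified assumptions, not the course's actual files.

```python
import re

# Hypothetical stand-in for the course's address-book text.
data = """Love, Kenneth\tkenneth@example.com
Carson, Ryan\tryan@example.com
Doctor, The\tdoctor@example.com
"""

# Flags are given once, at compile time, not at search time.
line = re.compile(r"""
    ^(?P<name>[\w ]+,\s[\w ]+)\t    # Last, First
    (?P<email>[\w.+-]+@[\w.-]+)$    # email address
""", re.X | re.M)

# These two calls are equivalent; the second skips the re.search wrapper.
via_module = re.search(line, data)
via_pattern = line.search(data)
print(via_pattern.groupdict())

# finditer yields one match object per non-overlapping match.
names = []
for match in line.finditer(data):
    names.append(match.group("name"))
    print(match.group("name"))
```

Because finditer produces matches lazily, the loop never has to hold every match in memory at once, which is the memory saving mentioned later in the video.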
3:52 All right, so let's try that out. 3:56 So there we go. 4:00 There's everybody's names. 4:01 You can see Ryan Carson, The Doctor, Exampleson, Example, so on. 4:03 All right, that's all of our names. 4:06 We can of course ask for any group that we want or do all sorts of other stuff. 4:09 For example, let's make fully qualified email "To" headers. 4:14 You've seen this before, where it's like a name and then less than, 4:17 their email address, and greater than, right? 4:21 Okay, so before we do that, 4:23 I want to actually come back and edit our pattern, and add two things. 4:25 So what's cool is we can have subpatterns, or subgroups, sorry. 4:30 So we can do here last, and that's the last name. 4:34 And we can do here first, and that of course is the first name. 4:42 So there's our first name, there's our last name, as these little subpatterns. 4:49 'Kay? 4:52 So now in our 4:53 for loop, we need to print out a new thing. 4:55 And let's print out first, last, email. 5:00 'Kay. 5:09 And we wanna format this 5:10 with the match.groupdict. 5:12 Right? It'll go ahead and 5:17 find those keys in the groupdict, it'll use those as keyword arguments. 5:19 All right, so let's try this out, we should have email addresses for everybody. 5:26 Check it out, we've got email addresses. 5:30 There's the name, and an email address. 5:32 That works pretty well, and it's a lot simpler than I 5:35 would have actually expected it to be if you were just like, hey, 5:38 turn this into a bunch of email addresses. 5:41 Hopefully this will give you some ideas on how to take our address book further. 5:43 You can build all sorts of things out of dictionaries 5:47 made from strings through regular expressions. 5:50 In this course, compiling our pattern doesn't actually save us a lot of time or 5:53 memory, but it's a good habit to get into, so 5:56 you won't forget to do it when it actually matters. 5:59 And using things like finditer helps us to save even more memory. 
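The subgroup and formatting step might look roughly like this. The nested group names first, last, and email come from the video; the sample text and the exact pattern are assumptions for illustration only.

```python
import re

# Hypothetical stand-in for the course's address-book text.
data = """Love, Kenneth\tkenneth@example.com
Carson, Ryan\tryan@example.com
"""

# "last" and "first" are named groups nested inside the "name" group.
line = re.compile(r"""
    ^(?P<name>(?P<last>[\w ]+),\s(?P<first>[\w ]+))\t
    (?P<email>[\w.+-]+@[\w.-]+)$
""", re.X | re.M)

headers = []
for match in line.finditer(data):
    # ** unpacks the groupdict into keyword arguments for str.format;
    # keys the template does not use (here "name") are simply ignored.
    header = "{first} {last} <{email}>".format(**match.groupdict())
    headers.append(header)
    print(header)
```

Nesting keeps the full "Last, First" text available under name while still exposing each part on its own.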
6:01 Compiling patterns also lets you make patterns available for 6:06 import in other parts of your applications, or even in other packages. 6:08 This way, you can make your pattern as perfect as possible in just one place and 6:12 reap the benefits anywhere else that you need it. 6:16 I think we're done. 6:18 We've gone over pretty much every area of regular expressions in Python, and 6:20 we've turned a block of text that no one would want to wade through into super 6:24 useful dictionaries that we can put into classes or transform into a new text file. 6:27 In fact, that's your extra credit for this course. 6:32 Take the regex we wrote and the text and make a class for 6:34 a person defined by the data. 6:37 Give them names and a phone number and an email address. 6:39 Maybe make another class for an address book that collects all these 6:41 people together and lets the user search them. 6:44 Be sure to share it on the forums. 6:46 Thanks so much for being part of this course. 6:48 I'll see you next time. 6:50