Groups
9:45 with Kenneth Love
Now that we can search for just about anything, let's organize our results a bit better. Regular expressions give us indexed and named groups, both of which are super-handy.
([abc]) - creates a group that contains a set for the letters 'a', 'b', and 'c'. This could later be accessed from the match object with .group(1).
(?P<name>[abc]) - creates a named group that contains a set for the letters 'a', 'b', and 'c'. This could later be accessed from the match object with .group('name').
.groups() - method to show all of the groups on a match object.
re.M - flag (short for re.MULTILINE) to make a pattern regard lines in your text as the beginning or end of a string.
^ - specifies, in a pattern, the beginning of the string.
$ - specifies, in a pattern, the end of the string.
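As a quick illustration of the notes above, here is a minimal sketch using Python's re module. The sample strings and the variable names (indexed, named, whole_string, per_line) are made up for this example.

```python
import re

# Indexed group: the parentheses capture whatever the set [abc] matched.
indexed = re.search(r'([abc])', 'xyzb')
print(indexed.group(1))       # b
print(indexed.groups())       # ('b',)

# Named group: same set, but retrievable by name.
named = re.search(r'(?P<letter>[abc])', 'xyzb')
print(named.group('letter'))  # b
print(named.groupdict())      # {'letter': 'b'}

# ^ and $ anchor to the whole string unless re.M is passed.
text = 'cat\ndog'
whole_string = re.findall(r'^\w+$', text)
per_line = re.findall(r'^\w+$', text, re.M)
print(whole_string)           # []
print(per_line)               # ['cat', 'dog']
```

Without re.M the two-line string never matches, because the anchors only apply to the start and end of the whole text; with re.M each line is matched on its own.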
When we create our patterns, they just match the entire thing. 0:00 We've seen it already where our match objects only have one group in them. 0:02 It's often really handy to have multiple groups defined inside of your pattern, so 0:06 that you can later access just parts of the text that you care about. 0:10 Like for our case, making a group for the email address, and a group for 0:13 the phone number, and a group for 0:16 the name would make it a lot simpler later to pull those out and use them. 0:18 So, at this point, 0:22 we've gotten to where we can catch pretty much anything in our text file. 0:23 So I think what we should do now is we should kind of just use all these 0:28 things at once. 0:30 This might get a little confusing, so 0:33 what we're gonna do is we're actually gonna break these up into groups. 0:34 So let's start this out with our normal print and re.findall. 0:38 And then let's do a large verbose one and we're gonna need re.X. 0:45 So, all right. 0:51 Now we can write our pattern. 0:53 I'm gonna add in some extra lines here just so 0:55 I can make this a little bit more readable. 0:57 All right, so we define groups with parentheses. 1:00 So, our first group here, we wanna capture the last name and the first name. 1:04 So for the last name, we need that. 1:09 So hyphens, word characters, and spaces. 1:17 Any number of those from zero on up. 1:20 And then we need a comma, and we need an actual space, and 1:23 then we need hyphen, \w, space again, and that's our group. 1:27 So that's last name, comma, space, first name. 1:33 And then there's gonna be a tab. 1:36 All right. So let's make a little note of 1:38 that, last and first names. 1:39 Okay. So now for 1:42 our email address, which was our next thing in our line. 1:43 Those, oops, those should cover our items. 1:49 So hyphens, word characters, numbers, periods, and plus signs. 1:53 So we've got one or more of those. 
2:00 We have an at symbol, and then we again have hyphens, 2:01 word characters, digits, and periods. 2:05 One or more of those, and then there's a tab. 2:08 And this is for our email. 2:12 All right, so what comes next? 2:14 Well, next is the phone number. 2:16 So, remember we have to escape these parentheses and 2:18 we wanna mark them as optional. 2:21 So, then there's three digits. 2:23 And there is a closing parenthesis that is optional. 2:27 There is a hyphen that is optional, and a space that is optional. 2:31 And then there are three numbers, a hyphen, and then four numbers. 2:36 That's our group, and then there's a tab. 2:42 So we'll say that's phone. 2:43 Yep, all right. 2:47 Then we have the job and the company that they work for. 2:49 So, this is a whole lot like our one that captures the names, but 2:53 we don't have a lot of stuff in here. 2:59 So, it's pretty much just word characters and spaces. 3:02 So there can be one or more of those, 3:05 a comma, some sort of white space, and then again words and spaces. 3:07 And then of course there's a tab. 3:14 So job and company. 3:16 And then the last thing that we put in there on some of the lines at 3:18 least is a Twitter account. 3:21 So, let's grab that, Twitter is actually really easy to grab. 3:23 It's just \w and \d, because, 3:27 I guess, no, underscores are being included in \w. 3:31 You can't have hyphens, you can't really have special characters, 3:36 you can just have numbers and letters. 3:39 So, that's that for Twitter, and let's mark that Twitter. 3:40 All right. So, that's our pattern. 3:47 Now it's a really long regex, and there's actually a couple of problems with this, 3:49 things that it won't catch. 3:53 But let's run it and see what we get. 3:55 So, we'll come down here, python address_book. 3:58 And we can see like, you notice that there's an opening parenthesis, there's a 4:02 tuple. 4:07 Yeah, you see the tuple? 4:08 And the tuple shows all of our little groups that we caught. 
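For reference, the pattern dictated so far might look like the sketch below. This is a reconstruction from the narration, not the on-screen code: the exact character classes, the leading @ on the Twitter group, the tab-separated sample rows, and the names data and found are all assumptions made for illustration.

```python
import re

# Made-up, tab-separated sample rows; not the course's data file.
data = "\n".join([
    "Love, Kenneth\tkenneth@example.com\t(555) 555-5555\tTeacher, Treehouse\t@kennethlove",
    "Sample, Dave\tdave@example.com\t(555) 555-5554\tTeacher, Treehouse",
    "Arthur, King\tking_arthur@example.com\t\tKing, Camelot\t@KingArthur",
    "Example, Pat\tpat@example.com\t555-555-5552\tTester, Example Inc\t@patexample",
])

# re.X (verbose) lets the pattern span lines and carry comments.
found = re.findall(r'''
    ([-\w ]*,\s[-\w ]+)\t               # Last and first names
    ([-\w\d.+]+@[-\w\d.]+)\t            # Email
    (\(?\d{3}\)?-?\s?\d{3}-\d{4})\t     # Phone
    ([\w\s]+,\s[\w\s]+)\t               # Job and company
    (@[\w\d]+)                          # Twitter
''', data, re.X)

# Each match is a tuple with one item per group.
for row in found:
    print(row)
```

With these sample rows only the first and last lines match. The row without a Twitter handle and the row without a phone number are skipped, because every group in this version is required.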
4:09 Each item in the tuple is one of our groups. 4:13 So that's pretty awesome. 4:16 We're gonna come back to that. 4:17 Do you notice there's anything missing? 4:19 Dave's not here. 4:21 And King Arthur isn't in here. 4:23 And the reason is because they don't have some of the items that we're looking for. 4:25 So since they don't match exactly, they don't get included. 4:32 So what we should do is we should go back and 4:36 mark a couple of things as being optional. 4:38 We're also gonna do a couple of other tricks here. 4:41 So, let's see. 4:43 The first thing we're gonna do is we're actually gonna add a symbol right here. 4:44 We're gonna add the caret, and that means the beginning of the string. 4:49 Okay. 4:54 And to complement that, right down here right after that closing parenthesis, 4:54 we're gonna put in a dollar sign, which marks the end of the string. 4:59 'Kay, we've got another trick we're gonna do for 5:04 that in just a minute, but remember those. 5:06 So Tim doesn't have a last name. 5:10 So we'll mark those as completely optional. 5:12 And everybody's got email. 5:15 I don't think there's anything we need to change on email. 5:17 And some of them. 5:20 Let's see. 5:22 I think they all have phone numbers. 5:23 Some of them, however, don't have jobs listed. 5:25 So, rather, they have jobs listed. 5:30 They may not have, if they don't have a phone number, 5:33 then we mar-, oh, sorry, yeah, we wanna change this. 5:37 A phone number is optional. 5:41 We wanna make that phone number optional. 5:42 If they don't have a phone number, it won't be there. 5:44 The tab after job, if they don't have a Twitter account, 5:48 the tab after job will actually be a new line. 5:51 So that tab won't be there. 5:54 We wanna mark that tab as being optional. 5:55 And really over here in the company name, 5:58 we should add in a dot as being a possible character. 6:01 Because we've got that one, that "Co." 6:04 So we want to be able to mark that, or catch it. 
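The changes described in this stretch (anchors, an optional phone group, an optional tab, a dot in the company name) could be sketched as follows. As before, the sample rows and the names data and rows are made up, and the pattern is a reconstruction from the narration. The sketch already passes re.M, the flag listed in the notes, since ^ and $ only help here when each line counts as its own string.

```python
import re

# Made-up, tab-separated sample rows; some fields are deliberately missing.
data = "\n".join([
    "Love, Kenneth\tkenneth@example.com\t(555) 555-5555\tTeacher, Treehouse\t@kennethlove",
    "Sample, Dave\tdave@example.com\t(555) 555-5554\tTeacher, Treehouse",
    "Arthur, King\tking_arthur@example.com\t\tKing, Camelot\t@KingArthur",
    ", Tim\ttim@example.com\t\tEnchanter, Killer Rabbit Cave",
    "Example, Pat\tpat@example.com\t555 555-5552\tTester, Example Co.\t@patexample",
])

rows = re.findall(r'''
    ^([-\w ]*,\s[-\w ]+)\t               # Names; the last name may be empty
    ([-\w\d.+]+@[-\w\d.]+)\t             # Email
    (\(?\d{3}\)?-?\s?\d{3}-\d{4})?\t     # Phone, now optional
    ([\w\s]+,\s[\w\s.]+)\t?              # Job and company; dot allowed, tab optional
    (@[\w\d]+)?$                         # Twitter, now optional
''', data, re.X | re.M)

for row in rows:
    print(row)
```

Every sample row now matches. Groups that did not take part in a match come back from re.findall as empty strings, which is why Tim's phone and Twitter entries are ''.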
6:05 And then some of them don't have Twitter accounts, so 6:08 let's make Twitter optional as well. 6:10 The other thing we need to add, because we marked beginning and end of the string. 6:13 And our string is this entire thing. 6:17 We want our string to be in one line. 6:20 Right? 6:24 So what we need to do is we need to add in re.MULTILINE. 6:25 And what that says is treat each line break, each \n. 6:29 Treat that as the end of the string. 6:33 So, it turns our one big string into a lot of strings, 6:35 as far as the regular expression engine is concerned. 6:39 Okay? 6:42 If we want we can do re.M, instead of re.MULTILINE. 6:43 So either way that's gonna work. 6:48 All right, let's try this one out. 6:51 Look at that. We've got a whole lot more stuff. 6:55 I do believe we've got everything for everyone. 6:56 There's the doctor, even with his big email address. 6:59 [BLANK_AUDIO] 7:02 We got Tim. 7:05 We got everybody in there. 7:06 All of our stuff is there. 7:07 So that's amazing. 7:09 That's awesome. 7:09 So, what we wanna do now though, is we wanna make this regular expression. 7:11 It's really handy as it is, but 7:15 it's just giving us out a list of tuples when we do this findall. 7:17 And no matter what we did, we would only get tuples, and 7:21 we would get like index positions. 7:24 What I wanna do though, is I wanna be able to turn this into a dictionary, so 7:26 that I can use that dictionary and do something else with it. 7:29 So let's take our groups and make them named groups. 7:34 So the way that we do that, we don't have to change any of our code. 7:38 Our code gets to stay the same. 7:41 We just add on a couple of things. 7:42 We add a question mark and a capital P, and this is what makes it a name. 7:43 And then we specify the name inside of less than and greater than signs. 7:48 So we're gonna name this first group name, cuz that's what it is. 7:52 The second group we're going to name email. 
7:58 The third group we will name phone. 8:02 The fourth group we'll name job. 8:06 And the last group, we'll name twitter. 8:10 All right? I think that's pretty good. 8:16 But let's actually, instead of doing all of this here and 8:18 printing, let's make this a little easier for ourselves. 8:23 Let's say line equals and let's do a search. 8:25 [BLANK_AUDIO] 8:28 And then we need to get rid of one of these. 8:30 All right? So line is a search. 8:32 For right now it's just gonna be that first line. 8:34 It's just gonna be me. 8:35 But we can print out what this gets. 8:37 So let's print out line. 8:39 [BLANK_AUDIO] 8:41 And then let's also print out line.groupdict(). 8:43 And let's see what these two things do. 8:50 So, okay, let's come down here, address book. 8:52 So when we print out line, we get this match object. 8:55 All right. 8:57 And the match object catches a whole bunch of stuff. 8:58 But when we print the dictionary, look what we get. 9:00 We've got the dictionary that has the name and email address, and the job. 9:02 Yeah, it gets the \t on the job, but that's okay. 9:06 And Twitter gets kennethlove and the phone gets the phone number. 9:10 So we got all this stuff. 9:12 That's so much better than what we've gotten before when we 9:14 were just getting these tuples. 9:17 So our next video, we've got just two last big steps and 9:20 we'll have turned this into something absolutely amazing. 9:24 Wow, using groups, especially named groups, 9:26 makes our string almost act like an object or dictionary. 9:29 We've turned a simple string into really useful data, good job, us. 9:32 All right, just a bit more to go, and we'll have this in the bag. 9:37 In our next video, let's look at making reusable patterns, and 9:39 how we can loop over our addresses in a more useful manner. 9:43
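The named-group version described in the transcript could be sketched like this. The group names (name, email, phone, job, twitter) and the variable name line come from the narration; the sample rows, the exact character classes, and the names data and contact are assumptions for illustration.

```python
import re

# Made-up, tab-separated sample rows; not the course's data file.
data = "\n".join([
    "Love, Kenneth\tkenneth@example.com\t(555) 555-5555\tTeacher, Treehouse\t@kennethlove",
    ", Tim\ttim@example.com\t\tEnchanter, Killer Rabbit Cave",
])

# re.search returns only the first match, so this is just the first row.
line = re.search(r'''
    ^(?P<name>[-\w ]*,\s[-\w ]+)\t                # Last and first names
    (?P<email>[-\w\d.+]+@[-\w\d.]+)\t             # Email
    (?P<phone>\(?\d{3}\)?-?\s?\d{3}-\d{4})?\t     # Phone
    (?P<job>[\w\s]+,\s[\w\s.]+)\t?                # Job and company
    (?P<twitter>@[\w\d]+)?$                       # Twitter
''', data, re.X | re.M)

print(line)               # the match object
contact = line.groupdict()
print(contact)            # group names mapped to the captured text
```

As in the video, the job value keeps its trailing tab here, because the greedy set includes whitespace and the following tab is optional. Calling .strip() on that value would be one way to tidy it.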