1 00:00:00,170 --> 00:00:02,855 Eventually, you're going to be in a situation where you need to 2 00:00:02,855 --> 00:00:05,106 use a pattern on 100 different items in a big loop. 3 00:00:05,106 --> 00:00:08,308 Or maybe it'll be 500, or 1000. 4 00:00:08,308 --> 00:00:11,740 You don't wanna have to build the pattern all over again every single time you match. 5 00:00:11,740 --> 00:00:15,140 So the regular expression library gives us a way to compile a pattern into 6 00:00:15,140 --> 00:00:17,000 an object that we can match against. 7 00:00:17,000 --> 00:00:18,540 Let's look at how to do that. 8 00:00:18,540 --> 00:00:21,200 We've already seen that we can make patterns as strings. 9 00:00:21,200 --> 00:00:25,030 We stored them in a variable way, way back in the first video. 10 00:00:25,030 --> 00:00:27,862 But that's not really the most useful thing in the world. 11 00:00:27,862 --> 00:00:32,218 I mean, if we have, say, this string, yeah, it's handy having this as 12 00:00:32,218 --> 00:00:36,190 a variable, but it doesn't save us a lot of time or trouble. 13 00:00:36,190 --> 00:00:40,329 It'd be better if we could save that pattern in a state where it 14 00:00:40,329 --> 00:00:41,475 was ready to go, 15 00:00:41,475 --> 00:00:44,645 ready to be used as a regular expression. 16 00:00:44,645 --> 00:00:46,380 And we can do that. 17 00:00:46,380 --> 00:00:49,250 That's what the compile function lets us do. 18 00:00:49,250 --> 00:00:52,060 So let's actually change this line here. 19 00:00:52,060 --> 00:00:54,290 Instead of search, we'll wanna say compile. 20 00:00:55,630 --> 00:00:59,850 What that does is take the regular expression and compile it, 21 00:00:59,850 --> 00:01:01,350 get it ready for use. 22 00:01:01,350 --> 00:01:06,590 Now, the one other thing we have to change is that we have to take out where it says data.
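Here is a minimal sketch of the compile-once idea in Python. It assumes a made-up pattern for "Last, First" strings, and the names NAME_RE and find_names are hypothetical, not from the course code. The pattern and its flags are compiled a single time, with no data attached, and the resulting pattern object is then reused for every item in the loop.

```python
import re

# Hypothetical pattern for this sketch: a whole "Last, First" string.
# It is compiled once, outside the loop, and the flags are given here.
NAME_RE = re.compile(r'^(?P<last>[-\w ]+),\s(?P<first>[-\w ]+)$', re.M)

def find_names(items):
    """Return the first name from every item that NAME_RE matches."""
    found = []
    for item in items:              # could be 100, 500, or 1000 items
        match = NAME_RE.search(item)
        if match:                   # search returns None when nothing matches
            found.append(match.group('first'))
    return found

print(find_names(["Carson, Ryan", "Doctor, The", "not a name"]))  # ['Ryan', 'The']
```

Python's re module also caches recently used patterns internally, so the raw speed-up from compiling is often modest. The bigger practical win is having one named pattern object you can reuse.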
23 00:01:06,590 --> 00:01:08,621 Because when we compile a regular expression, 24 00:01:08,621 --> 00:01:11,272 we don't compile it with the data it's gonna be run against. 25 00:01:11,272 --> 00:01:13,808 It's like making the regular expression a bit more generic. 26 00:01:13,808 --> 00:01:16,940 We can now run it against a lot of different things, 27 00:01:16,940 --> 00:01:21,350 not just the one thing that we did the search, or the match, or whatever, against. 28 00:01:21,350 --> 00:01:24,870 So, I'm gonna take that out of line one, and let's look at this one. 29 00:01:24,870 --> 00:01:26,830 So we've got line.groupdict(), and 30 00:01:26,830 --> 00:01:29,960 that no longer makes sense, because line isn't a match anymore. 31 00:01:29,960 --> 00:01:31,492 So let's go ahead and do a match. 32 00:01:31,492 --> 00:01:39,166 We'll do re.search(line, data), and then .groupdict(). 33 00:01:39,166 --> 00:01:43,337 So what we did here is create a regular expression search, 34 00:01:43,337 --> 00:01:49,076 just like before, and we said, okay, your pattern is this compiled one, line, 35 00:01:49,076 --> 00:01:51,580 and the string you're gonna match it against is data. 36 00:01:51,580 --> 00:01:54,260 And notice, we didn't specify any flags. 37 00:01:54,260 --> 00:01:58,250 You specify the flags when you do the compile, not when you do the search. 38 00:01:59,400 --> 00:02:01,240 So let's save that and run it. 39 00:02:03,680 --> 00:02:06,420 And there we go, we got the same thing. 40 00:02:06,420 --> 00:02:07,930 So that's great. 41 00:02:07,930 --> 00:02:10,060 But what's really, really, really cool, 42 00:02:10,060 --> 00:02:14,790 at least I think so, is that we don't have to do this re.search stuff. 43 00:02:14,790 --> 00:02:19,240 What we can do instead is just use line directly. 44 00:02:19,240 --> 00:02:23,544 [BLANK_AUDIO] 45 00:02:23,544 --> 00:02:25,740 So we can just say, okay, take line. 46 00:02:25,740 --> 00:02:27,810 It's a pattern.
We know it's a pattern. 47 00:02:27,810 --> 00:02:31,152 Do a search with it against data. 48 00:02:31,152 --> 00:02:32,290 So let's take that out. 49 00:02:33,500 --> 00:02:37,360 Run this again, and we should get the exact same content, and we do. 50 00:02:37,360 --> 00:02:39,220 So that's pretty cool. 51 00:02:39,220 --> 00:02:41,162 But we've only got one thing. 52 00:02:41,162 --> 00:02:42,963 We're only getting my line. 53 00:02:42,963 --> 00:02:44,999 How do we get the rest of the lines? 54 00:02:44,999 --> 00:02:49,804 Well, this is the last part of the re library that I wanna go over with you, 55 00:02:49,804 --> 00:02:51,138 but there's still more in there to explore. 56 00:02:51,138 --> 00:02:52,080 Go check the docs. 57 00:02:52,080 --> 00:02:53,340 They're awesome. 58 00:02:53,340 --> 00:02:55,876 It's a method named finditer, 59 00:02:55,876 --> 00:02:59,213 and it gives us back an iterator over each non-overlapping match. 60 00:02:59,213 --> 00:03:02,918 It's kind of like getting back a list, but it's not exactly a list. 61 00:03:02,918 --> 00:03:07,569 It's also kinda like using findall, but instead of getting back tuples, 62 00:03:07,569 --> 00:03:11,880 we get back match objects, like when we use re.match or re.search. 63 00:03:11,880 --> 00:03:13,520 So let's try this out. 64 00:03:13,520 --> 00:03:18,120 Let's come back up here and say for 65 00:03:18,120 --> 00:03:22,190 match in line.search, oops, sorry. 66 00:03:22,190 --> 00:03:24,599 Not line.search. 67 00:03:24,599 --> 00:03:29,320 line.finditer(data). 68 00:03:29,320 --> 00:03:32,706 We want to print match.group 69 00:03:32,706 --> 00:03:33,563 with 'name'. 70 00:03:33,563 --> 00:03:36,570 So this .group method, 71 00:03:36,570 --> 00:03:41,530 when you have a match object, says: show me whatever is inside of the group.
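Here is a small sketch of the steps just described. It uses invented sample data in a simplified "Last, First<TAB>email" layout rather than the course's actual file, and the pattern is an assumption too. The variable is called line to match the video. It shows that re.search accepts a compiled pattern, that line.search(data) does the same thing, that Python raises ValueError if you pass flags alongside an already-compiled pattern, and how finditer differs from findall.

```python
import re

# Hypothetical sample data for this sketch: "Last, First<TAB>email" per line.
DATA = (
    "Carson, Ryan\tryan@example.com\n"
    "Doctor, The\tdoctor@example.com\n"
    "Exampleson, Example\texample@example.com\n"
)

# Flags are supplied once, at compile time; no data is involved yet.
line = re.compile(r'''
    ^(?P<name>[-\w ]+,\s[-\w ]+)\t   # "Last, First" up to the tab
    (?P<email>[-\w.+]+@[-\w.]+)$     # email address to the end of the line
''', re.X | re.M)

# Both calls find the first match; the method call is the usual style.
first_a = re.search(line, DATA).groupdict()
first_b = line.search(DATA).groupdict()
print(first_a == first_b)  # True

# Passing flags on top of a compiled pattern is an error.
try:
    re.search(line, DATA, re.I)
except ValueError as err:
    print("ValueError:", err)

# findall returns tuples of group strings...
print(line.findall(DATA)[0])  # ('Carson, Ryan', 'ryan@example.com')

# ...while finditer yields one match object per non-overlapping match, on demand.
for match in line.finditer(DATA):
    print(match.group('name'))
```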
72 00:03:41,530 --> 00:03:44,990 Now, I can call group with nothing, and 73 00:03:44,990 --> 00:03:47,390 it'll show me the whole thing that it matched. 74 00:03:47,390 --> 00:03:50,680 I can call group with a number, and it'll show me the group at that index: 75 00:03:50,680 --> 00:03:52,980 0 for the whole match, then 1, 2, and so on. 76 00:03:52,980 --> 00:03:55,810 Or I can give it a name, and it'll show me the group with that name. 77 00:03:56,940 --> 00:03:58,140 All right, so let's try that out. 78 00:04:00,040 --> 00:04:01,150 So there we go. 79 00:04:01,150 --> 00:04:03,180 There's everybody's names. 80 00:04:03,180 --> 00:04:06,850 You can see Ryan Carson, The Doctor, Exampleson, Example, and so on. 81 00:04:06,850 --> 00:04:09,560 All right, that's all of our names. 82 00:04:09,560 --> 00:04:14,430 We can, of course, ask for any group we want, or do all sorts of other stuff. 83 00:04:14,430 --> 00:04:17,960 For example, let's make fully qualified email To: headers. 84 00:04:17,960 --> 00:04:21,590 You've seen this before, where it's a name, then a less-than sign, 85 00:04:21,590 --> 00:04:23,920 their email address, and a greater-than sign, right? 86 00:04:23,920 --> 00:04:25,200 Okay, so before we do that, 87 00:04:25,200 --> 00:04:30,540 I want to come back and edit our pattern to add two things. 88 00:04:30,540 --> 00:04:34,150 What's cool is that we can have subpatterns, or subgroups, sorry. 89 00:04:34,150 --> 00:04:40,820 So here we can add a group called last, and that's the last name. 90 00:04:42,450 --> 00:04:48,170 And here we can add one called first, and that, of course, is the first name. 91 00:04:49,320 --> 00:04:52,885 So there's our first name and our last name, as these little subpatterns. 92 00:04:52,885 --> 00:04:53,720 'Kay? 93 00:04:53,720 --> 00:04:54,750 So now, in our 94 00:04:55,950 --> 00:04:59,230 for loop, we need to print out something new. 95 00:05:00,730 --> 00:05:09,279 And let's print out first, last, and email. 96 00:05:09,279 --> 00:05:10,760 'Kay.
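To make the group rules concrete, here's a sketch with a hypothetical single record. The pattern is also illustrative: its name group wraps nested last and first subgroups, like the edit just described. Numbered groups are counted by their opening parentheses from left to right, so the outer name group is 1, the nested last group is 2, and first is 3.

```python
import re

# Hypothetical record and pattern: "name" wraps the nested "last" and "first" groups.
record = "Doctor, The\tdoctor@example.com"
person = re.compile(r'(?P<name>(?P<last>[-\w ]+),\s(?P<first>[-\w ]+))\t(?P<email>\S+)')

match = person.search(record)
print(match.group())                 # the whole match, same as match.group(0)
print(match.group(1))                # → Doctor, The  (group 1 is the outer "name")
print(match.group(2))                # → Doctor       (group 2 is the nested "last")
print(match.group('first'))          # → The          (look a group up by name)
print(match.group('last', 'first'))  # → ('Doctor', 'The')  (several groups at once)
```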
97 00:05:10,760 --> 00:05:12,198 And we wanna format this 98 00:05:12,198 --> 00:05:17,980 with **match.groupdict(). 99 00:05:17,980 --> 00:05:19,910 Right? It'll go ahead and 100 00:05:19,910 --> 00:05:25,400 find those keys in the groupdict and use them as keyword arguments. 101 00:05:26,480 --> 00:05:29,370 All right, let's try this out. We should have email addresses for everybody. 102 00:05:30,680 --> 00:05:32,300 Check it out, we've got email addresses. 103 00:05:32,300 --> 00:05:35,370 There's the name and an email address. 104 00:05:35,370 --> 00:05:38,650 That works pretty well, and it's a lot simpler than I 105 00:05:38,650 --> 00:05:41,160 would have expected if you'd just said, hey, 106 00:05:41,160 --> 00:05:43,500 turn this into a bunch of email addresses. 107 00:05:43,500 --> 00:05:47,480 Hopefully this will give you some ideas on how to take our address book further. 108 00:05:47,480 --> 00:05:50,520 You can build all sorts of things out of dictionaries 109 00:05:50,520 --> 00:05:53,560 made from strings through regular expressions. 110 00:05:53,560 --> 00:05:56,850 In this course, compiling our pattern doesn't actually save us a lot of time or 111 00:05:56,850 --> 00:05:59,250 memory, but it's a good habit to get into, so 112 00:05:59,250 --> 00:06:01,790 you won't forget to do it when it actually matters. 113 00:06:01,790 --> 00:06:06,080 And using things like finditer, which hands back one match at a time instead of a whole list, helps us save even more memory. 114 00:06:06,080 --> 00:06:08,910 Compiling patterns also lets you make patterns available for 115 00:06:08,910 --> 00:06:12,880 import in other parts of your application, or even in other packages. 116 00:06:12,880 --> 00:06:16,750 This way, you can make your pattern as perfect as possible in just one place and 117 00:06:16,750 --> 00:06:18,480 reap the benefits anywhere else you need it. 118 00:06:18,480 --> 00:06:20,430 I think we're done.
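Here's a sketch of the header-building loop together with the "compile once, import anywhere" idea. Several names are hypothetical: the module name patterns.py, the constant PERSON_RE, the function to_headers, and the sample data. The '{first} {last} <{email}>' format string is an assumption based on the "name <email>" output described, not the course's exact code.

```python
import re

# Module-level compiled pattern, e.g. in a hypothetical patterns.py, so other
# modules could do `from patterns import PERSON_RE` and share one definition.
PERSON_RE = re.compile(r'''
    ^(?P<name>(?P<last>[-\w ]+),\s(?P<first>[-\w ]+))\t  # "Last, First"
    (?P<email>[-\w.+]+@[-\w.]+)$                        # email address
''', re.X | re.M)

def to_headers(text):
    """Yield a 'First Last <email>' header for each record found in text."""
    for match in PERSON_RE.finditer(text):
        # ** unpacks the groupdict, so each group name becomes a keyword argument;
        # str.format ignores the extra "name" key that the template doesn't use.
        yield '{first} {last} <{email}>'.format(**match.groupdict())

# Hypothetical sample data for this sketch.
DATA = (
    "Carson, Ryan\tryan@example.com\n"
    "Doctor, The\tdoctor@example.com\n"
)
for header in to_headers(DATA):
    print(header)
# prints "Ryan Carson <ryan@example.com>", then "The Doctor <doctor@example.com>"
```

Because to_headers is a generator built on finditer, it produces one header at a time rather than building the full list up front.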
119 00:06:20,430 --> 00:06:24,420 We've gone over pretty much every area of regular expressions in Python, and 120 00:06:24,420 --> 00:06:27,970 we've turned a block of text that no one would want to wade through into super 121 00:06:27,970 --> 00:06:32,100 useful dictionaries that we can put into classes or transform into a new text file. 122 00:06:32,100 --> 00:06:34,670 In fact, that's your extra credit for this course. 123 00:06:34,670 --> 00:06:37,590 Take the regex we wrote and the text, and make a class for 124 00:06:37,590 --> 00:06:39,170 a person defined by the data. 125 00:06:39,170 --> 00:06:41,800 Give them names, a phone number, and an email address. 126 00:06:41,800 --> 00:06:44,510 Maybe make another class for an address book that collects all these 127 00:06:44,510 --> 00:06:46,570 people together and lets the user search them. 128 00:06:46,570 --> 00:06:48,500 Be sure to share it on the forums. 129 00:06:48,500 --> 00:06:50,260 Thanks so much for being part of this course. 130 00:06:50,260 --> 00:06:50,860 I'll see you next time.
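Here is one possible starting point for the extra credit, offered only as a sketch. The Person and AddressBook classes, their fields, the search method, the phone-number format, and the sample data are all assumptions. The sketch relies on the fact that each match's groupdict() keys can line up with a dataclass's field names.

```python
import re
from dataclasses import dataclass

# Hypothetical record layout: "Last, First<TAB>email<TAB>(555) 555-0100".
PERSON_RE = re.compile(r'''
    ^(?P<last>[-\w ]+),\s(?P<first>[-\w ]+)\t   # "Last, First"
    (?P<email>[-\w.+]+@[-\w.]+)\t               # email address
    (?P<phone>\(\d{3}\)\ \d{3}-\d{4})$          # phone, e.g. (555) 555-0100
''', re.X | re.M)

@dataclass
class Person:
    first: str
    last: str
    email: str
    phone: str

class AddressBook:
    """Collects every Person found in a block of text."""

    def __init__(self, text):
        # groupdict() keys (last, first, email, phone) match Person's fields,
        # so they can be passed straight through as keyword arguments.
        self.people = [Person(**m.groupdict()) for m in PERSON_RE.finditer(text)]

    def search(self, term):
        """Return people whose first or last name contains term, ignoring case."""
        term = term.lower()
        return [p for p in self.people
                if term in p.first.lower() or term in p.last.lower()]

# Hypothetical sample data for this sketch.
DATA = (
    "Carson, Ryan\tryan@example.com\t(555) 555-0100\n"
    "Doctor, The\tdoctor@example.com\t(555) 555-0199\n"
)
book = AddressBook(DATA)
print(book.search('doc'))
```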