1 00:00:00,410 --> 00:00:04,440 As I'm sure you've noticed by now, having to type \w eight times for 2 00:00:04,440 --> 00:00:07,310 eight characters gets old pretty quick. 3 00:00:07,310 --> 00:00:10,850 Python's regex engine lets us say that something should occur a certain number 4 00:00:10,850 --> 00:00:11,930 of times. 5 00:00:11,930 --> 00:00:16,643 We can say that something occurs an exact number of times by using curly braces. 6 00:00:16,643 --> 00:00:21,430 {3} says that something occurs exactly three times, {,3} says that it 7 00:00:21,430 --> 00:00:26,870 occurs 0 to 3 times, and {3,} says that it occurs 3 or more times. 8 00:00:26,870 --> 00:00:29,910 {3,5} says that it occurs 3, 4, or 5 times. 9 00:00:29,910 --> 00:00:32,900 We can also use more generic counts. 10 00:00:32,900 --> 00:00:36,640 The question mark says that something is optional: it occurs 0 or 1 times. 11 00:00:37,840 --> 00:00:40,960 The asterisk says that something occurs at least zero times, 12 00:00:40,960 --> 00:00:44,590 so it can not occur at all or it can occur hundreds of times. 13 00:00:44,590 --> 00:00:46,390 There's no upper bound to the asterisk. 14 00:00:47,420 --> 00:00:49,680 And finally, there's the plus sign. 15 00:00:49,680 --> 00:00:51,630 Like the asterisk, there's no upper bound, but 16 00:00:51,630 --> 00:00:53,470 the pattern has to occur at least once. 17 00:00:54,540 --> 00:00:57,300 Let's look at using all of these to solve some of the problems in our 18 00:00:57,300 --> 00:00:58,940 blob of addresses. 19 00:00:58,940 --> 00:01:03,820 So, like I said last time, I'm lazy, and I'm not bored and 20 00:01:03,820 --> 00:01:09,410 ridiculous enough to try and type \w as many times as there are letters in a word. 21 00:01:09,410 --> 00:01:13,970 So I'm pretty lucky that I can instead 22 00:01:15,540 --> 00:01:20,180 use the plus sign to find places where we have one or more letters. 23 00:01:20,180 --> 00:01:21,870 So let's try that. 
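The counts described above can be checked with a short sketch. This is not the code from the video: the helper name `accepted_lengths` and the test strings made of repeated `a` characters are made up for illustration. It simply reports which run lengths each count accepts.

```python
import re


def accepted_lengths(quantifier, max_len=6):
    """Return the run lengths (0..max_len) of 'a' that the pattern
    'a' + quantifier matches in full.

    Hypothetical helper for illustration only.
    """
    pattern = re.compile("a" + quantifier)
    return [n for n in range(max_len + 1) if pattern.fullmatch("a" * n)]


# Print the accepted run lengths for each count mentioned in the lesson.
for quantifier in ["{3}", "{,3}", "{3,}", "{3,5}", "?", "*", "+"]:
    print(quantifier, accepted_lengths(quantifier))
```

Because the helper only tests runs up to `max_len`, the open-ended counts (`{3,}`, `*`, `+`) show every length up to that cap rather than a true upper limit.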
24 00:01:23,580 --> 00:01:30,370 So, what we said here is that we want to find word characters with a \w. 25 00:01:30,370 --> 00:01:35,875 Those are any Unicode letters or digits, plus the underscore. 26 00:01:37,120 --> 00:01:42,790 I want to find one or more of those and then a comma, a space, and 27 00:01:42,790 --> 00:01:45,820 again one or more word characters. 28 00:01:45,820 --> 00:01:47,870 So let's give that a try. 29 00:01:47,870 --> 00:01:50,730 [BLANK_AUDIO] 30 00:01:50,730 --> 00:01:51,950 And, look at that. 31 00:01:51,950 --> 00:01:53,120 We've got my name. 32 00:01:53,120 --> 00:01:54,770 So that's pretty cool. 33 00:01:54,770 --> 00:01:55,460 If we go back, 34 00:01:55,460 --> 00:02:00,300 we can actually take our number search here, and we can clean this up too. 35 00:02:00,300 --> 00:02:04,520 Because we know that we've got three numbers here. 36 00:02:05,860 --> 00:02:08,150 And we know that we've got three numbers here. 37 00:02:09,170 --> 00:02:11,330 And we know that we've got four numbers here. 38 00:02:13,090 --> 00:02:18,220 I can say, hey, find the opening parenthesis, find three digits, exactly three. 39 00:02:19,320 --> 00:02:24,360 Find a closing parenthesis, find a space, find three more numbers, a hyphen, and 40 00:02:24,360 --> 00:02:25,780 then four more numbers. 41 00:02:25,780 --> 00:02:26,550 So if we 42 00:02:27,830 --> 00:02:31,830 run this one, then we get our phone number match and our name match. 43 00:02:32,960 --> 00:02:36,690 These are maybe a little bit less readable, but 44 00:02:36,690 --> 00:02:38,110 they're definitely more accepting. 45 00:02:38,110 --> 00:02:40,900 In fact, let's make it just a little bit more accepting. 46 00:02:40,900 --> 00:02:42,680 We have some phone numbers, 47 00:02:42,680 --> 00:02:48,610 if you look down here, like, say, this one or this one, 48 00:02:48,610 --> 00:02:53,500 that don't have parentheses on them, so let's make these parentheses optional. 
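Here is a rough sketch of the two searches described so far. The on-screen code is not part of this transcript, so the record in `line` is a made-up placeholder and the patterns are one plausible reading of the narration.

```python
import re

# Hypothetical one-line record; the lesson's actual data file is not shown here.
line = "Doe, Jane (555) 555-5555"

# One or more word characters, a comma, a space, one or more word characters.
name = re.search(r"\w+, \w+", line)

# Opening parenthesis, exactly three digits, closing parenthesis, a space,
# three digits, a hyphen, four digits.
phone = re.search(r"\(\d{3}\) \d{3}-\d{4}", line)

print(name.group())
print(phone.group())
```

The parentheses are escaped with a backslash because an unescaped parenthesis starts a group in a regular expression.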
49 00:02:53,500 --> 00:02:57,050 Now we do that by putting a question mark after them. 50 00:02:57,050 --> 00:03:02,330 The question mark says, this should show up zero times or one time. 51 00:03:03,860 --> 00:03:08,180 So let's try that. 52 00:03:08,180 --> 00:03:10,540 Oh, yeah, that's only gonna run on my line. 53 00:03:10,540 --> 00:03:12,380 So let's try this on multiple lines, 54 00:03:12,380 --> 00:03:15,630 and let's change our search here to be findall. 55 00:03:17,420 --> 00:03:21,160 And what findall will do is it'll move through the whole string, 56 00:03:21,160 --> 00:03:25,980 the whole data variable, and find all the non-overlapping places where this pattern matches. 57 00:03:27,580 --> 00:03:29,490 So let's try running that again. 58 00:03:29,490 --> 00:03:31,470 And we see we've got a bunch of phone numbers here. 59 00:03:31,470 --> 00:03:35,220 In fact, the only one we don't have is the one that's like this, but 60 00:03:35,220 --> 00:03:37,710 with a hyphen right there. 61 00:03:37,710 --> 00:03:39,440 So let's put that one in too. 62 00:03:39,440 --> 00:03:41,480 We've got the optional parentheses, but 63 00:03:41,480 --> 00:03:46,910 let's put in a hyphen that is optional, and the space is actually optional too. 64 00:03:46,910 --> 00:03:50,710 And let's do that as \s instead of the actual space. 65 00:03:50,710 --> 00:03:53,580 That will be a little bit more clear. 66 00:03:53,580 --> 00:03:57,270 So let's save that and let's run this again. 67 00:03:59,380 --> 00:04:02,970 And we get the 555-1. 68 00:04:02,970 --> 00:04:05,911 We did, there it is, 555 hyphen. 69 00:04:05,911 --> 00:04:07,210 So that's great. 70 00:04:08,450 --> 00:04:13,320 And you know what, I bet if we want to, we can 71 00:04:14,670 --> 00:04:21,870 take this a little bit further on our findall. 72 00:04:21,870 --> 00:04:26,380 And I bet we can use this to find all of our names. 
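A minimal sketch of the more accepting phone pattern, assuming the optional pieces come in the order the narration gives them (parentheses, then hyphen, then whitespace). The contents of `data` are invented sample numbers, not the lesson's file.

```python
import re

# Hypothetical multi-line sample covering the variations mentioned:
# with parentheses, without them, and with a hyphen after the area code.
data = """(555) 555-5555
555-555-5554
555 555-5553
(555)-555-5552"""

# \(? and \)?  optional parentheses around the area code
# -?           optional hyphen after the area code
# \s?          optional whitespace character
phone_pattern = r"\(?\d{3}\)?-?\s?\d{3}-\d{4}"

# findall scans the whole string and returns every non-overlapping match.
phones = re.findall(phone_pattern, data)
print(phones)
```

Note that `re.search` would stop at the first match, which is why the switch to `re.findall` matters once the data spans several lines.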
73 00:04:28,270 --> 00:04:30,820 So let's actually comment this one out, just so 74 00:04:30,820 --> 00:04:32,190 we've got a little bit less showing up. 75 00:04:33,280 --> 00:04:34,400 And let's try running that again. 76 00:04:36,580 --> 00:04:41,410 Well, we got all the names, but we also got back the jobs and companies. 77 00:04:41,410 --> 00:04:43,520 So not exactly what we wanted. 78 00:04:43,520 --> 00:04:46,700 We'll clean that up later, but if we look at this, 79 00:04:46,700 --> 00:04:51,140 we didn't get our name here on line five, we didn't get Tim. 80 00:04:52,530 --> 00:04:56,620 The reason we didn't get Tim is because we said one or more. 81 00:04:56,620 --> 00:04:58,030 There has to be something there. 82 00:04:58,030 --> 00:05:02,370 I wonder if we can make it so that doesn't have to be the case. 83 00:05:02,370 --> 00:05:07,040 We also, if you see here, got Enchanter, Killer, for Tim. 84 00:05:07,040 --> 00:05:12,670 And it's supposed to be Enchanter, Killer Rabbit Cave. 85 00:05:12,670 --> 00:05:15,380 So we'll worry about the Killer Rabbit Cave in a little while. 86 00:05:15,380 --> 00:05:17,010 We'll do that in a later video, but for 87 00:05:17,010 --> 00:05:20,190 now, let's see if we can get Tim in there. 88 00:05:20,190 --> 00:05:24,310 So instead of this plus sign, let's do an asterisk. 89 00:05:24,310 --> 00:05:29,580 And the asterisk says it can appear zero times or any number of times, with no upper limit. 90 00:05:29,580 --> 00:05:33,140 Just, this thing, if it appears, cool, show it to me. 91 00:05:33,140 --> 00:05:35,090 If it doesn't, that's fine. 92 00:05:35,090 --> 00:05:36,290 Move on. 93 00:05:36,290 --> 00:05:37,150 So let's try running that. 94 00:05:38,850 --> 00:05:40,170 And did we get Tim? 95 00:05:40,170 --> 00:05:41,370 There we go, we got Tim. 96 00:05:43,030 --> 00:05:44,310 So that's awesome. 97 00:05:44,310 --> 00:05:45,450 We got Tim. 
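The difference between the plus sign and the asterisk can be sketched like this. It assumes Tim's record has nothing before the comma, which is how the narration reads; the other names in `data` are placeholders, not the lesson's data.

```python
import re

# Hypothetical sample: the middle record has no text before the comma.
data = "Doe, Jane\n, Tim\nSmith, Alex"

# With +, at least one word character must come before the comma,
# so the ", Tim" record is skipped.
with_plus = re.findall(r"\w+, \w+", data)

# With *, zero word characters before the comma is acceptable,
# so ", Tim" is picked up as well.
with_star = re.findall(r"\w*, \w+", data)

print(with_plus)
print(with_star)
```

The second `\w+` is left as a plus sign, so something still has to follow the comma and space for a match to count.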
98 00:05:45,450 --> 00:05:49,170 We'll do more about catching the company names after we have a couple more tools at 99 00:05:49,170 --> 00:05:50,630 our disposal. 100 00:05:50,630 --> 00:05:53,160 Counts definitely help a lot in writing smaller, 101 00:05:53,160 --> 00:05:55,410 if sometimes less readable, patterns. 102 00:05:55,410 --> 00:05:58,290 We'll talk about a way to make patterns more readable in the next video, 103 00:05:58,290 --> 00:06:01,160 along with ways to keep characters from being matched. 104 00:06:01,160 --> 00:06:04,250 We'll also talk about how to make our patterns a bit more restrictive than our 105 00:06:04,250 --> 00:06:05,260 escape sequences are.