Negation (8:20) with Kenneth Love
Negated sets let us specify characters that should be left out of any matches.
[^abc] - a set that will not match, and in fact excludes, the letters 'a', 'b', and 'c'.
re.I - flag to make a search case-insensitive.
re.match('A', 'apple', re.I) would find the 'a' in 'apple'.
re.X - flag that allows regular expressions to span multiple lines and contain (ignored) whitespace and comments.
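The three notes above can be tried out in a few lines. This is a minimal sketch; the sample strings and the variable names (`not_abc`, `match`, `phone`) are made up for illustration and are not from the video.

```python
import re

# [^abc] matches any single character except 'a', 'b', or 'c'.
not_abc = re.findall(r'[^abc]', 'abcdef')
print(not_abc)  # ['d', 'e', 'f']

# re.I (long form: re.IGNORECASE) lets 'A' match a lowercase 'a'.
match = re.match('A', 'apple', re.I)
print(match.group())  # a

# re.X (long form: re.VERBOSE) ignores whitespace and # comments in the pattern.
phone = re.findall(r'''
    \d{3}   # three digits
    -       # a literal hyphen
    \d{4}   # four digits
''', 'call 555-1234 now', re.X)
print(phone)  # ['555-1234']
```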
Let's try a slightly harder one, slightly weirder one perhaps. So let's see. Let's comment both of these out, and let's take our email address one. I wanna match all the email addresses, just like we did before. But if the email address ends in .gov, I want to leave that part off. Just pretend I have a good reason for this, cuz I really don't.

So, all right, this sounds like a really good place for us to use a negative set. And there's a lot to this, so let's leave ourselves some comments to make this a little bit easier. So okay, first of all, yeah, we can definitely use a negative set here. So, let's make this a multiline string. And we gotta end that multiline string. You know what, we need to make this four spaces. There we go, all right. And we'll end our multiline string, and then we'll do stuff as usual against data.

All right. So, let's take this. I actually don't wanna catch the part before. I just wanna get the e-mail address. So, let's do a \b and then an @, and then that part. And I don't care how many things are there. So, just leaving myself a little note here: find a word boundary, an @, and then any number of characters.

All right, then what I want to ignore is gov, and I don't wanna get that tab that's in there. You can't necessarily see it, but the space between each of these things is a tab character. And I know there's a tab character right here, and it just might catch it, so let's leave that off. So one or more of those is fine. And let's leave another comment here: ignore one or more instances of the letters g, o, or v and a tab.

All right. And then we have another \b here, so match another word boundary. And then we do data.
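The pattern dictated so far can be sketched roughly as follows. Several things here are assumptions: the sample `data` string is invented, the set `[-\w.]` is only a guess at what "that part" refers to, and the flags are the re.VERBOSE (re.X) and re.I flags from the notes at the top, which a multiline, lowercase pattern like this needs.

```python
import re

# Hypothetical sample: tab-separated fields, invented for this sketch.
data = (
    'kenneth@teamtreehouse.com\t(555) 555-5555\t'
    'president.44@us.gov\t555 555-5551\t'
    'darth-vader@empire.gov\t(555) 555-4444'
)

emails = re.findall(r'''
    \b@[-\w.]*   # a word boundary, an @, then any number of characters
    [^gov\t]+    # 1+ characters that are not 'g', 'o', 'v', or a tab
    \b           # another word boundary
''', data, re.VERBOSE | re.I)

print(emails)  # ['@teamtreehouse.com', '@us.', '@empire.']
```

With this sample, the dot before gov stays in the match, because `.` is not one of the characters the negated set excludes; only the letters g, o, and v and the tab are left off.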
Now, I've done something here that needs a flag, which is that I've used multiple lines. So I need to use this VERBOSE flag. And then, since we've got gov in there, and we've got it in lowercase, just in case there was an uppercase version, I'd want to add on the flag re.I. And we add multiple flags with the pipe symbol in between each of the flags. It's a little weird. It's just something you get used to. You just kinda have to remember it.

So, all right, let's try that out. And there we go, we've got @teamtreehouse.com, @teamtreehouse.com, blah, blah, blah. And then we get over here, and this was supposed to be us.gov, and we've got just us. And then we were supposed to have empire.gov, as we've got up here, and we just have empire. And we're supposed to have spain.gov, and we just got spain. So, that's pretty cool: we got all the email addresses, but we left off the .gov on three of them. So, I think that's pretty cool, pretty handy.

Let's try another one with our VERBOSE flag, just to get used to doing our VERBOSE flag. Gonna comment this out. All right. So let's try another verbose pattern that will match our names. It'll also match our jobs, but it's still good practice. So we're gonna do print(re.findall(. And then we're gonna do a multiline string, cuz we're gonna use verbose. So let's do \b and then a set, [-\w]. So that would be: find a word boundary, 1+ hyphens or word characters. We'll just say characters. And a comma, cuz that comma's in there. It has to find that comma. And then let's have it find whitespace. Find 1 whitespace. And then let's have it find another hyphen, a \w, or a space as part of our set. We'll talk about why that's different in just a second. 1+ hyphens and characters, and explicit spaces. And then I want it to not find tabs or newline characters.
Ignore tabs and newlines. And then we wanna close this, we're gonna run this against data, and we're gonna do re.X.

All right. So let's talk about this one for a second before we run it. When we use the verbose flag (re.X, if you didn't guess, is the shorthand version of re.VERBOSE), the regular expression engine ignores all of the spaces that are just out in our pattern. So, like, these spaces here and these spaces here are completely ignored, as is this comment. So we have to mark the whitespace we do want with this \s. And that is whitespace. So that matches spaces, it matches tabs, it matches newlines. It matches all sorts of stuff. Actually, I don't remember if it matches newlines or not, but it matches spaces and tabs, and other characters like that. If you wanna go look up, like, half tab or letter space and stuff like that, there's all sorts of these spaces that are available. So it matches all of those. But inside of a set, we can use an explicit space, and that will only match spaces. It won't match tabs or newlines or whatever. And then down here we want to ignore tab and newline.

Now, why didn't we have to use re.I in this one, or re.IGNORECASE? The reason is that we're not matching any explicit characters. We're not matching, like, the letter t, which may be uppercase or lowercase. Since we're not matching those things, and we're matching more generic stuff like word characters, we can leave off re.I.

So let's run this and see what it does. And I forgot another character. We should have a plus sign there as well. So let's run that again. There we go. So now we've got Kenneth Love and Teacher Treehouse, Dave MacFarlane, or MacFarlane, Dave, Teacher Treehouse, and so on.
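A rough sketch of the names-and-jobs pattern as described, with the plus sign added to the first set. The `data` rows are invented for illustration, so the real file may behave differently. The last few lines also check the point the narration was unsure about: in Python's re module, `\s` does match newlines, while a literal space inside a set matches only spaces, even under re.X.

```python
import re

# Hypothetical sample rows, invented for this sketch (fields separated by tabs).
data = (
    'Love, Kenneth\tkenneth@teamtreehouse.com\tTeacher, Treehouse\n'
    'MacFarlane, Dave\tdave@teamtreehouse.com\tTeacher, Treehouse\n'
)

names_and_jobs = re.findall(r'''
    \b[-\w]+,   # a word boundary, 1+ hyphens or word characters, and a comma
    \s          # exactly one whitespace character
    [-\w ]+     # 1+ hyphens, word characters, or explicit spaces
    [^\t\n]     # one character that is not a tab or newline
''', data, re.X)
print(names_and_jobs)
# ['Love, Kenneth', 'Teacher, Treehouse', 'MacFarlane, Dave', 'Teacher, Treehouse']

# \s covers spaces, tabs, and newlines; a literal space in a set covers only spaces.
any_whitespace = re.findall(r'\s', 'a b\tc\nd')
only_spaces = re.findall(r'[ ]', 'a b\tc\nd')
print(any_whitespace)  # [' ', '\t', '\n']
print(only_spaces)     # [' ']
```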
So we got the names, and we got where they work. So, of course, if we want to get Tim in there, we need to change this to a star. Run this again and we should get Tim. I don't see Tim, actually. So Tim's not in there, but we will fix that later. We'll select everybody before we get to the end of this.

As you can tell though, it really, really helps to break up our patterns over multiple lines and to be able to annotate each line with a comment, so that we remember what we're doing, what we're looking for, and how to make things again.

We have a ton of choices now when we write patterns. They can be as flexible or strict as we need. Our next video will cover the real meat of what'll make our regular expressions capable of solving our immediate problem.
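On the missing Tim entry mentioned above, one possible explanation can be sketched, assuming the row starts with an empty last name such as `, Tim` (an assumption, since the data file is not shown here). In that case `\b` cannot match right before the comma, because neither side of that position is a word character, so switching the plus to a star is not enough on its own. The second pattern below drops `\b` purely to illustrate this; it is not necessarily the fix the course arrives at later.

```python
import re

# Hypothetical row with an empty last name, invented for this sketch.
data = ', Tim\ttim@example.com'

with_boundary = re.findall(r'''
    \b[-\w]*,   # a word boundary, 0+ hyphens or word characters, and a comma
    \s          # exactly one whitespace character
    [-\w ]+     # 1+ hyphens, word characters, or explicit spaces
    [^\t\n]     # one character that is not a tab or newline
''', data, re.X)

without_boundary = re.findall(r'''
    [-\w]*,     # same as above, minus the leading word boundary
    \s
    [-\w ]+
    [^\t\n]
''', data, re.X)

print(with_boundary)     # []
print(without_boundary)  # [', Tim']
```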