Boolean Array Indexing 12:51 with Craig Dennis
You can create an array of booleans and then use that to index into your array. Let's use this to filter our values.
My Notes for Indexing
## Creation
* You can create a random but bounded grouping of values using the `np.random` package.
* `RandomState` lets you seed your randomness in a way that is repeatable.
* You can append a row in a couple of ways
  * You can use the `np.append` method. Make sure the new row is the same shape.
  * You can create/reassign a new array by including the existing array as part of the iterable in creation.

## Indexing
* You can use an indexing shortcut by separating dimensions with a comma.
* You can index using a `list` or `np.array`. Values will be pulled out at that specific index. This is known as fancy indexing.
* Resulting array shape matches the index array layout. Be careful to distinguish between the tuple shortcut and fancy indexing.
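The notes above can be sketched in a few lines. This is only an illustration: the seed, the bounds passed to `randint`, and the small `study_minutes` values are assumptions, not the data from the video.

```python
import numpy as np

# Seeded generator; the seed (42) and the bounds (30 to 180) are assumed for illustration.
rando = np.random.RandomState(42)
fake_log = rando.randint(30, 180, size=100, dtype=np.uint16)

# Seeding a second generator the same way reproduces the same values.
repeat_log = np.random.RandomState(42).randint(30, 180, size=100, dtype=np.uint16)

# A small 2-D log with hypothetical values: 2 rounds x 4 days.
study_minutes = np.array([[150, 60, 80, 0],
                          [45, 0, 120, 58]], dtype=np.uint16)

# Append a row with np.append; wrapping the row in a list gives it shape (1, 4),
# which matches the existing rows along axis 0.
new_row = np.array([30, 0, 90, 61], dtype=np.uint16)
study_minutes = np.append(study_minutes, [new_row], axis=0)

# Tuple shortcut: row 1, column 3 gives back a single value.
single = study_minutes[1, 3]

# Fancy indexing: a list pulls out whole rows 0 and 2.
picked_rows = study_minutes[[0, 2]]
```

Note the difference between `study_minutes[1, 3]` (one element) and `study_minutes[[1, 3]]`, which would ask for rows 1 and 3 and fail here because there is no row 3.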
All right, before we get started here, 0:00 I thought I'd share my notes as a quick refresher. 0:02 We talked about creation and we saw a new way to build a random grouping of numbers. 0:05 And we used RandomState to let us seed the randomness in a way that was repeatable. 0:10 I have the same random values as you do. 0:14 That is super handy. 0:16 And we also learned that you can append a row in a couple of ways. 0:18 There's even more that we haven't seen here. 0:21 You can use the np.append method. 0:23 You need to make sure that it's the same shape. 0:26 Remember we did that little hack where we wrapped it in a list. 0:28 And you can create and 0:31 reassign a new array by including existing arrays as part of the iterable, right? 0:33 So you can throw the new array in there and 0:37 reassign it cuz you can't change the size. 0:39 And we looked at indexing. 0:42 And there's that nice indexing shortcut for the multidimensional array, 0:43 remember, where you can use the comma. 0:46 So you can say like 3,4 and that's really row three, column four. 0:48 And instead of having to use multiple hard brackets you can use the commas and 0:53 it creates a tuple automatically. 0:55 And you can also index using a list or another np array and the values will be 0:57 pulled out at that specific index, and that's known as fancy indexing. 1:02 The resulting array comes back and it's the same shape as what you asked for. 1:06 But it's very important to remember to look for lists or 1:10 arrays versus just using a tuple with a comma. 1:13 All right, so now where were we? 1:18 All right, we wanted to look at our study log and 1:20 find sessions that were just about an hour, but not quite. 1:23 So let's get back down into where we did all that work. 1:28 So here's the study minutes, I'm gonna get rid of this last one here. 1:31 To delete a cell, Escape, and then D, D. 1:35 Here we go, so our study_minutes array is all written out. 
1:38 And remember we were using this fake_log. 1:41 So let's start with that fake_log, 1:44 cuz that definitely has some values that we know are under 60. 1:45 And remember, that's what we're looking for 1:50 because they don't count towards the challenge. 1:51 The idea being that if we saw those ones that almost made it, 1:53 maybe they'd provide a little inspiration for us to stick with it for that next day. 1:58 So the np array object is pretty powerful. 2:02 Just about every comparison operator has been overridden. 2:07 So let's take that fake_log object that we're using, cuz it's one-dimensional. 2:11 So it's a one-dimensional array, and it's filled with 100 random values. 2:15 So we'll do fake_log, and check this out. 2:19 I am looking for values that are less than 60, 2:25 because you know there are 60 minutes in an hour. 2:27 So I can just write that. 2:29 So fake_log < 60. 2:31 And what happens is, it's not defined; 2:33 this might happen to you sometimes. 2:38 So this is good, I'm glad that this happened. 2:40 So what we can do is we can go ahead and run Kernel, and say Restart & Run All. 2:41 And that popped up our help cuz we left the help up there. 2:52 So I'm gonna go ahead and close this. 2:54 And so what happened is we've got this fake_log. 2:55 And what will happen is, you remember that we have these values. 2:59 So the first one that's there is this fourth, so the fourth value. 3:03 So if we go False, False, False, True and then looks like again at the eighth there, 3:07 so there's some more False, False, False, True. 3:12 So what's happening is that it's comparing every single one of these values and 3:14 it's showing us True where it is less than 60. 3:19 Every value is represented. 3:21 And any place that we see a True, 3:23 it means that it is true that it's less than 60. 3:25 And that probably doesn't seem all that handy. 3:29 Well, that is until you find out that you can do fancy indexing with 3:32 a Boolean array. 
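Here is a minimal sketch of that comparison. The short `fake_log` below is a hypothetical stand-in for the 100 random values in the video, chosen so the result is easy to check by eye.

```python
import numpy as np

# Hypothetical stand-in for the fake_log from the video (8 values instead of 100).
fake_log = np.array([132, 95, 44, 171, 58, 60, 0, 120], dtype=np.uint16)

# The < operator is applied to every element and returns a Boolean array
# with the same shape as fake_log.
under_an_hour = fake_log < 60
# under_an_hour is [False, False, True, False, True, False, True, False]
```

Exactly 60 is not less than 60, so that entry is False.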
3:36 The way that it works is that as long as the Boolean array lines up with your 3:38 other array, any value where True exists will be kept. 3:43 So, here check this out. 3:46 So this is what we want, right? 3:48 We want to say, anything from the fake_log, 3:49 we will use that Boolean array as a fancy index. 3:55 There we go, we pulled out every value that was True. 4:00 That's exactly what we are looking for, right, these are all not quite 60. 4:05 Pretty cool, right? 4:10 We did that filtering all without a loop. 4:11 You could totally accomplish this same thing by using something like a list 4:14 comprehension, or even something similar like this really simple loop. 4:19 So say results equals an empty list, and let's iterate through each of the values. 4:22 So for value in fake_log, if, here we go. 4:26 If the value is less than 60, then we're gonna say results.append(value). 4:31 And then just to get back exactly the same thing we'll just use it. 4:38 We'll say np.array(results), right? 4:41 So there's a loop that we had to write, and obviously we got back the same thing. 4:43 But using a Boolean array index is orders of magnitude faster than this for loop. 4:49 And look at the code difference too. 4:55 Something you might be wondering is what happens with multidimensional arrays, 4:57 like our study_minutes array. 5:02 Well the good news is, it just works. 5:04 So if we say study_minutes less than 60, you'll see 5:07 that we get back a Boolean array that is of the exact same shape as our array. 5:12 So that's 3 by 100. 5:17 And of course, we can use that array as an index. 5:21 So let's do that as well, so we can say study_minutes, 5:26 where the study_minutes is less than 60. 5:30 Boom, now notice that we're returned a one-dimensional array. 5:35 Not our original two-dimensional array, it's all of the values that match. 5:42 Now we could rewrite this as a nested for 5:46 loop of the same type that we did before, right. 
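The filtering described above, side by side with the loop version, might look like this. The array values are hypothetical and much smaller than the ones in the video.

```python
import numpy as np

# Hypothetical 1-D log.
fake_log = np.array([132, 95, 44, 171, 58, 60, 0, 120], dtype=np.uint16)

# Boolean array index: keep only the values where the mask is True.
filtered = fake_log[fake_log < 60]

# The equivalent explicit loop.
results = []
for value in fake_log:
    if value < 60:
        results.append(value)
looped = np.array(results)

# Hypothetical 2-D log: 3 rounds x 4 days (the video uses 3 x 100).
study_minutes = np.array([[150, 60, 80, 0],
                          [45, 0, 120, 58],
                          [30, 0, 90, 61]], dtype=np.uint16)

mask = study_minutes < 60      # same shape as study_minutes: (3, 4)
matches = study_minutes[mask]  # 1-D array of the matching values, in row order
```

Indexing a two-dimensional array with a mask of the same shape flattens the result, because the matching values no longer form a rectangle.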
5:48 Like we could loop through each round and then loop through each day and 5:51 add into our results. 5:55 But we don't need to do that because this is all done for us without a loop. 5:56 That's kind of gross, that's a bunch of zeros, right? 6:02 If we're looking to motivate ourselves, we really don't wanna see these zeros. 6:06 What we really wanna see is anything that's less than 60 minutes and 6:10 greater than 0. 6:16 That gets minutes from days where we worked a little bit at least. 6:17 So we want to make two Boolean index arrays. 6:21 Like we wanna make this study_minutes array, this one. 6:24 We wanna make that array, the study_minutes where it's less than 60. 6:28 And we also wanna have another one where the index array is 6:32 study_minutes greater than 0. 6:36 And then we actually want to have the results where it's a combination of those 6:39 anded together. 6:43 You can actually compare arrays together element by element, 6:44 which is what we want to do. 6:48 So, I'm gonna come back here. 6:50 Let's just manually, we'll go ahead and 6:51 we'll manually create an array, a Boolean array of False, True, True. 6:54 And to compare, we use the bitwise operator for and, the ampersand. 7:00 Now this is not the and keyword, it's an ampersand. 7:07 Now a common mistake is, [LAUGH] to forget and use the and keyword, and 7:11 we'll explore what happens there in a bit. 7:14 And then I'll create another Boolean array that we can compare it to, so np.array, 7:17 and we'll put in True, False, True. 7:21 So what happens is we get a brand new array with each element anded together. 7:27 So remember, when you're checking Boolean logic, 7:33 both sides need to be true to be considered true. 7:35 So, looking here we have False and True, and that's False, 7:39 and then we have True and False. 7:44 And that of course is False as well because they're not both True, and 7:46 then we have True and True, definitely True. 
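The two hand-built Boolean arrays from this walkthrough can be combined as follows. The `|` line is an added illustration of the "or" counterpart and is not typed in the video.

```python
import numpy as np

left = np.array([False, True, True])
right = np.array([True, False, True])

# Bitwise and (&) compares the arrays element by element.
both = left & right    # [False, False, True]

# Bitwise or (|) is the element-by-element "or" (added here for comparison).
either = left | right  # [True, True, True]
```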
7:50 So if we go ahead and we run this, we'll see that we get back a single 7:54 array with the values anded together, False, False, True, just like we saw. 7:59 So we could use this result as a Boolean index array, right? 8:04 Do you see how we can just build the Boolean index array together? 8:10 We can chain together all of our conditions in a series of 8:13 ands and ors. 8:17 Before we use it, I do wanna show you what happens if you forget to use the bitwise 8:18 and, as the resulting error is a little confusing at first. 8:22 So depending on how many times you have joined logical expressions, 8:25 your muscle memory might actually accidentally type the and keyword here. 8:28 So let's do that, let's put the and keyword here. 8:33 Yuck, ValueError, and 8:34 it's saying the truth value of an array with more than one element is ambiguous. 8:39 So, what it's trying to do is it's trying to figure out the truthiness of this, and 8:44 that's what and does. 8:49 It evaluates truthiness, and it's assuming that we want a scalar 8:51 value, which is not what we want; we wanna compare element by element. 8:55 So if you did wanna get a scalar value, 9:00 if you wanted to see that everything was true, you would use a.all, 9:02 which returns a Boolean, or a.any if there's any True in there at all. 9:05 All that to say, just use the bitwise operation. 9:08 So just go ahead, use a bitwise operation. 9:11 I just thought I'd preemptively warn you about this, as it happens a lot, 9:14 more in the teacher's notes. 9:18 [LAUGH] So let's build up our index, so we wanna have study_minutes, 9:20 where the study_minutes 9:26 are < 60 & study_minutes > 0, 9:31 right, that's what we're looking for. 9:35 But we wanna take caution to make sure that we're careful about the order of 9:41 operations. 9:44 This & here binds more strongly than the less than. 9:46 So what gets evaluated first is 60 & study_minutes. 
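To see the `and` keyword gotcha without breaking a notebook, the error can be caught, as in this small sketch. The names `a` and `b` are placeholders for the two Boolean arrays.

```python
import numpy as np

a = np.array([False, True, True])
b = np.array([True, False, True])

# The and keyword asks Python for a single truth value for the whole array,
# which NumPy refuses to guess for arrays with more than one element.
message = None
try:
    a and b
except ValueError as err:
    message = str(err)

# When a single scalar answer really is wanted, ask for it explicitly.
all_true = a.all()  # False: not every element is True
any_true = a.any()  # True: at least one element is True
```

For element-by-element logic, `a & b` is the form to use.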
9:49 And again, we're gonna run into the truthy problem that we saw before. 9:52 So, we don't want that. 9:57 So let's put parentheses in place just to make sure we've got the order 9:58 of operations correct. 10:02 And voila, there we have it. 10:08 A brand new array containing entries that represent values from our 10:11 study_minutes array that are less than 60 and greater than 0. 10:16 That's pretty cool, right? 10:21 And you can see, you can pretty much read that more or less, right? 10:22 You'll get used to remembering to use the parens and 10:25 ampersand, but I guarantee you'll forget sometimes. 10:28 Now, one thing we really should consider is this. 10:32 Even though we did those minutes, and these are minutes where we spent some time, 10:36 they don't actually count for completing the challenge. 10:40 The challenge is to do at least an hour a day. 10:42 So in reality, we really should set all of these to zero. 10:45 If deleting these minutes doesn't motivate me, I don't know what will, 10:51 especially this 58 minutes. 10:55 Now even though this index statement, this study_minutes 10:59 where study_minutes < 60, 11:06 even though that creates a brand new array, 11:09 if you assign to it, you update the original array. 11:14 And if we look now, we look at our third row there, 11:19 we'll see that we have some zeros in where there were none before. 11:24 Look at those, 11:30 all that time didn't count because I didn't reach that hour. 11:31 No, now of course that time did actually count. 11:35 I was learning, but it didn't count towards the challenge. 11:39 And I'll tell you what, this 100 days of code challenge totally motivates me. 11:43 So losing that time definitely will keep me focused in the future. 11:47 It reminds me that I just need to stick with it, 11:50 I want to complete this challenge. 11:53 Speaking of challenges, I'd like to again challenge you to capture your thoughts 11:56 on Boolean array indexing in your notebook. 
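The parenthesized condition and the assignment-through-a-mask step might be sketched like this, using a small hypothetical `study_minutes` array rather than the one from the video.

```python
import numpy as np

# Hypothetical log: 3 rounds x 4 days.
study_minutes = np.array([[150, 60, 80, 0],
                          [45, 0, 120, 58],
                          [30, 0, 90, 61]], dtype=np.uint16)

# Parentheses force each comparison to happen before the &.
almost = study_minutes[(study_minutes < 60) & (study_minutes > 0)]

# Without parentheses, & binds first, so Python reads the expression as
# study_minutes < (60 & study_minutes) > 0, a chained comparison that
# needs a single truth value and raises the same ValueError as the and keyword.
precedence_failed = False
try:
    study_minutes[study_minutes < 60 & study_minutes > 0]
except ValueError:
    precedence_failed = True

# Assigning through a Boolean index updates the original array in place.
study_minutes[study_minutes < 60] = 0
```

Reading the masked values returns a copy, but putting the same masked expression on the left side of an assignment writes into `study_minutes` itself.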
12:00 Remember to think through the possible gotchas that we walked through. 12:03 Like accidentally using the and keyword or forgetting to use parentheses. 12:06 If you've ever done SQL programming before, that might have felt familiar. 12:10 Capture those thoughts a bit. 12:14 Also, now is a good time to take a moment and review your notebook. 12:16 Is everything in there clear? 12:20 If not, please hit up the community and ask your questions. 12:21 If you are looking to solidify your knowledge, 12:25 I highly recommend attempting to answer someone else's questions. 12:27 I can't recommend it enough; by taking the time to explain a concept, 12:31 you will uncover new knowledge. 12:35 Give it a shot, it won't disappoint. 12:37 So far what we've been doing is returning a new array. 12:39 But you can actually return a view of the data that you can manipulate. 12:43 Let's take a look at data views and some more powerful slicing features next. 12:46