1 00:00:00,730 --> 00:00:04,077 So here's my notes on ufuncs, or universal functions. 2 00:00:04,077 --> 00:00:07,657 They are commonly needed in vectorized functions, again, 3 00:00:07,657 --> 00:00:12,009 which allow you to operate element by element instead of using a loop, and 4 00:00:12,009 --> 00:00:16,503 standard math and comparison operators like plus, minus, multiply, and 5 00:00:16,503 --> 00:00:20,576 greater than, greater than or equal to, they've all been overloaded so 6 00:00:20,576 --> 00:00:24,927 that they can make use of vectorization, and values can be broadcasted or 7 00:00:24,927 --> 00:00:27,660 stretched to be applied to the vector. 8 00:00:27,660 --> 00:00:30,430 So, remember that the scalar two got stretched all the way across, or 9 00:00:30,430 --> 00:00:31,740 we did it by rows. 10 00:00:31,740 --> 00:00:36,637 Awesome, so we saw some super powerful ufuncs, and let's go take a look at 11 00:00:36,637 --> 00:00:41,015 some higher level routines that make use of them for common tasks. 12 00:00:41,015 --> 00:00:44,957 Now, all this talk of trigonometry is making me want to go back and 13 00:00:44,957 --> 00:00:49,616 take a look at one of those first multi-dimensional arrays that we created, 14 00:00:49,616 --> 00:00:51,090 that students_gpas. 15 00:00:51,090 --> 00:00:53,640 That was way up here at the top, wasn't it? 16 00:00:53,640 --> 00:00:54,590 Let's get back up here. 17 00:00:56,240 --> 00:00:58,577 We've done a lot in this course. 18 00:00:58,577 --> 00:01:00,030 All right, so, here we go. 19 00:01:00,030 --> 00:01:01,072 Here's our students_gpas. 20 00:01:01,072 --> 00:01:06,530 Let's go ahead, let's take a look again one more time at what that is. 21 00:01:06,530 --> 00:01:11,134 So we'll say students_gpas. 22 00:01:13,040 --> 00:01:16,880 Right, so the zeroth row of this is me, 23 00:01:16,880 --> 00:01:21,550 and then we had Vlada, and then we had Quesy. 24 00:01:21,550 --> 00:01:22,050 Awesome. 
25 00:01:23,160 --> 00:01:28,379 One thing that we can do is we can find out the average or mean of this data. 26 00:01:28,379 --> 00:01:32,190 So the way that you do that is just call a function on it. 27 00:01:32,190 --> 00:01:34,078 Say students_gpas.mean(). 28 00:01:35,198 --> 00:01:37,465 Whoops. 29 00:01:37,465 --> 00:01:41,157 [LAUGH] That returned all of our scores averaged together, 30 00:01:41,157 --> 00:01:44,440 and 3.805 is not bad for our cohort average, 31 00:01:44,440 --> 00:01:49,180 however, I was hoping to get the mean of each row of these students. 32 00:01:49,180 --> 00:01:54,680 Now, the great news is that there is an axis argument that we can pass and 33 00:01:54,680 --> 00:01:57,230 it will do what we want. 34 00:01:57,230 --> 00:02:00,187 The parameter though has been known to trip people up, so 35 00:02:00,187 --> 00:02:01,860 let's focus a bit on the issue. 36 00:02:01,860 --> 00:02:05,050 So, we have a two-dimensional array. 37 00:02:05,050 --> 00:02:08,494 Our first dimension is students, and 38 00:02:08,494 --> 00:02:12,911 our second dimension is GPA by year in school. 39 00:02:12,911 --> 00:02:19,050 So we want to have the mean of the second dimension. 40 00:02:19,050 --> 00:02:22,710 We want this dimension, this is what we want, the gpas is what we want. 41 00:02:23,750 --> 00:02:26,230 So that would be axis one. 42 00:02:26,230 --> 00:02:29,926 Remember that they are zero based, so it's axis zero, is the other way, so 43 00:02:29,926 --> 00:02:30,884 axis one is this. 44 00:02:30,884 --> 00:02:32,300 So let's go ahead and do that. 45 00:02:32,300 --> 00:02:40,378 Let's say, the students_gpas.mean(axis=1). 46 00:02:43,630 --> 00:02:46,430 And since we've got three results here, and we only have three students, 47 00:02:46,430 --> 00:02:47,775 we know that it did the right thing. 48 00:02:47,775 --> 00:02:50,920 It went across and did the average there, so there is 3.69. 
49 00:02:50,920 --> 00:02:55,572 Let's go ahead and say 3.7, and then there's 3.75 and 50 00:02:55,572 --> 00:03:00,850 3.97, and by the way, that 3.7 didn't really mean anything. 51 00:03:00,850 --> 00:03:06,891 Now a common mistake is that people think that they want to work with each row, so 52 00:03:06,891 --> 00:03:13,293 they choose the axis zero, but really what happens with axis zero, let's go ahead and 53 00:03:13,293 --> 00:03:18,902 do that, we'll say (axis=0), is it ends up going this way, right? 54 00:03:18,902 --> 00:03:24,270 So it's averaging axis this way, cuz it's reducing the values. 55 00:03:24,270 --> 00:03:26,880 It's summing all these values up, but we want to go this way, so 56 00:03:26,880 --> 00:03:30,210 when you think about the axis, remember it's what direction you're moving in. 57 00:03:30,210 --> 00:03:32,310 Totally common hiccup. 58 00:03:32,310 --> 00:03:36,323 Just remember to imagine the function happening across the dimension. 59 00:03:36,323 --> 00:03:40,157 Now you might want this sometimes though right? 60 00:03:40,157 --> 00:03:45,875 This (axis=0) will give you the average of all students by year. 61 00:03:45,875 --> 00:03:50,410 That's what you want, and then if you want to you can do (axis=1), and 62 00:03:50,410 --> 00:03:55,770 it gives you average of all years by student. 63 00:03:55,770 --> 00:03:58,790 This type of function is known as a reduction operation. 64 00:03:58,790 --> 00:04:02,280 The function reduces a set of values down to one. 65 00:04:02,280 --> 00:04:04,870 The concept is that there is a function that takes two values, 66 00:04:04,870 --> 00:04:10,040 a total value of all operations and the next value in the array like object. 67 00:04:11,110 --> 00:04:12,230 It performs the operation and 68 00:04:12,230 --> 00:04:16,340 returns the total to be used in the next iteration, recursively. 
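Here's a minimal sketch of the axis behavior just described. The GPA values below are made up for illustration and are not the course's actual students_gpas data; the only assumption carried over is the layout, with students as rows and years in school as columns.

```python
import numpy as np

# Hypothetical GPAs: rows are students, columns are years in school.
# These numbers are invented for illustration, not the course data.
students_gpas = np.array([
    [4.0, 3.5, 3.0, 4.0],
    [3.0, 3.5, 4.0, 3.5],
    [4.0, 4.0, 3.5, 4.0],
])

# No axis: every value is averaged together into a single number.
overall = students_gpas.mean()

# axis=1 moves across the columns of each row: one mean per student.
per_student = students_gpas.mean(axis=1)

# axis=0 moves down the rows of each column: one mean per year.
per_year = students_gpas.mean(axis=0)

print(overall)
print(per_student.shape)  # (3,) because there are three students
print(per_year.shape)     # (4,) because there are four years
```

A quick sanity check is the shape of the result: the axis you pass is the one that disappears.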
69 00:04:16,340 --> 00:04:20,200 It might sound complicated, but it's actually what you would do in your head if 70 00:04:20,200 --> 00:04:22,500 I asked you to add up all the values in this list. 71 00:04:22,500 --> 00:04:26,340 It's probably easier to just see it in action, so let's do it. 72 00:04:26,340 --> 00:04:31,020 All functions that are ufuncs, have the ability to do this, built into it. 73 00:04:31,020 --> 00:04:34,230 Here, let's go back down to the hundred days of code study minutes list. 74 00:04:36,170 --> 00:04:37,630 Where is this at? 75 00:04:37,630 --> 00:04:40,680 Let's go down to where we have the very last one of them. 76 00:04:44,402 --> 00:04:50,306 Here we go, study_minutes list, here we go. 77 00:04:50,306 --> 00:04:55,174 All right, so I'm going to add one below this. 78 00:04:55,174 --> 00:04:58,720 Remember that our study_minutes array is a two dimensional array. 79 00:04:58,720 --> 00:05:02,045 The first dimension represents rounds or attempts, and 80 00:05:02,045 --> 00:05:06,409 the second dimension is the minutes per day, and there are 100 days, so 81 00:05:06,409 --> 00:05:09,886 let's simplify things first by using a single dimension. 82 00:05:09,886 --> 00:05:15,902 I'm gonna grab the first round here, so we'll say study_minutes[0]. 83 00:05:15,902 --> 00:05:20,214 Now, if I asked you to total these minutes up, I bet you'd just start adding, and 84 00:05:20,214 --> 00:05:21,540 remembering like this. 85 00:05:21,540 --> 00:05:26,420 You'd say okay, so 150 + 60, that's 210, and 86 00:05:26,420 --> 00:05:32,170 now I go 210 + 80, that's 290, and then I take the total of 290 and 87 00:05:32,170 --> 00:05:35,500 I add 60 to get 350, and so on, and so on, and so on. 88 00:05:36,850 --> 00:05:39,530 That is reducing in a nutshell. 89 00:05:39,530 --> 00:05:42,730 If we continue all the way through the array, we'll have a total. 
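The adding-and-remembering process above can be sketched as a plain Python loop. Only the first four values (150, 60, 80, 60) come from the walkthrough; the rest of the list is a hypothetical stand-in chosen so the total lands on 440.

```python
# Hypothetical first round of study minutes. Only the first four values
# come from the walkthrough; the remaining ones are made up.
minutes = [150, 60, 80, 60, 0, 0, 90]

total = 0     # the running total carried from step to step
running = []  # every intermediate total, kept so we can look at the steps

for value in minutes:
    total = total + value  # combine the total so far with the next value
    running.append(total)

print(running)  # [150, 210, 290, 350, 350, 350, 440]
print(total)    # 440
```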
90 00:05:42,730 --> 00:05:46,940 Now, I said that all ufuncs had the ability to do this reduction, and 91 00:05:46,940 --> 00:05:50,950 the way they provide this functionality is by exposing some functions 92 00:05:50,950 --> 00:05:52,830 off of the ufunc itself. 93 00:05:52,830 --> 00:05:54,283 That sentence was pretty funky. 94 00:05:54,283 --> 00:05:57,630 What we were doing was adding all the values up. 95 00:05:57,630 --> 00:06:01,100 In that case, the ufunc that we would like to use is add. 96 00:06:01,100 --> 00:06:03,790 So, let's do it. 97 00:06:03,790 --> 00:06:08,599 So we'll say np.add.reduce, and 98 00:06:08,599 --> 00:06:12,640 then we'll pass in our array. 99 00:06:14,970 --> 00:06:19,012 And there it is, 440, and it did just like we were doing. 100 00:06:19,012 --> 00:06:22,227 If you want to actually see each step, there is a function for 101 00:06:22,227 --> 00:06:24,066 that available too on each ufunc. 102 00:06:24,066 --> 00:06:27,610 So np.add.accumulate, and this will show you each step through. 103 00:06:27,610 --> 00:06:36,590 So if we do, again, if we do study_minutes[0], 104 00:06:36,590 --> 00:06:39,620 we'll see that we have 150, 210, 290, 350, and 105 00:06:39,620 --> 00:06:45,390 then actually you'll see all of the zero adds that we had to do, and 106 00:06:45,390 --> 00:06:49,200 yikes, you can see the waste of time that we made this do by adding all the zeros. 107 00:06:49,200 --> 00:06:50,620 We could have filtered them out. 108 00:06:50,620 --> 00:06:51,436 More in the teacher's notes. 109 00:06:51,436 --> 00:06:55,580 Now, we want to get the sum of all these values together and 110 00:06:55,580 --> 00:07:00,840 there is of course a routine that's super common, and it is called, sum. 111 00:07:00,840 --> 00:07:04,776 So if we just make this, well let's make a new one, 112 00:07:04,776 --> 00:07:10,197 we'll save that there for us, np.sum(study_minutes[0]). 
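Here's a small sketch of the reduce and accumulate methods that hang off the add ufunc. The array is a hypothetical stand-in for study_minutes[0], with made-up values after the first four, and the zero filter is one possible way to skip the wasted zero adds, not necessarily the one in the teacher's notes.

```python
import numpy as np

# Hypothetical stand-in for study_minutes[0]; values after the first four are made up.
first_round = np.array([150, 60, 80, 60, 0, 0, 90])

# reduce collapses the array to a single value using the add ufunc.
total = np.add.reduce(first_round)

# accumulate keeps every intermediate total along the way.
steps = np.add.accumulate(first_round)

# One way to avoid adding zeros: filter them out with a boolean mask first.
nonzero_steps = np.add.accumulate(first_round[first_round != 0])

print(total)          # 440
print(steps)          # [150 210 290 350 350 350 440]
print(nonzero_steps)  # [150 210 290 350 440]
```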
113 00:07:10,197 --> 00:07:14,953 We'll see that we get 440, which is exactly what we got when we did 114 00:07:14,953 --> 00:07:19,660 the reduce, and the reduction works on multiple dimensions as well. 115 00:07:19,660 --> 00:07:23,874 So we can just say np.sum(study_minutes), and 116 00:07:23,874 --> 00:07:28,386 it will get, wow, 10,000 hours, must be a pro. 117 00:07:28,386 --> 00:07:31,191 I think that's what Macklemore said, or Malcolm Gladwell, 118 00:07:31,191 --> 00:07:33,010 I can't remember which one said that. 119 00:07:33,010 --> 00:07:38,150 Reduction functions will almost always define an axis parameter. 120 00:07:38,150 --> 00:07:42,000 So in this case we want to see the sum of all minutes by round. 121 00:07:42,000 --> 00:07:48,848 So, that's axis=1, and 122 00:07:48,848 --> 00:07:51,981 there we know that we did it right, because there are three results returned back, 123 00:07:51,981 --> 00:07:54,950 and 440 was what we got for the first one. 124 00:07:54,950 --> 00:07:55,750 Awesome. 125 00:07:55,750 --> 00:07:57,410 Pretty handy, right? 126 00:07:57,410 --> 00:08:02,620 And as you can imagine that mean function that we were just using is probably 127 00:08:02,620 --> 00:08:07,570 using this sum function under the covers since, to calculate the mean, what 128 00:08:07,570 --> 00:08:11,920 you do is you add all of the values and then divide by the total number of values. 129 00:08:11,920 --> 00:08:16,330 But, what's nice is that you don't need to remember that formula. 130 00:08:16,330 --> 00:08:18,175 Even though it is simple, 131 00:08:18,175 --> 00:08:22,150 it's been abstracted away from you by simply calling the mean function. 132 00:08:22,150 --> 00:08:26,854 You'll find that there are lots of formulas abstracted away from you in 133 00:08:26,854 --> 00:08:27,817 the library. 134 00:08:27,817 --> 00:08:32,620 In fact, let's pop over real quick to another popular page in the documentation. 
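Here's a sketch of sum with and without an axis, plus a check that adding the values and dividing by the count gives the same answer as mean. The study_minutes array below is a hypothetical 3 rounds by 7 days, much smaller than the 100 day array from the course, and the comparison says nothing about how NumPy implements mean internally.

```python
import numpy as np

# Hypothetical study minutes: 3 rounds (rows) by 7 days (columns).
# The course array has 100 days per round; these values are made up.
study_minutes = np.array([
    [150, 60, 80, 60, 0, 0, 90],
    [30, 45, 0, 60, 60, 0, 15],
    [0, 0, 120, 0, 30, 30, 0],
])

# No axis: every value in every round is added together.
grand_total = np.sum(study_minutes)

# axis=1: add across the days of each round, giving one total per round.
per_round = np.sum(study_minutes, axis=1)

# The mean formula by hand: sum of the values divided by how many there are.
days = study_minutes.shape[1]
manual_mean = per_round / days

print(grand_total)  # 830
print(per_round)    # [440 210 180]
print(np.allclose(manual_mean, study_minutes.mean(axis=1)))  # True
```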
135 00:08:32,620 --> 00:08:36,732 I'm just gonna Google statistics numpy. 136 00:08:40,092 --> 00:08:40,850 Here we go. 137 00:08:42,610 --> 00:08:46,215 There are tons of functions available for you here. 138 00:08:49,250 --> 00:08:52,963 Since it's statistics, a bunch of these are reduction-based. 139 00:08:52,963 --> 00:08:57,542 They reduce all the values down to one, and here's one that you'll see everywhere, 140 00:08:57,542 --> 00:09:01,741 std, and while it's actually known to spread itself around, it's short for 141 00:09:01,741 --> 00:09:03,150 standard deviation. 142 00:09:03,150 --> 00:09:04,660 Here, let's pop in, so 143 00:09:04,660 --> 00:09:07,460 it computes the standard deviation along the specified axis. 144 00:09:08,550 --> 00:09:10,660 A measure of the spread of a distribution, which is great. 145 00:09:12,302 --> 00:09:15,227 And if we scroll down here in the notes, 146 00:09:15,227 --> 00:09:19,430 we can see that this is what has been calculated. 147 00:09:19,430 --> 00:09:21,330 This is the formula. 148 00:09:21,330 --> 00:09:23,970 Now, I kinda remember doing that in math class, 149 00:09:23,970 --> 00:09:28,610 but the point is here, you don't need to know how to calculate it. 150 00:09:28,610 --> 00:09:30,084 You want to know why to use it, 151 00:09:30,084 --> 00:09:34,317 as we discussed when we first introduced the grade point averages or GPAs. 152 00:09:34,317 --> 00:09:38,827 People struggle with math concepts when they are first introduced to them, and 153 00:09:38,827 --> 00:09:43,137 I'm under the belief it's the memorization of the formula that most people 154 00:09:43,137 --> 00:09:44,098 struggle with. 155 00:09:44,098 --> 00:09:46,171 Typically, that's what you're tested on, 156 00:09:46,171 --> 00:09:49,130 not the actual way to use the function in the real world. 157 00:09:49,130 --> 00:09:53,770 Current learning science says that if you don't use it, you will lose it. 
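For anyone curious, here's a sketch comparing the standard deviation formula from the documentation notes with the np.std call. It assumes the default settings, where the squared deviations from the mean are averaged over all N values, and the GPA values are made up for illustration.

```python
import numpy as np

# Hypothetical GPAs for one student across four years (made-up values).
gpas = np.array([4.0, 3.5, 3.0, 4.0])

# The formula by hand: square root of the average squared deviation from the mean.
deviations = gpas - gpas.mean()
manual_std = np.sqrt(np.mean(np.abs(deviations) ** 2))

# The same thing with the formula abstracted away.
library_std = np.std(gpas)

print(manual_std)
print(library_std)
print(np.isclose(manual_std, library_std))  # True
```

Like the other reductions, np.std also accepts an axis argument, so a two-dimensional array can be reduced per row or per column.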
158 00:09:53,770 --> 00:09:57,250 So if these equations feel a bit rusty and you haven't used them recently, 159 00:09:57,250 --> 00:10:00,070 don't fret, your brain is just working correctly. 160 00:10:00,070 --> 00:10:04,230 Most of the time, you hardly get a chance to see why in your math class. 161 00:10:04,230 --> 00:10:05,640 You just focused on the how. 162 00:10:06,810 --> 00:10:14,150 So with that said, let's pop up a couple levels here in our bookmarks to Routines. 163 00:10:14,150 --> 00:10:20,144 This page here is a really great overview of how powerful this library is, 164 00:10:20,144 --> 00:10:23,923 and a great look at some common abstractions. 165 00:10:23,923 --> 00:10:30,930 So if we look down here, here are some Discrete Fourier Transforms. 166 00:10:30,930 --> 00:10:35,903 Here's some financial functions, linear algebra, 167 00:10:35,903 --> 00:10:39,363 input and output, logic functions, 168 00:10:39,363 --> 00:10:44,610 polynomials, statistics, there's a lot in here. 169 00:10:44,610 --> 00:10:47,644 Remember, you don't need to know all of these, 170 00:10:47,644 --> 00:10:52,016 just be aware that what you are trying to do most likely already exists. 171 00:10:52,016 --> 00:10:52,812 As you can see, 172 00:10:52,812 --> 00:10:56,410 there are tons of directions that you can head with this library. 173 00:10:56,410 --> 00:11:01,350 So stand on the shoulders of giants who built things out for you. 174 00:11:01,350 --> 00:11:06,180 We talked way back when about how all sorts of different libraries accept and 175 00:11:06,180 --> 00:11:08,120 return numpy arrays. 176 00:11:08,120 --> 00:11:09,630 Let's take a quick break and 177 00:11:09,630 --> 00:11:14,370 take a look at one common use case, plotting values on a graph. 178 00:11:14,370 --> 00:11:17,620 Well, that is, right after we jot down some notes. 
179 00:11:17,620 --> 00:11:20,710 Why don't you talk a bit about some common routines that you saw, and 180 00:11:20,710 --> 00:11:22,710 talk a bit about reduction.