1
00:00:00,730 --> 00:00:04,077
So here's my notes on ufuncs,
or universal functions.

2
00:00:04,077 --> 00:00:07,657
They are commonly needed
vectorized functions, again,

3
00:00:07,657 --> 00:00:12,009
which allow you to operate element by
element instead of using a loop, and

4
00:00:12,009 --> 00:00:16,503
standard math and comparison operators
like plus, minus, multiply, and

5
00:00:16,503 --> 00:00:20,576
greater than, greater than or equal
to, they've all been overloaded so

6
00:00:20,576 --> 00:00:24,927
that they can make use of vectorization,
and values can be broadcasted or

7
00:00:24,927 --> 00:00:27,660
stretched to be applied to the vector.

8
00:00:27,660 --> 00:00:30,430
So, remember that the scalar 2 got
stretched all the way across, or

9
00:00:30,430 --> 00:00:31,740
we did it by rows.
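A minimal sketch of the overloaded operators and broadcasting recapped above. The `values` and `table` arrays are made up for illustration and are not from the course notebook.

```python
import numpy as np

# Hypothetical data, chosen only to illustrate the idea.
values = np.array([1, 2, 3, 4])

# The + operator is overloaded to apply the np.add ufunc element by element.
# The scalar 2 is broadcast (stretched) across the whole vector.
plus_two = values + 2
same_thing = np.add(values, 2)

# Comparison operators are vectorized too and return a boolean array.
at_least_three = values >= 3

# Broadcasting by rows: the 1-D array is stretched across each row.
table = np.array([[1, 2, 3],
                  [4, 5, 6]])
shifted = table + np.array([10, 20, 30])
```

No explicit loop is written anywhere; the ufunc handles the element-by-element work.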

10
00:00:31,740 --> 00:00:36,637
Awesome, so we saw some super powerful
ufuncs, and let's go take a look at

11
00:00:36,637 --> 00:00:41,015
some higher level routines that
make use of them for common tasks.

12
00:00:41,015 --> 00:00:44,957
Now, all this talk of trigonometry
is making me want to go back and

13
00:00:44,957 --> 00:00:49,616
take a look at one of those first
multi-dimensional arrays that we created,

14
00:00:49,616 --> 00:00:51,090
that students_gpas.

15
00:00:51,090 --> 00:00:53,640
That was way up here at the top,
wasn't it?

16
00:00:53,640 --> 00:00:54,590
Let's get back up here.

17
00:00:56,240 --> 00:00:58,577
We've done a lot in this course.

18
00:00:58,577 --> 00:01:00,030
All right, so, here we go.

19
00:01:00,030 --> 00:01:01,072
Here's our students_gpas.

20
00:01:01,072 --> 00:01:06,530
Let's go ahead, let's take a look
again one more time at what that is.

21
00:01:06,530 --> 00:01:11,134
So we'll say students_gpas.

22
00:01:13,040 --> 00:01:16,880
Right, so the zeroth row of this is me,

23
00:01:16,880 --> 00:01:21,550
and then we had Vlada,
and then we had Quesy.

24
00:01:21,550 --> 00:01:22,050
Awesome.

25
00:01:23,160 --> 00:01:28,379
One thing that we can do is we can find
out the average or mean of this data.

26
00:01:28,379 --> 00:01:32,190
So the way that you do that is
just call a function on it.

27
00:01:32,190 --> 00:01:34,078
Say students_gpas.mean.

28
00:01:35,198 --> 00:01:37,465
Whoops.

29
00:01:37,465 --> 00:01:41,157
[LAUGH] That returned all of
our scores averaged together,

30
00:01:41,157 --> 00:01:44,440
which 3.805 is not bad for
our cohort average,

31
00:01:44,440 --> 00:01:49,180
however, I was hoping to get the mean
of each row of these students.

32
00:01:49,180 --> 00:01:54,680
Now, the great news is that there is
an axis argument that we can pass and

33
00:01:54,680 --> 00:01:57,230
it will do what we want.

34
00:01:57,230 --> 00:02:00,187
The parameter though has been
known to trip people up, so

35
00:02:00,187 --> 00:02:01,860
let's focus a bit on the issue.

36
00:02:01,860 --> 00:02:05,050
So, we have a two-dimensional array.

37
00:02:05,050 --> 00:02:08,494
Our first dimension is students, and

38
00:02:08,494 --> 00:02:12,911
our second dimension is
GPA by year in school.

39
00:02:12,911 --> 00:02:19,050
So we want to have the mean
of the second dimension.

40
00:02:19,050 --> 00:02:22,710
We want this dimension, this is what
we want, the gpas is what we want.

41
00:02:23,750 --> 00:02:26,230
So that would be axis one.

42
00:02:26,230 --> 00:02:29,926
Remember that they are zero based, so
it's axis zero, is the other way, so

43
00:02:29,926 --> 00:02:30,884
axis one is this.

44
00:02:30,884 --> 00:02:32,300
So let's go ahead and do that.

45
00:02:32,300 --> 00:02:40,378
Let's say, the students_gpas.mean(axis=1).

46
00:02:43,630 --> 00:02:46,430
And since we've got three results here,
and we only have three students,

47
00:02:46,430 --> 00:02:47,775
we know that it did the right thing.
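A minimal sketch of the two calls so far. The GPA values below are an assumption, picked only so the averages land near the numbers mentioned; the real `students_gpas` array is defined earlier in the course.

```python
import numpy as np

# Assumed values: three students (rows) by four school years (columns).
students_gpas = np.array([
    [4.0, 3.286, 3.5, 4.0],
    [3.2, 3.8, 4.0, 4.0],
    [3.96, 3.92, 4.0, 4.0],
])

# No axis: every value in the array is averaged into a single number.
overall_mean = students_gpas.mean()

# axis=1: the mean is taken across each row, giving one value per student.
per_student = students_gpas.mean(axis=1)

print(per_student.shape)  # (3,)
```

Checking that the result has one entry per student is a quick sanity check that the right axis was chosen.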

48
00:02:47,775 --> 00:02:50,920
It went across and
did the average there, so there is 3.69.

49
00:02:50,920 --> 00:02:55,572
Let's go ahead and say 3.7,
and then there's 3.75 and

50
00:02:55,572 --> 00:03:00,850
3.97, and by the way,
that 3.7 didn't really mean anything.

51
00:03:00,850 --> 00:03:06,891
Now a common mistake is that people think
that they want to work with each row, so

52
00:03:06,891 --> 00:03:13,293
they choose the axis zero, but really what
happens with axis zero, let's go ahead and

53
00:03:13,293 --> 00:03:18,902
do that, we'll say (axis=0),
is it ends up going this way, right?

54
00:03:18,902 --> 00:03:24,270
So it's averaging axis this way,
cuz it's reducing the values.

55
00:03:24,270 --> 00:03:26,880
It's summing all these values up,
but we want to go this way, so

56
00:03:26,880 --> 00:03:30,210
when you think about the axis, remember
it's what direction you're moving in.

57
00:03:30,210 --> 00:03:32,310
Totally common hiccup.

58
00:03:32,310 --> 00:03:36,323
Just remember to imagine the function
happening across the dimension.

59
00:03:36,323 --> 00:03:40,157
Now you might want this
sometimes though right?

60
00:03:40,157 --> 00:03:45,875
This (axis=0) will give you
the average of all students by year.

61
00:03:45,875 --> 00:03:50,410
That's what you want, and then if
you want to you can do (axis=1), and

62
00:03:50,410 --> 00:03:55,770
it gives you average of
all years by student.
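To compare the two directions side by side, here is a small sketch. The GPA values are again assumed for illustration only.

```python
import numpy as np

# Assumed values: three students (rows) by four school years (columns).
students_gpas = np.array([
    [4.0, 3.286, 3.5, 4.0],
    [3.2, 3.8, 4.0, 4.0],
    [3.96, 3.92, 4.0, 4.0],
])

# axis=0 collapses the student dimension: one average per year.
by_year = students_gpas.mean(axis=0)

# axis=1 collapses the year dimension: one average per student.
by_student = students_gpas.mean(axis=1)

print(by_year.shape)     # (4,)
print(by_student.shape)  # (3,)
```

One way to remember it: the axis you pass is the one that disappears from the result's shape.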

63
00:03:55,770 --> 00:03:58,790
This type of function is known
as a reduction operation.

64
00:03:58,790 --> 00:04:02,280
The function reduces a set
of values down to one.

65
00:04:02,280 --> 00:04:04,870
The concept is that there is
a function that takes two values,

66
00:04:04,870 --> 00:04:10,040
a total value of all operations and
the next value in the array-like object.

67
00:04:11,110 --> 00:04:12,230
It performs the operation and

68
00:04:12,230 --> 00:04:16,340
returns the total to be used in
the next iteration, recursively.

69
00:04:16,340 --> 00:04:20,200
It might sound complicated, but it's
actually what you would do in your head if

70
00:04:20,200 --> 00:04:22,500
I asked you to add up all
the values in this list.
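The idea can be sketched in plain Python with the standard library's `functools.reduce`. The `values` list and the `add_step` helper are hypothetical names used only for this illustration.

```python
from functools import reduce

# Hypothetical list of numbers to total up.
values = [150, 60, 80, 60, 90]

def add_step(total, next_value):
    """Take the running total and the next value, return the new total."""
    return total + next_value

# reduce feeds each result back in as the total for the next step:
# 150 + 60 = 210, 210 + 80 = 290, 290 + 60 = 350, 350 + 90 = 440
total = reduce(add_step, values)
print(total)  # 440
```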

71
00:04:22,500 --> 00:04:26,340
It's probably easier to just see
it in action, so let's do it.

72
00:04:26,340 --> 00:04:31,020
All functions that are ufuncs have
the ability to do this built into them.

73
00:04:31,020 --> 00:04:34,230
Here, let's go back down to the hundred
days of code study minutes list.

74
00:04:36,170 --> 00:04:37,630
Where is this at?

75
00:04:37,630 --> 00:04:40,680
Let's go down to where we have
the very last one of them.

76
00:04:44,402 --> 00:04:50,306
Here we go, study_minutes list,
here we go.

77
00:04:50,306 --> 00:04:55,174
All right, so
I'm going to add one below this.

78
00:04:55,174 --> 00:04:58,720
Remember that our study_minutes
array is a two-dimensional array.

79
00:04:58,720 --> 00:05:02,045
The first dimension represents rounds or
attempts, and

80
00:05:02,045 --> 00:05:06,409
the second dimension is the minutes
per day, and there are 100 days, so

81
00:05:06,409 --> 00:05:09,886
let's simplify things first
by using a single dimension.

82
00:05:09,886 --> 00:05:15,902
I'm gonna grab the first round here,
so we'll say study_minutes[0].

83
00:05:15,902 --> 00:05:20,214
Now, if I asked you to total these minutes
up, I bet you'd just start adding, and

84
00:05:20,214 --> 00:05:21,540
remembering like this.

85
00:05:21,540 --> 00:05:26,420
You'd say okay, so
150 + 60, that's 210, and

86
00:05:26,420 --> 00:05:32,170
now I go 210 + 80, that's 290, and
then I take the total of 290 and

87
00:05:32,170 --> 00:05:35,500
I add 60 to get 350, and so
on, and so on, and so on.

88
00:05:36,850 --> 00:05:39,530
That is reducing in a nutshell.

89
00:05:39,530 --> 00:05:42,730
If we continue all the way through
the array, we'll have a total.

90
00:05:42,730 --> 00:05:46,940
Now, I said that all ufuncs had
the ability to do this reduction, and

91
00:05:46,940 --> 00:05:50,950
the way they provide this functionality
is by exposing some functions

92
00:05:50,950 --> 00:05:52,830
off of the ufunc itself.

93
00:05:52,830 --> 00:05:54,283
That sentence was pretty funky.

94
00:05:54,283 --> 00:05:57,630
What we were doing was
adding all the values up.

95
00:05:57,630 --> 00:06:01,100
In that case, the ufunc that
we would like to use is add.

96
00:06:01,100 --> 00:06:03,790
So, let's do it.

97
00:06:03,790 --> 00:06:08,599
So we'll say np.add.reduce, and

98
00:06:08,599 --> 00:06:12,640
then we'll pass in our array.

99
00:06:14,970 --> 00:06:19,012
And there it is, 440, and
it did just like we were doing.
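A minimal sketch of that call. The array below is a hypothetical, shortened stand-in for the course's `study_minutes` (3 rounds by 8 days rather than 100 days), with the first round chosen to total 440. Note that `reduce` is available on ufuncs that take two inputs, such as `np.add`.

```python
import numpy as np

# Hypothetical stand-in data: 3 rounds (rows) by 8 days (columns).
study_minutes = np.array([
    [150, 60, 80, 60, 90, 0, 0, 0],
    [200, 30, 0, 0, 0, 0, 0, 0],
    [60, 60, 60, 0, 0, 0, 0, 0],
])

# Repeatedly apply np.add to the running total and the next value.
first_round_total = np.add.reduce(study_minutes[0])
print(first_round_total)  # 440
```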

100
00:06:19,012 --> 00:06:22,227
If you want to actually see each step,
there is a function for

101
00:06:22,227 --> 00:06:24,066
that available too on each ufunc.

102
00:06:24,066 --> 00:06:27,610
So np.add.accumulate, and
this will show you each step through.

103
00:06:27,610 --> 00:06:36,590
So if we do, again,
if we do study_minutes[0],

104
00:06:36,590 --> 00:06:39,620
we'll see that we have 150,
210, 290, 350, and

105
00:06:39,620 --> 00:06:45,390
then actually you'll see all of
the zero adds that we had to do, and

106
00:06:45,390 --> 00:06:49,200
yikes, you can see the waste of time that
we made this do by adding all the zeros.

107
00:06:49,200 --> 00:06:50,620
We could have filtered them out.
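Here is a sketch of both ideas: watching the running total with `accumulate`, and filtering out the zeros first with a boolean mask. The `first_round` values are hypothetical stand-ins for `study_minutes[0]`.

```python
import numpy as np

# Hypothetical stand-in for the first round of study minutes.
first_round = np.array([150, 60, 80, 60, 90, 0, 0, 0])

# accumulate keeps every intermediate total instead of only the last one.
running_total = np.add.accumulate(first_round)
print(running_total.tolist())
# [150, 210, 290, 350, 440, 440, 440, 440]

# A boolean mask drops the zero days before accumulating.
nonzero_days = first_round[first_round > 0]
running_nonzero = np.add.accumulate(nonzero_days)
print(running_nonzero.tolist())
# [150, 210, 290, 350, 440]
```

The last value of either result is the same total that `np.add.reduce` returns.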

108
00:06:50,620 --> 00:06:51,436
More in the teacher's notes.

109
00:06:51,436 --> 00:06:55,580
Now, we want to get the sum of
all these values together and

110
00:06:55,580 --> 00:07:00,840
there is of course a routine that's
super common, and it is called, sum.

111
00:07:00,840 --> 00:07:04,776
So if we just make this,
well let's make a new one,

112
00:07:04,776 --> 00:07:10,197
we'll save that there for us,
np.sum(study_minutes[0]).

113
00:07:10,197 --> 00:07:14,953
We'll see that we get 440,
which is exactly what we did when we did

114
00:07:14,953 --> 00:07:19,660
the reduce, and the reduction
works on multiple dimensions as well.

115
00:07:19,660 --> 00:07:23,874
So we can just say np.sum(study_minutes),
and

116
00:07:23,874 --> 00:07:28,386
it will get, wow,
10,000 hours, must be a pro.

117
00:07:28,386 --> 00:07:31,191
I think that's what Macklemore said,
or Malcolm Gladwell,

118
00:07:31,191 --> 00:07:33,010
I can't remember which one said that.

119
00:07:33,010 --> 00:07:38,150
Reduction functions will almost
always define an axis parameter.

120
00:07:38,150 --> 00:07:42,000
So in this case we want to see
the sum of all minutes by round.

121
00:07:42,000 --> 00:07:48,848
So, that's axis=1, and

122
00:07:48,848 --> 00:07:51,981
there we know that we did it right,
because there are three results returned back,

123
00:07:51,981 --> 00:07:54,950
and 440 was what we got out
for the first one.
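A sketch of `np.sum` with and without the axis argument. The array is a hypothetical, shortened stand-in for `study_minutes`, so the grand total here will not match the one in the video.

```python
import numpy as np

# Hypothetical stand-in data: 3 rounds (rows) by 8 days (columns).
study_minutes = np.array([
    [150, 60, 80, 60, 90, 0, 0, 0],
    [200, 30, 0, 0, 0, 0, 0, 0],
    [60, 60, 60, 0, 0, 0, 0, 0],
])

# No axis: every value in every round is added together.
grand_total = np.sum(study_minutes)

# axis=1: sum across the days, giving one total per round.
per_round = np.sum(study_minutes, axis=1)

print(per_round.tolist())  # [440, 230, 180]
```

Three rounds in, three totals out, and the first one matches the earlier single-round result.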

124
00:07:54,950 --> 00:07:55,750
Awesome.

125
00:07:55,750 --> 00:07:57,410
Pretty handy, right?

126
00:07:57,410 --> 00:08:02,620
And as you can imagine that mean function
that we were just using is probably

127
00:08:02,620 --> 00:08:07,570
using this sum function under the covers
since, to calculate the mean, what

128
00:08:07,570 --> 00:08:11,920
you do is you add all of the values and
then divide by the total amount of values.
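As a rough sketch of that formula, a hand-written mean can be compared against `np.mean`. The `values` array is hypothetical, and NumPy's real implementation may differ in its details.

```python
import numpy as np

# Hypothetical values to average.
values = np.array([150, 60, 80, 60, 90])

# Add all of the values, then divide by how many there are.
manual_mean = np.sum(values) / values.size

# The library routine gives the same answer without the formula.
library_mean = np.mean(values)

print(manual_mean)  # 88.0
```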

129
00:08:11,920 --> 00:08:16,330
But, what's nice is that you don't
need to remember that formula.

130
00:08:16,330 --> 00:08:18,175
Even though it is simple,

131
00:08:18,175 --> 00:08:22,150
it's been abstracted away from you by
simply calling the mean function.

132
00:08:22,150 --> 00:08:26,854
You'll find that there are lots of
formulas abstracted away from you in

133
00:08:26,854 --> 00:08:27,817
the library.

134
00:08:27,817 --> 00:08:32,620
In fact, let's pop over real quick to
another popular page in the documentation.

135
00:08:32,620 --> 00:08:36,732
I'm just gonna Google statistics numpy.

136
00:08:40,092 --> 00:08:40,850
Here we go.

137
00:08:42,610 --> 00:08:46,215
There are tons of functions available for
you here.

138
00:08:49,250 --> 00:08:52,963
Since it's statistics,
a bunch of these are reduction-based.

139
00:08:52,963 --> 00:08:57,542
They reduce all the values down to one,
and here's one that you'll see everywhere,

140
00:08:57,542 --> 00:09:01,741
std, and while it's actually known to
spread itself around, it's short for

141
00:09:01,741 --> 00:09:03,150
standard deviation.

142
00:09:03,150 --> 00:09:04,660
Here, let's pop in, so

143
00:09:04,660 --> 00:09:07,460
it computes the standard deviation
along the specified axis.

144
00:09:08,550 --> 00:09:10,660
The measure of the spread of
a distribution, which is great.

145
00:09:12,302 --> 00:09:15,227
And if we scroll down here in the notes,

146
00:09:15,227 --> 00:09:19,430
we can see that this is
what has been calculated.

147
00:09:19,430 --> 00:09:21,330
This is the formula.
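For the curious, here is a sketch of the default (population) standard deviation formula: the square root of the mean of the squared deviations from the mean. The `values` array is hypothetical.

```python
import numpy as np

# Hypothetical values, for example one student's GPAs by year.
values = np.array([4.0, 3.286, 3.5, 4.0])

# How far each value sits from the mean.
deviations = values - values.mean()

# Square the deviations, average them, then take the square root.
manual_std = np.sqrt(np.mean(np.abs(deviations) ** 2))

# np.std with its default arguments computes the same quantity.
library_std = np.std(values)
```

Both results agree, which is the point: the routine does the bookkeeping so you can focus on what the spread tells you.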

148
00:09:21,330 --> 00:09:23,970
Now, I kinda remember
doing that in math class,

149
00:09:23,970 --> 00:09:28,610
but the point is here, you don't
need to know how to calculate it.

150
00:09:28,610 --> 00:09:30,084
You want to know why to use it,

151
00:09:30,084 --> 00:09:34,317
as we discussed when we first introduced
the grade point averages or GPAs.

152
00:09:34,317 --> 00:09:38,827
People struggle with math concepts when
they are first introduced to them, and

153
00:09:38,827 --> 00:09:43,137
I'm under the belief it's the memorization
of the formula that most people

154
00:09:43,137 --> 00:09:44,098
struggle with.

155
00:09:44,098 --> 00:09:46,171
Typically, that's what you're tested on,

156
00:09:46,171 --> 00:09:49,130
not the actual way to use
the function in the real world.

157
00:09:49,130 --> 00:09:53,770
Current learning science says that if
you don't use it, you will lose it.

158
00:09:53,770 --> 00:09:57,250
So if these equations feel a bit rusty and
you haven't used them recently,

159
00:09:57,250 --> 00:10:00,070
don't fret,
your brain is just working correctly.

160
00:10:00,070 --> 00:10:04,230
Most of the time, you hardly get
a chance to see why in your math class.

161
00:10:04,230 --> 00:10:05,640
You just focused on the how.

162
00:10:06,810 --> 00:10:14,150
So with that said, let's pop up a couple
levels here in our bookmarks to Routines.

163
00:10:14,150 --> 00:10:20,144
This page here is a really great overview
of how powerful this library is,

164
00:10:20,144 --> 00:10:23,923
and a great look at some
common abstractions.

165
00:10:23,923 --> 00:10:30,930
So if we look down here,
here are some Discrete Fourier Transforms.

166
00:10:30,930 --> 00:10:35,903
Here's some financial functions,
linear algebra,

167
00:10:35,903 --> 00:10:39,363
input and output, logic functions,

168
00:10:39,363 --> 00:10:44,610
polynomials, statistics,
there's a lot in here.

169
00:10:44,610 --> 00:10:47,644
Remember, you don't need
to know all of these,

170
00:10:47,644 --> 00:10:52,016
just be aware that what you are trying
to do most likely already exists.

171
00:10:52,016 --> 00:10:52,812
As you can see,

172
00:10:52,812 --> 00:10:56,410
there are tons of directions that
you can head with this library.

173
00:10:56,410 --> 00:11:01,350
So stand on the shoulders of giants
who built things out for you.

174
00:11:01,350 --> 00:11:06,180
We talked way back when about how all
sorts of different libraries accept and

175
00:11:06,180 --> 00:11:08,120
return numpy arrays.

176
00:11:08,120 --> 00:11:09,630
Let's take a quick break and

177
00:11:09,630 --> 00:11:14,370
take a look at one common use case,
plotting values on a graph.

178
00:11:14,370 --> 00:11:17,620
Well, that is,
right after we jot down some notes.

179
00:11:17,620 --> 00:11:20,710
Why don't you talk a bit about some
common routines that you saw, and

180
00:11:20,710 --> 00:11:22,710
talk a bit about reduction.