Vectorization and broadcasting are what make NumPy so fast. The data structures in pandas have similar superpowers!
Example Code
Vectorization in NumPy
NumPy provides a vectorized function, np.add, which the + operator uses for arrays. It removes the need for you to loop through each value to add things together.
import numpy as np
np.array([1, 2, 3]) + np.array([4, 5, 6])
array([5, 7, 9])
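To make the contrast concrete, here is a minimal sketch comparing a plain-Python loop with the vectorized addition. It assumes only NumPy, and the variable names are illustrative rather than taken from the course files.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Plain-Python loop: add matching elements one pair at a time
looped = np.array([x + y for x, y in zip(a, b)])

# Vectorized: the + operator dispatches to the np.add ufunc
vectorized = a + b
explicit = np.add(a, b)

print(vectorized)  # [5 7 9]
print(np.array_equal(looped, vectorized), np.array_equal(vectorized, explicit))  # True True
```

All three produce the same values; the vectorized forms simply push the looping down into NumPy's compiled code.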
Broadcasting in NumPy
A scalar value is broadcast across every element of an array. It behaves as if there were an equal-sized array filled with that value (in this example, an array of all 1s).
conference_counts = np.array([4, 5, 10, 8, 15])
# Broadcast a scalar value
conference_counts + 1
array([ 5,  6, 11,  9, 16])
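A short sketch, reusing the conference_counts data above, that checks the broadcast result against the explicit approach of building an equal-sized array of 1s with np.ones_like:

```python
import numpy as np

conference_counts = np.array([4, 5, 10, 8, 15])

# Broadcasting: the scalar 1 is stretched to match the array's shape
broadcast = conference_counts + 1

# The explicit equivalent: build an equal-sized array filled with 1s first
ones = np.ones_like(conference_counts)
explicit = conference_counts + ones

print(broadcast)                            # [ 5  6 11  9 16]
print(np.array_equal(broadcast, explicit))  # True
```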
In the next step, we'll see how to use vectorization and broadcasting in pandas.
NumPy is fast, much faster than we could ever achieve ourselves using just straight Python. One of the reasons for this is that it relies heavily on vectorization.
It might have been a while since you heard that term, so I thought we'd take a moment to recall what it means. Vectorization allows us to avoid writing loops. It gives us the ability to work on an entire set of values all at once. Because that abstraction is in place, the optimizations are handled at a very low level, in NumPy's compiled code. That is, if we remember to use them.
Because pandas sits directly on top of NumPy, I wanna make sure that you don't forget about the vectorization superpowers that are available to you. Just like in NumPy, whenever you start to write a loop, you should pause and think about how you might solve the problem in a vectorized manner.
A key feature that allows vectorization to happen more easily is broadcasting. Broadcasting lets you use values with compatible, but not identical, shapes in element-by-element operations. You don't have to have the same number of elements; your intention can be figured out from context.
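As a rough illustration of what "compatible" means: NumPy compares shapes starting from the trailing dimension, and each pair of sizes must either be equal or have one of them be 1 (a missing dimension counts as 1). The counts_by_year and bonus arrays below are hypothetical data made up for this sketch.

```python
import numpy as np

# Hypothetical data: conferences attended by 3 employees (columns) over 2 years (rows)
counts_by_year = np.array([[1, 0, 2],
                           [3, 1, 0]])  # shape (2, 3)

# Hypothetical per-employee adjustment, one value per column: shape (3,)
bonus = np.array([1, 1, 2])

# (2, 3) and (3,) are compatible: bonus is reused for every row
result = counts_by_year + bonus
print(result.shape)     # (2, 3)
print(result.tolist())  # [[2, 1, 4], [4, 2, 2]]

# (2, 3) and (2,) are not compatible: trailing sizes 3 and 2 differ
try:
    counts_by_year + np.array([1, 2])
    broadcast_failed = False
except ValueError as err:
    broadcast_failed = True
    print("Cannot broadcast:", err)
```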
For instance, let's assume that we work in the learning and development department of a business, and we wanna keep track of how many conferences our employees are attending. We have a one-dimensional NumPy array, where each element represents the count of all conferences attended by a specific employee. There's a conference that comes through our town, and so we send every single employee to it. What we really want is to increment each of these values by 1.
But before we simply loop through these, we should step back and lean on vectorization. We currently don't have an array of the same size full of 1s. We could create one, but that'd be some extra work. What we can do is just add 1, and our scalar value will be treated as if it were an array of the same size when the vectorized operation runs. It's just as if we had an array of all 1s. Our 1 is broadcast to all entries.
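The loop we are avoiding, and a vectorized alternative, might look like the sketch below. It uses the augmented assignment +=, which applies the same scalar broadcast but updates the existing array in place instead of creating a new one; updating in place is an assumption about how you would want to record the new conference, not something the lesson prescribes.

```python
import numpy as np

conference_counts = np.array([4, 5, 10, 8, 15])

# The loop-based version we want to avoid: touch each element individually
looped = conference_counts.copy()
for i in range(len(looped)):
    looped[i] += 1

# Vectorized version: += broadcasts the scalar 1 and updates the array in place
conference_counts += 1

print(conference_counts)                          # [ 5  6 11  9 16]
print(np.array_equal(conference_counts, looped))  # True
```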
The pandas Series object supports both vectorization and broadcasting. They're important tools to remember that you have at your disposal. Let's explore how they work.
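A minimal sketch of the same ideas with a pandas Series, assuming hypothetical employee names as the index labels. The + operator and the Series.add method both work element-wise; when two Series are combined, pandas matches values by index label rather than by position.

```python
import pandas as pd

# Hypothetical employee names used as the index
employees = ["ana", "ben", "cy", "dee", "eli"]
conference_counts = pd.Series([4, 5, 10, 8, 15], index=employees)

# Broadcasting: the scalar 1 is added to every entry
plus_one = conference_counts + 1
print(plus_one)

# Vectorization: element-wise addition of two Series, matched by index label
extra = pd.Series([1, 0, 2, 0, 1], index=employees)
with_extra = conference_counts.add(extra)
print(with_extra)
```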