Bummer! This is just a preview. You need to be signed in with a Basic account to view the entire video.
Vectorization and Broadcasting Review2:00 with Craig Dennis
Vectorization and Broadcasting are what makes NumPy so fast. Pandas' data structures have similar super powers!
Vectorization in NumPy
Arrays provide a vectorized method named add which removes the need for you to loop through each value to add things together.
np.array([1, 2, 3]) + np.array([4, 5, 6])
array([5, 7, 9])
Broadcasting in NumPy
Scalar values can be broadcasted to values, it's as if there was an equal sized array of all 1's.
conference_counts = np.array([4, 5, 10, 8, 15]) # Broadcast a scalar value conference_counts + 1
array([ 5, 6, 11, 9, 16])
In the next step we'll see how to use vectorization and broadcasting in Pandas.
NumPy is fast,
much faster than we could ever achieve ourselves using just straight Python.
One of the reasons for this is that it relies heavily on vectorization.
It might have been a while since you heard that term, so
I thought we'd take a moment to recall what that means.
Vectorization allows us to avoid looping.
It provides the ability to work on an entire set of values all at one.
Because that abstraction is in place,
optimizations are handled at a very low level.
That is, if we remember to use them.
Because Panda sits directly on top of NumPy, I wanna make sure that you
don't forget about the vectorization superpowers that are available to you.
Just like in NumPy, whenever you start to write a loop, you should pause and
think about how you might solve the problem in a vectorized manner.
A key feature that allow vectorization to happen more easily is broadcasting.
Broadcasting enables you to use similar enough values in element to element based
You don't have to have the same number of elements,
your intention can be figured out in context.
For instance, let's assume that we work in the learning and
development department of a business.
And we wanna keep track of how many conferences our employees are attending.
We have a single dimensional NumPy array, where each element represents the count of
all conferences attended by specific employees.
There's a conference that comes through our town, and so
we send every single employee to it.
What we really want is to increment each of these values by 1.
But before we simply just loop through these, we should step back and
lean on vectorization.
We currently don't have an array the same size, full of 1s.
We could create one, but that'd be some extra work.
What we can do is just add 1, and
our scalar value will be assumed to be the same size when the vectorization occurs.
Is just as if we had an array of all 1s.
Our 1 is broadcasted to all entries.
The series object supports both vectorization and broadcasting.
They are important tools to remember that you have at your disposal.
Let's explore how they work.
You need to sign up for Treehouse in order to download course files.Sign up