Vectorization and Broadcasting Review (2:00) with Craig Dennis
Vectorization and broadcasting are what make NumPy so fast. Pandas' data structures have similar superpowers!
Vectorization in NumPy
NumPy provides a vectorized function named np.add, which is what the + operator uses on arrays. It removes the need for you to loop through each value to add things together.
import numpy as np

np.array([1, 2, 3]) + np.array([4, 5, 6])
array([5, 7, 9])
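For comparison, here is a minimal sketch of the loop that vectorization replaces, next to the + operator and an explicit np.add call. The variable names are illustrative, not from the course.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Plain-Python loop: add the values one pair at a time
looped = np.array([x + y for x, y in zip(a, b)])

# Vectorized: the + operator dispatches to the np.add ufunc
with_operator = a + b
with_ufunc = np.add(a, b)

print(looped)                                     # [5 7 9]
print(with_operator)                              # [5 7 9]
print(np.array_equal(with_operator, with_ufunc))  # True
```

All three produce the same values; the vectorized forms hand the whole array to NumPy's compiled code instead of stepping through it in Python.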
Broadcasting in NumPy
A scalar value can be broadcast across an array. When you add 1, it's as if there were an equally sized array of all 1s.
conference_counts = np.array([4, 5, 10, 8, 15])
# Broadcast a scalar value
conference_counts + 1
array([ 5, 6, 11, 9, 16])
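To make the "array of all 1s" idea concrete, this sketch compares the broadcast version with the explicit alternative of building a same-sized array of 1s first (using np.ones_like, which is not part of the original notes).

```python
import numpy as np

conference_counts = np.array([4, 5, 10, 8, 15])

# Broadcasting: the scalar 1 is stretched to match the array's shape
broadcast = conference_counts + 1

# The explicit alternative: build an equal-sized array of 1s first
ones = np.ones_like(conference_counts)
explicit = conference_counts + ones

print(ones)                                        # [1 1 1 1 1]
print(np.array_equal(broadcast, explicit))         # True
print(broadcast.shape == conference_counts.shape)  # True
```

Both give the same result, but with broadcasting NumPy never has to allocate the array of 1s; the single scalar is reused for every element.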
In the next step we'll see how to use vectorization and broadcasting in Pandas.
NumPy is fast, much faster than we could ever achieve ourselves using just straight Python. One of the reasons for this is that it relies heavily on vectorization. It might have been a while since you heard that term, so I thought we'd take a moment to recall what it means. Vectorization allows us to avoid looping. It provides the ability to work on an entire set of values all at once. Because that abstraction is in place, optimizations are handled at a very low level. That is, as long as we remember to use them. Because pandas sits directly on top of NumPy, I want to make sure that you don't forget about the vectorization superpowers that are available to you. Just like in NumPy, whenever you start to write a loop, you should pause and think about how you might solve the problem in a vectorized manner.

A key feature that allows vectorization to happen more easily is broadcasting. Broadcasting enables you to use values with compatible shapes in element-by-element operations. You don't have to have the same number of elements; your intention can be figured out from context.

For instance, let's assume that we work in the learning and development department of a business, and we want to keep track of how many conferences our employees are attending. We have a one-dimensional NumPy array, where each element represents the count of all conferences attended by a specific employee. A conference comes through our town, and so we send every single employee to it. What we really want is to increment each of these values by 1. But before we simply loop through them, we should step back and lean on vectorization. We currently don't have an array of the same size full of 1s. We could create one, but that would be some extra work.
What we can do is just add 1, and our scalar value will be stretched to the same size when the vectorization occurs. It's just as if we had an array of all 1s. Our 1 is broadcast to all entries.

The Series object supports both vectorization and broadcasting. They are important tools to remember that you have at your disposal. Let's explore how they work.
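As a preview, here is a minimal sketch of the same conference example with a pandas Series. The employee names used as the index and the extra Series of counts are hypothetical, added only for illustration.

```python
import pandas as pd

# Hypothetical employee names used as the index, for illustration only
conference_counts = pd.Series(
    [4, 5, 10, 8, 15],
    index=["alice", "bob", "carmen", "dev", "eun"],
)

# Broadcasting a scalar: every employee attended one more conference
updated = conference_counts + 1

# Vectorization between two Series: values are aligned by index label
extra = pd.Series([2, 0, 1, 0, 3], index=conference_counts.index)
totals = conference_counts + extra

print(updated.tolist())  # [5, 6, 11, 9, 16]
print(totals.tolist())   # [6, 5, 11, 8, 18]
```

Unlike a plain NumPy array, a Series lines up values by their index labels before adding, so the result keeps each employee's name attached to the correct count.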