Welcome (5:18) with Craig Dennis
Let's get started!
[MUSIC] 0:00 Hello, I'm Craig, and I'm a developer. 0:09 In this course, 0:11 we're going to be exploring the wonderful data library, pandas. 0:12 Now, pandas is a portmanteau, or a combination of two words, 0:15 in this case, the words panel and data. 0:19 Panel data is data that is multidimensional, 0:21 involving measurements over time. 0:23 pandas are also an adorable creature, and I hope that you're here for the former, 0:25 but I totally understand that I might have clickbaited you into the latter. 0:29 pandas provides fast, flexible, and expressive data structures that have been 0:33 designed to make working with relational or 0:37 labeled data not only easy, but also intuitive. 0:39 It's the fundamental high level building block for doing practical and 0:42 real world data analysis in Python. 0:46 Before we get cooking, let's make sure that we're on the same page. 0:48 There are definitely some prerequisites for this course, 0:51 so please double check that you're all caught up. 0:54 The most important of the prerequisites is NumPy. 0:56 I'd like to make sure that you had a nice introduction to the NumPy library. 0:59 pandas relies heavily on NumPy, and 1:03 I'm going to assume that you have a basic understanding of its overarching concepts. 1:05 Now, don't worry if it's been a while since you've used it, 1:10 we'll retouch upon the concepts that you need here in just a bit. 1:13 Don't forget to check the teacher's notes that are attached to each video. 1:15 I'll try and remind you to look in there, but 1:19 please do get in the habit of checking that section out. 1:21 Lots of great information is tucked away in there waiting for you to dig into it. 1:23 In this course, I'm gonna try a new approach. 1:27 In an effort to give you more practice of how data professionals interact, 1:30 I'm going to rely more heavily than usual on Jupyter notebooks. 1:34 As you are most likely already aware, 1:37 Jupyter notebooks are a great place to capture your learnings. 
1:39 They're also intended to be used for teaching. 1:42 I've gone ahead and built up some interactive content that will assist you 1:45 in exploring the pandas library. 1:48 In the Treehouse app, 1:50 you'll encounter these notebooks as textual instruction steps. 1:51 I've included information in the teacher's notes about how to get a hold of 1:54 the notebooks so that you can run them and follow along locally. 1:57 I'd love for you, as a lifelong learner, 2:00 to get in the habit of exploring every notebook that you come across. 2:02 Use it to poke around as you learn a new library, 2:06 much like you might expect to use the Python shell. 2:08 Explore the API and practice different approaches, and most importantly, 2:11 keep your own notes. 2:15 A common data science workflow involves multiple stages. 2:16 First you clean the data and then you analyze and model it. 2:19 And finally, you organize the results of the analysis into either a graph or 2:22 a table. 2:26 Great news, pandas can do all that, the entire workflow. 2:27 Even better news, it's really a pleasure to use. 2:30 Since you already have a fundamental understanding of the numerical library, 2:34 NumPy, pandas is going to feel very familiar to you. 2:38 In fact, pandas sits directly on top of NumPy like a little hat. 2:41 I don't know about you, but 2:45 one of the things that I have trouble with in NumPy is when I have an array. 2:46 I never know just which value is which. 2:49 Like for instance, in this array here, I don't really know who got the high score. 2:53 I have to remember that Robbie is the first one here at index zero. 2:58 But I just have to know that. 3:02 pandas gives you a new ability, you can label each value. 3:03 It's like a dictionary, a key and a value. 3:07 And that works great for a single dimension. 3:09 This example is the series of high scorers for a single game, Donkey Kong, 3:11 labeled by players' initials. 
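The on-screen array and series are not captured in this transcript, so here is a minimal sketch of the idea just described. The initials and scores are made up for illustration and are not the values shown in the video.

```python
import numpy as np
import pandas as pd

# Hypothetical high scores; the numbers are invented for illustration.
scores = np.array([1_050_000, 870_000, 990_000])

# With a plain NumPy array you have to remember which position belongs to whom.
print(scores[0])

# A pandas Series attaches a label to each value, much like dictionary keys.
# The initials used as labels here are assumptions, not the ones in the video.
donkey_kong = pd.Series(scores, index=["RB", "CD", "KL"], name="Donkey Kong")

# Look up a score by label instead of by position.
print(donkey_kong["RB"])

# Ask for the label that holds the highest score.
print(donkey_kong.idxmax())
```

Positional access still works on a Series through `.iloc`, so labeling adds to what the NumPy array offered rather than replacing it.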
3:16 But as you know, we often want to have multidimensional data. 3:18 We could track more games by adding a new game dimension, 3:21 like we could add Pac-Man scores. 3:25 But now we have to remember two indexes, and 3:27 I have to remember that index zero is Donkey Kong and index one is Pac-Man. 3:29 Now, again, pandas does a great job with labeling. 3:34 You can also label each of these columns, so you end up with tabular data. 3:37 The two-dimensioned data structure here is known as a data frame. 3:41 This is a data frame of high scores on multiple games indexed by players' 3:46 initials. 3:49 And that ought to feel pretty familiar, assuming you've used tabular or 3:50 table based data before, like a spreadsheet or a database table, 3:54 anything with rows and columns. 3:58 With pandas, you can put any sort of data in there too. 4:00 It doesn't have the same restrictions like NumPy did. 4:03 pandas also lets you relate datasets by label. 4:05 So you can merge and 4:08 join together related information in a very straightforward manner. 4:09 pandas is a full-featured library, and we simply won't be able to get to all of its 4:13 amazing powers in this introductory course. 4:18 I do hope to give you a firm foundation and guide you to where you can learn and 4:20 practice more. 4:24 For this course, I'm gonna ask that you imagine that there is a new company 4:25 in town jumping in on that social banking app craze, like Cash App or Venmo. 4:29 They call themselves Cash Box. 4:32 Basically the way that their app works is that a user signs up, chooses a username, 4:34 and then they can send money to other users of the system by their username. 4:39 Now, a common use case for their app is when it's lunch time and 4:43 people don't have cash on them. 4:46 Their users can just send money through Cash Box to the person picking up 4:47 the bill. 4:51 Now, each user on Cash Box keeps a balance of their funds, and 4:51 the app tracks their transactions. 
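To make the data frame and merge ideas above a bit more concrete, here is a rough sketch. Everything in it is hypothetical: the scores, usernames, balances, amounts, and column names are assumptions chosen for illustration, not the actual Cash Box datasets used later in the course.

```python
import pandas as pd

# Made-up high scores: rows are labeled by player initials, columns by game.
high_scores = pd.DataFrame(
    {"Donkey Kong": [1_050_000, 870_000], "Pac-Man": [3_300_000, 2_900_000]},
    index=["RB", "CD"],
)

# Look up a value by row label and column label instead of two positions.
print(high_scores.loc["CD", "Pac-Man"])

# Hypothetical Cash Box-style data; the column names are assumptions.
# Note that one frame can mix text and numeric columns.
users = pd.DataFrame({"username": ["alice", "bob"], "balance": [40.00, 12.50]})
transactions = pd.DataFrame(
    {
        "sender": ["bob", "alice", "bob"],
        "receiver": ["alice", "bob", "alice"],
        "amount": [8.00, 5.25, 3.50],
    }
)

# Relate the two datasets by label: attach each sender's balance
# to the transactions they sent.
merged = transactions.merge(
    users, left_on="sender", right_on="username", how="left"
)
print(merged[["sender", "receiver", "amount", "balance"]])
```

A left merge keeps every transaction row in its original order and fills in the matching user's columns, which is one common way to relate two tables by a shared label.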
4:55 Good news, Cash Box is hiring and they are looking for a junior data scientist. 4:57 They've sent out a hiring challenge and 5:03 access to a sample of some of their datasets. 5:05 So what do you say we explore their datasets and 5:07 pick up some job skills along the way? 5:09 Let's get ready to rock the Cash Box. 5:12 [LAUGH] Good thing we aren't applying to be part of the marketing department. 5:14