1
00:00:01,990 --> 00:00:05,276
What is the relationship between Attack and Health Points (HP)?

2
00:00:07,981 --> 00:00:12,442
There are two types of plots for charting relationships in Seaborn:

3
00:00:12,442 --> 00:00:15,302
scatterplots and lineplots.

4
00:00:15,302 --> 00:00:16,964
Let's start with a scatterplot.

5
00:00:19,613 --> 00:00:22,628
Seaborn uses a declarative approach to plotting,

6
00:00:22,628 --> 00:00:26,282
which means that the function calls are self-descriptive.

7
00:00:26,282 --> 00:00:29,343
To make a scatterplot,

8
00:00:29,343 --> 00:00:33,934
we call the scatterplot function,

9
00:00:33,934 --> 00:00:40,820
sns.scatterplot, and it takes three parameters:

10
00:00:40,820 --> 00:00:46,031
data=pokemon, x="HP" and y="Attack".

11
00:00:46,031 --> 00:00:49,103
Shift+Enter to run the cell.

12
00:00:49,103 --> 00:00:53,604
The scatterplot shows us that there may be a general positive

13
00:00:53,604 --> 00:00:57,763
correlation between HP and Attack, with one outlier.

14
00:00:57,763 --> 00:01:01,961
Generally, as HP increases, so does Attack.

15
00:01:01,961 --> 00:01:06,851
Pokemon with more health points tend to be stronger.

16
00:01:06,851 --> 00:01:10,310
Now let's try a lineplot.

17
00:01:10,310 --> 00:01:14,791
We call the lineplot function to make one,

18
00:01:14,791 --> 00:01:17,706
sns.lineplot.

19
00:01:17,706 --> 00:01:23,646
It takes the same three parameters:

20
00:01:23,646 --> 00:01:30,553
data=pokemon, x="HP" and y="Attack".

21
00:01:30,553 --> 00:01:33,841
Shift+Enter to run the cell.

22
00:01:33,841 --> 00:01:38,635
The lineplot doesn't do a great job of showing us information that can be

23
00:01:38,635 --> 00:01:39,901
easily inferred.

24
00:01:39,901 --> 00:01:44,326
Recall that a lineplot is better suited to an x-axis that

25
00:01:44,326 --> 00:01:47,461
follows a continuous variable like time.

26
00:01:47,461 --> 00:01:51,200
In this example, we're plotting a discrete variable, HP.

27
00:01:52,770 --> 00:01:56,544
So what happens is that the lineplot goes all over the place,

28
00:01:56,544 --> 00:02:02,422
and it's harder to infer a trend than with the scatterplot. That's okay.

29
00:02:02,422 --> 00:02:07,320
Part of exploratory data analysis is trying out different things to see

30
00:02:07,320 --> 00:02:08,531
what works well.

31
00:02:08,531 --> 00:02:14,122
It's normal for some kinds of plots to give us better insights than others.

32
00:02:14,122 --> 00:02:17,082
So let's explore the scatterplot further.

33
00:02:17,082 --> 00:02:22,027
We can introduce the hue keyword argument to see how the different types

34
00:02:22,027 --> 00:02:24,512
are distributed in the scatterplot.

35
00:02:24,512 --> 00:02:30,259
Remember that the type of a Pokemon is a categorical variable, with values like Water,

36
00:02:30,259 --> 00:02:31,962
Grass and Electric.

37
00:02:31,962 --> 00:02:38,139
sns.scatterplot, data=pokemon,

38
00:02:38,139 --> 00:02:41,511
x="HP", y="Attack",

39
00:02:47,883 --> 00:02:49,736
and hue="Type".

40
00:02:54,152 --> 00:02:56,797
I've got an AttributeError:

41
00:02:56,797 --> 00:03:02,296
module 'seaborn' has no attribute 'scatterplt', with no "o".

42
00:03:02,296 --> 00:03:06,569
So let's fix that typo to scatterplot, and

43
00:03:06,569 --> 00:03:11,342
now I've got a color-coded scatterplot.

44
00:03:11,342 --> 00:03:17,462
This takes us into analyzing the categorical aspects of our data.

45
00:03:17,462 --> 00:03:22,226
We can break this scatterplot down further by using the relplot function and

46
00:03:22,226 --> 00:03:25,279
introducing the col (column) keyword argument,

47
00:03:25,279 --> 00:03:28,789
to make a separate scatterplot for each type of Pokemon.

48
00:03:32,032 --> 00:03:37,284
sns.relplot is short for relational plot.

49
00:03:37,284 --> 00:03:41,664
It starts with the same parameters:

50
00:03:41,664 --> 00:03:47,185
data=pokemon, x="HP" and y="Attack".

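A sketch of the color-coded scatterplot and of `sns.relplot` called with only those first three parameters, again on a hypothetical DataFrame of made-up values (not real Pokemon stats). The `try` block reproduces the typo from the video: the misspelled name raises `AttributeError` before anything is drawn.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for `pokemon`: made-up values, NOT real stats.
pokemon = pd.DataFrame({
    "HP":     [35, 45, 45, 60, 60, 80, 90, 100],
    "Attack": [55, 49, 52, 62, 65, 82, 92, 110],
    "Type":   ["Electric", "Grass", "Water", "Grass",
               "Water", "Electric", "Water", "Grass"],
})

# hue="Type" gives each Type value its own marker color and adds a legend.
fig, hue_ax = plt.subplots()
sns.scatterplot(data=pokemon, x="HP", y="Attack", hue="Type", ax=hue_ax)

# The typo from the video: the seaborn module has no attribute "scatterplt".
try:
    sns.scatterplt(data=pokemon, x="HP", y="Attack")
except AttributeError as err:
    typo_message = str(err)
    print(typo_message)

# relplot is figure-level: it creates its own figure and returns a FacetGrid.
# With its default kind="scatter" and only data/x/y, it draws one scatterplot.
grid = sns.relplot(data=pokemon, x="HP", y="Attack")
```

The design difference matters for the next step: `sns.scatterplot` draws onto a single Matplotlib Axes, while `sns.relplot` manages a whole grid of Axes, which is what lets it split the data into one subplot per category.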
51
00:03:50,542 --> 00:03:55,593
We'll pass it the hue keyword argument for Type, and

52
00:03:55,593 --> 00:04:00,990
now the col keyword argument, which also takes Type.

53
00:04:04,124 --> 00:04:07,400
Shift+Enter to run the cell.

54
00:04:07,400 --> 00:04:11,603
Right now it's really difficult to see these new plots, because they're all squeezed into one row,

55
00:04:11,603 --> 00:04:15,944
so we'll add one more parameter,

56
00:04:15,944 --> 00:04:22,540
the col_wrap keyword argument, col_wrap=3, which starts a new row after every three plots.

57
00:04:26,513 --> 00:04:28,871
And now we have a scatterplot for

58
00:04:28,871 --> 00:04:34,560
each type of Pokemon that describes the relationship between HP and Attack.

59
00:04:34,560 --> 00:04:39,181
It's nice to have a breakdown of types like this because for some types,

60
00:04:39,181 --> 00:04:41,392
there are very few observations.

61
00:04:41,392 --> 00:04:46,084
Looking at the types that have many observations, we can infer that generally,

62
00:04:46,084 --> 00:04:49,120
HP and Attack are somewhat positively correlated.

63
00:04:50,460 --> 00:04:53,573
Now try practicing with the Defense stat of Pokemon.

64
00:04:53,573 --> 00:04:57,740
Can you infer a relationship between HP and Defense?

65
00:05:02,448 --> 00:05:04,843
Before we move on to making more plots,

66
00:05:04,843 --> 00:05:08,047
let's write down our observations.

67
00:05:10,441 --> 00:05:16,090
We'll use a Markdown cell and note that generally, HP and

68
00:05:16,090 --> 00:05:23,045
Attack are somewhat positively correlated.

69
00:05:26,631 --> 00:05:33,260
Pokemon with more HP tend to be stronger.

70
00:05:35,543 --> 00:05:36,160
Awesome.
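A sketch of the final faceted plot, followed by an optional numeric check of the "somewhat positively correlated" observation using pandas' `Series.corr` (Pearson by default). The correlation check is an addition, not something done in the video, and it runs on the same hypothetical made-up DataFrame, so the printed coefficient says nothing about real Pokemon.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for `pokemon`: made-up values, NOT real stats.
pokemon = pd.DataFrame({
    "HP":     [35, 45, 45, 60, 60, 80, 90, 100],
    "Attack": [55, 49, 52, 62, 65, 82, 92, 110],
    "Type":   ["Electric", "Grass", "Water", "Grass",
               "Water", "Electric", "Water", "Grass"],
})

grid = sns.relplot(
    data=pokemon, x="HP", y="Attack",
    hue="Type",   # color markers by Type
    col="Type",   # one subplot per Type value
    col_wrap=3,   # start a new row after every 3 subplots
)

# Pearson correlation between HP and Attack across all rows.
r = pokemon["HP"].corr(pokemon["Attack"])
print(f"Pearson r between HP and Attack: {r:.2f}")
```

For the Defense exercise, the same calls with `y="Defense"` should work, assuming the real dataset has a column with exactly that name. A coefficient computed on a type with only a handful of rows is unreliable, which matches the caution in the video about types with very few observations.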