1
00:00:00,445 --> 00:00:03,549
What is the distribution of attack?

2
00:00:03,549 --> 00:00:08,604
We can use a histogram to visualize the distribution of Pokemon's attack.

3
00:00:08,604 --> 00:00:10,968
The histplot function will give us a histogram.

4
00:00:15,446 --> 00:00:25,126
sns.histplot(data=pokemon, x='Attack')

5
00:00:25,126 --> 00:00:29,809
Notice that Seaborn automatically picked 20 bins for us.

6
00:00:29,809 --> 00:00:34,978
If we want to change the bins, we can use the bins keyword argument.

7
00:00:34,978 --> 00:00:39,753
We might want to change the number of bins in order to observe the distribution of

8
00:00:39,753 --> 00:00:42,380
data in differently sized groupings.

9
00:00:42,380 --> 00:00:50,022
sns.histplot(data=pokemon, x='Attack',

10
00:00:50,022 --> 00:00:52,404
bins=10)

11
00:00:55,664 --> 00:01:03,077
Shift+Enter to run the cell, and let's compare our two histograms.

12
00:01:03,077 --> 00:01:08,187
In the default histogram, the Pokemon appear to have a bimodal distribution.

13
00:01:08,187 --> 00:01:12,224
That is, there are two big humps.

14
00:01:12,224 --> 00:01:15,297
But when we look at our histogram with 10 bins,

15
00:01:15,297 --> 00:01:19,490
where the values are grouped into fewer, wider bins,

16
00:01:19,490 --> 00:01:25,937
we can see more of a unimodal distribution, with a rightward skew.

17
00:01:28,282 --> 00:01:32,814
Another way to visualize a distribution is to use a KDE.

18
00:01:32,814 --> 00:01:36,570
Kernel density estimation is like a smoothed-out histogram.

19
00:01:37,630 --> 00:01:41,189
It shows the distribution of values as a continuous probability density curve.

20
00:01:42,450 --> 00:01:45,490
The plot shows a similar distribution to a histogram.

21
00:01:46,880 --> 00:01:51,324
The advantage of using a KDE is that you can make quicker inferences about how

22
00:01:51,324 --> 00:01:54,810
the data is distributed because of the smooth curve.
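To make the bins and KDE ideas above concrete, here is a minimal pure-Python sketch of what the two plots compute. It is an illustration, not Seaborn's implementation: it assumes equal-width bins and a Gaussian kernel with Scott's rule-of-thumb bandwidth, and Seaborn's own bin selection and bandwidth defaults may differ. The `attack` values are hypothetical stand-ins, not the real Pokemon data.

```python
import math
from statistics import stdev


def equal_width_histogram(values, bins):
    """Count values in `bins` equal-width bins spanning min(values)..max(values).

    Assumes at least two distinct values so the bin width is non-zero.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp the index so the maximum value falls in the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


def gaussian_kde(values):
    """Return a density function built from one Gaussian bump per value.

    The bandwidth uses Scott's rule of thumb (an assumption for this sketch).
    """
    n = len(values)
    bw = stdev(values) * n ** (-1 / 5)
    norm = 1 / (n * bw * math.sqrt(2 * math.pi))

    def density(x):
        # Average of Gaussian kernels centered on each observation.
        return norm * sum(math.exp(-0.5 * ((x - v) / bw) ** 2) for v in values)

    return density


# Hypothetical Attack values, for illustration only.
attack = [45, 49, 52, 55, 60, 62, 64, 65, 70, 75,
          80, 82, 85, 90, 100, 105, 110, 120, 130, 150]

for b in (20, 10):
    _, counts = equal_width_histogram(attack, b)
    print(f"bins={b}: {counts}")

kde = gaussian_kde(attack)
print(f"density at 75: {kde(75):.4f}")
```

Running it with 20 and then 10 bins regroups the same values, which is why the apparent number of humps can change with the bin count, while the KDE curve does not depend on bins at all.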
23
00:01:55,900 --> 00:02:01,754
That includes features such as central tendency, modality, and skew.

24
00:02:01,754 --> 00:02:10,321
sns.kdeplot(data=pokemon, x='Attack')

25
00:02:10,321 --> 00:02:12,930
Shift+Enter.

26
00:02:12,930 --> 00:02:17,626
And we can see here that the KDE is similar to our histogram that

27
00:02:17,626 --> 00:02:19,353
has 10 bins.

28
00:02:21,070 --> 00:02:25,909
So, since they're similar to each other, let's use them together.

29
00:02:25,909 --> 00:02:29,753
Seaborn lets us overlay the KDE on a histogram by

30
00:02:29,753 --> 00:02:33,222
setting the kde keyword argument to True.

31
00:02:33,222 --> 00:02:37,633
For this plot, I'll copy and paste cell 10.

32
00:02:42,999 --> 00:02:46,740
And we'll add a new keyword argument, kde=True.

33
00:02:50,743 --> 00:02:51,612
Nice.

34
00:02:53,402 --> 00:02:55,731
According to our histogram,

35
00:02:55,731 --> 00:03:01,422
most of our Pokemon have an Attack between 50 and 120.

36
00:03:01,422 --> 00:03:03,654
That's a nice spread.

37
00:03:03,654 --> 00:03:08,118
If we want to use the col keyword to break down the Attack

38
00:03:08,118 --> 00:03:13,233
distribution by type, we'll have to use the displot function.

39
00:03:13,233 --> 00:03:18,327
sns.displot, where displot stands for distribution plot.

40
00:03:18,327 --> 00:03:24,398
data=pokemon, x='Attack',

41
00:03:24,398 --> 00:03:29,313
Let's set bins to 10 and col to our type column.

42
00:03:32,186 --> 00:03:35,635
And we'll use a col_wrap of 3.

43
00:03:35,635 --> 00:03:38,670
It worked well for us last time when we used our scatterplot,

44
00:03:38,670 --> 00:03:39,705
so we'll use it again here.

45
00:03:45,783 --> 00:03:49,810
It's nice to be able to make a separate histogram for each type.

46
00:03:51,060 --> 00:03:55,563
However, this doesn't give us a clear, at-a-glance picture.
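The col and col_wrap behavior described above can be sketched in plain Python: split the rows by a category, then place panel i at grid row i // col_wrap and grid column i % col_wrap. This is a hypothetical illustration of the layout idea, not Seaborn's code. The row dictionaries and the 'Type' key are assumptions, and ordering panels by first appearance is also an assumption (displot's col_order parameter lets you set the order explicitly).

```python
from collections import defaultdict


def facet_layout(rows, col, col_wrap):
    """Group rows by row[col] and give each group a (grid_row, grid_col) cell.

    Panels are ordered by first appearance of each category (an assumption
    for this sketch) and wrap after `col_wrap` panels per grid row.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[col]].append(row)
    # dicts preserve insertion order, so enumerate follows first appearance.
    layout = {key: divmod(i, col_wrap) for i, key in enumerate(groups)}
    return groups, layout


# Hypothetical rows; 'Type' and 'Attack' are illustrative keys.
pokemon_rows = [
    {"Type": "Grass", "Attack": 49},
    {"Type": "Fire", "Attack": 52},
    {"Type": "Water", "Attack": 48},
    {"Type": "Bug", "Attack": 30},
    {"Type": "Fire", "Attack": 84},
]

groups, layout = facet_layout(pokemon_rows, col="Type", col_wrap=3)
for type_name, (r, c) in layout.items():
    attacks = [row["Attack"] for row in groups[type_name]]
    print(f"{type_name}: panel at row {r}, column {c}, Attack values {attacks}")
```

With four types and a wrap of 3, the fourth panel starts a new grid row, which is the same wrapping you see when displot lays out one histogram per type.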
47
00:03:55,563 --> 00:03:59,815
So let's use some of Seaborn's categorical plots to help us dive

48
00:03:59,815 --> 00:04:04,006
further into analyzing Pokemon's attack based on their type.

49
00:04:04,006 --> 00:04:05,489
Before we move on,

50
00:04:05,489 --> 00:04:10,969
practice plotting the data using Pokemon's Defense statistic.

51
00:04:10,969 --> 00:04:14,708
And remember to record your observations in Markdown once you've completed

52
00:04:14,708 --> 00:04:15,348
your plots.

53
00:04:17,371 --> 00:04:24,355
For this one, we'd write that most Pokemon have an Attack

54
00:04:26,252 --> 00:04:31,186
distributed between 50 and 120.
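For the Defense practice, an observation like "most Pokemon have an Attack between 50 and 120" can be checked against numbers instead of read off the plot by eye. Below is a minimal sketch using Python's standard statistics module. It assumes "most" means the middle 80% of values (10th to 90th percentile). The sample values are hypothetical; with the real data you would pass in the Defense column.

```python
from statistics import median, quantiles


def summarize(values):
    """Return the median and the 10th/90th percentiles of `values`.

    Treating the middle 80% as "most" is an assumption for this sketch.
    """
    deciles = quantiles(values, n=10)  # 9 cut points: 10th, 20th, ..., 90th percentile
    return {"median": median(values), "p10": deciles[0], "p90": deciles[-1]}


# Hypothetical Defense values, for illustration only.
defense = [40, 45, 50, 55, 60, 65, 65, 70, 80, 85, 90, 100, 110, 120, 150]
s = summarize(defense)
print(f"Most values (middle 80%) fall between {s['p10']:.0f} and {s['p90']:.0f}; "
      f"the median is {s['median']}.")
```

The printed sentence is the kind of line you could paste into a Markdown cell next to your Defense plots.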