1
00:00:00,368 --> 00:00:04,088
Now we will import our data set.

2
00:00:04,088 --> 00:00:09,770
Use Pandas to import the data set and call head to get a preview of the data.

3
00:00:09,770 --> 00:00:14,982
In cell 3, we'll say pokemon

4
00:00:14,982 --> 00:00:23,608
= pd.read_csv('pokemon_40.csv').

5
00:00:26,752 --> 00:00:33,950
On the next line, pokemon.head(), then run the cell.

6
00:00:35,560 --> 00:00:40,445
Let's examine the data returned by the head method together

7
00:00:40,445 --> 00:00:42,619
to become familiar with it.

8
00:00:42,619 --> 00:00:45,119
Each row represents a Pokemon.

9
00:00:49,055 --> 00:00:53,210
Pokemon is a game where players battle monsters.

10
00:00:53,210 --> 00:00:58,408
Every monster, or Pokemon, has a name, a categorical type,

11
00:00:58,408 --> 00:01:04,360
like water, grass, or electric, and some numerical statistics.

12
00:01:05,610 --> 00:01:09,238
These numerical statistics include HP,

13
00:01:09,238 --> 00:01:14,222
which stands for health points, as well as attack and defense.

14
00:01:14,222 --> 00:01:18,806
There are other statistics, too, but this data set is simplified so

15
00:01:18,806 --> 00:01:21,967
that even if you are not familiar with the game,

16
00:01:21,967 --> 00:01:25,696
you will be able to perform some statistical analysis.

17
00:01:25,696 --> 00:01:30,078
If you'd like to manually examine more of the data set,

18
00:01:30,078 --> 00:01:36,243
you can open pokemon_40.csv in a new tab to look at all 40 observations.

19
00:01:41,519 --> 00:01:45,030
Let's ask some questions about our data set.

20
00:01:45,030 --> 00:01:46,830
In this stage of the course,

21
00:01:46,830 --> 00:01:51,482
I will be asking questions about the attack statistics of these Pokemon.

22
00:01:51,482 --> 00:01:56,110
Then I will perform exploratory data analysis with different

23
00:01:56,110 --> 00:02:00,568
kinds of plots in order to find answers to these questions.
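To recap the import step from cell 3, here is a minimal sketch that runs without the course file. The column names (Name, Type, HP, Attack, Defense) and every row value are assumptions made up for illustration; the real pokemon_40.csv may be laid out differently.

```python
import io

import pandas as pd

# Hypothetical stand-in for pokemon_40.csv. The column names and values
# are invented for this sketch and are not taken from the course file.
sample_csv = io.StringIO(
    "Name,Type,HP,Attack,Defense\n"
    "Monster A,Water,44,48,65\n"
    "Monster B,Water,50,52,48\n"
    "Monster C,Grass,45,49,49\n"
    "Monster D,Grass,60,61,63\n"
    "Monster E,Electric,35,55,40\n"
    "Monster F,Electric,60,85,50\n"
)

# In the course notebook, this call would be pd.read_csv('pokemon_40.csv').
pokemon = pd.read_csv(sample_csv)

# head() returns the first five rows by default.
preview = pokemon.head()
print(preview)
```

In the notebook itself, ending the cell with pokemon.head() displays the preview, so the print call is only needed when running this as a script.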
24
00:02:00,568 --> 00:02:04,954
After each plot, I'll challenge you to explore the data for

25
00:02:04,954 --> 00:02:07,708
the Pokemon's defense statistics.

26
00:02:07,708 --> 00:02:11,038
Here are my initial questions.

27
00:02:11,038 --> 00:02:16,014
What is the relationship between Attack and HP?

28
00:02:16,014 --> 00:02:19,549
What is the distribution of Attack?

29
00:02:19,549 --> 00:02:23,763
What is the relationship between Attack and Type?

30
00:02:23,763 --> 00:02:28,429
What is the distribution of Attack for each Type?

31
00:02:28,429 --> 00:02:33,301
What is the average, or mean, Attack for each Type?

32
00:02:33,301 --> 00:02:35,593
And what is the count of Pokemon for each Type?

33
00:02:38,494 --> 00:02:44,078
Notice that a lot of these questions ask about relationships between numerical and

34
00:02:44,078 --> 00:02:45,443
categorical data.

35
00:02:45,443 --> 00:02:49,869
Categorical data is data made up of labels, or words, instead of numbers.

36
00:02:49,869 --> 00:02:55,156
The categorical data for these questions is the type of Pokemon.

37
00:02:55,156 --> 00:02:58,065
Handling this mix is one of the main strengths of Seaborn.

38
00:02:58,065 --> 00:03:01,075
Unlike Matplotlib, which is geared mainly toward

39
00:03:01,075 --> 00:03:04,329
creating plots with numerical data,

40
00:03:04,329 --> 00:03:09,880
we can use Seaborn to analyze data sets that mix categorical and numerical values.
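As a rough numerical cross-check on the last two questions (mean Attack per Type and count of Pokemon per Type), here is a minimal pandas sketch. It is not the plotting approach used in this course, and the rows and column names are assumptions invented for illustration rather than values from pokemon_40.csv.

```python
import pandas as pd

# Hypothetical rows; the names, types, and Attack values are invented
# for this sketch and are not taken from the course data set.
pokemon = pd.DataFrame(
    {
        "Name": ["Monster A", "Monster B", "Monster C",
                 "Monster D", "Monster E", "Monster F"],
        "Type": ["Water", "Water", "Grass", "Grass", "Electric", "Electric"],
        "Attack": [48, 52, 49, 61, 55, 85],
    }
)

# Mean of the numerical column within each category of the categorical column.
mean_attack = pokemon.groupby("Type")["Attack"].mean()

# Number of rows (Pokemon) in each category.
type_counts = pokemon["Type"].value_counts()

print(mean_attack)
print(type_counts)
```

Both results are indexed by Type, which is the same numerical-by-categorical grouping that the questions above ask about.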