1 00:00:00,000 --> 00:00:04,974 [MUSIC] 2 00:00:04,974 --> 00:00:08,690 So far, we've generated graphs with lists of data. 3 00:00:08,690 --> 00:00:10,010 That works pretty well. 4 00:00:10,010 --> 00:00:11,670 But we really can't expect 5 00:00:11,670 --> 00:00:15,140 all of the data we want to explore to be in the form of standard Python lists. 6 00:00:16,230 --> 00:00:19,820 Matplotlib relies heavily on NumPy arrays as a data type. 7 00:00:21,110 --> 00:00:23,720 We won't cover NumPy arrays in detail in this course, but 8 00:00:23,720 --> 00:00:26,870 I've included links to further resources in the teacher's notes. 9 00:00:27,920 --> 00:00:31,090 For this course, we'll be converting our data into Python lists, and 10 00:00:31,090 --> 00:00:34,390 we'll see how NumPy can be leveraged in a future course. 11 00:00:35,400 --> 00:00:39,430 Now, the Treehouse studio, where I'm at right now, is in Portland, Oregon. 12 00:00:39,430 --> 00:00:41,930 Approximately one hour south of Portland, 13 00:00:41,930 --> 00:00:44,790 you'll find yourself in the heart of Oregon's iris farms. 14 00:00:44,790 --> 00:00:46,420 In fact, every May, 15 00:00:46,420 --> 00:00:50,670 there's an entire festival dedicated to the humble iris flower. 16 00:00:50,670 --> 00:00:54,030 Other than a bit of trivia, what does that have to do with data visualization? 17 00:00:55,160 --> 00:00:59,680 Well, in 1936, a British statistician and biologist named 18 00:00:59,680 --> 00:01:05,210 Ronald Fisher used a dataset from Edgar Anderson on variations in iris flowers. 19 00:01:05,210 --> 00:01:08,280 This has become a popular dataset to explore and can make for 20 00:01:08,280 --> 00:01:10,752 some interesting visualizations. 21 00:01:10,752 --> 00:01:14,820 The dataset takes a look at three species of iris flowers, and the length and 22 00:01:14,820 --> 00:01:18,020 width of their sepals and petals.
23 00:01:18,020 --> 00:01:21,460 For those who have forgotten, or never took botany, the sepals 24 00:01:21,460 --> 00:01:25,080 typically function as protection for the flower when it's in bud stage, and 25 00:01:25,080 --> 00:01:28,060 offer support for the petals when the flower is in bloom. 26 00:01:28,060 --> 00:01:30,710 The petals surround the reproductive portion of the flower and 27 00:01:30,710 --> 00:01:32,870 are often designed to attract pollinators. 28 00:01:33,880 --> 00:01:37,290 In the case of irises, the sepals and petals are very distinctive. 29 00:01:38,552 --> 00:01:42,690 The dataset itself is relatively small, with only 150 samples. 30 00:01:42,690 --> 00:01:46,020 This will be great for our purposes, as it will allow us to generate 31 00:01:46,020 --> 00:01:50,200 a variety of charts without providing an overabundance of data points. 32 00:01:50,200 --> 00:01:54,050 But it's large enough for us to work with several different charts, and 33 00:01:54,050 --> 00:01:55,950 see how they work from a reporting standpoint. 34 00:01:57,260 --> 00:02:00,990 We're going to take a look at this dataset with three different charts to see what, 35 00:02:00,990 --> 00:02:04,102 if any, conclusions we can make with our Iris dataset. 36 00:02:05,330 --> 00:02:08,120 Let's briefly look at our dataset and prepare it for matplotlib. 37 00:02:09,360 --> 00:02:12,185 Let's start by looking at the iris_source.txt file. 38 00:02:14,742 --> 00:02:18,235 It provides the information about the data source and 39 00:02:18,235 --> 00:02:21,502 shows us the attributes of the data in iris.csv. 40 00:02:21,502 --> 00:02:25,750 Since that file doesn’t have a header row, this text file’s a great resource. 41 00:02:25,750 --> 00:02:30,088 The columns are sepal length, sepal width, petal length, 42 00:02:30,088 --> 00:02:35,700 petal width, and the botanical class to which the iris belongs.
43 00:02:35,700 --> 00:02:40,199 If we open up iris.csv, 44 00:02:40,199 --> 00:02:44,700 we see the 50 different 45 00:02:44,700 --> 00:02:48,996 samples of each class. 46 00:02:53,351 --> 00:02:55,576 So let's go back and create a new notebook. 47 00:03:00,962 --> 00:03:02,740 We'll need our imports. 48 00:03:02,740 --> 00:03:05,440 We'll be using the csv module to read our file. 49 00:03:05,440 --> 00:03:06,450 So we need to import that. 50 00:03:10,040 --> 00:03:12,870 Now, we need our input filename and path. 51 00:03:12,870 --> 00:03:15,545 Let's assign it to a variable to make it easier to change and 52 00:03:15,545 --> 00:03:16,893 provide more readable code. 53 00:03:24,681 --> 00:03:27,822 Finally, we'll read the rows from our csv file and 54 00:03:27,822 --> 00:03:31,720 append them to a list, then we'll be ready to tackle some data vis. 55 00:03:34,290 --> 00:03:36,685 with open(input_file, 'r') 56 00:03:41,333 --> 00:03:45,858 We'll open that as iris_data. 57 00:03:45,858 --> 00:03:52,249 We'll make our list irises from our csv.reader. 58 00:03:56,671 --> 00:04:00,016 Now that we have our data in a usable data structure for matplotlib, 59 00:04:00,016 --> 00:04:01,440 let's take a short break. 60 00:04:01,440 --> 00:04:04,580 In our next video, we'll start doing some data visualization, 61 00:04:04,580 --> 00:04:07,210 and we'll see what conclusions bloom from our data.
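The video types the notebook cell on screen without reading all of it aloud, so here is a minimal sketch of what the steps described above might look like. The names input_file, iris_data, and irises come from the narration; the helper read_irises, the path "iris.csv", and the two in-memory sample rows are assumptions added so the snippet runs without the course files.

```python
import csv
import io


def read_irises(iris_data):
    """Read every row from an open CSV file object into a list of lists."""
    # csv.reader yields each row as a list of strings; list() collects them all.
    return list(csv.reader(iris_data))


# In the notebook, the cell would presumably look something like this
# (the path below is an assumption, not taken from the video):
#
#     input_file = "iris.csv"
#     with open(input_file, 'r') as iris_data:
#         irises = read_irises(iris_data)

# Stand-in for the file: two illustrative rows in the column order described,
# which is sepal length, sepal width, petal length, petal width, class.
sample = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
)
irises = read_irises(sample)
print(irises[0])
```

Note that csv.reader returns every value as a string, so the measurements would still need converting with float() before plotting.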