Our Data Set - Flower Power 4:07 with Ken Alger
An introduction to the dataset we'll be using in this stage of the course, the Iris Data Set.
Instructions to add files to your notebook
- Download the project files using the Download tab above.
- Unzip the file and navigate to the folder "s2v1" (Stage 2, Video 1).
- In your Notebook, go to File > Open and navigate to the unzipped folder.
From there you can view files just like we're doing in the video.
[MUSIC] 0:00 So far, we've generated graphs with lists of data. 0:04 That works pretty well. 0:08 But we really can't expect 0:10 all of the data we want to explore to be in the form of standard Python lists. 0:11 Matplotlib relies heavily on NumPy arrays as a data type. 0:16 We won't cover NumPy arrays in detail in this course, but 0:21 I've included links to further resources in the teacher's notes. 0:23 For this course, we'll be converting our data into Python lists, and 0:27 we'll see how NumPy can be leveraged in a future course. 0:31 Now, the Treehouse studio, where I'm at right now, is in Portland, Oregon. 0:35 Approximately one hour south of Portland, 0:39 you'll find yourself in the heart of Oregon's iris farms. 0:41 In fact, every May, 0:44 there's an entire festival dedicated to the humble iris flower. 0:46 Other than a bit of trivia, what does that have to do with data visualization? 0:50 Well, in 1936, a British statistician and biologist named 0:55 Ronald Fisher used a dataset from Edgar Anderson on variations in iris flowers. 0:59 This has become a popular dataset to explore and can make for 1:05 some interesting visualizations. 1:08 The dataset takes a look at three species of iris flowers, and the length and 1:10 width of their flowers' sepals and petals. 1:14 For those who have forgotten, or never took botany, the sepals 1:18 typically function as protection for the flower when it's in bud stage, and 1:21 offer support for the petals when the flower is in bloom. 1:25 The petals surround the reproductive portion of the flower and 1:28 are often designed to attract pollinators. 1:30 In the case of irises, the sepals and petals are very distinctive. 1:33 The dataset itself is relatively small, with only 150 samples. 1:38 This will be great for our purposes, as it will allow us to generate 1:42 a variety of charts without providing an overabundance of data points.
1:46 But it is large enough for us to work with several different charts, and 1:50 see how they work from a reporting standpoint. 1:54 We're going to take a look at this dataset with three different charts to see what, 1:57 if any, conclusions we can draw from our Iris dataset. 2:00 Let's briefly look at our dataset and prepare it for matplotlib. 2:05 Let's start by looking at the iris_source.txt file. 2:09 It provides the information about the data source and 2:14 shows us the attributes of the data in iris.csv. 2:18 Since that file doesn’t have a header row, this text file’s a great resource. 2:21 The columns are sepal length, sepal width, petal length, 2:25 petal width, and the botanical class to which the iris belongs. 2:30 If we open up iris.csv, 2:35 we see the 50 different 2:40 samples of each class. 2:44 So let's go back and create a new notebook. 2:53 We'll need our imports. 3:00 We'll be using the csv module to read our file. 3:02 So we need to import that. 3:05 Now, we need our input filename and path. 3:10 Let's assign it to a variable to make it easier to change and 3:12 provide more readable code. 3:15 Finally, we'll read the rows from our csv file and 3:24 append them to a list, then we'll be ready to tackle some data vis. 3:27 with open(input_file, 'r') 3:34 We'll open that as iris_data. 3:41 We'll make our list irises from our csv.reader. 3:45 Now that we have our data in a usable data structure for matplotlib, 3:56 let's take a short break. 4:00 In our next video, we'll start doing some data visualization, 4:01 and we'll see what conclusions bloom from our data. 4:04
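The steps narrated in the transcript (import csv, keep the filename in a variable, open the file, and append each row from csv.reader to a list) might look roughly like the sketch below. Only the names input_file, iris_data, and irises come from the video; the function name load_irises and the COLUMNS labels are assumptions added for illustration, and the exact code typed in the video may differ.

```python
import csv

# Hypothetical labels for the five columns, in the order the transcript
# lists them from iris_source.txt (iris.csv itself has no header row).
COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]


def load_irises(input_file):
    """Read every non-empty row of a headerless CSV file into a list of lists."""
    irises = []
    with open(input_file, "r", newline="") as iris_data:
        for row in csv.reader(iris_data):
            if row:  # csv.reader yields [] for blank lines; skip them
                irises.append(row)
    return irises
```

Assuming iris.csv sits next to the notebook, a call such as `irises = load_irises("iris.csv")` would return one list per sample. Every value comes back as a string, so the four measurements would still need converting with float() before plotting.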