Introduction to Data Visualization with Matplotlib


Our Data Set - Flower Power

4:07 with Ken Alger

An introduction to the dataset we'll be using in this stage of the course, the Iris Data Set.


[MUSIC] 0:00

So far, we've generated graphs with lists of data. 0:04

That works pretty well. 0:08

But we really can't expect 0:10

all of the data we want to explore to be in the form of standard Python lists. 0:11

Matplotlib relies heavily on NumPy arrays as a data type. 0:16

We won't cover NumPy arrays in detail in this course, but 0:21

I've included links to further resources in the teacher's notes. 0:23

For this course, we'll be converting our data into Python lists, and 0:27

we'll see how NumPy can be leveraged in a future course. 0:31

Now, the Treehouse studio, where I am right now, is in Portland, Oregon. 0:35

Approximately one hour south of Portland, 0:39

you'll find yourself in the heart of Oregon's iris farms. 0:41

In fact, every May, 0:44

there's an entire festival dedicated to the humble iris flower. 0:46

Other than a bit of trivia, what does that have to do with data visualization? 0:50

Well, in 1936, a British statistician and biologist named 0:55

Ronald Fisher used a dataset from Edgar Anderson on variations in iris flowers. 0:59

This has become a popular dataset to explore and can make for 1:05

some interesting visualizations. 1:08

The dataset looks at three species of iris flowers and records the length and 1:10

width of their sepals and petals. 1:14

For those who have forgotten or never took botany, the sepals 1:18

typically protect the flower when it's in the bud stage, and 1:21

support the petals when the flower is in bloom. 1:25

The petals surround the reproductive portion of the flower and 1:28

are often designed to attract pollinators. 1:30

In the case of irises, the sepals and petals are very distinctive. 1:33

The dataset itself is relatively small, with only 150 samples. 1:38

This will be great for our purposes, as it will allow us to generate 1:42

a variety of charts without providing an overabundance of data points. 1:46

But it's large enough for us to work with several different charts, and 1:50

see how they work from a reporting standpoint. 1:54

We're going to look at this dataset with three different charts to see what conclusions, 1:57

if any, we can draw from our iris dataset. 2:00

Let's briefly look at our dataset and prepare it for matplotlib. 2:05

Let's start by looking at the iris_source.txt file. 2:09

It provides information about the data source and 2:14

describes the attributes of the data in iris.csv. 2:18

Since iris.csv doesn't have a header row, this text file is a great resource. 2:21

The columns are sepal length, sepal width, petal length, 2:25

petal width, and the botanical class to which the iris belongs. 2:30
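Because iris.csv has no header row, the column names have to come from iris_source.txt. A minimal sketch, assuming you want named access to each field rather than bare indexes: the field names below (`sepal_length` through `iris_class`) and the helper `to_sample` are hypothetical choices, not names used in the course, and the example row uses the `Iris-setosa` label style of the UCI copy of the data, which may differ from the course file.

```python
from collections import namedtuple

# Hypothetical field names, in the column order described by iris_source.txt.
# iris.csv has no header row, so we supply these names ourselves.
IRIS_COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width", "iris_class"]

IrisSample = namedtuple("IrisSample", IRIS_COLUMNS)


def to_sample(row):
    """Turn one raw csv row (a list of 5 strings) into a named record.

    _make raises TypeError if the row does not have exactly 5 fields.
    """
    return IrisSample._make(row)


# Example row in the UCI label style (an assumption about the course file's format).
sample = to_sample(["5.1", "3.5", "1.4", "0.2", "Iris-setosa"])
print(sample.petal_length, sample.iris_class)  # 1.4 Iris-setosa
```

Note that the values are still strings at this point; csv.reader does no type conversion.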

If we open up iris.csv, 2:35

we see 50 samples of each 2:40

of the three classes. 2:44
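One quick way to confirm the per-class counts in code is to tally the last column of each row. This is a sketch, assuming rows shaped like csv.reader output with the class label in the final field; `count_by_class` is a hypothetical helper, and the four rows below are illustrative only, not the full data set.

```python
from collections import Counter


def count_by_class(rows):
    """Count samples per class; the class label is the last field of each row.

    Empty rows (csv.reader yields [] for blank lines) are skipped.
    """
    return Counter(row[-1] for row in rows if row)


# A few illustrative rows, not the full 150-sample data set.
rows = [
    ["5.1", "3.5", "1.4", "0.2", "Iris-setosa"],
    ["7.0", "3.2", "4.7", "1.4", "Iris-versicolor"],
    ["6.3", "3.3", "6.0", "2.5", "Iris-virginica"],
    ["4.9", "3.0", "1.4", "0.2", "Iris-setosa"],
]
counts = count_by_class(rows)
print(counts)
```

Run against the full list of rows from iris.csv, each of the three classes should come back as 50 if the file matches the description in iris_source.txt.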

So let's go back and create a new notebook. 2:53

We'll need our imports. 3:00

We'll be using the csv module to read our file. 3:02

So we need to import that. 3:05

Now, we need our input filename and path. 3:10

Let's assign it to a variable to make it easier to change and 3:12

provide more readable code. 3:15

Finally, we'll read the rows from our CSV file and 3:24

append them to a list; then we'll be ready to tackle some data vis. 3:27

with open(input_file, 'r') 3:34

We'll open that as iris_data. 3:41

We'll build our list, irises, from csv.reader. 3:45
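Put together, the steps described in the last few lines (import csv, store the filename in `input_file`, open it as `iris_data`, and build `irises` from `csv.reader`) might look like the sketch below. It is not the exact notebook code: the `load_irises` wrapper, the `newline=""` argument (recommended by the csv module documentation), and the skipping of blank lines are additions, and the demo writes a throwaway two-row file because the real path to iris.csv depends on your workspace.

```python
import csv
import os
import tempfile


def load_irises(input_file):
    """Read every row of a header-less CSV file into a list of lists of strings."""
    # newline="" is what the csv module docs recommend for files passed to csv.reader.
    with open(input_file, "r", newline="") as iris_data:
        # csv.reader yields [] for a blank line (e.g. a trailing newline at the end
        # of the file); skipping those is an extra precaution, not part of the lesson.
        irises = [row for row in csv.reader(iris_data) if row]
    return irises


# Demo with a temporary two-row file; in the notebook, input_file would be iris.csv.
with tempfile.TemporaryDirectory() as tmp:
    input_file = os.path.join(tmp, "iris.csv")
    with open(input_file, "w", newline="") as f:
        f.write("5.1,3.5,1.4,0.2,Iris-setosa\n4.9,3.0,1.4,0.2,Iris-setosa\n\n")
    irises = load_irises(input_file)

print(len(irises), irises[0])  # 2 ['5.1', '3.5', '1.4', '0.2', 'Iris-setosa']
```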

Now that we have our data in a usable data structure for matplotlib, 3:56

let's take a short break. 4:00

In our next video, we'll start doing some data visualization, 4:01

and we'll see what conclusions bloom from our data. 4:04
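One detail worth keeping in mind before plotting: csv.reader returns every field as a string, so a measurement column is usually converted to numbers first. A minimal sketch, assuming `irises` is a list of rows as read above; `column_as_floats` is a hypothetical helper, and index 2 follows the column order from iris_source.txt (petal length).

```python
def column_as_floats(rows, index):
    """Return one measurement column as a list of floats (csv.reader gives strings)."""
    return [float(row[index]) for row in rows]


# Illustrative rows only; in the notebook these come from iris.csv.
irises = [
    ["5.1", "3.5", "1.4", "0.2", "Iris-setosa"],
    ["7.0", "3.2", "4.7", "1.4", "Iris-versicolor"],
]
petal_lengths = column_as_floats(irises, 2)  # column 2 = petal length
print(petal_lengths)  # [1.4, 4.7]
```

Plain Python lists of floats like this can be passed straight to matplotlib plotting functions, which keeps with the course's choice of lists over NumPy arrays.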