An introduction to the dataset we'll be using in this stage of the course, the Iris Data Set.
Instructions to add files to your notebook
- Download the project files using the Download tab above.
- Unzip the file and navigate to the folder "s2v1" (Stage 2, Video 1).
- In your Notebook, go to File > Open and navigate to the unzipped folder.
From there you can view files just like we're doing in the video.
Further Reading
- NumPy Array
- Introduction to NumPy (Treehouse course)
- Iris Data Set
[MUSIC]
0:00
So far,
we've generated graphs with lists of data.
0:04
That works pretty well.
0:08
But we really can't expect
0:10
all of the data we want to explore to be
in the form of standard Python lists.
0:11
Matplotlib relies heavily on
NumPy arrays as a data type.
0:16
We won't cover NumPy arrays
in detail in this course, but
0:21
I've included links to further
resources in the teacher's notes.
0:23
For this course, we'll be converting
our data into Python lists, and
0:27
we'll see how NumPy can be
leveraged in a future course.
0:31
Now, the Treehouse studio, where I am
right now, is in Portland, Oregon.
0:35
Approximately one hour south of Portland,
0:39
you'll find yourself in the heart
of Oregon's iris farms.
0:41
In fact, every May,
0:44
there's an entire festival dedicated
to the humble iris flower.
0:46
Other than a bit of trivia, what does
that have to do with data visualization?
0:50
Well, in 1936, a British statistician and
biologist named
0:55
Ronald Fisher, used a dataset from Edgar
Anderson on variations in iris flowers.
0:59
This has become a popular dataset
to explore and can make for
1:05
some interesting visualizations.
1:08
The dataset takes a look at three species
of iris flowers, and the length and
1:10
width of their flowers'
sepals and petals.
1:14
For those who have forgotten, or
never took botany, the sepals
1:18
typically function as protection for
the flower when it's in bud stage, and
1:21
offer support for
the petals when the flower is in bloom.
1:25
The petals surround the reproductive
portion of the flower and
1:28
are often designed to attract pollinators.
1:30
In the case of irises, the sepals and
petals are very distinctive.
1:33
The dataset itself is relatively small,
with only 150 samples.
1:38
This will be great for our purposes,
as it will allow us to generate
1:42
a variety of charts without providing
an overabundance of data points.
1:46
But it's large enough for us to work
with several different charts, and
1:50
see how they work from
a reporting standpoint.
1:54
We're going to take a look at this dataset
with three different charts to see what,
1:57
if any, conclusions we can
draw from our Iris dataset.
2:00
Let's briefly look at our dataset and
prepare it for matplotlib.
2:05
Let's start by looking at
the iris_source.txt file.
2:09
It provides the information
about the data source and
2:14
shows us the attributes
of the data in iris.csv.
2:18
Since that file doesn't have a header row,
this text file's a great resource.
2:21
The columns are sepal length,
sepal width, petal length,
2:25
petal width, and the botanical
class to which the iris belongs.
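As a rough sketch of how those five columns could be labeled in code: the names in COLUMNS and the helper label_row are hypothetical, not part of the course files, and the sample row assumes the value format found in the commonly distributed Iris data.

```python
# Hypothetical column names, in the order listed in iris_source.txt.
COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]


def label_row(row):
    """Pair one raw CSV row (a list of strings) with the column names.

    The four measurements are converted to floats; the class label
    stays a string.
    """
    measurements = [float(value) for value in row[:4]]
    return dict(zip(COLUMNS, measurements + [row[4]]))


# Assumed sample row; the real rows come from iris.csv.
example = label_row(["5.1", "3.5", "1.4", "0.2", "Iris-setosa"])
print(example["petal_length"])  # 1.4
```

Since the file has no header row, keeping the names in one place like this makes it easier to remember which position holds which measurement.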
2:30
If we open up iris.csv,
2:35
we see the 50 different
2:40
samples of each class.
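One way to check that split once the rows are in a list is to count the class labels. This is only a sketch: count_classes is a hypothetical helper, and the stand-in rows below assume the class label sits in the last column, as described in iris_source.txt.

```python
from collections import Counter


def count_classes(rows):
    """Count how many rows belong to each class (the last column).

    Empty rows, such as a trailing blank line in the file, are skipped.
    """
    return Counter(row[-1] for row in rows if row)


# Stand-in rows so the sketch runs on its own; with the real file,
# pass in the list of rows read from iris.csv instead.
rows = [
    ["5.1", "3.5", "1.4", "0.2", "Iris-setosa"],
    ["4.9", "3.0", "1.4", "0.2", "Iris-setosa"],
    ["7.0", "3.2", "4.7", "1.4", "Iris-versicolor"],
    [],
]
counts = count_classes(rows)
print(counts["Iris-setosa"])  # 2
```

Run against the full dataset, each of the three classes should come back with a count of 50.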
2:44
So let's go back and
create a new notebook.
2:53
We'll need our imports.
3:00
We'll be using the csv
module to read our file.
3:02
So we'll need to import that.
3:05
Now, we need our input filename and path.
3:10
Let's assign it to a variable
to make it easier to change and
3:12
provide more readable code.
3:15
Finally, we'll read
the rows from our csv file and
3:24
append them to a list, then we'll
be ready to tackle some data vis.
3:27
With open(input_file, 'r')
3:34
We'll open that as iris_data.
3:41
We'll make our list irises
from our csv.reader.
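Put together, the steps narrated above might look roughly like the following. The names input_file, iris_data, and irises come from the video; the load_rows wrapper and the small stand-in file are assumptions added so the sketch runs without the project files.

```python
import csv
import os
import tempfile


def load_rows(input_file):
    """Read every row of a CSV file into a list of lists of strings."""
    with open(input_file, 'r', newline='') as iris_data:
        return list(csv.reader(iris_data))


# Stand-in file so the sketch runs anywhere. In the notebook, set
# input_file to the path of the real iris.csv from the project files.
sample = "5.1,3.5,1.4,0.2,Iris-setosa\n7.0,3.2,4.7,1.4,Iris-versicolor\n"
input_file = os.path.join(tempfile.mkdtemp(), "iris_sample.csv")
with open(input_file, 'w', newline='') as sample_file:
    sample_file.write(sample)

irises = load_rows(input_file)
print(irises[0])  # ['5.1', '3.5', '1.4', '0.2', 'Iris-setosa']
```

Note that csv.reader returns every value as a string, so the measurements need to be converted with float() before they can be plotted as numbers.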
3:45
Now that we have our data in a usable
data structure for matplotlib,
3:56
let's take a short break.
4:00
In our next video,
we'll start doing some data visualization,
4:01
and we'll see what conclusions
bloom from our data.
4:04