Let’s use a scatter plot to see what correlations, if any, there are between sepal length and sepal width, based on the variety of iris.
Color Dict
colors = {"Iris-setosa": "#2B5B84", "Iris-versicolor": "g", "Iris-virginica": "purple"}
Correlations
- Positive Correlation: as one variable increases, so does the other. Height and shoe size are an example; as a person's height increases, so does their shoe size.
- Negative Correlation: as one variable increases, the other decreases. Time spent studying and time spent on video games are negatively correlated; as your time studying increases, your time spent on video games decreases.
- No Correlation: there is no apparent relationship between the variables. Video game scores and shoe size appear to have no correlation; as one changes, the other shows no consistent change.
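One common way to put a number on these three cases is Pearson's correlation coefficient r. It runs from -1 (perfect negative) through 0 (no linear relationship) to +1 (perfect positive). The sketch below computes it by hand in plain Python. The function name `pearson_r` and the toy numbers are illustrative only and are not part of the course code.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: do x and y move away from their means in the same direction?
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Spread of each variable around its own mean.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy data, made up for illustration only.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # positive: 1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # negative: -1.0
print(pearson_r([1, 2, 3, 4], [1, 3, 3, 1]))  # no correlation: 0.0
```

Note that r only measures linear relationships, and it is undefined if either variable is constant (the denominator becomes zero).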
We left off with our data in a list, just waiting to be used. Let's use a scatter plot to see what correlations, if any, there are between sepal length and sepal width, based on the variety of iris.
[SOUND] Recall that scatter plots show how much one variable is affected by another, that is, their correlation. We use scatter plots to show relationships between values; in this case, our sepal length and width. A scatter plot lets us quickly visualize the distribution of the data and notice any outliers. From the results, we can see whether there's a positive, negative, or nonexistent correlation in our data.
Let's jump back into our Python code and develop our chart.
Let's start from the previous video's code and rename this notebook iris_scatter. Since we'll be working with plots now, we'll need to import matplotlib.pyplot into our project: import matplotlib.pyplot as plt.
Let's create a dict for our marker colors that we can use as we loop through our list. The colors let us tell the different iris classes apart more easily and visualize a third variable in our data set. I'll paste that dict in here; I've included a copy of it in the teacher's notes. Our colors, then, are the blue hue #2B5B84 for setosa, green (the short code "g") for versicolor, and purple for virginica.
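The dict works as a lookup table from each species label to a matplotlib color: "#2B5B84" is a hex RGB string, "g" is matplotlib's single-letter shorthand for green, and "purple" is a named color. Here is a minimal sketch of that lookup. It assumes, as in the UCI iris.data file, that the species label is the fifth and last field of each row. The helper name `color_for` is hypothetical.

```python
colors = {"Iris-setosa": "#2B5B84", "Iris-versicolor": "g", "Iris-virginica": "purple"}


def color_for(row, species_index=4):
    """Return the marker color for one data row, keyed by its species label.

    Assumption: the species name sits at `species_index` (the last of five fields).
    A KeyError here means the row's label matches none of the keys in `colors`.
    """
    return colors[row[species_index]]


print(color_for(["5.1", "3.5", "1.4", "0.2", "Iris-setosa"]))  # #2B5B84
```

Letting a missing key raise a KeyError, rather than falling back to a default color, surfaces label mismatches (for example, stray whitespace) as soon as the loop runs.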
Our list of iris data also includes an extra item at the end that we don't need, so let's pop that off.
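The transcript doesn't say what the extra item is. One plausible cause is a blank final line in the data file, which `csv.reader` turns into an empty list. Below is a minimal sketch under that assumption. `load_rows` is a hypothetical helper, and the sample text is made up.

```python
import csv
import io


def load_rows(text):
    """Parse CSV text into a list of rows, dropping one trailing empty row if present."""
    rows = list(csv.reader(io.StringIO(text)))
    # Assumption: the "extra item" is the empty list produced by a blank last line.
    # Checking before popping avoids throwing away a real data row.
    if rows and not rows[-1]:
        rows.pop()
    return rows


sample = "5.1,3.5,1.4,0.2,Iris-setosa\n\n"
print(load_rows(sample))  # [['5.1', '3.5', '1.4', '0.2', 'Iris-setosa']]
```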
Now we'll want to loop through our list and assign our x- and y-values to the sepal length and width, which are in the first and second columns, respectively. A function in the itertools library called groupby makes it easy to loop through the rows one species at a time. Let's add that import first, and I'll show you the code: from itertools import groupby. If you haven't used itertools, it's a module that provides functions for efficient looping; check the teacher's notes for additional information. To start, we loop over species and group in groupby. Note that group is a one-pass iterator, much like a generator, so you can only go over it one time.
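Two details about groupby matter here. First, it only groups consecutive items that share a key. That works for the iris data as long as the rows are already ordered by species. Second, each group is a one-pass iterator, so you need to copy it into a list if you want to read it more than once. The sketch below illustrates both points. The key index 4 assumes the species is the last field, as in iris.data. The `sorted` call is an extra safeguard that the video doesn't use, and `rows_by_species` is a hypothetical helper.

```python
from itertools import groupby
from operator import itemgetter


def rows_by_species(rows, species_index=4):
    """Return {species: [rows]} using itertools.groupby.

    groupby only merges *adjacent* rows with equal keys, so the rows are
    sorted by species first in case the input isn't already ordered.
    """
    key = itemgetter(species_index)
    grouped = {}
    for species, group in groupby(sorted(rows, key=key), key=key):
        # Materialize the one-pass group iterator before storing it.
        grouped[species] = list(group)
    return grouped


# Demonstrates that a group can only be consumed once.
_, group = next(groupby("aab"))
print(list(group))  # ['a', 'a']
print(list(group))  # [] (the iterator is already exhausted)
```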
Then we build our sepal lengths, converting each value to a float. The sepal widths work the same way.
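Because the values are read from a text file, they arrive as strings, and `float()` converts them before plotting. The sketch below assumes sepal length and width are the first two fields of each row. The helper name `sepal_columns` is hypothetical. It also shows why the one-pass nature of a group matters: building two lists from the same iterator leaves the second one empty.

```python
def sepal_columns(rows):
    """Return (sepal_lengths, sepal_widths) as floats.

    Assumes sepal length and width are the first two fields of each row.
    `rows` should be re-iterable (e.g. a list): with a one-pass iterator such
    as a groupby group, the second comprehension sees nothing.
    """
    sepal_lengths = [float(row[0]) for row in rows]
    sepal_widths = [float(row[1]) for row in rows]
    return sepal_lengths, sepal_widths


print(sepal_columns([["5.1", "3.5", "1.4", "0.2", "Iris-setosa"]]))  # ([5.1], [3.5])
```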
Then we pass those to plt.scatter: sepal_lengths for our x-values and sepal_widths for our y-values. We'll give it a marker size of 10. The color, c, comes from our colors dict, looked up by species, and we'll set the label based on species as well.
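Putting the loop together might look roughly like the sketch below. This is a reconstruction, not the course's exact code. The groupby key (`itemgetter(4)`), the column positions, and the function name `plot_sepals` are assumptions. In `plt.scatter`, `s` is the marker area in points squared, `c` sets the marker color, and `label` supplies the legend text for each species.

```python
from itertools import groupby
from operator import itemgetter

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

colors = {"Iris-setosa": "#2B5B84", "Iris-versicolor": "g", "Iris-virginica": "purple"}


def plot_sepals(rows):
    """Draw one scatter series per species and return the current Axes.

    Assumes rows are ordered by species, with sepal length and width in the
    first two fields and the species label in the fifth.
    """
    for species, group in groupby(rows, key=itemgetter(4)):
        members = list(group)  # the group iterator can only be read once
        sepal_lengths = [float(row[0]) for row in members]
        sepal_widths = [float(row[1]) for row in members]
        plt.scatter(sepal_lengths, sepal_widths, s=10, c=colors[species], label=species)
    return plt.gca()
```

Calling `plt.scatter` once per species, rather than once with a list of per-point colors, gives each species its own labeled series, which is what lets `plt.legend` list them.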
Now, before we call plt.show, let's add a plot title, axis labels, and a legend to give our data some context. This is an important thing to remember: always label your axes, legends, and charts. For plt.title, we'll use Fisher's Iris Data Set with a fontsize of 12. Our xlabel is sepal length in centimeters, with a fontsize of 10. Our ylabel is sepal width, again in centimeters, with a fontsize of 10 as well. Then we'll call plt.legend and give it a location in the upper right. This displays the legend in the upper-right corner of the chart, but we could also place it in the upper left, upper center, lower left, and so on. Since there aren't any data points in the upper right, that seems like a good position.
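The labeling calls described above might look like this sketch. The exact axis-label strings are assumptions, since the transcript only says they are sepal length and width in centimeters. The function name `label_chart` is hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt


def label_chart():
    """Add a title, axis labels, and a legend to the current Axes."""
    plt.title("Fisher's Iris Data Set", fontsize=12)
    plt.xlabel("Sepal Length (cm)", fontsize=10)  # label text is an assumption
    plt.ylabel("Sepal Width (cm)", fontsize=10)  # label text is an assumption
    # loc also accepts e.g. "upper left", "upper center", "lower left", or "best".
    plt.legend(loc="upper right")
    return plt.gca()


# A fresh figure with one labeled point so the legend has something to show.
plt.figure()
plt.scatter([5.1], [3.5], s=10, c="#2B5B84", label="Iris-setosa")
ax = label_chart()
```

If you'd rather not pick a corner by hand, `loc="best"` asks matplotlib to choose the position that overlaps the fewest data points.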
Now we just call plt.show and run our cell. We can see some patterns in our sepal data. Iris-setosa forms a fairly tight grouping in the upper-left quadrant of our chart, though there are some outliers. The other two varieties seem to be clumped together and intermixed, with some even greater outliers.
Our plot looks a bit small, though. Let's set a size for our figure to make it easier to see. To do that, we go up near the top of our code, right under input_file, which is a fairly standard spot for it, and call plt.figure with a figsize argument. A figsize of 7.5 by 4.25 seems to work pretty well, so let's run our cell again. There, that's better.
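Here is a minimal sketch of the sizing step, assuming it is done with `plt.figure(figsize=...)`. The width and height are in inches, and matplotlib's default is 6.4 by 4.8. The figure has to be created before the plotting calls so that the scatter series, labels, and legend all land on it. The plotted points are sample values for illustration only.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Create the figure *before* any plotting calls so everything drawn
# afterwards lands on this 7.5 x 4.25 inch figure.
fig = plt.figure(figsize=(7.5, 4.25))

# Sample points for illustration; the real loop over species goes here.
plt.scatter([5.1, 4.9], [3.5, 3.0], s=10, c="#2B5B84", label="Iris-setosa")
```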
From an analysis standpoint, we can draw some conclusions from this chart. It appears that all three iris varieties have a positive correlation between sepal length and width, and that Iris-setosa's positive correlation is more clearly defined than those of the other varieties.
Scatter plots are, of course, only one way to explore our data.