      You have completed Introduction to Data Visualization with Matplotlib!
      
    
  Let’s use a scatter plot to see what correlations, if any, there are between sepal length and sepal width for each variety of iris.
Color Dict
colors = {"Iris-setosa": "#2B5B84", "Iris-versicolor": "g", "Iris-virginica": "purple"}
Correlations
- Positive Correlation: as one variable increases, so does the other. Height and shoe size are an example; as a person's height increases, their shoe size tends to increase too.
- Negative Correlation: as one variable increases, the other decreases. Time spent studying and time spent on video games are negatively correlated; as your time studying increases, your time spent on video games decreases.
- No Correlation: there is no apparent relationship between the variables. Video game scores and shoe size appear to have no correlation; as one increases, the other is not affected.
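The three cases above can be put on a single numeric scale with Pearson's correlation coefficient r. It ranges from -1 (perfect negative) through 0 (no linear relationship) to +1 (perfect positive). Below is a minimal pure-Python sketch. The function name `pearson_r` and the toy numbers are made up for illustration and are not course data. The third case is U-shaped and gives r close to 0 even though y clearly depends on x, because r only measures *linear* relationships.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items each")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: sum of products of each pair's deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: product of the root sums of squared deviations
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical toy data, one case per bullet above
x = [1, 2, 3, 4, 5]
positive = pearson_r(x, [2, 4, 6, 8, 10])  # close to +1
negative = pearson_r(x, [10, 8, 6, 4, 2])  # close to -1
no_corr = pearson_r(x, [2, 1, 0, 1, 2])    # close to 0 (U-shaped, not linear)
print(positive, negative, no_corr)
```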
 
                      We left off with our data in a list,
just waiting to be used.
                      0:00
                    
                    
                      Let's utilize a scatter plot to
see what correlations, if any,
                      0:04
                    
                    
                      there are between the sepal length and
width, based on the variety of iris.
                      0:07
                    
                    
                      [SOUND] Recall that scatter plots are used
to show how much one variable is impacted
                      0:12
                    
                    
                      by another, or its correlation.
                      0:16
                    
                    
                      We use scatter plots to show
relationships between values.
                      0:18
                    
                    
                      In this case, our sepal length and width.
                      0:22
                    
                    
                      The scatter plot allows us to quickly
visualize the distribution of the data and
                      0:24
                    
                    
                      notice any outliers.
                      0:28
                    
                    
                      We can see if there’s a positive,
negative, or
                      0:30
                    
                    
                      nonexistent correlation between our
data based on the scatter plot results.
                      0:33
                    
                    
                      Let's jump back into our Python code and
develop our chart.
                      0:38
                    
                    
                      Let's get started from
the previous video's code and
                      0:42
                    
                    
                      rename this notebook iris_scatter.
                      0:46
                    
                    
                      Since we'll be starting
our work with plots now,
                      0:58
                    
                    
                      we'll need to add the matplotlib.pyplot
import to our project.
                      1:02
                    
                    
                      Import matplotlib.pyplot as plt.
                      1:06
                    
                    
                      Let's create a dict for
                      1:12
                    
                    
                      our marker colors that we can use as
we loop through our list information.
                      1:13
                    
                    
                      The colors allow us to see the different
iris classes more easily, and
                      1:18
                    
                    
                      visualize a third
variable in our data set.
                      1:22
                    
                    
                      I'll paste that dict in here.
                      1:25
                    
                    
                      I have included a copy of
it in the teacher's notes.
                      1:27
                    
                    
                      Our colors then are the blue hue for
setosa.
                      1:38
                    
                    
                      Green, using the short code "g", for versicolor.
                      1:42
                    
                    
                      And purple for virginica.
                      1:45
                    
                    
                      Our list of iris data also includes an
extra item that we don't need at the end,
                      1:48
                    
                    
                      so let's pop that off.
                      1:52
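The transcript doesn't show how the previous video read the file. The sketch below assumes the rows came from a comma-separated iris.data file read with Python's `csv` module, and that the file ends with a blank line. `csv.reader` turns such a blank line into an empty list, which would be the "extra item" at the end. The in-memory `sample_file` is a hypothetical stand-in for the real file.

```python
import csv
import io

# Hypothetical stand-in for the open data file: sample rows in the
# iris.data layout (sepal length, sepal width, petal length, petal width,
# species), ending with a blank line as assumed above.
sample_file = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
    "\n"
)

iris = list(csv.reader(sample_file))
print(iris[-1])  # [] because the blank line became an empty row

# Drop the empty row; the guard avoids discarding a real row
# if the file happens not to end with a blank line.
if iris and not iris[-1]:
    iris.pop()
```

A bare `iris.pop()` works when the trailing blank line is guaranteed. The guarded version is a small safety choice, not a requirement.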
                    
                    
                      Now we'll want to loop through our array,
and assign our x and
                      1:57
                    
                    
                      y-values to the sepal length and width.
                      2:01
                    
                    
                      These are located in the first and
second columns of our array, respectively.
                      2:04
                    
                    
                      We can use a function in the itertools
library called groupby that allows us to
                      2:09
                    
                    
                      easily do that one species at a time.
                      2:14
                    
                    
                      Let's add that import first and
I'll show you the code.
                      2:15
                    
                    
                      From itertools import groupby.
                      2:23
                    
                    
                      If you haven't used itertools,
it is a module that provides functions for
                      2:30
                    
                    
                      efficient looping.
                      2:34
                    
                    
                      Check the teacher's notes for
additional information.
                      2:36
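One detail worth knowing before using `groupby`: it only merges *consecutive* items that share a key, and it does not sort anything. The iris rows are already ordered by species, so a single pass yields one group per variety. On unsorted data you would sort by the same key first. The sketch assumes the species name sits in the fifth field (index 4), as in the sample rows.

```python
from itertools import groupby
from operator import itemgetter

rows = [
    ["5.1", "3.5", "1.4", "0.2", "Iris-setosa"],
    ["7.0", "3.2", "4.7", "1.4", "Iris-versicolor"],
    ["4.9", "3.0", "1.4", "0.2", "Iris-setosa"],
]

species_of = itemgetter(4)  # species name is the fifth field

# Unsorted input: Iris-setosa appears as two separate groups
unsorted_keys = [key for key, _ in groupby(rows, key=species_of)]
print(unsorted_keys)  # ['Iris-setosa', 'Iris-versicolor', 'Iris-setosa']

# Sorting by the same key first gives exactly one group per species
sorted_keys = [key for key, _ in groupby(sorted(rows, key=species_of), key=species_of)]
print(sorted_keys)  # ['Iris-setosa', 'Iris-versicolor']
```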
                    
                    
                      So to start this, species and
                      2:38
                    
                    
                      group in groupby,
                      2:45
                    
                    
                      Group is an iterator, so
you can only go over it once.
                      2:54
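Because each group is a one-shot iterator, reading it once for sepal lengths leaves nothing for sepal widths. A common fix is to turn the group into a list first. That also matters because `groupby` invalidates the previous group once it advances to the next key. The sketch below shows the pitfall and the fix. The key function `lambda row: row[4]` is an assumption, since the transcript doesn't show the exact argument.

```python
from itertools import groupby

rows = [
    ["5.1", "3.5", "1.4", "0.2", "Iris-setosa"],
    ["4.9", "3.0", "1.4", "0.2", "Iris-setosa"],
]

# Pitfall: the first comprehension consumes the whole group
for species, group in groupby(rows, key=lambda row: row[4]):
    bad_lengths = [row[0] for row in group]
    bad_widths = [row[1] for row in group]  # nothing left to read
print(bad_lengths, bad_widths)  # ['5.1', '4.9'] []

# Fix: materialize the group once, then read it as often as needed
for species, group in groupby(rows, key=lambda row: row[4]):
    group = list(group)
    lengths = [row[0] for row in group]
    widths = [row[1] for row in group]
print(lengths, widths)  # ['5.1', '4.9'] ['3.5', '3.0']
```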
                    
                    
                      And then we'll get our sepal length.
                      3:09
                    
                    
                      It's gonna be the float value.
                      3:16
                    
                    
                      Sepal widths, similar.
                      3:32
                    
                    
                      Then we pass those to plt.scatter.
                      3:51
                    
                    
                      sepal_lengths for our x-values and sepal_widths for
our y-values there.
                      3:55
                    
                    
                      We'll assign it a marker size of 10.
                      4:05
                    
                    
                      c, for the colors, will come from our
colors dict, keyed by species.
                      4:10
                    
                    
                      And we'll label based on species as well.
                      4:17
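Putting the loop together might look roughly like the sketch below. It is a reconstruction under assumptions, not the course's exact code: the key function, the variable names, and the four sample rows are illustrative. The values are converted with `float()` because `csv` yields strings, and matplotlib treats a sequence of strings as categories rather than numbers. In `plt.scatter`, `s` is the marker area in points squared. `c` sets the color, and `label` is the text the legend picks up later. The `Agg` backend line only lets the sketch run outside a notebook.

```python
from itertools import groupby

import matplotlib
matplotlib.use("Agg")  # off-screen backend for this sketch; omit in a notebook
import matplotlib.pyplot as plt

colors = {"Iris-setosa": "#2B5B84", "Iris-versicolor": "g", "Iris-virginica": "purple"}

# Sample rows in iris.data layout, already ordered by species
iris = [
    ["5.1", "3.5", "1.4", "0.2", "Iris-setosa"],
    ["4.9", "3.0", "1.4", "0.2", "Iris-setosa"],
    ["7.0", "3.2", "4.7", "1.4", "Iris-versicolor"],
    ["6.3", "3.3", "6.0", "2.5", "Iris-virginica"],
]

for species, group in groupby(iris, key=lambda row: row[4]):
    group = list(group)  # read the one-shot group iterator exactly once
    # Columns 0 and 1 hold sepal length and width as strings; convert to float
    sepal_lengths = [float(row[0]) for row in group]
    sepal_widths = [float(row[1]) for row in group]
    plt.scatter(
        sepal_lengths,
        sepal_widths,
        s=10,               # marker area in points squared
        c=colors[species],  # one color per species
        label=species,      # text picked up later by plt.legend()
    )
```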
                    
                    
                      Now, before we call plt.show,
let's add a plot title, axes labels,
                      4:22
                    
                    
                      and legend to our chart here
to add context to our data.
                      4:27
                    
                    
                      This is an important thing to remember.
                      4:31
                    
                    
                      Always label your axes,
legends, and charts.
                      4:34
                    
                    
                      plt.title, "Fisher's Iris Data Set".
                      4:38
                    
                    
                      We'll give that a fontsize of 12.
                      4:47
                    
                    
                      Bring that up a little bit.
                      4:54
                    
                    
                      Our xlabel.
                      4:58
                    
                    
                      These are our sepal
lengths in centimeters.
                      5:01
                    
                    
                      We’ll assign that a fontsize of 10.
                      5:07
                    
                    
                      For our ylabel,
these are our sepal widths.
                      5:11
                    
                    
                      Again, in centimeters, and
we'll give that a fontsize of 10 as well.
                      5:16
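The title and axis-label calls described above might look like this. The title text and font sizes come from the transcript. The exact axis-label wording is an assumption, since the video only says the values are in centimeters. The single placeholder point stands in for the per-species scatter calls.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend for this sketch; omit in a notebook
import matplotlib.pyplot as plt

# Placeholder point so the axes have something on them
plt.scatter([5.1], [3.5], s=10)

plt.title("Fisher's Iris Data Set", fontsize=12)
# Label wording is illustrative; the transcript only gives the units (cm)
plt.xlabel("Sepal length (cm)", fontsize=10)
plt.ylabel("Sepal width (cm)", fontsize=10)
```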
                    
                    
                      We'll call plt.legend, and
                      5:27
                    
                    
                      we'll give this a location
in the upper right.
                      5:31
                    
                    
                      Here we are setting the legend location
to be displayed in the upper right
                      5:37
                    
                    
                      of the chart.
                      5:40
                    
                    
                      But we could display it in the upper left,
upper center, lower left, etc.
                      5:41
                    
                    
                      Since there aren't any data points
being displayed in the upper right,
                      5:46
                    
                    
                      that seems like a good position.
                      5:50
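`plt.legend` builds its entries from the `label` passed to each plotting call, and `loc` picks the corner. The sketch below uses two labeled series as stand-ins for the species loop. Besides `"upper right"`, matplotlib accepts strings such as `"upper left"`, `"upper center"`, `"lower left"`, `"lower right"`, `"center"`, and `"best"`. With `"best"`, matplotlib chooses the position with the least overlap with the plotted data.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend for this sketch; omit in a notebook
import matplotlib.pyplot as plt

# Two labeled series standing in for the per-species scatter calls
plt.scatter([5.1, 4.9], [3.5, 3.0], s=10, c="#2B5B84", label="Iris-setosa")
plt.scatter([7.0], [3.2], s=10, c="g", label="Iris-versicolor")

# Place the legend in the empty upper-right corner of the axes
legend = plt.legend(loc="upper right")
```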
                    
                    
                      Now we just call plt.show.
                      5:52
                    
                    
                      And run our cell.
                      5:58
                    
                    
                      We can see some patterns
here in our sepal data.
                      6:05
                    
                    
                      Iris-setosa forms a fairly tight grouping in
the upper left quadrant of our chart.
                      6:07
                    
                    
                      There are some outliers though.
                      6:12
                    
                    
                      The other two varieties seem
to be clumped together and
                      6:14
                    
                    
                      intermixed with some
even greater outliers.
                      6:17
                    
                    
                      Our plot looks a bit small here, though.
                      6:20
                    
                    
                      Let's assign a size to our figure
to make it a bit easier to see.
                      6:22
                    
                    
                      To do that, we go up here,
                      6:26
                    
                    
                      right under input_file,
that's kind of a standard spot for it.
                      6:30
                    
                    
                      We give the figure object a
figsize.
                      6:36
                    
                    
                      7.5 by 4.25 inches seems to work pretty well,
and we can run our cell again.
                      6:44
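The transcript doesn't show the exact line, but one common way to size the chart is to create the figure with `plt.figure(figsize=(width, height))` before any plotting. The later `pyplot` calls then draw onto it. Sizes are in inches. An existing figure can also be resized with `fig.set_size_inches(7.5, 4.25)`. Treat the placement below as an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend for this sketch; omit in a notebook
import matplotlib.pyplot as plt

# Create the figure first so the following plotting calls draw onto it.
# figsize is (width, height) in inches.
fig = plt.figure(figsize=(7.5, 4.25))
plt.scatter([5.1, 4.9], [3.5, 3.0], s=10)

print(fig.get_size_inches())
```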
                    
                    
                      There, that's better.
                      6:54
                    
                    
                      From an analysis standpoint, we could draw
some conclusions based on this chart.
                      6:57
                    
                    
                      It appears that all three iris
varieties have a positive correlation
                      7:01
                    
                    
                      between sepal length and width.
                      7:05
                    
                    
                      Iris-setosa has a more clearly defined positive
correlation than the other varieties.
                      7:06
                    
                    
                      Scatter plots are, of course,
only one way to explore our data.
                      7:13
                    
              