Histograms are used to show distributions of data. Let's explore the Iris data set with this chart style.
Further Reading
 Matplotlib style sheets
 Number of bins and widths for histograms
 Freedman-Diaconis rule for histogram bin widths

0:00
As I've mentioned, histograms are used to show distributions of data.

0:04
This can be very useful to see how closely grouped together or

0:07
spread out a variable is.

0:09
The area of the rectangles in a histogram is proportional

0:12
to the frequency of the variable.

0:14
This allows for

0:15
the rough assessment of the probable distribution of a given variable.
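
The area claim above can be checked numerically. This is a minimal sketch, assuming NumPy is available and using a handful of made-up petal lengths rather than the course data. With `density=True`, each bar's height times its width equals the share of observations that fall in that bin.

```python
import numpy as np

# Hypothetical petal lengths in centimeters (illustrative values only).
petal_lengths = np.array([4.5, 4.8, 5.0, 5.1, 5.1, 5.5, 5.6, 5.8, 6.0, 6.7])

# Raw counts per bin.
counts, edges = np.histogram(petal_lengths, bins=4)

# density=True rescales the bar heights so the total area is 1.
heights, _ = np.histogram(petal_lengths, bins=4, density=True)

# Area of each rectangle = height * width.
widths = np.diff(edges)
areas = heights * widths

# Each area matches the relative frequency of its bin.
relative_frequency = counts / counts.sum()
print(areas)
print(relative_frequency)
```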

0:19
The rectangles or

0:20
bins in a histogram are important to consider when doing data visualization.

0:25
Both the overall number of bins and

0:27
the bin width can have an impact on the overall presentation of data.

0:32
From our iris data set let's generate a histogram chart to see the distribution

0:36
of petal length.

0:38
Let's examine the petal lengths of the iris virginica class and

0:41
visualize the distribution of that data.

0:43
Here's where we start off in a new notebook, iris histogram,

0:47
by bringing in our data and storing it in a list called irises.

0:51
Let's process through this list to just obtain the petal length

0:54
of the iris virginica species.

0:57
Create a list to hold our data.

1:08
Let's also create a variable for our bin numbers,

1:11
so we can see how changing bin numbers impacts our visualization.

1:18
Now let's loop through our data to get our petal lengths.

1:23
For petal in range of our iris data.

1:50
So if the species is Iris-virginica,

1:53
we'll add the petal length to our virginica_petal_length list.

2:12
And we'll get that from our iris data.
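
The loop described above might look something like the following. This is a sketch under assumptions: the transcript does not show how `irises` is laid out, so the row layout (species in the last column, petal length in the third) and the name `number_of_bins` are guesses for illustration. Only `irises` and `virginica_petal_length` come from the video.

```python
# Hypothetical stand-in for the `irises` list loaded earlier in the notebook.
# Assumed row layout: sepal length, sepal width, petal length, petal width, species.
irises = [
    ["5.1", "3.5", "1.4", "0.2", "Iris-setosa"],
    ["7.0", "3.2", "4.7", "1.4", "Iris-versicolor"],
    ["6.3", "3.3", "6.0", "2.5", "Iris-virginica"],
    ["5.8", "2.7", "5.1", "1.9", "Iris-virginica"],
]

# A list to hold our data, plus a variable for the number of bins
# (the variable name is an assumption).
virginica_petal_length = []
number_of_bins = 10

# Loop through the rows by index and keep only the Iris-virginica petal lengths.
for petal in range(len(irises)):
    if irises[petal][4] == "Iris-virginica":
        virginica_petal_length.append(float(irises[petal][2]))

print(virginica_petal_length)
```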

2:19
Now we can pass our data into our plt.hist method.

2:22
This method takes several parameters,

2:24
including the number of bins we'd like to have.

2:27
The color we'd like to set, along with alpha values.

2:30
plt.hist, pass in our virginica_petal_length.

2:40
Our number of bins.

2:45
The color of our plot will be red.

2:50
And we give it an alpha value to make it slightly transparent.
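
Put together, the call could be sketched like this. The data values, the name `number_of_bins`, and the alpha of 0.5 are assumptions, since the video does not state them. The `Agg` backend line is only there so the sketch also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt

# Hypothetical petal lengths in centimeters (illustrative values only).
virginica_petal_length = [6.0, 5.1, 5.9, 5.6, 5.8, 6.6, 4.5, 6.3, 5.8, 6.1]
number_of_bins = 10

# bins sets the number of bins, color the fill color,
# and alpha the transparency (0 is invisible, 1 is opaque).
counts, bin_edges, patches = plt.hist(
    virginica_petal_length,
    bins=number_of_bins,
    color="red",
    alpha=0.5,  # assumed value
)
```

`plt.hist` returns the per-bin counts, the bin edges, and the drawn rectangles, which can be handy for checking what the chart actually shows.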

2:55
As I've mentioned, it's always important to add labels to your charts.

2:59
For chart title.

3:05
Iris-virginica Petal Length.

3:13
We'll give that a font size of 12.

3:16
For our x-axis, for xlabel,

3:20
we'll give it what it is,

3:23
Petal length in centimeters.

3:30
Font size of 10.

3:34
And for our ylabel.

3:36
We'll just call it Probability.

3:46
And again, we'll give that a font size of 10.

3:54
Cool and then we call our show method and run our cell.
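
The labeling steps could be sketched as follows. The label text and font sizes follow the narration, while the data values and alpha are placeholders. Note that `plt.hist` plots raw counts by default, so the "Probability" label is kept here only because it is what the video uses.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt

# Hypothetical petal lengths in centimeters (illustrative values only).
virginica_petal_length = [6.0, 5.1, 5.9, 5.6, 5.8, 6.6, 4.5, 6.3, 5.8, 6.1]

plt.hist(virginica_petal_length, bins=10, color="red", alpha=0.5)

# Title and axis labels, with the font sizes mentioned in the video.
plt.title("Iris-virginica Petal Length", fontsize=12)
plt.xlabel("Petal Length (cm)", fontsize=10)
plt.ylabel("Probability", fontsize=10)

# Keep a handle on the axes so the labels can be inspected afterwards.
ax = plt.gca()
```

In a notebook, a final `plt.show()` call renders the chart.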

4:02
We are shown a histogram chart with red rectangles.

4:05
However, the rectangles are all clumped together and

4:08
can be a challenge to differentiate.

4:10
Matplotlib includes some chart styling options which can help out.

4:16
Let's apply matplotlib's classic style to our chart and

4:19
see if it helps clear things up.

4:22
We'll go back up here and under where we assign our figure size.

4:32
We'll ask it to use the classic style and then we can run our cell.
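
Applying the style might look like this. The figure size shown is a placeholder, since the video does not state the values used. The classic style sheet typically draws an outline around each bar, which is likely why the rectangles become easier to tell apart.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt

# Hypothetical petal lengths in centimeters (illustrative values only).
virginica_petal_length = [6.0, 5.1, 5.9, 5.6, 5.8, 6.6, 4.5, 6.3, 5.8, 6.1]

plt.figure(figsize=(8, 6))  # assumed size; the video does not state it
plt.style.use("classic")    # switch to matplotlib's built-in classic style sheet

counts, bin_edges, patches = plt.hist(
    virginica_petal_length, bins=10, color="red", alpha=0.5
)

# plt.style.available lists every style sheet name that can be passed to use().
print(plt.style.available)
```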

4:40
That's much better.

4:41
Now we are setting our number of bins to ten,

4:44
which is also the matplotlib default for histograms.

4:47
Let's change that to 15 and then to 5 to see how that impacts our visualization.

4:58
Notice here that at 15 bins we have some empty bins.

5:02
While we get more detail about the data set, it also spreads the data into

5:07
a broken comb look that doesn't provide as clear a picture of the distribution.

5:11
And if we go back and set it to 5 bins.

5:19
At 5 bins, the data isn't portrayed very well either.
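
One way to compare bin counts without redrawing the chart each time is to look at the counts directly. This is a sketch using NumPy and made-up petal lengths, so the exact numbers will differ from the course data.

```python
import numpy as np

# Hypothetical petal lengths in centimeters (illustrative values only).
petal_lengths = np.array([
    4.5, 4.8, 4.9, 5.0, 5.1, 5.1, 5.3, 5.5, 5.6, 5.6,
    5.7, 5.8, 5.8, 5.9, 6.0, 6.1, 6.3, 6.4, 6.7, 6.9,
])

counts_by_bins = {}
for number_of_bins in (15, 10, 5):
    counts, edges = np.histogram(petal_lengths, bins=number_of_bins)
    counts_by_bins[number_of_bins] = counts

    # Empty bins are what produce the "broken comb" look.
    empty_bins = int((counts == 0).sum())
    bin_width = edges[1] - edges[0]
    print(f"{number_of_bins} bins: width {bin_width:.2f} cm, {empty_bins} empty")
```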

5:23
There are a variety of formulas and considerations for the number of bins and

5:27
their widths to use.

5:28
I've included links to some resources for these in the teacher's notes.
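
As one example of such a formula, the Freedman-Diaconis rule sets the bin width to 2 × IQR / n^(1/3), where IQR is the interquartile range and n is the number of observations. The sketch below uses made-up data and NumPy's built-in `"fd"` estimator. It is an illustration of the rule, not the method used in the video.

```python
import numpy as np

# Hypothetical petal lengths in centimeters (illustrative values only).
petal_lengths = np.array([
    4.5, 4.8, 4.9, 5.0, 5.1, 5.1, 5.3, 5.5, 5.6, 5.6,
    5.7, 5.8, 5.8, 5.9, 6.0, 6.1, 6.3, 6.4, 6.7, 6.9,
])

# Freedman-Diaconis bin width computed by hand: 2 * IQR / n^(1/3).
q75, q25 = np.percentile(petal_lengths, [75, 25])
fd_width = 2 * (q75 - q25) / len(petal_lengths) ** (1 / 3)

# NumPy can apply the same rule when choosing bin edges.
fd_edges = np.histogram_bin_edges(petal_lengths, bins="fd")
fd_bin_count = len(fd_edges) - 1

print(f"suggested width: {fd_width:.2f} cm, bins used: {fd_bin_count}")
```

The resulting edges can be passed straight to `plt.hist` through its `bins` argument, which accepts either a number of bins or a sequence of edges.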

5:33
It is not uncommon in practice to produce multiple histograms

5:37
with different numbers of bins, before settling on the best communication tool.

5:42
Histograms are great for exploring the distribution of data, but

5:46
our data set has many more ways that it can be explored.

5:49
Sepal length, sepal width, and

5:50
petal width can all be explored across all the different species.

5:54
Before the next video,

5:56
practice creating some other histograms of this data on your own.

5:59
Next, we'll look at box plots.