Histograms are used to show distributions of data. Let's explore the Iris data set with this chart style.

#### Further Reading

- Matplotlib style sheets
- Number of bins and widths for histograms
- Freedman-Diaconis rule for Histogram Bin widths

As I've mentioned, histograms are used
to show distributions of data.
This can be very useful to see
how closely grouped together or
spread out a variable is.
The area of the rectangles in
a histogram is proportional
to the frequency of the variable.
This allows for
the rough assessment of the probable
distribution of a given variable.
The rectangles, or
bins, in a histogram are important to
consider when doing data visualization.
Both the overall number of bins and
the bin width can have an impact on
the overall presentation of data.
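
To make the link between bin count and bin width concrete, here is a minimal sketch in plain Python. The `bin_counts` helper and the `sample` values are made up for illustration; they are not part of the notebook or the Iris data set.

```python
def bin_counts(values, num_bins):
    """Count how many values fall into each of num_bins equal-width bins."""
    lowest, highest = min(values), max(values)
    # Assumes the values are not all identical, so the width is non-zero.
    bin_width = (highest - lowest) / num_bins
    counts = [0] * num_bins
    for value in values:
        # The maximum value would land one past the end, so clamp it into the last bin.
        index = min(int((value - lowest) / bin_width), num_bins - 1)
        counts[index] += 1
    return bin_width, counts


# Illustrative values only (not taken from the Iris data set).
sample = [4.0, 4.5, 5.2, 5.3, 5.7, 6.1, 6.4, 6.6, 6.8, 8.0]

width_4, counts_4 = bin_counts(sample, 4)
width_2, counts_2 = bin_counts(sample, 2)
print(width_4, counts_4)  # 1.0 [2, 3, 4, 1]
print(width_2, counts_2)  # 2.0 [5, 5]
```

The same ten values look quite different at four bins than at two, which is the effect we will see when we change the bin number in the notebook.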
From our iris data set let's generate
a histogram chart to see the distribution
of petal length.
Let's examine the petal lengths
of the iris virginica class and
visualize the distribution of that data.
Here's where we start off in
a new notebook, iris histogram,
with our data brought in and
stored in a list called irises.
Let's process through this list
to just obtain the petal length
of the iris virginica species.
Create a list to hold our data.
Let's also create a variable for
our bin numbers,
so we can see how changing bin
numbers impacts our visualization.
Now let's loop through our
data to get our petal lengths.
For petal in range of our iris data.
So if the species is Iris-virginica,
we'll add the petal length to
our virginica_petal_length list.
And we'll get that from our iris data.
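
The loop just described might look roughly like the sketch below. The layout of each row in `irises` is an assumption (four measurements followed by the species name, all as strings, as they would come out of a CSV file), and the `number_of_bins` name is hypothetical. The notebook's actual list may be shaped differently.

```python
# Hypothetical stand-in for the `irises` list loaded earlier in the notebook.
# Assumed column order: sepal length, sepal width, petal length, petal width, species.
irises = [
    ["5.1", "3.5", "1.4", "0.2", "Iris-setosa"],
    ["7.0", "3.2", "4.7", "1.4", "Iris-versicolor"],
    ["6.3", "3.3", "6.0", "2.5", "Iris-virginica"],
    ["5.8", "2.7", "5.1", "1.9", "Iris-virginica"],
]

virginica_petal_length = []  # list to hold our data
number_of_bins = 10          # hypothetical name for the bin-number variable

for petal in range(len(irises)):
    # Keep only rows whose species column is Iris-virginica.
    if irises[petal][4] == "Iris-virginica":
        # Petal length is assumed to be the third column; convert it to a number.
        virginica_petal_length.append(float(irises[petal][2]))

print(virginica_petal_length)
```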
Now we can pass our data
into our plt.hist method.
This method takes several parameters,
including the number of
bins we'd like to have.
The color we'd like to set,
along with alpha values.
plt.hist, pass in our
virginica_petal_length.
Our number of bins.
The color of our plot will be red.
And we give it an alpha value
to make it slightly transparent.
As I've mentioned, it's always
important to add labels to your charts.
For chart title.
Iris-virginica Petal length.
We'll give that a font size of 12.
For our x-axis, for xlabel,
we'll give it what it is,
Petal length in centimeters.
Font size of 10.
And for our ylabel.
We'll just call it Probability.
And again,
we'll give that a font size of 10.
Cool, and then we call our show method and
run our cell.
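
Put together, the plotting cell could look something like this sketch. The data values here are illustrative stand-ins for the list built from the Iris data, and the `alpha=0.5` value and the `number_of_bins` name are assumptions, since the exact ones used in the notebook aren't stated.

```python
import matplotlib.pyplot as plt

# Illustrative values only; in the notebook this list is built from the Iris data.
virginica_petal_length = [6.0, 5.1, 5.9, 5.6, 5.8, 6.6, 4.5, 6.3, 5.8, 6.1]
number_of_bins = 10  # hypothetical variable name

# plt.hist returns the per-bin counts, the bin edges, and the drawn rectangles.
counts, bin_edges, patches = plt.hist(
    virginica_petal_length,
    bins=number_of_bins,
    color="red",
    alpha=0.5,  # assumed value; anything below 1.0 makes the bars partly transparent
)

plt.title("Iris-virginica Petal Length", fontsize=12)
plt.xlabel("Petal length (cm)", fontsize=10)
plt.ylabel("Probability", fontsize=10)
plt.show()
```

One thing to keep in mind: with its default settings, `plt.hist` plots the count of values in each bin. Passing `density=True` normalizes the bars so that their total area is 1.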
We are shown a histogram
chart with red rectangles.
However, the rectangles
are all clumped together and
can be a challenge to differentiate.
Matplotlib includes some
chart styling options which can help out.
Let's apply matplotlib's
classic style to our chart and
see if it helps clear things up.
We'll go back up here and,
under where we assign our figure size,
we'll ask it to use the classic style, and
then we can run our cell.
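
Applying the style is a one-line change, sketched below with illustrative data. As far as I can tell, the reason it helps is that the classic style sheet draws a dark edge around each bar, which separates neighboring rectangles. The data values and the `alpha` value are assumptions.

```python
import matplotlib.pyplot as plt

# Switch to matplotlib's built-in "classic" style sheet before plotting.
plt.style.use("classic")

# Illustrative values only; in the notebook this list is built from the Iris data.
virginica_petal_length = [6.0, 5.1, 5.9, 5.6, 5.8, 6.6, 4.5, 6.3, 5.8, 6.1]

counts, bin_edges, patches = plt.hist(
    virginica_petal_length, bins=10, color="red", alpha=0.5
)
plt.show()
```

If you would rather keep the default style, passing `edgecolor="black"` to `plt.hist` is another way to outline the bars.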
That's much better.
Now we are setting our
number of bins to ten,
which is also the matplotlib default for
histograms.
Let's change that to 15 and then to 5 to
see how that impacts our visualization.
Notice here that at 15 bins
we have some empty bins.
While we get more detail about the data
set, it also spreads the data into
a broken comb look that doesn't provide as
clear of a picture of the distribution.
And if we go back and set it to 5 bins,
the data isn't portrayed very well either.
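
To compare bin settings without redrawing the chart each time, a rough sketch like the following counts the empty bins at each setting using NumPy's `np.histogram`. The data values are illustrative, so the numbers will differ from what the full Iris-virginica data gives.

```python
import numpy as np

# Illustrative values only; in the notebook this list is built from the Iris data.
virginica_petal_length = [6.0, 5.1, 5.9, 5.6, 5.8, 6.6, 4.5, 6.3, 5.8, 6.1]

empty_bins = {}
for number_of_bins in (5, 10, 15):
    # np.histogram does the same binning as plt.hist, without drawing anything.
    counts, bin_edges = np.histogram(virginica_petal_length, bins=number_of_bins)
    bin_width = bin_edges[1] - bin_edges[0]
    empty_bins[number_of_bins] = int((counts == 0).sum())
    print(f"{number_of_bins} bins, width {bin_width:.2f}, "
          f"{empty_bins[number_of_bins]} empty")
```

More bins means narrower bins, and narrower bins are more likely to end up empty, which is where the broken comb look comes from.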
There are a variety of formulas and
considerations for the number of bins and
their widths to use.
I've included links to some resources for
these in the teacher's notes.
It is not uncommon in practice
to produce multiple histograms
with different numbers of bins, before
settling on the best communication tool.
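
As one example of such a formula, the Freedman-Diaconis rule suggests a bin width of twice the interquartile range divided by the cube root of the number of values. The sketch below computes that width by hand and also asks NumPy for bin edges using its built-in `"fd"` option. The data values are illustrative only.

```python
import numpy as np

# Illustrative values only; in the notebook this list is built from the Iris data.
petal_lengths = np.array([6.0, 5.1, 5.9, 5.6, 5.8, 6.6, 4.5, 6.3, 5.8, 6.1])

# Freedman-Diaconis: width = 2 * IQR / n^(1/3)
q75, q25 = np.percentile(petal_lengths, [75, 25])
iqr = q75 - q25
fd_width = 2 * iqr / len(petal_lengths) ** (1 / 3)

# NumPy can pick edges with the same rule; it rounds to a whole number of bins,
# so the width it ends up using may differ slightly from fd_width.
fd_edges = np.histogram_bin_edges(petal_lengths, bins="fd")

print(f"suggested width: {fd_width:.3f}")
print(f"number of bins chosen by NumPy: {len(fd_edges) - 1}")
```

The resulting edges can be passed straight to `plt.hist` through its `bins` parameter, which accepts either a bin count or a sequence of edges.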
Histograms are great for
exploring the distribution of data, but
our data set has many more
ways that it can be explored.
Sepal length and sepal and
petal width can all be explored
across all different species.
Before the next video, try creating
histograms of this data on your own.
Next, we'll look at box plots.
