Histograms are used to show distributions of data. Let's explore the Iris data set with this chart style.

#### Further Reading

- Matplotlib style sheets
- Number of bins and widths for histograms
- Freedman-Diaconis rule for Histogram Bin widths

As I've mentioned, histograms are used
to show distributions of data.
0:00

This can be very useful to see
how closely grouped together or
0:04

spread out a variable is.
0:07

The area of the rectangles in
a histogram is proportional
0:09

to the frequency of the variable.
0:12

This allows for
0:14

the rough assessment of the probable
distribution of a given variable.
0:15

The rectangles, or
0:19

bins, in a histogram are important to
consider when doing data visualization.
0:20

Both the overall number of bins and
0:25

the bin width can have an impact on
the overall presentation of data.
0:27

From our iris data set let's generate
a histogram chart to see the distribution
0:32

of petal length.
0:36

Let's examine the petal lengths
of the iris virginica class and
0:38

visualize the distribution of that data.
0:41

Here's where we start off in
a new notebook, iris histogram,
0:43

by bringing in our data and
storing it in a list called irises.
0:47

Let's process through this list
to just obtain the petal length
0:51

of the iris virginica species.
0:54

Create a list to hold our data.
0:57

Let's also create a variable for
our bin numbers,
1:08

so we can see how changing bin
numbers impacts our visualization.
1:11

Now let's loop through our
data to get our petal lengths.
1:18

For petal in range of the length of our iris data.
1:23

So if the species is Iris-virginica,
1:50

we'll add the petal length to
our virginica_petal_length list.
1:53

And we'll get that from our iris data.
2:12
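
A minimal sketch of the filtering step just described. The few rows below are only a stand-in for the full data set loaded in the notebook, and the column order (petal length at index 2, species at index 4) and the already-numeric values are assumptions, not something the video states.

```python
# Stand-in rows; the real notebook loads the full Iris data set into `irises`.
# Assumed column order: sepal length, sepal width, petal length, petal width, species.
irises = [
    [5.1, 3.5, 1.4, 0.2, "Iris-setosa"],
    [7.0, 3.2, 4.7, 1.4, "Iris-versicolor"],
    [6.3, 3.3, 6.0, 2.5, "Iris-virginica"],
    [5.8, 2.7, 5.1, 1.9, "Iris-virginica"],
    [7.1, 3.0, 5.9, 2.1, "Iris-virginica"],
]

num_bins = 10                # number of histogram bins to try first
virginica_petal_length = []  # list to hold the petal lengths we keep

# Loop over row indexes and keep the petal length of each virginica row.
for petal in range(len(irises)):
    if irises[petal][4] == "Iris-virginica":
        virginica_petal_length.append(irises[petal][2])

print(virginica_petal_length)
```

If the values were read from a CSV file as strings, each petal length would need a `float(...)` conversion before plotting.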

Now we can pass our data
into our plt.hist method.
2:19

This method takes several parameters,
2:22

including the number of
bins we'd like to have.
2:24

The color we'd like to set,
along with alpha values.
2:27

To plt.hist, we pass in our
virginica_petal_length.
2:30

Our number of bins.
2:40

The color of our plot will be red.
2:45

And we give it an alpha value
to make it slightly transparent.
2:50

As I've mentioned, it's always
important to add labels to your charts.
2:55

For chart title.
2:59

Iris-virginica Petal length.
3:05

We'll give that a font size of 12.
3:13

For our x-axis, for xlabel,
3:16

we'll give it what it is,
3:20

Petal length in centimeters.
3:23

Font size of 10.
3:30

And for our ylabel.
3:34

We'll just call it Probability.
3:36

And again,
we'll give that a font size of 10.
3:46

Cool and then we call our show method and
run our cell.
3:54
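
The plotting calls above might look roughly like this. The petal lengths are placeholder values rather than the real virginica data, and the alpha of 0.5 is an assumption, since the exact value is not given.

```python
import matplotlib.pyplot as plt

# Placeholder petal lengths (cm) standing in for the filtered virginica data.
virginica_petal_length = [6.0, 5.1, 5.9, 5.6, 5.8, 6.6, 4.5, 6.3, 5.8, 6.1, 5.1, 5.3]
num_bins = 10

# plt.hist returns the per-bin counts, the bin edges, and the drawn rectangles.
counts, bin_edges, patches = plt.hist(
    virginica_petal_length,
    bins=num_bins,
    color="red",
    alpha=0.5,  # assumed value; anything below 1 makes the bars semi-transparent
)

# Label the chart as described in the walkthrough.
plt.title("Iris-virginica Petal Length", fontsize=12)
plt.xlabel("Petal length (cm)", fontsize=10)
plt.ylabel("Probability", fontsize=10)
plt.show()
```

Note that `plt.hist` draws raw counts by default; passing `density=True` would normalize the bars so that their total area is 1.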

We are shown a histogram
chart with red rectangles.
4:02

However, the rectangles
are all clumped together and
4:05

can be a challenge to differentiate.
4:08

Matplotlib includes some
chart styling options which can help out.
4:10

Let's apply matplotlib's
classic style to our chart and
4:16

see if it helps clear things up.
4:19

We'll go back up here and
under where we assign our figure size.
4:22

We'll ask it to use the classic style and
then we can run our cell.
4:32
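
A sketch of switching to the classic style before drawing the histogram. The figure-size line from the notebook is left out because its exact form is not shown, and the data and alpha value are placeholders.

```python
import matplotlib.pyplot as plt

# Apply matplotlib's built-in "classic" style sheet before plotting.
plt.style.use("classic")

# Placeholder petal lengths (cm) standing in for the filtered virginica data.
virginica_petal_length = [6.0, 5.1, 5.9, 5.6, 5.8, 6.6, 4.5, 6.3, 5.8, 6.1, 5.1, 5.3]

counts, bin_edges, patches = plt.hist(
    virginica_petal_length, bins=10, color="red", alpha=0.5
)
plt.title("Iris-virginica Petal Length", fontsize=12)
plt.show()
```

The classic style draws an outline around each bar, which is what makes neighboring rectangles easier to tell apart. Passing `edgecolor="black"` to `plt.hist` is another way to get a similar effect without changing the whole style.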

That's much better.
4:40

Now we are setting our
number of bins to ten,
4:41

which is also the matplotlib default for
histograms.
4:44

Let's change that to 15 and then to 5 to
see how that impacts our visualization.
4:47

Notice here that at 15 bins
we have some empty bins.
4:58

While we get more detail about the data
set, it also spreads the data into
5:02

a broken comb look that doesn't provide as
clear a picture of the distribution.
5:07

And if we go back and set it to 5 bins.
5:11

At 5 bins,
the data isn't portrayed very well either.
5:19
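
One way to compare bin counts side by side instead of re-running the cell. This is a sketch with placeholder data; the subplot layout and the `results` dictionary are illustrative choices, not part of the original notebook.

```python
import matplotlib.pyplot as plt

# Placeholder petal lengths (cm) standing in for the filtered virginica data.
virginica_petal_length = [6.0, 5.1, 5.9, 5.6, 5.8, 6.6, 4.5, 6.3, 5.8, 6.1, 5.1, 5.3]

# One subplot per bin setting, sharing the y-axis so bar heights are comparable.
fig, axes = plt.subplots(1, 3, figsize=(12, 3), sharey=True)

results = {}  # bin setting -> per-bin counts, kept for inspection
for ax, num_bins in zip(axes, (5, 10, 15)):
    counts, bin_edges, _ = ax.hist(
        virginica_petal_length, bins=num_bins, color="red", alpha=0.5
    )
    ax.set_title(f"{num_bins} bins", fontsize=12)
    ax.set_xlabel("Petal length (cm)", fontsize=10)
    results[num_bins] = counts

plt.show()
```

Any zeros in `results[15]` correspond to the empty bins that give the chart its broken comb look.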

There are a variety of formulas and
considerations for the number of bins and
5:23

their widths to use.
5:27

I've included links to some resources for
these in the teacher's notes.
5:28
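
As one example of such a formula, the Freedman-Diaconis rule listed in the notes sets the bin width to 2 × IQR / n^(1/3). The sketch below uses placeholder data and NumPy, which the video itself does not use, to compute that width by hand and to ask NumPy for the matching bin edges.

```python
import numpy as np

# Placeholder petal lengths (cm) standing in for the filtered virginica data.
data = np.array([6.0, 5.1, 5.9, 5.6, 5.8, 6.6, 4.5, 6.3, 5.8, 6.1, 5.1, 5.3])

# Freedman-Diaconis bin width: 2 * IQR / cube root of the sample size.
q75, q25 = np.percentile(data, [75, 25])
width = 2 * (q75 - q25) / len(data) ** (1 / 3)

# NumPy can apply the same rule and return the resulting bin edges.
edges = np.histogram_bin_edges(data, bins="fd")
num_bins = len(edges) - 1

print(f"bin width: {width:.3f}, number of bins: {num_bins}")
```

`plt.hist` also accepts `bins="fd"` directly, so the rule can be tried without computing the edges first.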

It is not uncommon in practice
to produce multiple histograms
5:33

with different numbers of bins, before
settling on the best communication tool.
5:37

Histograms are great for
exploring the distribution of data, but
5:42

our data set has many more
ways that it can be explored.
5:46

Sepal length, and sepal and
5:49

petal width, can all be explored
across all the different species.
5:50

Before the next video,
5:54

practice creating some other
histograms of this data on your own.
5:56

Next, we'll look at box plots.
5:59
