Histograms are used to show distributions of data. Let's explore the Iris data set with this chart style.

#### Further Reading

- Matplotlib style sheets
- Number of bins and widths for histograms
- Freedman-Diaconis rule for Histogram Bin widths
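
As a rough illustration of the Freedman-Diaconis rule listed above, here is a minimal sketch using only the standard library. The rule sets the bin width to 2 × IQR / n^(1/3), where IQR is the interquartile range and n is the number of observations. The function name `freedman_diaconis_bins` and the sample values are illustrative assumptions, not part of the course files.

```python
import math
import statistics


def freedman_diaconis_bins(values):
    """Suggest a bin width and bin count using the Freedman-Diaconis rule.

    width = 2 * IQR / n ** (1/3); count = ceil(data range / width).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    # Quartiles with linear interpolation between the sorted data points.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; the rule gives no usable width")
    width = 2 * iqr / n ** (1 / 3)
    count = math.ceil((max(values) - min(values)) / width)
    return width, count


# Illustrative petal lengths in cm (a small sample, not the full data set).
sample_petal_lengths = [6.0, 5.1, 5.9, 5.6, 5.8, 6.6, 4.5, 6.3, 5.8, 6.1]
bin_width, bin_count = freedman_diaconis_bins(sample_petal_lengths)
print(f"suggested width: {bin_width:.2f} cm, suggested bins: {bin_count}")
```

Matplotlib's `plt.hist` also accepts `bins="fd"`, which hands the same calculation to NumPy, so the hand-written version is mainly useful for seeing what the rule does.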

0:00 As I've mentioned, histograms are used to show distributions of data. This is useful for seeing how closely grouped together or spread out a variable is. The area of each rectangle in a histogram is proportional to the frequency of the values it covers, which allows for a rough assessment of the probable distribution of a given variable.

0:19 The rectangles, or bins, in a histogram are important to consider when doing data visualization. Both the overall number of bins and the bin width can have an impact on the overall presentation of data.

0:32 From our Iris data set, let's generate a histogram to see the distribution of petal length. Let's examine the petal lengths of the Iris virginica class and visualize the distribution of that data.

0:43 Here's where we start off in a new notebook, iris histogram, by bringing in our data and storing it in a list called irises. Let's process this list to obtain just the petal lengths of the Iris virginica species. We'll create a list to hold our data. Let's also create a variable for our number of bins, so we can see how changing it impacts our visualization.

1:18 Now let's loop through our data to get our petal lengths: for petal in range of our iris data. If the species is Iris-virginica, we'll add the petal length to our virginica_petal_length list, and we'll get that from our iris data.

2:19 Now we can pass our data into the plt.hist method. This method takes several parameters, including the number of bins we'd like to have and the color we'd like to set, along with an alpha value. To plt.hist we pass our virginica_petal_length list and our number of bins. The color of our plot will be red, and we give it an alpha value to make it slightly transparent.

2:55 As I've mentioned, it's always important to add labels to your charts. For the chart title:
Iris-virginica Petal length. We'll give that a font size of 12. For our x-axis, the xlabel, we'll give it what it is, Petal length in centimeters, with a font size of 10. And for our ylabel, we'll just call it Probability. Again, we'll give that a font size of 10.

3:54 Cool, and then we call our show method and run our cell. We are shown a histogram with red rectangles. However, the rectangles are all clumped together and can be a challenge to differentiate.

4:10 Matplotlib includes some chart styling options which can help out. Let's apply matplotlib's classic style to our chart and see if it helps clear things up. We'll go back up here and, under where we assign our figure size, we'll ask it to use the classic style. Then we can run our cell. That's much better.

4:41 Now, we are setting our number of bins to ten, which is also the matplotlib default for histograms. Let's change that to 15 and then to 5 to see how that impacts our visualization. Notice here that at 15 bins we have some empty bins. While we get more detail about the data set, it also spreads the data into a broken-comb look that doesn't provide as clear a picture of the distribution. And if we go back and set it to 5 bins, the data isn't portrayed very well either.

5:23 There are a variety of formulas and considerations for the number of bins and their widths to use. I've included links to some resources for these in the teacher's notes. It is not uncommon in practice to produce multiple histograms with different numbers of bins before settling on the best communication tool.

5:42 Histograms are great for exploring the distribution of data, but our data set has many more ways that it can be explored. Sepal length, sepal width, and petal width can all be explored across all the different species.
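
The steps walked through so far can be sketched roughly as follows. This is a minimal sketch, not the notebook from the video: the row layout of `irises`, the handful of sample rows, and the alpha value of 0.5 are assumptions, and the figure size is left out because its value isn't stated. The y-axis is labeled Frequency here rather than Probability, since `plt.hist` plots raw counts by default.

```python
import matplotlib.pyplot as plt

# Assumption: each row is [sepal_length, sepal_width, petal_length,
# petal_width, species], as in the common Iris CSV layout. These few rows
# are illustrative stand-ins for the full list loaded in the video.
irises = [
    ["5.1", "3.5", "1.4", "0.2", "Iris-setosa"],
    ["7.0", "3.2", "4.7", "1.4", "Iris-versicolor"],
    ["6.3", "3.3", "6.0", "2.5", "Iris-virginica"],
    ["5.8", "2.7", "5.1", "1.9", "Iris-virginica"],
    ["7.1", "3.0", "5.9", "2.1", "Iris-virginica"],
    ["6.3", "2.9", "5.6", "1.8", "Iris-virginica"],
    ["6.5", "3.0", "5.8", "2.2", "Iris-virginica"],
    ["7.6", "3.0", "6.6", "2.1", "Iris-virginica"],
]

PETAL_LENGTH, SPECIES = 2, 4  # column positions under the assumed layout

num_bins = 10  # change to 15 or 5 to see how the picture shifts

# Keep only the petal lengths of the Iris-virginica rows.
virginica_petal_length = []
for row in irises:
    if row[SPECIES] == "Iris-virginica":
        virginica_petal_length.append(float(row[PETAL_LENGTH]))

plt.style.use("classic")  # the style applied in the video

# alpha=0.5 is an assumed value; the video only says "slightly transparent".
counts, bin_edges, _patches = plt.hist(
    virginica_petal_length, bins=num_bins, color="red", alpha=0.5
)
plt.title("Iris-virginica Petal Length", fontsize=12)
plt.xlabel("Petal length (cm)", fontsize=10)
plt.ylabel("Frequency", fontsize=10)
plt.show()
```

`plt.hist` returns the per-bin counts and the bin edges, so `counts` can be inspected directly to spot empty bins without reading them off the chart.
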
5:54 Before the next video, practice creating some other histograms of this data on your own. Next, we'll look at box plots.
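
For the practice suggested here, one possible starting point is a pair of small helpers: one that pulls any measurement for any species, and one that draws the same data at several bin counts side by side. The names `feature_values` and `compare_bin_counts`, the column order, and the sample rows are all hypothetical, chosen for illustration rather than taken from the course files.

```python
import matplotlib.pyplot as plt

# Assumed column order of each row; adjust to match how the data was loaded.
COLUMNS = {"sepal_length": 0, "sepal_width": 1, "petal_length": 2, "petal_width": 3}
SPECIES = 4


def feature_values(rows, feature, species):
    """Return one measurement column, as floats, for a single species."""
    index = COLUMNS[feature]
    return [float(row[index]) for row in rows if row[SPECIES] == species]


def compare_bin_counts(values, bin_options=(5, 10, 15), label="value"):
    """Draw one histogram per bin count; return the empty-bin tally for each."""
    fig, axes = plt.subplots(
        1, len(bin_options), figsize=(4 * len(bin_options), 3),
        sharey=True, squeeze=False,
    )
    empty_bins = {}
    for ax, bins in zip(axes[0], bin_options):
        counts, _edges, _patches = ax.hist(values, bins=bins, color="red", alpha=0.5)
        empty_bins[bins] = int((counts == 0).sum())
        ax.set_title(f"{bins} bins", fontsize=12)
        ax.set_xlabel(label, fontsize=10)
    axes[0][0].set_ylabel("Frequency", fontsize=10)
    fig.tight_layout()
    return empty_bins


# Illustrative rows only; the real list would hold the whole data set.
sample_rows = [
    ["5.1", "3.5", "1.4", "0.2", "Iris-setosa"],
    ["7.0", "3.2", "4.7", "1.4", "Iris-versicolor"],
    ["6.4", "3.2", "4.5", "1.5", "Iris-versicolor"],
    ["6.9", "3.1", "4.9", "1.5", "Iris-versicolor"],
    ["5.5", "2.3", "4.0", "1.3", "Iris-versicolor"],
    ["6.5", "2.8", "4.6", "1.5", "Iris-versicolor"],
    ["5.7", "2.8", "4.5", "1.3", "Iris-versicolor"],
]

sepal_widths = feature_values(sample_rows, "sepal_width", "Iris-versicolor")
empty = compare_bin_counts(sepal_widths, label="Sepal width (cm)")
print(empty)  # number of empty bins for each bin count
plt.show()
```

Seeing the three charts next to each other makes the broken-comb effect at higher bin counts easier to judge than rerunning a single cell.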
