Box Plot
5:35 with Ken Alger
Box plots are useful when visualizing multiple variables simultaneously on the same chart.
We've explored some very useful ways of visualizing data with line charts, scatter plots, and histograms. Now, let's take a look at box plots. Much like histograms, they show the distribution of data. Box plots are useful when visualizing multiple variables simultaneously on the same chart. This allows us to reduce the number of charts for presentation.

Each variable is shown as a box, or rectangle. The bottom of the rectangle represents the level of the first quartile. The top of the box represents the third quartile of the data set. The median is typically represented by a line inside the box. And the minimum and maximum values are represented by whiskers below and above the box, respectively.

If that sounds confusing, let's jump in and see these plots in action with our iris data. We'll start with our opening project code in a new iris_boxplot notebook. And then get the data for each of our iris classes, Setosa, Versicolor, and Virginica. For this DataVis exercise, let's examine the petal length of each iris class.

We'll pass a Python list into our box plot. So let's create an empty list. We'll append values to this list as we loop through our data. Now we'll create our for loop to generate our list. This should be pretty familiar as well, but we're appending the data to our list based on the iris class. For species and group in groupby, we'll append our petal data. And since we're using the groupby function again here, we'll need that import: from itertools import groupby.

Now we can pass our data into our box plot: plt.boxplot, and pass in petal_lengths. And call the show method. Now while that provides us with our plot, let's add some polish by setting some axis size parameters and our plot labels. Let's also get rid of that 1, 2, 3, and label those ticks.
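The list-building step described above can be sketched roughly as follows. This is a minimal illustration, not the course's actual notebook code: the `data` rows are a few placeholder values, the column order (petal length at index 2, class name at index 4) is an assumption based on the usual iris.csv layout, and `species_names` is a helper name introduced only for this example. The quartile summary at the end uses the standard library to show the numbers a box plot encodes for each class.

```python
from itertools import groupby
from statistics import quantiles

# Assumed row layout (not confirmed by the video):
# sepal length, sepal width, petal length, petal width, class.
# Placeholder rows only; the real notebook loads the full data set.
data = [
    (5.1, 3.5, 1.4, 0.2, "Iris-setosa"),
    (4.9, 3.0, 1.4, 0.2, "Iris-setosa"),
    (4.7, 3.2, 1.3, 0.2, "Iris-setosa"),
    (7.0, 3.2, 4.7, 1.4, "Iris-versicolor"),
    (6.4, 3.2, 4.5, 1.5, "Iris-versicolor"),
    (6.9, 3.1, 4.9, 1.5, "Iris-versicolor"),
    (6.3, 3.3, 6.0, 2.5, "Iris-virginica"),
    (5.8, 2.7, 5.1, 1.9, "Iris-virginica"),
    (7.1, 3.0, 5.9, 2.1, "Iris-virginica"),
]

petal_lengths = []   # one inner list of petal lengths per iris class
species_names = []   # hypothetical helper: class names in the same order

# groupby only merges adjacent rows, so the rows must already be
# ordered by class (as they are in this placeholder list).
for species, group in groupby(data, key=lambda row: row[4]):
    species_names.append(species)
    petal_lengths.append([row[2] for row in group])

# Summarize what each box would show: min, Q1, median, Q3, max.
for species, lengths in zip(species_names, petal_lengths):
    q1, med, q3 = quantiles(lengths, n=4, method="inclusive")
    print(species, min(lengths), q1, med, q3, max(lengths))
```

The resulting `petal_lengths` list of lists is the shape of input the video passes to `plt.boxplot`, which draws one box per inner list.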
Come up here, we'll set our axis: 0, 4, 0, 10. For the tick labels, we pass in a list of the ticks with their default values and an equal-length list of the values we want them replaced with. So xticks, 1, 2, and 3. And we're gonna replace those with the iris names of Setosa, Versicolor, and Virginica.

Now we'll add in our plot title and move it to the left instead of the default center, and label the axes. Fisher's Iris Data Set. Petal Length. Go to fontsize of 12, and the location is left. Our xlabel, Iris Variety. We'll do fontsize 10. And for the ylabel, Petal Length in centimeters, and again we'll do a fontsize of 10 there.

And now, let's try it out, run our cell. And looking at our results here, we see that Iris Setosa on the left has a much tighter distribution of petal length than Iris Virginica on the right. We can also see the red lines in the boxes, which are the median values of the petal lengths. Iris Setosa has a median value around 1.5. Versicolor, about 4.25. And Virginica, just a bit over 5.5 centimeters long.

We've now seen four different charts that can be used to visualize our data. But don't feel boxed in to using just these charts. There are a lot of other options in Matplotlib, including the ability to display multiple charts in a single view. Let's take a deeper look at how we can configure our output in the next video.
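Put together, the polished plot described in this video might look something like the sketch below. The `petal_lengths` values are placeholders standing in for the full data set, and the exact title and y-label strings are assumptions, since the transcript does not show their punctuation. The `ax` handle is kept only so the result can be inspected afterwards.

```python
import matplotlib.pyplot as plt

# Placeholder petal lengths in cm, one inner list per iris class.
# These stand in for the lists built from the full iris data.
petal_lengths = [
    [1.4, 1.4, 1.3, 1.5, 1.7],  # Setosa
    [4.7, 4.5, 4.9, 4.0, 4.6],  # Versicolor
    [6.0, 5.1, 5.9, 5.6, 5.8],  # Virginica
]

# One box per inner list, drawn at x positions 1, 2, 3.
plt.boxplot(petal_lengths)

# Axis limits as [xmin, xmax, ymin, ymax].
plt.axis([0, 4, 0, 10])

# Replace the default 1, 2, 3 tick labels with the class names.
plt.xticks([1, 2, 3], ["Setosa", "Versicolor", "Virginica"])

# Title moved to the left; label strings are assumed wording.
plt.title("Fisher's Iris Data Set: Petal Length", fontsize=12, loc="left")
plt.xlabel("Iris Variety", fontsize=10)
plt.ylabel("Petal Length (cm)", fontsize=10)

ax = plt.gca()  # keep a handle on the axes for inspection
plt.show()
```

One detail worth knowing: by default Matplotlib draws the whiskers to the furthest data points within 1.5 times the interquartile range of the box and plots anything beyond that as individual points. In recent Matplotlib versions, passing `whis=(0, 100)` to `plt.boxplot` should make the whiskers reach the true minimum and maximum, as described earlier in the video.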