1 00:00:00,280 --> 00:00:04,450 We've explored some very useful ways of visualizing data with line charts, 2 00:00:04,450 --> 00:00:06,610 scatter plots and histograms. 3 00:00:06,610 --> 00:00:09,450 Now, let's take a look at box plots. 4 00:00:09,450 --> 00:00:13,010 Much like histograms, they show the distribution of data. 5 00:00:13,010 --> 00:00:17,590 Box plots are useful when visualizing multiple variables simultaneously 6 00:00:17,590 --> 00:00:19,160 on the same chart. 7 00:00:19,160 --> 00:00:21,885 This allows us to reduce the number of charts for presentation. 8 00:00:21,885 --> 00:00:26,009 Each variable is shown as a box, or rectangle. 9 00:00:26,009 --> 00:00:29,790 The bottom of the rectangle represents the level of the first quartile. 10 00:00:29,790 --> 00:00:33,710 The top of the box represents the third quartile of the data set. 11 00:00:33,710 --> 00:00:37,470 The median is typically represented by a line inside the box. 12 00:00:37,470 --> 00:00:42,170 And the minimum and maximum values are represented by whiskers below and 13 00:00:42,170 --> 00:00:44,410 above the box respectively. 14 00:00:44,410 --> 00:00:47,010 If that sounds confusing, let's jump in and 15 00:00:47,010 --> 00:00:49,470 see these plots in action with our iris data. 16 00:00:51,110 --> 00:00:55,283 We'll start with our opening project code in a new iris_boxplot notebook. 17 00:00:55,283 --> 00:00:56,480 And then get the data for 18 00:00:56,480 --> 00:01:00,830 each of our iris classes, Setosa, Versicolor, and Virginica. 19 00:01:01,830 --> 00:01:06,830 For this DataVis exercise, let's examine the petal length of each iris class. 20 00:01:06,830 --> 00:01:09,580 We'll pass a Python list into our box plot. 21 00:01:09,580 --> 00:01:11,011 So let's create an empty list. 22 00:01:18,351 --> 00:01:21,619 We'll append values to this list as we loop through our data. 23 00:01:23,507 --> 00:01:26,602 Now we'll create our for loop to generate our list.
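As a side note on the box anatomy described earlier in this segment, the five values a box plot summarizes can be computed by hand. The sketch below is an illustration only: the helper name `five_number_summary` and the sample numbers are hypothetical, and it uses the standard library's `statistics.quantiles` rather than anything shown in the video. Note also that Matplotlib's default whiskers stop at the furthest data point within 1.5 times the interquartile range and draw anything beyond as individual points, so they match the minimum and maximum only when there are no such outliers.

```python
import statistics


def five_number_summary(values):
    """Return the values a box plot summarizes (hypothetical helper)."""
    data = sorted(values)
    # n=4 splits the data into quartiles; the three cut points are
    # the first quartile, the median, and the third quartile.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {"min": data[0], "q1": q1, "median": median, "q3": q3, "max": data[-1]}


# Made-up sample values, not taken from the iris data.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
summary = five_number_summary(sample)
print(summary)
```

The box spans `q1` to `q3`, the line inside it sits at `median`, and the whiskers reach toward `min` and `max`.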
24 00:01:26,602 --> 00:01:28,841 This should be pretty familiar as well, but 25 00:01:28,841 --> 00:01:32,001 we're appending the data to our list based on the iris class. 26 00:01:34,090 --> 00:01:40,025 For species, group in groupby, 27 00:01:52,669 --> 00:01:57,223 And we'll append iris data, or our petal data. 28 00:02:04,765 --> 00:02:08,772 And since we're using the groupby function again here, we'll need that import. 29 00:02:14,301 --> 00:02:18,487 From itertools, import groupby. 30 00:02:24,817 --> 00:02:26,836 Now we can pass our data into our box plot. 31 00:02:29,862 --> 00:02:32,961 plt.boxplot, and pass in petal_lengths. 32 00:02:39,777 --> 00:02:40,918 And call the show method. 33 00:02:46,010 --> 00:02:49,374 Now while that provides us with our plotting, let's add some 34 00:02:49,374 --> 00:02:53,750 polish by setting some axis size parameters and our plot labels. 35 00:02:53,750 --> 00:02:57,730 Let's also get rid of that 1, 2, 3, and label those ticks. 36 00:02:59,200 --> 00:03:02,023 Come up here, we'll set our axis, 37 00:03:06,026 --> 00:03:08,904 0, 4, 0, 10. 38 00:03:12,209 --> 00:03:16,716 For the tick labels, we pass in a list of the ticks with their default values and 39 00:03:16,716 --> 00:03:20,291 an equal-length list of the values we want them replaced with. 40 00:03:22,673 --> 00:03:26,042 So xticks, 1, 2, and 3. 41 00:03:28,009 --> 00:03:31,282 And we're gonna replace those with the iris names of Setosa, 42 00:03:37,662 --> 00:03:42,758 Versicolor, and Virginica. 43 00:03:48,504 --> 00:03:50,281 Now we'll add in our plot title and 44 00:03:50,281 --> 00:03:53,904 move it to the left instead of the default center, and label the axes. 45 00:04:00,500 --> 00:04:01,449 Fisher's Iris Data Set. 46 00:04:05,341 --> 00:04:06,069 Petal Length. 47 00:04:09,886 --> 00:04:13,616 Go to fontsize of 12, and the location is left. 48 00:04:19,223 --> 00:04:22,535 Our xlabel, Iris Variety. 49 00:04:25,607 --> 00:04:26,792 We'll do fontsize 10.
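The grouping loop dictated in this segment might look roughly like the sketch below. The transcript does not show how the course's data is stored, so the `rows` list of `(petal_length, species)` tuples and the `key` function are assumptions made for illustration. One detail worth knowing: `itertools.groupby` only groups consecutive items with the same key, so the rows must already be ordered by class.

```python
from itertools import groupby

# Hypothetical rows of (petal length in cm, iris class), already ordered by
# class. The real notebook's data layout is not shown in the transcript.
rows = [
    (1.4, "Setosa"),
    (1.3, "Setosa"),
    (4.7, "Versicolor"),
    (4.5, "Versicolor"),
    (6.0, "Virginica"),
    (5.1, "Virginica"),
]

# Start with an empty list, then append one inner list per iris class.
petal_lengths = []
for species, group in groupby(rows, key=lambda row: row[1]):
    petal_lengths.append([length for length, _ in group])

print(petal_lengths)
```

Each inner list then becomes one box when `petal_lengths` is passed to `plt.boxplot`.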
50 00:04:30,832 --> 00:04:35,676 And for the ylabel, Petal length, 51 00:04:38,317 --> 00:04:41,567 in centimeters, and again we'll do a fontsize of 10 there. 52 00:04:46,836 --> 00:04:49,217 And now, let's try it out, run our cell. 53 00:04:52,275 --> 00:04:56,607 And looking at our results here, we see that Iris Setosa on the left has a much 54 00:04:56,607 --> 00:05:00,755 tighter distribution of petal length than Iris Virginica on the right. 55 00:05:01,875 --> 00:05:05,495 We can also see the red lines in the boxes, which are the median values of 56 00:05:05,495 --> 00:05:06,920 the petal lengths. 57 00:05:06,920 --> 00:05:10,529 Iris Setosa has a median value around 1.5. 58 00:05:10,529 --> 00:05:13,338 Versicolor, about 4.25. 59 00:05:13,338 --> 00:05:18,590 And Virginica, just a bit over 5.5 centimeters long. 60 00:05:18,590 --> 00:05:22,240 We've now seen four different charts that can be used to visualize our data. 61 00:05:22,240 --> 00:05:25,223 But don't feel boxed in to using just these charts. 62 00:05:25,223 --> 00:05:27,390 There are a lot of other options in Matplotlib, 63 00:05:27,390 --> 00:05:31,140 including the ability to display multiple charts in a single view. 64 00:05:31,140 --> 00:05:35,150 Let's take a deeper look at how we can configure our output in the next video.
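Pulling together the calls dictated in this video, the finished cell might look something like the sketch below. The sample values, the wrapper function `draw_petal_boxplot`, the exact title punctuation, and the "(cm)" wording of the y label are assumptions; the axis limits, tick labels, font sizes, and left-aligned title follow what is said in the transcript.

```python
import matplotlib.pyplot as plt

# Hypothetical sample values in cm, one inner list per iris class.
# The course notebook builds this list from its own data with groupby.
petal_lengths = [
    [1.4, 1.3, 1.5, 1.7, 1.4],  # Setosa
    [4.7, 4.5, 4.9, 4.0, 4.6],  # Versicolor
    [6.0, 5.1, 5.9, 5.6, 5.8],  # Virginica
]


def draw_petal_boxplot(data):
    """Draw one box per inner list and apply the labels from the video."""
    result = plt.boxplot(data)
    # Axis limits in the order [xmin, xmax, ymin, ymax].
    plt.axis([0, 4, 0, 10])
    # Replace the default 1, 2, 3 tick labels with the iris class names.
    plt.xticks([1, 2, 3], ["Setosa", "Versicolor", "Virginica"])
    plt.title("Fisher's Iris Data Set: Petal Length", fontsize=12, loc="left")
    plt.xlabel("Iris Variety", fontsize=10)
    plt.ylabel("Petal length (cm)", fontsize=10)
    return result


boxes = draw_petal_boxplot(petal_lengths)
```

Calling `plt.show()` afterwards displays the figure, as in the video. The dictionary returned by `plt.boxplot` holds the drawn artists (for example `boxes["medians"]`), which is handy if the median lines need restyling.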