1 00:00:00,220 --> 00:00:04,020 Let's try and find out if our data is normally distributed by seeing how many 2 00:00:04,020 --> 00:00:09,980 finishers finished within one, two, and three standard deviations of the mean. 3 00:00:09,980 --> 00:00:13,570 But first, we'll need to know how many finishers there were. 4 00:00:13,570 --> 00:00:16,920 Let's add a row at the very top by right clicking on row 1 and 5 00:00:16,920 --> 00:00:18,450 choosing insert 1 above. 6 00:00:19,730 --> 00:00:26,349 Then let's add a label for number of finishers and make sure it's bold. 7 00:00:26,349 --> 00:00:30,182 Then in cell B1, let's type =COUNT, 8 00:00:30,182 --> 00:00:35,838 paste in our range of overall finish times, and hit Enter. 9 00:00:35,838 --> 00:00:41,190 And there we go, 26,410 total finishers. 10 00:00:41,190 --> 00:00:45,981 Getting back to our standard deviations, 11 00:00:45,981 --> 00:00:52,899 let's add three labels below our standard deviation label, 12 00:00:52,899 --> 00:00:57,970 and call them % in 1, % in 2, and % in 3. 13 00:00:57,970 --> 00:00:59,970 And let's leave them unbolded so 14 00:00:59,970 --> 00:01:04,260 they look like they belong with standard deviation, because they do. 15 00:01:04,260 --> 00:01:09,155 Now, for % in 1, we need to find out how many runners finished within 1 16 00:01:09,155 --> 00:01:12,150 standard deviation of the mean. 17 00:01:12,150 --> 00:01:16,720 To accomplish this, we're going to use the COUNTIFS function, which lets us give some 18 00:01:16,720 --> 00:01:21,920 criteria and then only returns the count of values that match our criteria. 19 00:01:21,920 --> 00:01:26,340 We're going to count only runners that finished within 1 standard deviation. 20 00:01:26,340 --> 00:01:30,890 And then divide that by the total number of runners to get a percentage. 21 00:01:30,890 --> 00:01:39,300 Over in cell B11, let's type =COUNTIFS and hit Enter to select it.
22 00:01:39,300 --> 00:01:42,810 Then let's paste in the range of finishing times and add a comma. 23 00:01:43,900 --> 00:01:46,730 The next parameter is the conditional statement. 24 00:01:46,730 --> 00:01:49,120 And it's entered as a string. 25 00:01:49,120 --> 00:01:54,890 So let's add two quotation marks and in the middle, let's add a greater than sign. 26 00:01:56,330 --> 00:02:00,820 To find out if a runner is within 1 standard deviation of the mean, we need to 27 00:02:00,820 --> 00:02:06,040 check that their finishing time is greater than the mean minus 1 standard deviation. 28 00:02:07,200 --> 00:02:10,530 Unfortunately, this data exists in a cell. 29 00:02:10,530 --> 00:02:15,360 So instead of typing the data in, we should reference the cell directly. 30 00:02:15,360 --> 00:02:18,640 To do this, we need to combine our greater than sign 31 00:02:18,640 --> 00:02:23,410 with our cell data by using an ampersand to concatenate the strings. 32 00:02:23,410 --> 00:02:26,120 Let's add an ampersand after the last quotation mark. 33 00:02:27,290 --> 00:02:31,230 Then let's select the average, type a minus sign and 34 00:02:31,230 --> 00:02:33,880 then select the standard deviation. 35 00:02:33,880 --> 00:02:40,260 We're now counting all runners greater than 1 standard deviation below the mean. 36 00:02:40,260 --> 00:02:44,890 So to finish up counting all the runners within 1 standard deviation, we just need 37 00:02:44,890 --> 00:02:49,760 to add a criterion that they finished under 1 standard deviation above the mean, 38 00:02:49,760 --> 00:02:51,080 as well. 39 00:02:51,080 --> 00:02:55,510 To do this, let's just copy the range and criteria that we just entered, 40 00:02:56,740 --> 00:03:00,420 add a comma, and then paste them back in. 41 00:03:00,420 --> 00:03:06,484 Finally, we just need to change this greater than sign to a less than sign, 42 00:03:06,484 --> 00:03:09,102 and change this minus to a plus.
43 00:03:11,730 --> 00:03:13,380 And add a closing parenthesis. 44 00:03:14,900 --> 00:03:19,101 For our last step, to turn this into a percentage we just need to divide it by 45 00:03:19,101 --> 00:03:20,914 the total number of finishers. 46 00:03:24,826 --> 00:03:27,967 Which gives us about 69.47%, 47 00:03:27,967 --> 00:03:34,010 which is pretty close to the 68% of a normal distribution. 48 00:03:34,010 --> 00:03:39,430 And to make it look like a percent, we can click up here and then choose percent. 49 00:03:39,430 --> 00:03:43,960 From here, we can find our other standard deviation percentages pretty easily. 50 00:03:43,960 --> 00:03:49,690 But first, let's use F4 to make all the references in this formula absolute. 51 00:03:50,860 --> 00:03:54,780 This way, when we drag the cell down, it'll keep the same references. 52 00:04:03,841 --> 00:04:06,278 Then let's drag the cell down twice. 53 00:04:08,460 --> 00:04:13,587 And to get the % in 2 and 3, inside the formula for those cells, 54 00:04:13,587 --> 00:04:19,383 we just need to multiply the standard deviation by 2 or 3 respectively. 55 00:04:19,383 --> 00:04:22,710 And the standard deviation for me is this teal-colored B10. 56 00:04:24,000 --> 00:04:27,404 So for % in 2, we'll multiply this by 2. 57 00:04:28,700 --> 00:04:32,108 And over here we'll multiply it by 2. 58 00:04:32,108 --> 00:04:36,234 And for % in 3 we'll do the same thing, except with 3. 59 00:04:41,425 --> 00:04:45,429 All right, we've got 69.48, 60 00:04:45,429 --> 00:04:50,300 94.91, and then 99.76%. 61 00:04:50,300 --> 00:04:58,234 Remember, a normal distribution should be about 68% within 1 standard deviation, 62 00:04:58,234 --> 00:05:02,670 95% within 2, and 99.7% within 3. 63 00:05:02,670 --> 00:05:06,570 So it looks like the finishing times of runners in the Boston Marathon 64 00:05:06,570 --> 00:05:09,590 are pretty close to normally distributed.
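The walkthrough above counts values strictly between the mean minus k standard deviations and the mean plus k standard deviations, then divides by the total count. As a rough cross-check outside the spreadsheet, the same logic could be sketched in Python. This is only a sketch under stated assumptions: the function name `percent_within` and the sample data are hypothetical, and the sample standard deviation (`statistics.stdev`) is assumed because the video does not say which standard deviation function the sheet uses.

```python
from statistics import mean, stdev


def percent_within(times, k):
    """Return the fraction of values strictly within k standard deviations of the mean.

    Mirrors the two COUNTIFS criteria from the walkthrough:
    value > mean - k*sd  AND  value < mean + k*sd, divided by the total count.
    """
    avg = mean(times)
    sd = stdev(times)  # assumption: sample standard deviation
    lower = avg - k * sd
    upper = avg + k * sd
    matching = sum(1 for t in times if lower < t < upper)
    return matching / len(times)


# Hypothetical data for illustration only, not the marathon finish times.
sample_times = list(range(1, 11))
for k in (1, 2, 3):
    print(f"% in {k}: {percent_within(sample_times, k):.2%}")
```

With real finish times loaded into `sample_times`, the three printed values would be compared against the 68%, 95%, and 99.7% expected for a normal distribution.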
65 00:05:09,590 --> 00:05:10,940 Coming up in the next video, 66 00:05:10,940 --> 00:05:14,480 we'll talk about the many different flavors of data visualization.