Is Our Data Normal? 5:15 with Ben Deitch
Lots of things end up being normally distributed. Are Boston Marathon results one of them?
Let's try to find out if our data is normally distributed by seeing how many finishers finished within one, two, and three standard deviations of the mean. But first, we'll need to know how many finishers there were. Let's add a row at the very top by right-clicking on row 1 and choosing insert 1 above. Then let's add a label for number of finishers and make sure it's bold. Then in cell B1, let's type =COUNT, paste in our range of overall finish times, and hit Enter. And there we go, 26,410 total finishers.
Getting back to our standard deviations, let's add three labels below our standard deviation label, and call them % in 1, % in 2, and % in 3. And let's leave them unbolded so they look like they belong with standard deviation, because they do.
Now, for % in 1, we need to find out how many runners finished within 1 standard deviation of the mean. To accomplish this, we're going to use the COUNTIFS function, which lets us give some criteria and then returns only the count of values that match our criteria. We're going to count only runners that finished within 1 standard deviation, and then divide that by the total number of runners to get a percentage.
Over in cell B11, let's type =COUNTIFS and hit Enter to select it. Then let's paste in the range of finishing times and add a comma. The next parameter is the conditional statement, and it's entered as a string. So let's add two quotation marks and, in the middle, let's add a greater than sign. To find out if a runner is within 1 standard deviation of the mean, we need to check that their finishing time is greater than the mean minus 1 standard deviation. Unfortunately, this data exists in a cell. So instead of typing the data in, we should reference the cell directly.
To do this, we need to combine our greater than sign with our cell data by using an ampersand to concatenate the strings. Let's add an ampersand after the last quotation mark. Then let's select the average, type a minus sign, and then select the standard deviation. We're now counting all runners whose times are greater than 1 standard deviation below the mean.
So to finish up counting all the runners within 1 standard deviation, we just need to add a criterion that they finished under 1 standard deviation above the mean as well. To do this, let's just copy the range and criteria that we just entered, add a comma, and then paste them back in. Finally, we just need to change this greater than sign to a less than sign, change this minus to a plus, and add a closing parenthesis.
For our last step, to turn this into a percentage, we just need to divide it by the total number of finishers. Which gives us about 69.47%, which is pretty close to the 68% of a normal distribution. And to make it look like a percent, we can click up here and then choose percent.
From here, we can find our other standard deviation percentages pretty easily. But first, let's use F4 to make all the references in this formula absolute. This way, when we drag the cell down, it'll keep the same references. Then let's drag the cell down twice. And to get the % in 2 and 3, inside the formula for those cells, we just need to multiply the standard deviation by 2 or 3 respectively. And the standard deviation for me is this teal-colored B10. So for % in 2, we'll multiply this by 2. And over here we'll multiply it by 2. And for % in 3 we'll do the same thing, except with 3.
All right, we've got 69.48, 94.91, and then 99.76%. Remember, a normal distribution should be about 68% within 1 standard deviation, 95% within 2, and 99.7% within 3.
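As a reference point, the 68%, 95%, and 99.7% figures come from the normal distribution itself: the share of values within k standard deviations of the mean is erf(k/√2). Here is a minimal Python sketch, separate from the spreadsheet, that reproduces them. The function name `fraction_within` is made up for this illustration and does not come from the video.

```python
import math


def fraction_within(k: float) -> float:
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


# Print the theoretical share for 1, 2, and 3 standard deviations.
for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.2%}")
# within 1 SD: 68.27%
# within 2 SD: 95.45%
# within 3 SD: 99.73%
```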
So it looks like the finishing times of runners in the Boston Marathon are pretty close to normally distributed. Coming up in the next video, we'll talk about the many different flavors of data visualization.
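If you'd like to repeat this check outside a spreadsheet, here is a rough Python sketch of the same steps: count the values strictly between the mean minus k standard deviations and the mean plus k standard deviations, then divide by the total count. It rests on a few assumptions. The function name `percent_within` and the sample times are invented for illustration and are not the Boston Marathon data. It also uses the sample standard deviation, since the video doesn't say which standard deviation function the sheet uses.

```python
import statistics


def percent_within(times, k):
    """Share of values strictly between mean - k*sd and mean + k*sd."""
    mean = statistics.mean(times)
    sd = statistics.stdev(times)  # sample standard deviation (an assumption)
    lower, upper = mean - k * sd, mean + k * sd
    # Same idea as COUNTIFS with a ">" criterion and a "<" criterion.
    inside = sum(1 for t in times if lower < t < upper)
    return inside / len(times)


# Made-up finish times in minutes, for illustration only.
sample = [185, 200, 212, 220, 225, 231, 238, 246, 260, 295]

for k in (1, 2, 3):
    print(f"% in {k}: {percent_within(sample, k):.2%}")
```

With a real list of finish times in place of `sample`, the three printed values can be compared against 68%, 95%, and 99.7% just as in the spreadsheet.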