1
00:00:00,220 --> 00:00:04,020
Let's try and find out if our data is
normally distributed by seeing how many
2
00:00:04,020 --> 00:00:09,980
finishers finished within one, two, and
three standard deviations of the mean.
3
00:00:09,980 --> 00:00:13,570
But first, we'll need to know
how many finishers there were.
4
00:00:13,570 --> 00:00:16,920
Let's add a row at the very top
by right clicking on row 1 and
5
00:00:16,920 --> 00:00:18,450
choosing Insert 1 above.
6
00:00:19,730 --> 00:00:26,349
Then let's add a label for number of
finishers and make sure it's bold.
7
00:00:26,349 --> 00:00:30,182
Then in cell B1, let's type =COUNT,
8
00:00:30,182 --> 00:00:35,838
paste in our range of overall
finish times, and hit Enter.
9
00:00:35,838 --> 00:00:41,190
And there we go, 26,410 total finishers.
10
00:00:41,190 --> 00:00:45,981
Getting back to our standard deviations,
11
00:00:45,981 --> 00:00:52,899
let's add three labels below
our standard deviation label,
12
00:00:52,899 --> 00:00:57,970
and call them % in 1, % in 2, and % in 3.
13
00:00:57,970 --> 00:00:59,970
And let's leave them unbolded so
14
00:00:59,970 --> 00:01:04,260
they look like they belong with
standard deviation, because they do.
15
00:01:04,260 --> 00:01:09,155
Now, for % in 1, we need to find out
how many runners finished within 1
16
00:01:09,155 --> 00:01:12,150
standard deviation of the mean.
17
00:01:12,150 --> 00:01:16,720
To accomplish this, we're going to use the
COUNTIFS function, which lets us give some
18
00:01:16,720 --> 00:01:21,920
criteria and then only returns the count
of values that match our criteria.
19
00:01:21,920 --> 00:01:26,340
We're going to count only runners that
finished within 1 standard deviation.
20
00:01:26,340 --> 00:01:30,890
And then divide that by the total
number of runners to get a percentage.
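As a cross-check outside the spreadsheet, the same idea (count the values strictly between the mean minus k standard deviations and the mean plus k standard deviations, then divide by the total count) can be sketched in Python. The function name `fraction_within` and the sample `finish_minutes` list are hypothetical and are not the marathon data. The use of the sample standard deviation is also an assumption, since the video does not say which variant was used.

```python
from statistics import mean, stdev


def fraction_within(times, k=1):
    """Return the fraction of values strictly within k standard deviations of the mean."""
    avg = mean(times)
    sd = stdev(times)  # sample standard deviation (an assumption, see above)
    lower = avg - k * sd  # plays the role of the ">" criterion
    upper = avg + k * sd  # plays the role of the "<" criterion
    inside = sum(1 for t in times if lower < t < upper)
    return inside / len(times)


# Hypothetical finishing times in minutes, for illustration only.
finish_minutes = [182, 205, 214, 221, 230, 238, 247, 259, 275, 320]

for k in (1, 2, 3):
    print(f"% in {k}: {fraction_within(finish_minutes, k):.2%}")
```

Passing k as 2 or 3 corresponds to multiplying the standard deviation by 2 or 3 in the spreadsheet formula.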
21
00:01:30,890 --> 00:01:39,300
Over in cell B11, let's type =COUNTIFS and
hit Enter to select it.
22
00:01:39,300 --> 00:01:42,810
Then let's paste in the range of
finishing times and add a comma.
23
00:01:43,900 --> 00:01:46,730
The next parameter is
the conditional statement.
24
00:01:46,730 --> 00:01:49,120
And it's entered as a string.
25
00:01:49,120 --> 00:01:54,890
So let's add two quotation marks and in
the middle, let's add a greater than sign.
26
00:01:56,330 --> 00:02:00,820
To find out if a runner is within 1
standard deviation of the mean, we need to
27
00:02:00,820 --> 00:02:06,040
check that their finishing time is greater
than the mean minus 1 standard deviation.
28
00:02:07,200 --> 00:02:10,530
Unfortunately, this data exists in a cell.
29
00:02:10,530 --> 00:02:15,360
So instead of typing the data in,
we should reference the cell directly.
30
00:02:15,360 --> 00:02:18,640
To do this,
we need to combine our greater than sign
31
00:02:18,640 --> 00:02:23,410
with our cell data by using an ampersand
to concatenate the strings.
32
00:02:23,410 --> 00:02:26,120
Let's add an ampersand after
the last quotation mark.
33
00:02:27,290 --> 00:02:31,230
Then let's select the average,
type a minus sign and
34
00:02:31,230 --> 00:02:33,880
then select the standard deviation.
35
00:02:33,880 --> 00:02:40,260
We're now counting all runners greater
than 1 standard deviation below the mean.
36
00:02:40,260 --> 00:02:44,890
So to finish up counting all the runners
within 1 standard deviation, we just need
37
00:02:44,890 --> 00:02:49,760
to add a criterion that they finished under
1 standard deviation above the mean,
38
00:02:49,760 --> 00:02:51,080
as well.
39
00:02:51,080 --> 00:02:55,510
To do this, let's just copy the range and
criteria that we just entered,
40
00:02:56,740 --> 00:03:00,420
add a comma, and then paste them back in.
41
00:03:00,420 --> 00:03:06,484
Finally, we just need to change this
greater than sign to a less than sign,
42
00:03:06,484 --> 00:03:09,102
and change this minus to a plus.
43
00:03:11,730 --> 00:03:13,380
And add a closing parenthesis.
44
00:03:14,900 --> 00:03:19,101
For our last step, to turn this into
a percentage we just need to divide it by
45
00:03:19,101 --> 00:03:20,914
the total number of finishers.
46
00:03:24,826 --> 00:03:27,967
Which gives us about 69.47%,
47
00:03:27,967 --> 00:03:34,010
which is pretty close to the 68%
of a normal distribution.
48
00:03:34,010 --> 00:03:39,430
And to make it look like a percent, we can
click up here and then choose percent.
49
00:03:39,430 --> 00:03:43,960
From here, we can find our other standard
deviation percentages pretty easily.
50
00:03:43,960 --> 00:03:49,690
But first, let's use F4 to make all
the references in this formula absolute.
51
00:03:50,860 --> 00:03:54,780
This way, when we drag the cell down,
it'll keep the same references.
52
00:04:03,841 --> 00:04:06,278
Then let's drag the cell down twice.
53
00:04:08,460 --> 00:04:13,587
And to get the % in 2 and 3,
inside the formula for those cells,
54
00:04:13,587 --> 00:04:19,383
we just need to multiply the standard
deviation by 2 or 3 respectively.
55
00:04:19,383 --> 00:04:22,710
And the standard deviation for
me is this teal-colored B10.
56
00:04:24,000 --> 00:04:27,404
So for % in 2, we'll multiply this by 2.
57
00:04:28,700 --> 00:04:32,108
And over here we'll multiply it by 2.
58
00:04:32,108 --> 00:04:36,234
And for % in 3 we'll do the same thing,
except with 3.
59
00:04:41,425 --> 00:04:45,429
All right, we've got 69.48,
60
00:04:45,429 --> 00:04:50,300
94.91, and then 99.76%.
61
00:04:50,300 --> 00:04:58,234
Remember, a normal distribution should be
about 68% within 1 standard deviation,
62
00:04:58,234 --> 00:05:02,670
95% within 2, and 99.7% within 3.
63
00:05:02,670 --> 00:05:06,570
So it looks like the finishing times
of runners in the Boston Marathon
64
00:05:06,570 --> 00:05:09,590
are pretty close to normally distributed.
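To see how close the three percentages are, the rounded 68 / 95 / 99.7 figures can be replaced with the exact shares of a normal distribution, which follow from the error function. The sketch below is only an illustration: the helper name `normal_share_within` is hypothetical, and the observed values are the three percentages read out in the video.

```python
from math import erf, sqrt


def normal_share_within(k):
    """Share of a normal distribution lying within k standard deviations of its mean."""
    return erf(k / sqrt(2))


# Observed shares as read out in the video (69.48%, 94.91%, 99.76%).
observed = {1: 0.6948, 2: 0.9491, 3: 0.9976}

for k, share in observed.items():
    expected = normal_share_within(k)
    gap = share - expected
    print(f"within {k} SD: observed {share:.2%}, normal {expected:.2%}, gap {gap:+.2%}")
```

A small gap at all three levels is consistent with the conclusion that the data is close to normal, although this check alone is not a formal test of normality.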
65
00:05:09,590 --> 00:05:10,940
Coming up in the next video,
66
00:05:10,940 --> 00:05:14,480
we'll talk about the many different
flavors of data visualization.