Lots of things end up being normally distributed. Are Boston Marathon results one of them?

Let's try and find out if our data is
normally distributed by seeing how many
0:00

finishers finished within one, two, and
three standard deviations of the mean.
0:04

But first, we'll need to know
how many finishers there were.
0:09

Let's add a row at the very top
by right-clicking on row 1 and
0:13

choosing insert 1 above.
0:16

Then let's add a label for number of
finishers and make sure it's bold.
0:19

Then in cell B1, let's type =COUNT,
0:26

paste in our range of overall
finish times, and hit Enter.
0:30

And there we go, 26,410 total finishers.
0:35
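
If you'd rather see the counting step outside a spreadsheet, here is a minimal Python sketch. It assumes the column has been loaded into a list, and it mimics COUNT by tallying only numeric entries. The `count_numeric` helper and the sample column are hypothetical, not the real race data.

```python
def count_numeric(cells):
    """Count entries that are numbers, skipping text and blanks,
    roughly the way a spreadsheet COUNT does."""
    return sum(
        1
        for cell in cells
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(cell, (int, float)) and not isinstance(cell, bool)
    )


# Hypothetical column: a header, a blank, an empty cell, and three times.
column = ["Overall", 185.5, 200.0, "", None, 215.25]
print(count_numeric(column))
```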

Getting back to our standard deviations,
0:41

let's add three labels below
our standard deviation label,
0:45

and call them % in 1, % in 2, and % in 3.
0:52

And let's leave them unbolded so
0:57

they look like they belong with
standard deviation, because they do.
0:59

Now, for % in 1, we need to find out
how many runners finished within 1
1:04

standard deviation of the mean.
1:09

To accomplish this, we're going to use the
COUNTIFS function, which lets us give some
1:12

criteria and then only returns the count
of values that match our criteria.
1:16

We're going to count only runners that
finished within 1 standard deviation.
1:21

And then divide that by the total
number of runners to get a percentage.
1:26

Over in cell B11, let's type =COUNTIFS and
hit Enter to select it.
1:30

Then let's paste in the range of
finishing times and add a comma.
1:39

The next parameter is
the conditional statement.
1:43

And it's entered as a string.
1:46

So let's add two quotation marks and in
the middle, let's add a greater than sign.
1:49

To find out if a runner is within 1
standard deviation of the mean, we need to
1:56

check that their finishing time is greater
than the mean minus 1 standard deviation.
2:00

Unfortunately, this data exists in a cell.
2:07

So instead of typing the data in,
we should reference the cell directly.
2:10

To do this,
we need to combine our greater than sign
2:15

with our cell data by using an ampersand
to concatenate the strings.
2:18

Let's add an ampersand after
the last quotation mark.
2:23

Then let's select the average,
type a minus sign and
2:27

then select the standard deviation.
2:31

We're now counting all runners greater
than 1 standard deviation below the mean.
2:33

So to finish up counting all the runners
within 1 standard deviation, we just need
2:40

to add a criterion that they finished under
1 standard deviation above the mean,
2:44

as well.
2:49

To do this, let's just copy the range and
criteria that we just entered,
2:51

add a comma, and then paste them back in.
2:56

Finally, we just need to change this
greater than sign to a less than sign,
3:00

and change this minus to a plus.
3:06

And add a closing parenthesis.
3:11

For our last step, to turn this into
a percentage we just need to divide it by
3:14

the total number of finishers.
3:19

Which gives us about 69.47%,
3:24

which is pretty close to the 68%
of a normal distribution.
3:27

And to make it look like a percent, we can
click up here and then choose percent.
3:34
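
The same two-criteria count can be sketched in Python: keep only the times strictly above the mean minus the standard deviation and strictly below the mean plus it, then divide by the total. This is an illustration under assumptions, not the spreadsheet itself. The `share_within` function and `sample_times` list are hypothetical, and the population standard deviation (`pstdev`) is a guess, since the video doesn't say which standard deviation function the sheet uses.

```python
from statistics import mean, pstdev


def share_within(times, k=1):
    """Fraction of values strictly between mean - k*sd and mean + k*sd."""
    avg = mean(times)
    # Assumption: population standard deviation. Swap in statistics.stdev
    # if your sheet uses the sample version.
    sd = pstdev(times)
    lower, upper = avg - k * sd, avg + k * sd
    # Two criteria, like the two range/criteria pairs in COUNTIFS.
    inside = sum(1 for t in times if lower < t < upper)
    return inside / len(times)


# Hypothetical finish times in minutes, NOT the real Boston Marathon data.
sample_times = [185, 200, 210, 215, 220, 225, 230, 240, 255, 290]
print(f"{share_within(sample_times, 1):.2%}")
```

Passing `k=2` or `k=3` widens the window, which is the same change as multiplying the standard deviation by 2 or 3 inside the formula.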

From here, we can find our other standard
deviation percentages pretty easily.
3:39

But first, let's use F4 to make all
the references in this formula absolute.
3:43

This way, when we drag the cell down,
it'll keep the same references.
3:50

Then let's drag the cell down twice.
4:03

And to get the % in 2 and 3,
inside the formula for those cells,
4:08

we just need to multiply the standard
deviation by 2 or 3 respectively.
4:13

And the standard deviation for
me is this teal-colored B10.
4:19

So for % in 2, we'll multiply this by 2.
4:24

And over here we'll multiply it by 2.
4:28

And for % in 3 we'll do the same thing,
except with 3.
4:32

All right, we've got 69.48,
4:41

94.91, and then 99.76%.
4:45

Remember, a normal distribution should be
about 68% within 1 standard deviation,
4:50

95% within 2, and 99.7% within 3.
4:58
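
Those reference percentages come from the normal distribution itself: the share of values within k standard deviations of the mean is erf(k / sqrt(2)). Here is a small Python sketch that computes them and sets them next to the percentages read out in this video. The `normal_share_within` name is made up for the example.

```python
import math


def normal_share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


# Percentages read out at the end of the video for % in 1, % in 2, and % in 3.
observed = {1: 0.6948, 2: 0.9491, 3: 0.9976}

for k, seen in observed.items():
    expected = normal_share_within(k)
    print(f"within {k} SD: normal {expected:.2%}, marathon {seen:.2%}")
```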

So it looks like the finishing times
of runners in the Boston Marathon
5:02

are pretty close to normally distributed.
5:06

Coming up in the next video,
5:09

we'll talk about the many different
flavors of data visualization.
5:10
