Lots of things end up being normally distributed. Are Boston Marathon results one of them?

Let's try and find out if our data is
normally distributed by seeing how many
0:00

finishers finished within one, two, and
three standard deviations of the mean.
0:04

But first, we'll need to know
how many finishers there were.
0:09

Let's add a row at the very top
by right clicking on row 1 and
0:13

choosing insert 1 above.
0:16

Then let's add a label for number of
finishers and make sure it's bold.
0:19

Then in cell B1, let's type =COUNT,
0:26

paste in our range of overall
finish times, and hit Enter.
0:30

And there we go, 26,410 total finishers.
0:35

Getting back to our standard deviations,
0:41

let's add three labels below
our standard deviation label,
0:45

and call them % in 1, % in 2, and % in 3.
0:52

And let's leave them unbolded so
0:57

they look like they belong with
standard deviation, because they do.
0:59

Now, for % in 1, we need to find out
how many runners finished within 1
1:04

standard deviation of the mean.
1:09

To accomplish this, we're going to use the
COUNTIFS function, which lets us give some
1:12

criteria and then only returns the count
of values that match our criteria.
1:16

We're going to count only runners that
finished within 1 standard deviation.
1:21

And then divide that by the total
number of runners to get a percentage.
1:26

Over in cell B11, let's type =COUNTIFS and
hit Enter to select it.
1:30

Then let's paste in the range of
finishing times and add a comma.
1:39

The next parameter is
the conditional statement.
1:43

And it's entered as a string.
1:46

So let's add two quotation marks and in
the middle, let's add a greater than sign.
1:49

To find out if a runner is within 1
standard deviation of the mean, we need to
1:56

check that their finishing time is greater
than the mean minus 1 standard deviation.
2:00

Unfortunately, this data exists in a cell.
2:07

So instead of typing the data in,
we should reference the cell directly.
2:10

To do this,
we need to combine our greater than sign
2:15

with our cell data by using an ampersand
to concatenate the strings.
2:18

Let's add an ampersand after
the last quotation mark.
2:23

Then let's select the average,
type a minus sign and
2:27

then select the standard deviation.
2:31

We're now counting all runners greater
than 1 standard deviation below the mean.
2:33
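To see what that concatenation produces, here is a minimal Python sketch of the same idea. The mean and standard deviation values and the `build_criterion` helper are made up for illustration; they are not the numbers or names from the spreadsheet in the video.

```python
# Hypothetical stand-ins for the cells holding the average and the
# standard deviation (illustrative values, not the video's data).
mean_minutes = 238.0
stdev_minutes = 41.0


def build_criterion(operator, threshold):
    """Join a comparison operator and a number into one criterion string,
    the way ">" & (mean - standard deviation) does in the spreadsheet."""
    return operator + str(threshold)


lower_criterion = build_criterion(">", mean_minutes - stdev_minutes)
print(lower_criterion)  # >197.0
```

The subtraction happens first and the result is then glued onto the operator, so the criterion ends up as one string.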

So to finish up counting all the runners
within 1 standard deviation, we just need
2:40

to add a criterion that they finished under
1 standard deviation above the mean,
2:44

as well.
2:49

To do this, let's just copy the range and
criteria that we just entered,
2:51

add a comma, and then paste them back in.
2:56

Finally, we just need to change this
greater than sign to a less than sign,
3:00

and change this minus to a plus.
3:06

And add a closing parenthesis.
3:11

For our last step, to turn this into
a percentage we just need to divide it by
3:14

the total number of finishers.
3:19

Which gives us about 69.47%,
3:24

which is pretty close to the 68%
of a normal distribution.
3:27
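The same two-criteria count can be sketched outside the spreadsheet. This Python version is only an illustration: the finish times are invented, the function name is hypothetical, and it assumes the sample standard deviation, which may not be the variant used in the video.

```python
from statistics import mean, stdev


def percent_within_one_sd(times):
    """Fraction of values strictly between mean - 1 SD and mean + 1 SD."""
    avg = mean(times)
    sd = stdev(times)  # sample standard deviation (an assumption)
    lower = avg - sd   # first criterion: greater than this
    upper = avg + sd   # second criterion: less than this
    inside = sum(1 for t in times if lower < t < upper)
    return inside / len(times)  # divide by the total number of finishers


# Made-up finish times in minutes, not the Boston Marathon data.
finish_minutes = [180, 200, 210, 220, 230, 240, 250, 260, 280, 330]
print(f"{percent_within_one_sd(finish_minutes):.2%}")  # 80.00%
```

Both conditions have to hold for a runner to be counted, which is what giving COUNTIFS two range and criteria pairs does.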

And to make it look like a percent, we can
click up here and then choose percent.
3:34

From here, we can find our other standard
deviation percentages pretty easily.
3:39

But first, let's use F4 to make all
the references in this formula absolute.
3:43

This way, when we drag the cell down,
it'll keep the same references.
3:50

Then let's drag the cell down twice.
4:03

And to get the % in 2 and 3,
inside the formula for those cells,
4:08

we just need to multiply the standard
deviation by 2 or 3 respectively.
4:13

And the standard deviation for
me is this teal-colored B10.
4:19

So for % in 2, we'll multiply this by 2.
4:24

And over here we'll multiply it by 2.
4:28

And for % in 3 we'll do the same thing,
except with 3.
4:32

All right, we've got 69.48,
4:41

94.91, and then 99.76%.
4:45

Remember, a normal distribution should be
about 68% within 1 standard deviation,
4:50

95% within 2, and 99.7% within 3.
4:58
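Those reference percentages come straight from the normal distribution: the share of values within k standard deviations of the mean is erf(k / sqrt(2)). Here is a small Python sketch that computes them and sets them next to the percentages read off the spreadsheet above; the function name is just for illustration.

```python
import math


def normal_fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


# Percentages reported for the marathon data in this video.
observed = {1: 0.6948, 2: 0.9491, 3: 0.9976}

for k, obs in observed.items():
    expected = normal_fraction_within(k)
    print(f"{k} SD: normal {expected:.2%}, observed {obs:.2%}")
```

The gaps between the two columns are small, with the largest at 1 standard deviation.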

So it looks like the finishing times
of runners in the Boston Marathon
5:02

are pretty close to normally distributed.
5:06

Coming up in the next video,
5:09

we'll talk about the many different
flavors of data visualization.
5:10
