**Heads up!** To view this whole video, sign in with your Courses account or enroll in your free 7-day trial.
Sign In
Enroll

Preview

Start a free Courses trial

to watch this video

It's not always obvious how you should analyze your data. In this video we'll analyze whether some ages are over-represented at the Boston Marathon.

All right, we've got our lines in.
0:00

Now it's time to find out if any
of our data crosses the line.
0:02

Let's start by creating
a table of ages and
0:06

how many runners of each age ran the race.
0:08

Let's leave a space below Max Difference
and then in row 6 let's add two labels,
0:12

Age and Count.
0:18

And let's go ahead and bold our labels.
0:22

Then, below our age label,
let's add 18, and then 19.
0:25

Then let's highlight both of those cells,
and drag down until we have an entry for
0:31

each age between the min at 18 and
the max at 84.
0:38

Next, under the count label,
0:51

we need to count how many runners
were the age reported in column A.
0:54

To do this,
we'll use the Count if function,
1:00

which counts all the values in a range,
but only if they meet certain criteria.
1:03

So, under our Count label,
let's type =CountIf and hit Enter.
1:07

And then, we need to specify
the range we are interested in,
1:14

which in this case is
the ages of all the runners.
1:17

We already have that range and
the formulas up here.
1:21

So I'll take a second to copy that.
1:24

And then down here,
looks like I lost my CountIf,
1:32

but we can bring it back,
and paste in the range.
1:36

Then, for the second parameter,
1:40

we need to check that the value is
equal to the number in column A.
1:42

And remember,
this needs to be represented as a string.
1:46

So let's type, in quotation marks,
an equal sign, and then
1:50

let's concatenate that equal sign with the
cell in column A by typing an ampersand.
1:55

And then clicking over here
to select the cell and
2:02

hit Enter to finish our account statement.
2:05

Finally let's the drag list formula
down to page 84 to give us the counts.
2:07

And notice that if we
click on this last count,
2:25

the range of ages is locked
in as C2 to C26,411.
2:30

You can get some pretty hard to
find errors if you forget to lock
2:36

in these rangers by
using the dollar signs.
2:39

So, if you want to be using the exact
same range in a bunch of different cells,
2:42

make sure you're specifying your
range with those dollar signs.
2:47

Now that we've got all our accounts,
lets go back to the top and
2:52

add another column to keep track of
the difference between the two ages.
2:55

Let's call it difference and
3:00

then quickly bold it, and
then let's skip row 7 and
3:04

then row 8, let's type =B8-B7.
3:10

Then let's drag from there all the way
to the bottom to give us all of our
3:14

differences.
3:18

We can already tell that some
of these values are over 200.
3:21

But let's try and make them standout
by using some conditional formatting.
3:26

Conditional formatting is
formatting that only happens
3:30

once a certain condition is met.
3:33

For example, if you wanted to highlight
all the positive cells in green and
3:35

all the negative cells in red,
you could use conditional formatting.
3:40

To add conditional formatting
to a Difference column,
3:43

let's select all of the differences And
3:46

then choose Format,
Conditional formatting.
3:52

On the right we have options for
our formatting.
3:57

Let's change the condition from if
Cell is not empty to if Greater than
4:00

And add a value of 200.
4:10

Finally, let's change the formatting
style to have a red background.
4:13

And there we go.
4:18

It looks like we've got one, two, three,
4:19

four instances where
the difference is more than 200.
4:24

However, we really shouldn't
be hard coding this 200 value.
4:30

It should be set to the value
of our max difference.
4:33

So, instead of 200, let's type =,
and this is cell B4 and
4:38

we're not allowed click it,
which means we have to type it in.
4:44

Well that doesn't look quite right.
4:51

What's going on here?
4:53

Well, it turns out that when we type
B4 like this, it starts with C8 and
4:54

it compares it to B4, but then it moves
on to C9 and it compares it to B5 and
5:01

C10 with B6, and so on.
5:07

If we want this to compare
each cell with B4,
5:12

we need to use dollar signs to
make this an absolute reference.
5:15

That's better.
5:23

Then let's click done, and
5:24

by our criteria, we can now answer
the question we set out to solve.
5:27

Some ages do have an easier
time qualifying than others.
5:32

Though before we move on,
5:36

we should briefly talk about
statistical significance.
5:38

Imagine you just created a new drug.
5:41

Well, before you'd be able to sell that
drug, you'd need to make sure it was safe.
5:44

You'd want to test it on
a small subset of people
5:49

before releasing it to
the whole population.
5:51

But how many people would
you need to test it on?
5:54

If you tried it on five people and
they all turned out fine,
5:57

would you say that it's safe for
the whole population?
6:00

No way, but then,
how many people do you need?
6:03

1,000?
6:07

10,000?
6:08

Either way,
at some point your sample is big enough,
6:10

that you can start to drag conclusions
about the whole population.
6:12

This is what's statistical
significance is all about.
6:17

It tells us if our findings can be safely
extrapolated to the whole population.
6:20

If you'd like to read more
about statistical significance,
6:26

check out the teacher's notes below.
6:28

Coming up we'll look at how we can
present our findings to management.
6:30

You need to sign up for Treehouse in order to download course files.

Sign up