Bummer! This is just a preview. You need to be signed in with a Basic account to view the entire video.
Start a free Basic trial
to watch this video
It's not always obvious how you should analyze your data. In this video we'll analyze whether some ages are overrepresented at the Boston Marathon.

0:00
All right, we've got our lines in.

0:02
Now it's time to find out if any of our data crosses the line.

0:06
Let's start by creating a table of ages and

0:08
how many runners of each age ran the race.

0:12
Let's leave a space below Max Difference and then in row 6 let's add two labels,

0:18
Age and Count.

0:22
And let's go ahead and bold our labels.

0:25
Then, below our age label, let's add 18, and then 19.

0:31
Then let's highlight both of those cells, and drag down until we have an entry for

0:38
each age between the min at 18 and the max at 84.

0:51
Next, under the count label,

0:54
we need to count how many runners were the age reported in column A.

1:00
To do this, we'll use the Count if function,

1:03
which counts all the values in a range, but only if they meet certain criteria.

1:07
So, under our Count label, let's type =CountIf and hit Enter.

1:14
And then, we need to specify the range we are interested in,

1:17
which in this case is the ages of all the runners.

1:21
We already have that range and the formulas up here.

1:24
So I'll take a second to copy that.

1:32
And then down here, looks like I lost my CountIf,

1:36
but we can bring it back, and paste in the range.

1:40
Then, for the second parameter,

1:42
we need to check that the value is equal to the number in column A.

1:46
And remember, this needs to be represented as a string.

1:50
So let's type, in quotation marks, an equal sign, and then

1:55
let's concatenate that equal sign with the cell in column A by typing an ampersand.

2:02
And then clicking over here to select the cell and

2:05
hit Enter to finish our account statement.

2:07
Finally let's the drag list formula down to page 84 to give us the counts.

2:25
And notice that if we click on this last count,

2:30
the range of ages is locked in as C2 to C26,411.

2:36
You can get some pretty hard to find errors if you forget to lock

2:39
in these rangers by using the dollar signs.

2:42
So, if you want to be using the exact same range in a bunch of different cells,

2:47
make sure you're specifying your range with those dollar signs.

2:52
Now that we've got all our accounts, lets go back to the top and

2:55
add another column to keep track of the difference between the two ages.

3:00
Let's call it difference and

3:04
then quickly bold it, and then let's skip row 7 and

3:10
then row 8, let's type =B8B7.

3:14
Then let's drag from there all the way to the bottom to give us all of our

3:18
differences.

3:21
We can already tell that some of these values are over 200.

3:26
But let's try and make them standout by using some conditional formatting.

3:30
Conditional formatting is formatting that only happens

3:33
once a certain condition is met.

3:35
For example, if you wanted to highlight all the positive cells in green and

3:40
all the negative cells in red, you could use conditional formatting.

3:43
To add conditional formatting to a Difference column,

3:46
let's select all of the differences And

3:52
then choose Format, Conditional formatting.

3:57
On the right we have options for our formatting.

4:00
Let's change the condition from if Cell is not empty to if Greater than

4:10
And add a value of 200.

4:13
Finally, let's change the formatting style to have a red background.

4:18
And there we go.

4:19
It looks like we've got one, two, three,

4:24
four instances where the difference is more than 200.

4:30
However, we really shouldn't be hard coding this 200 value.

4:33
It should be set to the value of our max difference.

4:38
So, instead of 200, let's type =, and this is cell B4 and

4:44
we're not allowed click it, which means we have to type it in.

4:51
Well that doesn't look quite right.

4:53
What's going on here?

4:54
Well, it turns out that when we type B4 like this, it starts with C8 and

5:01
it compares it to B4, but then it moves on to C9 and it compares it to B5 and

5:07
C10 with B6, and so on.

5:12
If we want this to compare each cell with B4,

5:15
we need to use dollar signs to make this an absolute reference.

5:23
That's better.

5:24
Then let's click done, and

5:27
by our criteria, we can now answer the question we set out to solve.

5:32
Some ages do have an easier time qualifying than others.

5:36
Though before we move on,

5:38
we should briefly talk about statistical significance.

5:41
Imagine you just created a new drug.

5:44
Well, before you'd be able to sell that drug, you'd need to make sure it was safe.

5:49
You'd want to test it on a small subset of people

5:51
before releasing it to the whole population.

5:54
But how many people would you need to test it on?

5:57
If you tried it on five people and they all turned out fine,

6:00
would you say that it's safe for the whole population?

6:03
No way, but then, how many people do you need?

6:07
1,000?

6:08
10,000?

6:10
Either way, at some point your sample is big enough,

6:12
that you can start to drag conclusions about the whole population.

6:17
This is what's statistical significance is all about.

6:20
It tells us if our findings can be safely extrapolated to the whole population.

6:26
If you'd like to read more about statistical significance,

6:28
check out the teacher's notes below.

6:30
Coming up we'll look at how we can present our findings to management.
You need to sign up for Treehouse in order to download course files.
Sign up