Before you start an analysis, you'll first want to define the question you're trying to answer.

0:00
Management typically starts the data analysis process

0:03
when they need to know something.

0:05
Let's work through this process by pretending the boss of an athletic

0:08
association has received complaints that some ages have an easier time qualifying

0:12
than others and they've tasked us with getting to the bottom of it.

0:16
The first thing we'll need to do is define the question.

0:19
Let's go with, do some ages have an easier time qualifying for the Boston Marathon?

0:25
Awesome.

0:26
Next, we need to turn that question into something concrete,

0:29
something we'll be able to answer with our data.

0:32
One way to find out if some ages have an easier time qualifying

0:36
is to compare the number of participants for each age.

0:39
If we find a big difference between two consecutive ages,

0:43
then we'll know something is up.

0:44
But this provides us with another issue.

0:47
How do we define a big difference?

0:50
Remember, we're trying to answer a yes or no question.

0:53
So at some point, we need to draw a line in the sand and

0:56
say, this difference is too much.
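
For readers who prefer code to a spreadsheet, the comparison described above can be sketched in Python. This is only an illustration, not the method used in the video: the function name `big_jumps`, the `max_difference` parameter, and the sample ages are all made up for the example.

```python
from collections import Counter


def big_jumps(ages, max_difference):
    """Return (age, next_age, difference) for each pair of consecutive ages
    whose runner counts differ by more than max_difference."""
    counts = Counter(ages)  # runners per age; missing ages count as 0
    flagged = []
    for age in range(min(ages), max(ages)):
        diff = abs(counts[age + 1] - counts[age])
        if diff > max_difference:
            flagged.append((age, age + 1, diff))
    return flagged


# Made-up ages, not real race data: 3 runners aged 34, 7 aged 35, 2 aged 36.
sample_ages = [34] * 3 + [35] * 7 + [36] * 2
print(big_jumps(sample_ages, max_difference=3))
```

The yes or no answer then comes down to whether the returned list is empty for the threshold we settle on.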

0:58
Analyzing data is all about asking questions.

1:02
You wanna approach each step and decision along the way with an inquisitive mind.

1:06
Always asking if things need clarification, or

1:09
could be better in any way.

1:11
So, in this example, we're asking what's an appropriate difference.

1:15
To figure out the answer, let's go back to the spreadsheet.

1:19
And let's start off by creating a new tab at the bottom and naming it Age Breakdown.

1:29
Then, to figure out where we should draw that line,

1:32
let's first find out how many ages took part in the race.

1:35
In Column A, let's add labels for youngest and oldest.

1:42
Then, in cell B1, let's set it equal to MIN and

1:47
then let's head over to the 2017 tab and select all the age data.

1:59
By clicking in cell C2 and using control, shift, down or

2:03
command, shift, down, then hit enter.

2:07
And there we go.

2:09
Now let's clean up that formula by using F4 to make those references absolute.

2:19
Then we can drag that down to oldest and replace min with max.
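
Outside a spreadsheet, the same youngest and oldest lookup can be sketched in Python. The `ages` list below is made up and merely stands in for the age column on the 2017 tab.

```python
# Made-up ages standing in for the age column on the 2017 tab.
ages = [18, 27, 27, 35, 44, 61, 83]

youngest = min(ages)  # plays the role of the MIN formula in B1
oldest = max(ages)    # plays the role of the MAX formula in B2

print(youngest, oldest)
```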

2:28
Perfect.

2:30
Next, to give us some idea of how big is too big, let's figure out

2:35
how many runners of each age there would be if ages were uniformly distributed.

2:40
[SOUND] So, if each age had the same number of runners,

2:43
how many runners would that be?

2:46
Now this is almost certainly not the case,

2:48
but it's easy to calculate and gives us a good jumping off point.

2:52
Below oldest, let's add a new label called runners per age.

2:58
And let's make this column just a little bit wider.

3:03
And let's set that equal to the total number

3:08
of runners from the summary tab, divided by,

3:14
open parenthesis, oldest, which I'll just write as B2, minus youngest.

3:24
Close the parentheses and hit Enter. Great.

3:27
So in a uniform distribution, each age would account for about 400 runners.
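
The runners per age calculation can be written out as a small Python sketch. The function name and the input numbers are assumptions chosen only so the result lands near the figure quoted above. They are not the actual race totals.

```python
def runners_per_age(total_runners, youngest, oldest):
    """Mirror the spreadsheet formula: total runners / (oldest - youngest)."""
    return total_runners / (oldest - youngest)


# Hypothetical inputs, not real data: 26,000 runners aged 18 to 83.
print(runners_per_age(total_runners=26000, youngest=18, oldest=83))
```

Strictly speaking, the number of distinct ages from youngest to oldest is oldest minus youngest plus one, so this formula slightly overstates the per-age figure. For a rough starting point like this one, the difference is small.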

3:33
From here, we just need to use this figure to decide how much of a difference

3:37
is acceptable between two consecutive ages.

3:40
I think 400 is probably too high and 100 is probably too low.

3:48
But between those two, it's difficult to say where we should end up.

3:51
We can only do so much in trying to figure things out.

3:55
At some point we just have to pick something.

3:57
So, let's go with 200, or about half of our runners per age.

4:02
Let's add a new label below runners per age called max difference.

4:11
And let's set it equal to runners per age divided by two.
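
As a quick sketch of the same step in Python, the threshold is just a fraction of the runners per age figure. The function name and the `fraction` parameter are hypothetical. Half is simply the judgment call made in the video.

```python
def max_difference(runners_per_age, fraction=0.5):
    """Largest acceptable gap between two consecutive ages,
    expressed as a fraction of the uniform runners-per-age figure."""
    return runners_per_age * fraction


# Using the rounded figure of about 400 runners per age from the video.
print(round(max_difference(400.0)))
```

Keeping the fraction as a parameter makes it easy to try a stricter or looser line later.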

4:18
Let's also format these two cells to have fewer decimal places by clicking on this

4:23
button up here.

4:28
And let's also bold our labels to make them easier to read.

4:34
In the next video, we'll dive into the age data and see what we find.