Heads up! To view this whole video, sign in with your Courses account or enroll in your free 7-day trial. Sign In Enroll
Preview
Video Player
00:00
00:00
00:00
- 2x 2x
- 1.75x 1.75x
- 1.5x 1.5x
- 1.25x 1.25x
- 1.1x 1.1x
- 1x 1x
- 0.75x 0.75x
- 0.5x 0.5x
Before you start an analysis, you'll first want to define the question you're trying to answer.
This video doesn't have any notes.
Related Discussions
Have questions about this video? Start a discussion with the community and Treehouse staff.
Sign upRelated Discussions
Have questions about this video? Start a discussion with the community and Treehouse staff.
Sign up
Management typically starts
the data analysis process
0:00
when they need to know something.
0:03
Let's work through this process by
pretending the boss of an athletic
0:05
association has received complaints that
some ages have an easier time qualifying
0:08
than others and they've tasked us
with getting to the bottom of it.
0:12
The first we'll need to do
is define the question.
0:16
Let's go with, do some ages have an easier
time qualifying for the Boston Marathon?
0:19
Awesome.
0:25
Next, we need to turn that
question into something concrete,
0:26
something we'll be able
to answer with our data.
0:29
One way to find out if some ages
have an easier time qualifying
0:32
is to compare the number of
participants for each age.
0:36
If we find a big difference
between two consecutive ages,
0:39
then we'll know something is up.
0:43
But this provides us with another issue.
0:44
How do we define a big difference?
0:47
Remember, we're trying to answer a yes or
no question.
0:50
So at some point,
we need to draw a line in the sand and
0:53
say, this difference is too much.
0:56
Analyzing data is all
about asking questions.
0:58
You wanna approach each step and decision
along the way with an inquisitive mind.
1:02
Always asking if things
need clarification, or
1:06
could be better in anyway.
1:09
So, in this example, we're asking
what's an appropriate difference.
1:11
To figure out the answer,
let's go back to the spreadsheet.
1:15
And let's start off by creating a new tab
at the bottom and naming it Age Breakdown.
1:19
Then, to figure out where
we should draw that line,
1:29
let's first find out how many
ages took part in the race.
1:32
In Column A, let's add labels for
youngest and oldest.
1:35
Then, in Column B1,
let's set it equal to men and
1:42
then let's head over to the 2017 tab and
select all the age data.
1:47
By clicking in cell C2 and
using control, shift, down or
1:59
command, shift, down, then hit enter.
2:03
And there we go.
2:07
Now let's clean up that formula by using
F4 to make those references absolute.
2:09
Then we can drag that down to oldest and
replace min with max.
2:19
Perfect.
2:28
Next, to give us some idea of how
big is too big, let's figure out
2:30
how many runners of each age there would
be if ages were uniformly distributed.
2:35
[SOUND] So, if each age have
the same number of runners,
2:40
how many runners would that be?
2:43
Now this is almost certainly not the case,
2:46
but it's easy to calculate and
gives us a good jumping off point.
2:48
Below oldest, let's add a new
label called runners per age.
2:52
And let's make this column
just a little bit wider.
2:58
And let's say that equal
to the total number
3:03
of runners from the summary
tab divided by and
3:08
parenthesis oldest which I'll
just write in B2 minus youngest
3:14
End the parentheses and hit Enter, great.
3:24
So in a uniform distribution, each age
would account for about 400 runners.
3:27
From here, we just need to use this
figure to decide how much of a difference
3:33
is acceptable between
two consecutive ages.
3:37
I think 400 is probably too high and
100 is probably too low.
3:40
But between those two, it's difficult
to say where we should end up.
3:48
We can only do so
much in trying to figure things out.
3:51
At some point we just
have to pick something.
3:55
So, let's go with 200, or
about half of our runners per age.
3:57
Let's add a new label below runners
per age called max difference.
4:02
And let's set it equal to
runners per age divided by two.
4:11
Let's also format these two cells to have
less decimal points by clicking on this
4:18
button up here.
4:23
And let's also bold our labels
to make them easier to read.
4:28
In the next video, we'll dive into
the aged data and see what we find.
4:34
You need to sign up for Treehouse in order to download course files.
Sign upYou need to sign up for Treehouse in order to set up Workspace
Sign up