Data isn't always distributed the way you want. In this video we'll talk about a few of the different ways we can measure the spread of our data.

0:00
We've got the extremes of our data and we've got the middle.

0:03
But how is our data distributed?

0:05
One common way to describe the spread of our data is to use the standard deviation

0:10
which is commonly represented as the Greek letter sigma.

0:13
The standard deviation aims to tell us how far away our data is from the average.

0:18
To calculate it,

0:19
we start by taking the difference between each value and the average.

0:23
Then we square each of those values, add them up, and

0:26
divide by the total number of values.

0:29
This gives us the standard deviation squared, which is also called the variance.

0:34
So to get this standard deviation, we just take the square root and there we go.
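
The steps just described can be sketched in a few lines of Python. This is only a minimal illustration: the scores are made up for the example and are not the ones from the video, and `population_std` is a hypothetical helper name. It divides by the total number of values, exactly as described above.

```python
import math

def population_std(values):
    """Standard deviation computed by dividing by the number of values."""
    mean = sum(values) / len(values)
    # Difference between each value and the average, squared.
    squared_diffs = [(v - mean) ** 2 for v in values]
    # Add them up and divide by the total number of values: the variance.
    variance = sum(squared_diffs) / len(values)
    # The square root of the variance is the standard deviation.
    return math.sqrt(variance)

# Hypothetical scores, chosen so the arithmetic is easy to follow.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_std(scores))  # 2.0
```

Here the average is 5, the squared differences add up to 32, the variance is 32 / 8 = 4, and the square root of 4 is 2.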

0:39
We've got a standard deviation of 64.29, so if we were to put this on a graph,

0:44
we'd put the average in the middle and then go 64.29 above and below the average.

0:50
Then we can say that any data in this range is within one standard deviation

0:55
of the average.
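
To make "within one standard deviation" concrete, here is a small sketch that finds the range and the values that fall inside it. The data is hypothetical and `within_one_std` is a name invented for this example.

```python
import statistics

def within_one_std(values):
    """Return the one-standard-deviation range and the values inside it."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    low, high = mean - std, mean + std
    inside = [v for v in values if low <= v <= high]
    return low, high, inside

# Hypothetical scores, not the ones from the video.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
low, high, inside = within_one_std(scores)
print(low, high)  # 3.0 7.0
print(inside)     # [4, 4, 4, 5, 5, 7]
```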

0:56
So that's a pretty big range.

0:58
Let's see what happens if, instead of a perfect game, our first bowler

1:03
bowls a 135.

1:04
Now, instead of an average of 134.5, we've got an average of about 114 and

1:10
our standard deviation is all the way down to just 17.

1:14
So if we make a plot of this new standard deviation, we can see that this data

1:19
is much more clustered together than when it included a perfect game.
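
The effect of a single extreme value can be sketched like this. The scores below are hypothetical stand-ins, since the actual scores from the video are not listed in this transcript, so the printed numbers will not match the ones quoted above.

```python
import statistics

# Hypothetical bowling scores; only the first score differs between the lists.
with_perfect_game = [300, 120, 110, 105, 95, 90]
without_perfect_game = [135, 120, 110, 105, 95, 90]

for label, scores in [("first bowler scores 300", with_perfect_game),
                      ("first bowler scores 135", without_perfect_game)]:
    mean = statistics.fmean(scores)
    std = statistics.pstdev(scores)
    print(f"{label}: average {mean:.1f}, standard deviation {std:.1f}")
```

Replacing the one extreme score pulls the average down and shrinks the standard deviation considerably, which is the same pattern we saw in the video.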

1:23
Let's calculate the standard deviation for the finishing times.

1:27
First, let's add a new label for Standard Deviation in row nine.

1:36
And let's make it bold and

1:38
then double-click right here to automatically set the width of the column.

1:45
Then, in the cell next to it, let's type =STDEV and hit Enter to select a function.

1:54
Then let's paste in the range and hit Enter again and

1:58
it looks like we've got a Standard Deviation of about 42 minutes.

2:02
Also, if you're not seeing 42 minutes here, you can come over here and

2:07
change the data type to Duration and that should fix your issue.

2:12
So most racers finished within 42 minutes of the average finish time.
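
One detail worth knowing: the STDEV function in spreadsheets such as Google Sheets and Excel calculates the sample standard deviation, which divides by one less than the number of values, while the steps earlier divided by the total number of values (the population standard deviation, STDEVP in those spreadsheets). The sketch below compares the two using Python's `statistics` module. The finishing times are hypothetical, not the race data from the video.

```python
import statistics

# Hypothetical finishing times in minutes.
times = [170, 190, 190, 190, 200, 200, 220, 240]

population = statistics.pstdev(times)  # divides by n
sample = statistics.stdev(times)       # divides by n - 1, like STDEV

print(population)        # 20.0
print(round(sample, 2))  # 21.38
```

The sample version is always a little larger, and the gap shrinks as the number of values grows.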

2:17
But standard deviation doesn't tell the whole story,

2:21
it only tells us how compact or spread out our data is.

2:25
To get the rest of the picture, we need to talk about skew.

2:29
Skew is when your data seems to favor one side over the other.

2:34
Most of the data is either to the right or left of the middle.

2:37
And depending on which side has the long tail,

2:40
you would say that this data is either skewed negatively or positively.

2:45
An easy way to remember skew directions is to start at the peak and

2:49
draw an arrow towards the long tail.

2:52
The direction that arrow points is how the data is skewed.

2:56
So this data has a negative skew.
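
If you'd rather check the direction with a number than with an arrow, one common measure is the moment coefficient of skewness. It isn't covered in this video, so treat this as an assumption about how you might measure it: the result is negative when the long tail points left and positive when it points right. The data sets and the `skewness` helper are made up for illustration.

```python
def skewness(values):
    """Moment coefficient of skewness: third central moment / variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical data sets.
left_tail = [1, 6, 7, 8, 8, 9, 9, 9]   # long tail toward small values
right_tail = [1, 1, 1, 2, 2, 3, 4, 9]  # long tail toward large values

print(skewness(left_tail) < 0)   # True: negative skew
print(skewness(right_tail) > 0)  # True: positive skew
```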

2:59
On the other hand, if your data has no skew and its mean, median, and

3:04
mode are all right in the middle, then your data is said to have

3:08
a normal distribution, which is frequently referred to as a bell curve.

3:13
Normal distributions have many convenient properties and

3:16
they occur fairly frequently in real life.

3:19
People's heights, test scores, and

3:21
even blood pressures are all approximately normally distributed.

3:25
One property of normal distributions is how many values occur within a given

3:29
standard deviation of the mean.

3:30
About 68% of the data should be contained within 1 standard deviation,

3:35
95% should be contained within 2.

3:39
And if you go out to 3 standard deviations at 99.7%,

3:44
that should be pretty much all of the data.
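
These percentages can be reproduced with the `NormalDist` class in Python's `statistics` module (available in Python 3.8 and later). This is just a quick check of the rule for an ideal normal distribution, and `share_within` is a helper name made up for this sketch.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k) * 100, 1))
# 1 68.3
# 2 95.4
# 3 99.7
```

Real data will rarely hit these numbers exactly, but the closer it comes, the more reasonable it is to treat it as normally distributed.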

3:46
Let's see if our data is normally distributed by seeing how

3:49
close we come to these numbers in the next video.