Data isn't always distributed the way you want. In this video we'll talk about a few of the different ways we can measure the spread of our data.

We've got the extremes of our data and
we've got the middle.
0:00

But how is our data distributed?
0:03

One common way to describe the spread of
our data is to use the standard deviation
0:05

which is commonly represented
as the Greek letter sigma.
0:10

The standard deviation aims to tell us how
far away our data is from the average.
0:13

To calculate it,
0:18

we start by taking the difference
between each value and the average.
0:19

Then we square each of those values,
add them up, and
0:23

divide by the total number of values.
0:26

This gives us the standard deviation
squared, which is also called the variance.
0:29

So to get the standard deviation, we just
take the square root and there we go.
0:34
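
Here is a minimal Python sketch of the steps just described: subtract the average, square, add up, divide by the number of values, then take the square root. The `scores` list is made up for illustration and is not the bowling data from the video.

```python
import math

# Hypothetical values, chosen only so the arithmetic is easy to follow.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Step 1: the average.
mean = sum(scores) / len(scores)

# Step 2: squared difference between each value and the average.
squared_diffs = [(x - mean) ** 2 for x in scores]

# Step 3: add them up and divide by the total number of values (the variance).
variance = sum(squared_diffs) / len(scores)

# Step 4: the square root of the variance is the standard deviation.
std_dev = math.sqrt(variance)

print(mean, variance, std_dev)  # 5.0 4.0 2.0
```

Because this divides by the total number of values, it is the population form of the standard deviation.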

We've got a standard deviation of 64.29,
so if we were to put this on a graph,
0:39

we'd put the average in the middle and
then go 64.29 above and below the average.
0:44

Then we can say that any data in this
range is within one standard deviation
0:50

of the average.
0:55

So that's a pretty big range.
0:56

Let's see what happens if, instead of
a perfect game, our first bowler
0:58

bowls a 135.
1:03

Now, instead of an average of 134.5,
we've got an average of about 114 and
1:04

our standard deviation is
all the way down to just 17.
1:10

So if we make a plot of this new standard
deviation, we can see that this data
1:14

is much more clustered together than
when it included a perfect game.
1:19
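
To see how much a single extreme value can stretch the standard deviation, here is a small Python sketch. The eight scores are hypothetical, not the ones from the video, so the printed numbers will not match 64.29 or 17. Only the direction of the change is the point.

```python
from statistics import mean, pstdev

# Hypothetical scores for eight bowlers (NOT the video's data).
with_perfect_game = [300, 110, 95, 128, 102, 121, 99, 117]

# Same bowlers, but the first one bowls a 135 instead of a 300.
without_perfect_game = [135] + with_perfect_game[1:]

for label, scores in [("with 300", with_perfect_game),
                      ("with 135", without_perfect_game)]:
    # pstdev divides by the number of values, matching the steps described above.
    print(f"{label}: mean={mean(scores):.1f}, std dev={pstdev(scores):.1f}")
```

Swapping out the one extreme score lowers the average a little and the standard deviation a lot.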

Let's calculate the standard deviation for
the finishing times.
1:23

First, let's add a new label for
Standard Deviation in row nine.
1:27

And let's make it bold and
1:36

then double-click right here to
automatically set the width of the column.
1:38

Then, in the cell next to it, let's type
=STDEV and hit Enter to select a function.
1:45

Then let's paste in the range and
hit Enter again and
1:54

it looks like we've got
a Standard Deviation of about 42 minutes.
1:58

Also, if you're not seeing 42 minutes
here, you can come over here and
2:02

change the data type to Duration and
that should fix your issue.
2:07
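
One thing worth knowing: in common spreadsheet programs, `STDEV` computes the sample standard deviation, which divides by one less than the number of values, while the steps described earlier divide by the total number of values (the population form, usually `STDEVP`). The Python sketch below compares the two on hypothetical finishing times in minutes. These are not the race data from the video.

```python
from statistics import pstdev, stdev

# Hypothetical finishing times in minutes (made up for illustration).
finish_minutes = [182, 195, 201, 214, 226, 240, 251, 263]

# Population form: divide by n, as in the calculation described earlier.
population_sd = pstdev(finish_minutes)

# Sample form: divide by n - 1, which is what a spreadsheet STDEV does.
sample_sd = stdev(finish_minutes)

print(f"population: {population_sd:.2f}, sample: {sample_sd:.2f}")
```

The sample value is always a bit larger, and the gap shrinks as the number of values grows.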

So most racers finished within 42
minutes of the average finish time.
2:12

But standard deviation
doesn't tell the whole story;
2:17

it only tells us how compact or
spread out our data is.
2:21

To get the rest of the picture,
we need to talk about skew.
2:25

Skew is when your data seems to
favor one side over the other.
2:29

Most of the data is either to the right or
left of the middle.
2:34

And depending on which
side has the long tail,
2:37

you would say that this data is either
skewed negatively or positively.
2:40

An easy way to remember skew
directions is to start at the peak and
2:45

draw an arrow towards the long tail.
2:49

The direction that arrow points
is how the data is skewed.
2:52

So this data has a negative skew.
2:56
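
If you'd rather check the direction with a number than with an arrow, one common measure is the average of the cubed z-scores. Here is a rough Python sketch, assuming that definition. The `skewness` helper and both data sets are made up for illustration.

```python
from statistics import mean, pstdev

def skewness(values):
    """Average cubed z-score: negative for a long left tail, positive for a long right tail."""
    m = mean(values)
    s = pstdev(values)
    return sum(((x - m) / s) ** 3 for x in values) / len(values)

# Hypothetical data: most values are high, with one far to the left.
long_left_tail = [1, 6, 7, 8, 8, 9, 9, 9, 10, 10]

# The mirror image: most values are low, with one far to the right.
long_right_tail = [0, 0, 1, 1, 1, 2, 2, 3, 4, 9]

print(skewness(long_left_tail))   # negative: skewed negatively
print(skewness(long_right_tail))  # positive: skewed positively
```

The sign follows the long tail, just like the arrow does.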

On the other hand, if your data has
no skew and its mean, median, and
2:59

mode are all right in the middle,
then your data is said to have
3:04

a normal distribution which is
frequently referred to as a bell curve.
3:08

Normal distributions have many
convenient properties and
3:13

they occur fairly frequently in real life.
3:16

People's heights, test scores, and
3:19

even blood pressures are all
approximately normally distributed.
3:21

One property of normal distributions
is how many values occur within a given
3:25

standard deviation of the mean.
3:29

68% of the data should be contained
within 1 standard deviation,
3:30

95% should be contained within 2.
3:35

And if you go out to 3
standard deviations at 99.7%,
3:39

that should be pretty
much all of the data.
3:44
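
Those percentages come straight from the normal curve. As a quick check, the Python sketch below uses the standard library's error function, since the share of a normal distribution within k standard deviations of the mean is erf(k / √2). The `share_within` name is just a helper made up for this example.

```python
import math

def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
# within 1 standard deviation(s): 68.3%
# within 2 standard deviation(s): 95.4%
# within 3 standard deviation(s): 99.7%
```

So 68, 95, and 99.7 are rounded versions of these values.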

Let's see if our data is normally
distributed by seeing how
3:46

close we come to these
numbers in the next video.
3:49
