

You have completed Data Analysis Basics!


Analyzing Data Spread

3:53 with Ben Deitch

Data isn't always distributed the way you want. In this video we'll talk about a few of the different ways we can measure the spread of our data.

Video Transcript

We've got the extremes of our data and we've got the middle. 0:00

But how is our data distributed? 0:03

One common way to describe the spread of our data is to use the standard deviation, 0:05

which is commonly represented by the Greek letter sigma (σ). 0:10

The standard deviation aims to tell us how far our data points typically are from the average. 0:13

To calculate it, 0:18

we start by taking the difference between each value and the average. 0:19

Then we square each of those values, add them up, and 0:23

divide by the total number of values. 0:26

This gives us the standard deviation squared, which is also called the variance. 0:29

So to get the standard deviation, we just take the square root, and there we go. 0:34
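The calculation described in the video maps directly onto a few lines of Python. This is a minimal sketch: population_std_dev is a hypothetical helper name and the scores are made up for illustration. Because it divides by the total number of values, it should agree with Python's statistics.pstdev.

```python
import math

def population_std_dev(values):
    """Standard deviation computed step by step, dividing by the number of values."""
    n = len(values)
    average = sum(values) / n
    # Step 1: take the difference between each value and the average
    differences = [v - average for v in values]
    # Step 2: square each of those differences
    squares = [d ** 2 for d in differences]
    # Step 3: add them up and divide by the number of values; this is the variance
    variance = sum(squares) / n
    # Step 4: the standard deviation is the square root of the variance
    return math.sqrt(variance)

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical example data
print(population_std_dev(scores))  # 2.0
```

Here the average is 5, the squared differences add up to 32, the variance is 32 / 8 = 4, and the square root of that is 2.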

We've got a standard deviation of 64.29, so if we were to put this on a graph, 0:39

we'd put the average in the middle and then go 64.29 above and below the average. 0:44

Then we can say that any data in this range is within one standard deviation 0:50

of the average. 0:55
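Finding the "within one standard deviation" band is just the average plus and minus sigma. A small sketch, assuming made-up data and a hypothetical helper named within_one_std_dev:

```python
import statistics

def within_one_std_dev(values):
    """Return (low, high, inside): the one-standard-deviation band and the values in it."""
    average = statistics.mean(values)
    sigma = statistics.pstdev(values)  # divides by n, matching the calculation in the video
    low, high = average - sigma, average + sigma
    inside = [v for v in values if low <= v <= high]
    return low, high, inside

low, high, inside = within_one_std_dev([2, 4, 4, 4, 5, 5, 7, 9])
print(low, high, inside)  # 3.0 7.0 [4, 4, 4, 5, 5, 7]
```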

So that's a pretty big range. 0:56

Let's see what happens if, instead of a perfect game, our first bowler 0:58

bowls a 135. 1:03

Now, instead of an average of 134.5, we've got an average of about 114, and 1:04

our standard deviation is all the way down to just 17. 1:10

So if we make a plot of this new standard deviation, we can see that this data 1:14

is much more clustered together than when it included a perfect game. 1:19
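To see how much a single extreme value can inflate the standard deviation, here is a small sketch using Python's statistics module. The scores are hypothetical, not the bowling scores from the video, and pstdev is used because it divides by the number of values, like the calculation described earlier.

```python
import statistics

def spread_summary(scores):
    """Return the average and the population standard deviation of the scores."""
    return statistics.mean(scores), statistics.pstdev(scores)

# Hypothetical bowling scores (not the ones from the video); the first bowler rolls a perfect 300.
with_perfect_game = [300, 110, 95, 120, 105, 130, 100, 115]
# The same scores, except the first bowler rolls a 135 instead.
without_perfect_game = [135] + with_perfect_game[1:]

for label, scores in (("with 300", with_perfect_game), ("with 135", without_perfect_game)):
    average, sigma = spread_summary(scores)
    print(f"{label}: average {average:.1f}, standard deviation {sigma:.1f}")
```

Only one value changes, yet both the average and the standard deviation drop, because the 300 sits so far from every other score.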

Let's calculate the standard deviation for the finishing times. 1:23

First, let's add a new label for Standard Deviation in row nine. 1:27

And let's make it bold and 1:36

then double-click right here to automatically set the width of the column. 1:38

Then, in the cell next to it, let's type =STDEV and hit Enter to select a function. 1:45

Then let's paste in the range and hit Enter again, and 1:54

it looks like we've got a standard deviation of about 42 minutes. 1:58

Also, if you're not seeing 42 minutes here, you can come over here and 2:02

change the number format to Duration, and that should fix your issue. 2:07
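One detail worth knowing: in Google Sheets and Excel, STDEV computes the sample standard deviation, which divides by n - 1 instead of by n as in the calculation described at the start of the video; STDEVP divides by n. The sketch below shows both in Python, using hypothetical finishing times (the video's race data isn't reproduced here) converted to minutes, since the statistics module works on plain numbers rather than timedelta objects.

```python
import statistics
from datetime import timedelta

# Hypothetical finishing times, for illustration only.
finish_times = [
    timedelta(hours=3, minutes=45),
    timedelta(hours=4, minutes=10),
    timedelta(hours=4, minutes=55),
    timedelta(hours=3, minutes=30),
    timedelta(hours=5, minutes=20),
]

# Convert each duration to minutes so the statistics functions can use it.
minutes = [t.total_seconds() / 60 for t in finish_times]

sample_sd = statistics.stdev(minutes)       # divides by n - 1, like the spreadsheet STDEV
population_sd = statistics.pstdev(minutes)  # divides by n, like STDEVP

# Convert back to durations for display, similar to the Duration format in the sheet.
print(timedelta(minutes=sample_sd), timedelta(minutes=population_sd))
```

With many values the two results are close; with only a handful, the n - 1 version is noticeably larger.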

So most racers finished within 42 minutes of the average finish time. 2:12

But standard deviation doesn't tell the whole story; 2:17

it only tells us how compact or spread out our data is. 2:21

To get the rest of the picture, we need to talk about skew. 2:25

Skew is when your data seems to favor one side over the other: 2:29

most of the data sits either to the right or to the left of the middle. 2:34

And depending on which side has the long tail, 2:37

you would say that this data is skewed either negatively (long tail on the left) or positively (long tail on the right). 2:40

An easy way to remember skew directions is to start at the peak and 2:45

draw an arrow towards the long tail. 2:49

The direction that arrow points is how the data is skewed. 2:52

So this data has a negative skew. 2:56
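The video judges skew by eye. To put a number on it, one common measure (not one the video uses) is the moment coefficient of skewness: the average cubed deviation from the mean divided by the standard deviation cubed. Its sign follows the same rule as the arrow, negative for a long left tail and positive for a long right tail. A sketch with hypothetical data and a hypothetical helper named skewness:

```python
import statistics

def skewness(values):
    """Moment coefficient of skewness: mean cubed deviation divided by sigma cubed."""
    n = len(values)
    average = sum(values) / n
    sigma = statistics.pstdev(values)  # assumes the values are not all identical
    return sum((v - average) ** 3 for v in values) / n / sigma ** 3

long_left_tail = [1, 7, 8, 9, 9, 10, 10, 10]  # hypothetical: tail toward low values
long_right_tail = [1, 1, 1, 2, 2, 3, 4, 10]   # hypothetical: tail toward high values

print(skewness(long_left_tail))   # negative
print(skewness(long_right_tail))  # positive
```

Cubing keeps the sign of each deviation, so a few far-off values on one side dominate the sum and set its sign.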

On the other hand, if your data has no skew and its mean, median, and 2:59

mode are all right in the middle, then your data is symmetric, and the best-known symmetric shape is 3:04

the normal distribution, which is frequently referred to as a bell curve. 3:08

Normal distributions have many convenient properties and 3:13

they occur fairly frequently in real life. 3:16

People's heights, test scores, and 3:19

even blood pressures are all approximately normally distributed. 3:21

One property of normal distributions is how much of the data falls within a given number of 3:25

standard deviations of the mean. 3:29

About 68% of the data should fall within 1 standard deviation, 3:30

and about 95% within 2. 3:35

And if you go out to 3 standard deviations, at about 99.7%, 3:39

that's pretty much all of the data. 3:44
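The 68/95/99.7 figures are rounded. Assuming a perfectly normal distribution, the exact share within k standard deviations of the mean is erf(k / sqrt(2)), which works out to about 68.27%, 95.45%, and 99.73%. A quick check with Python's math.erf (fraction_within is a hypothetical helper name):

```python
import math

def fraction_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {fraction_within(k):.2%}")
```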

Let's see if our data is normally distributed by seeing how 3:46

close we come to these numbers in the next video. 3:49
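One way to run this kind of check is to count what fraction of the values land within 1, 2, and 3 standard deviations of the mean and compare those shares with the normal-distribution figures. This is a sketch with hypothetical data and a hypothetical helper named share_within; the next video may approach it differently.

```python
import statistics

def share_within(values, k):
    """Fraction of values lying within k standard deviations of the mean."""
    average = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return sum(1 for v in values if abs(v - average) <= k * sigma) / len(values)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
for k, normal in ((1, 0.68), (2, 0.95), (3, 0.997)):
    print(f"within {k} sd: {share_within(data, k):.1%} (normal: about {normal:.1%})")
```

With only eight values, each share moves in steps of 12.5%, so small samples can only roughly match the normal figures.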
