We've got the extremes of our data and we've got the middle. But how is our data distributed?

One common way to describe the spread of our data is the standard deviation, which is commonly represented by the Greek letter sigma (σ). The standard deviation tells us roughly how far a typical value is from the average.

To calculate it, we start by taking the difference between each value and the average. Then we square each of those differences, add them up, and divide by the total number of values. This gives us the standard deviation squared, which is also called the variance. So to get the standard deviation, we just take the square root, and there we go.

We've got a standard deviation of 64.29, so if we were to put this on a graph, we'd put the average in the middle and then go 64.29 above and below the average. Then we can say that any data in this range is within one standard deviation of the average. So that's a pretty big range.

Let's see what happens if, instead of a perfect game, our first bowler bowls a 135. Now, instead of an average of 134.5, we've got an average of about 114, and our standard deviation is all the way down to just 17.
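The hand calculation described above can be sketched in Python. This is a minimal illustration, not the spreadsheet's own implementation: the function name `population_std_dev` and the bowling scores are hypothetical (made up for this sketch, not the video's data). It follows the steps literally, checks itself against Python's `statistics.pstdev`, and also prints `statistics.stdev`, which divides by n − 1 instead of n.

```python
import math
import statistics

def population_std_dev(values):
    """Standard deviation computed step by step, dividing by the number of values (n)."""
    n = len(values)
    mean = sum(values) / n
    # 1. Difference between each value and the average
    diffs = [v - mean for v in values]
    # 2. Square each difference and add them up
    sum_sq = sum(d * d for d in diffs)
    # 3. Divide by the total number of values: this is the variance (sigma squared)
    variance = sum_sq / n
    # 4. Take the square root to get the standard deviation (sigma)
    return math.sqrt(variance)

# Hypothetical bowling scores, made up for illustration (not the video's data).
with_perfect_game = [300, 110, 95, 130, 120, 105, 140, 100]
# Same bowlers, but the first one bowls a 135 instead of a perfect 300.
without_perfect_game = [135] + with_perfect_game[1:]

for scores in (with_perfect_game, without_perfect_game):
    mean = sum(scores) / len(scores)
    sigma = population_std_dev(scores)
    print(f"mean={mean:.2f}  sigma={sigma:.2f}  "
          f"one-sigma range={mean - sigma:.2f} to {mean + sigma:.2f}  "
          f"sample sigma (n-1)={statistics.stdev(scores):.2f}")
```

With these made-up scores, replacing the single 300 with a 135 shrinks sigma sharply, the same kind of drop the bowling example shows: one extreme value contributes a very large squared difference.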
So if we make a plot of this new standard deviation, we can see that this data is much more clustered together than when it included a perfect game.

Let's calculate the standard deviation for the finishing times. First, let's add a new label for Standard Deviation in row nine. Let's make it bold, and then double-click right here to automatically set the width of the column. Then, in the cell next to it, let's type =STDEV and hit Enter to select the function. Then let's paste in the range and hit Enter again, and it looks like we've got a standard deviation of about 42 minutes. If you're not seeing 42 minutes here, you can come over here and change the data type to Duration, and that should fix the issue.

One thing to note: STDEV divides by the number of values minus one (the sample standard deviation), not by the total number of values as in the calculation we did by hand. To match that calculation exactly, use STDEVP instead. For large datasets the two results are nearly identical.

So, loosely speaking, a typical racer finished within about 42 minutes of the average finish time. But standard deviation doesn't tell the whole story; it only tells us how compact or spread out our data is. To get the rest of the picture, we need to talk about skew.

Skew is when your data seems to favor one side over the other: most of the data sits either to the right or to the left of the middle, with a long tail stretching out the other way. Depending on which side has the long tail, you would say that this data is either skewed negatively or positively.
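To put a number on skew, one common choice (an assumption here; the transcript only describes skew visually) is the moment coefficient of skewness: the average cubed difference from the mean, divided by sigma cubed. Cubing keeps the sign, so a long left tail pulls the result negative and a long right tail pulls it positive. The helper names, the `tolerance` cutoff, and the data below are hypothetical.

```python
import math

def skewness(values):
    """Moment coefficient of skewness: mean cubed deviation divided by sigma cubed.
    Negative means a long tail on the left; positive means a long tail on the right."""
    n = len(values)
    mean = sum(values) / n
    sigma = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sigma == 0:
        return 0.0  # all values identical: no spread, so no skew
    return sum((v - mean) ** 3 for v in values) / n / sigma ** 3

def describe_skew(values, tolerance=0.1):
    """Label the skew direction; the tolerance cutoff is an arbitrary illustrative choice."""
    s = skewness(values)
    if s < -tolerance:
        return "negative (long tail to the left)"
    if s > tolerance:
        return "positive (long tail to the right)"
    return "roughly symmetric"

# Hypothetical data, made up for illustration.
left_tail = [2, 7, 8, 9, 9, 10, 10, 10]
right_tail = [0, 0, 0, 1, 1, 2, 3, 8]
symmetric = [1, 2, 3, 4, 5, 6, 7]

for name, data in [("left_tail", left_tail), ("right_tail", right_tail), ("symmetric", symmetric)]:
    print(name, round(skewness(data), 2), describe_skew(data))
```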
An easy way to remember skew directions is to start at the peak and draw an arrow towards the long tail. The direction that arrow points is how the data is skewed: an arrow pointing left means negative skew, and an arrow pointing right means positive skew. So this data has a negative skew.

On the other hand, if your data has no skew, with its mean, median, and mode all right in the middle, then it's symmetric. The best-known symmetric shape is the normal distribution, which is frequently referred to as a bell curve.

Normal distributions have many convenient properties, and they occur fairly often in real life: people's heights, test scores, and even blood pressures are all approximately normally distributed.

One property of normal distributions is how many values fall within a given number of standard deviations of the mean. About 68% of the data should be within 1 standard deviation, and about 95% within 2. If you go out to 3 standard deviations, you capture about 99.7%, which is pretty much all of the data. This is often called the 68-95-99.7 rule.

Let's see whether our data is normally distributed by checking how close we come to these numbers in the next video.
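The 68-95-99.7 percentages come from the normal curve itself: the share within k standard deviations of the mean is erf(k/√2). The sketch below computes those exact shares and compares them with the observed share in a dataset. The function names are hypothetical, and the data is simulated with `random.gauss` purely for illustration; it is not the race data.

```python
import math
import random

def normal_fraction_within(k):
    """Exact share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

def observed_fraction_within(values, k):
    """Share of the actual data points within k (population) standard deviations of the mean."""
    n = len(values)
    mean = sum(values) / n
    sigma = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return sum(1 for v in values if abs(v - mean) <= k * sigma) / n

# Simulated, normally distributed data (mean 100, sigma 15) for illustration only.
random.seed(0)
sample = [random.gauss(100, 15) for _ in range(100_000)]

for k in (1, 2, 3):
    print(f"within {k} sigma: expected {normal_fraction_within(k):.1%}, "
          f"observed {observed_fraction_within(sample, k):.1%}")
```

To run the same check on real data, such as finishing times converted to minutes, pass that list to `observed_fraction_within`; large gaps from the expected shares suggest the data is not close to normal.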