## Summary

• 00:01

Hello, welcome to this section of Mastering Statistics. In the last section, we introduced the concept of the variance. The sample variance you calculate when you have a collection of samples there, and the population variance, which is what you would calculate if you knew every single value in the population, which is a large collection usually. Most of the time you can't calculate the population

• 00:23

variance, because the population is too big, but there is an equation there if you need it. So in this case what we're going to do in this lesson is practice that and just give you an idea. So, if I wanted to calculate the sample variance, I'll say find the sample variance of the following things, then how would we do that?

• 00:44

Let me look and change colors a little bit, and say our data set is 5, 8, 7, 6, and 9. All right, and so what we want to do is find that sample variance. So because this is a sample, a very small set of samples, what you use

• 01:05

is the s squared equation. It's the sum of all of these little deviations about the mean. But we square each one before we do any summing, and then we divide by n minus 1. In this case n, the number of samples, is five. So on the bottom we'll have 5 minus 1. Now before we can do anything, we know we're going to need to find the mean.

• 01:26

Because every data point has the mean subtracted from it. So what we'll do is just kind of go off to the side and do that right now. The mean is, for a sample mean, we already have done this before, 5 plus 8 plus 7 plus 6 plus 9. And there are 1, 2, 3, 4, 5 values, so we divide by 5. And what we'll get when we add up the top is 35 over 5,

• 01:49

which is 7. So the mean is actually 7. So then it becomes a matter of just substituting in and making sure you don't make any mistakes. My advice when you're calculating the variance, since there's a lot of numbers, is don't skip any steps. Write everything down, take time to simplify it,

• 02:11

and simplify it, don't try to do too many things in advance, because you're probably going to make an error. So the way I write it is like this. I'll open up a parentheses and I'll say OK the first point, notice the way the equation's written. It's the data point minus the mean. The first data point is 5 minus the mean. The mean is going to always be 7, so I'll write it as 5 minus 7, and then I have to square it,

• 02:34

because that's what we're doing here. The next point is 8. So I'll open up another parentheses and say 8 minus 7, because in all cases I'm subtracting the same mean. I gotta square that guy. And then the third data point is 7 minus the mean, which is 7, and that's going to be squared. And then the next one is going to be 6 minus 7,

• 02:55

because that's that data point right there. And then the final one is 9, so it'll be 9 minus 7, and I'll square that one off at the end. So I have kind of on the numerator here, I'm adding up five things. Each little nugget here corresponds to a point. This is the data point minus the mean. This is the data point minus the mean. This is the data point minus the mean.

• 03:16

You do that for all of the data points. After the subtraction, you square it, and that's why the squaring is on the outside of the parentheses. Now notice, I know that all of you know that 7 minus 7 is 0. I know that all of you know that 8 minus 7 is one. I know that all of you know that you can figure out what all of these points are. But notice I didn't do that addition

• 03:38

or subtraction in my head. I write it all down because when you have a large number of data points, you're probably going to make an error at some point just because you get tired or whatever. And as a teacher you need to be able to see that the student knows what they're doing. So by writing it all out, I can see exactly what you meant to do, even if you make a mistake. So on the bottom is going to be n minus 1.

• 03:59

n is the number of data points. So 5 minus 1, that's what's going to end up being on the bottom. Notice that I know that you know 5 minus 1 is 4, but I'm not going to do that and kind of short circuit everything. I'm going to write it down explicitly, and then simplify in a subsequent step. So what I'm going to have for the sample variance here.

• 04:20

Now I ask you, what's 5 minus 7? Well it's minus 2 or negative 2. I still need to square it. What's 8 minus 7? That's one, but I still need to square it. What is 7 minus 7? That's 0. I still need to square it. What is 6 minus 7? That's a negative 1, I still need to square that. What is 9 minus 7? That's two, I still need to square it.

• 04:40

Now on the bottom. What is 5 minus 1? That's 4. Now again, I know that 1 squared is 1, and I know that zero squared is zero, but I don't do that too far in advance. I write it all down and then I simplify it in the next step. It really helps. OK so then what I'm going to have here, the sample variance.

• 05:03

Negative 2 squared is going to be 4. 1 squared is 1. 0 squared is 0. And then we have negative 1 squared is going to be positive 1, because don't forget what a square is. This is negative 1 times negative 1, so that makes it positive. This is negative 2 times negative 2,

• 05:24

that makes it positive. And then 2 squared is 4. And on the bottom I'm still going to have a 4. So in the numerator, if you think about it, 4 plus 1 plus 0 plus 1 plus 4 is going to be 10, over 4. 10 over 4 is what I get, so the sample variance, when I do ten divided by 4, is 2.5.
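
The board work above can be sketched in Python as a check on the arithmetic. This is only an illustration of the same steps (mean, deviations, squares, divide by n minus 1); the variable names are made up for this sketch and are not from the lesson.

```python
# Sample variance of the five-point data set, one explicit step at a time.
data = [5, 8, 7, 6, 9]

n = len(data)                              # 5 data points
mean = sum(data) / n                       # 35 / 5 = 7.0
deviations = [x - mean for x in data]      # data point minus the mean
squared = [d ** 2 for d in deviations]     # square each deviation before summing
sample_variance = sum(squared) / (n - 1)   # divide by n - 1 for a sample

print(deviations)       # [-2.0, 1.0, 0.0, -1.0, 2.0]
print(squared)          # [4.0, 1.0, 0.0, 1.0, 4.0]
print(sample_variance)  # 2.5
```

Keeping each intermediate list mirrors the advice in the lesson: nothing is simplified until the step before has been written out.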

• 05:45

It's a relative measure of variation, that's why it's called variance about the mean. Really, instead of variance, the stuff should be called variance about the mean. Or whatever. It's all in reference to how far the data is spread about the mean. And I don't want to say more than that now because we're going to talk about it a lot more in the next section when

• 06:05

we cover standard deviation, but for now, just think of the variance as being a relative indicator of how spread apart your data set is. Now notice we said this was a sample, so we use the equation for calculating the sample variance. Now let's do another one. And in this case, the problem is going to tell you that it's the population. So we'll find the population variance.

• 06:32

Now as I've said many, many times, usually you're not calculating the population anything, or the population variance, because you usually have like 10,000 data points or whatever. A 100 million data points for your whole population. So you usually don't have all that data. But this is a problem to teach you something. So what we're going to do is just kind of assume that our entire population consists

• 06:55

of the numbers 8, 12, 10, 12, 4, 13, 5, 17, and 10.So we have 1, 2, 3, 4, 5, 6, 7, 8, 9 data points there,and we're saying that that's our entire population.Maybe our population consists of, you know, all children

• 07:16

in some neighborhood, and there's only this many kids in the neighborhood. Maybe that's what we're studying. So in that case, the population is so small we can actually go get all that information. So the first thing we need to do is write down what we're trying to find. And since this is known to be a population, we write sigma squared as the population variance.

• 07:37

And we say that it's the sum of the data points x minus the population mean mu, each one squared, over n. Notice that this equation is different than this one. This is what we covered in the last section. The mean is calculated the same way. It's just that we use a different symbol when we're

• 07:58

talking about population mean. But it's calculated the same way. The real difference is on the bottom. When we had samples we said n minus 1. Here, the n on the bottom is just the total number of people in the population. We already said that's going to be 9, so 9 is going to be on the bottom. And that's just a difference in calculation. The reason why it's like that is beyond the scope of really what

• 08:19

I want to talk about here. So the first thing we need to do is find this mean. So we'll say the mean is going to be the sum of all this stuff, so we'll say 8 plus 12 plus 10 plus 12 plus 4 plus 13 plus 5 plus 17 plus 10.

• 08:39

This is where calculators come in handy. And we're dividing by 1, 2, 3, 4, 5, 6, 7, 8, 9 items. And so what we'll get is 91 on the top, 9 on the bottom. So the population mean is 10.11. 10.11. So then what we will do is go in here

• 09:02

and calculate the population variance. So what we'll have is population variance is equal to. And we're going to do exactly the same thing we did before, but there's just a lot more data points. So it's every data point, I could put in a little i here, x of i minus, just to make it match.

• 09:24

Every data point minus the mean, like we've talked about. So here the first data point's 8. So it would be 8 minus the mean, which is 10.11. Now I need to square that. The next data point is 12 minus 10.11. The third data point is 10 minus 10.11.

• 09:45

And this is going to be squared. The next one's a 12. Every single data point we're subtracting the mean from it. So the last one I did was 12, so I have a 4 minus 10.11. We have 13 minus 10.11. We have a 5, like that.

• 10:10

And then we need to continue on down the next line, and we'll say after 5 we have 17. And then we have a 10. All right. So there is every single point. Make sure you have it right. 1, 2, 3, 4, 5, 6, 7, 8, 9 data points.

• 10:30

We have 8, 12, 10, 12, and we have 4, and we have 13, then we have 5, then we have 17, then we have 10. Every one of them is squared. That is really all we needed to do. By the way, we're not done. That sum is on the top. Then we have to divide the whole enchilada, because don't forget, that's just what's on top, by nine,

• 10:52

which is the number of items in our population. So now what I'm going to do is pull this board over here. And what we're going to do is continue on and calculate this, and just kind of bang it out. At this point it becomes just an exercise in not doing the subtraction wrong. What we're going to have when we do 8 minus 10.11

• 11:13

is negative 2.11 squared. And then we'll have 12 minus 10.11 so we have 1.89 squared. The next one is going to be, when we do that subtraction, negative 0.11 squared. Then we'll have 1.89 squared.

• 11:34

And then we'll have negative 6.11 squared. 2.89 squared. Negative 5.11 squared. Plus 6.89 squared. Plus negative 0.11 squared. All of this stuff is divided by 9.

• 11:55

So you're going to have to trust me a little bit, because what we've done here is a lot of subtractions, right. A lot of subtractions, and there's a decimal involved, so you have to kind of do that in a calculator or by hand, if you really want to. Some of them are positive, some of them are negative, but we're squaring everything, and so the next step is kind of where the magic happens.

• 12:16

Just to kind of make it absolutely explicit, when you square a negative 2.11, you get positive 4.45. Everything here is going to be changed to a positive value. That's the whole reason we're kind of doing it this way. So we'll have 3.57, 0.01, 3.57. Then we'll have 37.33.

• 12:38

8.35. 26.11. Then we'll have 47.47. 0.01. Don't forget all of this is divided by 9. So all this stuff is calculator work, doing all the squaring. But every negative value changes to a positive value, and so then finally, at the end of the day, just

• 13:00

to kind of get to the end of the road, this is a variance. When you write the variance down, just notice, you always want to write the population variance with a sigma squared like that. For the population variance, on the top when you add all these numbers together, you get 130.88,

• 13:20

and on the bottom you have your 9. So what you have is 14.5. So the population variance is 14.5, you can write pop variance. You don't even need to write pop variance, honestly, because since it says sigma squared, that always means population variance.
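
As a rough check on the calculator work, the same population calculation can be sketched in Python. The variable names are invented for this sketch. One assumption to flag: the code carries the unrounded mean (91 divided by 9) through every step, so the numerator comes out as 130.89 rather than the 130.88 read off the board; the small gap comes from rounding the mean and the squares to two decimals along the way, and the final answer still rounds to 14.5.

```python
# Population variance of the nine-point data set, dividing by n (not n - 1).
population = [8, 12, 10, 12, 4, 13, 5, 17, 10]

n = len(population)                            # 9 data points
mu = sum(population) / n                       # 91 / 9, about 10.11
squared = [(x - mu) ** 2 for x in population]  # squared deviation of each point
numerator = sum(squared)                       # sum of squared deviations
population_variance = numerator / n            # divide by n for a population

print(round(mu, 2))                   # 10.11
print(round(numerator, 2))            # 130.89
print(round(population_variance, 1))  # 14.5
```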

• 13:41

Couple things I want to point out in these calculations. In this one and in the other one, notice that when we did our subtraction some of the values were positive and some of the values were negative. That's what I was telling you before. If we didn't do the squaring business, then we would have a problem, because some of these values would be positive and negative and they would end up canceling out. But since we ended up with a situation

• 14:03

where we're squaring them, then it kind of averts that whole problem. And so everything becomes positive and then we can kind of add everything together, divide it, and then we say the population variance is equal to this. Notice we did divide by the number of data points, because we're dividing by n when we're doing a population variance. When we did the sample variance from earlier,

• 14:23

the only real difference in the calculation was n minus 1, which is just subtracting one off of the number of data points that you have. Now I want to say that variance, I'll just reiterate it again, is so very, very, very, very important to statistics. It's kind of a core pillar. So make sure you understand it. Mostly I want you to understand the concept of it.
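
The point about positive and negative deviations canceling out can be checked with a short Python sketch. The helper name `deviations` is hypothetical, chosen only for this illustration. For the nine-point data set the mean is not a whole number, so the sum of deviations is compared to zero with a small tolerance rather than tested for exact equality.

```python
import math

sample = [5, 8, 7, 6, 9]
population = [8, 12, 10, 12, 4, 13, 5, 17, 10]

def deviations(values):
    """Return each value minus the mean of the whole list."""
    mean = sum(values) / len(values)
    return [x - mean for x in values]

sample_dev = deviations(sample)
population_dev = deviations(population)

# Without squaring, the positive and negative deviations cancel.
print(sum(sample_dev))                                       # 0.0
print(math.isclose(sum(population_dev), 0.0, abs_tol=1e-9))  # True

# Squaring first makes every term non-negative, so the spread survives.
print(sum(d ** 2 for d in sample_dev))                       # 10.0
```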

• 14:45

When you see a sigma squared around in an equation later on, or when you see an s squared, then you need to start thinking, oh that's just a variance. That's just telling us how spread this distribution is. And it's a relative indicator. If we go back to our first problem, the variance of the first problem was 2.5. The variance of the last problem was 14.5.

• 15:06

So if I wanted to compare these two things, then I would conclude that 14.5 in an absolute sense is more spread apart than the other guy here, about its mean. And you can kind of see that. The data has 8's and 4's and 10's, but it also has 17's and so on, it's kind of spread out.

• 15:27

The original data I had here was much more tightly compacted. 5, 8, 7, 9, and so on. So, it seems at a glance that it agrees with what we see, which is that one of these data sets is more spread out in terms of how far apart the data goes compared to the other one. So make sure you understand it. The calculation can be challenging and tedious, that's why I say write everything down.

• 15:49

But you know, it's all subtraction, multiplication, and addition, and division. There's no fancy, fancy math here. It's just a lot of arithmetic, but you have to write it all down like I've done it here, so that you make sure you're not making mistakes or errors. Because if you do, then you're just going to get the problem wrong for really not a great reason. So practice it, make sure you understand it.

• 16:09

Follow me on to the next section where we will continue learning about these fundamental core concepts in statistics.
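
For checking hand calculations like the two in this lesson, Python's standard `statistics` module offers `variance` (divides by n minus 1) and `pvariance` (divides by n). The sketch below is not part of the lesson; it simply reproduces the comparison between the two data sets, with made-up variable names.

```python
import statistics

first = [5, 8, 7, 6, 9]                     # treated as a sample
second = [8, 12, 10, 12, 4, 13, 5, 17, 10]  # treated as a whole population

s_squared = statistics.variance(first)        # sample variance, n - 1 on the bottom
sigma_squared = statistics.pvariance(second)  # population variance, n on the bottom

print(s_squared)                  # 2.5
print(round(sigma_squared, 1))    # 14.5
print(sigma_squared > s_squared)  # True: the second data set is more spread out
```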

### Video Info

Series Name: Mastering Statistics, Vol 1

Episode: 15

Publisher: Math Tutor DVD

Publication Year: 2013

Video Type: Tutorial

Methods: Population variance

### Segment Info

Segment Num.: 1

Persons Discussed:

Events Discussed:

Keywords:

## Abstract

Jason Gibson discusses how to calculate a variance. He explains that variance is a core principle to many aspects of statistics. He then provides some example problems to clarify the topic.