Hello. Welcome to this section of Mastering Statistics. Here we're going to continue talking about how to calculate the variance and standard deviation of data. But in this case, we're going to kind of change it up just a little bit. If you recall, we have learned how to calculate the variance and the standard deviation of data when we know all of the data points.
Well, when we know all of the numbers. And we've gone through that in the previous section. But if you also remember, we did a lot of talking about these frequency tables, or frequency distributions. It's a type of way to represent your data that's very common in statistics when you take a sampling of items. And so what we would like to do is try to figure out if we can calculate this variance
or standard deviation from the table of values that we have in a frequency table. So just kind of keep in the back of your mind, we've already covered the main topic here. We're not doing too much different. We're still talking about standard deviation. It's just that we're trying to figure out how to calculate it when we really don't have all the data points in front of us. We have a frequency table, which we've
talked about a great deal. So if you recall, the standard deviation-- this is just a review just for a quick second. The standard deviation for samples, for instance, is s, is equal to-- and what we have is the sum.
And on the inside, we have each data point minus the average value of all the data points. And then we square them. We add all these things together after we've done the squaring. And then we divide by n minus 1, and we take this whole quantity and we take the square root. That gives us the standard deviation. So when we have all of the data points, like we've been doing in the last section, we literally have every data point written down
on a piece of paper, we can calculate the average value of all of these data points. We take the value of the point minus the average value. We get that number. We square it. We sum all of this up for all the data points. And then we divide by n minus 1, the number of data points minus 1. That gives us a number, we take the square root, we call that the standard deviation.
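Written out, the sample standard deviation just described looks like the following, where x is each data point, x bar is the average of all the data points, and n is the number of data points. This is only a restatement of the spoken formula in symbols.

```latex
s = \sqrt{\frac{\sum \left(x - \bar{x}\right)^{2}}{n - 1}}
```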
But in order to do this, in order to pull this off, we have to have data of all of the points. Because that's the way this formula is written. Each data point minus the average value of all those data points. And we kind of iterate and go through all of the points to get there. So if you have 1,000 data points,
or however big your survey is, or however big your data set is, you can always find the most accurate standard deviation by doing this. But if you remember, we spent a lot of time talking about this concept of, I'll say recall frequency distribution.
And what I mean by frequency distribution is, for instance, the following table. Let's say that we have a classroom full of students and we're going to give them grades. But instead of knowing every grade of every possible person in the class, the professor gives you the following table and says, here are the grades for my class. So what we have is grade, right?
And we have the frequency. Well, you know what, let me move this closer together. Frequency. Like this. So the professor says, all right, from the grades 94 to 100, I had five students in the class for that grade. 87 to 93, I had eight students get that grade.
From the grades 80 to 86, I had 12 students with that grade. 73 to 79, I had seven people get that. And then from 66 through 72, I had four students, like this. So this is a frequency distribution. This is the same kind of distribution that we learned about a long time ago.
We were doing problems like how many laptops of a certain price are in the department store. And we had a table with the ranges. These are the classes. We just label it grade. But it's the classes. Notice that every class has the same width-- 94 to 100, 87 to 93, 80 to 86. If you look at the end points and do the subtraction,
you'll find that each one of these classes has the same width, which is what you want. And we know how many students in each of these ranges exist. So if we add all of these people up, that's the total number of kids in the classroom. But we know that five of them got the highest grades in the class. 8 of them did very well and got upper B's and lower A's. Most of the kids did solid B's.
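For readers following along without the board, here is a small Python sketch of the grade table as described so far. The variable names (`grade_table`, `spans`, `total_students`) are made up for illustration. It does the end point subtraction for each class and adds up the frequency column.

```python
# Frequency table from the lecture: (lowest grade, highest grade, frequency).
grade_table = [
    (94, 100, 5),
    (87, 93, 8),
    (80, 86, 12),
    (73, 79, 7),
    (66, 72, 4),
]

# Subtract the end points of each class, as suggested in the lecture.
spans = [high - low for low, high, _ in grade_table]

# Adding up the frequency column gives the number of students in the class.
total_students = sum(freq for _, _, freq in grade_table)

print(spans)           # [6, 6, 6, 6, 6]
print(total_students)  # 36
```

Every class gives the same result for the subtraction, which is the equal-width property being pointed out.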
Some of them had C's. Some of them had F's and low D's. All right. But that's everything. Now, we want to try to figure out, how do we calculate the standard deviation of data when it's presented in a chart like this? So we say, OK, no problem. We're going to go apply our formula. So we try to do this and we figure out very quickly that we actually can't use this formula.
Because in order to use the formula, we've got to know the average value of the grades. That's what x bar is. The average value of the grades. That's what I'm measuring-- the average value of the grades in the classroom. But I don't know all of the grades. See, when you present data in a frequency distribution, it's a great way to summarize data.
It's a great way to graph data in terms of like a histogram like we've done before. But unfortunately, you lose some granularity. You lose some detail from the raw data when you present it like this. Because I know that five kids got 94 to 100, but I don't really know, did they all get 94's? Did they all actually get a perfect score of 100? Did they all get right in the middle,
like somewhere around 97 or something like that? What actually happened in each one of these ranges? I know that 12 kids were in here, but I don't know if all 12 of them were 80, or half of them were 80 and some of them were higher. I lose some granularity in frequency distributions. So I'm unable to calculate an average value here because I don't really have all the data points anymore.
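To make the lost detail concrete, here is a short Python sketch. The two lists of raw scores are invented purely for illustration and are not data from the lecture. Both land in the 94 to 100 class with a frequency of five, yet their averages are different.

```python
from statistics import mean

# Two hypothetical sets of raw scores (invented for illustration only).
all_low = [94, 94, 94, 94, 94]
all_high = [100, 100, 100, 100, 100]


def count_in_class(scores, low, high):
    """Count how many scores fall in the class low..high, inclusive."""
    return sum(1 for s in scores if low <= s <= high)


# Both sets produce the same row in the frequency table...
print(count_in_class(all_low, 94, 100), count_in_class(all_high, 94, 100))

# ...but the underlying averages are not the same.
print(mean(all_low), mean(all_high))
```

The frequency table cannot tell these two situations apart, which is why the exact formula cannot be applied to it directly.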
And I can't do each data point minus the average value because I don't have the data points. So this equation for standard deviation, which is very similar to the equation that we were using for the variance-- we've talked about that before-- it's basically impossible to apply it to situations unless you have all of the data points.
So how do we find the standard deviation? We still know that we have grades. We know that there's got to be some average value, some mean, right? And we know that there's got to be some spread of that data. We know the kids-- we can kind of see there's some spread here because we can see the data. But we want to calculate that standard deviation. So how do we do it when we have a frequency table? And that's what we're trying to find here.
So I'm not going to prove this, but I'm going to say, it can be proven that the following-- the standard deviation s is equal to n, which is
the number of samples you have. It's a big equation here. Bracket, then I have a sum. And then I have f dot x squared-- and I'll explain all this in a second, so don't worry about it. This is going to look kind of complicated. Minus-- let's see here. I've got to close this bracket off here. Minus, another bracket, sum f dot x.
And on the outside of this bracket, I have a square. And then on the bottom of this entire calculation, I have n times n minus 1. And then this entire thing has a square root around it. And then I can say, for frequency distributions.
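Since the equation is being read aloud piece by piece, here it is in symbols, reconstructed from the spoken description. In this form, n is the sample size, f is the frequency of a class, and x is the midpoint of that class.

```latex
s = \sqrt{\frac{n\left[\sum \left(f \cdot x^{2}\right)\right] - \left[\sum \left(f \cdot x\right)\right]^{2}}{n\,(n - 1)}}
```

Note that the first sum squares x before multiplying by f, while the second sum is squared as a whole after the adding is done.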
All right. So what I'm trying to say here is that calculating the standard deviation, and the variance, of course, is incredibly important in statistics. Anytime you have a data set, you're almost always going to want to know, what is its mean, what is its standard deviation? Almost always. If I have all of the data points, literally every single point, then you're always going to use this for a sample standard deviation.
Remember that the population standard deviation looks very slightly different from this. But if you have a sample standard deviation, you're going to plug it in here and you're going to calculate it like we always have. But if you have a frequency distribution, which is a table of ranges with the number of people that fit into those ranges, there still is a standard deviation associated with this, but you can't use that. You have to use this more complicated form.
I'm not going to prove to you why this is true because it's beyond the scope of this class. Sometimes when you learn math, I choose to explain where it comes from. I've done that with the standard deviation concept, right? So you can kind of have some feeling of where does it come from. But sometimes, like in this case, the proof of why this works is just a little bit too long
winded and a little bit too lengthy to really benefit you too much. Because ultimately, you're just going to use the equation, and you know what it represents. It represents the spread of that data in a standard deviation. So this once, you have to kind of take it on faith that this is true and you just need to use it. And so what we're going to focus on is learning how to use it. So what it is is the standard deviation is the square root.
And in the numerator, this giant numerator, I have n, which is the number of samples that I have. And then I have a bracket that's open. Inside the bracket, I have the sum of the following terms. f is the frequency that I'm talking about. And then I have x. Now, what did we use for x, because we've said,
we don't have all of the data points. So let me put down here-- actually, let me erase this line. That was a little premature. What I want to put here is I want to say-- I'll do it over here-- where n is the sample size, f is the frequency.
Because don't forget, this is a frequency table, so you're going to have f's running around. And then x in this equation is the midpoint of the class. All right. So x in this equation here, x in this equation is not the individual data points, because we don't have those anymore.
We have a table here with ranges. In this equation, x represents the midpoint of this class. So if you look up here, the midpoint of this class, what do you think it's going to be? The midpoint of that class right here, 94 to 100, the midpoint of that would be 97. And you can see, you would just average these two together. 94 plus 100, divide by 2, that's going to give you
the midpoint, the middle there. It's a 97. You can also see it because this is so easy, and you can just count up from 94. 95, 96, 97. And then from 97, 98, 99, 100. So 97 is right in the middle of this range. And so the midpoint of each of these ranges we're labeling x here. In this equation, we have some frequencies,
because we have the frequencies of each class. So these are the frequencies that they're talking about. We also have n, which is the total sample size. The sample size is just going to be the sum of these frequencies, because don't forget, think back to what we were talking about. These are number of people in the classroom. Five people got this, eight people got this, and so on. So if I add them all up, that's the total sample size
of my class. All right. So I think what we need to do is solve a problem, because this equation looks very complicated. You've got n. You've got the sum of f times x squared, where x is the midpoint of the class. You're subtracting off another sum, f times x. And then after the sum is done, you square that. And then you have n times n minus 1 on the bottom. And then you get that whole number
and you take the square root. So it's kind of ugly. I'm not going to lie to you. But there is a way to do it to make it easier. So what we're going to do is go back to this problem that we have already written on the board. We want to now calculate, to give you some practice, the standard deviation s of this data set here. So the easiest way to do that-- and I'm
going to use different colors here because this is the raw data that was given to us. All right. Now, let's create some additional columns. We know that we're going to need the class midpoint, which is called x. Because that equation over there has x's running around.
So what is the class midpoint here? It would be 94 plus 100, divide by 2. We've already said 97 is the class midpoint there. What is the class midpoint here, between 87 and 93? Well 90 is right in the middle of those two numbers, so that's the midpoint. What about 80 and 86? The midpoint here would be 83, because that's
the value right in the middle. That's all that the class midpoint is. It's the middle value. What about 73 to 79? It's going to be 76, because that lies right in the middle here. What about between 66 and 72? It's going to be 69, because this value lies exactly in the middle of 66 and 72. So the first column you want to create is the class midpoint x.
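The midpoint column can be sketched in a couple of lines of Python. The names `classes` and `midpoints` are just illustrative. Each midpoint is the average of the two end points of its class, exactly as done by hand above.

```python
# The five grade classes from the table, as (lowest, highest) pairs.
classes = [(94, 100), (87, 93), (80, 86), (73, 79), (66, 72)]

# Class midpoint x: add the two end points and divide by 2.
midpoints = [(low + high) / 2 for low, high in classes]

print(midpoints)  # [97.0, 90.0, 83.0, 76.0, 69.0]
```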
Now, let's go and create another column. This is going to be f-- whoops. Let me do it like this. f times x. Now, notice what I'm doing. I'm going to draw another column here. F times x. Why do you think I'm creating this column f times x? Because for every class-- if you look at your equation
that we're actually trying to use-- don't worry about this so much here, but look over here. What we're going to end up doing is for every class, we're going to take the class frequency times the class midpoint. That's what f is and that's what x is. So we need to know what the product is here. And then we're going to end up adding them all up. So what I'm going to do is this equation
would just be really long to write out term by term by term by term, and just keep writing it all out for all the classes. So what we're going to do is work with it inside of this table. So let's calculate f times x, because we know we're going to need that there. And since you're doing it in a table, it's very easy, because f is 5 in this class, and x, which is the midpoint, is 97.
So 485 is the product of these two numbers. That's all you have to do. 8 times 90 is f times x, which will give you 720. 12 times 83 is going to give you 996. 7 times 76 is going to give you 532.
And 4 times 69 is 276. All right. So we've calculated that. We know we're going to need those values. Now let's move on to-- let's use-- we'll use red again. What other column do we think we need? Let's go ahead and find a column that is f times x squared.
What that means is for each class, we're taking the frequency times the class midpoint squared. So x squared. X is 97 in this case. So in this case, it was f times x. Now we're doing f times x squared. So 5 times 97 squared is a large number. It's 47,045.
All right? What is 8 times 90 squared? 8 times 90 squared-- 64,800. What's 12 times 83 squared? So don't forget, it's 12 times 83 times 83, that's what squared means, right? So what you get here is 82,668.
Put some commas in here so you can see how big these numbers are. What's 7 times 76 squared? You get 40,432. 4 times 69 squared is going to be 19,044. Now, I know this looks complicated, guys, but just kind of stick with me to the end of the problem
and just kind of trudge through it. And then you'll see that it's not that big of a deal. So what we have done is we have created a column called f times x, because that is in our equation. We know we're going to need this product. And for every class, we calculate that. Then we create another column called f times x squared. That is because that also exists in this equation.
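If you want to check the two new columns without a calculator, a short Python sketch like this one reproduces them. The list names `fx` and `fx2` are made up here to stand for the f times x and f times x squared columns.

```python
# Frequencies and class midpoints, top row of the table first.
frequencies = [5, 8, 12, 7, 4]
midpoints = [97, 90, 83, 76, 69]

# Column "f times x": frequency times class midpoint.
fx = [f * x for f, x in zip(frequencies, midpoints)]

# Column "f times x squared": square the midpoint first, then multiply by f.
fx2 = [f * x ** 2 for f, x in zip(frequencies, midpoints)]

print(fx)   # [485, 720, 996, 532, 276]
print(fx2)  # [47045, 64800, 82668, 40432, 19044]
```

The order of operations matters in the second column: it is f times (x squared), not (f times x) squared.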
And for every class, we calculate f times x squared. Now, notice in both of these cases, you have this sum out here. That's what the sigma means. So really what we're doing is we're summing the product of f times x. So we take the f times x for every class, and when we get all of them, we're adding all of those together. So in order to pull that off, what we're going to do here
is I'm going to draw a line under here, and I'm going to put the sum equals, and we're going to add all this stuff. 3,009. That's the sum of everything in this column, 3,009. The reason I'm adding it up is because in my equation, it tells me I'm multiplying these together. But you see how the parentheses are there? I'm summing up all of the class multiplications that I've done.
And so instead of writing the equation over and over again, I'm doing it in a table. Notice we have f times x squared, which we've calculated for each class. Now we have to sum them up. So just draw yourself a little line under here and add all of these numbers together. And whenever you get the sum-- you know what, let me make it even just a little bit clearer.
Right here, I'm going to have sum of f times x, because that's what we've summed together. And that equals 3,009. Here I'm going to have sum of f times x squared. That's the sum that I'm doing. And what I get here is 253,989.
So now I've got f times x. I've summed them all together. I've got f times x squared. I've summed them all together. I also happen to know that I'm going to need to know in this equation n. See, n is running around everywhere. n is the sample size. We talked about that. In this case, it's how many kids are in the room. So in order to do that, we sum this frequency column.
So what we will get is n is equal to-- we need 5 plus 8 plus 12 plus 7 plus 4. You get 36 people in this room. So now I actually have everything calculated to just put straight into this equation and calculate the standard deviation. And that's what we're going to do right now.
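The three totals under the table can be double-checked with a quick Python sketch. The names `sum_fx`, `sum_fx2`, and `n` are illustrative labels for the three sums just described.

```python
# Columns from the extended table, top row first.
frequencies = [5, 8, 12, 7, 4]
fx = [485, 720, 996, 532, 276]               # f times x
fx2 = [47045, 64800, 82668, 40432, 19044]    # f times x squared

sum_fx = sum(fx)        # sum of the f times x column
sum_fx2 = sum(fx2)      # sum of the f times x squared column
n = sum(frequencies)    # sample size: how many kids are in the room

print(sum_fx, sum_fx2, n)  # 3009 253989 36
```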
So let me draw a little line here. And we will go ahead and do that right now. Let's switch colors to green, just to give us a little variety. All right. So the equation is already on the board. So what we're going to have is-- let me go down here a little bit. s is equal to. We'll do the square root business here in a second. n is right up on top. n we said was 36.
We just calculated that. So we'll have 36. We're going to open a bracket, and inside we have the sum of f times x squared. The sum of f times x squared is this big number-- 253,989. That's what goes inside here. And so what I'm going to do is close this guy off here.
So this is n times the sum, which we have then found. Then we open up another bracket. And inside this bracket is the sum of f times x. f times x, we already calculated the sum to be 3,009. But don't forget, this is squared on the outside of this bracket. So we have to put the square here. Then we draw this line here.
n times n minus 1. n was 36, so it's 36 times 36 minus 1. And then the whole thing is square rooted. Now, that shouldn't look quite as intimidating to you, because now it's just a bunch of numbers. So the standard deviation s, in this case, is going to be, what is-- 36 times this is going to give us 9,143,604.
So that's a big number. And then we're subtracting. What is 3,009 squared? That's going to give us 9,054,081 like this. And on the bottom, you're going to have 36 times 35, which is 1,260. And don't forget, the whole thing
is square rooted like this. Now, when I go ahead and do this, I'm going to get a square root. On the top, when you take this number minus this number, you're going to get 89,523. And on the bottom, you just keep your 1,260 for now. All right. So the standard deviation then is the square root.
When you take this number and divide it by this one, you get 71.05. You're taking the square root of that. So the standard deviation is 8.43. That is the standard deviation. That is the answer. That means that for this set of kids in this classroom, the standard deviation
about the mean-- notice we didn't even have to calculate the mean, right? We didn't have to calculate the mean. But whatever the mean is of this data, whatever the average grade is, the standard deviation, or the spread, about that mean is 8.43. So 8 points, just under 8 and 1/2 points either side of the mean of this data is what it is.
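Here is the same plug-in step as a minimal Python sketch, using the three totals from the table. The variable names are illustrative only. It follows the arithmetic on the board one piece at a time.

```python
import math

# Totals taken from the extended table.
n = 36            # sum of the frequencies
sum_fx = 3009     # sum of f times x
sum_fx2 = 253989  # sum of f times x squared

# Top of the fraction: n times the first sum, minus the second sum squared.
numerator = n * sum_fx2 - sum_fx ** 2   # 9,143,604 - 9,054,081

# Bottom of the fraction: n times (n minus 1).
denominator = n * (n - 1)               # 36 * 35

# Square root of the whole thing gives the standard deviation.
s = math.sqrt(numerator / denominator)

print(numerator, denominator)  # 89523 1260
print(round(s, 2))             # 8.43
```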
So I want to stress to you, I want to stress and make sure you understand that you only do this when you have a frequency table of data. If you have all of the data points, then you're always going to want to use the exact formula for standard deviation. When you don't have that, when you have a frequency table, then the best thing to do is to go ahead and create and extend your table. So here's the grades, which are the classes.
Here's the frequencies. First thing you do, calculate the class midpoint. That's just the middle of each class. Then the frequency times the class midpoint. So you're just multiplying these two numbers here and you get this column and you add them all up. Then the frequency times the class midpoint squared. You get each one of these values and you add all of them up. And then you add all of the frequencies up, giving you n.
And then you plug these values straight into the equation. Don't forget that you have n here and then you have the sum. And then you have to square this one, because the square's on the outside. And then you just go through and do the math and figure it out. And you arrive at the standard deviation. Now, what if I asked you, what is the sample variance of this data? What is the sample variance? And you say, well, you didn't give me
an equation for the sample variance. But think about it for a second. Remember, the standard deviation is written as s, right? s is equal to whatever. The variance is s squared. That's what we've always talked about before. So if somebody asked you, what's the variance, you just take your answer and square it, which really is going to end up being 71.05. So if they ask you for the variance, s squared, you would just give them what's under the radical
here. If they ask you for the standard deviation, you give them the final answer that we've calculated. So I think that's enough for this kind of concept. Really, any data set that you are given, if they ask you to find the standard deviation and you have a frequency table, you're always going to do it the same way. So follow this kind of bulletproof technique. Make these columns, make these sums,
plug it in, get the information. Now I think you can see why I didn't prove this. I mean, yes, there's a way you can prove that this is the standard deviation. But in reality, it doesn't help you too much, because what I mostly want you to know is, what is a standard deviation? Why is it important? And I think I've conveyed that to you. How to calculate it is what we've done here. And at some point, you have to draw a line
and not prove every single thing. And so that's what I've done here today. So make sure you understand how to use this equation. That's the most important thing. Make sure it makes sense to you. Make sure you can reproduce my calculations so that when you get to your own problems, it will be very simple and you'll know exactly how to do it.
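For practice on your own problems, the whole table technique can be wrapped into one small Python function. This is a sketch, not part of the lecture: the function name `grouped_sample_stats` and its arguments are hypothetical. It returns both the sample variance (what is under the radical) and the sample standard deviation.

```python
import math


def grouped_sample_stats(classes, frequencies):
    """Sample variance and standard deviation from a frequency table.

    classes: list of (lowest, highest) end points for each class.
    frequencies: how many items fall in each class, in the same order.
    """
    # Step 1: class midpoint x for every class.
    midpoints = [(low + high) / 2 for low, high in classes]

    # Step 2: the three sums the equation needs.
    n = sum(frequencies)
    sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))
    sum_fx2 = sum(f * x ** 2 for f, x in zip(frequencies, midpoints))

    # Step 3: plug straight into the frequency table equation.
    variance = (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))
    return variance, math.sqrt(variance)


# The grade table from this section.
grade_classes = [(94, 100), (87, 93), (80, 86), (73, 79), (66, 72)]
grade_frequencies = [5, 8, 12, 7, 4]

variance, s = grouped_sample_stats(grade_classes, grade_frequencies)
print(round(variance, 2), round(s, 2))  # 71.05 8.43
```

Swapping in a different list of classes and frequencies lets you check a hand calculation for any other frequency table.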
Series Name: Mastering Statistics, Vol 1
Publisher: Math Tutor DVD LLC.
Publication Year: 2013
Segment Num.: 1
Jason Gibson explains how to find a standard deviation using data in a frequency table. He discusses the uses of this statistical technique and provides sample problems to further clarify the method.