- 00:01
Hello. Welcome to this section of Mastering Statistics. In this section, we're going to talk about the coefficient of variation. It's got a complicated-sounding name, but it really relates to what we've talked about before. We've talked about the variance and the standard deviation, which measure the spread of a data set, and this is in the same spirit. Now, we've talked many times before

- 00:22
and I've told you that if you have two sets of data and they have different standard deviations, then you can tell which one is more spread out just by looking at those standard deviations, or at the variances. The larger the value, the more spread out the data. So if I have data set one and data set two, and data set one has a larger standard deviation than data

- 00:45
set two, then you know that data set one is the one whose data is more spread out. That's absolutely true, so keep it in the back of your mind during this discussion. But I do want to point out something you might not have thought much about. Let's say that I have two neighborhoods, and I'm looking at house prices. Neighborhood (NH) number one

- 01:10
has the following data. On the board I'll label it with a number sign, NH #1, so it's clear. The average price of a home in neighborhood one is $120,000.

- 01:31
The standard deviation of the house prices in that neighborhood is $2,000. We've talked about what this means many times before, but I want to reiterate it: the average selling price, or average value, of the houses in that neighborhood is $120,000, and the standard deviation is $2,000. That means that a good chunk, and I haven't really

- 01:53
defined what that means yet, we'll get to it in another section. But it means a good chunk of the houses in neighborhood one are within plus or minus $2,000 of this mean. Since the mean is $120,000, that means between $118,000 and $122,000. A good chunk of the houses fall into that bracketed window.

- 02:17
So that's neighborhood number one. Now let's compare that to neighborhood number two, which I'm going to write in blue. The average selling price in neighborhood number two is $900,000. Way outside of my price range.

- 02:40
But the standard deviation in neighborhood number two is $10,000. So the houses in neighborhood two are much, much more expensive on average, almost a million dollars at $900,000, and the standard deviation is $10,000 on either side of that mean. That means that most of the houses in this neighborhood lie between $890,000 and $910,000.
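As a quick arithmetic check on the two brackets described above, here is a minimal Python sketch that computes the "mean plus or minus one standard deviation" window from the quoted summary figures. The helper name `one_sd_window` is hypothetical, and the sketch only reproduces the bracket; it says nothing about what fraction of houses actually falls inside it, which the transcript leaves for a later section.

```python
def one_sd_window(mean: float, sd: float) -> tuple[float, float]:
    """Return the (lower, upper) bounds of the interval mean - sd to mean + sd."""
    return (mean - sd, mean + sd)


# (mean, standard deviation) in dollars, as quoted in the transcript.
neighborhoods = {
    "Neighborhood 1": (120_000, 2_000),
    "Neighborhood 2": (900_000, 10_000),
}

for name, (mean, sd) in neighborhoods.items():
    low, high = one_sd_window(mean, sd)
    # Format with thousands separators, e.g. $118,000.
    print(f"{name}: ${low:,.0f} to ${high:,.0f}")
```

Run as written, it prints $118,000 to $122,000 for neighborhood one and $890,000 to $910,000 for neighborhood two.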

- 03:07
That's plus or minus $10,000, and most of the house prices lie in that bracketed window. So my question to you, and I'm going to write it down because we're going to talk about it, is: which has more spread? What I mean by that is which data set,

- 03:28
neighborhood one or neighborhood two, looks to you to be more spread out? My first gut reaction is to say that neighborhood number two is more spread out, because the standard deviation of its house prices is $10,000, whereas the standard deviation of the prices in neighborhood one is only $2,000. So it would stand to reason that neighborhood two has a lot more

- 03:50
spread in its house prices, because a $10,000 standard deviation is a very large spread in house cost in neighborhood number two. That is absolutely true: if you want to know which one is spread out more in absolute dollars, that's the answer. However, there's one problem with that. Yes, neighborhood two is more spread out,

- 04:11
but look at the difference in the mean price. It's more spread out, but its mean is way higher than the mean of neighborhood one. So is it really a good measure to just look at the standard deviations and say, oh, this one is bigger, so the spread of this data is bigger? Well, yes, the spread is bigger, but the average value of the data is so much higher that it would be nice

- 04:32
if I could compare the spread of these two data sets while taking into account that the typical house price in each market is so different. The way we do that is by using something called the coefficient of variation.

- 04:56
Coefficient of variation. Basically, it's a way to compare two data sets and tell which one has more spread. The standard deviation already tells you which one has more spread in absolute terms; the coefficient of variation tells you which one has more spread relative to its mean. The way you calculate it is very simple.

- 05:18
The coefficient of variation is the standard deviation of the data divided by the mean, and then you multiply by 100 because what you want at the end is a percentage. So the coefficient of variation is always expressed as a percentage. Let's go through the calculation here and see if we can understand something. So for house number one, or I should say neighborhood number

- 05:42
one, the coefficient of variation: the standard deviation was $2,000, so we'll put 2,000 on top, and the average price, x-bar, is 120,000 on the bottom. Then we multiply by 100. And so what we get when we do this: when you take 2,000,

- 06:06
divide by 120,000, you're going to get a decimal, about 0.0167. Multiply it by 100 and you get about 1.67% for neighborhood number one. Now let's crank through neighborhood number two. For neighborhood number two, the coefficient of variation

- 06:27
is the standard deviation, which was $10,000, over the average price, which was a huge $900,000, and we multiply that by 100. When we do this division and multiply by 100, the coefficient of variation turns out to be about 1.11%.
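Written out as equations, with $s$ for the standard deviation and $\bar{x}$ for the mean (the "x-bar" used above), the recipe and the two neighborhood calculations are as follows. Values are rounded to two decimal places; $2{,}000/120{,}000 = 0.01666\ldots$, which rounds to 1.67%.

$$
\begin{aligned}
\mathrm{CV} &= \frac{s}{\bar{x}} \times 100\% \\
\mathrm{CV}_1 &= \frac{2{,}000}{120{,}000} \times 100\% \approx 1.67\% \\
\mathrm{CV}_2 &= \frac{10{,}000}{900{,}000} \times 100\% \approx 1.11\%
\end{aligned}
$$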

- 06:51
So this is for neighborhood number two. Now look at these two numbers. Which one of them seems to be more spread out? The coefficient of variation in neighborhood one was about 1.67%, and the coefficient of variation in neighborhood two was about 1.11%. So when you're comparing two different data sets with very different means, the way

- 07:11
that you really get a good sense of which one is more spread out is to calculate the coefficient of variation. The raw numbers told a different story: we thought that neighborhood number two was more spread out, and it is, in absolute terms. But when you compare the spread to the mean, which is like the central anchor of the data set, it turns out that neighborhood one is about 1.67% spread out

- 07:36
about its mean, while neighborhood two is about 1.11% spread out about its mean. So when I compare two different data sets, that's how you really should do it to see which one is more spread out. Another great example, though I'm not going to work through the problem, would be test scores, like SAT scores, GRE scores, or other standardized test scores.

- 07:56
Let's say I have group A and group B. Group A has a certain mean and a certain standard deviation of their test scores. Group B has a different mean and a different standard deviation. The means of these two groups happen to be totally different: one group scored really, really well on average, and the other scored really, really poorly on average.

- 08:17
So, yes, their standard deviations do tell us in absolute terms how spread out the data is. But to really compare them, we should look at the spread as a percentage of each group's mean. So we would calculate the coefficient of variation for group A by taking the standard deviation, dividing by the mean, and multiplying by 100, and we would do the same thing for group B.
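For test scores you would usually start from the raw scores rather than from a quoted mean and standard deviation. The sketch below is one hedged way to do that in Python with the standard-library `statistics` module. The function name `cv_from_scores` is hypothetical, and whether to use the sample standard deviation (`statistics.stdev`) or the population one (`statistics.pstdev`) is an assumption, since the transcript does not say which one it uses.

```python
import statistics


def cv_from_scores(scores: list[float], sample: bool = True) -> float:
    """Coefficient of variation, in percent, computed from raw scores.

    Assumption: sample=True uses the sample standard deviation
    (statistics.stdev, which needs at least two scores); sample=False
    uses the population standard deviation (statistics.pstdev).
    """
    mean = statistics.mean(scores)
    if mean == 0:
        # Dividing by a zero mean is undefined.
        raise ValueError("CV is undefined when the mean is zero")
    sd = statistics.stdev(scores) if sample else statistics.pstdev(scores)
    return sd / mean * 100
```

You would call it once per group, for example `cv_from_scores(group_a_scores)` and `cv_from_scores(group_b_scores)`, where those two lists are placeholders for your own data, and then compare the two percentages.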

- 08:38
That gives us a percentage, just like it did for the neighborhoods. There, we had about 1.67% spread around the mean for neighborhood one. That tells us its data, relative to its mean, is actually more spread out than neighborhood number two's, even though neighborhood two's standard deviation looks a lot bigger. So the coefficient of variation is not something you use very often.

- 08:59
But if you are comparing two data sets, you're going to see it come up. It's also a great example of a point I made when I opened the whole course, when I told you 65% of all statistics are made up. It's the same thing here. I could present this data and tell you, hey, the standard deviation in neighborhood two is way bigger than in neighborhood one. Let's say that's my agenda: I want to convince you that the spread

- 09:19
in this data is bigger, and there is some reason I want to convince you of that. Who knows why? I could present this data and convince you at a glance that, yes, neighborhood two's data is more spread out. But as a percentage of the mean, neighborhood one actually had the larger standard deviation.
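To make that disagreement concrete, this minimal Python sketch ranks the two neighborhoods both by raw standard deviation and by coefficient of variation, using the figures quoted in the transcript. `coefficient_of_variation` is a hypothetical helper written for illustration; it assumes a nonzero mean, which holds for house prices.

```python
def coefficient_of_variation(sd: float, mean: float) -> float:
    """Return the CV as a percentage: (sd / mean) * 100."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return sd / mean * 100


# (mean, standard deviation) in dollars, as quoted in the transcript.
neighborhoods = {
    "Neighborhood 1": (120_000, 2_000),
    "Neighborhood 2": (900_000, 10_000),
}

# Ranking by absolute spread: the largest standard deviation wins.
most_spread_by_sd = max(neighborhoods, key=lambda n: neighborhoods[n][1])

# Ranking by relative spread: the largest CV wins.
most_spread_by_cv = max(
    neighborhoods,
    key=lambda n: coefficient_of_variation(neighborhoods[n][1], neighborhoods[n][0]),
)

for name, (mean, sd) in neighborhoods.items():
    cv = coefficient_of_variation(sd, mean)
    print(f"{name}: SD = ${sd:,}, CV = {cv:.2f}%")

print("Largest SD:", most_spread_by_sd)
print("Largest CV:", most_spread_by_cv)
```

Run as written, it reports neighborhood two as having the larger standard deviation ($10,000 versus $2,000) but neighborhood one as having the larger coefficient of variation (1.67% versus 1.11%).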

- 09:40
Sometimes that's useful. So make sure you understand this, and make sure you can reproduce these calculations. Then follow me on to the next section, where we will continue learning these statistics topics, making them all clear and giving you practice with every one of them.

### Video Info

**Series Name:** Mastering Statistics, Vol 1

**Episode:** 18

**Publisher:** Math Tutor DVD

**Publication Year:** 2013

**Video Type:** Tutorial

**Methods:** Analysis of variance, Standard deviations

**Keywords:** mathematical computing; mathematics

### Segment Info

**Segment Num.:** 1

## Abstract

Jason Gibson explains the coefficient of variation. He provides a few sample problems to illustrate how to use this statistical tool.