## Summary

• 00:01

Hello. Welcome to this lesson of Mastering Statistics. Here, we're going to apply the central limit theorem to the concept of a population proportion. And so I need to explain some concepts in the beginning here to tie it together with what you already know about the central limit theorem. Then we're going to work a few problems to give you some practice. I think you'll see immediately how

• 00:21

this type of problem that we're going to do is something that you've actually seen all the time in statistics quoted on television or in surveys or things of that nature. So first, we're going to do a couple of definitions. First, let's talk about a proportion, the concept of a proportion. When you're talking about a proportion in statistics, it's a little bit different than when we talk about a proportion in basic mathematics.

• 00:43

So just forget about what you already think you know about the idea of a proportion. In statistics, the idea of a proportion is a very specific thing. And specifically, there's a population proportion. So I'm going to call it population proportion. And the easiest way to put it: there are a bunch of different definitions,

• 01:05

but in my little definition, it's the percentage of a population that has a characteristic. That's kind of a weird, nebulous definition, but it's very simple when I give you an example.

• 01:25

So for instance, now remember, we're talking about population proportion. So whatever population you're studying, which is a large collection of people or things, the population proportion might be, for instance, 97% of all people carry cellphones.

• 01:47

That is a population proportion. The proportion would be the 97% of all the people, all the people meaning everybody in your population. So when you see a percentage, it's definitely going to be referring to what we call a proportion. You know it's a population proportion because it's referring to all people. In that case, it's a very large group of people. Or you could say another population proportion would be

• 02:10

1/10 of kids brush their teeth. Hopefully, it's actually better than that. I'm just making these numbers up. But the idea is the population that you're talking about would be all kids. The proportion would be 1/10. So when you're dealing with proportion,

• 02:31

you can have a percentage in there that's representing some fraction of the population. You could actually have it written as a fraction, which obviously represents a fraction of the population. Or you could have it written as a decimal. We could say 0.35 of the population of kids. That's representing the same thing as these fractions or these percentages. Typically, in a statistics course, you're going to see it represented as a percentage

• 02:53

almost always. But just keep in mind that basically you're denoting a fraction of the population (we call it a proportion of the population) that has some sort of characteristic. That's why the definition is a little weird. A characteristic can be anything. In these cases it's carrying cellphones or brushing teeth. We could say 50% of the population loves cartoons. Anything related to a fraction of the population,

• 03:15

we call that a proportion. And when we're doing statistical calculations, we denote the population proportion as a lowercase p. So we might say p is equal to 97%, p is equal to 1/10, p is equal to 0.75, something like that. So since we have the idea of a population proportion in place,

• 03:36

the idea of a sample proportion shouldn't be too foreign to you. You probably can guess what it is. I'll write it down. It's the percentage of a sample that has a characteristic. We'll abbreviate that char, for characteristic.

• 03:59

So for instance, you might say, we ask 10 people (PPL, for people) if they like coffee. Two people say yes.

• 04:22

So you could say the proportion would be 2 out of 10, or you could reduce this to a fraction, or you could make that into a percentage. For instance, 2 out of 10 is the same thing as 0.2, which is the same thing as 20%. If you take 0.2 and move the decimal two spots to the right, you get a percentage. So it's represented the same way,

• 04:44

but in a sample proportion, it's the fractional part of whatever our sample is. In this case, our sample size was 10 people, and we had two people give us a specific answer. So we'll say the proportion of these people is 20% or whatever. Now, in this case, since we're dealing with a sample proportion, we need to have a different way to denote it. So we denote it as p with a little hat on top.
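The three equivalent forms of the coffee-survey proportion (fraction, decimal, percentage) can be checked with a few lines of Python. This is an illustrative sketch using only the standard library; `sample_proportion` is a hypothetical helper name, not something from the lesson.

```python
from fractions import Fraction


def sample_proportion(successes: int, n: int) -> float:
    """Return p-hat, the fraction of a sample of size n that has the characteristic."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")
    return successes / n


p_hat = sample_proportion(2, 10)  # 2 of the 10 people said they like coffee
print(Fraction(2, 10))            # 1/5, the reduced fraction
print(p_hat)                      # 0.2, the decimal form
print(f"{p_hat:.0%}")             # 20%, the percentage form
```

The same helper covers the assembly-line example later in the lesson: `sample_proportion(35, 1000)` is 0.035, or 3.5%.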

• 05:11

That's a little house or a hat on top. So you would call this one p and you would call this one p hat. Because when we're doing these problems, you're going to be given information about the population, which you denote p, so that you're not confused about what you're talking about. And if you're taking a sample, you might want to talk about the proportion of that sample,

• 05:32

so you need to denote that differently, p hat. All right. So where are we going with this? The idea is something that you are very, very comfortable with, because we have all voted in elections. And so you know that when they go to the polls and ask people coming out of the polling station, they may sample 45 people or 50 people or 90 people,

• 05:53

and they may say, who'd you vote for? Which candidate did you vote for? And then some of them say I voted for candidate A, and some of them say I voted for candidate B. And a lot of times, we're trying to draw conclusions and predict who's going to win based on a small sample. So we're using the central limit theorem as a way to look at proportions in a sample

• 06:15

and try to draw some conclusions from it. That's not the whole story, but that's the idea of why proportions are very important. You might also look at a defect rate. You might look at an assembly line. And you might say, all right, we're producing cellphones, and I'd look at a sample of 1,000 that come off the assembly line in a given day. Or maybe I'd pick those 1,000 phones randomly over a month.

• 06:37

And then I may find out that 35 out of 1,000 were defective. So my sample proportion would be 35 out of 1,000, 35 over 1,000. You can convert that to a decimal, 0.035. You can convert that to a percentage, 3.5%, and that would be the sample proportion. Now, is that sample proportion representative of the entire factory? Did we make our sample size large enough to make that claim, or is it something that's useful or not?

• 06:59

That's what we're basically going to study in this section. So whenever you're studying these problems, think back to what we did in the last section. In the last section, we said we're going to do sampling. So let's say our sample size is 50. We'll take 50 people. In the previous section, we calculated the mean of that data. Take 50 more people, calculate the mean, 50 more people,

• 07:20

calculate the mean. In this case, we're not going to be calculating the mean of anything, because we're not studying the mean. But what we might do is go to a polling place and take 30 people and say, who'd you vote for, a Democrat or a Republican? And they'll answer. And we'll figure out from that data that of those people, 35% of that sample voted Democrat, let's say.

• 07:41

Then we go to another polling place, and we sample the same number of people. We ask them the same question. We find out that 25% voted Democrat coming out of that polling station. Then we go to a third polling station, fourth polling station, fifth polling station. Each time we're doing the sample, we have a sample size the same as before. But instead of measuring numbers and taking their average, we're just going to look at the answers they give

• 08:02

and calculate the proportion of that sample (that's called the sample proportion) that answers a certain way. We'll do it for this group of people, and this group of people, and this group of people, and this group of people. We'll get a bunch of sample proportions. And guess what we get from that? We get something called the sampling distribution of sample proportions. Let me write that down. What we get when we do this experiment

• 08:23

is a sampling distribution of sample proportions. Ran out of space there. Basically, that's what you have. If you remember back from the last two sections, we had the sampling distribution of sample means.
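The polling-station experiment can be imitated with a small simulation. This is a sketch under stated assumptions, not part of the lesson: each person independently has the characteristic with probability p (which models a very large population), and the values p = 0.35 and n = 30 are made up for illustration. The function name `simulate_sample_proportions` is hypothetical.

```python
import math
import random
import statistics


def simulate_sample_proportions(p: float, n: int, num_samples: int, seed: int = 0) -> list[float]:
    """Draw `num_samples` random samples of size `n` and return each sample's p-hat.

    Assumes each person independently has the characteristic with probability p,
    which models sampling from a very large population.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    proportions = []
    for _ in range(num_samples):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions


p, n = 0.35, 30  # illustrative values, not data from the lesson
p_hats = simulate_sample_proportions(p, n, num_samples=10_000)

print(statistics.mean(p_hats))     # should land close to p = 0.35
print(statistics.pstdev(p_hats))   # should land close to the next value
print(math.sqrt(p * (1 - p) / n))  # about 0.087
```

A histogram of `p_hats` comes out roughly bell-shaped, which is the sampling distribution of sample proportions. Its spread, about the square root of p(1 - p)/n, is the same quantity that appears under the square root in the z-score formula for proportions introduced later in the lesson.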

• 08:45

So you see, it's a very long-winded name, but it's very descriptive. When we were calculating means, we had a sampling distribution of the sample means that we calculated, so that name makes sense. Here, we're sampling the same way, but we're looking at the proportion, so we get a sampling distribution of sample proportions. The central limit theorem, as you might guess, is very useful. Because what it's telling us is that it doesn't matter what the initial population is doing.

• 09:07

That's irrelevant. We don't care what the shape of the original population is. If we're studying proportions, then under certain conditions, which I'll write on the board, our sampling distribution of sample proportions will also look approximately normal. And the reason that's useful is the same reason it was useful before: because we know it's a normal distribution, we can use our tables to solve problems, practical problems

• 09:29

dealing with proportions. I've already given many examples of how that would be useful in manufacturing and politics and other things. So the question is, under what conditions can you use the central limit theorem to be confident that the sampling distribution is actually going to be approximately normal? Well, when we were doing sample means, it was very simple. We just said make your sample size greater than 30.

• 09:50

Then we know that the sampling distribution is going to be approximately normal and everything is cool. It's a little bit different for proportions. I'm not going to prove it, but here are the requirements to make sure that we can treat this as a normal distribution. So we need to ensure the following two things. I'm going to tell you right now that I'm writing them

• 10:13

down on the board for completeness, but for this problem and for all the problems in this course (and almost certainly all the problems in your statistics course), these constraints are always going to be true. Otherwise, you can't use the central limit theorem to help you. But anyway, the constraints are that the sample size n times the proportion p has got to be greater than or equal to 5.

• 10:35

And the sample size times (1 minus the proportion) has got to be greater than or equal to 5. I know this doesn't make a lot of sense. You're like, how does this come about? Well, I'm not proving any of it. There are a lot of very smart statisticians out there who have worked a lot with the central limit theorem,

• 10:55

and trying to investigate the situations in which it can be applied. There's a lot of statistical theory behind that. The goal in this course is not to give you the statistical theory of why this is true. But you need to start somewhere in life. And so in order for you to start somewhere, you just take this as truth. And whenever both of these conditions are true (that's why we have both of them), then the sampling distribution of sample

• 11:22

proportions is approximately normal. And again, that's important, because if it weren't normal, then we wouldn't be able to use the normal distribution tables to actually solve problems. So we're going to assume in this class that n times p is greater than or equal to 5, and also that n times (1 minus p) is greater than or equal to 5. And when those two things are true, it's cool, and we can proceed.
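The two conditions fit in a one-line check. A minimal sketch: `clt_conditions_met` is a hypothetical name, and the `threshold` parameter defaults to the lesson's value of 5. Some textbooks use 10 instead, so it is left adjustable.

```python
def clt_conditions_met(n: int, p: float, threshold: float = 5) -> bool:
    """Return True when both rule-of-thumb conditions from the lesson hold:
    n * p >= threshold and n * (1 - p) >= threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold


print(clt_conditions_met(100, 0.79))  # True: 100 * 0.79 = 79 and 100 * 0.21 = 21
print(clt_conditions_met(20, 0.10))   # False: 20 * 0.10 = 2, which is below 5
```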

• 11:43

Now, the other thing we need to do: if you remember back to the previous lesson on sample means, we said, hey, if the sample size is greater than 30, we're good, and we're going to calculate the z-score for whatever we need, and we're going to look it up in the table, and we're going to get our answer. But when you're dealing with sample proportions, the formula for the z-score changes. Again, I'm not going to prove it to you.

• 12:04

But the formula for the z-score is the following. So I'm going to say the formula for the z-score changes as follows. Let me write that down. It changes as follows. z is p hat, that's

• 12:28

the sample proportion, minus the population proportion, all over the square root of the population proportion times (1 minus the population proportion) divided by n. And that looks nasty. It looks ugly. But what you're going to find is that as we work a problem, it's going to be very simple. Because in the problem, once I teach you

• 12:50

how to identify what the problem is telling you, you'll know what goes everywhere. You just put the p's in their proper place. You put the p hat where it goes. You put the sample size here. You do this math here. You do this math here, take the square root, divide, and you're going to end up getting a z-score just like you did before. So remember, before, when you did the z-score, it was like x minus the mean over the standard deviation.
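Written out, the z-score formula described above for sample proportions, next to the familiar one for a single value x (with μ and σ used here as labels for "the mean" and "the standard deviation" mentioned in the lesson), is:

$$
z = \frac{\hat{p} - p}{\sqrt{\dfrac{p\,(1 - p)}{n}}}
\qquad\text{compared with}\qquad
z = \frac{x - \mu}{\sigma}
$$

The square root in the denominator plays the role of σ: it is the standard deviation of the sampling distribution of sample proportions.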

• 13:10

And you take the information from the problem, you stick it in there, you get a z-score. And you take that z-score to the table, and you look it up and calculate the probability for the problem that you care about. Nothing is different here. You're doing all the same broad steps. It's just some of the details are different for sample proportion problems. So you need to make sure that in order to even use this stuff at all, these two constraints are true.

• 13:31

But I'm telling you they will be true for all of our problems. And when you get down to brass tacks of calculating the z-score to get the answer, you don't do it with the same equation. You do it with this one. We're not going to prove it. It's something that's proved in more advanced statistics courses; you can go look that up if you want. But we're not going to prove it here. So what we're going to do is get some practice with this.

• 13:53

And we'll do a problem here. We'll go on to the next section as well and do another problem. You'll see that it's not so bad once you get the hang of it. So the first problem is: 79% of voters in a city are registered Democrats. If we sample 100 people at random, what is the probability that more than 68 of those people

• 14:14

will vote Democrat? So now you can instantly see how useful the central limit theorem is, because this kind of problem is what we do every year with elections. We know which cities in the country, which states in the country lean Democrat or Republican. And if you're watching this video from another part of the world, lots and lots

• 14:35

of countries around the world have different political parties, so you can substitute them in for this discussion. So what we want to do a lot of times is sample people and figure out, what is the probability these people are going to answer a certain way? That's a very useful thing to want to know. Now, in this particular case, we know that 79% in the city are registered Democrats.

• 14:55

And we sample 100 people, and we ask, what's the probability that more than 68 will vote Democrat? Well, what do you think the answer is going to be? Even without doing any calculations, if I'm asking 100 people and wondering what's the probability of more than 68 voting Democrat, I should have a pretty good probability

• 15:17

of getting that to happen. Because we already established that 79% (that means 79 out of 100 people) are registered Democrats in the city. So if I sample 100 people randomly, unless I sample the wrong people just out of blind, sheer luck, I should have a pretty significant chance of getting more than 68 of them to say that they would vote Democrat, just because of what

• 15:38

we know about the population. So the first step in all of these problems is to write down what you know. So let's write down what we know. And the first thing we know is that the sample size, n, is 100. It tells you right here: if we sample 100 people, what is the probability, blah, blah. What else do we know? Well, we know the population proportion.

• 15:59

We call that p. There's no hat on this, because the problem says 79% of voters in a city are registered Democrats, and you can think of the population as being the voters in that city. Now, you can put 79% here if you want. But when you do your calculations, you're always going to represent it as a decimal. So this is 79%. You move the decimal two spots to the left, and p is 0.79.

• 16:21

That's actually important for you. When you're doing statistical calculations, never ever put a percent into an equation. You convert it to a decimal, and that's what you use. You don't want to put 79 in there. You want to put 0.79. Now let's go and verify. First of all, before we do anything else, let's go verify with this information,

• 16:44

with the sample size and the population proportion: is the central limit theorem even going to apply? So let me draw a little curly brace here. And we said we need n times p to be greater than or equal to 5. So let's go ahead and do that. The sample size is 100. The population proportion is 0.79. And when I multiply that out,

• 17:06

I'll get 79, which is indeed greater than or equal to 5. So yes, the first constraint is true. But we have to test both constraints. And the second one is that n times (1 minus p) is greater than or equal to 5. So let's go ahead and check that out. What we're going to get is n, which is 100, times (1 minus 0.79).

• 17:32

And when we do this, we'll have 21. Take this subtraction and multiply by 100: 21, which is greater than or equal to 5. So both of these check out. n times p is greater than or equal to 5. n times (1 minus p) is also greater than or equal to 5. So because both of those constraints are true, we don't need to do anything else. We know that the sampling distribution

• 17:53

of sample proportions is going to look normal. And because of that, we can proceed. So what else can we pull out of this problem statement? We're also told that if we sample 100 people, what's the probability that more than 68 will vote Democrat? So what we're told is that the sample proportion, which is p hat, the sample proportion,

• 18:14

is 68 people over 100: 68 out of 100 people. This is the proportion of people that we're curious about. That's our breakpoint. We're asked to see what's the probability of something greater than that. But this is the sample proportion, 68 over 100. And when you convert that to a decimal, you get 0.68. So we're asked to figure out what is the probability that we

• 18:37

have more than 68 out of 100 people giving us that answer. So what we're going to do is calculate a z-score. The z-score is the sample proportion minus the population proportion

• 18:58

over the square root of the population proportion times (1 minus p), over n. I know it looks complicated, but you just put everything in that you know from the problem statement. p hat is 0.68. p, which is the population proportion, is 0.79. On the bottom, you have a square root over all of this stuff. Then you have p, which is 0.79, and then you have 1 minus 0.79,

• 19:22

and then the sample size is 100. Don't forget to put in the proper sample size from the problem, which is 100. So you see, it's numbers, numbers everywhere. It's not rocket science to do that kind of thing. What you're going to end up getting on the top, when you do the subtraction, is negative 0.11. On the bottom, when you do this subtraction, multiply by 0.79, and divide by 100, you get 0.001659.

• 19:45

You take the square root on the bottom. You get 0.041. So what you get is that z is equal to negative 2.68. So it's a z-score, the same as in all of the other problems that we have done. It's a z-score that we calculate. It's just that we're using a different formula, because we're doing a different type of problem.
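Here is the same arithmetic for the voter problem as a short Python check (standard library only). One detail worth knowing: the lesson rounds the standard error to 0.041 before dividing, which gives -2.68; with the unrounded standard error (about 0.0407) the z-score is about -2.70. The table values for 2.68 and 2.70 are 0.9963 and 0.9965, so the conclusion does not change.

```python
import math

n = 100         # sample size
p = 0.79        # population proportion of registered Democrats
p_hat = 68 / n  # sample proportion at the cutoff, 0.68

standard_error = math.sqrt(p * (1 - p) / n)  # sqrt(0.79 * 0.21 / 100)
z = (p_hat - p) / standard_error

print(f"{p_hat - p:.2f}")       # -0.11, the numerator
print(f"{standard_error:.3f}")  # 0.041, the denominator
print(f"{z:.2f}")               # -2.70 with the unrounded standard error
print(f"{(p_hat - p) / round(standard_error, 3):.2f}")  # -2.68, matching the lesson's rounding
```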

• 20:05

We're not proving where this comes from. This is what we use for sample proportion types of problems. Now, what we want to find, ultimately, is the probability that z is greater than negative 2.68. The reason we put a greater-than sign there is because the original problem says, what is the probability that out of 100 people, more

• 20:25

than 68 vote Democrat? So we're looking for the area to the right of that proportion. And since we converted it to a z-score, we're looking for the area to the right of that z-score. Now, our z chart table is not set up to give us areas to the right. But the normal curve is symmetric, so we can flip this arrow around to be less than.

• 20:46

And when we do that, we put a negative sign in front of the z-score. But this already has a negative sign, so it becomes positive 2.68. In other words, the probability that z is greater than negative 2.68 is the same as the probability that z is less than positive 2.68. That's how we do this part, because that's how we look things up in the chart. We can't look it up in the chart the original way, because our z chart table always gives areas to the left. So finally, for the probability, when

• 21:07

you look up positive 2.68 in your z chart table, what you will get is 0.9963. That's the probability, 0.9963, so very close to 1. And you should always do a sanity check to see if it makes sense.
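The table lookup can be reproduced with `statistics.NormalDist` from Python's standard library (Python 3.8 or later), used here as a stand-in for the printed z table. The sketch computes the area to the right of -2.68 directly and also by the symmetry flip the lesson uses, and both agree with the table value.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1
z = -2.68

# Area to the right of z, computed directly as 1 minus the area to the left.
p_right = 1 - standard_normal.cdf(z)

# The lesson's approach: by symmetry, P(Z > -2.68) = P(Z < 2.68).
p_flipped = standard_normal.cdf(-z)

print(f"{p_right:.4f}")    # 0.9963
print(f"{p_flipped:.4f}")  # 0.9963
```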

• 21:27

In our problem, we know that 79% of voters are registered Democrats. We ask 100 people off the streets. What are the odds that more than 68 people are going to tell us that they're going to vote Democrat? The odds are pretty high, because we already know the fraction of the population that are registered Democrats. So unless I'm just really unlucky and choose the wrong 100 people, odds are if I choose a random sampling,

• 21:50

the sample proportion I get is going to be greater than 68%. And that's why it's so important to do a sanity check on your answers and just read over what you've done, to make sure it makes sense from a logical point of view. So that's an introduction to using the central limit theorem when we're talking about proportions and problems that deal with sample proportions.

• 22:11

Basically, you still do a sampling of the population. You're still creating a sampling distribution. But instead of focusing on the mean, we're looking at the proportion of people who answer a certain way. Or if you want to think about manufacturing, you're looking at the fraction of things coming off the assembly line that's defective, or malformed, or whatever. That's very, very common stuff you use statistics for.

• 22:33

So we're just dipping our toes into that here. Follow me on to the next section. We'll do a couple more problems to give you an idea of how to use the central limit theorem in these types of problems. The main difference is that in order for the whole thing to work at all, these two constraints need to be met. Basically, your sample size needs to be large enough to make both n times p and n times (1 minus p) greater than or equal to 5.

• 22:53

So you can just do a quick check before your problem to make sure it's true. And then, when you calculate the z-score appropriately, you put the sample proportion that you care about here, put everything in, get a z-score, and then proceed as normal for everything else. So make sure you understand this one and can work it yourself. Follow me on to the next lesson, where we'll continue getting practice with the sampling distribution

• 23:14

of sample proportions.
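Putting the lesson's steps together (check both conditions, compute the z-score, then take the area to the right), a compact sketch might look like the following. `prob_sample_proportion_greater` is a hypothetical helper name; it applies the normal approximation the way the lesson does, and `statistics.NormalDist` stands in for the z table.

```python
import math
from statistics import NormalDist


def prob_sample_proportion_greater(p: float, n: int, cutoff: float, threshold: float = 5) -> float:
    """Approximate P(p_hat > cutoff) with the normal model from the lesson.

    Raises ValueError if n*p or n*(1-p) is below `threshold`, because the
    normal approximation is not justified in that case.
    """
    # Step 1: check both conditions before using the normal model.
    if n * p < threshold or n * (1 - p) < threshold:
        raise ValueError("n*p and n*(1-p) must both be at least the threshold")
    # Step 2: z-score for the cutoff sample proportion.
    z = (cutoff - p) / math.sqrt(p * (1 - p) / n)
    # Step 3: area to the right of z under the standard normal curve.
    return 1 - NormalDist().cdf(z)


print(f"{prob_sample_proportion_greater(0.79, 100, 0.68):.4f}")  # 0.9965
```

The result, 0.9965, differs from the lesson's 0.9963 only because the helper does not round the standard error to 0.041 before dividing.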

### Video Info

Series Name: Mastering Statistics, Vol 3

Episode: 7

Publisher: Math Tutor DVD LLC.

Publication Year: 2014

Video Type: Tutorial

Methods: Normal distribution

### Segment Info

Segment Num.: 1

Persons Discussed:

Events Discussed:

Keywords:

## Abstract

Jason Gibson explains how to apply the central limit theorem to population proportions. He also covers how to use a sample proportion. Gibson then provides example problems for the central limit theorem as applied to population proportions.