[Sampling Distributions: MathTutorDVD.com]
JASON GIBSON: Hello, welcome to this lesson in Mastering Statistics. I'm Jason. I'll be your host and your teacher for this lesson and the next several lessons where we're going to cover some of the most central and important topics in all of statistics, because don't forget, sort of the whole purpose of statistics really is that we have a large population of individuals,
JASON GIBSON [continued]: or something coming off of an assembly line,or something that we wish to learn about.But because there are so many people in the countryor in the world, we can never go survey everybody.So what we typically are trying to dois we're trying to do surveys, and samples,and trying to look at a small grouping of samples
JASON GIBSON [continued]: or of information that we can collect.And from that, we're trying to draw some conclusions.And sort of the whole topic of statisticsis really trying to understand what conclusions can we draw.What can we learn about the population of the worldby taking a small sample?And what are the limitations on taking that sample?And so up until now in statistics
JASON GIBSON [continued]: we've learned a lot of the bedrock material.We've learned about the normal probability distributionand several other topics that have been really layingthe foundation for what we're going to cover here when we'regoing to begin to start to talk about samplingto start to learn about that larger population.So keep the big picture in mind.Right here we're going to talk about something called
JASON GIBSON [continued]: the sampling distribution.But it's dovetailing into other topicswhere basically we're going to be wantingto learn about the population that we're basicallytrying to study by doing some sampling.All right, so in this lesson, we'regoing to talk about sampling distributions.A lot of the stuff of statistics, believe it or not,seems difficult because the concepts seem fuzzy.
JASON GIBSON [continued]: If you read a textbook in statistics, a lot of times you shut the book. And you're not really sure what you read, because there's a lot of definitions. And if you don't take the time to understand really and truly what those definitions are, then when you get to a problem, you don't know what to do, because you never really took the time to understand the definitions. So I know it's a little bit long and drawn out. But we really need to take some time in this lesson
JASON GIBSON [continued]: and in the next lesson to understandsome very important definitions and get our terminology down.The first thing is I want to remind you of something that wehave already talked about.We've talked about the concept of a population.And I'm going to remind you of that here.The population is the large collection of thingsthat we're really trying to study.But we need to study them indirectly,
JASON GIBSON [continued]: because there's no way-- for instance,if we're trying to study all of the adultsin North America, that's millions of people.There's no way we can go and really study all of themindividually.But we still consider that collection of peopleto be what we call the population.Now, when we take a sample, it is a small collection of peoplethat we can then go and study.So I may take 35 people randomly from North America.
JASON GIBSON [continued]: And then I call that the sample.And I may go and calculate whatever it is.Let's say we're studying IQ.So we'll give each of these peoplean IQ test to see how they score.And they're going to have some average IQ.But we would like to know how can we relate that informationfrom the small sample that we have back to the populationfrom which it came.Clearly, they're not going to be exactly equal.
JASON GIBSON [continued]: What are the odds that we happen to pick the perfect amountof people who scored in a representative mannerthe same as everybody in the country?We know they're not going to be equal.But we still want to know how we can draw conclusionsfrom that small sample based on what we're trying to studywhich is our population.So when we talk about a sampling distribution,it's very similar to that conceptthat we're verbalizing here.
JASON GIBSON [continued]: We have a population.That's always going to be the same.And in this case, we just grab a sample of 50 peopleto try to study them.Now, keep that in the back of your mind.And let's go turn your attention to something elsethat you have a lot of experience with already.Let's say you're making a cup of coffee.So your cup of coffee about this big.You've got some coffee in there.And you want to put some sugar in it.
JASON GIBSON [continued]: So what we do is we put the sugar inside. And we stir it up to dissolve the sugar. And then you take a sip. You're sampling the coffee. You're taking a sample. Now, clearly the cup of coffee is very large compared to the tiny, little sip that I'm taking. But when I taste it, I'm assuming that everything's well mixed in the coffee. And what I'm saying is I'm making the assumption that what I'm tasting in my mouth is more or less the same
JASON GIBSON [continued]: as what is actually in the cup, because everything'sbeen mixed.But you know that unless you've reallymixed it for a really long time, that's not exactly true.In other words, if I had a very large cup of coffee--let's say I work in a coffee factorywhere I'm making giant barrels of coffeelike the size of a swimming pool.So there's a giant swimming pool full of coffee.
JASON GIBSON [continued]: And I put sugar in one side of the swimming pool.Now, clearly, if I take a sample from the center of the pooland test it for how sweet it is either with my mouthor with some machine, I'm probablygoing to get a different answer than if I go right to wherethe sugar is being poured in.Let's say I'm dumping a giant barrel of sugarinto the swimming pool.If I sample the coffee very near the sugar,
JASON GIBSON [continued]: it's probably going to be really sweet. If I sample the coffee in the center of the pool, or the giant barrel, it's probably going to be not quite as sweet. And if I sample the coffee at the far end of this barrel, or of this pool, of this giant thing of coffee that I'm making, it's going to be probably less sweet still. If I mix it up for a long period of time, I assume everything will mix properly
JASON GIBSON [continued]: and then everything will be uniform.But certainly, there are cases wheneverI take a sample from here, and I take a sample from here,and I take a sample from here.And they're not going to all be the same.The same thing is happening when you make a sampleor you take samples of any kind of population.Let's go back to our example of IQ.Let's go and say, I want to study the IQ of everybody
JASON GIBSON [continued]: in the country. All right? So what I do is I go take 25 people from Texas, 25 people from Florida, 25 people from California. So I take 25 people. Let's say I go make collections of 25 people. I put them all in different school buses. And then I give them an IQ test. Well, each set of people-- each set of samples that I collected from different regions of the country--
JASON GIBSON [continued]: of course they are going to score differently,because all the schools are different everywhere.So there's going to be regional variations, because certainlydifferent people from different backgroundshave a different set of skills is what I'm trying to say.So when you make a sample of something from a population,it's going to almost always be truethat what you choose from sample A
JASON GIBSON [continued]: is going to give a slightly different answer from whatyou choose from sample B.If you don't want to think about IQs,you can think of things like maybe the height of people--how tall they are-- if you want to lookat all males in the United States, all right?All males in the United States.So if I go collect 25 people from California and 25 people--let's change the number.Let's say I collect 10 people from California,
JASON GIBSON [continued]: 10 people from New York, 10 people from Texas, 10 people from Connecticut, and I go calculate the average height of each set of 10 people I have-- each grouping of samples. Well, clearly, I'm not going to get the same answer. There's going to always be differences--
JASON GIBSON [continued]: regional variations-- different ethnicitieswill yield different heights.And even if everybody is mixed uniformly,still not everyone grows to the same height.So the only thing I'm trying to get-- I'mdoing a lot of talking because I reallyneed you to understand the concept of sampling something.If I go take a sample of 10, and another sample of 10,and another sample of 10, and another sample of 10from any population of any kind, I
JASON GIBSON [continued]: can expect that if I take average values of some quantity, whether it's height, or IQ scores, or anything, from different sets of people, I'm always going to get different answers. But still, we like to do that because we can then collect the results and see what we can learn from them.
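The point about repeated samples of 10 giving different averages can be checked with a small simulation. This is only a sketch: the population of heights below is invented (a mean of 175 cm and a standard deviation of 7 cm are assumed values, not figures from the lesson), and the seed is fixed only so the run is repeatable.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: 100,000 heights in cm.
# The mean (175) and standard deviation (7) are made-up illustration values.
population = [random.gauss(175, 7) for _ in range(100_000)]

n = 10  # sample size: how many people are in each sample

sample_means = []
for _ in range(4):  # four separate samples, like four different states
    sample = random.sample(population, n)  # pick n people without replacement
    sample_means.append(statistics.mean(sample))

for i, m in enumerate(sample_means, start=1):
    print(f"sample {i}: average height = {m:.1f} cm")
```

Each of the four averages should land near the population average, but no two are expected to match exactly.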
JASON GIBSON [continued]: So let's talk about the concept of a sampling distribution, which is basically what I've just described to you without telling you what it is. When you have a population and then you go collect a group of people or a group of information and calculate something like the average IQ or the average height, and I go do it again for another set of people, and again for another set of people, and again for another set of people, and I collect all that information,
JASON GIBSON [continued]: then what I've got is a sampling distribution, because it's sampled, it's data taken from the population. That's why we call it sampled. But it's a distribution of values because what I get from sample A is going to be different than what I get from sample B, which will be different than what I get from sample C, which will be different than what I get from sample D. So they are a distribution. There's a wide variety of answers
JASON GIBSON [continued]: I'm going to get from my experiment. But I also know that they're going to be sort of related, because I'm studying humankind or whatever. So I expect them to be tightly packed somewhat. But I expect there to be variations in the answers. That's called a sampling distribution. All right, so if I have different kinds of distributions-- just recall from distributions
JASON GIBSON [continued]: there are several different kindsof distributions in the world.And we've studied some of them.The most important one is called the normal distribution.We studied that significantly.A normal distribution is this thingthat we called the bell curve.It starts out here.We start here, kind of go down, never quitetouches the x-axis like that.
JASON GIBSON [continued]: So it's symmetrical about this line here.It never quite touches the x-axis on either side.But notice it has this nice bell shape.A lot of things in the Universe reallydo behave like a normal distribution.A lot of things do-- in manufacturing,in studying people, in studying heights, in studyinglots of different things.If you look at the population large enough, many, many, many
JASON GIBSON [continued]: times you will get a normal type of distributionwhere most of the population falls kind of near the centerhere whatever we're measuring.And then as you get farther away from that average value,the chances of getting someone out thereis less, and less, and less.It looks like a normal distribution a lot.This is called normal.But we have, of course, different kinds
JASON GIBSON [continued]: of distributions-- some we really haven't studied much.Here's one called skewed.This is a skewed distribution.And you can probably guess what that is.Basically, it's a distribution that'snot symmetrical like this.So for instance, we could have something like this.Notice that these guys look sort of the same.They both have a peak here.
JASON GIBSON [continued]: But this one is very symmetrical. But this one is not. So a skewed distribution might be just, for instance, let's say I'm studying the IQ of people. But let's say my population isn't the world. Let's say my population is only the incoming freshmen of Stanford University, or of MIT,
JASON GIBSON [continued]: or some other top name university.Right?Now, I'm going out on a limb here.But I'm assuming that if people get acceptedto Stanford, or MIT, or any of these other top schools,they probably have done pretty well in school.They probably have an above average type of IQ.So if you just look at the population of peoplecoming into Stanford, their IQ is probably
JASON GIBSON [continued]: skewed to the right.And so you could have something that'snon-symmetrical like that.You're not looking at a cross-section of the populationof the world at that point.So you don't have a nice center value and everybodygently falling off.You're basically skewing things appropriately,because basically the selection criteria to those schoolsare throwing out some of the other people who
JASON GIBSON [continued]: are not in this range here. So that's a skewed distribution. I'm using IQ. But it could be anything. We could be studying the height of people. Certainly if we're looking at the height of people in Asia compared to North America or South America, different people based on their genetics have different average heights. So you could get a skewed distribution depending on who you're actually studying.
JASON GIBSON [continued]: And then there's this other thingthat we don't really use too much.But I'll just mention it to you.I'll just write it over here.This is called a uniform distribution.A uniform distribution is basically when we don't reallyhave a bell shape at all.But we have something like this.And there's not really a great practical exampleto show you that.
JASON GIBSON [continued]: But it's kind of a theoretical thingto show you that in theory you could have a distribution whereevery point here had the same probability or the same chancesof happening.So whether you look over here or over here,you're basically going to have an equal chance of everybodyfalling in that area.If you're talking about heights, or IQs, or something,
JASON GIBSON [continued]: then it would basically be no matterif you have a lower IQ or a higher IQ,you have exactly the same cross-section of the populationat every little sliver there.So you don't have a nice peak showingan average value of things.Everybody is kind of equal, so to speak.And so that's a uniform distribution.The point is these distributions--
JASON GIBSON [continued]: I want you to start thinking about them in termsof those are characteristics of the population.The population-- whatever it is you're studying-- whateveryou've defined your population to behas some distribution associated with it.Most likely it's going to be normal.But depending on what your population is,it could be skewed or it could be something else.But generally, we don't really know what the population is.
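As a rough illustration of the three population shapes just described, the sketch below generates one made-up population of each kind and compares the mean with the median. All of the numbers are assumptions chosen for illustration: a normal population centered at 100, an exponential population standing in for a skewed one, and a uniform population between 0 and 100.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable
N = 50_000      # size of each hypothetical population

normal_pop = [random.gauss(100, 15) for _ in range(N)]        # symmetric bell shape
skewed_pop = [random.expovariate(1 / 20) for _ in range(N)]   # long tail to the right
uniform_pop = [random.uniform(0, 100) for _ in range(N)]      # flat: every value equally likely


def summarize(name, values):
    """Print and return the mean and median of a list of values."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    print(f"{name:8s} mean = {mean:6.2f}   median = {median:6.2f}")
    return mean, median


normal_mean, normal_median = summarize("normal", normal_pop)
skewed_mean, skewed_median = summarize("skewed", skewed_pop)
uniform_mean, uniform_median = summarize("uniform", uniform_pop)
```

For the symmetric shapes (normal and uniform) the mean and median should come out close together. For the skewed population the long tail pulls the mean away from the median, which is one simple way to spot the lack of symmetry.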
JASON GIBSON [continued]: If we knew what the population was,we wouldn't have to use statistics.We would just know everything.But really, if I'm trying to study all students in NorthAmerica, I'm probably guessing that whatever I'm studyingis most likely normal.But I don't really know, because I can't sample everybody.So what I then do in order to study them is Istart taking surveys or samples.
JASON GIBSON [continued]: So what I do is I sample the population by selecting values.Now, when I say selecting values here, what I'm talking aboutis when you're sampling a population,
JASON GIBSON [continued]: usually you're talking about taking a survey of people.Like if I want to study the cross-section of heightsin North America, I'll call them up on the phone.I'll say, hey, what's your height?And they'll give me the data.That will be a data point.And I record that.But I may be working in a factory.I may not have anything to do with people.I may be studying candy bars coming off the assembly line.And I may be studying the length of the candy bar.
JASON GIBSON [continued]: Maybe every one million candy bars,the machine messes up and cuts a candy bar in half.So I may go take a small cross-section.I can't study every candy bar.That would take forever.But maybe I go look at every 10, or every 20 candy bars,or every 100 candy bars.And I use those as my random samplesto see if I can learn anything about the population which
JASON GIBSON [continued]: would be all of the candy bars that are being manufacturedin there.So when we talk about sampling, what we're sayingis we're taking a subset of the population.And we're choosing to study them.Not a huge surprise there-- so somethingyou need to keep in mind is that the samplesize is, in my lectures here, always going to be called n.
JASON GIBSON [continued]: In your course, they may label it a different variable. In almost every book I can think of, they're going to use n. So if you're working a problem or reading a chapter regarding sampling or sampling distributions, and it talks about the sample size of n, well, then that's the number of people I'm choosing in this one batch to study. Remember how we said if we're going to study IQs, I might look at these 10 people over here,
JASON GIBSON [continued]: these 10 people over here, these 10 people over here. And I keep selecting cross-sections of 10 people. Well, the 10 people-- the number 10 is my sample size. That's the size of how many people I'm choosing to study every time I do it. Now, I'm going to do it several times. That's not the sample size. The sample size is how many data points I'm collecting in one
JASON GIBSON [continued]: kind of unit to then analyze.And I may do that over and over and over again.So the sample size is how many datapoints I'm going to collect at any given time.Next thing to know is that-- and this is important.So we're going to be getting into a little bit of math here.
JASON GIBSON [continued]: For each sample, which means for each little collectionof people or things I'm studying, we can calculate.We can calculate a few things, because Ihave some numbers now.I can calculate the mean of that sample.Whoops, not the meme-- the mean, which is the average value.I can calculate its variance.
JASON GIBSON [continued]: I can calculate its standard deviation.Remember that?We studied standard deviation in previous lessons.I could even calculate some other valuesbecause statistics is full of different thingsyou can calculate.But some of the most common thingsyou'll ever find yourself studying when you're samplingis you're going to be calculating the mean-- almostalways calculating the mean.But you might be calculating the variance
JASON GIBSON [continued]: or the standard deviation. The mean is the average value. That's what the mean really is. So let's just say for sake of argument that I'm studying my population-- the large collection of people of all students in North America. And let's say I'm interested in their IQ. And I'm trying to do a study. And I want to say, what is the-- how
JASON GIBSON [continued]: is the IQ of all the students in North America-- what can Ilearn about that is what I'm trying to do.Obviously, I can't study them all.So I choose to set up a survey.So I choose some sample size.We call it n.And it could be whatever.We'll learn a lot more about sample size in a little bit.But let's just say I pick 50 people at a time,because I'm employing people in a call center.
JASON GIBSON [continued]: And I'm going to tell them to call 50 people, right? So this person here calls this set of 50 people. I'm going to use my hands a lot in this class. So this set of 50 people is sort of the first collection-- the first sample. When I call it a sample, it's the collection of 50 people there. Right? And from those 50 people, I can calculate the average IQ, because this guy, this guy, this guy, this guy, this girl,
JASON GIBSON [continued]: this girl, this guy-- they all have different IQs. But in my sample of size 50, I can certainly calculate the average IQ. So I can calculate the mean of sample number one. Then I go call 50 more people. And I can find 50 more IQs from sample number two. And I can calculate the average IQ of sample number two. And then I can do it again and again and again,
JASON GIBSON [continued]: and call another 50 people, and call another 50 people,and call another 50 people.But every time I do it, I average their IQs,find their means, and then I'm basicallyrecording the average IQ I'm gettingfrom a batch of 50 people.These are very important conceptsthat are going to come back when we study the central limittheorem in the next section.So that's why I'm taking some time here.
JASON GIBSON [continued]: But basically, what I'm trying to say in the third bullet is, you can calculate things. I can find the standard deviation of their IQs-- that means how spread apart are their IQs about their average-- about the average IQ from that sample. I can calculate things for each sample. All right? This is important because it's directly related to what we're talking about here.
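To make the difference between the sample size and the number of samples concrete, here is a minimal sketch that draws several samples of n = 50 from a made-up population of IQ scores and calculates the mean, variance, and standard deviation for each one. The population values (mean 100, standard deviation 15) and the choice of 5 samples are assumptions for illustration only.

```python
import random
import statistics

random.seed(2)  # fixed seed so the run is repeatable

# Hypothetical population of IQ scores (made-up mean of 100 and spread of 15).
population = [random.gauss(100, 15) for _ in range(200_000)]

n = 50           # sample size: data points collected in ONE sample
num_samples = 5  # how many times the sampling is repeated (NOT the sample size)

results = []
for i in range(1, num_samples + 1):
    sample = random.sample(population, n)      # one batch of 50 phone calls
    mean = statistics.mean(sample)             # average IQ of this batch
    variance = statistics.variance(sample)     # sample variance (divides by n - 1)
    std_dev = statistics.stdev(sample)         # square root of the sample variance
    results.append((mean, variance, std_dev))
    print(f"sample {i}: mean = {mean:6.2f}  variance = {variance:7.2f}  std dev = {std_dev:5.2f}")
```

Every row uses the same n, but each row reports slightly different numbers, because each batch of 50 is a different set of people.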
JASON GIBSON [continued]: If I choose every possible sample of sizen from my population, and I'm going to write population
JASON GIBSON [continued]: as population.Then I'm going to write something in red, or in pinkhere, then when I get-- a sampling distribution.I'm going to put it in quotes here, because this is a term.
JASON GIBSON [continued]: You know, the important terms I'mtrying to pull out for you-- sampling distribution.You have to understand the terminology.If you're reading in a book, hey, the samplingdistribution of the IQs-- you haveto know what they're talking about.When you see the term sampling distribution,it means that someone spent a lot of money and timetrying to collect this information.What they did is they said, all right, we'regoing to have a sample size.
JASON GIBSON [continued]: It's going to be a certain size.Let's say 50 people or 20 people.And we're going to collect this sample, and thenthis sample, and then this sample.And we're going to collect as many as we can.But if we somehow could collect them all from our population,we can organize it into something wecall a sampling distribution.So in the case of IQs, let's say, I have an IQ here,an IQ here-- from this set of 50 people,
JASON GIBSON [continued]: from this set of 50 people.If I find the average IQ for every sample,from every set of 50 people, I'm goingto get a bunch of average IQs from everybody.And I can of course plot them in a histogram.And I can find out that most of the time, what I'm gettingis an IQ that's representative of my population.We're going to get to that a little bit more later.But the point is this is called a sampling distribution
JASON GIBSON [continued]: if I take that information and I organize it.And if I organize it into a curve of some type,that's called a sampling distribution.And then the final thing I want to say hereis what I've basically just said--if we calculate the mean of every sample,
JASON GIBSON [continued]: then what we get is a very complicated sounding thing.But it's not very complicated.What we get is the following-- we get a samplingdistribution of sample means.
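With a real population, choosing every possible sample is impractical, but with a tiny made-up population it can be done by brute force. The sketch below assumes a population of just five values and a sample size of n = 2, and it treats a sample as an unordered pick without replacement (one reasonable reading of "every possible sample", not the only one). It lists every sample, calculates each sample mean, and tallies them into a sampling distribution of sample means.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [2, 4, 6, 8, 10]  # tiny made-up population
n = 2                          # sample size

# Every possible sample of size n (order ignored, no repeats).
all_samples = list(combinations(population, n))

# One mean per sample; Fraction keeps the arithmetic exact.
sample_means = [Fraction(sum(s), n) for s in all_samples]

# The sampling distribution of sample means: how often each mean occurs.
distribution = Counter(sample_means)
for value in sorted(distribution):
    print(f"sample mean {float(value):4.1f} occurs {distribution[value]} time(s)")

population_mean = Fraction(sum(population), len(population))
mean_of_sample_means = sum(sample_means) / len(sample_means)
print("population mean:     ", float(population_mean))
print("mean of sample means:", float(mean_of_sample_means))
```

In this small example there are 10 possible samples, and the average of all the sample means comes out equal to the population mean.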
JASON GIBSON [continued]: Don't overlook how important these words are here. It's very descriptive. It's a sampling distribution, because I got the information by sampling the population, by going and collecting information. That's what the first part means. Of sample means-- so as I said before, let's say it was IQ. Let's get away from IQ. We don't want to talk about the same thing over and over again. Let's talk about-- oh, gosh, it could be anything.
JASON GIBSON [continued]: Let's say we're going to sample everybody in the country. And we're going to ask them, how many siblings do you have? Brothers and sisters. So clearly they are going to give you a number back. So let's say my sample size is now 20-- sample size n-- that's 20. So I go ask the first 20 people, how many brothers and sisters do you have total? Well, this guy, this guy, this guy-- they're all going to give me different answers, right?
JASON GIBSON [continued]: But I can take that information from that sample. And I can calculate an average number of brothers and sisters. Maybe it's 2.5, or 3.2, or whatever it is-- I can calculate that number. That is a sample mean, because this is a sample-- a stack of 20 people that I've called. And I'm calculating the mean. OK, then I go do it again. For this stack of people, I go randomly choose those from the population, I get another sample mean,
JASON GIBSON [continued]: which will be different than the original one because the odds of them being the same are very low. They would have to have exactly the same average number of brothers and sisters. But I can do this process again, and again, and again, asking everybody, every stack of 20, how many brothers and sisters do you have? They will answer. And I can calculate a sample mean. Now, if I do this for the whole population, which is really impractical.
JASON GIBSON [continued]: You're really not going to be able to call everybodyin North America.But what you get whenever you theoretically call everybodyin terms of the sample-- you get the samplemeans like that-- then what you getis a sampling distribution of sample means.That's what you get.And that's what I'm really tryingto teach you in this lesson is the terminology.The sampling distribution is when you get that information,
JASON GIBSON [continued]: and you're getting a distribution of answers. And usually when you do that, you're calculating something. Usually, it's the mean, or the standard deviation, or whatever it is you're trying to study. When it's the mean, that's called a sampling distribution of sample means. There are other types of sampling distributions we'll get into later. But mostly we're going to talk about the sampling distribution of sample means. So what I'd like to do is close the section down here.
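The siblings example can be sketched the same way: ask stacks of 20 people, record each stack's average, and then organize the averages into a rough text histogram. Everything about the population here is invented for illustration, including the possible sibling counts and how common each one is, and only 2,000 stacks are drawn rather than every possible one.

```python
import random
import statistics
from collections import Counter

random.seed(3)  # fixed seed so the run is repeatable

# Hypothetical population: number of siblings for 100,000 people.
# The counts and their weights are made up for illustration.
sibling_counts = [0, 1, 2, 3, 4, 5]
weights = [15, 35, 25, 15, 7, 3]
population = random.choices(sibling_counts, weights=weights, k=100_000)

n = 20               # sample size: one "stack" of 20 people
num_samples = 2_000  # number of stacks we call

# One sample mean per stack of 20 people.
sample_means = [
    statistics.mean(random.sample(population, n)) for _ in range(num_samples)
]

# Group the sample means into bins 0.5 wide and draw a text histogram.
histogram = Counter(round(m * 2) / 2 for m in sample_means)
for bin_center in sorted(histogram):
    bar = "#" * (histogram[bin_center] // 20)
    print(f"{bin_center:4.1f} | {bar}")
```

The bars show the distribution of answers: most stacks of 20 give an average close to the population's average number of siblings, and averages far from it show up less often.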
JASON GIBSON [continued]: Mostly, I wanted to give you an introduction to what a sampling distribution is, the concept of a sample size, the concept of a sample mean, and then the idea that if you do that enough, you get a sampling distribution of sample means. Now, in the next section, we're going to study something called the central limit theorem, which is directly going to start from where we ended here. And we're going to be able to tie some numbers to it and see what we can actually learn.
JASON GIBSON [continued]: So far, we've just been doing definitions. I haven't really told you what a sampling distribution of sample means is really useful for. In the next section, with the central limit theorem, you're going to learn why it's useful. So don't feel too bad. We haven't done any problems. We're just getting our feet wet. We're just learning. Follow me on to the next section with the central limit theorem. And you'll see how to use this stuff and solve real problems.
Jason Gibson explains how to use a sampling distribution. He starts by defining the sampling distribution, then continues into how a student might find a sampling distribution in practice.