- 00:03
Business Mathematics. Module 10: Statistics. Part 2.3: Introduction to Statistics.

- 00:10
RICHARD WATERMAN: In this particular example though, my key assumptions were approximate normality for the returns and furthermore, that the future looks something like the past because I'm going to use the past data to make some probability statement about the future event. So, that's our value at risk calculation.

- 00:31
RICHARD WATERMAN [continued]: And I would just leave this slide by making a couple comments. So VaR is a measure of risk. Here, my example, it came from the empirical world. There are many different measures of risk. And from a management perspective, it's sometimes not critical that you have the right measure of risk. I mean, the fact is there isn't going

- 00:51
RICHARD WATERMAN [continued]: to be a single unique correct measure of risk. What's more important is that you've created a metric for risk. And once you've created a metric for risk, then you can do useful things with that metric. For example, you could compare different traders in terms of their value at risk.

- 01:11
RICHARD WATERMAN [continued]: So you can do comparisons once you've got a metric. You could also create a benchmark that you wanted to compare your VaR against. So you can manage towards a benchmark as well. So quite often, the benefit of doing these statistical approaches, creating and estimating a metric, isn't necessarily that we got a right answer in place,

- 01:33
RICHARD WATERMAN [continued]: but we provided a context within which we can better manage the process that we're looking at. And I think this VaR calculation is a good example of such an effect happening. We're now going to move on to the next set of statistical ideas. And these statistical ideas are required in order for us

- 01:55
RICHARD WATERMAN [continued]: to make decisions. And you might say, well, we're always making decisions so why do we need some extra ideas? Well, the answer is that we're going to think about making decisions in the presence of uncertainty. And when we do that, we just have to be perhaps a little bit more careful. But nonetheless, it is more realistic. We're always surrounded with incomplete information,

- 02:18
RICHARD WATERMAN [continued]: with noise in the system, so to speak. And we need to understand that and incorporate it in the decisions that we make and the way that we present those decisions. And so, I'm going to start this section by introducing an example and then lay down the ideas that we

- 02:42
RICHARD WATERMAN [continued]: need to answer the question. So, a company that lends to small businesses claims that on average, its clients pay off 92% of the total amount that should be repaid on the loan. And so, imagine a company making loans, it wants to know essentially what percentage it gets paid back

- 03:05
RICHARD WATERMAN [continued]: on those loans. And now you might say, well 92% doesn't look good. That's less than 100%. But of course, they set an amount to be repaid that is higher than the amount that they lend, specifically because they know that some people don't pay everything back. So it isn't necessarily that bad news that they're only getting 92% of the amount to be paid back.

- 03:27
RICHARD WATERMAN [continued]: That could still be enough for the company to be profitable. But nonetheless, that's the claim that we're interested in testing and thinking about. So, company tells you that it got a 92% payoff rate on its loans. It's your job to figure out whether or not that is a reasonable assertion.

- 03:47
RICHARD WATERMAN [continued]: And let's consider ourselves as auditors. And as auditors, we have access to not all of the loans that the company has given out, but to a sample of those loans. And in this example, we've got little n equal to 400. So I've got 400 loans from the entire population of loans

- 04:09
RICHARD WATERMAN [continued]: that have been given out. And based on that sample of 400 loans, I need to figure out whether or not the company's claim of 92% is reasonable. Is it supported by the data? Now you can see an acronym here, on the second bullet point, as an auditor you have access to, and that's an acronym IID that I'm going

- 04:29
RICHARD WATERMAN [continued]: to define in just a second. It stands for Independent and Identically Distributed. So I'll define that in just a second. But we've got a sample of loans, 400. Our job is to see whether or not what we learn from the sample is consistent with the claim made by the lender,

- 04:50
RICHARD WATERMAN [continued]: that they've got a 92% payback rate. Now, if we're going to make progress on this question, then we have to put down some ideas. And so I'm now going to discuss the three key ideas that we need in place to answer this question. So the first idea is about how to draw data from a population

- 05:15
RICHARD WATERMAN [continued]: so that you can assert that what you learn from the sample is representative of the population. And the way that we're going to do that is by taking what's called a simple random sample. So in just a second, I'll tell you about a simple random sample. That's one, idea one. The second idea is called the standard error of the mean.

- 05:38
RICHARD WATERMAN [continued]: And what that object is going to do for us is tell us how variable the sample means themselves are, all to be discussed in a couple slides. And the last idea is a very important one. It's called the central limit theorem and it's going to tell us what the shape of the distribution of sample means looks like.

- 05:59
RICHARD WATERMAN [continued]: And I will discuss each of them in turn, but these are the three ideas that you need to be able to understand, to be able to articulate. With these three ideas, we're going to be able to put them together and then answer the question that we have just posed. Based on this sample of 400 loans, does it look like the company's assertion of a 92%

- 06:22
RICHARD WATERMAN [continued]: payback rate is reasonable? So, we'll look at each of these in turn. So I'm going to first talk about the object called a simple random sample. So remember the entire paradigm behind these statistics, inferential statistics. There is a population out there. We want to make inferences about it. We only have access to a sample.

- 06:43
RICHARD WATERMAN [continued]: We're going to use what we learn on the sample to make a statement about the population. Now, under what conditions is that going to be a reasonable approach? Under what conditions can we be sure that the sample really is generalizable to the population? Well there is a sort of very direct answer to that which is, make sure you take a probability sample.

- 07:05
RICHARD WATERMAN [continued]: One in which each unit in the population has a known and positive probability of coming into the sample. That sort of answers the question. These probability samples allow us to be comfortable in generalizing what we learn in the sample to the population. Now, there are, in fact, lots of different types

- 07:26
RICHARD WATERMAN [continued]: of probability samples. So I'm going to focus on what one might term the entry level probability sample known as a simple random sample, or an acronym SRS. When I say this is the entry level sample, I certainly don't mean to demean it in any way. Typically, if I'm called on to create some sampling scheme,

- 07:48
RICHARD WATERMAN [continued]: if I can implement a simple random sample, then I'm often very happy with that. It's something that most people are going to understand. So what's the definition of a simple random sample? It's a sample in which all possible subsets of a given size-- in our particular example, 400-- are equally likely to occur. And the way that I think of that, the way that I operationalize it is to think of simply

- 08:10
RICHARD WATERMAN [continued]: shuffling a deck of cards. And I pull a given number of cards off of the top of the deck. If I'm doing that, then I'm creating a simple random sample. Now, I might have to extend this idea to a deck of 400 cards or a deck of 4 million cards, however many cards there are in my population,

- 08:31
RICHARD WATERMAN [continued]: but it's the same idea. All, as I say, subsets are equally likely to occur. So if you can think of shuffling a deck of cards, then you can think of what's sitting behind a simple random sample. Now, if we've got a very large population, then the observations drawn are essentially an independent sample from the population.

- 08:54
RICHARD WATERMAN [continued]: And remember I talked about the word independent in last time's class when we were discussing probability, and the intuition there was that the outcome of one event gave us no information about a subsequent event. So in this context, obtaining one unit in our sample should not change the probability of another unit coming into our sample. And when we have this situation that our observations are

- 09:20
RICHARD WATERMAN [continued]: independent, then we're going to call it an IID, independent identically distributed sample for short. And that's the sort of sample that we're going to be considering when we answer this question. And so in practice, we don't have decks of cards here. We have populations from which we get a sample

- 09:44
RICHARD WATERMAN [continued]: and we use a computer to identify the sample for us. So we'll use a computer to identify the simple random sample. If I can do that, measure the values of interest on the sample, in this case the proportion of the loan that's being paid back,

- 10:04
RICHARD WATERMAN [continued]: I'm going to feel comfortable that what I learn about the sample is going to be generalizable, is going to be a legitimate extrapolation to the population. That's our first idea then, the idea of collecting a legitimate sample through simple random sampling, creating an IID sample.
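
The "shuffle the deck and take cards off the top" idea can be sketched in a few lines of Python. This is only an illustration, not the software used in the lecture: the loan IDs, the seed, and the helper name `simple_random_sample` are all assumptions made for the example.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n units without replacement; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical population: loan IDs 0..99,999 (made up for this sketch).
loan_ids = list(range(100_000))

# The auditor's sample of 400 loans; the seed is arbitrary.
sample_ids = simple_random_sample(loan_ids, 400, seed=1)
print(len(sample_ids), "loans drawn,", len(set(sample_ids)), "distinct")
```

Because the draw is without replacement, no loan can appear twice, which matches pulling cards from a shuffled deck.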

- 10:26
RICHARD WATERMAN [continued]: Now in order to make progress on this loan problem, I'm going to have to introduce an abstraction. And the abstraction is that I collect multiple samples of size 400. Now in reality, I'm only going to get a single sample of size 400 but I need to consider what would happen if I got

- 10:48
RICHARD WATERMAN [continued]: lots of samples of size 400. And for each of those samples that I take, I'm going to consider what the sample mean is of that sample and also what the standard deviation of those sample means would be. So just backing up a little bit, each time I

- 11:09
RICHARD WATERMAN [continued]: take a sample of size 400, it gives me a mean. I then need to consider what the average of those sample means is and also the standard deviation or the spread of those sample means themselves. And that's what I mean by the spread of the sample mean.

- 11:32
RICHARD WATERMAN [continued]: So this is clearly an abstraction because in reality, you only have a single sample of size 400. But it turns out that by understanding the properties of the distribution of the sample means, it tells us what to do with the single sample mean that we have. So it's not uncommon in problems that we face that we have to introduce some form of abstraction

- 11:54
RICHARD WATERMAN [continued]: in order to be able to get an operational answer to them. And again, the idea here that we need to understand is that I'm going to think about a what if scenario. What if I got a lot of all different samples of size 400, each sample of size 400 gives me a sample mean, an x bar.

- 12:17
RICHARD WATERMAN [continued]: I now want to talk about what the distribution of those x bars is, the spread of the x bars themselves. And bear with me. Once we've done that, we're going to be able to go back and put the ideas together to answer our question of interest, whether or not the data that we have, our single sample of size 400, supports the company's assertion.

- 12:38
RICHARD WATERMAN [continued]: So let's see what happens if we were to take multiple samples of size 400. So I titled this slide "Imagine" because again, we're not going to do this in practice, we're going to learn what would happen if we were to do it. And then given what we learn, we'll be able to figure out what to do with the one sample that we have.

- 12:58
RICHARD WATERMAN [continued]: So consider getting lots of IID samples of loans, each of size 400. So lots of samples of size 400. Every time you get a new sample of size 400, it's going to provide you with its own personal x bar. And let's say we had k of these samples. Then I could list out the x bars from each sample

- 13:21
RICHARD WATERMAN [continued]: as x1 bar, x2 bar, all the way out to xk bar. So in my own mind, I imagine a spreadsheet where each time I take a sample, I find the sample mean x bar and I drop that into a cell in the spreadsheet. Then I take another sample, it gives me another x bar, I drop it into the next cell of the spreadsheet. That would give me a column of x bars. And what I'm thinking about now is

- 13:43
RICHARD WATERMAN [continued]: what are the properties of that column of x bars? And in particular, I'm going to ask what's the average of those x bars and what's the standard deviation of the x bars? Our two key summaries, but what you have to understand now is that I'm not applying those summaries to the raw data,

- 14:03
RICHARD WATERMAN [continued]: I'm applying them to a set of x bars, where each x bar comes from one of these samples that I've taken. So here are the facts. And these facts I am not deriving, I'm presenting them to you. The first one is that the expected value of the x bars, the average of the sample means-- not the raw data, the average of the sample

- 14:24
RICHARD WATERMAN [continued]: means-- actually just equals mu, the mean of the distribution. So, maybe that's not too surprising. The next fact is that the standard deviation of the x bars, so how spread out are the sample means themselves if I were to take many sample means? Well, it turns out that that has a very neat formula.

- 14:45
RICHARD WATERMAN [continued]: And that formula is sigma, the standard deviation of the raw data, divided by the square root of n, where n is the number of observations going into the sample mean. The number of observations going into each sample mean is 400 in this particular example. So, the standard deviation of x bar

- 15:06
RICHARD WATERMAN [continued]: is sigma over the square root of n. So those are facts about the distribution of the x bar in terms of summaries of that distribution. The x bars are centered around the mean and they have a spread equal to sigma, the standard deviation of the raw data, divided by the square root of n. So the second of these two formulae that we saw,

- 15:28
RICHARD WATERMAN [continued]: the sigma over the square root of n formula, is often called the standard error of the sample mean, and sometimes the standard error of the mean. Now don't get confused. It is just the standard deviation of the x bars, but it's an important enough concept that we give it its own terminology, standard error of the sample mean. And what that is telling you is how spread out the sample means

- 15:52
RICHARD WATERMAN [continued]: would be if I were to collect a whole bunch of sample means. Now, the important point about this formula is that we can use it to tell us about this hypothetical set of x bars. Hypothetical because we're not going to get lots of x bars. In practice, we're only going to get one.
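
Written in symbols, the two facts stated so far read as follows. The notation is an assumption that simply mirrors the spoken "x bar", "mu", "sigma" and "n"; the lecture presents these facts without deriving them.

$$
E(\bar{X}) = \mu, \qquad SD(\bar{X}) = SE(\bar{X}) = \frac{\sigma}{\sqrt{n}}
$$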

- 16:13
RICHARD WATERMAN [continued]: But we can let this formula tell us what the spread of all of those x bars would be if we were to collect them. And the reason we can do that is because the formula just depends on two things that we're going to know, or at least be able to estimate. The sigma, which is the standard deviation of the raw data and we are going to be able to estimate that with s when the time comes.

- 16:35
RICHARD WATERMAN [continued]: So the standard deviation of the underlying data, it depends on that. And it depends on the sample size, the number of observations going into the mean that we will know. In our particular example, we're using 400. So, it's not exactly a miracle but it's far from obvious at first that in fact based on one sample, we're going to be able to obtain enough information to tell us

- 16:59
RICHARD WATERMAN [continued]: what would happen had we got lots of samples. And so as I said, not an obvious idea, but that's why this formula is going to be so useful to us. Now, notice that in the formula, the sigma over the square root of n, as n, the sample size or the batch size, which was 400 in our example, gets bigger,

- 17:19
RICHARD WATERMAN [continued]: so the standard error or the standard deviation of the sample mean gets smaller. And that shouldn't be surprising that the more observations you put into the sample mean, then the less variable that mean's going to become. So, what's not necessarily obvious is that the right divisor is square root of n rather than just plain n.

- 17:39
RICHARD WATERMAN [continued]: But it is, it's square root of n. So we've now got the formula that tells us how spread out the sample means are going to be. The standard deviation or the standard error of the sample means is sigma over the square root of n. When the time comes, we're going to be able to estimate that by replacing the population value sigma with an estimate from our one batch of data, s.
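
A minimal Python sketch of the standard error calculation, assuming nothing beyond the formula just stated. The function names and the illustrative value of 0.1 for the standard deviation are assumptions for the example, not figures from the loan data.

```python
import math
import statistics

def standard_error(sd, n):
    """Standard error of the sample mean: standard deviation over the square root of n."""
    return sd / math.sqrt(n)

def estimated_standard_error(sample):
    """Plug-in estimate from one batch of data: s / sqrt(n), with s the sample standard deviation."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# Illustrative standard deviation of 0.1 (an assumption for this sketch).
# Quadrupling n halves the standard error, because the divisor is sqrt(n), not n.
for n in (100, 400, 1600):
    print(n, standard_error(0.1, n))
```

The second function shows the step described above: when sigma is unknown, the single sample supplies s in its place.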

- 18:05
RICHARD WATERMAN [continued]: So I wanted to show you a graph, a histogram of what these sample means would look like. Now, in order to do that, I'm going to run a quick simulation. Now simulations are something of a technique. I'm not going to do them in this class but they're a very helpful technique to get insight on problems and you will see them, for example, in the operations classes.

- 18:27
RICHARD WATERMAN [continued]: We'll talk about something called Monte Carlo simulation. And what we do there is use the ideas of probability to get an understanding about some feature that I'm interested in. And so let's illustrate those ideas. And so, what I've done is create a hypothetical population of 100,000 loans. So let's say the population is 100,000 loans.

- 18:49
RICHARD WATERMAN [continued]: And for illustrative purposes, let's say that I know that the true mean mu is one and the true standard deviation sigma is 0.1. So I'm creating a world, sort of, in which I know the answers. But I want to show you what would happen if you took lots of samples of size 400 from such a population.

- 19:10
RICHARD WATERMAN [continued]: And what I've done is taken 50 random samples, each of size n equal to 400. And I want to calculate the x bar for each of those random samples and have a look at the distribution of the x bars themselves. Now the 50, here that's a totally arbitrary number, just for illustrative purposes. I could have taken 100, I could have taken 25,

- 19:32
RICHARD WATERMAN [continued]: but I decided to take 50 to show you what's going on here. So, don't get confused by the 50. That's just a number that I've chosen to illustrate this concept. So, I take 50 random samples, each of size 400, from this population of 100,000 loans. And for each random sample, I find its x bar.

- 19:55
RICHARD WATERMAN [continued]: Of course, since I took 50 random samples, there will be 50 x bars. I think of those 50 x bars as new data points, essentially, and I'm going to talk about the shape of the distribution of the x bars and their standard deviation. So let's have a look and see what happens when I take, in this particular case, 50

- 20:18
RICHARD WATERMAN [continued]: random samples and for each one I get an x bar. On this slide, I am drawing two different distributions. I'm drawing two different relative frequency histograms. The histogram on the left hand side shows you the distribution of the data in the population.

- 20:39
RICHARD WATERMAN [continued]: Notice how it's centered around one and has a pretty symmetric bell shaped looking curve associated with it. Well, that's the distribution of the raw data. Now, the thought experiment here is I go into that population of the raw data, the 100,000 observations in the population, and I take a sample of size 400.

- 21:01
RICHARD WATERMAN [continued]: Based on that sample of size 400 I calculate the x bar. I then repeat that 50 times so I've got 50 x bars, 50 sample means. Now, I have drawn the distribution of those 50 sample means in the graph on the right hand side. And I just want to make a couple comments about that graph.

- 21:21
RICHARD WATERMAN [continued]: First, note that it's centered around the population mean one. That's approximately where the center of that distribution is, and that's illustrating the idea that the average or the expected value of the x bars is just equal to mu. So it's centered around one. But the other thing that you should note here is the spread of the x bars in the picture on the right hand

- 21:45
RICHARD WATERMAN [continued]: side is much, much less than the spread in the raw data, the histogram on the left hand side. And the way you convince yourself of that is to look at the values on the horizontal axes in each plot. You will see that on the plot on the left hand side, there's much more spread in the raw data

- 22:06
RICHARD WATERMAN [continued]: than there is compared to the spread in the sample means. And the spread in the sample means is much smaller than that in the raw data because the spread is measured by sigma divided by the square root of n, where n is the batch size, the 400. And so we're shrinking the spread

- 22:27
RICHARD WATERMAN [continued]: when we look at the spread of the sample means. So, this is an illustration of what we would see had we had the ability to take a whole bunch of random samples. In this particular case, we are obviously only going to take one. But if we were able to take many, then here's what we would see.
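
The simulation described above can be approximated with the Python standard library. This is a rough re-creation, not the lecture's own code: the population is assumed to be normally distributed (the lecture only says the histogram looks bell shaped), and the seed is arbitrary. The sizes 100,000, 400 and 50, and the values mu = 1 and sigma = 0.1, come from the lecture.

```python
import math
import random
import statistics

rng = random.Random(2014)  # arbitrary seed, for reproducibility only

# Hypothetical population of 100,000 loans with mean 1 and standard deviation 0.1.
# Normality of the raw data is an assumption of this sketch.
population = [rng.gauss(1.0, 0.1) for _ in range(100_000)]

n, k = 400, 50  # sample size and number of repeated samples
# One x bar per simple random sample: the "column of x bars" in the spreadsheet.
x_bars = [statistics.fmean(rng.sample(population, n)) for _ in range(k)]

print("mean of the x bars:   ", round(statistics.fmean(x_bars), 4))
print("sd of the x bars:     ", round(statistics.stdev(x_bars), 4))
print("sigma / sqrt(n):      ", round(statistics.pstdev(population) / math.sqrt(n), 4))
print("sd of the raw data:   ", round(statistics.pstdev(population), 4))
```

The second and third printed values should be close to each other and far smaller than the fourth, which is the contrast between the right-hand and left-hand histograms.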

- 22:47
RICHARD WATERMAN [continued]: So we've now got our two first ideas required to make some statement about the company's claim. The first idea was taking an appropriate sample, the simple random sample. The second idea was the spread of the sample means themselves.

- 23:08
RICHARD WATERMAN [continued]: The third idea, the last one that we need, is called the central limit theorem. And what this is going to do is tell us the shape of the distribution of the x bars. And what it says to us is that under sufficiently large repeated independent sampling from a distribution--

- 23:30
RICHARD WATERMAN [continued]: so within that sentence there are quite a lot of subtleties, and sufficiently large is probably the main one there-- so it's dependent on the size of the sample that we're taking. But given that we've got big enough batch size-- and we've got n equal to 400 in our particular example, which is going to be enough-- then it says something about the sample

- 23:53
RICHARD WATERMAN [continued]: means. So, it says that the sample mean, the x bars, have an approximate normal distribution. That's the key fact, that the shape of the histogram of the x bars, if I were to collect many x bars, is going to be normally distributed.

- 24:13
RICHARD WATERMAN [continued]: It's not an assumption. This central limit theorem says as long as the batch size is large enough-- and n equal to 400 is going to be large enough here-- then we're going to see a normal distribution for the distribution of the x bars. And furthermore, I've made these statements before. The expected value of the x bars is

- 24:34
RICHARD WATERMAN [continued]: mu, the standard deviation of the x bars, otherwise known as the standard error of the x bars, is sigma over square root of n, so the x bars have a normal distribution. And we know the mean and we know the standard deviation, so we know everything about the x bars. So that's what the central limit theorem says.

- 24:54
RICHARD WATERMAN [continued]: And because the x bars are going to be approximately normally distributed, then we're going to be able to use the empirical rule again. Previously, when we used the empirical rule, we applied it to the raw data. The raw data was the Apple returns. Now, what we have learned is the x bars

- 25:16
RICHARD WATERMAN [continued]: are going to be normally distributed. So if I had a lot of x bars, I could apply the empirical rule to them, given the central limit theorem. And in particular, one usage of the empirical rule here would be to say that we anticipate that 95% of sample means should lie within two standard errors of mu.
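
The "95% within two standard errors" statement can be checked by simulation. The sketch below reuses the lecture's simulated world (mu = 1, sigma = 0.1, n = 400) and draws IID observations directly; the number of trials and the seed are assumptions chosen for the example.

```python
import math
import random
import statistics

rng = random.Random(7)  # arbitrary seed, for reproducibility only

mu, sigma, n = 1.0, 0.1, 400           # values from the lecture's simulated world
standard_error = sigma / math.sqrt(n)  # sigma over the square root of n

trials = 2_000                         # assumed number of repeated samples
inside = 0
for _ in range(trials):
    # One IID sample of size n and its sample mean.
    x_bar = statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
    if abs(x_bar - mu) <= 2 * standard_error:
        inside += 1

coverage = inside / trials
print(f"share of x bars within two standard errors of mu: {coverage:.3f}")
```

The printed share should land near 0.95, with some run-to-run variation if the seed is changed.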

- 25:39
RICHARD WATERMAN [continued]: So that's a direct application of the empirical rule when we use the empirical rule on the sample means. And we can do that by bringing to bear on the problem the central limit theorem that says x bars are approximately normally distributed. What's so important about the central limit theorem

- 26:00
RICHARD WATERMAN [continued]: here is what it's not saying. And it's not making any comment about the shape of the underlying distribution of the raw data. The underlying raw data can have any shape distribution it wants. So long as the sample size is large enough, then the distribution of the x bars is going to be approximately normally distributed

- 26:23
RICHARD WATERMAN [continued]: as the n gets larger and larger. And so, this is a very, very powerful idea because it doesn't matter what the raw data distribution looks like as long as you've got a sufficient batch size and you're interested in the sample mean, then those sample means, the x bars, have a normal distribution.
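
To see that the shape of the raw data does not matter much at this sample size, one can repeat the experiment with strongly skewed raw data. The sketch below uses an exponential distribution with mean 1 and standard deviation 1; that choice, the helper `skewness`, the number of trials, and the seed are all assumptions made for illustration and do not come from the lecture.

```python
import math
import random
import statistics

rng = random.Random(11)  # arbitrary seed, for reproducibility only

def skewness(values):
    """Average cubed z-score: about 0 for symmetric data, positive for a long right tail."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

n, trials = 400, 2_000
mu, sigma = 1.0, 1.0  # mean and standard deviation of an exponential with rate 1

raw = [rng.expovariate(1.0) for _ in range(20_000)]  # skewed raw data
x_bars = [statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
          for _ in range(trials)]

standard_error = sigma / math.sqrt(n)
coverage = sum(abs(x - mu) <= 2 * standard_error for x in x_bars) / trials

print("skewness of raw data:", round(skewness(raw), 2))
print("skewness of x bars:  ", round(skewness(x_bars), 2))
print("share of x bars within two standard errors:", round(coverage, 3))
```

The raw data are far from symmetric, yet the x bars are nearly symmetric and the two-standard-error band still captures roughly 95% of them.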

- 26:45
RICHARD WATERMAN [continued]: And again, if something's got a normal distribution, we can apply the empirical rule to it. So, it's time for us to do that. Music: Repeater by Moby, courtesy of mobygratis.com. Business Mathematics. Richard Waterman.

### Video Info

**Series Name:** Business Statistics

**Episode:** 13

**Publisher:** Wharton

**Publication Year:** 2014

**Video Type:** Tutorial

**Methods:** Statistical inference, Standard error, Standard deviations, Mean scores

**Keywords:** mathematical applications; mathematical concepts; mathematics; risk (business)

### Segment Info

**Segment Num.:** 1

**Persons Discussed:**

**Events Discussed:**

**Keywords:**

## Abstract

Richard Waterman introduces statistics and the process of measuring risk. Value at risk (VaR) is a measure of risk that can be used to make decisions in business. Waterman also discusses simple random samples and the distribution of the sample mean.