- 00:00
[Wharton: University of Pennsylvania][Business Mathematics][Module 10: Statistics Part 2.4 Introduction to Statistics]

- 00:09
RICHARD WATERMAN: We now have the three ideas that we need to answer the question. The question of interest again was, does this one sample of size 400 that I have access to provide me with any evidence to either corroborate management's claim that the payback rate was 92%, or does it

- 00:31
RICHARD WATERMAN [continued]: give me evidence against that? The way that I'm going to put the three ideas together to answer the question is to create what's called a confidence interval. So I want to now discuss confidence intervals. And we will use the confidence interval in just a minute when we create it to answer management's claim about the average payoff rate.

- 00:54
RICHARD WATERMAN [continued]: So what is a confidence interval? Well, a confidence interval is going to be a range of feasible values for some unknown population parameter. And so in our particular scenario that we're looking at, the unknown population parameter is the population mean, the average payback rate on these loans over the entire population.

- 01:15
RICHARD WATERMAN [continued]: So I am going to provide a range of feasible values for what I think mu could be. And-- and it's the second part that's the most important-- we're also going to provide a statement conveying what's called the confidence that the range of feasible values that I create actually includes the unknown population value.

- 01:38
RICHARD WATERMAN [continued]: So I think of the second statement as a meta-level statement. It talks about the first statement. The first statement is, here's a range of values for where I think the answer is. And the second statement is, and here's how confident I am that that range of values includes the true one that I am interested in. And you have to be a little careful here.

- 01:58
RICHARD WATERMAN [continued]: Confidence, it really is a term of art. So how am I going to create a confidence interval? Well, the answer is invert the empirical rule. I told you the empirical rule has a lot of legs. And here's one of the places where it gets applied. The idea is that, based on the Central Limit Theorem,

- 02:22
RICHARD WATERMAN [continued]: the x bars are approximately normally distributed. Therefore I can apply the empirical rule to them. And as I said on the previous slide, 95% of sample means lie within two standard errors of mu. That's the empirical rule applied to the sample mean. Well, if 95% of the time the sample mean is within two standard errors of mu, then 95% of the time,

- 02:44
RICHARD WATERMAN [continued]: the true and unknown mu is going to be within two standard errors of the sample mean. And so I don't know what the true mean is, obviously. If I did, I wouldn't need to do any statistics. I don't know what the mu is. But what I can do is create this confidence interval, which is essentially to take the sample mean and plus or minus two standard errors from it,

- 03:05
RICHARD WATERMAN [continued]: and use that as the range of feasible values, so inverting the empirical rule. I'll say it one more time. 95% of the time, the sample mean is within two standard errors of mu. And 95% of the time, the true and unknown mu is within two standard errors of the sample mean. The sample mean is something that I'm going to be able to calculate if I can get a handle on what

- 03:27
RICHARD WATERMAN [continued]: the standard error is. Combine that with the sample mean, it's going to give me my range of values, feasible values for mu. So why are these confidence intervals so important? Well, one of the things they do is move away from a single estimate to a range of values. And that's just simply more realistic. So if I were to say to you, what do you

- 03:49
RICHARD WATERMAN [continued]: think the S&P 500 is going to close at next-- at the end of this year, and I tell you-- and I say, you can only give me a single number, now maybe you'd say, 1846, for example. You're almost bound to be wrong if you just give a single number. So that's not particularly useful. If you give a range of values, then you've got some-- it's more realistic in terms of presenting

- 04:13
RICHARD WATERMAN [continued]: some form of estimate. And when we give a range of values, we call that an interval estimate. A single estimate is sometimes called a point estimate. So it's simply more realistic to give a range. And the second reason why these confidence intervals are so important is that we get to make this meta-level statement, our confidence about the first statement.

- 04:35
RICHARD WATERMAN [continued]: And we're going to work at 95% confidence. We're about to create a 95% confidence interval for the population mean, mu. Once I've got one of these intervals-- I still haven't constructed it yet. I'm going to do that in just a second. But once I've got that interval, how do I use it to make a decision? Well, we are going to consider, in our particular case,

- 05:01
RICHARD WATERMAN [continued]: management's claim. Management claims that the average performance of the loans is a 92% payback rate. And I want to determine whether that's reasonable based on the audit sample that I have. So what I do is create a confidence interval based on my audit sample of 400 observations, and then I simply look to see whether or not

- 05:23
RICHARD WATERMAN [continued]: 0.92 lies inside that interval. By definition of the confidence interval, it's the range of feasible values. So when I look to see whether or not 0.92 lies inside that interval, I'm simply saying, is 0.92 a feasible value? So if 0.92 is within the interval that I create, then it's feasible, and I have no basis

- 05:46
RICHARD WATERMAN [continued]: to doubt management's claim. If, on the other hand, 0.92 does not lie inside the interval, then 0.92 is not a feasible value, and I have evidence against the management claim. So that's the basic idea. Take your confidence interval, which is generated from the data that you have, and see whether

- 06:09
RICHARD WATERMAN [continued]: or not the value of interest, in this case the 0.92, lies inside or outside the interval. If it's inside the interval, then there's no evidence against this value being a feasible value. So I've got no reason to take management to task. If, on the other hand, 0.92 is not inside the interval, then it's not feasible based on the data.
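
The decision rule just described can be sketched in a few lines of Python. This is only an illustration: the function name `claim_is_feasible` and the two example intervals are hypothetical and are not the interval computed from the audit sample.

```python
def claim_is_feasible(claimed_value, interval):
    """Return True when the claimed value lies inside the confidence interval."""
    lower, upper = interval
    return lower <= claimed_value <= upper


# Hypothetical intervals, chosen only to show both outcomes of the rule.
interval_containing_claim = (0.90, 0.94)
interval_excluding_claim = (0.80, 0.85)

# Inside the interval: no evidence against the claim.
print(claim_is_feasible(0.92, interval_containing_claim))
# Outside the interval: evidence against the claim.
print(claim_is_feasible(0.92, interval_excluding_claim))
```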

- 06:29
RICHARD WATERMAN [continued]: And I'd want to have a discussion with management. I'd say, I'm sorry but the data does not support your assertion. So that's how we're going to use it to make a decision. So I want to now tell you how the confidence interval is, in fact, constructed. And what I've drawn here is yet another bell curve, and yet another normal distribution.

- 06:50
RICHARD WATERMAN [continued]: But look at the axis. The axis has changed this time, or the labeling of the axis. What I'm drawing here is the sampling distribution of the sample mean, of the x bars. And this is this theoretical idea, this hypothetical, what if I were to get lots of x bars, lots of samples, and draw the histogram of those sample means?

- 07:14
RICHARD WATERMAN [continued]: Well, the Central Limit Theorem tells me that the distribution of the x bars is normal. Hence, the bell curve that you can see. We also know that the x bars, on average, are centered at mu. And so there's mu sitting here in the center of the normal distribution. And we also know the standard error, or the standard deviation of the x bars,

- 07:35
RICHARD WATERMAN [continued]: is sigma over the square root of n. And so now when I move away from mu, I am counting how many standard errors I am away from mu. And so this normal distribution here is representing the distribution of x bar,

- 07:57
RICHARD WATERMAN [continued]: and is justified by an appeal to the Central Limit Theorem. So the thought experiment is that when I take a sample-- and in our particular case, a sample of size 400-- and I calculate the x bar, I'm essentially putting my hand into a bag of normally distributed random variables, and pulling one out. And the normal distribution from which I'm drawing this x bar

- 08:20
RICHARD WATERMAN [continued]: is centered around mu and has a standard deviation, or standard error, sigma over the square root of n. So I can ask probability questions about the x bar. And the one that I have been teeing up the whole time is, what's the probability that the x bar is more than two standard errors away from mu? And the normal distribution here tells me the answer.

- 08:43
RICHARD WATERMAN [continued]: That it's approximately the-- the probability that it's more than two standard errors away from mu is approximately 5% by appeal to the empirical rule. So putting it all together now, using the Central Limit Theorem and the empirical rule, we learn that there's

- 09:04
RICHARD WATERMAN [continued]: an approximate 95% chance that if I were to take another sample and get an x bar, it's going to be within two standard errors of mu, two sigma over the square root of n. Unfortunately, we don't know what mu is, but we can invert the previous statement to say there's a 95% chance that mu is within two

- 09:25
RICHARD WATERMAN [continued]: standard errors of x bar. And if I were to write that as a formula, I would get x bar plus or minus 2 sigma over the square root of n. That provides an approximate 95% confidence interval for mu. So find your x bar and go out plus or minus two standard errors from it.
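
One way to see the inversion at work is a small simulation. The sketch below rests on assumptions not made in the lecture: the population is taken to be normal, and the values mu = 0.90 and sigma = 0.08 are hypothetical, used only to generate data. It draws many samples of size 400, builds x bar plus or minus two standard errors each time, and counts how often the interval contains mu.

```python
import random
import statistics


def coverage_of_two_se_interval(mu, sigma, n, trials, seed=0):
    """Fraction of simulated intervals x_bar +/- 2*sigma/sqrt(n) that contain mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    standard_error = sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        # Assumption: the population is normal with mean mu and sd sigma.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = statistics.fmean(sample)
        if x_bar - 2 * standard_error <= mu <= x_bar + 2 * standard_error:
            hits += 1
    return hits / trials


# Hypothetical population values, used only for the simulation.
coverage = coverage_of_two_se_interval(mu=0.90, sigma=0.08, n=400, trials=2000)
print(f"Fraction of intervals containing mu: {coverage:.3f}")
```

The printed fraction should land close to 0.95, which is the sense in which the interval carries 95% confidence.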

- 09:46
RICHARD WATERMAN [continued]: There's your 95% confidence interval. So notice the two is coming from the empirical rule, and hence the 95% confidence. You could create different levels of confidence if you wanted to. I'm not going to here. But let's say you wanted to create a 68% confidence interval, then you would go out one standard error from x bar.

- 10:07
RICHARD WATERMAN [continued]: If you wanted a 99.7% confidence interval, you go out three standard errors. And let's say you wanted a 90% confidence interval, then you go out 1.645 standard errors from x bar. So that's your range of feasible values. We are going to work here at 95% confidence, so we'll use the two in the formula.
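
The link between the number of standard errors and the confidence level can be checked with Python's standard library. This is a sketch using `statistics.NormalDist`; the helper names `confidence_for_multiplier` and `multiplier_for_confidence` are made up for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def confidence_for_multiplier(k):
    """Probability that a normal variable falls within k standard deviations of its mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


def multiplier_for_confidence(level):
    """Number of standard errors needed for a two-sided interval at the given level."""
    return standard_normal.inv_cdf(0.5 + level / 2)


for k in (1, 1.645, 2, 3):
    print(f"+/- {k} standard errors -> confidence {confidence_for_multiplier(k):.4f}")

print(f"Multiplier for 95% confidence: {multiplier_for_confidence(0.95):.3f}")
```

The multiplier of two is the empirical-rule rounding of the more precise value, roughly 1.96, so the plus or minus two interval is very slightly wider than an exact 95% interval.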

- 10:29
RICHARD WATERMAN [continued]: There's still one wrinkle that we've got to deal with. And the wrinkle is, we don't know what sigma is. We're going to know what x bar is because we're going to collect some data. We're going to know what n is. It's going to be our sample size, 400 in this particular problem. But we don't know sigma. What are we going to do? Well, there's only one thing to do, and that's estimate it.

- 10:49
RICHARD WATERMAN [continued]: Sigma is the population standard deviation, so we're going to estimate that with the sample standard deviation. We're going to replace the sigma with s. Now, when we do that, there are some nuances associated with this substitution, and I will leave those to the stat class professors to discuss with you.

- 11:12
RICHARD WATERMAN [continued]: I would just note that by adding in a little bit more uncertainty, replacing sigma with s, there need to be some minor tweaks to the formula. That's how I would summarize that. But with reasonable sample sizes-- and here we've got a sample size of 400-- this is going to provide a good working rule of thumb.

- 11:36
RICHARD WATERMAN [continued]: And it's absolutely something that I would use. If you give me a piece of software, I'll let the piece of software do it exactly. But as a good working rule of thumb, this formula is fine. You've just got to remember to replace the sigma with s. So let's go ahead and do that. Time to calculate the confidence interval based off of our single sample of size little n equal to 400.

- 12:01
RICHARD WATERMAN [continued]: It turns out that the sample mean we observed was 0.87 based on our audit sample of size 400. It turned out that the sample standard deviation, s, was 0.08. So again, I'm just presenting these numerical summaries to you calculated by software. But there's our sample mean, and there's our sample standard

- 12:21
RICHARD WATERMAN [continued]: deviation. Based on the estimate of s, I can estimate the standard error of the mean itself as s over the square root of n. Remember the formal formula is sigma over the square root of n, but I don't know sigma, so I replace it with s. So s over the square root of n is 0.08

- 12:41
RICHARD WATERMAN [continued]: over the square root of 400. Square root of 400 is 20, and you'll get that the standard error is 0.004. Now to create the confidence interval, the approximate 95% confidence interval, I do x bar, which is 0.87, plus or minus twice because I'm working at 95% confidence,

- 13:03
RICHARD WATERMAN [continued]: s over the square root of n, plus or minus twice 0.004. I then go to my calculator-- or I could probably do this one in my head, because it's not super hard-- and I get that the approximate 95% confidence interval goes from 0.862 to 0.878. So now I've got some numbers that I'm looking at.
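
The arithmetic above can be reproduced in a few lines of Python. The numbers (x bar = 0.87, s = 0.08, n = 400, multiplier 2) come from the lecture; the variable names are just illustrative choices.

```python
import math

x_bar = 0.87      # sample mean from the audit sample
s = 0.08          # sample standard deviation, standing in for sigma
n = 400           # sample size
multiplier = 2    # empirical-rule multiplier for approximately 95% confidence

standard_error = s / math.sqrt(n)       # 0.08 / 20
margin = multiplier * standard_error    # half-width of the interval
lower, upper = x_bar - margin, x_bar + margin

print(f"standard error = {standard_error:.3f}")
print(f"approximate 95% confidence interval = ({lower:.3f}, {upper:.3f})")
# standard error = 0.004
# approximate 95% confidence interval = (0.862, 0.878)
```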

- 13:24
RICHARD WATERMAN [continued]: Based on the sample that I have observed, a set of feasible values for the population mean mu is between 0.862 and 0.878, so between 86% roughly and 88%. Management claimed that it was 92%.

- 13:45
RICHARD WATERMAN [continued]: And standing here as an auditor, based on my audit sample of size 400, it does not corroborate management's claim. This interval, based on the data, does not include the 0.92 or the 92%. So I can use the confidence interval to make my decision. And what I'm going to have to say to management

- 14:06
RICHARD WATERMAN [continued]: here is, you claim that the payback rate was 92%, but based on my sample your claim is not supported. And presumably at that stage, some steps are taken. So summarizing this slide, you can see that the actual calculation of the confidence

- 14:27
RICHARD WATERMAN [continued]: interval itself is not complicated. You need x bar, you need s, and you need n. But the ideas behind creating this confidence interval, and understanding where it comes from, those are a little bit more complicated. You need one, the idea of sampling, that you've got a legitimate sample to extrapolate back to the population.

- 14:48
RICHARD WATERMAN [continued]: You need two, the idea of the standard error of the sample mean telling you how spread out those sample means are, and essentially telling you how spread out the bell curve that-- or the normal distribution associated with the sample means is. So you need the standard error of the mean. And you need this idea of the Central Limit Theorem that says x bars have an approximate normal distribution

- 15:10
RICHARD WATERMAN [continued]: for sufficiently large sample sizes. With those ideas in place, we can use the empirical rule. That's what those three ideas are required for here. We can use the empirical rule. And we're using the empirical-- this rule this time, not on the raw data, but on the sample means themselves. And the key statement is, if 95% of the time x bar

- 15:34
RICHARD WATERMAN [continued]: is within two standard errors of mu, then 95% of the time, the true but unknown mu is within two standard errors of x bar. I then go out, collect some data, calculate my specific value for x bar, estimate sigma with s, and lo and behold, I've got my 95% confidence interval,

- 15:54
RICHARD WATERMAN [continued]: and I can then use that to make a decision. So this is the underlying framework from which we make decisions in the presence of uncertainty. So I just want to summarize this one more time in terms of the decision making from the specific example that we have. The question was, is there evidence

- 16:16
RICHARD WATERMAN [continued]: to reject the claim that the true population mean repayment percentage is 0.92? So that's a much more formal way of saying, management comes to us with 92%. Do we believe it, or does the data support it? We get an interval, 0.862 to 0.878, and it does not include the value 0.92. Therefore, we can say there's evidence at the 95% level of confidence, to reject

- 16:40
RICHARD WATERMAN [continued]: management's assertion that the population mean is 0.92. So that's how we would formalize the statement. And intuitively what has happened here is that we observed a value of x bar, 0.87, and it's simply not concordant with the true population mean being equal to 0.92.

- 17:02
RICHARD WATERMAN [continued]: 0.87 is a long way from 0.92. How far is far though? Well, when we're doing statistics and talking about x bar, we need to count "farness" by how many standard errors we are away from 0.92. And the bottom line is that we're more than two standard errors away from 0.92.
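
To make the "farness" concrete, the distance between the observed x bar and the claimed mean can be expressed in standard errors. This short Python sketch uses the lecture's numbers; the variable names are illustrative.

```python
import math

x_bar = 0.87          # observed sample mean
s = 0.08              # sample standard deviation
n = 400               # sample size
claimed_mean = 0.92   # management's claim

standard_error = s / math.sqrt(n)
distance_in_standard_errors = (x_bar - claimed_mean) / standard_error

print(f"x bar is {abs(distance_in_standard_errors):.1f} standard errors from the claim")
print("more than two standard errors away:", abs(distance_in_standard_errors) > 2)
```

The sample mean sits about 12.5 standard errors below 0.92, far beyond the two-standard-error threshold.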

- 17:22
RICHARD WATERMAN [continued]: That's equivalent to this 95% confidence interval not containing 0.92. So the data is not concordant with management's claim. We believe the data, and we reject management's claim. That's how we put it all together. [Music: Repeater by Moby, courtesy of mobygratis.com] [Business Mathematics]

- 17:43
RICHARD WATERMAN [continued]: [Wharton: University of Pennsylvania]

### Video Info

**Series Name:** Business Statistics

**Episode:** 14

**Publisher:** Wharton

**Publication Year:** 2014

**Video Type:** Tutorial

**Methods:** Confidence intervals, Standard error, Normal distribution, Sampling

**Keywords:** estimates; estimation; estimators

### Segment Info

**Segment Num.:** 1

**Persons Discussed:**

**Events Discussed:**

**Keywords:**

## Abstract

In part 2.4 of his series on business mathematics, Professor Richard Waterman explains statistics and confidence intervals. A confidence interval is a range of feasible values for an unknown population parameter. Waterman also discusses the central limit theorem and using the confidence interval to make a decision about data.