- 00:00
[MUSIC PLAYING] I'm now going to start off by summarizing the material that I've covered in today's module. There's quite a bit of it. But going back to the beginning, we thought about data.

- 00:20
And we decided that we would like to summarize data sets. We did that both graphically and numerically. Graphically, we looked at the box plot and the histogram. Numerically, we talked about measures of centrality, which were the mean and median, and measures of spread, which were the variance and the standard deviation. We could also think about spread as the distance between the quartiles

- 00:42
as illustrated through the box plots. I then went on to introduce the normal distribution. One of the things about the normal distribution, otherwise known as the bell curve, is that it is characterized by two key one-number summaries, the mean and the standard deviation. With the normal distribution in place,

- 01:04
I talked about the empirical rule. The empirical rule can be applied when you have an approximately normal distribution for the raw data. We did some empirical rule calculations, some probability calculations. The empirical rule lets you make useful statements. For example, if the data is approximately normally distributed, then 95% of the data

- 01:26
should lie within two standard deviations of the mean, that sort of thing. I also talked about value at risk. And that calculation was one way to measure the risk of a position that you might have in a stock. But it was simply an application of the empirical rule.
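The empirical rule statement and the value-at-risk application can be sketched numerically. All of the figures below (the mean daily return, its standard deviation, and the position size) are hypothetical, chosen only to illustrate the arithmetic:

```python
# Empirical-rule sketch: for approximately normal data, about 68% of
# observations fall within 1 standard deviation of the mean, 95% within 2,
# and 99.7% within 3. The return figures here are made up for illustration.
mean_return = 0.0005   # hypothetical mean daily return of a stock
sd_return = 0.012      # hypothetical standard deviation of daily returns

# 95% interval: mean plus or minus 2 standard deviations
low = mean_return - 2 * sd_return
high = mean_return + 2 * sd_return

# A simple value-at-risk style statement: only about 2.5% of days should
# see a return below the lower bound (one tail of the 5% outside the interval).
position = 1_000_000   # hypothetical $1M position
var_95 = -low * position
print(f"95% of daily returns should lie in [{low:.4f}, {high:.4f}]")
print(f"Approximate 2.5%-tail value at risk: ${var_95:,.0f}")
```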

- 01:49
The next set of ideas were to do with making decisions in the presence of uncertainty. We wanted to create a confidence interval. But in order to create a confidence interval and understand the ideas behind it, there were three prerequisite ideas. The first one was that of a simple random sample. And that was a way of collecting data that ensured that the sample that you had

- 02:10
was representative of the population, so that what you learned about the sample could be legitimately extrapolated to the population. We then talked about the standard error of the sample mean. And that required a thought experiment, the experiment being: what if I were to take a lot of samples and, for each sample, calculated an x bar?

- 02:33
What's the standard deviation, or the standard error, of those x bars? And remember, the formula was very neat, very tidy: sigma over the square root of n. The final idea before we could make our decision was the central limit theorem. And that's an incredibly powerful theorem. And what it says to you, essentially, is that for sufficient sample sizes,
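The thought experiment behind the standard error can be simulated directly. This is a sketch rather than lecture code; the population mean, sigma, and sample size below are made-up values:

```python
import math
import random

random.seed(0)

# Thought experiment: draw many samples from a population with known sigma,
# compute x bar for each, and compare the spread of the x bars with the
# formula sigma / sqrt(n). The population here is simulated.
sigma = 10.0   # population standard deviation (assumed known)
mu = 50.0      # population mean
n = 100        # sample size

xbars = []
for _ in range(5000):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbars.append(sum(sample) / n)

# Empirical standard deviation of the x bars
mean_xbar = sum(xbars) / len(xbars)
se_empirical = math.sqrt(sum((x - mean_xbar) ** 2 for x in xbars) / len(xbars))
se_formula = sigma / math.sqrt(n)   # sigma over the square root of n

print(f"formula SE:   {se_formula:.3f}")
print(f"simulated SE: {se_empirical:.3f}")
```

The simulated spread of the x bars should land close to the formula value of 1.0.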

- 02:55
x bar is approximately normally distributed. The consequence is that because x bar is approximately normally distributed, you can apply the empirical rule to the x bar distribution itself. But you have to know the standard deviation of the object you're looking at. If you're looking at x bars, then the standard deviation
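A quick way to see the central limit theorem at work is to average draws from a clearly non-normal population and check that the empirical rule still holds for the x bars. The exponential population and the sizes below are illustrative assumptions, not values from the lecture:

```python
import math
import random

random.seed(1)

# CLT sketch: even for a skewed population (exponential), the sample mean
# x bar is approximately normal for large enough n, so the empirical rule
# applies to the x bars themselves.
rate = 1.0            # exponential rate; for this distribution mu = sigma = 1/rate
mu = 1.0 / rate
sigma = 1.0 / rate
n = 64
se = sigma / math.sqrt(n)   # standard error of x bar

inside = 0
trials = 4000
for _ in range(trials):
    xbar = sum(random.expovariate(rate) for _ in range(n)) / n
    if abs(xbar - mu) <= 2 * se:
        inside += 1

coverage = inside / trials
print(f"Fraction of x bars within 2 SEs of mu: {coverage:.3f}")
```

Despite the skewed raw data, the fraction of x bars within two standard errors of mu comes out close to the 95% the empirical rule predicts.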

- 03:17
is measured through the standard error, sigma over the square root of n. So, bottom line, we can apply the empirical rule to the x bars. By doing that, we were able to create confidence intervals. And the particular confidence interval we looked at was the confidence interval for the population mean. The key idea behind that was the statement

- 03:38
that is driven by the empirical rule: if 95% of the time x bar is within two standard errors of mu, then we can say that 95% of the time mu is within two standard errors of x bar. Based on that, we could create our confidence interval for the population mean. We did that in the example when we

- 03:59
were trying to verify whether or not management's claim that the payback rate of the loans was 92% was reasonable. We created a confidence interval based on the data, and we looked to see whether or not management's claim was sitting inside that confidence interval. And in our particular example it wasn't. So that gave us evidence to doubt management's claim.
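The loan example can be sketched in the same spirit. The sample size and observed payback rate below are hypothetical stand-ins, not the actual numbers from the lecture; only the 92% claim comes from the discussion above:

```python
import math

# Confidence-interval sketch mirroring the loan payback example.
# Suppose a sample of n loans shows a payback rate of p_hat (made-up values).
n = 400
p_hat = 0.88                 # hypothetical observed payback rate
claimed = 0.92               # management's claimed rate

# Standard error of the sample proportion, then a 95% interval:
# p_hat plus or minus 2 standard errors.
se = math.sqrt(p_hat * (1 - p_hat) / n)
low, high = p_hat - 2 * se, p_hat + 2 * se

print(f"95% CI: [{low:.3f}, {high:.3f}]")
if not (low <= claimed <= high):
    print("Claim lies outside the interval: evidence to doubt it.")
```

With these illustrative numbers the claimed 92% falls above the upper limit, which is the same kind of evidence against the claim described in the lecture.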

- 04:22
The final set of ideas were to do with relationships between two variables, but now within this realm of uncertainty, this statistical realm. And the main idea that we used to measure the association between two variables is called correlation. It is a measure of linear association, one

- 04:44
of those reasons why straight lines are such important models to understand. So we talked about correlation. Given that we believed there was an approximate linear relationship between y and x, it's reasonable to put a line of best fit through the data. That takes us to the idea of regression.

- 05:05
The best fit line was defined as the line that minimizes the sum of the squares of the vertical distances from the points to the line. And once we've got that best fit line, we are able to go to work on the formula and do useful things with it, for example predictive modeling, optimization, and so on.
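The least-squares line and the correlation measure can both be computed in closed form. The small (x, y) data set here is made up purely for illustration:

```python
# Least-squares sketch: the best fit line minimizes the sum of squared
# vertical distances from the points to the line. The data are invented.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n

# slope b1 = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
sxx = sum((x - xbar) ** 2 for x in xs)
b1 = sxy / sxx
b0 = ybar - b1 * xbar   # the best fit line passes through (xbar, ybar)

# Correlation: a scale-free measure of linear association between -1 and 1
syy = sum((y - ybar) ** 2 for y in ys)
r = sxy / (sxx ** 0.5 * syy ** 0.5)

print(f"y hat = {b0:.2f} + {b1:.2f} x, correlation r = {r:.3f}")
```

Once the fitted intercept and slope are in hand, prediction is just plugging a new x into the formula, which is the "go to work on the formula" step described above.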

- 05:27
And that's a summary of what happened in module 10.

### Video Info

**Series Name:** Business Statistics

**Episode:** 16

**Publisher:** Wharton

**Publication Year:** 2014

**Video Type:** Tutorial

**Methods:** Descriptive statistics, Confidence intervals, Correlation

**Keywords:** value at risk

### Segment Info

**Segment Num.:** 1

**Persons Discussed:**

**Events Discussed:**

**Keywords:**

## Abstract

Richard Waterman concludes his module on business statistics by summarizing what was discussed throughout the series.