- 00:01
[FUNDAMENTALS OF QUANTITATIVE MODELING. Richard Waterman. Module 4: Regression models. Wharton ONLINE.]

- 00:01
RICHARD WATERMAN: Welcome to Fundamentals of Quantitative Modeling-- Regression Models. In this module, we're going to talk about a regression model. We'll define what that is. We'll discuss questions that a regression model is able to answer for us. We're going to talk about correlation and linear association, because these regression models are

- 00:21
RICHARD WATERMAN [continued]: examples of linear models. We're going to discuss the mechanics of fitting a line to data. We are going to spend some time interpreting the output from these regression models. And we're going to talk about prediction, and, in particular, prediction intervals from a regression model. We will briefly see the topic known as multiple regression.

- 00:45
RICHARD WATERMAN [continued]: It allows us to create potentially more complicated and realistic models of a business process. We'll end up by talking about logistic regression, which is an appropriate form of regression to use when the outcome variable is a categorical variable, typically a 0, 1 outcome, a Bernoulli random variable.

- 01:08
RICHARD WATERMAN [continued]: So what is a regression model? Well, a simple regression model uses a single predictive variable that we often give the letter x to, to estimate the mean, or the average, of an outcome variable y as some function of x. And so the example that I have here

- 01:29
RICHARD WATERMAN [continued]: plots the price against the weight of a set of diamonds. Each point in the plot represents a diamond. The x-coordinate is the weight of the diamond and the y-coordinate is the price of the diamond. So there's a single predictor. That's why we say a simple regression.

- 01:50
RICHARD WATERMAN [continued]: And what a regression model will do is say, at any given value of x, what do you expect y, the price, to be? So that's the idea of a regression model. It's a model for the mean of y as a function of x. And very frequently we will use a linear model

- 02:11
RICHARD WATERMAN [continued]: to capture the relationship. So continuing with the diamonds example-- the predictive variable is the diamond's weight, and the outcome variable is the price of the diamond. Now, we can see, just by looking at the plot, that heavier stones-- bigger diamonds--

- 02:32
RICHARD WATERMAN [continued]: tend to cost more money. We would often term that positive association. But we can go beyond that simple statement by using a regression model that will formalize the idea of the association and more precisely define what value we

- 02:54
RICHARD WATERMAN [continued]: expect the price to be at for any given weight of a diamond. So we're going to formalize how the expected price varies with weight. And as I just said, one of our most frequently used ways of capturing that relationship is with a straight line. And we'll then call it a linear regression. And the formula that you can see at the bottom of this slide

- 03:17
RICHARD WATERMAN [continued]: is how I would write a regression model. And it says that the expected value-- that's the average of y-- and then the straight line there means given. We articulate that as given. The expected value of y given x-- the expected price of a diamond given its weight--

- 03:39
RICHARD WATERMAN [continued]: is then equal to some function of x. And the most straightforward function that we might choose to use is a linear function. And we write the linear function, in this instance, as b0 plus b1 times x. Sometimes you will have seen the equation of a straight line written as y equals mx plus b. This is still a straight line, but we

- 04:00
RICHARD WATERMAN [continued]: have a slightly different notation typically in the regression model. And there's a reason for that. And the reason is that there's a form of regression called multiple regression, which has many x's in. And then we can use a notation that incorporates b0, b1, b2, b3, et cetera. So we subscript the coefficients. b0 is still the intercept and b1 is still the slope.
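
The notation described above can be written out as follows. This is a sketch of the slide's formula as it is read aloud, not a copy of the slide itself; the symbol k for the number of predictors in the multiple regression case is an assumption introduced here for illustration.

```latex
% Simple linear regression: the mean of y, given x, is a straight line in x.
E(y \mid x) = b_0 + b_1 x

% Multiple regression with k predictors (k is a hypothetical symbol, not from the lecture).
E(y \mid x_1, \ldots, x_k) = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k
```

In the simple case, b_0 plays the role of b and b_1 the role of m in the familiar y = mx + b form.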

- 04:24
RICHARD WATERMAN [continued]: So a regression model is relating the average of y to a particular value of x. And it's not at all uncommon to assert that that association is at least approximately linear. And in that case, we're doing a linear regression. On this slide, I have overlaid the straight line

- 04:45
RICHARD WATERMAN [continued]: model that is calculated from the underlying data. I haven't told you how this line is calculated yet-- I will in a few minutes-- but there's the regression line. And the slope and intercept in this particular instance are presented in the formula below. The expected value of the price of a diamond, given its weight,

- 05:08
RICHARD WATERMAN [continued]: is equal to -260-- that's the intercept-- plus 3,721 times the weight, where the weight is measured in carats. So that's what a linear regression is going to do for you. It's going to put a line through the data, basically. And once you've got a line going through the data, there are a number of useful things

- 05:29
RICHARD WATERMAN [continued]: that you're going to be able to do with that. So there's a quantitative model that has been derived from underlying data. So we let the data talk to us in the sense that the data chose the best fitting line. Now, there's a very commonly used number to describe the strength of what we term linear association.
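
The fitted line quoted above (intercept -260, slope 3,721 per carat) can be turned into a small prediction function. The following is a minimal sketch, assuming only those two coefficients; the function name `expected_price` and the example weight of 0.25 carats are hypothetical and do not come from the lecture.

```python
# Coefficients quoted in the transcript for the diamonds data.
INTERCEPT = -260.0  # b0
SLOPE = 3721.0      # b1: change in expected price per one-carat increase in weight


def expected_price(weight_carats: float) -> float:
    """Return E(price | weight) = b0 + b1 * weight under the fitted line."""
    return INTERCEPT + SLOPE * weight_carats


if __name__ == "__main__":
    # 0.25 carats is an illustrative weight, not a value taken from the source.
    print(expected_price(0.25))
```

The function returns the model's estimate of the average price at a given weight, not the price of any individual diamond.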

- 05:51
RICHARD WATERMAN [continued]: So, essentially, how close are the points to a line? And the way that we capture that is through a concept called correlation. So correlation is a measure of the strength of a linear association. And correlation is typically given a letter. We call r the sample correlation. And it's a fact that the correlation will always

- 06:12
RICHARD WATERMAN [continued]: lie between minus 1 and plus 1. If you have a negative value to the correlation, then you have negative association. That would be a line from the top left to the bottom right. If you had positive correlation-- you got positive association-- that would be a line from the bottom left to the top right. But if you have zero correlation, what that means

- 06:33
RICHARD WATERMAN [continued]: is that there's no linear association. It doesn't actually mean there's no association between the two variables, just that there's no linear association between the two variables. Now, how would you calculate the correlation in practice? The answer is with a computer program or a spreadsheet. So we won't worry about the details of the actual calculation. It will happen.
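
As an illustration of the "computer program" route, here is a minimal sketch of the sample correlation r, assuming the standard Pearson formula. The function name `sample_correlation` and the small data sets are made up for illustration and are not from the lecture. The last example shows the point made above: y is completely determined by x, yet the correlation is zero because the relationship is not linear.

```python
import math


def sample_correlation(xs, ys):
    """Pearson sample correlation r between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    xs = [-2, -1, 0, 1, 2]
    # Points exactly on an upward-sloping line: r is +1 (up to rounding).
    print(sample_correlation(xs, [3 * x + 1 for x in xs]))
    # Points exactly on a downward-sloping line: r is -1 (up to rounding).
    print(sample_correlation(xs, [-2 * x for x in xs]))
    # y = x squared: a perfect but non-linear relationship, with r equal to 0.
    print(sample_correlation(xs, [x ** 2 for x in xs]))
```

In practice a library or spreadsheet function would normally be used instead of hand-written code.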

- 06:54
RICHARD WATERMAN [continued]: And I have calculated the correlation for the diamonds data set. And it turns out to be 0.989. That's a correlation which is an incredibly strong correlation as far as correlations go. And it's just asserting the fact that the points really do lie very close to a straight line. So a linear model is quite reasonable in this particular instance.

### Video Info

**Series Name:** Fundamentals of Quantitative Modeling

**Episode:** 1

**Publisher:** Wharton

**Publication Year:** 2014

**Video Type:** Tutorial

**Methods:** Regression analysis, Correlation, Multiple regression

**Keywords:** mathematical formulas

### Segment Info

**Segment Num.:** 1

**Persons Discussed:**

**Events Discussed:**

**Keywords:**

## Abstract

Richard Waterman explains what a regression model is, defines correlation and linear association, and discusses the types of questions a regression model can answer.