• 00:01

INSTRUCTOR: Going beyond these simple regressions. So just revisiting that term, if someone says, I've done a simple regression, what they mean is they've run a regression model with a single input. Going beyond simple regression, we get to multiple regression. Now the fact is that the world can be a pretty complicated place, so let's go back and think

• 00:22

INSTRUCTOR [continued]: about our car example. So in that one, we were interested in looking at the association between the weight of a car and its fuel economy. But the weight of the car isn't the only driver, so to speak, of the fuel economy of a car. There are other features of the car that are important, as well.

• 00:43

INSTRUCTOR [continued]: One of those other features would be the size of the engine. The bigger the engine, then the less fuel efficient the vehicle is, the more gallons it needs to go into the tank. And when we start thinking about additional predictive variables, beyond a single one, then what we're talking about is a multiple regression. So the idea is that the world's a pretty complicated place,

• 01:06

INSTRUCTOR [continued]: we might need a somewhat complicated model to adequately capture how the world or the business process is working. And multiple regression gives us a chance to do that. And so in the fuel economy data set, we might be interested in adding in the horsepower to the model as an additional predictive variable. Talking about the diamonds data set, well, we could use the weight of the diamond

• 01:28

INSTRUCTOR [continued]: as a predictor of price. But people who are into diamonds know that there are four C's associated with diamonds. We have talked about the weight, which is carats, but there's also color, cut, and clarity. And those other features, if we incorporated them, might enhance the quality of our regression model.

• 01:49

INSTRUCTOR [continued]: And so in the diamonds data set as a second step, I might be interested in introducing color to the model. When I add in additional predictive variables then what I'm doing is running a multiple regression. So formulaically to show you what a multiple regression would look like, if I had two variables-- let's call them X1 and X2 in general-- then with two variables, our equation for the regression

• 02:14

INSTRUCTOR [continued]: is expected value of Y-- so I'm discussing the formula at the bottom of the slide-- the expected value of Y, that means the average of Y, the mean of Y, is now a function of two variables, X1 and X2. So the straight line there between the Y and the X1, we articulate that as given. The expected value of Y given X1 and X2 is equal to,

• 02:36

INSTRUCTOR [continued]: we start off with our straight line formulation, b0 plus b1X1, but now with an additional variable, we just throw in plus b2X2. So that's our formulation of the multiple regression. In our spreadsheet or data set, we'd have one column that contained Y. We'd have a second column that contained X1, a third column that contained X2,

• 02:57

INSTRUCTOR [continued]: our data would be in that format, and then we would run our multiple regression, and what the method of least squares would produce for us are the estimates b0, b1, and b2. So that will give us our multiple regression equation. So let's now have a look at that multiple regression model in a little bit more detail. So I've added weight and horsepower

• 03:18

INSTRUCTOR [continued]: as predictors of fuel economy. When I do that, I end up with a new model, as compared to the simple regression. I've got the expected value of fuel economy as measured by gallons per 1,000 miles in the city. So the average again; regression gives us a model for the average of Y. So we've got the expected value of fuel economy now given the weight

• 03:39

INSTRUCTOR [continued]: and horsepower of the vehicle. And using the method of least squares, we get estimates for the coefficients as 11.68 for the intercept plus 0.0089 times weight plus 0.0884 times horsepower. You can use this equation to do prediction. You come to me with a vehicle that weighs 2,000 pounds

• 04:02

INSTRUCTOR [continued]: and has 300 horsepower. I can take those values, plug them into the regression equation, and predict the fuel economy of the vehicle. Note, though, that once we increase the dimension of the problem, we've gone from simple regression to multiple regression, if we want to visualize what's going on, which can be very useful, we're going to need three dimensions to do that,

• 04:23

INSTRUCTOR [continued]: to look at the raw data in all its glory, so to speak. So we have three variables to deal with. We have weight, we have horsepower, and we have fuel economy. That means we need three dimensions to have a look at the data. Hence, the picture that you can see on the bottom right-hand side of this slide is a three-dimensional picture. Each point represents a car.
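
As a rough illustration of the prediction step described above, the following Python sketch plugs the 2,000-pound, 300-horsepower vehicle into the quoted equation. Only the three coefficients come from the lecture; the function and constant names are made up for this example. The lecture does not say whether such a vehicle lies within the range of the data set, so the result is best read as arithmetic on the equation rather than a vetted forecast.

```python
# Coefficients as quoted in the lecture; the response is fuel economy
# measured in gallons per 1,000 miles in the city.
INTERCEPT = 11.68
B_WEIGHT = 0.0089       # per pound of vehicle weight
B_HORSEPOWER = 0.0884   # per unit of horsepower


def predict_fuel_economy(weight_lb, horsepower):
    """Expected gallons per 1,000 city miles, given weight and horsepower.

    Hypothetical helper: it simply evaluates b0 + b1*weight + b2*horsepower.
    """
    return INTERCEPT + B_WEIGHT * weight_lb + B_HORSEPOWER * horsepower


prediction = predict_fuel_economy(2000, 300)
print(round(prediction, 2))  # 11.68 + 17.8 + 26.52, about 56 gallons per 1,000 miles
```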

• 04:46

INSTRUCTOR [continued]: And it has coordinates for weight, horsepower, and fuel economy. And what the multiple regression model is doing is no longer fitting just a line through here-- doesn't make sense if we're in high dimensions-- it puts the best fitting plane through the data. So the analog of the line in the simple regression is a plane in the multiple regression.

• 05:09

INSTRUCTOR [continued]: It still fits through the method of least squares. This is the plane that best fits the data in the sense that it minimizes the sum of the squares of the vertical distances from the points to the plane now. So there's our least squares plane. Now I said that we can use this plane for doing forecasting, but we still have our one-number summaries around.
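
To make the least squares plane concrete, here is a minimal Python sketch that fits b0, b1, and b2 to a three-column data set (Y, X1, X2) by solving the normal equations. The six data rows are invented purely for illustration and are not the lecture's car data, and `fit_plane` and `sum_squared_residuals` are hypothetical helper names. In practice one would normally use a statistics package rather than hand-rolled elimination.

```python
def fit_plane(rows):
    """Least squares estimates (b0, b1, b2) for y = b0 + b1*x1 + b2*x2.

    rows is a list of (y, x1, x2) tuples, mirroring the three-column layout.
    Solves the normal equations (X'X) b = X'y by Gaussian elimination.
    """
    design = [(1.0, x1, x2) for _, x1, x2 in rows]
    ys = [y for y, _, _ in rows]
    # 3x3 matrix X'X with the right-hand side X'y appended as a 4th column.
    a = [
        [sum(d[i] * d[j] for d in design) for j in range(3)]
        + [sum(d[i] * y for d, y in zip(design, ys))]
        for i in range(3)
    ]
    # Forward elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= factor * a[col][c]
    # Back substitution.
    b = [0.0, 0.0, 0.0]
    for i in reversed(range(3)):
        tail = sum(a[i][j] * b[j] for j in range(i + 1, 3))
        b[i] = (a[i][3] - tail) / a[i][i]
    return tuple(b)


def sum_squared_residuals(rows, b):
    """Sum of squared vertical distances from the points to the plane."""
    return sum((y - (b[0] + b[1] * x1 + b[2] * x2)) ** 2 for y, x1, x2 in rows)


# Made-up rows of (y, x1, x2); not the lecture's data.
rows = [
    (40.1, 2.0, 1.0),
    (48.7, 2.5, 1.5),
    (45.2, 3.0, 1.0),
    (58.9, 3.5, 2.0),
    (55.0, 4.0, 1.4),
    (67.3, 4.2, 2.5),
]
b = fit_plane(rows)
print("b0, b1, b2 =", b)
print("sum of squared residuals =", sum_squared_residuals(rows, b))
```

Nudging any one of the three coefficients away from the fitted values makes the sum of squared residuals larger, which is the sense in which this plane is the best fitting one.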

• 05:31

INSTRUCTOR [continued]: Those one-number summaries of the regression were R squared and RMSE. If we calculate R squared for this multiple regression, it comes out to be 84%. In the simple regression model, it was 76%. So our R squared has increased. So we've explained more variation by adding in this additional variable, and we've also reduced the value of root mean squared error.
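
The lecture does not spell out how the two summaries are computed. A common convention, assumed here, is R squared = 1 - SSE/SST, and RMSE = the square root of SSE divided by the residual degrees of freedom (the number of observations minus three for a model with an intercept and two predictors). The sketch below applies that convention to made-up actual and fitted values; the function name and the numbers are hypothetical.

```python
import math


def r_squared_and_rmse(actual, fitted, n_coefficients=3):
    """Return (R squared, RMSE) under the convention assumed above.

    n_coefficients counts the intercept plus the slopes (3 for two predictors).
    """
    mean_y = sum(actual) / len(actual)
    sse = sum((y - f) ** 2 for y, f in zip(actual, fitted))  # unexplained variation
    sst = sum((y - mean_y) ** 2 for y in actual)             # total variation
    r2 = 1 - sse / sst
    rmse = math.sqrt(sse / (len(actual) - n_coefficients))
    return r2, rmse


# Made-up values for illustration only.
actual = [10.0, 12.0, 14.0, 16.0, 18.0]
fitted = [11.0, 11.0, 14.0, 17.0, 17.0]
r2, rmse = r_squared_and_rmse(actual, fitted)
print(f"R squared = {r2:.2f}, RMSE = {rmse:.3f}")  # R squared = 0.90, RMSE = 1.414
```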

• 05:55

INSTRUCTOR [continued]: Root mean squared error is now only 3.45, so if we wanted to create an approximate 95% prediction interval for the fuel economy of a vehicle, as long as its weight and horsepower were in the range described within this data set-- so we're interpolating, not extrapolating outside the range-- so as long as we're interpolating,

• 06:16

INSTRUCTOR [continued]: we can use our 95% prediction interval rule of thumb again and say up to the plane plus or minus twice the root mean squared error. So this model, this regression model, will give us 95% prediction intervals of a width of about plus or minus 7, twice 3.45. So that's the precision with which we can predict

• 06:38

INSTRUCTOR [continued]: based on the current model. So through the prediction interval, we get a sense of the uncertainty of our forecast. So root mean squared error is really a critical summary of these regression models. So just summarizing this slide, the multiple regression allows us to estimate the least squares plane.
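
The rule of thumb for the approximate 95% prediction interval, forecast plus or minus twice the root mean squared error, can be sketched in Python as follows. The RMSE of 3.45 is the value quoted in the lecture. The forecast of 56.0 is what the quoted equation gives for a 2,000-pound, 300-horsepower vehicle and is used here only as an example input; the function name is hypothetical, and the rule is only meant for vehicles within the range of the data.

```python
RMSE = 3.45  # root mean squared error quoted for the two-predictor model


def approx_95_prediction_interval(forecast, rmse=RMSE):
    """Rule-of-thumb interval: forecast plus or minus twice the RMSE."""
    half_width = 2 * rmse
    return forecast - half_width, forecast + half_width


low, high = approx_95_prediction_interval(56.0)
print(f"{low:.1f} to {high:.1f} gallons per 1,000 miles")  # 49.1 to 62.9
```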

• 07:00

INSTRUCTOR [continued]: Once we've got this multiple regression equation, we can use it for prediction, so long as we have a root mean squared error estimate, which we do, and we're working within the range of the data, we can put the two things together-- the forecast and the root mean squared error-- to come up with a 95% prediction interval. So there's a brief look at multiple regression.

• 07:21

INSTRUCTOR [continued]: It's the technique that one would use to make more realistic quantitative models of business processes.

Video Info

Series Name: Fundamentals of Quantitative Modeling

Episode: 6

Publisher: Wharton

Publication Year: 2014

Video Type: Tutorial

Methods: Multiple regression

Keywords: mathematical concepts

Segment Info

Segment Num.: 1

Persons Discussed:

Events Discussed:

Keywords:

Abstract

Richard Waterman explains how to use a multiple regression model to make predictions based on more than one factor. With two predictors, the model fits a plane in three dimensions rather than a line in two.