- 00:02
CHIRAG SHAH: Hi, here we're going to look at the problem of regression using R. This is where we have a set of independent variables--it could be one, it could be several, which are also called predictors--and using those variables or features,

- 00:23
CHIRAG SHAH [continued]: we are interested in seeing if we can predict the outcome or the response, which is the y. So we're interested in exploring the relationship between this x and y. And so the regression is essentially how the predictors--whether it's one variable or multiple--relate to the response or the outcome.

- 00:47
CHIRAG SHAH [continued]: And linear regression is where we're trying to do this using a straight line. So this is about finding the best line that describes the relationship between x and y. So let's go ahead and actually take a look at it, using R. And we're going to use a data set called Size.

- 01:14
CHIRAG SHAH [continued]: This is where you have data about people's height and weight. So we're going to first load it up here, using read.table. And in this case, the file is in the current directory, but if it's not,

- 01:35
CHIRAG SHAH [continued]: you will have to give a full path. Or you can also use the file.choose function, which will let you pick the file from your hard drive. And we'll say the header is true, and the data is separated using commas.
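
A minimal sketch of the loading step described here. The Size file itself is not reproduced in this transcript, so the snippet writes a small hypothetical stand-in to a temporary file; the column names `height` and `weight` and all values are assumptions made for illustration.

```r
# Hypothetical stand-in for the Size data (names and values are assumed).
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("height,weight", "62,120", "65,135", "68,150", "70,151", "73,170"),
  csv_path
)

# header = TRUE: the first line holds the column names.
# sep = ",": fields are separated by commas.
size <- read.table(csv_path, header = TRUE, sep = ",")

# To pick the file interactively instead, pass file.choose() as the path.
str(size)
```

With the real file, `csv_path` would be the file name if it sits in the working directory, or a full path otherwise.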

- 01:56
CHIRAG SHAH [continued]: And so we just loaded it up. And if you open it, this is what it looks like. It's got height and weight for 38 people. So we got 38 entries. Now, let's go ahead and plot this. So let's just make sure that we have ggplot2 loaded up.

- 02:19
CHIRAG SHAH [continued]: And now we're going to use the ggplot function to plot this data. On the x-axis, we'll have height. And then the y-axis is weight.

- 02:42
CHIRAG SHAH [continued]: And we want to create a scatterplot. So that's what we'll go with. And let's set some limits for the y-axis, just to make it a little bit more readable. Because people's weight would be somewhere in the 100 to 200 range,

- 03:05
CHIRAG SHAH [continued]: so we're just setting that limit from 100 to 200. You don't have to, but it just makes the display look a bit more readable. So just run that. And here's our output. So we see the scatterplot here, there is height here, weight here, and each point represents

- 03:27
CHIRAG SHAH [continued]: a person with the corresponding height and weight. So there are 38 points here. Now, what we are interested in here is the relationship between height and weight. So we want to see how height is related to weight. And for that we'll do a regression.
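
The scatterplot described here could be sketched roughly as follows. The exact code used in the video is not shown in the transcript, so this is one plausible reconstruction; the data frame is a hypothetical stand-in with assumed column names `height` and `weight`.

```r
library(ggplot2)

# Hypothetical stand-in for the Size data (names and values are assumed).
size <- data.frame(
  height = c(62, 65, 68, 70, 73),
  weight = c(120, 135, 150, 151, 170)
)

# Height on the x-axis, weight on the y-axis, one point per person.
# ylim(100, 200) restricts the y-axis to the 100 to 200 range.
p <- ggplot(size, aes(x = height, y = weight)) +
  geom_point() +
  ylim(100, 200)

print(p)
```

Note that `ylim()` drops any points outside the limits (with a warning), so it is worth checking that the range really covers the data.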

- 03:48
CHIRAG SHAH [continued]: The way to do regression is actually pretty simple. Actually, let's just go ahead and first visualize what it looks like. And so I'm going to just add another thing to the previous statement, and that's stat_smooth. And so this will create a statistical model

- 04:13
CHIRAG SHAH [continued]: using the lm method, which stands for linear model. And I run this. And let's look at the output here. And this is what it looks like. So it's the same as what we had previously done, the scatterplot.
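
As a rough sketch of this step, `stat_smooth(method = "lm")` can be added to the scatterplot to overlay a fitted straight line. Any other arguments used in the video are not stated in the transcript, so only `method` is set here; the data frame is again a hypothetical stand-in.

```r
library(ggplot2)

# Hypothetical stand-in for the Size data (names and values are assumed).
size <- data.frame(
  height = c(62, 65, 68, 70, 73),
  weight = c(120, 135, 150, 151, 170)
)

# Same scatterplot as before, plus a linear-model smoother.
# By default stat_smooth also draws a shaded confidence band around the line.
p <- ggplot(size, aes(x = height, y = weight)) +
  geom_point() +
  ylim(100, 200) +
  stat_smooth(method = "lm")

print(p)
```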

- 04:33
CHIRAG SHAH [continued]: And now, the only thing is, there is this added line, and this is our regression line. So this line represents the relationship between height and weight. And so now, if you know somebody's height, which is on the x-axis, you can go to this line

- 04:54
CHIRAG SHAH [continued]: and find the corresponding weight. So that's what it does. So now you're able to predict somebody's weight, having known their height. So let's see how we can do this. To really do this--this is a visualization, that's nice, but to really do that, we use the lm function, which, again, stands for linear model. And we say that we want to find weight as a function of height,

- 05:25
CHIRAG SHAH [continued]: using the data set that we have. And so this is what it gives. And so this is our real linear model. This is our regression. And it's expressed using this--this is actually a line equation. So this is the weight related to height.

- 05:46
CHIRAG SHAH [continued]: And this coefficient is also called the slope--so this is the slope of the line, and this is the intercept. So in other words, the full equation is weight equals negative 130.354 plus 4.113 times height.
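
A minimal sketch of fitting the model with `lm()`. The data frame here is a hypothetical stand-in (assumed column names and values), so the coefficients it produces will differ from the ones quoted in the video.

```r
# Hypothetical stand-in for the Size data (names and values are assumed).
size <- data.frame(
  height = c(62, 65, 68, 70, 73),
  weight = c(120, 135, 150, 151, 170)
)

# weight ~ height reads "weight as a function of height".
fit <- lm(weight ~ height, data = size)

# coef() returns the intercept and the slope (the coefficient on height).
coefs <- coef(fit)
print(coefs)
```

The two numbers map onto the line equation as weight = intercept + slope × height.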

- 06:14
CHIRAG SHAH [continued]: So this is a regression equation. It's not going to run just like that, but let's say we wanted to find out somebody's weight. So what we will do, we will put some height here. So let's say we put 70 as a height.

- 06:38
CHIRAG SHAH [continued]: So if we knew somebody's height being 70, the question is, what is their weight? And that's why we'd run this. And so what is that weight? That's 157.556. So that's our predicted value. And you can see that that's based on this line--so if you go here, 70 is the height,

- 07:01
CHIRAG SHAH [continued]: and the corresponding weight is 157.556. Now, let's look at the real data and see if somebody has weight equals--sorry, height equals 70. And we have one person with height equals 70, and that person's actual weight is 151.
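
The arithmetic behind this prediction can be checked directly. The coefficient magnitudes come from the video; the negative sign on the intercept is inferred, since only a negative intercept reproduces the stated prediction of 157.556.

```r
# Coefficients quoted in the video; the intercept's sign is inferred
# from the stated prediction (an assumption, not shown on screen here).
intercept <- -130.354
slope <- 4.113

# Plug a height of 70 into the line equation.
height <- 70
predicted <- intercept + slope * height
print(predicted)  # 157.556, matching the value in the video

# Compare with the observed weight of the person whose height is 70.
actual <- 151
residual <- actual - predicted
print(residual)   # -6.556: the model overestimates this person's weight
```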

- 07:21
CHIRAG SHAH [continued]: And we predicted 157.55. So, of course, it's not right, but it's our estimation, and that's as close as we could get, using this model. So that's the purpose of doing this linear modeling or regression, is to come up with such a model

- 07:46
CHIRAG SHAH [continued]: whether you visualize it here, or you have this kind of a line equation, that allows you to take some value of the independent variable, or the predictor, plug it into the equation, and come up with the outcome value or the response value.

- 08:07
CHIRAG SHAH [continued]: And of course, you can see, this line doesn't cover all the points. In fact it doesn't cover most of the points, but it is the best possible line we could get to cover these points as closely as possible. So of course, a line is not always the best option,

- 08:29
CHIRAG SHAH [continued]: and so if you wanted a slightly better model than the linear model, you can also let there be some kind of a curve. And we're not going to go into details for this, but this is just for your information, that you can have something like this. And now you have this kind of curvature,

- 08:51
CHIRAG SHAH [continued]: and that captures the relationship between height and weight much better. And so we're not going to worry about that for now. What we need to understand is linear regression, and it's very easy to do in R. All you have to do

- 09:12
CHIRAG SHAH [continued]: is use the lm function. And this is how you express the relationship. So this is our dependent variable, or the response. And this is the independent variable, or the predictor. And this is our full data set. So the result is this kind of line equation.

- 09:34
CHIRAG SHAH [continued]: And once you have the line equation, as we saw, you can plug in any unknown value for height, or the independent variable, and get the prediction or estimate for the dependent variable. So that's regression using R.
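
Rather than typing the equation out by hand, the fitted model can produce predictions itself through `predict()`. This is a sketch on a hypothetical stand-in data frame (assumed names and values), not the video's own code.

```r
# Hypothetical stand-in for the Size data (names and values are assumed).
size <- data.frame(
  height = c(62, 65, 68, 70, 73),
  weight = c(120, 135, 150, 151, 170)
)

fit <- lm(weight ~ height, data = size)

# newdata must use the same column name as the predictor in the formula.
new_people <- data.frame(height = c(64, 70))
predicted <- predict(fit, newdata = new_people)
print(predicted)

# The same values, computed by hand from the line equation.
by_hand <- coef(fit)[["(Intercept)"]] + coef(fit)[["height"]] * new_people$height
```

Using `predict()` avoids rounding the coefficients and extends unchanged to models with several predictors.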

- 09:54
CHIRAG SHAH [continued]: [PIANO MUSIC]

### Video Info

**Series Name:** Machine Learning for Data Science

**Episode:** 3

**Publisher:** Chirag Shah

**Publication Year:** 2018

**Video Type:** Tutorial

**Methods:** Linear regression, Regression analysis, Machine learning

**Keywords:** linear models; pattern analysis; prediction; programming and scripting languages; regression analysis; Scatterplot

### Segment Info

**Segment Num.:** 1

**Persons Discussed:**

**Events Discussed:**

**Keywords:**

## Abstract

Chirag Shah, PhD, shows how to perform basic linear regression in R using the "lm()" function, and how to create a scatterplot using "ggplot2".