  • 00:06

    DR. CHIRAG SHAH: Hello. In this part, we're going to talk about linear regression. So we've seen before that linear regression is a process by which you can connect two quantities, more specifically, two continuous quantities. So the example is house size in square feet.

  • 00:34

    DR. CHIRAG SHAH [continued]: So square feet is a continuous quantity. And then you have the dollar value of a house. And so if you have different data points, it kind of makes sense. As you get bigger in terms of the size of the house,

  • 00:54

    DR. CHIRAG SHAH [continued]: the cost goes up. And so what we do, we try to learn something from this. So this is our data, and we're trying to extract some pattern. In human terms, and intuitively speaking, as we just said, as the house size goes up, the price rises. But in what exact way?

  • 01:16

    DR. CHIRAG SHAH [continued]: And so we want to derive that relation in a more formal way. Now visually, this may be something like this that we find. So this is a line that is trying to connect as many of these dots as possible. Of course, it's not going to actually connect them. But the goal here is to find a line that's

  • 01:37

    DR. CHIRAG SHAH [continued]: as close to all the collective dots as possible. And so for the time being, let's just say this is the final result. We've gotten this line. And so this is our regression line.

  • 02:00

    DR. CHIRAG SHAH [continued]: So this is the outcome of learning the relationship between house size and its value. And that outcome looks like a line. It's a regression line. Now keep in mind that we try to simplify things here, and we're just doing this in a two-dimensional space, but this can be extended to any number of dimensions. And in whatever number of dimensions,

  • 02:22

    DR. CHIRAG SHAH [continued]: you can still get the line or, more generally speaking, the hyperplane. But we're not going to go there. And we're going to only look at this for the time being, because it's simpler. So here, this is our x. And this is our y. This is a notation that I hope we'll continue

  • 02:45

    DR. CHIRAG SHAH [continued]: as we see other problems, also. So x is your input variable or predictor. And y is your outcome variable or the outcome. So for a given house size, what is the value? Somehow we've learned this line or figured this line out.

  • 03:05

    DR. CHIRAG SHAH [continued]: And once you've done that, what it allows us to do is, for any value of x, predict y. Now, you may have seen this kind of thing before--that is a line. And if you remember, the line equation is y equals mx plus c.

  • 03:30

    DR. CHIRAG SHAH [continued]: So what are those things? This is your slope. So m is a slope associated with x. And again, in general terms, you could have x1, x2, x3, or there could be many dimensions. And so you'll have corresponding m1, m2, m3 slopes corresponding

  • 03:53

    DR. CHIRAG SHAH [continued]: to that. But for the time being, we're just keeping it to two-dimensional space. And so we have one x and one y. And so this m is a slope associated with x. This is your m. And this is your intercept or constant.

  • 04:17

    DR. CHIRAG SHAH [continued]: So in this case, this is your intercept. So in other words, to come up with a value of y, which is the outcome variable, you multiply the value of x, whatever it is here, with the slope, and then add the constant, or the intercept, to it.

  • 04:39

    DR. CHIRAG SHAH [continued]: And that gives you the corresponding value of y. So y equals mx plus c is our line equation. And in the future, when we try to start generalizing this, we might change that notation a little bit to make it possible to generalize to other dimensions. But for now, I'm going to just use this, because we are going to stick to the two-dimensional.
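
As a small illustration of the line equation just described, here is a minimal Python sketch. The function name `predict` and the sample values of m and c are assumptions made for this example; they do not come from the lecture.

```python
def predict(x, m, c):
    """Return y for the line y = m*x + c."""
    return m * x + c


# Hypothetical slope and intercept, chosen only for illustration.
m = 2.0
c = 1.0

# Multiply x by the slope, then add the intercept.
y = predict(3.0, m, c)
print(y)  # 7.0
```

Any other m and c would describe a different line; the rest of the discussion is about how to choose them.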

  • 04:59

    DR. CHIRAG SHAH [continued]: So this, coming up with a line, is a regression problem. How do we do that? So we'll see a couple of methods here. But for now, I'm going to work on this manually, just so that we have an idea of how regression really works.

  • 05:20

    DR. CHIRAG SHAH [continued]: So let's take an example. And this is hypothetical data. This is not real. And I want to keep it simple. So imagine, here what we have--now, this is an example of supervised learning, which means we have x and corresponding y so that we can learn the relationship,

  • 05:40

    DR. CHIRAG SHAH [continued]: so that in the future, when we see an x with a new kind of value, we can figure out what the value of y would be. But while we are learning, also called training, we have the x and the corresponding y value. So let's say we have some data set with some values.

  • 06:18

    DR. CHIRAG SHAH [continued]: We'll just keep this much. And so if we were to plot this, x and y.

  • 06:41

    DR. CHIRAG SHAH [continued]: So 1 and 3--use a different color--here, 2 and 4, 3 and 5, 4 and 8.

  • 07:07

    DR. CHIRAG SHAH [continued]: So this is our plot. It's only four dots. And as you can see, it's not a very difficult problem to solve. But still, it allows us to do something, and we can do this manually. So the goal here is to come up with this line. So in other words, we want to come up with y equals mx plus c.

  • 07:30

    DR. CHIRAG SHAH [continued]: Here, this is given. This is also given. This is supervised learning, so x and y are given. So what we're trying to figure out here are the values of m and c. These are the two unknowns here. So how do we do this? We do this by starting with some values of m and c. We can guess and see how well we do.

  • 07:52

    DR. CHIRAG SHAH [continued]: So let's say we start with the first guess. Let's say m equals 0, c equals 0. This is our first guess. So apply these values to this equation for each of the x, y pairs.

  • 08:15

    DR. CHIRAG SHAH [continued]: And so we come out with y hat. And y hat is our prediction. y is the real value. We already know the real outcome, the truth, but y hat is our prediction. So this is what we are trying to do. So for x equals 1, we put this into this equation.

  • 08:35

    DR. CHIRAG SHAH [continued]: x is 1, m is 0, c is 0, so y is going to be 0, or rather, y hat, which is our prediction. For 2, 3, 4--and as you can imagine, no matter what value of x we have, because m and c are 0, we're just going to have zeroes.
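
The first guess can be reproduced with a few lines of Python. This is only a sketch: the data are the four (x, y) pairs plotted in the lecture, while the names `xs`, `ys`, `predict`, and `y_hats` are assumptions introduced for the example.

```python
# The four (x, y) pairs plotted in the lecture.
xs = [1, 2, 3, 4]
ys = [3, 4, 5, 8]


def predict(x, m, c):
    """Prediction (y hat) from the line y = m*x + c."""
    return m * x + c


# First guess: m = 0 and c = 0.
m, c = 0, 0
y_hats = [predict(x, m, c) for x in xs]

# Show the true y next to the predicted y hat for each x.
for x, y, y_hat in zip(xs, ys, y_hats):
    print(f"x={x}  y={y}  y_hat={y_hat}")
```

With both m and c at 0, every prediction is 0 regardless of x, which matches the column of zeroes described above.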

  • 08:57

    DR. CHIRAG SHAH [continued]: Now what we do is we find the error in our prediction. So the error would be y minus y hat, squared. Why? Because y is the real truth. y hat is our prediction. So we look at the difference of that.

  • 09:18

    DR. CHIRAG SHAH [continued]: What was the truth and what we predicted. But because the difference can be on either side--it could be positive, it could be negative, we could be off on this side, we could be off on this side--we square that difference. Because when we do that, everything becomes positive.

  • 09:38

    DR. CHIRAG SHAH [continued]: And that's an error. So error is anything--it doesn't matter if you're off this way or this way. The point is, we are off. And so that's an error. So we want to make sure that no matter which direction we are, it's still positive. It's still positive error. And that's why we square it. So in this case, you can imagine if you look at this one,

  • 10:02

    DR. CHIRAG SHAH [continued]: so this is 3 minus 0, squared, 9; 4 minus 0, squared, 16; 25, and so on. And then we sum it. So we add it all up. And so this gives us some value.

  • 10:25

    DR. CHIRAG SHAH [continued]: I'll just remove this. So let's see how much is that. So 114, that's our error. Don't worry about the scale now. Now, we move on. We go with the second guess.
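
The error calculation for the first guess can be sketched in Python as follows. The helper names `squared_errors` and `total_error` are assumptions made for this example; the data and the guess m = 0, c = 0 are the ones used in the lecture.

```python
xs = [1, 2, 3, 4]
ys = [3, 4, 5, 8]


def squared_errors(xs, ys, m, c):
    """Squared difference between the truth y and the prediction y hat, per point."""
    return [(y - (m * x + c)) ** 2 for x, y in zip(xs, ys)]


def total_error(xs, ys, m, c):
    """Sum of the squared errors over all points."""
    return sum(squared_errors(xs, ys, m, c))


errors = squared_errors(xs, ys, 0, 0)
print(errors)                      # [9, 16, 25, 64]
print(total_error(xs, ys, 0, 0))   # 114
```

Squaring makes each term non-negative, so errors on opposite sides of the line cannot cancel each other when summed.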

  • 10:46

    DR. CHIRAG SHAH [continued]: And let's say our second guess is 1, 1. We'll just say 1 and 0. All right. We remove this for now.

  • 11:10

    DR. CHIRAG SHAH [continued]: So we do the prediction again. So we plug this in to this equation. And I encourage you to do this yourself right now, just to make sure you understand. This is why we are taking this example. And eventually, we're going to do this kind of automatically using some code. But for now, we're intentionally trying

  • 11:30

    DR. CHIRAG SHAH [continued]: to do this by hand, because it allows us to develop this intuition of what really goes on in regression. That will help us understand gradient descent and other kinds of things coming up. So if you plug this in for x equals 1, this is the prediction for y hat.

  • 11:51

    DR. CHIRAG SHAH [continued]: And then I remove these, is 1. If x is 2, then it's 2. If x is 3--so in other words, because m is 1, and c is 0, we're just putting what x has.

  • 12:13

    DR. CHIRAG SHAH [continued]: And now we can--so this is our y hat. We can take a difference and square it. So we come up with some values here. And I'm going to let you figure this out. So make sure you actually practice this. At this point, really pause this video, and work out these numbers.

  • 12:34

    DR. CHIRAG SHAH [continued]: You can use a calculator. That's fine. But make sure you actually get these numbers. And then sum it up. And you'll get something here. We move on to--it doesn't matter. You can pick something. So if you do this, you get, for x equals 1, plug this in.

  • 13:00

    DR. CHIRAG SHAH [continued]: 2 plus 1, so you get 3. For x equals 2, you get 5. For x equals 3, you get 6 plus 1, 7. And then 9. So this is the prediction for these values of m and c.

  • 13:27

    DR. CHIRAG SHAH [continued]: So it gives us y hat. Once again, you find the difference and square it. And then total it. So as you can imagine, you can do all kinds of possible combinations here. And there are infinite possibilities.
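
If you want to check your hand-worked numbers, the following Python sketch totals the squared error for each guess. The first two guesses (m = 0, c = 0 and m = 1, c = 0) are stated in the lecture. The third guess is not named aloud; m = 2, c = 1 is an assumption inferred from the spoken arithmetic ("2 plus 1, so you get 3"). The names `total_error`, `guesses`, and `results` are also just for this example.

```python
xs = [1, 2, 3, 4]
ys = [3, 4, 5, 8]


def total_error(xs, ys, m, c):
    """Sum of squared differences between y and y hat = m*x + c."""
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))


# (label, m, c); the third pair is inferred, not stated in the lecture.
guesses = [("first", 0, 0), ("second", 1, 0), ("third", 2, 1)]

results = {}
for label, m, c in guesses:
    y_hats = [m * x + c for x in xs]
    results[label] = total_error(xs, ys, m, c)
    print(label, y_hats, results[label])
# first  [0, 0, 0, 0] 114
# second [1, 2, 3, 4] 28
# third  [3, 5, 7, 9] 6
```

Each guess is a different line, and the total error gives a single number for comparing them: the smaller the total, the closer that line sits to the four points overall.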

  • 13:48

    DR. CHIRAG SHAH [continued]: Each of them gives us a different line. What we are trying to find out is the best line. And how do you know the best line? The line that gives us the least amount of error, that is the best line. And what does the least amount of error indicate? It indicates that this is a line that is just balanced enough.

  • 14:09

    DR. CHIRAG SHAH [continued]: Because I removed this here, but say this is a different--these are data points. And this is the best line we've come up with. If you move that line in any direction--

  • 14:31

    DR. CHIRAG SHAH [continued]: so let's say if I move it here, sure, it gets closer to these points. But then it gets further away from these points. If I move it here, it gets closer to these points, but it gets further away from these points. So in other words, it is the best line because it finds the best balance for the points that are given. If you move it in any direction, it might gain on one side,

  • 14:55

    DR. CHIRAG SHAH [continued]: but it might lose on the other side. And so there is no point-end. So that is the best line. And that is a result of regression. So once again, we tried different values of m and c. Plug that in, find the prediction for y, compare it with the real value of y by finding the difference
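
The balance idea can be checked numerically on the four example points. The sketch below assumes the least-squares line for those points is m = 1.6, c = 1.0; that value was worked out for this example and is not stated in the lecture. Shifting or tilting that line in either direction raises the total squared error.

```python
xs = [1, 2, 3, 4]
ys = [3, 4, 5, 8]


def total_error(m, c):
    """Sum of squared differences between y and y hat = m*x + c."""
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))


# Assumed best line for these four points (not given in the lecture).
best_m, best_c = 1.6, 1.0
best = total_error(best_m, best_c)

# Move the line up and down, then tilt it both ways.
moved = {
    "up": total_error(best_m, best_c + 0.5),
    "down": total_error(best_m, best_c - 0.5),
    "steeper": total_error(best_m + 0.2, best_c),
    "flatter": total_error(best_m - 0.2, best_c),
}

print("best", round(best, 4))
for label, err in moved.items():
    print(label, round(err, 4))
```

Every moved line has a larger total than the assumed best line: what it gains on some points it more than loses on others.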

  • 15:18

    DR. CHIRAG SHAH [continued]: and squaring it. That gives the error. We sum the error. And that's our total error for this particular line. We keep doing it, and find the line that gives us the smallest error. And that's our regression line, because it connects the data

  • 15:38

    DR. CHIRAG SHAH [continued]: in the best possible way. And when we have that line-- because we've come up with the line equation. When we come up with the line equation, the line goes to infinity, which means once you've learned this, in the future, whatever value of x we get, we can always find the value of y, on either, actually, either end of this.
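
The try-and-compare procedure summarized above can be automated in a crude way by looping over a grid of candidate values. This is only a sketch of the manual idea, not the method the course goes on to use: the grid range (0 to 5 in steps of 0.1) and the names `total_error` and `best_line_on_grid` are assumptions made for the example, and a grid can only ever cover a finite slice of the infinite possibilities.

```python
xs = [1, 2, 3, 4]
ys = [3, 4, 5, 8]


def total_error(xs, ys, m, c):
    """Sum of squared differences between y and y hat = m*x + c."""
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))


def best_line_on_grid(xs, ys, max_tenths=50):
    """Try m and c from 0.0 to max_tenths/10 in steps of 0.1; keep the smallest error."""
    best = None
    for i in range(max_tenths + 1):
        for j in range(max_tenths + 1):
            m, c = i / 10, j / 10
            err = total_error(xs, ys, m, c)
            if best is None or err < best[0]:
                best = (err, m, c)
    return best


err, m, c = best_line_on_grid(xs, ys)
print(f"m={m}, c={c}, total error={err:.2f}")

# Once m and c are learned, the line predicts y for any new x.
new_x = 10
print(m * new_x + c)
```

On these four points the grid settles on m = 1.6 and c = 1.0. With more data points or more than one x, the number of combinations grows quickly, which is why this brute-force approach does not scale.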

  • 15:59

    DR. CHIRAG SHAH [continued]: And also, what we know from before about machine learning is that we can improve this. So this is the line that we've learned using the data now. But as we get more data, we can keep readjusting this line so that it keeps learning from those new experiences, new data points, new observations.

  • 16:20

    DR. CHIRAG SHAH [continued]: And so that's a problem of regression. Now, as you can imagine, manually, this is just very prohibitive. We can't really go through all these things manually. And here, we are only looking at four data points. You can imagine when you have a million data points, and a lot bigger data set, or when you have--

  • 16:42

    DR. CHIRAG SHAH [continued]: here, we are only considering one x, but you could have multiple x. There is still going to be one y, but you could have x1, x2, x3. So this just becomes prohibitively expensive manually. So there are techniques. There are methods to try these different possibilities of m--and there could be m1, m2, many other slopes--

  • 17:06

    DR. CHIRAG SHAH [continued]: and c, and try to figure out the best possible line for a given set of data. And that's the problem of regression. So in the future, we'll see how that process could work in terms of finding the best possible line by trying all the possible combinations of m's and c.
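
For the case with several inputs x1, x2, x3 and matching slopes m1, m2, m3, the prediction is still "multiply each x by its slope, add them up, then add c". The sketch below shows only that prediction step; the function name `predict_multi` and all the numbers in it are hypothetical and chosen purely for illustration.

```python
def predict_multi(xs, ms, c):
    """Return y hat = m1*x1 + m2*x2 + ... + c for one observation."""
    if len(xs) != len(ms):
        raise ValueError("need exactly one slope per input variable")
    return sum(m * x for m, x in zip(ms, xs)) + c


# Hypothetical inputs, slopes, and intercept (not from the lecture).
inputs = [2.0, 3.0, 4.0]
slopes = [1.0, 0.5, 2.0]
intercept = 1.0

y_hat = predict_multi(inputs, slopes, intercept)
print(y_hat)  # 12.5
```

There is still a single y, but now every slope plus the intercept is an unknown to be learned, so the number of combinations to try grows with each added x.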

  • 17:29

    DR. CHIRAG SHAH [continued]: And so that concludes our discussion on linear regression. [MUSIC PLAYING]

Video Info

Series Name: Machine Learning for Data Science

Episode: 2

Publisher: Chirag Shah

Publication Year: 2018

Video Type: Tutorial

Methods: Linear regression, Regression analysis, Machine learning

Keywords: data analysis; error analysis; pattern analysis; prediction; prediction (methodology); regression; regression analysis; Slope measures; variables (research)

Segment Info

Segment Num.: 1

Using first principles, Dr. Chirag Shah, PhD, explains basic linear regression in two dimensional space to determine the best-fitting line containing the least amount of error.

Linear Regression: Theory
