Linear Regression

Linear regression is a statistical procedure for predicting the values of a continuous dependent variable Y from the values of a categorical or continuous independent variable X. In other words, it estimates how much of the variance in Y can be predicted by X. The relationship between X and Y is assumed to be linear (an assumption that is usually first proposed theoretically). This linear relationship can be positive (i.e., an increase in X is associated with an increase in Y), negative (an increase in X is associated with a decrease in Y), or flat (Y does not change with X). When graphed, these predictions trace the best-fitting line, that is, the line that lies as close as possible to the data points overall (conventionally, the one that minimizes the sum of squared vertical distances between the points and the line), hence the term linear regression.

Procedure of Linear Regression

An equation for simple linear regression can be represented as the equation of a line

Y = a + bX,

where Y is the outcome or dependent variable and a is the predicted value of Y when X = 0. It is a constant and is referred to as the intercept. The gradient (slope), also called the regression coefficient, is b; it predicts how much Y will change with a one-unit increase in X.
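As a minimal sketch of how the intercept and gradient are read, the line equation can be written as a small Python function. The function name predict and the values chosen for a and b are hypothetical and serve only as illustration.

```python
def predict(x, a, b):
    """Return the predicted Y for a given X on the line Y = a + bX."""
    return a + b * x


# Hypothetical values, chosen only for illustration.
a = 10.0  # intercept: predicted Y when X = 0
b = 4.0   # gradient: change in predicted Y per one-unit increase in X

y_at_zero = predict(0, a, b)                            # equals the intercept a
change_per_unit = predict(3, a, b) - predict(2, a, b)   # equals the gradient b
```

Evaluating the line at X = 0 returns the intercept, and the difference between predictions one unit apart returns the gradient, whichever pair of adjacent X values is chosen.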

The intention of a researcher is to predict an outcome variable based on values of an independent variable. This predicted value is represented by Y. In linear regression, it is assumed that a perfect or true score exists but is unattainable because of error, which can be of any type: systematic, random, due to a biased sample, and/or due to violations of the assumptions of regression (described in the next section). These errors are acknowledged by including an error term (e) in the linear regression equation, written as

Y = a + bX + e.
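The role of the error term can be illustrated with a short Python sketch. It assumes that a and b are estimated by ordinary least squares, a common choice that is named here as an assumption, and it uses a small data set that is made up purely for illustration. The residuals it computes are the estimated values of e.

```python
# Hypothetical data, made up purely for illustration.
hours = [1, 2, 3, 4, 5]          # X
points = [12, 19, 21, 30, 33]    # Y

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(points) / n

# Least-squares gradient: sum of cross-deviations divided by
# the sum of squared deviations of X from its mean.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, points))
s_xx = sum((x - mean_x) ** 2 for x in hours)
b = s_xy / s_xx

# Least-squares intercept: the fitted line passes through (mean_x, mean_y).
a = mean_y - b * mean_x

# Residuals estimate the error term e: observed Y minus predicted Y.
residuals = [y - (a + b * x) for x, y in zip(hours, points)]
```

For these made-up numbers the gradient is 5.3 and the intercept is 7.1, and the residuals sum to zero up to floating-point rounding, which is a general property of a least-squares line that includes an intercept.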

A multiple regression uses more than one predictor and can be written as

Y_i = a + b_1 X_{1i} + b_2 X_{2i} + \dots + b_n X_{ni} + e_i .

In multiple regression, b_1 represents the gradient of the first predictor and X_1 is the first independent variable, followed by the second and so on up to the nth gradient and independent variable, depending on the researcher’s inquiry. The subscript i indexes the individual observations.
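The multiple regression equation can be sketched in Python as follows. The sketch assumes ordinary least squares estimation through the normal equations, which is one standard approach named here as an assumption. The function name fit_multiple_regression and the data are hypothetical; the data are generated without error from known coefficients so that the result can be checked.

```python
def fit_multiple_regression(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y.

    rows: list of predictor tuples, one per observation (X1i, X2i, ...).
    y:    list of outcomes Yi.
    Returns [a, b1, b2, ...]. Assumes the predictors are not perfectly
    collinear, so that X'X can be inverted.
    """
    # Prepend a column of ones so the first coefficient is the intercept a.
    X = [[1.0, *row] for row in rows]
    k = len(X[0])

    # Build the augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(k)
    ]

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [v - factor * p for v, p in zip(A[r], A[col])]

    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical, error-free data generated from Y = 5 + 3*X1 + 2*X2.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
y = [5 + 3 * x1 + 2 * x2 for x1, x2 in rows]

a, b1, b2 = fit_multiple_regression(rows, y)
```

Because the illustrative data contain no error, the fit recovers the generating coefficients up to floating-point rounding. With real data the estimates would differ from any true values, and in practice a numerical library would normally be used instead of hand-written elimination.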

Imagine that a researcher is interested in how time spent studying (X, in hours) affects exam performance (Y, measured in points on a scale from 0 to 50). Her findings are shown in Figure 1.

In this scatterplot with a fitted line, the straight line shows a positive relationship between time spent studying and points gained on the exam. (For a multiple regression analysis, a researcher could be interested in factors besides Time that account for Points scored on an exam. She could add Intelligence (X2) and Interest (X3) to the equation.) Figure 1 concerns only simple linear regression, where the line represents how the model fits the given data and where b can be calculated as

...
