Simple Linear Regression

Linear regression is a form of statistical analysis whereby values on one variable (the outcome variable, denoted by Y) are predicted from values on another variable (the predictor variable, denoted by X) with which they are correlated. Here, “predict” does not necessarily have a temporal meaning but merely indicates that values on the outcome variable are estimated using values on the predictor variable. The analysis normally has one or both of two objectives: first, to obtain specific predicted values on Y that correspond to specific observed values on X; and second, to estimate the strength of this predictive relationship—that is, how well does X perform as a predictor of Y? The simplest case of linear regression, to be considered here, is where, in addition to the outcome variable, there is just one predictor variable; this is accordingly referred to as bivariate, or simple, regression. The case in which there are multiple predictors—multiple linear regression—is dealt with elsewhere.

Form of the Regression Model

The nature of the predictive relationship between the predictor variable and the outcome variable is expressed by two coefficients: the intercept (α) and the slope coefficient (β). These can be understood through a simple example. Imagine that a researcher wishes to predict students’ exam scores (Y), measured on a 0–100 scale in a sample of 491 students, from a scale that measures their attitudes to schooling (X), with scores ranging from 0 to 30 (higher scores indicate a more positive attitude). The slope coefficient is the change in Y that is associated with a one-unit increase in X. A coefficient of .26 would indicate that for an increase of one point on the attitude scale, the predicted exam score increases by .26 marks. This relationship is constant across the scale of values—so that for a change in X from 12 to 13, or from 22 to 23, the change in Y is of the same magnitude. This is the basis of the term linear regression—the predicted values lie on a straight line.
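The constancy of the slope can be illustrated with a minimal sketch. It uses the slope of 0.26 from the example above; the function name `predicted_change` is a hypothetical helper introduced only for illustration. Because the intercept is added to every prediction, it cancels out of any difference between two predictions, so only the slope is needed here.

```python
SLOPE = 0.26  # slope coefficient from the attitude-score example in the text


def predicted_change(x_from, x_to, slope=SLOPE):
    """Change in predicted Y when X moves from x_from to x_to.

    The intercept appears in both predictions and cancels out of the
    difference, so the change depends only on the slope.
    """
    return slope * (x_to - x_from)


# A one-unit step low on the attitude scale and one higher up the scale
low_change = predicted_change(12, 13)
high_change = predicted_change(22, 23)

print(round(low_change, 2), round(high_change, 2))
```

Both one-unit steps give the same change in the predicted exam score, which is what it means for the predicted values to lie on a straight line.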

The intercept is the predicted value of Y when X is 0 and is a constant. In some cases, the intercept has no real meaning—for example, if age were the predictor, no individual in this sample could have an age of 0—and it may also take a value that is not possible on the scale (such as a negative age). Nonetheless, the intercept is required to calculate the predicted scores. This will be clear if we look at the predictive equation:

Ŷ = α + βX.

The symbol Ŷ indicates the predicted value of the outcome variable. If we suppose that the intercept is 74.07, the predicted exam score for a student whose attitude score is 14 would be 74.07 + (0.26 × 14) = 77.71. Similarly, for a student with an attitude score of 21, it would be 74.07 + (0.26 × 21) = 79.53. Just as the intercept can be positive or negative, so can the slope. A negative slope occurs when the relationship between X and Y is negative, that is, when higher values of X are associated with lower values of Y. So, if we were seeking to predict exam performance from a measure of stress, we might find that a one-unit increase in stress is associated with a decrease in predicted exam score of, say, 1.4 marks and hence a negative slope coefficient of –1.4.
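The worked predictions above can be reproduced with a short sketch. It assumes the intercept (74.07) and slope (0.26) quoted in the text; the function name `predict` is hypothetical and chosen only for illustration. For the stress example, the text gives no intercept, so the sketch uses a placeholder of 0.0, which does not affect the difference between two predictions.

```python
INTERCEPT = 74.07  # intercept assumed in the text's example
SLOPE = 0.26       # slope for the attitude-score predictor


def predict(x, intercept=INTERCEPT, slope=SLOPE):
    """Predicted value of the outcome: intercept + slope * x."""
    return intercept + slope * x


score_14 = predict(14)  # student with an attitude score of 14
score_21 = predict(21)  # student with an attitude score of 21
print(round(score_14, 2), round(score_21, 2))

# Negative slope (stress example). The intercept of 0.0 is a placeholder,
# not a value from the text; it cancels out of the difference below.
stress_drop = predict(11, intercept=0.0, slope=-1.4) - predict(10, intercept=0.0, slope=-1.4)
print(round(stress_drop, 2))
```

The first line of output gives the two predicted exam scores from the text, and the second shows that a one-unit increase in stress lowers the predicted score by the size of the slope.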

...
