Regression analysis is the name for a family of techniques that attempt to predict one variable (the outcome or dependent variable) from another variable or set of variables (the predictor or independent variables).

We will illustrate this first with an example of linear regression, also called (ordinary) least squares (OLS) regression. When people say “regression” without any further description, they are almost always talking about OLS regression. Figure 1 shows a scatterplot of data from a group of British ex-miners, who were claiming compensation for industrial injury. The x-axis shows the age of the claimant, and the y-axis shows the grip strength, as measured by a dynamometer (this measures how hard the person can squeeze two bars together).

Running through the points is the line of best fit, or regression line. This line allows us to predict the conditional mean of the grip strength, that is, the mean grip strength that would be expected for a person of a given age.

The line of best fit, or regression line, is calculated using the least squares method. To illustrate the method, consider Figure 2, which is simplified in that it has only four points on the scatterplot. For each point, we calculate (or measure) the vertical distance between the point and the regression line; this distance is the residual, or error, for that point. Each of these errors is squared, and the squared values are summed. The line of best fit is placed where it minimizes this sum of squared errors (or residuals); hence, it is the least squares line of best fit. It is sometimes known as the ordinary least squares line of best fit, because there are other kinds of least squares lines, such as generalized least squares and weighted least squares. Thus, we can think of the regression line as minimizing the error (note that in statistics, the term error is used to mean deviation or wandering, not mistake).
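The least squares idea can be sketched in a few lines of Python. This is only an illustration: the four points are hypothetical (they are not the values plotted in Figure 2), and the function names are invented for this example. The sketch computes the sum of squared residuals for a candidate line, finds the least squares line in closed form, and then checks that nudging the line in any direction makes the sum larger.

```python
def sum_squared_residuals(points, intercept, slope):
    """Sum of squared vertical distances between each point and the line."""
    total = 0.0
    for x, y in points:
        predicted = intercept + slope * x   # height of the line at x
        residual = y - predicted            # vertical distance (the error)
        total += residual ** 2
    return total


def least_squares_line(points):
    """Intercept and slope that minimize the sum of squared residuals."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: four (x, y) points chosen for illustration only.
points = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0), (4.0, 4.0)]

b0, b1 = least_squares_line(points)
best = sum_squared_residuals(points, b0, b1)
print(f"intercept={b0:.2f}, slope={b1:.2f}, sum of squares={best:.2f}")

# Any nearby line should give a larger sum of squared residuals.
for d_intercept, d_slope in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    worse = sum_squared_residuals(points, b0 + d_intercept, b1 + d_slope)
    print(f"shifted line: sum of squares={worse:.3f}")
```

Squaring the residuals means that points above and below the line cannot cancel each other out, and that large errors count for more than small ones.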

Figure 1 Scatterplot Showing Age Against Grip Strength With Line of Best Fit

The position of a line on a graph is given by two values: the height of the line and the gradient (slope) of the line. In regression analysis, the gradient may be referred to as b1 or β1 (β is the Greek letter beta). Because the line slopes, its height varies along its length, so by convention the height is given at the point where the x-axis value (that is, the predictor variable) is equal to zero. This height is called the intercept, or y-intercept, the constant, b0 (or β0), or sometimes α (the Greek letter alpha).
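To make the roles of the two values concrete, here is a minimal sketch of how an intercept and a gradient turn a predictor value into a prediction. The coefficients below are hypothetical, chosen only so the arithmetic is easy to follow; they are not estimates from the ex-miner data, and the function name predict is an assumption of this example.

```python
def predict(x, intercept, slope):
    """Height of the regression line at x, i.e. the predicted outcome."""
    return intercept + slope * x


# Hypothetical coefficients for illustration only (not fitted to any real data).
b0 = 50.0    # intercept: height of the line where the predictor equals zero
b1 = -0.25   # gradient: change in the predicted outcome per one-unit change in x

at_zero = predict(0, b0, b1)
at_forty = predict(40, b0, b1)
at_forty_one = predict(41, b0, b1)

print(at_zero)                  # equals the intercept
print(at_forty)                 # prediction at x = 40
print(at_forty_one - at_forty)  # one extra unit of x changes the prediction by b1
```

Note that the intercept describes the line at a predictor value of zero, which may lie well outside the range of the observed data.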

Figure 2 Example of Calculation of Residuals

Calculation of the regression line is straightforward, given the correlation between the measures. The slope of the line (b1) is given by

b1 = r × (sy / sx)

where

r is the correlation between the two measures,

sy is the standard deviation of the outcome variable, and

sx is the standard deviation of the predictor variable.
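As a rough check on this relationship, the sketch below computes the slope from the correlation and the two standard deviations and compares it with the slope obtained directly from the sums of squares and cross-products. The data are hypothetical, the helper names are invented for this example, and the sample (n − 1) standard deviation is assumed; the ratio sy / sx is the same whichever divisor is used, as long as it is used for both variables.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (n - 1 divisor)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cross = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return cross / math.sqrt(ssx * ssy)


# Hypothetical predictor (xs) and outcome (ys) values, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 4.0]

r = pearson_r(xs, ys)
sy = sample_sd(ys)
sx = sample_sd(xs)
slope_from_r = r * sy / sx

# Direct least squares slope: cross-products divided by sum of squares of x.
mx, my = mean(xs), mean(ys)
slope_direct = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
                / sum((x - mx) ** 2 for x in xs))

print(f"r={r:.3f}, slope from r={slope_from_r:.3f}, direct slope={slope_direct:.3f}")
```

One consequence of the formula is that when both variables are standardized (so that sy and sx both equal 1), the slope is simply the correlation.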

The intercept is given by

b0 = ȳ − b1 × x̄

where

i is the

...
