
A residual is something left over after a primary analysis has been conducted. The term is most often applied to multiple regression. Essentially, multiple regression is an attempt to use predictor variables in combination to predict a dependent variable. The following equation provides a general view of this process:

$$Y = C + b_1X_1 + b_2X_2 + b_3X_3 + \cdots + b_iX_i + e$$

This equation indicates that the value of the dependent variable (Y) is based on a constant (C) and a combination of predictor variables (X), each weighted by a coefficient (b). The last term in the equation is the “error” term (e). “Error” here means everything that is not part of the prediction: the portion of Y that the constant and the weighted predictors do not account for. This portion should be random and unpredictable, and it is what makes the two sides of the equation balance.
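As an illustration, the equation can be estimated by ordinary least squares, and the error term can then be recovered as the difference between observed and predicted Y. The sketch below assumes NumPy is available and assumes least squares as the estimation method. The function name `fit_multiple_regression` and the simulated data (three predictors with chosen coefficients plus random noise) are hypothetical.

```python
import numpy as np


def fit_multiple_regression(X, y):
    """Estimate C and b_1..b_i by ordinary least squares.

    X: (n, i) array of predictor values; y: (n,) array of the dependent variable.
    Returns (C, b, e), where e is the vector of residuals y - y_hat.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient plays the role of the constant C.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    y_hat = design @ coef
    e = y - y_hat  # the "error" term: what the weighted predictors do not explain
    return coef[0], coef[1:], e


# Hypothetical data: three predictors, true C = 2.0, b = (1.5, -0.5, 3.0), plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 + X @ np.array([1.5, -0.5, 3.0]) + rng.normal(scale=0.5, size=200)

C, b, e = fit_multiple_regression(X, y)
print(round(C, 2), np.round(b, 2))
```

By construction, `C + X @ b + e` reproduces `y` exactly, which mirrors how the error term balances the two sides of the equation.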

The residuals (errors) across the entire data set should sum to zero. For an equation estimated by least squares that includes the constant C, this holds exactly. Each case has a score on the dependent variable Y and an amount of error, either positive or negative, so the error for each case can be plotted on a diagram. Because the error is assumed to be random, this distribution should show no discernible pattern. The analysis of residuals is an attempt to determine whether a pattern exists and, if so, why.
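The zero-sum property can be checked numerically. Under ordinary least squares, the normal equation for the constant forces the residuals to sum to zero. When the constant is omitted, nothing guarantees this. The sketch assumes NumPy; the helper `ols_residuals` and the simulated data are hypothetical.

```python
import numpy as np


def ols_residuals(X, y, intercept=True):
    """Return least-squares residuals, optionally fitting a constant term C."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X]) if intercept else X
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef


# Hypothetical data with a true constant of 5.0.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = 5.0 + X @ np.array([2.0, -1.0]) + rng.normal(size=100)

with_c = ols_residuals(X, y, intercept=True)
without_c = ols_residuals(X, y, intercept=False)
print(f"sum with constant:    {with_c.sum():.2e}")  # zero up to floating-point rounding
print(f"sum without constant: {without_c.sum():.2f}")  # generally far from zero here
```

Pairing the fitted values `y - with_c` with the residuals `with_c` gives the points for the residual diagram.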

Plotting Residuals

The typical technique is to produce a scatterplot for visual examination, supplemented by more formal statistical analysis. Several possibilities may require attention, and each separate test or assessment takes some effort. The residuals may also require a combination of explanations, drawn from several different possible sources.
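The visual check can be paired with a formal test for pattern. One option is the Wald-Wolfowitz runs test. It is used here as an illustration, not as a method the entry itself prescribes. The test is applied to the signs of the residuals after ordering the cases by fitted value. If the errors are random, positive and negative residuals should be interleaved rather than clustered. The function name `runs_test` and the example residuals are hypothetical. The z-score uses the usual large-sample normal approximation, which is only rough for a sample as small as this one.

```python
import math


def runs_test(residuals, order_by):
    """Wald-Wolfowitz runs test on the signs of residuals.

    residuals: residuals from a fitted equation.
    order_by: values used to order the cases (for example, fitted values).
    Zero residuals are dropped. Returns (runs, z); a large |z| suggests the
    signs are not in random order (too few runs indicates clustering).
    """
    pairs = sorted(zip(order_by, residuals))
    signs = [r > 0 for _, r in pairs if r != 0]
    n_pos = sum(signs)
    n_neg = len(signs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative residuals")
    n = n_pos + n_neg
    # A run is a maximal stretch of consecutive residuals with the same sign.
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    expected = 2 * n_pos * n_neg / n + 1
    var = 2 * n_pos * n_neg * (2 * n_pos * n_neg - n) / (n ** 2 * (n - 1))
    if var == 0:
        raise ValueError("too few residuals of each sign for the test")
    return runs, (runs - expected) / math.sqrt(var)


# Hypothetical residuals from a straight line fitted to a curved relationship:
# positive at both ends of the fitted range, negative in the middle.
fitted = list(range(12))
curved = [5, 4, 2, 1, -1, -2, -2, -1, 1, 2, 4, 5]
runs, z = runs_test(curved, fitted)
print(runs, round(z, 2))
```

Here the U-shaped sign pattern yields only three runs, fewer than random ordering would produce. On a scatterplot, the same residuals would appear as a visible curve.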

The first possibility is that the residual pattern is random and within expected tolerance limits, both overall and for individual predictions. Under these conditions, the multiple regression equation provides predictions free of any problems that can be identified from the residuals. The errors that do exist are random and consistent with the normal assumptions about sampling error.
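One way to make "within expected tolerance limits" concrete, assuming normally distributed errors, is to divide each residual by the residual standard error. One then checks what share falls within about two units of zero; roughly 95% should do so under normality. The function names, the ±2 limit, and the simulated residuals are assumptions for illustration. This simple scaling ignores leverage, which studentized residuals would take into account.

```python
import math
import random


def standardized_residuals(residuals, n_params):
    """Scale each residual by the residual standard error s.

    n_params: number of estimated coefficients including the constant C, so that
    s = sqrt(SSE / (n - n_params)). Leverage is ignored (no studentizing).
    """
    n = len(residuals)
    sse = sum(r * r for r in residuals)
    s = math.sqrt(sse / (n - n_params))
    return [r / s for r in residuals]


def share_within(residuals, n_params, limit=2.0):
    """Fraction of cases whose standardized residual lies within +/- limit."""
    z = standardized_residuals(residuals, n_params)
    return sum(abs(v) <= limit for v in z) / len(z)


# Hypothetical residuals drawn from a normal distribution, as the random-error
# assumption predicts; about 95% should fall within +/- 2.
random.seed(0)
sim = [random.gauss(0.0, 3.0) for _ in range(1000)]
print(f"{share_within(sim, n_params=4):.1%} within +/- 2")
```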

The first problem to consider is the existence of outliers. An outlier is a single case whose error is so large that it distorts the estimation of the process. Consider a set of values in which the error for one case is 100 and the next largest positive error is 10. Remember that the sum of the errors should be zero. The error for this single case is so large that the rest of the values are, in a sense, distorted, because almost all of the error becomes associated with this one case. The most frequent remedy is removal of the identified case. The difficulty is that, in a random sample, the extreme value may simply be an outlier drawn by chance.
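Least squares chooses the coefficients that minimize the sum of squared residuals, so a single large residual can dominate that sum. The sketch below uses hypothetical residuals that match the numbers in the paragraph: one case at 100, the next largest positive at 10, and all values summing to zero. It reports each case's share of the squared error. The function name is illustrative.

```python
def squared_error_share(residuals):
    """Return each case's share of the total sum of squared residuals."""
    sse = sum(r * r for r in residuals)
    return [r * r / sse for r in residuals]


# Hypothetical residuals that sum to zero: one case at 100, the next largest positive at 10.
residuals = [100, 10, 8, 6, 4, 2, -10, -12, -14, -16, -18, -20, -20, -20]
shares = squared_error_share(residuals)
worst = max(range(len(residuals)), key=lambda i: shares[i])
print(f"case {worst} (residual {residuals[worst]}) carries {shares[worst]:.0%} of the squared error")
# case 0 (residual 100) carries 80% of the squared error
```

Thirteen of the fourteen cases together account for only about a fifth of the squared error. This is the sense in which the fit becomes organized around the single extreme case.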

In theory, 50% of the errors should be positive and 50% negative. An outlier often distorts this percentage, but if the percentage departs from 50% by a substantial amount, some adjustment may be required. The assumption is that error follows a normal distribution around a mean of zero, and a departure from that distribution indicates some element that may require adjustment or consideration. An extreme outlier in the analysis of residuals points to a potentially significant departure from the normal curve. A random extreme outlier is one whose removal changes the sample mean only slightly while sharply reducing the variance (variability). Under those conditions, the outlier is often considered simply the result of chance associated with sampling.
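Whether the split between positive and negative residuals departs from 50% by "a significant amount" can be checked with a sign test. Under the null hypothesis, the test treats each nonzero residual as a fair coin flip. This is one possible test, named here as an example. The function name and the data are hypothetical, and `math.comb` requires Python 3.8 or later.

```python
import math


def sign_test(residuals):
    """Exact two-sided sign test for equal numbers of positive and negative residuals.

    Zero residuals are dropped. Returns (n_positive, n_negative, p_value).
    """
    pos = sum(r > 0 for r in residuals)
    neg = sum(r < 0 for r in residuals)
    n = pos + neg
    k = min(pos, neg)
    # P(X <= k) for X ~ Binomial(n, 0.5); doubled for a two-sided test, capped at 1.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return pos, neg, min(1.0, 2 * tail)


# Hypothetical residuals: 18 positive, 2 negative.
print(sign_test([1.0] * 18 + [-1.0] * 2))
```

A small p-value flags an imbalance that chance alone would rarely produce. As the paragraph notes, a single outlier can shift the balance, so the result is best read alongside the residual plot.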

...
