
Error resides on the statistical side of the fault line separating the deductive tools of mathematics from the inductive tools of statistics. On the mathematics side of the chasm lies perfect information; on the statistics side lies estimation in the face of uncertainty. For the purposes of estimation, error describes the unknown, provides a basis for comparison, and serves as a hypothesized placeholder enabling estimation. This entry discusses the role of error from a modeling perspective and in the context of regression, ordinary least squares estimation, systematic error, random error, error distributions, experimentation, measurement error, rounding error, sampling error, and nonsampling error.

Modeling

For practical purposes, the universe is stochastic. For example, any “true” model involving gravity would require, at least, a parameter for every particle in the universe. One application of statistics is to quantify uncertainty. Stochastic or probabilistic models approximate relationships within some locality that contains uncertainty. That is, by holding some variables constant and constraining others, a model can express the major relationships of interest within that locality and amid an acceptable amount of uncertainty. For example, a model describing the orbit of a comet around the sun might contain parameters corresponding to the large bodies in the solar system and account for all remaining gravitational pulls with an error term.

Model equations employ error terms to represent uncertainty or negligible contributions. Error terms are often additive or multiplicative placeholders, and models can have multiple error terms, as in the following forms.

Additive: E = mc^2 + ∊, where ∊ is an error term perfecting the equation

Multiplicative: y = (α + xβ)∊, where ∊ is a multiplicative error term

Other: y = e^{β(x + ∊_ME)} + ∊, where ∊_ME is measurement error corresponding to x, and ∊ is an additive error term.
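As a rough sketch of these forms, the following Python fragment simulates data with additive and multiplicative error terms; the parameter values, error scales, and the log-normal choice for the multiplicative term are assumptions made purely for illustration, not part of the entry.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameter values, chosen only for illustration.
alpha, beta = 2.0, 0.5
x = np.linspace(1.0, 10.0, 200)

# Additive error term: y = alpha + beta*x + eps, with eps ~ N(0, 0.3^2).
eps_add = rng.normal(loc=0.0, scale=0.3, size=x.size)
y_additive = alpha + beta * x + eps_add

# Multiplicative error term: y = (alpha + x*beta) * eps, with a log-normal eps
# so the multiplicative error stays positive and is centered near 1.
eps_mult = rng.lognormal(mean=0.0, sigma=0.1, size=x.size)
y_multiplicative = (alpha + x * beta) * eps_mult

# A combined form, loosely mirroring the "Other" example above:
# measurement error on x inside the exponent plus an additive error term on y.
x_observed = x + rng.normal(0.0, 0.05, size=x.size)   # x + eps_ME
y_other = np.exp(beta * x_observed) + rng.normal(0.0, 0.2, size=x.size)
```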

Development of Regression

The traditional modeling problem is to solve a set of inconsistent equations—characterized by the presence of more equations than unknowns. Early researchers cut their teeth on estimating physical relationships in astronomy and geodesy—the study of the size and shape of the earth—expressed by a set of k inconsistent linear equations of the following form:

y_i = β_0 + β_1 x_{i1} + β_2 x_{i2} + ⋯ + β_p x_{ip} + ∊_i,  i = 1, …, k,

where the xs and ys are measured values and the p + 1 βs are the unknowns.

Beginning in antiquity, these problems were solved by techniques that reduced the number of equations to match the number of unknowns. In 1750, Tobias Mayer assembled his observations of the moon's librations into a set of inconsistent equations. He was able to solve for the unknowns by grouping equations and setting their sums equal to zero—an early step toward the condition Σ∊_i = 0. In 1760, Roger Boscovich began solving inconsistent equations by minimizing the sum of the absolute errors, Σ|∊_i|, subject to an adding-up constraint (Σ∊_i = 0); by 1786, Pierre-Simon Laplace minimized the largest absolute error; and later, Adrien-Marie Legendre and Carl Friedrich Gauss began minimizing the sum of the squared error terms, Σ∊_i^2, or the least squares. Sir Francis Galton, Karl Pearson, and George Udny Yule took the remaining steps in building least squares regression, which combines two concepts involving errors: Estimate the coefficients by minimizing Σ∊_i^2, and assume that the errors are normally distributed with mean zero, ∊_i ~ N(0, σ^2). This progression of statistical innovations has culminated in a family of regression techniques incorporating a variety of estimators and error assumptions.
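To make the least squares idea concrete, here is a minimal sketch (the data, coefficients, and noise level are invented for illustration) that estimates the unknowns of an overdetermined linear system by minimizing Σ∊_i^2 with NumPy. Because the design matrix includes an intercept column, the fitted residuals also sum to essentially zero, echoing the Σ∊_i = 0 condition.

```python
import numpy as np

rng = np.random.default_rng(1)

# An overdetermined ("inconsistent") system: k = 50 equations, p + 1 = 3 unknowns.
k, p = 50, 2
X = np.column_stack([np.ones(k), rng.normal(size=(k, p))])  # design matrix with intercept
beta_true = np.array([1.0, -2.0, 0.5])                      # hypothetical coefficients
y = X @ beta_true + rng.normal(scale=0.4, size=k)           # additive error term

# Least squares: choose the betas that minimize the sum of squared errors.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

residuals = y - X @ beta_hat
print(beta_hat)          # estimates close to beta_true
print(residuals.sum())   # ~0: with an intercept column, residuals sum to zero
```

Swapping the squared-error criterion for the sum of absolute errors or the largest absolute error yields the earlier estimators attributed above to Boscovich and Laplace.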

...
