
Smoothing methods attempt to capture the underlying structure of data that contain noise. Noise in data may result from measurement imprecision or the effect of unmeasured variables, and noise tends to mask the structure of a data set or the relationship between two variables. Smoothers aim to eliminate this noise while making very few model assumptions about the distribution of the data.

Common smoothing techniques include density estimation and (nonparametric) regression curve estimation. Density estimation uses sample data for a single variable to estimate the population distribution of that variable. While parametric models (e.g., normal distribution, exponential distribution) can be fit for such data, smoothers typically assume only that the variable is continuous with a smooth density function. Nonparametric regression models the relationship between two (or more) variables without assuming a specific functional form (such as linear or quadratic) for the regression curve.
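As an illustration of density estimation, the sketch below builds a Gaussian kernel density estimate in plain Python. The kernel choice, the function name `gaussian_kde`, the sample values, and the bandwidth `h = 0.5` are all assumptions made for the example; none of them come from the entry itself.

```python
import math

def gaussian_kde(sample, h):
    """Return a function that estimates the density of `sample`.

    `h` is the bandwidth: larger values give a smoother estimate.
    """
    n = len(sample)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum of Gaussian bumps, one centred on each observation.
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample)

    return density

# Hypothetical sample, made up purely for illustration.
sample = [1.2, 1.9, 2.1, 2.4, 3.0, 3.3, 4.8, 5.1]
f_hat = gaussian_kde(sample, h=0.5)

# Crude Riemann sum over a wide grid (-5 to 10); a density should integrate to about 1.
step = 0.01
grid = [-5.0 + step * i for i in range(1501)]
total_area = sum(f_hat(x) for x in grid) * step
```

The only assumption about the data is that its density is smooth. No parametric family such as the normal or exponential is fitted.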

The smooth curve (whether a density estimate or regression curve) is often determined by “local weighting”—that is, the curve value at any point on the graph is a weighted average of the observed data values near that point, with data closest to the point receiving the greatest weight. Most smoothing methods incorporate some type of tuning parameter that allows the user to control the smoothness of the estimated curve.
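The local-weighting idea can be sketched with a kernel-weighted average (often called a Nadaraya-Watson estimator). This is one possible weighting scheme, not necessarily the one used for the figure in this entry. The Gaussian kernel, the name `kernel_smooth`, and the data are assumptions for illustration. The `bandwidth` argument plays the role of the tuning parameter.

```python
import math

def kernel_smooth(xs, ys, x0, bandwidth):
    """Estimate the curve at x0 as a Gaussian-weighted average of ys.

    Observations whose x is close to x0 receive the largest weights.
    """
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

# Hypothetical noisy observations, made up for illustration.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0.1, 0.9, 2.2, 2.8, 4.1, 5.2, 5.9]

# Small bandwidth: the estimate at x = 3 is dominated by the observation at x = 3.
narrow = kernel_smooth(xs, ys, 3.0, bandwidth=0.3)

# Very large bandwidth: all weights are nearly equal, so the estimate
# approaches the overall mean of ys (a very smooth, nearly flat curve).
wide = kernel_smooth(xs, ys, 3.0, bandwidth=50.0)
```

A small bandwidth follows the data closely and gives a rougher curve. A large bandwidth averages over more points and gives a smoother one.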

The following example (Figure 1) relates the top speed and gas mileage of 82 car models using data from the U.S. Environmental Protection Agency, available online at http://lib.stat.cmu.edu/DASL/Datafiles/carmpgdat.html.

The smooth curve shows the negative association between speed and mileage: mileage decreases steeply at low top speeds and more gradually at high top speeds. The flat region in the middle of the plot is a feature that would probably go unnoticed in an examination of the raw data. This graph was produced using the “lowess” function (a local regression technique very similar to the loess function) of the free, open-source statistical software package R.


Figure 1 The Nonparametric Regression of Cars' Mileage on Their Top Speeds Yields This Smooth-Curve Estimate of the Relationship Between These Variables
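For readers who want to see the mechanics of local regression, here is a simplified Python sketch in the spirit of lowess: at each target point it fits a tricube-weighted straight line to the nearest observations. It is not R's implementation (it omits the robustness iterations, for instance), and it does not use the EPA car data. The function name `local_linear`, the `span` default, and the synthetic data are assumptions for illustration.

```python
def local_linear(xs, ys, x0, span=2/3):
    """Evaluate a tricube-weighted local line at x0.

    `span` is the fraction of the data treated as the neighbourhood of x0.
    """
    n = len(xs)
    k = min(n, max(2, int(round(span * n))))      # neighbourhood size
    d_max = sorted(abs(x - x0) for x in xs)[k - 1]
    if d_max == 0:
        d_max = 1e-12                              # guard against duplicate x values
    # Tricube weights: 1 at x0, falling to 0 at distance d_max.
    ws = []
    for x in xs:
        u = abs(x - x0) / d_max
        ws.append((1 - u ** 3) ** 3 if u < 1 else 0.0)
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)

# Synthetic data: the line y = 2x + 1 with alternating +/-0.5 "noise".
xs = [float(i) for i in range(11)]
clean = [2 * x + 1 for x in xs]
noisy = [y + (0.5 if i % 2 == 0 else -0.5) for i, y in enumerate(clean)]

exact_fit = local_linear(xs, clean, 2.5)    # noise-free line is reproduced
smooth_fit = local_linear(xs, noisy, 5.0)   # true value on the line at x = 5 is 11
```

Evaluating `local_linear` over a grid of `x0` values traces out the smooth curve. A larger `span` uses more neighbours and produces a smoother result.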

David B. Hitchcock

Further Reading

Bowman, A. W., & Azzalini, A. (1997). Applied smoothing techniques for data analysis: The kernel approach with S-Plus illustrations. New York: Oxford University Press.
Simonoff, J. S. (1996). Smoothing methods in statistics. New York: Springer.