An outlier is a data point that differs significantly from the other data points in a given data set. Sometimes referred to as abnormalities, anomalies, or deviants, outliers can occur by chance in any distribution. In large samples, a small number of outliers is expected; their presence alone does not indicate a problem and should not raise concern about the entire data set. However, outliers can also indicate measurement error, a skewed distribution, or data points drawn from a different underlying distribution. Many statistical tests are sensitive to outliers, so the ability to detect them is an important part of data analysis. Typically, outliers result from recording or measurement errors or from incorrect distributional assumptions, but they can also reveal unknown data structures or provide evidence of a novel phenomenon.

Outliers can have negative effects on data analyses, such as analyses of variance (ANOVAs) or regressions. They increase error variance and reduce the power of statistical tests, and when they are not spread across the data set but fall mostly at one extreme, they reduce normality. As a result, they can influence tests that rely on distributional assumptions or introduce bias into parameter estimates. In such cases, it is important to identify outliers so that they can be dealt with appropriately, which improves the statistical analysis. However, outliers can also be valuable data points that reveal important information about the data set, its creation, or the data points themselves. For example, if an outlier is due to a data entry mistake or instrument error, researchers can correct the issue by reentering the data correctly or expunging the poor measurements. Other outliers reflect normal and expected deviations in the population, such as extremes in human height or weight. Outliers can also suggest faults in a system, changes in how a system behaves, or abnormal behavior of the data in the system. Because the information contained in outliers is potentially so valuable, it is important that researchers, including communication researchers, know how to detect outliers, analyze them to determine why they exist, and understand their impact. This entry examines the detection and analysis of outliers and outlier labeling methods.

Outlier Detection and Analysis

Outlier detection methods build probabilistic, statistical, or algorithmic models that characterize the normal behavior of the data and then, based on that model, identify which values should be considered outliers. Researchers must decide which type of model to use, a choice influenced by several factors, including data type, data size, and the need for interpretability. Interpretability matters because it can explain why a data point is an outlier, giving the researcher valuable information about how to handle it. The choice of the underlying data model is extremely important because outliers can only be defined relative to the underlying distribution of the data. If the data are not modeled correctly, data points will be misclassified: normal values will be flagged as outliers, or outliers will be treated as normal parts of the data set.
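As a rough illustration of how the choice of model changes which points are flagged, the sketch below applies two simple statistical rules to the same small sample. Nothing in it comes from this entry: the function names, the sample values, and the cutoffs (3 standard deviations for the mean-based rule, a modified z-score of 3.5 with the 0.6745 scaling constant for the median-based rule) are assumptions chosen for illustration, following common conventions.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Flag values more than `threshold` sample standard deviations from the mean.

    Assumes the data are roughly normal; the cutoff of 3 is a convention.
    """
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return []
    return [v for v in values if abs(v - mean) / sd > threshold]


def mad_outliers(values, threshold=3.5):
    """Flag values using the median and the median absolute deviation (MAD).

    The constant 0.6745 rescales the MAD so the score is comparable to a
    z-score under normality; the cutoff of 3.5 is a convention.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


# Hypothetical measurements: nine readings near 10 and one extreme value.
measurements = [9.8, 10.1, 10.0, 9.9, 10.2, 10.1, 9.7, 10.3, 10.0, 25.0]

print(zscore_outliers(measurements))  # → []
print(mad_outliers(measurements))     # → [25.0]
```

In this sample the mean-based rule flags nothing, because the extreme value inflates the mean and standard deviation used to judge it, while the median-based rule flags 25.0. Neither rule is correct in general; the point is only that the verdict depends on the model assumed for the normal behavior of the data.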

...
