Outlier Analysis

Matthew J. Gill

doi:10.4135/9781483381411

In: The SAGE Encyclopedia of Communication Research Methods
Chapter DOI: https://doi.org/10.4135/9781483381411.n405
Subject: Communication and Media Studies, Sociology

An outlier is a data point that differs significantly from other data points within a given data set. Sometimes referred to as abnormalities, anomalies, or deviants, outliers can occur by chance in any distribution. In large samples, a small number of outliers is expected; their presence alone does not suggest an anomaly and should not raise concern about the entire data set. However, outliers can also indicate measurement error, a skewed distribution, or data points drawn from a different underlying distribution. Many statistical tests are sensitive to outliers, so the ability to detect them is an important part of data analysis. Typically, outliers stem from recording and measurement errors or incorrect distributional assumptions, but they can also reveal unknown data structures or provide evidence of a novel phenomenon.
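
The expectation that large samples contain a few extreme values by chance can be made concrete with a small sketch. It assumes the data are normally distributed and treats "more than k standard deviations from the mean" as the working definition of extreme; both the function name `expected_beyond` and the cutoff k = 3 are illustrative choices, not part of the entry.

```python
import math

def expected_beyond(n, k=3.0):
    """Expected number of observations lying more than k standard
    deviations from the mean in n draws from a normal distribution.

    Assumption: normality. The cutoff k is an illustrative convention.
    """
    # Two-sided tail probability of a standard normal:
    # P(|Z| > k) = erfc(k / sqrt(2))
    tail_probability = math.erfc(k / math.sqrt(2))
    return n * tail_probability

# Larger samples are expected to contain more chance extremes.
for sample_size in (100, 10_000, 1_000_000):
    print(sample_size, round(expected_beyond(sample_size), 1))
```

Under these assumptions, roughly 27 of every 10,000 observations fall beyond three standard deviations, so a handful of extreme values in a large sample is unremarkable on its own.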

Outliers can have negative effects on data analyses, such as analyses of variance (ANOVAs) or regressions. They increase error variance and reduce the power of statistical tests. When they are not spread across the data set but instead fall mostly at one extreme, they also decrease normality. As a result, they can influence tests that rely on distributional assumptions or introduce bias into parameter estimates. In such cases, it is important to identify outliers so that they can be dealt with appropriately, resulting in improved statistical analysis. However, outliers can also be valuable data points that reveal important information about the data set, its creation, or the data points themselves. For example, if an outlier is due to a mistake in data entry or to instrument error, researchers can correct the problem by reentering the data properly or expunging the poor measurements. Other outliers reflect normal and expected deviations in the population, such as extremes in human height or weight. Outliers can also suggest faults in a system, changes in how a system behaves, or abnormal behavior of the data in the system. Because the information contained in outliers is potentially so valuable, it is important that researchers, including communication researchers, know how to detect outliers, analyze them to determine why they exist, and understand their impact. This entry examines the detection and analysis of outliers and outlier labeling methods.
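
A minimal sketch of how a single extreme value can inflate variance and shift an estimate is given here. The scores are hypothetical, and the value 95 stands in for something like a data-entry error; none of these numbers come from the entry.

```python
import statistics

# Hypothetical scores; 95 is an assumed data-entry error.
scores = [12, 14, 15, 15, 16, 17, 18]
scores_with_outlier = scores + [95]

def summarize(values):
    """Return the mean, median, and sample variance of the values."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "variance": statistics.variance(values),
    }

clean = summarize(scores)
contaminated = summarize(scores_with_outlier)

for name in ("mean", "median", "variance"):
    print(f"{name}: {clean[name]:.2f} -> {contaminated[name]:.2f}")
```

In this toy case the mean and the variance change substantially while the median barely moves, which is one way to see why estimates and tests built on means and variances are sensitive to outliers.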

Outlier Detection and Analysis

Outlier detection methods create probabilistic, statistical, or algorithmic models that characterize the normal behavior of the data and then, based on that characterization, identify which values should be considered outliers. Researchers must determine which type of model to use for outlier detection, a choice influenced by several factors, including data type, data size, and the need for interpretability. Interpretability is important because it can explain why a data point is an outlier, giving the researcher valuable information about how to handle it. The choice of the underlying data model is extremely important because outliers can only be determined relative to the underlying distribution of the data. If the data are not modeled correctly, data points will be erroneously characterized either as outliers or as normal parts of the data set.
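
The dependence of outlier labels on the assumed data model can be illustrated with a short sketch. It applies one simple statistical rule (flag values whose z-score exceeds a cutoff) under two different assumptions: that the raw values are roughly normal, and that their logarithms are. The z-score rule, the cutoff of 2.5, the function names, and the right-skewed data are all illustrative assumptions rather than methods taken from the entry.

```python
import math
import statistics

def flag_by_z(values, threshold=2.5):
    """Flag values whose absolute z-score exceeds the threshold.

    Assumes the values are roughly normal; the threshold is illustrative.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) / sd > threshold]

def flag_by_log_z(values, threshold=2.5):
    """Apply the same rule on the log scale (a log-normal assumption).

    Requires strictly positive values.
    """
    logs = [math.log(v) for v in values]
    mean = statistics.mean(logs)
    sd = statistics.stdev(logs)
    return [v for v, lv in zip(values, logs) if abs(lv - mean) / sd > threshold]

# Hypothetical right-skewed data in which each value doubles the previous one.
data = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]

normal_model_flags = flag_by_z(data)
lognormal_model_flags = flag_by_log_z(data)

print("Flagged under a normal model:", normal_model_flags)
print("Flagged under a log-normal model:", lognormal_model_flags)
```

On these data the normal model labels the largest value an outlier, while the log-normal model treats it as an ordinary part of the distribution, so the same point is an outlier under one model and not under the other.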

...
