
Data trimming is the process of removing or excluding extreme values, or outliers, from a data set. It is used for a number of reasons and can be accomplished using various approaches. As social scientists, communication researchers often work with data sets in which outliers must be removed so that a statistic is not distorted by a handful of extreme values. It is important to understand the impact outliers can have on data and the approaches available to eliminate or censor these extreme values without compromising the data set. This entry provides a detailed explanation of data trimming, including a brief review of alternative terminology, an overview of the most common statistical functions in which data trimming is used, an overview of a method (i.e., Winsorizing) related to but distinct from data trimming, and several research examples that demonstrate the use of data trimming.

Defining Data Trimming

Data trimming is applied to data sets when dealing with outliers. Outliers are extreme values that distort the distribution of a data set. Trimming extreme values can change the mean substantially, but it has little effect on the median, which is already resistant to outliers. There is no single accepted standard for dealing with outliers in statistical processes. Statistical scholars John W. Tukey and Peter J. Huber have recommended excluding the most extreme 10% of data points in social science research: the lowest 5% and the highest 5% of values. However, this is only a recommendation, and more conservative approaches to outlier exclusion have also been shown to improve results. How outliers are addressed is essentially dictated by the data set, the research goals, and the statistical functions being applied. Data trimming is integral to most data analysis efforts and often follows from checking the data for errors prior to analysis. Outliers or extreme values are often considered errors and can be remedied by going back over the collected data or questionnaires individually to discard incomplete or incorrect response items.
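The 5%-per-tail recommendation described above can be illustrated with a minimal sketch. The function name `trim`, its `percent_per_tail` parameter, and the scores are hypothetical and chosen only for illustration; the sketch assumes that trimming means sorting the values and dropping the same number of observations from each end.

```python
from statistics import mean, median


def trim(values, percent_per_tail=5):
    """Return a sorted copy of values with percent_per_tail% cut from each end.

    Hypothetical helper for illustration; percent_per_tail is a whole number.
    """
    if not 0 <= percent_per_tail < 50:
        raise ValueError("percent_per_tail must be between 0 and 49")
    ordered = sorted(values)
    # Number of observations dropped from each tail (rounded down).
    k = len(ordered) * percent_per_tail // 100
    return ordered[k:len(ordered) - k]


# Hypothetical data: 20 scores, one of which (250) is an extreme value.
scores = [12, 14, 15, 15, 16, 16, 17, 17, 18, 18,
          19, 19, 20, 20, 21, 21, 22, 23, 24, 250]

trimmed = trim(scores, percent_per_tail=5)  # drops the lowest and highest score

print("mean before:  ", mean(scores))
print("mean after:   ", mean(trimmed))
print("median before:", median(scores))
print("median after: ", median(trimmed))
```

With these made-up scores, the mean shifts noticeably once the two most extreme points are dropped, while the median stays the same, which is the contrast between the two statistics drawn in the paragraph above.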

Alternative Terminology Applied to Data Trimming

A variety of alternative terms are used to refer to data trimming, including trimmed estimator, truncation, and truncated distribution. These terms are used interchangeably and likewise indicate that extreme data have been removed or excluded from a data set. However, some terms that are used interchangeably with data trimming actually describe distinctly different processes. Two terms that are sometimes confused with data trimming are statistical censoring and Winsorizing. Statistical censoring is similar to data trimming, but there is one key distinction. In data trimming, outliers are removed prior to analysis. In statistical censoring, the outliers are also removed, but their removal is documented in the research report, which explicitly notes that outliers were removed from the data set and which bound they exceeded, upper or lower. A more detailed explanation of Winsorizing and how it compares to data trimming is provided later in this entry.

Common Statistical Functions That Can Require Data Trimming

When a statistic is sensitive to extreme values or outliers, data trimming can be both useful and necessary to ensure the integrity of the estimate. The statistical functions that most often require data trimming are those that are affected by outliers; these include analysis of variance (ANOVA), multiple regression (MR), t-tests, and correlations. These are common statistical functions, and quantitative social science research often uses one or more of them in data analysis. More advanced statistical research efforts such as meta-analysis are also affected by outliers and become more complicated due to substantially increased sample sizes. To limit this effect, a general rule of thumb in meta-analysis is to trim only the most extreme outliers rather than a set percentage on both ends of the distribution.
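As a rough illustration of how sensitive a correlation can be to one extreme value, the sketch below computes Pearson's r on hypothetical paired data with and without a single outlier. The helper names, the data, and the rule used to pick the point to drop (the observation whose y value lies farthest from the median of y) are assumptions made for this example, not a procedure taken from this entry.

```python
from math import sqrt
from statistics import mean, median


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def drop_most_extreme(xs, ys):
    """Drop the one pair whose y is farthest from the median of y.

    Hypothetical rule, used only to illustrate trimming a single outlier.
    """
    center = median(ys)
    worst = max(range(len(ys)), key=lambda i: abs(ys[i] - center))
    keep = [i for i in range(len(ys)) if i != worst]
    return [xs[i] for i in keep], [ys[i] for i in keep]


# Hypothetical data: a roughly linear relationship plus one extreme pair.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
y = [2, 4, 5, 4, 6, 7, 8, 9, 10, 12, -40]

r_all = pearson_r(x, y)
x_trimmed, y_trimmed = drop_most_extreme(x, y)
r_trimmed = pearson_r(x_trimmed, y_trimmed)

print("r with outlier:   ", round(r_all, 3))
print("r without outlier:", round(r_trimmed, 3))
```

In this made-up data set a single point is enough to turn a strong positive correlation into a negative one, and removing only that point restores it, without cutting a fixed percentage from both ends.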

...
