Standard deviation

A data set or sample is reasonably described by where the data points are centered (central tendency), how much spread or dispersion there is among the data points, and its frequency distribution (i.e., the shape of its histogram). This information allows interpretations and further calculations to be made from the data. Where the data set approximates a normal (bell-shaped) distribution, the mean is the best measure of central tendency, although medians are better central tendency measures for other distributions. If the distribution is approximately normal, then the standard deviation (SD) indicates the dispersion of the data.

It might be expected that the estimate of dispersion would be based on the average of the deviations of each data point from the mean, ignoring whether the deviations were positive or negative. However, some valuable statistical procedures (e.g., multiple linear regression, analysis of variance) rely on the square of the deviations rather than the absolute deviations. Therefore, the most commonly reported measures of dispersion—the variance and the SD—are also based on the square of the deviations.

The variance is calculated by first finding the deviation of each score ($X$) from the mean ($M$), $(X - M)$, then squaring each deviation, $(X - M)^2$, and finally adding these squared deviations together to obtain the sum of squares ($SS$):

$$SS = \sum (X - M)^2.$$

The variance of the sample is the average of this SS, obtained by dividing the SS value by the number of scores ($N$) in the sample. Thus, the variance is given by $SS \div N$. This measure of variance is very useful in many statistical calculations, but, because of the squaring of the deviations, it is out of scale with the original data. The problem of scale is addressed by taking the square root of the variance to give the SD, which compensates for the squaring of the deviations in the calculation of the variance. Thus, the SD is $\sqrt{SS \div N}$.

When working with a sample of data, these formulae tend to slightly underestimate the population SD (or variance). The correction for this underestimate is to divide by N − 1, rather than N, yielding slightly higher values. Therefore, the formulae that are usually used to calculate these statistics are:

$$\text{Standard Deviation} = \sqrt{SS \div (N - 1)} \quad \text{and} \quad \text{Variance} = SS \div (N - 1).$$
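The calculation described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names and the example scores are assumptions introduced here, and the `sample` flag simply switches between dividing by N − 1 and dividing by N.

```python
import math


def sum_of_squares(scores):
    """Sum of squared deviations of each score from the mean (SS)."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores)


def variance(scores, sample=True):
    """SS divided by N - 1 (sample=True) or by N (sample=False)."""
    n = len(scores)
    divisor = n - 1 if sample else n
    return sum_of_squares(scores) / divisor


def standard_deviation(scores, sample=True):
    """Square root of the variance, back on the scale of the data."""
    return math.sqrt(variance(scores, sample=sample))


# Illustrative scores (hypothetical, not taken from the entry); mean is 5.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

print(sum_of_squares(scores))                    # 32.0
print(standard_deviation(scores, sample=False))  # 2.0
print(round(standard_deviation(scores), 3))      # 2.138
```

With these eight scores, dividing by N − 1 raises the SD from 2.0 to roughly 2.14, which shows how the correction yields slightly higher values. Python's standard library offers the same two calculations as `statistics.pstdev` (divide by N) and `statistics.stdev` (divide by N − 1).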

The SD should always be reported for reasonably normally distributed data because it provides a good idea of the data’s variability. Figure 1 illustrates the percentage of scores expected to fall within each SD of the mean. For example, about 68% of the scores fall within one SD of the mean, about 95% within two SDs, and about 99.7% within three SDs. Data points more than three SDs from the mean are highly unlikely if the data are normally distributed, which is why these data points are often scrutinized and removed as outliers in a sample.
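The proportion of a normal distribution lying within a given number of SDs of the mean can be checked directly from the error function. The sketch below uses only Python's standard library; the helper name `proportion_within` is an assumption made for illustration.

```python
import math


def proportion_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    # For a normal variable, P(|Z| <= k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD: {proportion_within(k):.2%}")
# within 1 SD: 68.27%
# within 2 SD: 95.45%
# within 3 SD: 99.73%
```

Only about 0.27% of normally distributed scores lie more than three SDs from the mean, which is why such points attract scrutiny as possible outliers.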

Figure 1 A normal frequency distribution with standard deviations (SDs) noted

The SD also provides the means of calculating further useful statistics, including standardized (z) scores, standardized effect sizes such as Cohen’s d, and, in combination with the sample size, the standard error of the mean and confidence intervals.
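As a rough sketch of how these statistics build on the SD, the Python below computes z scores, the standard error of the mean, an approximate 95% confidence interval, and Cohen's d. Several choices here are assumptions, not taken from the entry: the function names, the use of the N − 1 sample SD throughout, the pooled-SD form of Cohen's d, and the large-sample multiplier 1.96 for the confidence interval (a t-based multiplier would be more appropriate for small samples).

```python
import math
import statistics


def z_scores(scores):
    """Express each score as a number of SDs above or below the mean."""
    m = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample SD (divides by N - 1)
    return [(x - m) / sd for x in scores]


def standard_error(scores):
    """Standard error of the mean: SD divided by the square root of N."""
    return statistics.stdev(scores) / math.sqrt(len(scores))


def confidence_interval_95(scores):
    """Approximate 95% CI for the mean (assumes a normal approximation)."""
    m = statistics.mean(scores)
    half_width = 1.96 * standard_error(scores)
    return (m - half_width, m + half_width)


def cohens_d(group_a, group_b):
    """Difference between two group means in units of the pooled SD."""
    na, nb = len(group_a), len(group_b)
    pooled_var = (
        (na - 1) * statistics.variance(group_a)
        + (nb - 1) * statistics.variance(group_b)
    ) / (na + nb - 2)
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    return diff / math.sqrt(pooled_var)


# Hypothetical groups: means of 3 and 2, each with an SD of 1.
print(cohens_d([2, 3, 4], [1, 2, 3]))  # 1.0
```

In each case the SD acts as the unit of measurement: z scores and Cohen's d divide a difference by the SD, while the standard error shrinks the SD according to the sample size.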

See also Descriptive Statistics; Distributions; Effect Size; Histograms; Normal Distribution; Variance; Z
