
The box and whisker plot was developed by John Tukey to summarize visually the important characteristics of a distribution of scores. The five descriptive statistics included on a box plot are the minimum and the maximum scores (i.e., the extremes of the distribution), the median (i.e., the middle score), and the 1st (Q1) and 3rd (Q3) quartiles. Together these statistics are useful in visually summarizing, understanding, and comparing many types of distributions.
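
As a minimal illustration, the five statistics can be computed with Python's standard `statistics` module. The function name `five_number_summary` and the scores are hypothetical. The quartiles use the module's default ("exclusive") method, which is only one of several quartile definitions in use.

```python
from statistics import median, quantiles

def five_number_summary(scores):
    """Return (minimum, Q1, median, Q3, maximum) for a sequence of scores.

    Quartiles use the standard library's default 'exclusive' method;
    other software may define quartiles differently.
    """
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    q1, _, q3 = quantiles(scores, n=4)   # three cut points: Q1, Q2, Q3
    return min(scores), q1, median(scores), q3, max(scores)

iq_scores = [85, 92, 97, 100, 103, 104, 110, 118, 131]   # hypothetical scores
print(five_number_summary(iq_scores))   # → (85, 94.5, 103, 114.0, 131)
```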

In a box plot, the crossbar marks the median, and the length (i.e., height) of the box shows the interquartile range (IQR = Q3 − Q1), which spans the central 50% of the data values. Each whisker extends from the box to the most extreme score that lies within 1.5 IQRs of it: the lower whisker to the smallest score no lower than Q1 − 1.5 IQR, and the upper whisker to the largest score no higher than Q3 + 1.5 IQR.
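
The whisker rule can be sketched as follows, assuming the standard library's default quartile method. `whisker_ends` and the sample scores are illustrative only. Each whisker stops at the most extreme observed score inside the fences Q1 − 1.5 IQR and Q3 + 1.5 IQR.

```python
from statistics import quantiles

def whisker_ends(scores, k=1.5):
    """Return (lower_end, upper_end): the most extreme scores within k IQRs of the box."""
    q1, _, q3 = quantiles(scores, n=4)
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - k * iqr, q3 + k * iqr
    lower_end = min(x for x in scores if x >= lower_fence)
    upper_end = max(x for x in scores if x <= upper_fence)
    # Safeguard for degenerate data: never let a whisker end inside the box.
    return min(lower_end, q1), max(upper_end, q3)

print(whisker_ends([55, 92, 97, 100, 103, 104, 110, 118, 125]))  # → (92, 125)
```

Here Q1 = 94.5 and the IQR is 19.5, so the lower fence is 65.25; the score of 55 lies beyond it, the lower whisker stops at 92, and 55 would be drawn as an individual point.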

Box plots are particularly useful for investigating the symmetry of a distribution and for detecting inconsistent values and outliers. Outliers, defined here as scores more than 1.5 IQRs below Q1 or above Q3, are plotted individually on a box plot. In a normal distribution, about 0.7% of the scores (i.e., fewer than 1%) fall beyond the whiskers. The symmetry of a distribution is indicated by where the median divides the box (in a symmetrical distribution, the median lies close to the center of the box) and by the lengths of the whiskers (in a distribution with symmetrical tails, the whiskers are of similar length).
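
A sketch of the outlier rule, and of the share of a normal population that falls beyond the whiskers, follows. `outliers` and `normal_outlier_rate` are hypothetical helper names. The rate is computed from standard normal quantiles with `statistics.NormalDist`, so it describes an idealized population rather than any particular sample.

```python
from statistics import NormalDist, quantiles

def outliers(scores, k=1.5):
    """Return the scores lying more than k IQRs below Q1 or above Q3."""
    q1, _, q3 = quantiles(scores, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in scores if x < low or x > high]

def normal_outlier_rate(k=1.5):
    """Proportion of a normal population lying beyond the k-IQR fences."""
    z = NormalDist()                       # standard normal; location and scale cancel out
    q1, q3 = z.inv_cdf(0.25), z.inv_cdf(0.75)
    upper_fence = q3 + k * (q3 - q1)
    return 2 * (1 - z.cdf(upper_fence))    # both tails, by symmetry

print(outliers([55, 92, 97, 100, 103, 104, 110, 118, 125]))  # → [55]
print(round(normal_outlier_rate(), 4))                        # → 0.007
```

For a normal distribution the upper fence sits about 2.70 standard deviations above the mean, so roughly 0.35% of scores lie beyond each whisker.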

Figure 1 summarizes the descriptive statistics and displays the box plots for 100 randomly selected IQ scores and for the subset of all scores that are greater than 99. Figure 1 shows that variable “Random” is roughly symmetrical, with three low IQ-score outliers. Variable “>99” is slightly positively skewed (the median is closer to Q1 than to Q3, the upper whisker is longer than the lower whisker, and there are no outliers).
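
The two visual cues used to judge symmetry can also be expressed numerically. This is a hypothetical sketch, not a standard index: `median_position` is where the median cuts the box (0 at Q1, 1 at Q3), and `whisker_ratio` compares the upper and lower whisker lengths, with quartiles and 1.5 IQR whiskers computed using the standard library's default method. The sample scores are made up.

```python
from statistics import quantiles

def symmetry_indicators(scores, k=1.5):
    """Return (median_position, whisker_ratio) for a box plot of `scores`.

    median_position: where the median cuts the box, from 0 (at Q1) to 1 (at Q3).
        About 0.5 suggests symmetry; below 0.5 (median nearer Q1) suggests positive skew.
    whisker_ratio: upper whisker length divided by lower whisker length.
        About 1 suggests symmetrical tails; above 1 suggests a longer upper tail.
    """
    q1, q2, q3 = quantiles(scores, n=4)   # Q2 equals the median
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("box has zero height; median position is undefined")
    lower_end = min(x for x in scores if x >= q1 - k * iqr)
    upper_end = max(x for x in scores if x <= q3 + k * iqr)
    lower_len = max(q1 - lower_end, 0)    # clamp so whiskers never point into the box
    upper_len = max(upper_end - q3, 0)
    median_position = (q2 - q1) / iqr
    whisker_ratio = upper_len / lower_len if lower_len else float("inf")
    return median_position, whisker_ratio

skewed = [100, 101, 102, 103, 105, 108, 112, 118, 126]   # hypothetical scores
print([round(v, 2) for v in symmetry_indicators(skewed)])  # → [0.26, 7.33]
```

For these scores the median sits about a quarter of the way up the box and the upper whisker is several times longer than the lower one, the same direction of skew described for the variable “>99”.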


Figure 1 Box Plots of 100 Normally Distributed IQ Scores and a Subset of All IQ Scores Greater Than 99

The box plot is a classic exploratory data analysis tool that is easy to construct and interpret. Because its major components are the median and the quartiles, the box is resistant to changes in the data: up to 25% of the scores can change, even drastically, with little effect on it, although the whiskers and the individually plotted outliers may change. When interpreting a box plot, note the following limitations:
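
The resistance of the box can be demonstrated with a small, made-up example: replacing the top 20% of 20 scores with wildly larger values leaves Q1, the median, and Q3 unchanged, while the mean roughly doubles. `box` is a hypothetical helper using the standard library's default quartile method.

```python
from statistics import fmean, median, quantiles

def box(scores):
    """Return the box components (Q1, median, Q3), default 'exclusive' quartiles."""
    q1, _, q3 = quantiles(scores, n=4)
    return q1, median(scores), q3

original = list(range(100, 120))                   # 20 hypothetical scores
corrupted = original[:16] + [500, 600, 700, 800]   # top 4 (20%) replaced by wild values

print(box(original))    # → (104.25, 109.5, 114.75)
print(box(corrupted))   # → (104.25, 109.5, 114.75)
print(fmean(original), fmean(corrupted))  # → 109.5 216.0
```

The four replaced scores would, however, now appear as individually plotted outliers, which is why the whiskers and outlier points are less resistant than the box.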

  • Quartiles (also called “hinges”) are defined differently in various computer programs, and these differences can produce very different-looking plots when sample sizes are small.
  • Although the 1.5 IQR value is used in most computer programs to draw the whiskers and define outliers, this value is not universal.
  • Used as an outlier test, the box plot errs on the side of flagging too many values: it identifies an excessive number of outlying values, particularly in large samples.
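  • Whiskers may be asymmetrical, even for a symmetrical distribution, when there are gaps in the data, because each whisker ends at an observed score rather than exactly at the 1.5 IQR limit.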
  • Because the length of the box indicates only the spread of the distribution, multimodality and other fine features in the center of the distribution are not conveyed readily by the plot.
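
The first two limitations can be seen in a small, made-up sample. The sketch below compares the standard library's "exclusive" and "inclusive" quartile methods with Tukey's hinges (medians of the lower and upper halves, each including the overall median when n is odd). The helper names are hypothetical, and the whisker multiplier `k` is kept as a variable because 1.5 is a convention rather than a fixed rule.

```python
from statistics import median, quantiles

def tukey_hinges(scores):
    """Tukey's hinges: medians of the lower and upper halves of the sorted data.

    When n is odd, the overall median is included in both halves.
    """
    s = sorted(scores)
    half = (len(s) + 1) // 2
    return median(s[:half]), median(s[len(s) - half:])

def quartile_variants(scores):
    """Return {definition: (Q1, Q3)} for three common quartile definitions."""
    exc = quantiles(scores, n=4, method="exclusive")
    inc = quantiles(scores, n=4, method="inclusive")
    return {
        "exclusive": (exc[0], exc[2]),
        "inclusive": (inc[0], inc[2]),
        "tukey_hinges": tukey_hinges(scores),
    }

scores = [2, 4, 7, 10, 15, 30]   # hypothetical small sample
k = 1.5                          # whisker multiplier; a convention, not a universal constant
for name, (q1, q3) in quartile_variants(scores).items():
    upper_fence = q3 + k * (q3 - q1)
    print(f"{name:<13} Q1={q1:<5} Q3={q3:<6} fence={upper_fence:<7} 30 flagged: {30 > upper_fence}")
```

With these six scores, the value 30 is flagged as an outlier only under the inclusive definition (upper fence 27.25), not under the exclusive definition (41.625) or Tukey's hinges (31.5), so the same data can yield visibly different plots.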

To address some of these limitations, other forms of the box plot have been developed. For example, the variable-width box plot indicates relative sample size through the width of the box, the notched box plot indicates a confidence interval for the median to enable comparisons between the centers of distributions, and the violin plot combines the box plot with density traces to show multimodality and other fine-grained features of the distribution.
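
A notched box plot's notch is commonly drawn at median ± c × IQR / √n, where c = 1.57 in matplotlib and 1.58 in R's `boxplot.stats`; non-overlapping notches are usually read as rough evidence (at about the 95% level) that two medians differ. The sketch below assumes that rule with c = 1.57; the helper names and the two groups of scores are hypothetical.

```python
from math import sqrt
from statistics import median, quantiles

def notch_bounds(scores, c=1.57):
    """Return (lower, upper) notch limits: median ± c * IQR / sqrt(n)."""
    q1, _, q3 = quantiles(scores, n=4)
    half_width = c * (q3 - q1) / sqrt(len(scores))
    m = median(scores)
    return m - half_width, m + half_width

def notches_overlap(a, b, c=1.57):
    """True if the notches of two groups overlap (medians not clearly different)."""
    a_lo, a_hi = notch_bounds(a, c)
    b_lo, b_hi = notch_bounds(b, c)
    return a_lo <= b_hi and b_lo <= a_hi

group_a = list(range(100, 120))   # hypothetical scores
group_b = list(range(110, 130))
print(notch_bounds(group_a))
print(notches_overlap(group_a, group_b))  # → False
```

Because the notch width shrinks with √n, the same difference between medians is more likely to produce non-overlapping notches in larger samples.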
