Descriptive Statistics

Statistical approaches fall into two major divisions: descriptive and inferential statistics. As the name implies, descriptive statistics entail describing, organizing, and summarizing information. Data are described by graphical methods, numerical indices, and tables. Descriptive statistics also often include a commentary discussing the data structure and any emergent patterns. In contrast, inferential statistics make inferences or estimations about a population from a sample through hypothesis testing and confidence intervals. Inferential statistics draw on probability theory to reach conclusions about a variable beyond the data collected and to determine the relative certainty of those conclusions. Statistical methodology, therefore, encompasses descriptive statistics that summarize data and inferential statistics that generalize from a sample to the larger population. This entry focuses on descriptive statistics: it states their primary goal, describes the three main types (measures of central tendency, measures of dispersion, and measures of distribution shape), and reviews how graphics can illustrate these statistics.

The primary goal of descriptive statistics is to maximize information and communication effectiveness while minimizing the loss of information. Through a few quantitative values and/or graphical summaries, descriptive statistics reduce large data sets into a simpler, more manageable form. The challenge then is to determine which statistics best summarize the major characteristics of the data set, yet avoid misleading results.

Selecting the proper descriptive statistic is largely dependent on the data characteristics and underlying research goals. The data measurement level (nominal, ordinal, interval, or ratio) determines the types of mathematical operations possible. Calculation methods also differ depending on whether the data are grouped (weighted) or ungrouped (unweighted). In spatial (geographic) data sets, the statistics employed and their interpretation are dependent on the study area boundaries, spatial resolution, and aggregation level (e.g., county, state). Regardless of the data properties, descriptive statistics cover three main types: (1) measures of central tendency, (2) measures of dispersion, and (3) measures of distribution shape.

Measures of central tendency indicate the middle or typical data value. The three most common measures of central tendency are the mean, median, and mode. The mean or average is the sum of all the values divided by the number of observations; hence, the mean can be strongly influenced by outliers, isolated values that are exceptionally large or small. In contrast, the median is based on the middle position within a set of ranked values, where the same number of data points lie above and below the middle value. The mode identifies the most frequent observation in a set of ungrouped data and is most appropriate for data sets with multiple tied observations. A less common measure of central tendency is the midrange, the average of the maximum and minimum values. The “best” measure of central tendency depends on the characteristics of the data distribution (e.g., relative symmetry, outliers) and inferential statistics requirements.
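As a minimal sketch of these measures, the following Python example uses the standard library's statistics module on a small made-up data set. The data values and the helper names midrange and grouped_mean are assumptions introduced for illustration only. Treating grouped data as a table of values and their frequencies is likewise just one simple reading of "grouped (weighted)".

```python
from collections import Counter
from statistics import mean, median, multimode


def midrange(values):
    """Average of the minimum and maximum values (hypothetical helper)."""
    return (min(values) + max(values)) / 2


def grouped_mean(frequencies):
    """Frequency-weighted mean for a {value: count} table (hypothetical helper)."""
    total = sum(value * count for value, count in frequencies.items())
    return total / sum(frequencies.values())


# Made-up observations; 40 is deliberately included as an outlier.
data = [2, 3, 3, 4, 5, 6, 40]
without_outlier = data[:-1]

print("mean:", mean(data))                # pulled upward by the outlier
print("median:", median(data))            # middle of the ranked values
print("mode(s):", multimode(data))        # most frequent value(s), ties kept
print("midrange:", midrange(data))        # depends only on the two extremes
print("grouped mean:", grouped_mean(Counter(data)))

# Dropping the outlier moves the mean much more than the median.
print("mean without outlier:", mean(without_outlier))
print("median without outlier:", median(without_outlier))
```

In this made-up sample, the mean and midrange shift sharply once the outlier is removed, while the median changes only slightly, which is the sensitivity the paragraph above describes.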

Measures of dispersion focus on the spread or variability in the data. The simplest measure is the range (not to be confused with midrange), the difference between the maximum and minimum values. If outliers are present, then the range highlights only the data extremes, and other methods that show the amount of clustering or spread are needed. Quantiles divide the ranked observations into groups of equal size, usually quartiles (quarters), quintiles (fifths), or deciles (tenths); thus, specific intervals within the distribution can be examined. The most common dispersion measure is the standard deviation, which builds on the least squares property of the mean (it is based on the squared differences between each data value and the mean) and accounts for variations in sample size. In general, a larger standard deviation indicates greater variability in the data set, and a smaller one indicates less. The variance, the square of the standard deviation, is a frequent component in many inferential statistics applications; however, variance is not usually reported alone because its values are in squared units, can be extremely large, and are more difficult to interpret.
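The dispersion measures above can be sketched with Python's standard statistics module. The data set is made up for illustration. Note that statistics.quantiles offers more than one interpolation method, so the quartile cut points shown are one convention among several and not the only valid answer.

```python
from statistics import pstdev, pvariance, quantiles, stdev, variance

# Made-up observations, chosen so the arithmetic is easy to follow (mean = 5).
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: difference between the maximum and minimum values.
value_range = max(data) - min(data)

# Quartiles: three cut points splitting the ranked data into four groups
# (uses the module's default "exclusive" method).
quartile_cuts = quantiles(data, n=4)

# Standard deviation and variance, treating the data as a sample
# (sum of squared deviations divided by n - 1).
sample_sd = stdev(data)
sample_var = variance(data)

# The same measures, treating the data as a whole population (divided by n).
population_sd = pstdev(data)
population_var = pvariance(data)

print("range:", value_range)
print("quartile cut points:", quartile_cuts)
print("sample SD / variance:", sample_sd, sample_var)
print("population SD / variance:", population_sd, population_var)
```

Here the sum of squared deviations from the mean is 32, so the population variance is 32 / 8 = 4 and the sample variance is 32 / 7. The variance is the square of the corresponding standard deviation in both cases.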

...
