
A histogram is a graph that uses bars to display count or frequency data. The independent variable consists of interval- or ratio-level data and is usually displayed on the abscissa (x-axis), with the frequency data on the ordinate (y-axis) and the height of each bar proportional to the count. If the data for the independent variable are put into “bins” (e.g., ages 0–4, 5–9, 10–14, etc.), then the width of the bar is proportional to the width of the bin. Most often, the bins are of equal size, but this is not a requirement; when the bins are unequal, it is the area of the bar, rather than its height alone, that represents the count. A histogram differs from a bar chart in two ways. First, the independent variable in a bar chart consists of either nominal (i.e., named, unordered categories, such as religious affiliation) or ordinal (ranks or ordered categories, such as stage of cancer) data. Second, to emphasize the fact that the independent variable is not continuous, the bars in a bar chart are separated from one another, whereas they abut each other in a histogram. After a bit of history, this entry describes how to create a histogram and then discusses alternatives to histograms.

A Bit of History

The term histogram was first used by Karl Pearson in 1895, but even then, he referred to it as a “common form of graphical representation,” implying that the technique itself was considerably older. Bar charts (along with pie charts and line graphs) were introduced over a century earlier by William Playfair, but he does not seem to have used histograms in his books.

Creating a Histogram

Consider the hypothetical data in Table 1, which tabulates the number of hours of television watched each week by 100 respondents. What is immediately obvious is that the unsorted values are impossible to comprehend. The first step in trying to make sense of these data is to put them in rank order, from lowest to highest. Doing so shows that the lowest value is 0 and the highest is 64, but it does not yield much more in terms of understanding. Plotting the raw data, with one bar for each possible number of hours, would result in several problems. First, many of the bars would have heights of zero (e.g., nobody reported watching for one, two, or three hours a week), and most of the other bars would be only one or two units high (i.e., the number of people reporting that specific value). This leads to the second problem: such a flat, gappy plot makes it difficult to discern any pattern. Finally, the x-axis would have many values, again interfering with comprehension.
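The problem with plotting raw values can be illustrated with a minimal Python sketch. The `hours` list below is a small invented sample, not the data from Table 1, and the helper name `raw_frequencies` is hypothetical; the sketch assumes whole-hour values and simply counts how many one-hour-wide bars would be empty.

```python
from collections import Counter

# Invented sample of weekly viewing hours (NOT the values from Table 1).
hours = [0, 4, 7, 7, 12, 15, 15, 21, 22, 30, 33, 41, 48, 64]


def raw_frequencies(values):
    """Return a count for every whole number from min to max, zeros included."""
    counts = Counter(values)
    return {v: counts.get(v, 0) for v in range(min(values), max(values) + 1)}


freq = raw_frequencies(hours)
empty_bars = sum(1 for count in freq.values() if count == 0)
tallest_bar = max(freq.values())

# One bar per possible value: most are empty and none is tall.
print(f"{len(freq)} bars, {empty_bars} empty, tallest is {tallest_bar}")
```

With this sample, most of the bars have a height of zero and none rises above two, which is the pattern-obscuring result the paragraph describes.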

Table 1 Fictitious Data on How Many Hours of Television Are Watched Each Week by 100 People

Table 2 The Data in Table 1 Grouped into Bins

The solution is to group the data into mutually exclusive and collectively exhaustive classes, or bins. The issue is how many bins to use. Most often, the answer is somewhere between 6 and 15, with the actual number depending on two considerations. The first is that the bin width should be an easily comprehended number. Thus, bin widths of 2, 5, 10, or 20 units are recommended, whereas those of 3, 7, or 9 are not. The second consideration is esthetics; the graph should get the point across and not look too cluttered.
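The two rules of thumb above can be sketched in Python. This is only an illustration under stated assumptions: the function names `candidate_widths` and `bin_counts` are hypothetical, the values are assumed to be whole hours, the first bin is assumed to start at a multiple of the bin width, and the `hours` list is an invented sample rather than the data from Table 1. Only the range 0 to 64 comes from the text.

```python
def candidate_widths(lowest, highest, widths=(2, 5, 10, 20),
                     min_bins=6, max_bins=15):
    """Map each easily read width to its bin count, keeping 6 to 15 bins."""
    result = {}
    for width in widths:
        start = (lowest // width) * width        # first bin starts on a multiple
        n_bins = (highest - start) // width + 1  # bins needed to reach highest
        if min_bins <= n_bins <= max_bins:
            result[width] = n_bins
    return result


def bin_counts(values, width):
    """Group whole-number values into equal, non-overlapping bins."""
    start = (min(values) // width) * width
    n_bins = (max(values) - start) // width + 1
    counts = [0] * n_bins
    for value in values:
        counts[(value - start) // width] += 1    # each value lands in one bin
    labels = [f"{start + i * width}-{start + (i + 1) * width - 1}"
              for i in range(n_bins)]
    return list(zip(labels, counts))


# Range taken from the text: lowest value 0, highest value 64.
print(candidate_widths(0, 64))

# Invented sample of weekly viewing hours (NOT the values from Table 1).
hours = [0, 4, 7, 7, 12, 15, 15, 21, 22, 30, 33, 41, 48, 64]
for label, count in bin_counts(hours, 10):
    print(f"{label:>6} | {'#' * count}")
```

For a range of 0 to 64, only widths of 5 and 10 fall within the 6-to-15 guideline in this sketch; choosing between them is the esthetic judgment the paragraph mentions.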

...
