Bootstrapping is an approach to estimating properties of statistics, such as sampling variances, standard errors, and confidence intervals, that does not rely on a particular assumption about the shape of the distribution around a given statistic. Bootstrapping is therefore said to be a nonparametric approach to statistical inference. It can be particularly useful when the researcher does not know the theoretical distribution of a given test statistic or when no such distribution exists.

Bootstrap methods for evaluating statistics rely on data-based simulations wherein the observed data stand in for the population of interest. Bootstrapping is a computationally intensive method that uses a computer to repeatedly “resample” from an original sample, so measures of uncertainty around a statistic that are obtained via the bootstrap can be thought of as being computed from samples of a given sample. It is through this resampling (with replacement) that bootstrap methods allow researchers to make statistical inferences and engage in hypothesis testing. Because bootstrap methods do not force the researcher to make strong distributional assumptions (e.g., that a sampling distribution is normally distributed), they are often considered to be more powerful and more flexible than traditional approaches in many applications, such as when a sample comes from a non-normal distribution or contains high-leverage outliers.
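The resampling idea described above can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: the function names, the number of resamples, the fixed random seed, and the data values are all assumptions introduced here, and the interval shown is the simple percentile interval, which is only one of several bootstrap intervals in use.

```python
import random
import statistics


def bootstrap_means(sample, n_resamples=2000, seed=0):
    """Return the mean of each bootstrap resample of `sample`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    means = []
    for _ in range(n_resamples):
        # Draw n observations WITH replacement from the observed data,
        # which stand in for the population of interest.
        resample = rng.choices(sample, k=n)
        means.append(statistics.fmean(resample))
    return means


def bootstrap_se_and_ci(sample, n_resamples=2000, alpha=0.05, seed=0):
    """Bootstrap standard error and percentile interval for the mean."""
    means = sorted(bootstrap_means(sample, n_resamples, seed))
    # The spread of the resampled means estimates the standard error.
    se = statistics.stdev(means)
    # Percentile interval: cut off alpha/2 of the resampled means in each tail.
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return se, (lower, upper)


# Hypothetical, made-up observations used only for illustration.
data = [2.1, 3.4, 3.9, 4.0, 4.2, 4.4, 4.5, 4.7, 4.8, 4.9]
se, (ci_low, ci_high) = bootstrap_se_and_ci(data)
print(f"bootstrap SE = {se:.3f}, 95% percentile CI = ({ci_low:.3f}, {ci_high:.3f})")
```

No assumption about the shape of the sampling distribution enters the calculation; the standard error and the interval are read directly off the collection of resampled means.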

This entry details the steps that are commonly involved in carrying out the nonparametric bootstrap, complete with a pair of motivating examples. It also discusses how bootstrapping can be used instead of the Sobel test, and other parametric tests, in making determinations about whether there exists a significant indirect effect of an independent variable on a given outcome of interest. Finally, this entry touches upon some of the other applications of bootstrapping, and details many of the advantages that such an approach has over other methods.

Bootstrapping: Origins and Execution

Although bootstrapping can be used to recover an estimate of any parameter (such as a population mean), bootstrap methods are most commonly used to calculate measures of the “spread” around a given statistic that can be used in hypothesis testing. The term bootstrapping has its origins in the phrase “to pull oneself up by one’s bootstraps.” This phrase is commonly uttered in reference to an impossible task, which is what simulation-based approaches to statistical inference effectively were before computers became capable of handling such demands. The process of “booting up” a computer is similarly derived from this phrase, and refers to the early days of computing when starting a machine sometimes involved an iterative process of feeding it progressively more complex lines of code.

In understanding how bootstrap methods differ from more traditional, parametric approaches to statistical inference, it is perhaps best to begin with a simple example. Suppose that a professor wants to know whether the students in an upper-division communication course that she taught in the spring semester performed better, on average, than students in the same course the previous fall. Because it is an upper-division course, the final grades in both courses exhibited a negative skew, meaning that more students received an A or a B than a D or an F. While this situation is good for the students, and might net the professor higher teaching evaluations, it creates something of a dilemma for her. If the grade distribution in both classes approximated a normal distribution (symmetrical, with equal area under the curve on either side of the mean), then testing the null hypothesis that the grade averages for the two classes are not statistically different from one another would be a straightforward matter, as the known properties of the normal distribution allow the professor to calculate measures of variance around the averages from each class. Because the distribution of grades in each class is skewed, however, standard errors around the average grade in each class that are calculated under the assumption that the sampling distributions are normally distributed (and therefore symmetric) are likely to be wrong.
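One way the professor's comparison might be approached with the bootstrap is sketched below in Python. This is an illustrative sketch under stated assumptions, not the procedure this entry goes on to describe: the grade lists are invented, the function name and its parameters are hypothetical, and the percentile interval for the difference in means is just one of several possible bootstrap approaches.

```python
import random
import statistics


def bootstrap_mean_difference(group_a, group_b, n_resamples=5000,
                              alpha=0.05, seed=0):
    """Percentile bootstrap interval for mean(group_a) - mean(group_b)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = []
    for _ in range(n_resamples):
        # Resample each class separately, with replacement,
        # keeping each class at its original size.
        a_star = rng.choices(group_a, k=len(group_a))
        b_star = rng.choices(group_b, k=len(group_b))
        diffs.append(statistics.fmean(a_star) - statistics.fmean(b_star))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_resamples)]
    upper = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    observed = statistics.fmean(group_a) - statistics.fmean(group_b)
    # stdev of the resampled differences is the bootstrap standard error.
    return observed, statistics.stdev(diffs), (lower, upper)


# Made-up final grades (0-100 scale) with a negative skew: most scores are
# high, with a few low scores in the left tail. Purely illustrative.
spring = [95, 93, 92, 91, 90, 89, 88, 88, 87, 86, 85, 84, 82, 78, 71, 58]
fall = [94, 92, 90, 89, 88, 87, 86, 85, 84, 83, 81, 80, 77, 72, 65, 52]

observed, se, (lower, upper) = bootstrap_mean_difference(spring, fall)
print(f"observed difference = {observed:.2f}, bootstrap SE = {se:.2f}")
print(f"95% percentile interval = ({lower:.2f}, {upper:.2f})")
```

Under this sketch, an interval that excludes zero would be read as evidence that the two class averages differ, while an interval that includes zero would not support that conclusion. The standard error is obtained without assuming that either grade distribution is normal.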

...
