
Hypothesis testing is a formal procedure often employed in scientific research to test theories or models. To formulate such a test, a theory is usually put forward, either because it is believed to be true or because it is to be used as a basis for argument, although it has not yet been proved. In effect, the researcher bets in advance of the experiment that the results will agree with his or her theory and cannot be accounted for by the chance variations involved in sampling. Nowadays, there are at least four schools of thought on inferential significance testing: Fisherian, Neyman–Pearson, Bayesian, and likelihood inference. It would be fair to argue that none of these inferential statistical methods is without controversy.

Before discussing these controversies, this entry provides a brief history and describes classical hypothesis testing. Finally, it discusses future research, including the use of nonparametric tests.

Brief History

Statistical significance testing has been dated to 1900, with Karl Pearson's publication of his chi-square “goodness-of-fit” test comparing observed data with a theoretically expected curve. In 1908, William Sealy Gosset (writing under the pen name Student) set the stage for “classical” hypothesis testing. He introduced the notions of the test statistic and its p value (the probability of obtaining a test statistic at least as extreme as the one actually observed, given that the null hypothesis is true). Ronald Aylmer Fisher is credited with overemphasizing the role of tests of significance based on the 5% and 1% quantiles of test statistics in his 1925 publications. In a series of articles written between 1924 and 1934, Jerzy Neyman and Egon Pearson developed the statistical hypothesis testing procedures used in every statistics textbook today.

Classical Hypothesis Testing

There are two types of statistical hypotheses, the null and the alternative, each concerning the numerical value of a specific population parameter. The null hypothesis is generally the opposite of the research hypothesis, which is what the investigator truly believes and wants to demonstrate. The test is designed to determine whether the evidence in favor of the alternative hypothesis reaches the required level of statistical significance, which would justify accepting the alternative hypothesis in preference to the null hypothesis. The decision rule is based on the sampling distribution of the test statistic under the null hypothesis and on the selected significance level. The actual decision rests on that rule and on the result of a random sample drawn from the relevant population: the null hypothesis is either rejected or not rejected.

If the outcome of the test does not correspond with the actual state of nature, an error has occurred. There are two kinds of error, classified according to which hypothesis has been incorrectly identified as the true state of nature: a Type I error occurs when a true null hypothesis is wrongly rejected, and a Type II error occurs when a false null hypothesis is not rejected. It is well known that the probabilities of the two errors cannot be minimized simultaneously. The Neyman–Pearson decision-theoretic approach to finding the best test is to place a small bound on the Type I error probability and then minimize the Type II error probability. There are many statistical tests to choose from, and selecting the right one for a particular set of data can be an overwhelming task.
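As a concrete sketch of this decision procedure, the following Python example carries out a one-sample t test of a null hypothesis that a population mean equals 50 against a two-sided alternative. The sample values, the hypothesized mean, and the 5% significance level are arbitrary illustrative choices, not material from this entry.

    # Sketch of a classical hypothesis test: one-sample t test of
    # H0: mu = 50 against the two-sided alternative H1: mu != 50.
    # The data, the hypothesized mean, and alpha are illustrative only.
    from scipy import stats

    sample = [51.2, 49.8, 52.4, 50.9, 53.1, 48.7, 52.0, 51.5]  # random sample from the population
    mu_0 = 50.0   # population mean claimed by the null hypothesis
    alpha = 0.05  # bound on the Type I error probability (significance level)

    # Test statistic and its p value under the null sampling distribution (t with n - 1 df)
    t_stat, p_value = stats.ttest_1samp(sample, popmean=mu_0)

    print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
    if p_value < alpha:
        print("Reject the null hypothesis at the 5% significance level.")
    else:
        print("Fail to reject the null hypothesis at the 5% significance level.")

Rejecting the null hypothesis only when the p value falls below the chosen significance level keeps the Type I error probability at or below that bound, in line with the Neyman–Pearson approach described above.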

...
