A confidence interval is an interval estimate of an unknown population parameter. It is constructed from a random sample from the population and is always associated with a certain confidence level, a probability usually presented as a percentage. Commonly used confidence levels include 90%, 95%, and 99%. For instance, a confidence level of 95% indicates that 95% of the time the confidence intervals will contain the population parameter. A higher confidence level usually forces a confidence interval to be wider.
Confidence intervals have a long history. Using confidence intervals in statistical inference can be traced back to the 1930s, and they are being used increasingly in research, especially in recent medical research articles. Researchers and research organizations such as the American Psychological Association suggest that confidence intervals should always be reported, because confidence intervals provide information on both the significance of a test and the variability of an estimate.
A confidence interval is a range in which an unknown population parameter is likely to be included. After independent samples are randomly selected from the same population, one confidence interval is constructed from each sample at a certain confidence level. Across repeated samples, the proportion of such confidence intervals that contain the population parameter equals the confidence level.
Suppose one is interested in estimating the proportion of bass among all types of fish in a lake. A 95% confidence interval for this proportion, [25%, 36%], is constructed on the basis of a random sample of fish in the lake. After more independent random samples of fish are selected from the lake, through the same procedure more confidence intervals are constructed. Together, all these confidence intervals will contain the true proportion of bass in the lake approximately 95% of the time.
The lower and upper boundaries of a confidence interval are called lower confidence limit and upper confidence limit, respectively. In the earlier example, 25% is the lower 95% confidence limit, and 36% is the upper 95% confidence limit.
A significance test can be achieved by constructing confidence intervals. One can conclude whether a test is significant based on the confidence interval. Suppose the null hypothesis is that the population mean μ equals 0 and the predetermined significance level is α. Let I be the constructed 100(1–α)% confidence interval for μ. If 0 is included in the interval I, then the null hypothesis is accepted; otherwise, it is rejected. For example, a researcher wants to test whether the result of an experiment has a mean of 0 at the 5% significance level. After the 95% confidence interval [–0.31, 0.21] is obtained, it can be concluded that the null hypothesis is accepted, because 0 is included in the confidence interval. If the 95% confidence interval is instead [0.11, 0.61], the mean is significantly different from 0.
For a predetermined significance level or confidence level, the ways of constructing confidence intervals are usually not unique. Shorter confidence intervals are usually better because they indicate greater power in the sense of significance testing.
One-sided significance tests can be achieved by constructing one-sided confidence intervals. Suppose a researcher is interested in the alternative hypothesis that the population mean is smaller than 0 at the significance level .025. The researcher will construct a one-sided confidence interval taking the form (–∞, b] for some constant b. Note that the width of a one-sided confidence interval is infinity. In this example, the null hypothesis is that the mean is greater than or equal to 0. If the 97.5% one-sided confidence interval is (–∞, 3.8], then the null hypothesis is accepted because 0 is included in the interval. If the 97.5% confidence interval is (–∞, –2.1] instead, then the null hypothesis is rejected because there is no overlap between (–∞, –2.1] and [0, ∞).
Confidence intervals for a population mean are constructed on the basis of the sample mean distribution.
After a random sample of size N is selected from the population, one is able to calculate the sample mean x̄, which is the average of all the observations. When the population variance σ² is known, the 100(1–α)% confidence interval is [x̄ – zα/2 · σ/√N, x̄ + zα/2 · σ/√N], which is centered at x̄ with half-length zα/2 · σ/√N. Here zα/2 is the upper α/2 quantile, meaning P(Z ≥ zα/2) = α/2, where Z is a standard normal random variable. To find zα/2, one can either refer to the standard normal distribution table or use statistical software. Nowadays most statistical software, such as Excel, R, SAS, SPSS (an IBM company, formerly called PASW® Statistics), and S-Plus, has a simple command that will do the job. For commonly used confidence intervals, the 90% confidence level corresponds to z0.05 = 1.645, 95% corresponds to z0.025 = 1.96, and 99% corresponds to z0.005 = 2.576.
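These quantiles can be computed directly rather than read from a table. The following is an illustrative sketch using the Python standard library's NormalDist class, whose inv_cdf method inverts the standard normal cumulative distribution function:

```python
from statistics import NormalDist

def z_quantile(alpha: float) -> float:
    """Upper-alpha quantile of the standard normal: P(Z >= z) = alpha."""
    return NormalDist().inv_cdf(1 - alpha)

# Common confidence levels and their upper alpha/2 quantiles.
for level, alpha in [(0.90, 0.10), (0.95, 0.05), (0.99, 0.01)]:
    z = z_quantile(alpha / 2)
    print(f"{level:.0%} confidence level: z = {z:.3f}")  # 1.645, 1.960, 2.576
```

The same quantiles are returned, to more decimal places, by R's qnorm or SPSS's IDF.NORMAL.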
The above confidence interval will shrink to a point as N goes to infinity, so the interval estimate turns into a point estimate. It can be interpreted as follows: if the whole population is taken as the sample, the sample mean is exactly the population mean.
Suppose a sample of size N is randomly selected from the population, with observations x1, x2, …, xN. The sample mean is x̄ = (x1 + x2 + … + xN)/N. The sample variance, denoted by s², is defined as s² = Σ(xi – x̄)²/(N – 1). The 100(1–α)% confidence interval is [x̄ – tN–1,α/2 · s/√N, x̄ + tN–1,α/2 · s/√N]. Here tN–1,α/2 is the upper α/2 quantile, meaning P(TN–1 ≥ tN–1,α/2) = α/2, where TN–1 follows the t distribution with N – 1 degrees of freedom. Refer to the t-distribution table or use software to get tN–1,α/2. If N is greater than 30, one can use zα/2 instead of tN–1,α/2 in the confidence interval formula because there is little difference between them for large enough N.
As an example, the students’ test scores in a class follow a normal distribution. One wants to construct a 95% confidence interval for the class average score based on an available random sample of size N = 10. The 10 scores are 69, 71, 77, 79, 82, 84, 80, 94, 78, and 67. The sample mean and the sample variance are 78.1 and 62.77, respectively. According to the t-distribution table, t9,0.025 = 2.262. The 95% confidence interval for the class average score is [72.43, 83.77].
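The calculation in this example can be reproduced with a short, illustrative Python sketch; the t quantile t9,0.025 = 2.262 is hardcoded from a t-distribution table, since the standard library does not provide t quantiles:

```python
from math import sqrt
from statistics import mean, stdev

scores = [69, 71, 77, 79, 82, 84, 80, 94, 78, 67]
N = len(scores)
x_bar = mean(scores)   # 78.1
s = stdev(scores)      # sample standard deviation, sqrt(62.77)

# Upper 0.025 quantile of the t distribution with 9 degrees of freedom,
# taken from a t-distribution table.
t_9_025 = 2.262

half_width = t_9_025 * s / sqrt(N)
ci = (x_bar - half_width, x_bar + half_width)
print(f"95% CI: [{ci[0]:.2f}, {ci[1]:.2f}]")  # [72.43, 83.77]
```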
This is a common situation one might see in practice. After obtaining a random sample of size N from the population, where it is required that N ≥ 30, the sample mean and the sample variance can be computed as in the previous subsections, denoted by x̄ and s², respectively. According to the central limit theorem, an approximate 100(1–α)% confidence interval is [x̄ – zα/2 · s/√N, x̄ + zα/2 · s/√N].
One will select a random sample from each population. Suppose the sample sizes are N1 and N2. Denote the sample means by x̄1 and x̄2 and the sample variances by s1² and s2², respectively, for the two samples. The confidence interval for the difference between the two population means is [x̄1 – x̄2 – zα/2 · √(s1²/N1 + s2²/N2), x̄1 – x̄2 + zα/2 · √(s1²/N1 + s2²/N2)].
If one believes that the two population variances are about the same, the confidence interval will be [x̄1 – x̄2 – tN1+N2–2,α/2 · sp√(1/N1 + 1/N2), x̄1 – x̄2 + tN1+N2–2,α/2 · sp√(1/N1 + 1/N2)], where sp² = [(N1 – 1)s1² + (N2 – 1)s2²]/(N1 + N2 – 2) is the pooled sample variance.
Continuing with the above example about the students’ scores, call that class Class A. If one is interested in comparing the average scores of Class A and another class, Class B, a confidence interval for the difference between the two class averages can be constructed. First, randomly select a group of students in Class B. Suppose the group size is 8, and these eight students’ scores are 68, 79, 59, 76, 80, 89, 67, and 74. The sample mean is 74, and the sample variance is 85.71 for Class B. If Class A and Class B are believed to have different population variances, then the 95% confidence interval for the difference of the average scores is [–3.98, 12.18] by the first formula provided in this subsection. If one believes these two classes have about the same population variance, then the 95% confidence interval changes to [–4.48, 12.68] by the second formula.
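Both intervals can be sketched in Python as follows. This is illustrative only: it assumes the z-based formula for the unequal-variance case and a t quantile with N1 + N2 – 2 = 16 degrees of freedom (hardcoded from a table) for the pooled case:

```python
from math import sqrt
from statistics import mean, variance

class_a = [69, 71, 77, 79, 82, 84, 80, 94, 78, 67]
class_b = [68, 79, 59, 76, 80, 89, 67, 74]
n1, n2 = len(class_a), len(class_b)
diff = mean(class_a) - mean(class_b)          # 78.1 - 74 = 4.1
v1, v2 = variance(class_a), variance(class_b)  # 62.77 and 85.71

# Unequal population variances: z-based interval.
se_unpooled = sqrt(v1 / n1 + v2 / n2)
ci_unpooled = (diff - 1.96 * se_unpooled, diff + 1.96 * se_unpooled)

# Equal population variances assumed: pooled variance, t quantile with 16 df.
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_16_025 = 2.120  # from a t-distribution table
se_pooled = sqrt(sp2 * (1 / n1 + 1 / n2))
ci_pooled = (diff - t_16_025 * se_pooled, diff + t_16_025 * se_pooled)
```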
Sometimes one may need to construct confidence intervals for a single unknown population proportion. Denote by p̂ the sample proportion, which can be obtained from a random sample from the population. The estimated standard error for the proportion is √(p̂(1 – p̂)/N). Thus the confidence interval for the unknown proportion is [p̂ – zα/2 · √(p̂(1 – p̂)/N), p̂ + zα/2 · √(p̂(1 – p̂)/N)].
This confidence interval is constructed on the basis of the normal approximation (refer to the central limit theorem for the normal approximation). The normal approximation is not appropriate when the proportion is very close to 0 or 1. A rule of thumb is that when Np̂ and N(1 – p̂) are both larger than 5, usually the normal approximation works well.
For example, a doctor wants to construct a 99% confidence interval for the chance of having a certain disease by studying patients’ x-ray slides. N = 30 x-ray slides are randomly selected, and the number of positive slides follows a distribution known as the binomial distribution. Suppose 12 of them are positive for the disease. Hence the sample proportion is p̂ = 12/30 = 0.4. Since Np̂ = 12 and N(1 – p̂) = 18 are both larger than 5, the confidence interval for the unknown proportion can be constructed using the normal approximation. The estimated standard error for p̂ is approximately 0.09. Thus the lower 99% confidence limit is 0.17 and the upper 99% confidence limit is 0.63. So the 99% confidence interval is [0.17, 0.63].
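The x-ray example can be checked with an illustrative Python sketch, again using the standard library's NormalDist for the z quantile:

```python
from math import sqrt
from statistics import NormalDist

N, positives = 30, 12
p_hat = positives / N                  # 0.4
se = sqrt(p_hat * (1 - p_hat) / N)     # about 0.09

z = NormalDist().inv_cdf(0.995)        # upper 0.005 quantile, about 2.576
ci = (p_hat - z * se, p_hat + z * se)
print(f"99% CI: [{ci[0]:.2f}, {ci[1]:.2f}]")  # [0.17, 0.63]
```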
The range of a proportion is between 0 and 1, but sometimes the constructed confidence interval for the proportion may extend beyond this range. When this happens, one should truncate the confidence interval to make the lower confidence limit 0 or the upper confidence limit 1.
Since the binomial distribution is discrete, a continuity correction of 0.5/N may be used to improve the performance of confidence intervals: 0.5/N is added to the upper limit and subtracted from the lower limit.
One can also construct confidence intervals for the proportion difference between two populations based on the normal approximation. Suppose two random samples are independently selected from the two populations, with sample sizes N1 and N2 and sample proportions p̂1 and p̂2, respectively. The estimated population proportion difference is the sample proportion difference, p̂1 – p̂2, and the estimated standard error for the proportion difference is √(p̂1(1 – p̂1)/N1 + p̂2(1 – p̂2)/N2). The confidence interval for the two-sample proportion difference is [p̂1 – p̂2 – zα/2 · √(p̂1(1 – p̂1)/N1 + p̂2(1 – p̂2)/N2), p̂1 – p̂2 + zα/2 · √(p̂1(1 – p̂1)/N1 + p̂2(1 – p̂2)/N2)].
Similar to the normal approximation for a single proportion, the approximation for the proportion difference depends on sample sizes and sample proportions. The rule of thumb is that N1p̂1, N1(1 – p̂1), N2p̂2, and N2(1 – p̂2) should each be larger than 10.
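As a sketch of this procedure with hypothetical counts (45 successes out of 120 in group 1 and 30 out of 150 in group 2, numbers invented for illustration):

```python
from math import sqrt

# Hypothetical samples, for illustration only.
n1, x1 = 120, 45
n2, x2 = 150, 30
p1, p2 = x1 / n1, x2 / n2   # 0.375 and 0.2

# Rule-of-thumb check for the normal approximation.
assert min(n1 * p1, n1 * (1 - p1), n2 * p2, n2 * (1 - p2)) > 10

se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
ci = (p1 - p2 - 1.96 * se, p1 - p2 + 1.96 * se)
# The interval excludes 0, so the two proportions differ at the 5% level.
```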
An odds ratio (OR) is a commonly used effect size for categorical outcomes, especially in health science, and is the ratio of the odds in Category 1 to the odds in Category 2. For example, one wants to find out the relationship between smoking and lung cancer. Two groups of subjects, smokers and nonsmokers, are recruited. After a few years of follow-up, N11 subjects among the smokers are diagnosed with lung cancer and N21 subjects among the nonsmokers. There are N12 and N22 subjects who do not have lung cancer among the smokers and the nonsmokers, respectively. The odds of having lung cancer among the smokers and the nonsmokers are estimated as N11/N12 and N21/N22, respectively. The OR of having lung cancer among the smokers compared with the nonsmokers is the ratio of the above two odds, which is (N11/N12)/(N21/N22) = (N11 × N22)/(N12 × N21).
For a relatively large total sample size, ln(OR) is approximately normally distributed, so the construction of the confidence interval for ln(OR) is similar to that for a normal mean. The standard error for ln(OR) is defined as se(ln(OR)) = √(1/N11 + 1/N12 + 1/N21 + 1/N22). A 95% confidence interval for ln(OR) is [ln(OR) – 1.96 × se(ln(OR)), ln(OR) + 1.96 × se(ln(OR))]. As the exponential function is monotonic, there is a one-to-one mapping between the OR and ln(OR). Thus a 95% confidence interval for the OR is [exp(ln(OR) – 1.96 × se(ln(OR))), exp(ln(OR) + 1.96 × se(ln(OR)))].
The confidence intervals for the OR are not symmetric about the estimated OR. But one can still tell the significance of the test on the basis of the corresponding confidence interval for the OR. For the above example, the null hypothesis is that there is no difference between smokers and non-smokers in the development of lung cancer; that is, OR = 1. If 1 is included in the confidence interval, one should accept the null hypothesis; otherwise, one should reject it.
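A sketch of the OR interval with a hypothetical 2×2 table (the counts are invented for illustration, not taken from any study):

```python
from math import exp, log, sqrt

# Hypothetical 2x2 table: rows = smokers / nonsmokers,
# columns = lung cancer / no lung cancer.
n11, n12 = 40, 160   # smokers: with / without lung cancer
n21, n22 = 10, 190   # nonsmokers: with / without lung cancer

odds_ratio = (n11 / n12) / (n21 / n22)          # = n11*n22 / (n12*n21) = 4.75
se_log_or = sqrt(1/n11 + 1/n12 + 1/n21 + 1/n22)

lo = exp(log(odds_ratio) - 1.96 * se_log_or)
hi = exp(log(odds_ratio) + 1.96 * se_log_or)
# 1 lies outside [lo, hi], so the null hypothesis OR = 1 is rejected.
```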
Another widely used concept in health care is relative risk (RR), which is the ratio of the risks in two groups. Risk is defined as the chance of having a specific outcome among subjects in that group. Taking the above example, the risk of having lung cancer among smokers is estimated as N11/(N11 + N12), and the risk among nonsmokers is estimated as N21/(N21 + N22). The RR is the ratio of the above two risks, which is [N11/(N11 + N12)]/[N21/(N21 + N22)].
Like that of the OR, the sampling distribution of ln(RR) is approximately normal. The standard error for ln(RR) is se(ln(RR)) = √(1/N11 – 1/(N11 + N12) + 1/N21 – 1/(N21 + N22)). A 95% confidence interval for ln(RR) is [ln(RR) – 1.96 × se(ln(RR)), ln(RR) + 1.96 × se(ln(RR))].
Thus the 95% confidence interval for the RR is [exp(ln(RR) – 1.96 × se(ln(RR))), exp(ln(RR) + 1.96 × se(ln(RR)))].
The confidence intervals for the RRs are not symmetric about the estimated RR either. One can tell the significance of a test from the corresponding confidence interval for the RR. Usually the null hypothesis is that RR = 1, which means that the two groups have the same risk. For the above example, the null hypothesis would be that the risks of developing lung cancer among smokers and non-smokers are equal. If 1 is included in the confidence interval, one may accept the null hypothesis. If not, the null hypothesis should be rejected.
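Using the same hypothetical 2×2 table as in the odds-ratio sketch above (invented counts, for illustration only), the RR interval can be computed as:

```python
from math import exp, log, sqrt

# Hypothetical 2x2 table: rows = smokers / nonsmokers,
# columns = lung cancer / no lung cancer.
n11, n12 = 40, 160
n21, n22 = 10, 190

risk1 = n11 / (n11 + n12)   # 0.20, risk among smokers
risk2 = n21 / (n21 + n22)   # 0.05, risk among nonsmokers
rr = risk1 / risk2          # 4.0

se_log_rr = sqrt(1/n11 - 1/(n11 + n12) + 1/n21 - 1/(n21 + n22))
lo = exp(log(rr) - 1.96 * se_log_rr)
hi = exp(log(rr) + 1.96 * se_log_rr)
# 1 lies outside [lo, hi], so the null hypothesis RR = 1 is rejected.
```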
A confidence interval for an unknown population variance can be constructed with the use of a central chi-square distribution. For a random sample with size N and sample variance s², a two-sided 100(1–α)% confidence interval for the population variance σ² is [(N – 1)s²/χ²N–1,α/2, (N – 1)s²/χ²N–1,1–α/2]. Here χ²N–1,α/2 is the upper α/2 quantile, satisfying the requirement that the probability that a central chi-square random variable with N – 1 degrees of freedom is greater than χ²N–1,α/2 is α/2. Note that this confidence interval may not work well if the sample size is small or the distribution is far from normal.
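Reusing the test-score sample from the earlier example, a sketch of the variance interval follows; the chi-square quantiles for 9 degrees of freedom are hardcoded from a chi-square table:

```python
from statistics import variance

scores = [69, 71, 77, 79, 82, 84, 80, 94, 78, 67]
N = len(scores)
s2 = variance(scores)   # 62.77

# Chi-square quantiles with 9 degrees of freedom, from a table.
chi2_upper = 19.023     # upper 0.025 quantile
chi2_lower = 2.700      # upper 0.975 quantile

ci = ((N - 1) * s2 / chi2_upper, (N - 1) * s2 / chi2_lower)
# approximately [29.70, 209.22]
```

Note how asymmetric the interval is about s²; this reflects the skew of the chi-square distribution.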
The bootstrap method provides an alternative way for constructing an interval to measure the accuracy of an estimate. It is especially useful when the usual confidence interval is hard or impossible to calculate. Suppose s(x) is used to estimate an unknown population parameter θ based on a sample x of size N. A bootstrap confidence interval for the estimate can be constructed as follows. One randomly draws another sample x* of the same size N with replacement from the original sample x. The estimate s(x*) based on x* is called a bootstrap replication of the original estimate. Repeat the procedure a large number of times, say 1,000 times. Then the αth quantile and the (1–α)th quantile of the 1,000 bootstrap replications serve as the lower and upper limits of a 100(1–2α)% bootstrap confidence interval. Note that the interval obtained in this way may vary a little from time to time due to the randomness of bootstrap replications. Unlike the usual confidence interval, the bootstrap confidence interval does not require assumptions on the population distribution. Instead, it highly depends on the data x itself.
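The resampling procedure just described can be sketched in a few lines of Python, here applied to the sample mean of the earlier test-score data (the helper name bootstrap_ci is ours, and the seed is fixed only to make the illustration reproducible):

```python
import random

random.seed(0)  # fixed seed so the illustration is reproducible

def bootstrap_ci(x, stat, reps=1000, alpha=0.025):
    """Percentile bootstrap 100*(1 - 2*alpha)% interval for stat(x)."""
    # Each replication resamples len(x) observations with replacement.
    boot = sorted(stat(random.choices(x, k=len(x))) for _ in range(reps))
    return boot[int(alpha * reps)], boot[int((1 - alpha) * reps)]

scores = [69, 71, 77, 79, 82, 84, 80, 94, 78, 67]
sample_mean = lambda v: sum(v) / len(v)
lo, hi = bootstrap_ci(scores, sample_mean)
# lo and hi bracket the observed sample mean 78.1
```

Because the replications are random, rerunning without a fixed seed gives slightly different limits each time, exactly as the text notes.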
Simultaneous confidence intervals are intervals for estimating two or more parameters at a time. For example, suppose μ1 and μ2 are the means of two different populations. One wants to find confidence intervals I1 and I2 simultaneously such that P(μ1 ∊ I1 and μ2 ∊ I2) = 1 – α.
If the sample x1 used to estimate μ1 is independent of the sample x2 used for μ2, then I1 and I2 can simply be calculated as individual confidence intervals for μ1 and μ2, each at confidence level √(1 – α), so that the joint coverage probability is √(1 – α) × √(1 – α) = 1 – α.
The simultaneous confidence intervals I1 and I2 can be used to test whether μ1 and μ2 are equal. If I1 and I2 do not overlap, then μ1 and μ2 are significantly different from each other at a level less than α.
Simultaneous confidence intervals can be generalized into a confidence region in the multidimensional parameter space, especially when the estimates for parameters are not independent. A 100(1–α)% confidence region D for the parameter vector θ satisfies P(θ ∊ D) = 1–α, where D does not need to be a product of simple intervals.
A Bayesian interval, or credible interval, is derived from the posterior distribution of a population parameter. From a Bayesian point of view, the parameter θ can be regarded as a random quantity, which follows a distribution P(θ), known as the prior distribution. For each fixed θ, the data x are assumed to follow the conditional distribution P(x|θ), known as the model. Then Bayes's theorem can be applied to obtain the adjusted distribution P(θ|x) of θ, known as the posterior distribution, which is proportional to P(x|θ) · P(θ). A 1–α Bayesian interval I is an interval that satisfies P(θ ∊ I|x) = 1–α according to the posterior distribution. In order to guarantee the optimality or uniqueness of I, one may require that the values of the posterior density function inside I always be greater than any value outside I. In those cases, the Bayesian interval I is called the highest posterior density region. Unlike the usual confidence intervals, the level 1–α of a Bayesian interval I indicates the probability that the random θ falls into I.
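A minimal sketch of a central (equal-tailed) credible interval, computed by grid approximation for a binomial model with a uniform prior; the counts (12 successes in 30 trials, echoing the earlier x-ray example) and the grid resolution are our choices for illustration:

```python
# Binomial likelihood, uniform Beta(1, 1) prior on theta.
successes, trials = 12, 30

# Evaluate the unnormalized posterior on a fine grid of theta values.
grid = [i / 10000 for i in range(1, 10000)]
weights = [t**successes * (1 - t)**(trials - successes) for t in grid]
total = sum(weights)

# Collect grid points whose cumulative posterior mass lies in [0.025, 0.975].
cdf, inside = 0.0, []
for t, w in zip(grid, weights):
    cdf += w / total
    if 0.025 <= cdf <= 0.975:
        inside.append(t)
lo, hi = inside[0], inside[-1]
# lo and hi bracket the posterior mean (12 + 1) / (30 + 2) = 0.40625
```

The posterior here is Beta(13, 19), so lo and hi approximate its 2.5% and 97.5% quantiles; a highest posterior density region would instead keep the grid points with the largest posterior values.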