
Internal validity refers to the evidence that the interpretations and conclusions reached in the evaluation can be attributed to program functions rather than to other factors. Poorly designed outcome evaluations, by definition, have questionable integrity because of insufficient safeguards against alternative explanations for program effects; program outcome effects are intractably enmeshed with extraneous factors. The design challenges for evaluation studies are similar to those in traditional research in that a system of logic must be applied that enables the designer to construct procedures that will reduce the irrelevant sources of program outcome variability before the evaluation is conducted.

For example, gathering evidence of program fidelity, that is, evidence that the program features that are the object of the evaluation were actually implemented, should be part of the evaluation plan. If such evidence is collected early or periodically, the evaluator may even be instrumental in prompting program staff to activate the identified program features, hence removing this threat to internal validity. Some sources of program outcome variability, however, may be beyond the control of the evaluator. For example, if any events occurred between the pretest and posttest for all or some of the program participants that affect their responses to the posttest, there may be little the evaluator can do about it. This threat to the subsequent explanation of program effect is commonly called history. When discovered, at minimum, the evaluator should adequately describe the situation. Some threats to internal validity can be statistically controlled.

Several threats to internal validity of both research and evaluations have been identified and discussed in the literature:

  • Unreliability of measures. Use of unreliable data collection procedures makes it nearly impossible to disentangle actual program outcome variance from variance reflecting measurement error.
  • Attrition. Differences in groups' standing on measured program outcomes may be due to differential changes in their composition over time rather than to program interventions. For example, if loss of program participants occurs disproportionately among those with low standing on the measured outcomes, group mean scores will rise simply because the remaining participants started higher.
  • Statistical regression. Groups that are selected based on their initially very low (or high) standing on some measure are likely to show changes in their standing in the opposite direction on subsequent testing due to the measurement errors in the assessment procedures used in initially classifying them. Statistical procedures are available to control for this phenomenon.
  • Selection. Participation in such programs as community-based health centers or social service agencies is generally voluntary. This self-selection process may be associated with other personal characteristics, such as motivation, interest, and education, that influence participants' responses to program interventions. Participants may therefore respond differently than would individuals in the general population or in nonparticipant comparison groups, producing differences that reflect factors other than program effects.
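The statistical regression threat above can be made concrete with a small simulation. The sketch below is illustrative only and is not drawn from the entry: all numbers (population mean, reliability, selection cutoff) are hypothetical assumptions. It generates error-laden pretest and posttest scores around stable true scores with no program effect at all, selects the lowest-scoring fifth on the pretest, and shows that this group's posttest mean rises anyway, purely because of measurement error in the initial classification.

```python
import random
import statistics

# Hypothetical parameters (not from the entry): true scores around 50,
# measurement error chosen so test reliability is roughly 0.74.
random.seed(42)
N = 10_000
TRUE_SD, ERROR_SD = 10.0, 6.0

true_scores = [random.gauss(50, TRUE_SD) for _ in range(N)]
pretest = [t + random.gauss(0, ERROR_SD) for t in true_scores]
# Posttest has fresh, independent error and NO program effect.
posttest = [t + random.gauss(0, ERROR_SD) for t in true_scores]

# Select the lowest-scoring 20% on the (error-laden) pretest.
cutoff = sorted(pretest)[N // 5]
selected = [i for i in range(N) if pretest[i] <= cutoff]

pre_mean = statistics.mean(pretest[i] for i in selected)
post_mean = statistics.mean(posttest[i] for i in selected)
print(f"selected-group pretest mean:  {pre_mean:.1f}")
print(f"selected-group posttest mean: {post_mean:.1f}")
# The posttest mean is higher than the pretest mean even though no
# intervention occurred: regression toward the population mean.
```

Because the group was picked for extreme low pretest scores, part of its extremity is unlucky measurement error that does not recur at posttest; an evaluation that credits this rebound to the program overstates its effect, which is why the entry notes that statistical controls for this phenomenon exist.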
Charles L. Thomas
10.4135/9781412950558.n283
