Internal Validity
Internal validity refers to the accuracy of statements made about the causal relationship between two variables, namely, the manipulated (treatment or independent) variable and the measured (outcome or dependent) variable. Internal validity claims are not based on the labels a researcher attaches to variables or how they are described but, rather, on the procedures and operations used to conduct a research study, including the choice of design and measurement of variables. Consequently, internal validity is relevant to the topic of research methods. In the next three sections, the procedures that support causal inferences are introduced, the threats to internal validity are outlined, and methods for increasing the internal validity of a research investigation are described.
Causal Relationships between Variables
When two variables are correlated or found to covary, it is reasonable to ask whether there is a direction in the relationship. Determining whether there is a causal relationship between the variables often depends on knowing the time sequence of the variables; that is, whether one variable occurred first, followed by the other variable. In randomized experiments, where participants are randomly assigned to treatment conditions or groups, the time sequence is often straightforward because the treatment (independent) variable is manipulated before the outcome (dependent) variable is measured. Even in quasi-experiments, where participants are not randomly assigned to treatment groups, the investigator can usually relate some of the change between pretest and posttest measures to group membership. However, in observational studies where variables are not being manipulated, the time sequence is difficult, if not impossible, to disentangle.
One might think that knowing the time sequence of variables is sufficient for ascertaining internal validity. Unfortunately, time sequence is not the only important aspect to consider. Internal validity is also largely about ensuring that the causal relationship between two variables is direct and not distorted by a third variable. A third, uncontrolled, variable can make the relationship between the two other variables appear stronger or weaker than it really is. For example, imagine that an investigator decides to investigate the relationship between class size (treatment variable) and academic achievement (outcome variable). The investigator recruits school classes that are considered large (with more than 20 students) and classes that are considered small (with fewer than 20 students). The investigator then collects information on students' academic achievement at the end of the year to determine whether a student's achievement depends on whether he or she is in a large or small class. Unbeknownst to the investigator, however, students who are assigned to small classes are those who have had behavioral problems in the previous year. In contrast, students assigned to large classes are those who have not had behavioral problems in the previous year. In other words, class size is related negatively to behavioral problems. Consequently, students assigned to smaller classes will be more disruptive during classroom instruction and will plausibly learn less than those assigned to larger classes. In the course of data analysis, if the investigator were to discover a significant relationship between class size and academic achievement, one could argue that this relationship is not direct. The investigator's discovery is a false positive finding. The relationship between class size and academic achievement is not direct because the students associated with classes of different sizes are not equivalent on a key variable: attentive behavior. Thus, it might not be that larger class sizes have a positive influence on academic achievement but, rather, that larger classes contain a selection of students who, being without behavioral problems, can attend to classroom instruction.
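The class-size example can be made concrete with a small simulation. The following is only a sketch under stated assumptions: the score scale, the 10-point penalty for behavioral problems, the 80/20 assignment probabilities, and the function name are hypothetical and do not come from any study. Assignment is made probabilistic (rather than absolute, as in the example) so that both class sizes contain both kinds of students. The true effect of class size is set to zero, yet a naive comparison favors large classes; comparing classes within each level of the third variable removes the spurious difference.

```python
import random
import statistics


def simulate_class_size_study(n_students=10000, seed=0):
    """Simulate achievement scores where class size has NO true effect.

    All numbers here are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    # scores[(had_problems, class_size)] -> list of achievement scores
    scores = {(p, c): [] for p in (True, False) for c in ("small", "large")}

    for _ in range(n_students):
        had_problems = rng.random() < 0.5
        # Assumed assignment rule: students with prior behavioral problems
        # are mostly (80%) placed in small classes, others mostly in large.
        p_small = 0.8 if had_problems else 0.2
        class_size = "small" if rng.random() < p_small else "large"
        # Achievement depends on behavior only; class size plays no role.
        score = 70 - (10 if had_problems else 0) + rng.gauss(0, 5)
        scores[(had_problems, class_size)].append(score)

    def mean_for(class_size, problems_values=(True, False)):
        pooled = [s for p in problems_values for s in scores[(p, class_size)]]
        return statistics.mean(pooled)

    # Naive comparison ignores the third variable.
    naive_diff = mean_for("large") - mean_for("small")
    # Stratified comparison holds the third variable constant.
    stratified_diffs = {
        p: mean_for("large", (p,)) - mean_for("small", (p,))
        for p in (True, False)
    }
    return naive_diff, stratified_diffs


if __name__ == "__main__":
    naive, stratified = simulate_class_size_study()
    print(f"Naive large-minus-small difference: {naive:.2f}")
    for had_problems, diff in stratified.items():
        print(f"Within had_problems={had_problems}: {diff:.2f}")
```

Under these assumptions the naive difference is several points in favor of large classes, whereas the within-stratum differences are close to zero, which is the pattern one would expect when an uncontrolled third variable produces a false positive.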
A third variable can threaten the internal validity of studies by leading to false positive findings or false negative findings (i.e., not finding a relationship between variables A and B because of the presence of a third variable, C, that is diminishing the relationship between variables A and B). There are many situations that can give rise to the presence of uncontrolled third variables in research studies. In the next section, threats to internal validity are outlined. Although each threat is discussed in isolation, it is important to note that several of these threats can simultaneously undermine the internal validity of a research study and the accuracy of inferences about the causal relationships among the variables involved.
Threats to Internal Validity
1. History
An event (e.g., the release of a new video game) that is not the treatment variable of interest occurs during the interval between the pretest and the posttest and affects the treatment group but not the comparison group. This event influences the observed effect (i.e., the outcome, or dependent variable). Consequently, the observed effect cannot be attributed exclusively to the treatment variable (thus threatening internal validity claims).
2. Maturation
Participants develop or grow in meaningful ways during the course of the treatment (between the pretest and posttest). The developmental change in participants influences the observed effect, and so now the observed effect cannot be solely attributed to the treatment variable.
3. Testing
In the course of a research study, participants might be required to respond to a particular instrument or test multiple times. The participants become familiar with the instrument, which enhances their performance and the observed effect. Consequently, the observed effect cannot be solely attributed to the treatment variable.
4. Instrumentation
The instrument used as a pretest to measure participants is not the same as the instrument used for the posttest. The differences in test type could influence the observed effect; for example, the metric used for the posttest could be more sensitive to changes in participant performance than the metric used for the pretest. The change in metric and not the treatment variable of interest could influence the observed effect.
5. Statistical Regression
When a pretest measure lacks reliability and participants are assigned to treatment groups based on pretest scores, any gains or losses indicated by the posttest might be misleading. For example, participants who obtained low scores on a badly designed pretest are likely to perform better on a second test such as the posttest. Higher scores on the posttest might give the appearance of gains resulting from the manipulated treatment variable but, in fact, the gains are largely caused by the inaccurate measure originally provided by the pretest.
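Statistical regression can be illustrated with a short simulation. This is a minimal sketch with hypothetical numbers (a true-score mean of 50, a standard deviation of 10, and an assumed measurement-error standard deviation `noise_sd`); the function name is also an assumption. No treatment is applied at all, yet participants selected for low pretest scores appear to improve on the posttest simply because the pretest error that helped place them in the low group is not repeated.

```python
import random
import statistics


def low_scorer_gain(n=5000, noise_sd=10.0, seed=1):
    """Return (pretest mean, posttest mean) for the lowest fifth on the pretest.

    No treatment is applied; every value is a hypothetical illustration.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(50, 10) for _ in range(n)]
    # Each observed score is the stable true score plus independent error.
    pretest = [t + rng.gauss(0, noise_sd) for t in true_scores]
    posttest = [t + rng.gauss(0, noise_sd) for t in true_scores]

    # Select participants scoring in the bottom 20% on the pretest.
    cutoff = sorted(pretest)[n // 5]
    selected = [i for i in range(n) if pretest[i] < cutoff]

    pre_mean = statistics.mean([pretest[i] for i in selected])
    post_mean = statistics.mean([posttest[i] for i in selected])
    return pre_mean, post_mean


if __name__ == "__main__":
    pre, post = low_scorer_gain()
    print(f"Unreliable pretest: apparent gain = {post - pre:.2f}")
    pre, post = low_scorer_gain(noise_sd=0.0)
    print(f"Perfectly reliable pretest: apparent gain = {post - pre:.2f}")
```

Setting `noise_sd` to zero models a perfectly reliable measure, in which case the apparent gain disappears; the larger the measurement error, the larger the spurious gain for the selected group.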
6. Mortality
When participants are likely to drop out more often from one treatment group in relation to another (the control), the observed effect cannot be attributed solely to the treatment variable. When groups are not equivalent, any observed effect could be caused by differences in the composition of the groups and not the treatment variable of interest.
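The effect of differential dropout on group composition can be sketched as follows. All values are hypothetical, including the dropout rule (in the treatment group only, participants scoring below 45 leave with probability `dropout_prob`) and the function name. The treatment is given no true effect, so any difference between the groups that remain is produced by attrition alone.

```python
import random
import statistics


def attrition_bias(n_per_group=5000, dropout_prob=0.5, seed=2):
    """Return treatment-minus-control mean among participants who remain.

    The treatment has NO true effect; all numbers are hypothetical.
    """
    rng = random.Random(seed)
    control = [rng.gauss(50, 10) for _ in range(n_per_group)]
    treatment = [rng.gauss(50, 10) for _ in range(n_per_group)]

    # Assumed dropout rule: in the treatment group only, participants with
    # an outcome score below 45 leave the study with probability dropout_prob.
    remaining = [
        s for s in treatment
        if not (s < 45 and rng.random() < dropout_prob)
    ]
    return statistics.mean(remaining) - statistics.mean(control)


if __name__ == "__main__":
    print(f"With differential dropout: {attrition_bias():.2f}")
    print(f"Without dropout: {attrition_bias(dropout_prob=0.0):.2f}")
```

With dropout, the remaining treatment group has a higher mean than the control group even though the treatment did nothing; without dropout, the difference stays near zero.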
7. Selection
Internal validity is compromised when one treatment group differs systematically from another group on an important variable. In the example described in the previous section, the two groups of class sizes differed systematically in the behavioral disposition of students. As such, any observed effect could not be solely attributed to the treatment variable (class size). Selection is a concern when participants are not randomly assigned to groups. This category of threat can also interact with other categories to produce, for example, a selection-history threat, in which treatment groups have distinct local events occurring to them as they participate in the study, or a selection-maturation threat, in which treatment groups have distinct maturation rates that are unrelated to the treatment variable of interest.
8. Ambiguity about Direction of Causal Influence
In correlational studies that are cross-sectional, meaning that variables of interest have not been manipulated and information about the variables is gathered at one point in time, establishing the causal direction of effects is unworkable. This is because the temporal precedence among variables is unclear. In experimental studies, in which a variable has been manipulated, or in correlational studies where information is collected at multiple time points so that the temporal sequence can be established, this is less of a threat to internal validity.
9. Diffusion of Treatment Information
When a treatment group is informed about the manipulation and then happens to share this information with the control group, this sharing of information could nullify the observed effect. The sharing of details about the treatment experience with control participants effectively makes the control group similar to the treatment group.
10. Compensatory Equalization of Treatments
This is similar to the threat described in number 9. In this case, however, what nullifies the effect of the treatment variable is not the communication between participants of different groups but, rather, administrative concerns about the inequality of the treatment groups. For example, if an experimental school receives extra funds to implement an innovative curriculum, the control school might be given similar funds and encouraged to develop a new curriculum. In other words, when the treatment is considered desirable, there might be administrative pressure to compensate the control group, thereby undermining the observed effect of the treatment.
11. Rivalry between Treatment Conditions
Similar to the threats described in numbers 9 and 10, this threat functions to nullify differences between treatment groups and, thus, an observed effect. In this case, when participation in a treatment versus control group is made public, control participants might work extra hard to outperform the treatment group. Had participants not been made aware of their group membership, an observed effect might have been found.
12. Demoralization of Participants Receiving Less Desirable Treatments
This last threat is similar to the one described in number 11. In this case, however, when treatment participation is made public and the treatment is highly desirable, control participants might feel resentful and disengage from the study's objective. This could lead to large differences in the outcome variable between the treatment and control groups. However, the observed outcome might have little to do with the treatment and more to do with participant demoralization in the control group.
Establishing Internal Validity
Determining whether there is a causal relationship between two variables, A and B, requires showing that the variables covary, that one variable precedes the other (e.g., A → B), and that no third variable, C, is distorting the apparent influence of A on B. One powerful way to enhance internal validity is to randomly assign sample participants to treatment groups or conditions. By randomly assigning, the investigator can guarantee the probabilistic equivalence of the treatment groups before the treatment variable is administered. That is, any participant biases are, on average, equally distributed in the two groups. If the sample participants cannot be randomly assigned, and the investigator must work with intact groups, which is often the case in field research, steps must be taken to ensure that the groups are equivalent on key variables. For example, if the groups are equivalent, one would expect both groups to score similarly on the pretest measure. Furthermore, one would inquire about the background characteristics of the students: Are there equal distributions of boys and girls in the groups? Do they come from comparable socioeconomic backgrounds? Even if the treatment groups are comparable, efforts should be made not to publicize the nature of the treatment one group is receiving relative to the control group, so as to avoid threats to internal validity involving diffusion of treatment information, compensatory equalization of treatments, rivalry between groups, and demoralization of participants who perceive themselves to be receiving the less desirable treatment. Internal validity checks are ultimately designed to bolster confidence in the claims made about the causal relationship between variables; as such, internal validity is concerned with the integrity of the design of a study for supporting such claims.
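The balancing property of random assignment, and the pretest check suggested for intact groups, can be sketched in a few lines. The function names and the simulated pretest scores are assumptions made for illustration. The non-random split used for contrast (all low scorers in one group, all high scorers in the other) is deliberately extreme and is not meant to represent any particular field setting.

```python
import random
import statistics


def randomly_assign(participants, seed=None):
    """Shuffle participants and split them into two equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def pretest_gap(group_a, group_b):
    """Absolute difference in mean pretest score between two groups."""
    return abs(statistics.mean(group_a) - statistics.mean(group_b))


def compare_assignment_methods(n=1000, seed=3):
    """Return (gap under random assignment, gap under a systematic split)."""
    rng = random.Random(seed)
    # Hypothetical pretest scores; each participant is represented by a score.
    pretest_scores = [rng.gauss(50, 10) for _ in range(n)]

    random_gap = pretest_gap(*randomly_assign(pretest_scores, seed=seed))

    # Deliberately extreme non-random split for contrast: low scorers form
    # one group and high scorers the other.
    ordered = sorted(pretest_scores)
    systematic_gap = pretest_gap(ordered[: n // 2], ordered[n // 2:])
    return random_gap, systematic_gap


if __name__ == "__main__":
    random_gap, systematic_gap = compare_assignment_methods()
    print(f"Pretest gap after random assignment: {random_gap:.2f}")
    print(f"Pretest gap after systematic split:  {systematic_gap:.2f}")
```

Random assignment does not make the groups identical, but with a reasonably large sample the pretest gap is small and arises only by chance, whereas a systematic split can leave the groups far apart before any treatment is delivered.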