Experimental designs are used to examine the effect of a treatment or intervention on some outcome. In the simplest two-group case, a treatment is implemented with one group of participants (the treatment group) and not with another (the control group). Experiments can be conducted with individual participants or with clusters of individuals; that is, the unit of assignment may be at the individual level or at the cluster level. This entry treats individual participants as the unit of assignment, with the understanding that the same designs may be used with clusters of individuals. The entry further describes experimental designs, looks at the role of randomization in experimental designs, and discusses some commonly used experimental designs.
Experimental designs require that the researcher assign participants to the treatment or the control group using random assignment, a process known as randomization. After applying the intervention to the treatment group and observing participants in both conditions, the researcher tests hypotheses about the intervention's effect by comparing the outcomes of the two groups. Treatment exposure is the independent variable that is hypothesized to lead to changes in the outcome, or dependent, variable.
When correctly implemented, experimental designs provide unbiased estimates of the effect of a treatment on observed outcomes. The primary purpose of experimental designs is to establish "cause and effect," or, more technically, to make causal inferences. The researcher aims to conclude that the treatment caused the differences that were observed between the groups on the attribute being studied.
Establishing cause and effect requires that several conditions be met. For X to cause Y, X must occur before Y; changes in X must be associated with changes in Y; and all other plausible explanations for the observed association between X and Y must be controlled. The condition that all other plausible explanations are controlled is one of the defining characteristics of experimental research. When researchers conduct an experiment, they address these three conditions by (1) manipulating the hypothesized cause and observing the outcome afterward, (2) testing whether variation in the hypothesized cause is associated with variation in the outcome, and (3) using randomization to reduce the plausibility of other explanations for the results observed. In technical terms, the final condition stipulates that plausible threats to internal validity are controlled. These include subject characteristics threats, testing threats, instrumentation threats, history threats, attrition threats, and regression to the mean.
Arguably, the subject characteristics threat is the most important threat minimized by the process of randomization. A subject characteristics threat occurs when individuals in the groups being compared are not equivalent to each other prior to the implementation of the treatment. In this case, equivalence implies that, on average, the two groups are approximately the same on all measured and unmeasured characteristics. Without group equivalence, the researcher cannot be confident that any observed posttreatment differences were caused by the treatment. It is important to note that randomization does not eliminate all threats to internal validity; instead, by reducing subject characteristics threats, randomization aims to ensure that the remaining threats are distributed evenly across conditions and are not confounded with participants' condition membership. Randomization is thus the primary mechanism by which experimental designs minimize plausible internal validity threats, and it is what distinguishes them from quasi-experimental designs, which, lacking randomization, cannot fully minimize all plausible threats.
Randomization requires that each participant have a nonzero probability of being assigned to each condition, implying that every participant could be assigned to either condition. Groups created using a random process are probabilistically equivalent: before the intervention is implemented, they are equated on the expected values of all attributes, regardless of whether those attributes are measured. When the randomization process is fair, the center of the distribution of all possible sample statistics (e.g., means, standard deviations) will be the same in the treatment and control groups. It is important to note, however, that this expectation concerns the mean of all possible sample statistics and that a single randomization may, by chance, create groups that differ from each other on some attributes. Even so, researchers may still conclude that a single experiment provides an unbiased estimate of the treatment effect, because any difference between the observed treatment effect and the population treatment effect arises only by chance.
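The difference between equivalence in expectation and equivalence in a single sample can be illustrated with a small simulation. The sketch below is a hypothetical illustration, not a procedure described in this entry: it assumes one simulated baseline attribute, an equal two-group split, and invented function names. It repeats the random assignment many times and records the treatment-minus-control difference in group means for each repetition. Averaged over all randomizations, the difference should sit near zero, while any single randomization can leave a noticeable imbalance.

```python
import random
import statistics


def mean_difference_after_randomization(attribute, rng):
    """Randomly split participants into two equal groups and return
    mean(treatment) - mean(control) for a baseline attribute."""
    indices = list(range(len(attribute)))
    rng.shuffle(indices)
    half = len(indices) // 2
    treatment = [attribute[i] for i in indices[:half]]
    control = [attribute[i] for i in indices[half:]]
    return statistics.mean(treatment) - statistics.mean(control)


def randomization_distribution(attribute, n_randomizations=10_000, seed=1):
    """Repeat the random split many times and collect the mean differences."""
    rng = random.Random(seed)
    return [mean_difference_after_randomization(attribute, rng)
            for _ in range(n_randomizations)]


if __name__ == "__main__":
    # Hypothetical baseline attribute (e.g., prior achievement) for 40 participants.
    gen = random.Random(0)
    baseline = [gen.gauss(50, 10) for _ in range(40)]
    diffs = randomization_distribution(baseline)
    # Center of the distribution of differences: expected to be near 0.
    print(f"mean difference across randomizations: {statistics.mean(diffs):.3f}")
    # A single randomization can still produce a sizable imbalance.
    print(f"largest single imbalance: {max(abs(d) for d in diffs):.2f}")
```

The mean across randomizations corresponds to the "center of the distribution of all possible sample statistics" described above; the largest single imbalance shows why one randomization can still yield groups that differ by chance.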
Experimental designs fall into several broad categories. In a between-subject design, participants are randomly assigned to serve in only one of the treatment conditions. In a crossover design, participants serve in one condition first and then cross over to participate in the other condition. In a longitudinal design, researchers collect data at multiple time points before and after the implementation of the treatment. In a factorial design, the effects of two or more treatments and their interactions are evaluated simultaneously. When all possible combinations of treatments are evaluated, the design is referred to as a full factorial experimental design; when only some combinations are evaluated, it is referred to as a fractional factorial experimental design.
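The distinction between full and fractional factorial designs can be made concrete by enumerating treatment combinations. The sketch below is a hypothetical illustration with invented function names: it lists every combination of factor levels for a full factorial design and builds one common kind of fractional design, a half fraction of a two-level design coded -1/+1, in which the last factor's level is set to the product of the others (the defining relation I = ABC for three factors). Other fractions are possible; this shows only one.

```python
import math
from itertools import product


def full_factorial(levels_per_factor):
    """Every combination of factor levels (a full factorial design).
    `levels_per_factor` maps factor name -> list of levels."""
    names = list(levels_per_factor)
    return [dict(zip(names, combo))
            for combo in product(*(levels_per_factor[n] for n in names))]


def half_fraction_2k(factor_names):
    """A 2^(k-1) half fraction of a two-level design coded -1/+1.
    The last factor is aliased with the product of the others, so only
    half of the 2^k runs are kept."""
    *base, last = factor_names
    runs = []
    for combo in product((-1, 1), repeat=len(base)):
        run = dict(zip(base, combo))
        run[last] = math.prod(combo)  # defining relation I = AB...K
        runs.append(run)
    return runs


if __name__ == "__main__":
    full = full_factorial({"A": [1, 2], "B": [1, 2], "C": [1, 2]})
    frac = half_fraction_2k(["A", "B", "C"])
    print(len(full), "runs in the full factorial")       # 8
    print(len(frac), "runs in the half fraction")        # 4
    for run in frac:
        print(run)
```

The saving in runs comes at a cost: in the half fraction, some effects are aliased with one another (here, the main effect of C cannot be separated from the A-by-B interaction).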
In the simplest form of a between-subject experimental design, participants are randomly assigned to either the treatment or control condition, pretest data are collected, the treatment is implemented with one group and not with the other, and at the end of the intervention phase, posttest data are collected. A two-group experimental design with randomization to groups (denoted by R), a single treatment (X), and pre- and posttest measures (O) is configured as follows:
Treatment group R O X O
Control group R O O
Alternatively, the researcher may randomly assign participants to treatment and control conditions after the pretest data are collected. Under this approach, the pretest data may be used to create homogeneous strata within which participants are randomly assigned to either the treatment or control condition. Compared with simple random assignment, which relies on chance alone to make the groups equivalent, this approach may produce treatment and control groups that are more similar to each other, particularly when the sample size is small.
Treatment group O R X O
Control group O R O
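A minimal sketch of the stratified approach follows. It assumes pretest scores keyed by hypothetical participant IDs and forms strata by ranking participants on the pretest and grouping adjacent scores (pairs by default); the function name and stratum size are illustrative choices, not part of this entry. Within each stratum the condition labels are shuffled, so assignment remains random while the groups are matched on the pretest by construction.

```python
import random


def stratified_assignment(pretest, stratum_size=2, seed=None):
    """Rank participants by pretest score, form consecutive strata of
    `stratum_size`, and randomly assign conditions within each stratum.
    An even stratum_size gives exactly half treatment, half control in
    every full stratum; in an odd-sized stratum the extra label is random."""
    rng = random.Random(seed)
    ranked = sorted(pretest, key=pretest.get)  # participant IDs, lowest score first
    assignment = {}
    for start in range(0, len(ranked), stratum_size):
        stratum = ranked[start:start + stratum_size]
        # Enough balanced labels to cover the stratum; zip drops any extra.
        labels = ["treatment", "control"] * ((len(stratum) + 1) // 2)
        rng.shuffle(labels)
        for pid, label in zip(stratum, labels):
            assignment[pid] = label
    return assignment


if __name__ == "__main__":
    # Hypothetical pretest scores for eight participants.
    scores = [12, 45, 33, 27, 50, 18, 40, 22]
    pretest = {f"p{i}": s for i, s in enumerate(scores)}
    for pid, condition in sorted(stratified_assignment(pretest, seed=3).items()):
        print(pid, pretest[pid], condition)
```

Pairing adjacent scores is only one way to form strata; researchers may instead use score bands or several pretest variables.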
Between-subject designs can easily be extended to include more than two groups so that the outcomes can be compared across the conditions to determine which treatment (XA or XB) or amount of treatment produces the greatest effect:
Treatment group A O R XA O
Treatment group B O R XB O
Control group O R O
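Posttest outcomes from more than two randomly formed groups are commonly compared with a one-way analysis of variance. The sketch below computes group means and the F statistic from hypothetical posttest scores; the function name and data are assumptions for illustration. The F statistic indicates only whether the condition means differ overall; follow-up comparisons (not shown) would be needed to conclude which treatment produces the greatest effect.

```python
import statistics


def one_way_anova(groups):
    """Group means, F statistic, and degrees of freedom for comparing
    posttest means across k independent groups.
    `groups` maps condition name -> list of posttest scores."""
    all_scores = [y for scores in groups.values() for y in scores]
    grand_mean = statistics.mean(all_scores)
    k, n = len(groups), len(all_scores)
    means = {name: statistics.mean(s) for name, s in groups.items()}
    # Variation of group means around the grand mean.
    ss_between = sum(len(s) * (means[name] - grand_mean) ** 2
                     for name, s in groups.items())
    # Variation of scores around their own group mean.
    ss_within = sum((y - means[name]) ** 2
                    for name, s in groups.items() for y in s)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    return means, f_stat, (k - 1, n - k)


if __name__ == "__main__":
    # Hypothetical posttest scores for Treatment A, Treatment B, and Control.
    posttest = {"XA": [5, 6, 7], "XB": [8, 9, 10], "Control": [2, 3, 4]}
    means, f_stat, df = one_way_anova(posttest)
    print(means)                     # XA: 6, XB: 9, Control: 3
    print(f"F{df} = {f_stat:.1f}")   # F(2, 6) = 27.0
```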
In between-subject designs, the pretest allows researchers to empirically examine the equivalence of the treatment and control groups on the measured variables, to statistically control for preexisting differences on the pretest, and to monitor attrition rates in the treatment and control groups. Although the pretest introduces the possibility of a testing threat, this can be mitigated by using psychometrically equivalent pre- and posttest measures and/or maximizing the time between data collection points. Another variation on the between-subject design is to eliminate the pretest; while this may ameliorate a testing threat, it precludes evaluating the equivalence of the groups and monitoring the effects of attrition.
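One common way to statistically control for preexisting differences on the pretest is analysis of covariance (ANCOVA). The sketch below is an assumption-laden illustration, not the entry's prescribed analysis: it computes the pretest gap between groups (a simple equivalence check), the unadjusted posttest difference, and the pretest-adjusted difference using the pooled within-group regression slope, which equals the treatment coefficient in an ordinary least squares model of posttest on group and pretest with a common slope. Function and variable names are hypothetical.

```python
import statistics


def ancova_effect(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Posttest treatment effect adjusted for the pretest via the pooled
    within-group slope (same as the group coefficient in OLS: post ~ group + pre)."""
    def centered_sums(pre, post):
        mx, my = statistics.mean(pre), statistics.mean(post)
        sxy = sum((x - mx) * (y - my) for x, y in zip(pre, post))
        sxx = sum((x - mx) ** 2 for x in pre)
        return sxy, sxx

    sxy_t, sxx_t = centered_sums(treat_pre, treat_post)
    sxy_c, sxx_c = centered_sums(ctrl_pre, ctrl_post)
    pooled_slope = (sxy_t + sxy_c) / (sxx_t + sxx_c)

    raw_diff = statistics.mean(treat_post) - statistics.mean(ctrl_post)
    pretest_gap = statistics.mean(treat_pre) - statistics.mean(ctrl_pre)
    return {
        "pretest_gap": pretest_gap,            # equivalence check on the pretest
        "unadjusted_effect": raw_diff,         # simple posttest mean difference
        "adjusted_effect": raw_diff - pooled_slope * pretest_gap,
    }


if __name__ == "__main__":
    # Hypothetical data: the treatment group starts 2 points higher on the pretest.
    result = ancova_effect(treat_pre=[10, 12, 14], treat_post=[15, 17, 19],
                           ctrl_pre=[8, 10, 12], ctrl_post=[8, 10, 12])
    print(result)  # pretest_gap 2, unadjusted_effect 7, adjusted_effect 5.0
```

In this toy example, part of the raw 7-point difference reflects the groups' 2-point pretest gap; adjusting for it leaves 5 points attributable to the treatment under the model's assumptions (a linear, common pretest slope in both groups).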
The Solomon four-group design is a variation of the between-subject design that allows researchers to examine testing threats empirically. Its four groups allow researchers to compare participants who do and do not complete a pretest and who do and do not receive the treatment. By comparing the posttest scores of Groups A and C, and of Groups B and D, researchers can evaluate whether testing threats are likely to have produced the results observed. The configuration for the Solomon four-group design is as follows:
Group A R O X O
Group B R O O
Group C R X O
Group D R O
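The comparisons described for the Solomon four-group design can be written as simple differences between posttest group means. The sketch below assumes groups labeled A through D as in the configuration above and uses hypothetical means; the function name is illustrative. In practice these contrasts are usually tested within a two-way analysis of variance on the posttest scores (pretest yes/no by treatment yes/no) rather than read off raw means.

```python
def solomon_contrasts(post_means):
    """Simple contrasts among posttest means in a Solomon four-group design.
    Keys: 'A' (pretest + treatment), 'B' (pretest only),
          'C' (treatment only),      'D' (neither)."""
    a, b, c, d = (post_means[g] for g in "ABCD")
    return {
        # Does taking the pretest change posttest scores among treated participants?
        "pretest_effect_treated": a - c,
        # ... and among untreated participants?
        "pretest_effect_untreated": b - d,
        # Treatment effect with and without a pretest.
        "treatment_effect_pretested": a - b,
        "treatment_effect_unpretested": c - d,
        # Does the pretest alter the size of the treatment effect?
        "pretest_by_treatment": (a - c) - (b - d),
    }


if __name__ == "__main__":
    # Hypothetical posttest means for Groups A-D.
    for name, value in solomon_contrasts({"A": 80, "B": 70, "C": 76, "D": 69}).items():
        print(f"{name}: {value}")
```

Small pretest effects in both comparisons would suggest that a testing threat is unlikely to explain the observed treatment effect.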
Crossover experimental designs require that participants be randomly assigned to receive one of at least two treatments first and, subsequent to the posttest, receive the second treatment. A typical configuration with two treatment conditions (XA and XB) would be as follows:
R O XA O XB O
R O XB O XA O
The primary advantage of crossover designs is that individuals serve in every condition, making it possible to examine the effects of individual treatments and, if of interest, the cumulative effects of participating in both conditions. That said, crossover designs are generally considered appropriate when carryover effects from the first treatment are not expected and when attrition and testing threats are not expected to be an issue.
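Because each participant serves in both conditions, a two-sequence crossover (AB and BA) allows the treatment effect to be estimated from within-person differences. The sketch below shows one standard estimator for this layout under the assumption stated above that there is no carryover: for each participant, take the period 1 score minus the period 2 score; half the difference between the two sequences' mean differences estimates the A-minus-B treatment effect, and half their sum estimates the period effect. The function name and data are hypothetical.

```python
import statistics


def crossover_estimates(seq_ab, seq_ba):
    """Treatment and period effects from an AB/BA crossover, assuming no carryover.
    Each sequence is a list of (period1_score, period2_score) pairs."""
    d_ab = [p1 - p2 for p1, p2 in seq_ab]  # (A - B) + (period 1 - period 2)
    d_ba = [p1 - p2 for p1, p2 in seq_ba]  # (B - A) + (period 1 - period 2)
    mean_ab, mean_ba = statistics.mean(d_ab), statistics.mean(d_ba)
    return {
        "treatment_effect_A_minus_B": (mean_ab - mean_ba) / 2,
        "period_effect_1_minus_2": (mean_ab + mean_ba) / 2,
    }


if __name__ == "__main__":
    # Hypothetical scores: A adds 4 points, and everyone scores 2 points
    # higher in period 2 regardless of treatment.
    ab = [(54, 52), (64, 62)]   # received A first, then B
    ba = [(55, 61), (45, 51)]   # received B first, then A
    print(crossover_estimates(ab, ba))  # treatment 4.0, period -2.0
```

Differencing within participants removes stable individual differences, which is the statistical counterpart of individuals serving as their own controls; if carryover is present, this estimator is biased.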
When implementing a longitudinal design, researchers collect data from randomly formed groups at multiple time points prior to and after the implementation of a treatment. The following configuration indicates pre- and posttest data collection at four time points before and after the implementation, but as many time points as are feasible may be added:
Treatment group R O O O O X O O O O
Control group R O O O O O O O O
Because multiple measures of an attribute provide a more stable and consistent (i.e., reliable) estimate than a single measure, the researcher can have greater confidence in the measurement of the attribute in each group. In addition, this configuration allows the researcher to formulate statistical models of change over time as a consequence of the intervention. This approach would be particularly useful if, say, maturation were expected to be a concern. In that case, the researcher could build a measure of maturation into the design and, during data analysis, explicitly model maturation while also examining the treatment effect. Finally, the multiple posttest measures allow researchers to examine the immediacy of the treatment effect and whether it endures over time. This makes longitudinal designs well suited to examining interventions that aim to create sustainable change in attributes or behaviors.
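A simple descriptive starting point for such data is to compare the groups at every wave and to average the pretest and posttest waves separately. The sketch below assumes each participant's scores are stored as a list ordered by wave, with the first `n_pre` waves before the treatment; the names and data are hypothetical. It is a crude stand-in for the statistical models of change mentioned above (for example, growth curve or multilevel models), which would normally be used for inference.

```python
import statistics


def wave_differences(treatment, control):
    """Treatment-minus-control mean at each wave. Each argument is a list of
    participants, each a list of scores ordered by wave."""
    n_waves = len(treatment[0])
    return [statistics.mean(p[w] for p in treatment)
            - statistics.mean(p[w] for p in control)
            for w in range(n_waves)]


def pre_post_change_difference(treatment, control, n_pre):
    """Average each participant's pretest waves and posttest waves, then
    compare the mean pre-to-post change between the groups."""
    def mean_change(group):
        return statistics.mean(
            statistics.mean(p[n_pre:]) - statistics.mean(p[:n_pre]) for p in group)
    return mean_change(treatment) - mean_change(control)


if __name__ == "__main__":
    # Hypothetical scores at four pretest and four posttest waves.
    treatment = [[10, 11, 10, 11, 15, 16, 15, 14], [12, 12, 12, 12, 17, 17, 17, 17]]
    control = [[10, 11, 10, 11, 10, 11, 10, 11], [12, 12, 12, 12, 12, 12, 12, 12]]
    # Near zero before treatment; posttest values show immediacy and persistence.
    print(wave_differences(treatment, control))
    print(pre_post_change_difference(treatment, control, n_pre=4))  # 4.75
```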
Despite these strengths, this configuration has several weaknesses, some of which may preclude its implementation and others of which can weaken the validity of any causal claims. Specifically, longitudinal designs are vulnerable to testing threats, particularly if the same measurement instruments are used at each time point, and they are often vulnerable to attrition. Depending on the duration of the study and the commitment required of participants, it can be difficult and costly to maintain a sample that is sufficiently large and also remains representative of the population. Overall, longitudinal designs are costly and time intensive to implement, and, depending on the research area, it can be difficult to recruit participants for studies conducted over long periods of time.
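Because attrition recurs as a concern in these designs, researchers commonly track both overall attrition and differential attrition, that is, the gap in loss rates between conditions. The sketch below is a hypothetical helper, with invented names and data, that computes both from sets of participant IDs enrolled in and completing each condition.

```python
def attrition_by_group(enrolled, completed):
    """Attrition rate per condition, overall attrition, and differential
    attrition (largest minus smallest condition rate).
    `enrolled` and `completed` map condition -> set of participant IDs."""
    rates = {}
    total_lost = 0
    for condition, ids in enrolled.items():
        lost = ids - completed.get(condition, set())
        rates[condition] = len(lost) / len(ids)
        total_lost += len(lost)
    overall = total_lost / sum(len(ids) for ids in enrolled.values())
    differential = max(rates.values()) - min(rates.values())
    return rates, overall, differential


if __name__ == "__main__":
    # Hypothetical study: 10 participants per condition at baseline.
    enrolled = {"treatment": set(range(1, 11)), "control": set(range(11, 21))}
    completed = {"treatment": set(range(1, 9)), "control": set(range(11, 20))}
    rates, overall, differential = attrition_by_group(enrolled, completed)
    print(rates)          # treatment 0.2, control 0.1
    print(overall)        # 0.15
    print(differential)   # 0.1
```

High differential attrition is especially worrying, because it can reintroduce the group nonequivalence that randomization was meant to prevent.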
When the effects of two or more treatments and their interactions need to be estimated together, researchers may choose a factorial experimental design. In this type of design, two or more treatments are considered factors, each with at least two levels. Although these types of designs can be extended to include many factors with many levels, the following configuration represents a two-factor (A and B) design, each with two levels (1 and 2):
R O XA1B1 O
R O XA1B2 O
R O XA2B1 O
R O XA2B2 O
Although this design is often difficult to implement in the field, it offers researchers several advantages when the aim is to examine treatments together. For instance, it allows researchers to investigate the joint effect of the treatments and whether the effect of one treatment is constant across all levels of the other treatment(s). The latter question concerns an interaction effect: an interaction is present when the effect of one treatment differs across levels of another. Moreover, all else being equal, a factorial design requires fewer participants than two or more separate between-subject studies that estimate each treatment effect independently.
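For the two-factor, two-level configuration above, main effects and the interaction can be expressed directly in terms of the four cell means. The sketch below uses hypothetical cell means and an invented function name. It defines the interaction as the simple effect of A at B2 minus the simple effect of A at B1; some texts divide this contrast by 2, so conventions should be checked before comparing values across sources.

```python
def two_by_two_effects(cell_means):
    """Main effects and interaction from the cell means of a 2 x 2 factorial.
    Keys are (a_level, b_level) tuples with levels 1 and 2."""
    m11, m12 = cell_means[(1, 1)], cell_means[(1, 2)]
    m21, m22 = cell_means[(2, 1)], cell_means[(2, 2)]
    return {
        # Effect of A (level 2 vs 1) averaged over the levels of B.
        "main_effect_A": ((m21 + m22) - (m11 + m12)) / 2,
        # Effect of B (level 2 vs 1) averaged over the levels of A.
        "main_effect_B": ((m12 + m22) - (m11 + m21)) / 2,
        # Simple effect of A at B2 minus simple effect of A at B1;
        # zero when the effect of A is constant across levels of B.
        "interaction_AB": (m22 - m12) - (m21 - m11),
    }


if __name__ == "__main__":
    # Hypothetical posttest cell means.
    means = {(1, 1): 50, (1, 2): 55, (2, 1): 60, (2, 2): 75}
    print(two_by_two_effects(means))
    # main_effect_A = 15, main_effect_B = 10, interaction_AB = 10
```

Here the effect of A is 10 points at B1 but 20 points at B2, so the effect of one treatment is not constant across levels of the other, which is what a nonzero interaction indicates.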
The clear advantage of experimental research designs rests on their capacity to support causal inferences. With the combined virtues of random assignment and the counterfactual evidence provided by a control group, experimental research designs are the gold standard for isolating causal mechanisms. However, experimental designs can be difficult to implement in real-world environments such as classrooms and schools. For example, it is often difficult to assign (randomly or otherwise) teachers in the same school to different conditions, to assign students to classrooms, or to assign students in the same classroom to different conditions. As a consequence, experimental designs in education often use schools (i.e., clusters) as the unit of assignment, whereby each school, along with every teacher and student in it, is assigned to either the treatment or the control condition. This approach typically requires many schools and so can be quite expensive to implement.
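The sketch below illustrates cluster random assignment and one reason it tends to require many schools. It is a hypothetical illustration with invented names and data: whole schools are shuffled and split between conditions, and every student inherits the school's condition. It also computes the Kish design effect for equal-sized clusters, 1 + (m - 1) x ICC, where m is the number of students per school and ICC is the intraclass correlation; dividing the number of students by the design effect gives an approximate effective sample size. The ICC of 0.20 used in the example is an assumed value, not an empirical estimate.

```python
import random


def assign_clusters(students_by_school, seed=None):
    """Randomly assign whole schools to conditions; every student inherits
    the school's condition. Returns (school -> condition, student -> condition)."""
    rng = random.Random(seed)
    schools = sorted(students_by_school)  # fixed order so the seed is reproducible
    rng.shuffle(schools)
    half = len(schools) // 2
    school_condition = {s: ("treatment" if i < half else "control")
                        for i, s in enumerate(schools)}
    student_condition = {student: school_condition[school]
                         for school, students in students_by_school.items()
                         for student in students}
    return school_condition, student_condition


def design_effect(cluster_size, icc):
    """Kish design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n_students, cluster_size, icc):
    """Approximate number of independent observations the clustered sample is worth."""
    return n_students / design_effect(cluster_size, icc)


if __name__ == "__main__":
    schools = {"s1": ["a", "b"], "s2": ["c"], "s3": ["d", "e"], "s4": ["f"]}
    by_school, by_student = assign_clusters(schools, seed=7)
    print(by_school)
    print(by_student)
    # Hypothetical: 20 schools x 50 students, assumed ICC of 0.20.
    print(round(effective_sample_size(1000, 50, 0.20), 1))  # 92.6
```

Under these assumptions, 1,000 students clustered in 20 schools carry roughly the information of about 93 independently assigned students, which is why adequately powered cluster-randomized studies often need many schools.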