Equating

The term equating refers to a statistical process used to establish comparable scores on alternate test forms built to the same test specifications. A test form is a collection of items or tasks intended to measure examinees’ performance on a set of predefined domains of a test. Most large-scale testing programs have multiple alternate test forms, each of which is developed according to the same specifications. These forms are administered at different times so that test users have some flexibility in selecting a test date. Such flexibility, however, requires that reported scores on alternate forms administered on different dates be comparable, so that no examinee is advantaged or disadvantaged by the date on which the test was taken.

Alternate forms of a test are typically constructed to be as similar as possible in content and statistical characteristics, including difficulty levels. However, no matter how carefully forms are constructed, there will be some differences in difficulty between alternate forms. Equating adjusts for these differences so that scores on alternate forms of the same test have the same meaning and can be used interchangeably. This entry begins with the basic concept of equating; provides an overview of data collection designs, equating methods, and smoothing techniques; and concludes with some cautionary remarks on the accuracy of equating.

Basic Concept of Equating

The goal of equating is to find an equating relationship that transforms the scores on Form X (i.e., a new form) to the scale of Form Y (i.e., an old or base form). It is assumed here that some process has been used to establish a raw-to-scale score transformation for the base form Y.

Typically, the first step in equating is to determine a transformation function based on raw scores (e.g., number-correct scores) for both forms. This function expresses each Form X raw score as an equivalent raw score on Form Y. Then, the equated raw scores are converted to scale scores using the raw-to-scale score transformation established for Form Y. These scale scores are the scores reported to examinees. When this process is performed successfully, the reported scale scores have the same meaning regardless of which form was administered. For example, a scale score of 500 on Form X indicates the same level of performance as a scale score of 500 on Form Y.
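
The two-step process described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the method of any particular testing program: it uses linear equating (matching the means and standard deviations of the two forms) as one standard choice of raw-score transformation, the raw scores are made up, and the linear raw-to-scale conversion for Form Y (`raw_to_scale_y`) is hypothetical.

```python
from statistics import mean, pstdev


def linear_equate(x, scores_x, scores_y):
    """Map a Form X raw score onto the Form Y raw-score scale.

    Linear equating: scores that are the same number of standard
    deviations from their form's mean are treated as equivalent.
    """
    mu_x, mu_y = mean(scores_x), mean(scores_y)
    sd_x, sd_y = pstdev(scores_x), pstdev(scores_y)
    return mu_y + (sd_y / sd_x) * (x - mu_x)


def raw_to_scale_y(raw_y):
    """Hypothetical raw-to-scale conversion already established for Form Y."""
    return 200 + 10 * raw_y


# Made-up raw scores; Form X is 2 points harder on average.
form_x_scores = [20, 24, 28, 32, 36]
form_y_scores = [22, 26, 30, 34, 38]

# Step 1: equate the Form X raw score to the Form Y raw-score scale.
equated_raw = linear_equate(28, form_x_scores, form_y_scores)

# Step 2: convert the equated raw score to a reported scale score.
scale_score = raw_to_scale_y(equated_raw)
```

With these made-up numbers, a raw score of 28 on the harder Form X is treated as equivalent to a raw score of 30 on Form Y, so both receive the same reported scale score under the assumed conversion.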

Equating Designs

For equating to be successful, differences attributable to forms must be separated from differences in the examinee groups taking the forms. Accomplishing this requires using a data collection design in which the sets of data for Form X and Form Y have some link between them—either common items or common (or similar) persons. The three most commonly used designs are the random groups design, the single group design, and the common-item nonequivalent groups (CINEG) design, sometimes called the nonequivalent anchor test design.

Random Groups Design

In the random groups design, the groups taking Form X and Form Y are randomly equivalent, so differences between the scores of the two groups can be viewed as a direct indication of differences in difficulty between the two forms. A spiraling process is often employed to randomly assign forms. For example, Form X and Form Y can be distributed to examinees in an alternating order (e.g., Form X to the first examinee, Form Y to the second examinee, and Form X to the third examinee).
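
The spiraling process described above can be sketched as follows. The examinee identifiers, raw scores, and helper names (`spiral_forms`, `mean_by_form`) are hypothetical and serve only to illustrate the idea; an operational analysis would rely on far larger samples.

```python
from itertools import cycle
from statistics import mean


def spiral_forms(examinee_ids, forms=("X", "Y")):
    """Assign forms in alternating order down the list of examinees."""
    return {eid: form for eid, form in zip(examinee_ids, cycle(forms))}


def mean_by_form(assignment, raw_scores, form):
    """Mean raw score of the examinees who were assigned the given form."""
    return mean(raw_scores[e] for e, f in assignment.items() if f == form)


# Hypothetical examinees in seating order: e1 gets X, e2 gets Y, e3 gets X, ...
assignment = spiral_forms(["e1", "e2", "e3", "e4", "e5", "e6"])

# Made-up raw scores on whichever form each examinee took.
raw_scores = {"e1": 27, "e2": 30, "e3": 29, "e4": 31, "e5": 28, "e6": 32}

mean_x = mean_by_form(assignment, raw_scores, "X")
mean_y = mean_by_form(assignment, raw_scores, "Y")

# If the groups are randomly equivalent, this gap is attributed to the forms.
difficulty_gap = mean_y - mean_x
```

Here the Form Y group averages 3 raw-score points higher, which under the random-equivalence assumption would be read as Form X being the harder form by about 3 points.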

...
