
Computerized adaptive testing (CAT) is a method of administering tests that adapts to the examinee's trait level. A CAT test differs profoundly from a paper-and-pencil test. In the former, different examinees are tested with different sets of items. In the latter, all examinees are tested with an identical set of items. The major goal of CAT is to fit each examinee's trait level precisely by selecting test items sequentially from an item pool according to the current performance of an examinee. In other words, the test is tailored to each examinee's θ level, so that able examinees can avoid responding to too many easy items, and less able examinees can avoid being exposed to too many difficult items. The major advantage of CAT is that it provides more-efficient latent trait estimates (θ) with fewer items than would be required in conventional tests.

The earliest large-scale application of CAT was the computerized version of the Armed Services Vocational Aptitude Battery (ASVAB), now administered to more than half a million applicants each year. The paper-and-pencil version of the ASVAB takes 3 hours to complete, whereas the CAT version takes about 90 minutes. With the CAT version, an examinee's qualifying scores can be matched immediately with the requirements for all available positions. CAT has become a popular mode of assessment in the United States. In addition to the ASVAB, examples of large-scale CATs include the Graduate Record Examinations (GRE), the Graduate Management Admission Test, and the licensure examinations of the National Council of State Boards of Nursing. The implementation of CAT has brought many advantages, such as new question formats, new types of skills that can be measured, easier and faster data analysis, and faster score reporting. Today the CAT GRE is administered year-round, allowing examinees to choose their own date and time for taking it, whereas the paper-and-pencil version was administered only three times per year.

Item Selection in CAT

The most important ingredient in CAT is the item selection procedure, which selects items sequentially during the course of the test. According to F. M. Lord, an examinee is measured most effectively when test items are neither too difficult nor too easy. Heuristically, if the examinee answers an item correctly, the next item selected should be more difficult; if the answer is incorrect, the next item should be easier. Because different examinees receive different tests, equating scores across different sets of items requires a suitable probability model for item responses, and this is provided by item response theory (IRT). In IRT modeling, a difficult item has a large b-value (difficulty parameter), and an easy item has a small b-value. Knowing the difficulty levels of all the items in the pool, one can develop an item selection algorithm based on branching: if an examinee answers an item incorrectly, the next item selected should have a lower b-value; if the examinee answers correctly, the next item should have a higher b-value.
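This branching idea can be sketched in a few lines of code. The following is a toy illustration only; the pool of b-values, the nearest-item tie-breaking, and the function name are illustrative assumptions, not Lord's exact procedure:

```python
def next_item_branching(pool_b, administered, current_b, correct):
    """Pick the next item from a difficulty-calibrated pool by branching:
    the nearest unused harder item after a correct answer, the nearest
    unused easier item after an incorrect answer.  `pool_b` is a list of
    b-values; `administered` is a set of indices already given.
    (Illustrative sketch, not Lord's exact rule.)"""
    if correct:
        # search for the nearest unused item with a higher b-value
        candidates = [i for i in range(len(pool_b))
                      if i not in administered and pool_b[i] > current_b]
        return min(candidates, key=lambda i: pool_b[i]) if candidates else None
    else:
        # search for the nearest unused item with a lower b-value
        candidates = [i for i in range(len(pool_b))
                      if i not in administered and pool_b[i] < current_b]
        return max(candidates, key=lambda i: pool_b[i]) if candidates else None

pool = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]   # calibrated b-values
used = {3}                                       # item with b = 0.0 was given
harder = next_item_branching(pool, used, 0.0, correct=True)    # index 4 (b = 0.5)
easier = next_item_branching(pool, used, 0.0, correct=False)   # index 2 (b = -0.5)
```

In an operational CAT, this simple up/down branching is replaced by model-based rules (such as the Robbins-Monro procedure discussed next), but the underlying logic is the same.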

In 1970, Lord proposed an item selection algorithm as an extension of the Robbins-Monro process, a stochastic approximation method that has been widely used in many other areas, including engineering control and biomedical science; for example, it has proved useful in minimizing the number of animals required to estimate the acute toxicity of a chemical. To use the method in CAT, the difficulty levels of all the items in the pool must be calibrated before testing. Let b_1, b_2, …, b_n be the sequence of difficulty parameters of the first n items administered to the examinee. New items should be selected so that b_n approaches a constant b_0 (as n becomes indefinitely large), where b_0 represents the difficulty level of an item that the examinee has about a 50% chance of answering correctly, that is, P{X_n = 1 | θ = b_0} ≈ 1/2. Because our goal is to estimate θ, b_0 serves as a reasonable estimate of θ. Note that b_0 can be linearly transformed to any meaningful score scale, which makes it convenient to score the examinee's test responses as a function of b_0. Lord, writing in 1970, proposed several rules based on the Robbins-Monro process and envisioned that such testing could be implemented once computers became sufficiently powerful. A specific example of the item selection rule can be described by the following equation:

b_{n+1} = b_n + (d/n)(X_n − 1/2),

where d > 0 is a step-size constant and X_n is the scored response to the nth item (1 if correct, 0 if incorrect). A correct answer (X_n = 1) raises the difficulty of the next item, an incorrect answer (X_n = 0) lowers it, and the shrinking step size d/n forces the sequence b_n to settle toward b_0.
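A rough simulation shows how such a Robbins-Monro rule works: the next item's difficulty moves up after a correct answer and down after an incorrect one, by a step proportional to 1/n. The step-size constant d, the item count, and the Rasch response model below are illustrative assumptions, not part of Lord's original formulation:

```python
import math
import random

def robbins_monro_cat(theta_true, n_items=1000, d=2.0, b_start=0.0, seed=1):
    """Simulate a Robbins-Monro item selection rule: after the nth item,
    difficulty is updated by (d/n) * (x - 1/2), where x is the 0/1
    response.  Responses are generated from a Rasch model,
    P(correct) = 1 / (1 + exp(-(theta - b))).  The values of d and
    n_items and the response model are illustrative assumptions."""
    rng = random.Random(seed)
    b = b_start
    for n in range(1, n_items + 1):
        p = 1.0 / (1.0 + math.exp(-(theta_true - b)))  # Rasch probability
        x = 1 if rng.random() < p else 0               # simulated response
        b += (d / n) * (x - 0.5)                       # Robbins-Monro step
    return b  # the final difficulty approximates b0, an estimate of theta

# The final difficulty tracks the examinee's true trait level:
high = robbins_monro_cat(theta_true=2.0)
low = robbins_monro_cat(theta_true=-2.0)
```

Because the step size d/n shrinks with n, early items move the difficulty in large jumps while later items make only fine adjustments, so the administered difficulties stabilize near the level at which the examinee answers about half the items correctly.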

...
