
Differential Item Functioning

Differential item functioning (DIF) is formally defined as a lack of equality between two groups' conditional probability functions that relate the trait of measurement to responses on an item. DIF indicates that examinees who are equal on the trait of measurement, but who differ on some external variable, perform differently on a test item. For example, DIF is said to be present if groups of examinees who are matched in quantitative aptitude (the trait of measurement), but who vary in country of origin (the external variable), perform differently on a test item that is intended to measure quantitative aptitude. The presence of DIF is viewed as problematic because it implies that some factor external to the trait being measured is influencing responses to test items in different ways for different groups of examinees, giving an advantage to one or more groups. DIF is commonly explored in educational measurement data because it supports the statistical process of gauging fairness at the item level, which contributes to the overarching evaluation of measurement validity. Educational research on DIF includes developing methods to estimate DIF; comparing and contrasting the wide variety of DIF methods available in the literature; connecting DIF to other psychometric phenomena; conducting DIF analyses on particular test items and/or across particular groups of examinees; and overcoming challenges to estimating, interpreting, and addressing DIF in practice. The remainder of this entry introduces the concept of DIF in relation to other educational measurement concepts, defines various manifestations of DIF in data, describes select methods for evaluating DIF, and reviews some of the challenges to evaluating DIF in practice.
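To make the formal definition concrete, the sketch below compares the conditional probability functions of one item in two groups. It assumes a two-parameter logistic (2PL) item response model, which this entry does not specify, and the group labels and parameter values (`REFERENCE`, `FOCAL`) are hypothetical. Because the item is harder for the focal group, examinees with the same trait level `theta` have different probabilities of a correct response, which is exactly the inequality the definition describes.

```python
import math

def irf_2pl(theta: float, a: float, b: float) -> float:
    """P(correct | theta) under an assumed two-parameter logistic (2PL) model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical parameters for the same item estimated in two groups.
# Discrimination (a) is equal; difficulty (b) is higher for the focal group.
REFERENCE = {"a": 1.2, "b": 0.0}
FOCAL = {"a": 1.2, "b": 0.5}

def dif_at(theta: float) -> float:
    """Reference-minus-focal difference in P(correct) at a fixed trait level.
    A value of zero at every theta would mean no DIF for this item."""
    return irf_2pl(theta, **REFERENCE) - irf_2pl(theta, **FOCAL)

if __name__ == "__main__":
    for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
        p_ref = irf_2pl(theta, **REFERENCE)
        p_foc = irf_2pl(theta, **FOCAL)
        print(f"theta={theta:+.1f}  P_ref={p_ref:.3f}  P_foc={p_foc:.3f}  diff={dif_at(theta):+.3f}")
```

Matching on `theta` is what separates DIF from a simple difference in group means: the comparison is made between examinees at the same trait level, not between the groups as wholes.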

The Concept of DIF and Relationships to Other Educational Measurement Concepts

DIF indicates that two (or more) groups display differences in item responses conditional on the trait of measurement. Ultimately, the groups' conditional probability functions can differ only if some secondary factor plays a role in item responses and the groups have different distributions on that secondary factor. For example, an item that is intended to measure science ability may also measure language proficiency as a nuisance trait (i.e., an unintended trait of measurement). If two (or more) groups of examinees have different distributions of language proficiency (e.g., one group has a lower mean language proficiency), then DIF is expected to be present across those groups in the item response data. This conceptual understanding of DIF illustrates the connections among DIF, dimensionality, fairness, and validity. DIF is one of many ways in which a failure to measure a single trait in the same manner across different groups of examinees manifests itself in test outputs. Hence, the Standards for Educational and Psychological Testing refers to DIF analysis as a necessary part of evaluating test fairness.
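The mechanism described above can be sketched numerically. The example assumes a compensatory two-dimensional logistic model in which both the intended trait `theta` (e.g., science ability) and a nuisance trait `eta` (e.g., language proficiency) add to the logit; it also assumes that `eta` is normally distributed and independent of `theta` within each group. All function names and parameter values are hypothetical. Averaging over each group's `eta` distribution yields that group's probability of a correct response conditional on `theta` alone, and those conditional functions differ only when the item depends on `eta` and the groups' `eta` distributions differ.

```python
import math

def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def p_correct(theta: float, eta: float, a_primary: float, a_nuisance: float, d: float) -> float:
    """Assumed compensatory model: intended and nuisance traits both add to the logit."""
    return logistic(a_primary * theta + a_nuisance * eta - d)

def normal_pdf(x: float, mean: float, sd: float) -> float:
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def conditional_irf(theta: float, eta_mean: float, eta_sd: float = 1.0,
                    a_primary: float = 1.0, a_nuisance: float = 0.7, d: float = 0.0,
                    n_points: int = 401, width: float = 6.0) -> float:
    """P(correct | theta) for a group whose nuisance trait is N(eta_mean, eta_sd),
    assuming eta is independent of theta within the group. The expectation over
    eta is approximated on an evenly spaced grid with normalized density weights."""
    start = eta_mean - width * eta_sd
    step = 2.0 * width * eta_sd / (n_points - 1)
    weighted_sum = 0.0
    weight_total = 0.0
    for i in range(n_points):
        eta = start + i * step
        w = normal_pdf(eta, eta_mean, eta_sd)
        weighted_sum += w * p_correct(theta, eta, a_primary, a_nuisance, d)
        weight_total += w
    return weighted_sum / weight_total

if __name__ == "__main__":
    # Hypothetical groups: the focal group has a lower mean on the nuisance trait.
    for theta in (-1.0, 0.0, 1.0):
        p_ref = conditional_irf(theta, eta_mean=0.0)
        p_foc = conditional_irf(theta, eta_mean=-0.5)
        print(f"theta={theta:+.1f}  P_ref={p_ref:.3f}  P_foc={p_foc:.3f}")
```

Setting `a_nuisance=0.0` (the item no longer measures the nuisance trait) or giving both groups the same `eta_mean` makes the two conditional functions coincide, which mirrors the two conditions stated in the paragraph. The sketch conditions on the true `theta`; in practice examinees are matched on an observed proxy, such as total test score.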

DIF is encompassed by the broader statistical concept of measurement invariance. Measurement invariance refers to the equality of parameters in a statistical model across groups, and the parameters of concern could be item-level parameters, structural model parameters, model fit parameters, parameters related to external relationships, and more. DIF concerns only a lack of invariance in item-level parameters, which makes it one specific type of violation of measurement invariance.

...
