
Reliability and validity are two major requirements for any measurement. Validity pertains to the correctness of the measure; a valid tool measures what it is supposed to measure. Reliability pertains to the consistency of the tool across different contexts. As a rule, an instrument's validity cannot exceed its reliability, although it is common to find reliable tools that have little validity.

There are three primary aspects to reliability: (a) A reliable tool will give similar results when applied by different users (such as technicians or psychologists). (b) It will also yield similar results when measuring the same object (or person) at different times. In psychometrics, reliability also implies a third feature, which is relevant to scales (measures that include various submeasures or items). Specifically, it entails the requirement that (c) all parts of the instrument be interrelated.

1. Interrater Reliability: Some measurements require expertise and professional judgments in their use. The reliability of such a tool is contingent on the degree to which measurements of the same phenomenon by different professionals will yield identical results. What we want to avoid is a test that essentially relies on undefined judgments of the examiner, without concrete criteria that are clearly spelled out.

Statistically, this aspect of reliability is usually determined by having several raters measure the same phenomena and then computing the correlations between the different raters. For typical measures that yield numerical data, a correlation index needs to be high (e.g., in the .90s) to demonstrate good interrater reliability.
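
The computation described above can be illustrated with a minimal Python sketch. It assumes Pearson's r as the correlation index (the entry does not name a specific index), and the rater names and scores are hypothetical values invented for illustration only.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when scores do not vary")
    return cov / sqrt(var_x * var_y)


def interrater_correlations(ratings):
    """Correlate every pair of raters.

    `ratings` maps a rater's name to that rater's scores for the same
    subjects, listed in the same order for every rater.
    """
    return {
        (a, b): pearson(ratings[a], ratings[b])
        for a, b in combinations(sorted(ratings), 2)
    }


# Hypothetical data: three raters scoring the same five subjects.
ratings = {
    "rater_a": [12, 15, 9, 20, 17],
    "rater_b": [11, 16, 10, 19, 18],
    "rater_c": [13, 14, 8, 21, 16],
}

pairwise = interrater_correlations(ratings)
for pair, r in pairwise.items():
    print(pair, round(r, 2))
```

Each printed value is the correlation for one pair of raters; under the guideline stated above, values in the .90s would indicate good interrater reliability.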

2. Test-Retest Reliability: This feature, often referred to as temporal stability, reflects the expectancy that the measurement of a specific object will yield similar results when it is measured at different times. Clearly, this is based on an assumption that one does not expect the object to be changing between the two measurements. In fact, this may not be the case for many constructs. (Consider, for example, blood pressure or stress, both of which would be expected to vary from one time to another—even for the same person.) Specifically within psychology, it is important to understand that this aspect of reliability pertains only to measures that refer to traits (aspects of personality that are constant regardless of environmental events or context); it does not pertain to states (specific aspects of behavior or attitudes that vary based on situations and interactions at the moment).

Statistically, this aspect of reliability is usually determined by having a group of people measured twice with the instrument. The time interval can vary, based on specific studies, from several weeks to a year or two. Unfortunately, testing experts often choose a short duration between tests (there are published test-retest time periods of only 6 hours!), which makes their claim of testing an actual trait equivocal. Correlations are computed between the two trials. For typical measures that yield numerical data, a correlation index needs to be high to demonstrate good test-retest reliability (although .70 would be sufficient).

3. Internal Consistency: This usually entails the reliability of measures that have multiple items; such measures are known as scales.

...
