
A test can be defined as an instrument, tool, or procedure used to obtain information about a particular outcome. Other definitions refer to a test's capacity to measure specific variables and to its use of standardized procedures to gather data on underlying constructs, either for drawing conclusions or for developing hypotheses for further examination. Currently, thousands of tests are available in many different areas, such as achievement, intelligence, personality, aptitude, and vocational testing. This entry discusses the various types of tests and scores and the psychometric properties of tests.

Types of Tests and Scores

Tests can be considered norm referenced or criterion referenced. For norm-referenced tests, an individual’s scores are compared to scores from a particular normative group, which is a sample of individuals who should be representative of the individuals for whom the test was developed. When interpreting criterion-referenced tests, however, the emphasis is on determining what the individual knows, which can be done by comparing the examinee’s performance to a particular standard, or level of performance.

Multiple types of scores can be obtained from tests. Raw scores, such as the number of items an examinee answers correctly or incorrectly, can be obtained on any type of test. Although raw scores are, by themselves, relatively meaningless, they can be transformed into more refined scores that carry different types and levels of meaning. On norm-referenced tests, raw scores are often transformed into standard scores, which place an examinee's performance on a fixed metric by expressing how far the score falls from the normative group's mean in standard deviation units. Percentile ranks are often used to describe an examinee's performance on a norm-referenced test, as they provide information about an individual's position within a distribution or set of scores from a particular group. A percentile rank indicates the percentage of individuals within a particular group whose scores fell at or below the examinee's score. Percentile ranks are somewhat limited, however, because their scale units are unequal: the same difference in percentile rank can correspond to different differences in underlying scores, depending on where in the distribution it falls.
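
The transformations described above can be illustrated with a minimal Python sketch. The function names, the made-up normative scores, and the choice of a mean-100, standard-deviation-15 metric are assumptions for illustration only; published tests define their own metrics and norming procedures.

```python
from statistics import mean, pstdev

def z_score(raw, norm_scores):
    """Distance of a raw score from the norm group's mean, in SD units."""
    # Assumption: the norm group is treated as a full population (pstdev).
    return (raw - mean(norm_scores)) / pstdev(norm_scores)

def standard_score(raw, norm_scores, new_mean=100.0, new_sd=15.0):
    """Rescale the z score to a chosen metric (hypothetical default: 100/15)."""
    return new_mean + new_sd * z_score(raw, norm_scores)

def percentile_rank(raw, norm_scores):
    """Percentage of norm-group scores that fall at or below the raw score."""
    at_or_below = sum(1 for s in norm_scores if s <= raw)
    return 100.0 * at_or_below / len(norm_scores)

# Made-up normative sample of raw scores (illustration only).
norm_group = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]

print(z_score(19, norm_group))          # a raw score at the group mean
print(standard_score(19, norm_group))   # the same score on the 100/15 metric
print(percentile_rank(18, norm_group))  # share of the group at or below 18
```

Because percentile ranks depend only on rank order, two examinees whose raw scores differ by the same amount can differ by very different numbers of percentile points, which is the unequal-unit limitation noted above.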

Due to the inherent presence of measurement error, norm-referenced scores are often presented within confidence intervals. Confidence intervals can be of various sizes, with their width indicating how much one should expect an individual's obtained test score to vary from the true score. The larger the confidence interval, the more confident one can be that the true score falls within that range. Although a confidence interval can be calculated by multiplying the standard error of measurement by the Z score for the desired confidence level and then adding the result to and subtracting it from the obtained score, confidence intervals are often provided in test manuals, typically at the 68%, 90%, 95%, and sometimes 99% levels.
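
The calculation mentioned above can be sketched as follows. The source states only that the standard error of measurement (SEM) is multiplied by a Z score; the SEM formula used here, SD × √(1 − reliability), is the usual classical test theory expression and is an addition to the text. The function names and the example values (a standard deviation of 15, a reliability of .91, an obtained score of 100) are hypothetical.

```python
import math
from statistics import NormalDist

def standard_error_of_measurement(sd, reliability):
    """SEM under classical test theory (assumed formula, not from the entry)."""
    return sd * math.sqrt(1.0 - reliability)

def confidence_interval(obtained, sem, level=0.95):
    """Interval of obtained score plus or minus z * SEM for a two-sided level."""
    # Z score that leaves (1 - level) / 2 in each tail of the normal curve.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    margin = z * sem
    return obtained - margin, obtained + margin

# Hypothetical test: SD of 15 and a reliability coefficient of .91.
sem = standard_error_of_measurement(15.0, 0.91)

for level in (0.68, 0.90, 0.95, 0.99):
    low, high = confidence_interval(100.0, sem, level)
    print(f"{level:.0%}: {low:.1f} to {high:.1f}")
```

Running the loop shows the interval widening as the confidence level rises, which matches the point that a larger interval allows more confidence that it contains the true score.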

Psychometric Properties

Psychometric properties are critical components to consider when selecting a test. Reliability represents the consistency of measurement across administrations, or replications, of a particular test. Typically, reliability is discussed in relation to an individual's obtained score, because the obtained score, unlike the true score, is directly influenced by measurement error. Different methods evaluate reliability and produce reliability coefficients. One type is test–retest reliability, which is typically estimated with the Pearson product–moment correlation between scores from two administrations of the same test and shows how consistent scores are across administrations. A second type is alternate-form reliability, in which scores from one form of a test are compared to those from an alternate form of the same test, each covering the same content. A third type of reliability coefficient is internal consistency, which refers to the level of consistency across the items of a test and typically yields the highest reliability coefficients. Internal consistency is commonly reported as Cronbach's coefficient α, which represents the relationship among items across the entire test. Finally, interrater reliability refers to the consistency across ratings. This type of reliability addresses measurement error associated with an examiner's subjective evaluations of an examinee's test responses. Interrater reliability can be represented either through interrater agreement, in which the percentage of agreement is calculated, or through an interrater reliability coefficient, often a Pearson product–moment correlation.
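
Three of the coefficients named above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the procedure of any particular test manual: the function names and all data are made up, and Cronbach's α is computed with its standard formula, α = k/(k − 1) × (1 − Σ item variances / variance of total scores), which the entry itself does not spell out.

```python
from statistics import mean, variance

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    mx, my = mean(x), mean(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cross / (ss_x * ss_y) ** 0.5

def cronbach_alpha(responses):
    """Cronbach's alpha; `responses` holds one list of item scores per examinee."""
    k = len(responses[0])                      # number of items
    item_columns = list(zip(*responses))       # regroup scores by item
    sum_item_vars = sum(variance(col) for col in item_columns)
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1.0 - sum_item_vars / total_var)

def percent_agreement(rater_a, rater_b):
    """Interrater agreement: percentage of responses rated identically."""
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return 100.0 * matches / len(rater_a)

# Made-up data for illustration only.
first_administration = [12, 15, 9, 20, 17]
second_administration = [13, 14, 10, 19, 18]
item_responses = [[1, 1, 0, 1], [0, 1, 0, 0], [1, 1, 1, 1], [0, 0, 0, 1]]
rater_one = [2, 3, 1, 4]
rater_two = [2, 3, 1, 3]

print(pearson_r(first_administration, second_administration))  # test-retest
print(cronbach_alpha(item_responses))                           # internal consistency
print(percent_agreement(rater_one, rater_two))                  # interrater agreement
```

The same `pearson_r` function would serve for alternate-form reliability (correlating scores on two forms) or for an interrater reliability coefficient (correlating two raters' scores).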

...
