Item Development

Creating fair, valid, and reliable assessments, at a minimally sufficient length to yield useful information about the individuals, requires the use of high-quality items. What constitutes appropriately high quality depends on the content being measured, assessment stakeholders, and the purpose of the assessment. All these considerations must be defined and clearly articulated before item writing can take place. The process of creating these items with desirable properties is termed item development. This entry provides the basic concepts necessary to understand the item development process. Particular attention is given to identifying item and assessment purpose, item types, response space, assessment blueprints, and item specifications.

Validity and Reliability for Item Development

An end goal of assessment is to make valid and reliable conclusions about individuals and groups in order to provide formative or summative feedback and to drive learning improvement. Because the principles of validity and reliability provide the foundation for the purpose of item development, it is important that item developers become familiar with these concepts. Valid inferences about an individual’s knowledge, skills, and abilities (KSA) and other constructs require that the use and interpretation of the assessment scores lead to the proper conclusions about the individual’s KSA. These valid inferences depend on how well the items perform with regard to their intended purpose. If the items measure the KSA and constructs as anticipated, then decisions based on the assessment scores can be made with a certain confidence. However, if the items were not designed or developed as intended, or are not performing as intended, then those decisions about an individual’s KSA cannot be made with much confidence. For example, an item purporting to measure a third grader’s multiplication ability should not also require nonmultiplication vocabulary for a correct response to be given.

An item that required both multiplication and extraneous vocabulary would likely have considerable measurement error. More measurement error would imply that the group of items is not measuring the KSA and other constructs as planned. One way to quantify measurement error is reliability, a measure of how stable an individual’s assessment scores are. Stable scores have less measurement error. However, assessment reliability does not automatically guarantee that the scores from an assessment will yield valid inferences.

With the end goal of valid, reliable inferences in mind, the question becomes: how are items developed so that the inferences drawn from them have these desirable properties? Answering that question is the primary purpose of this entry, which provides a general overview of item development and places emphasis not just on item writing tips but also on the end goal of valid inferences being made from the assessment.

Assessment Purpose

The first step in the item development process is to determine the assessment’s purpose. The purpose guides item development by addressing questions such as: How will the assessment scores be used? Which stakeholders will receive the score reports? Is the assessment intended for certification, licensure, accreditation, and so on? Will the emphasis be formative or summative? What consequences exist for the examinees, that is, is the assessment high stakes or low stakes? Are there time constraints? What other constraints, such as cost and other resources, are present? Who will be scoring the assessment, how will scoring be done, and what is the desired turnaround for providing feedback?

...
