
Intercoder Reliability

Intercoder reliability is an integral part of content analysis. It allows researchers to argue for the consistency, and by extension the validity, of their findings. This entry defines intercoder reliability and explains why it is important to assess and report it. The steps necessary to establish an acceptable level of intercoder reliability are also reviewed. Finally, competing reliability indices and guidelines for reporting intercoder reliability are detailed.

Definition and Importance

Content analysis is a research method used to systematically examine a sample of artifacts or texts to discover patterns and meanings that can be generalized to a larger body of texts. In content analysis, the term text can refer to any type of communication message that can be stored so that the researchers can access it in order to conduct the observations. Intercoder reliability—also referred to as intercoder, interjudge, or interrater agreement—is the extent to which independent coders can analyze the same texts using the same categorizing (coding) scheme and reach the same decisions. It is calculated for each categorization variable in a study. Reaching an acceptable intercoder reliability criterion ensures that the results of the content analysis are generalizable to a larger body of similar texts and not merely unique, subjective interpretations of the texts in the study sample. In other words, if intercoder reliability is established, researchers can trust that the findings of the content study are internally consistent and that such consistency can extend to other similar samples.

Communication researchers use content analysis to study texts such as news articles, blogs, television programs, motion pictures, print or television or online advertisements, video games, social media sites, speeches, transcripts from focus groups or in-depth interviews, historical documents, and other documented forms of communication. For the study of mass and (increasingly) social media, content analysis is a critical method to uncover patterns and themes reflected in and/or perpetuated through various forms of public discourse and storytelling. Although intercoder agreement is not the only factor in determining a study’s validity, without an acceptable level of agreement there are no valid results to report. Low levels of agreement suggest methodological problems such as inadequate coder training or a faulty instrument with poor operational definitions or categories.

Intercoder reliability is measured by having two or more coders independently analyze a set of texts (usually a subset of the study sample) by applying the same coding instrument, and then calculating an intercoder reliability index to determine the level of agreement among the coders.
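
For example, with two coders and a single nominal variable, percent agreement and a chance-corrected coefficient such as Cohen’s kappa can be computed directly from the coders’ decisions. The short Python sketch below is illustrative only; the coding decisions and category labels are made up.

    from collections import Counter

    # Hypothetical coding decisions for one nominal variable
    # (e.g., story topic), made by two coders on the same 10 units.
    coder_a = ["politics", "sports", "politics", "crime", "sports",
               "politics", "crime", "crime", "sports", "politics"]
    coder_b = ["politics", "sports", "crime", "crime", "sports",
               "politics", "crime", "sports", "sports", "politics"]

    n = len(coder_a)

    # Percent agreement: proportion of units on which the coders agree.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n

    # Expected (chance) agreement for Cohen's kappa: the product of the
    # two coders' marginal proportions, summed over categories.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    categories = set(freq_a) | set(freq_b)
    expected = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)

    # Kappa corrects observed agreement for the agreement expected by chance.
    kappa = (observed - expected) / (1 - expected)

    print(f"Percent agreement: {observed:.2f}")
    print(f"Cohen's kappa:     {kappa:.2f}")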

The Process

Although many of the details vary across studies, the typical process of establishing and reporting reliability includes the following steps:

  • Design a coding manual. The manual includes coding instructions, detailed definitions for each variable and its categories (possible values), and examples.
  • Train coders to use the coding instrument. Use texts not in the study sample as test cases for the purposes of training. Coding decisions can be made and discussed together at this stage.
  • Select one or more appropriate indices. There is no consensus on which of the dozens of available indices for assessing intercoder reliability is the best to employ. However, indices that do not account for agreement that occurs by chance are too liberal and thus should not be used alone. These include percent agreement and Holsti’s method. Indices that do account for chance agreement may in some cases be too conservative. Indices in this category include Cohen’s kappa and Scott’s pi, both only appropriate for nominal level (categorical) variables; Cohen’s kappa is arguably the most used index except for percent agreement, despite significant objections raised by statistical experts. Krippendorff’s alpha is a widely praised, if difficult to calculate, index that can be used for variables with ordered (ordinal) and continuous (interval/ratio) values and accommodates different numbers of coders, missing data, and other factors. Some indices are not appropriate for assessing intercoder reliability, including Cronbach’s alpha, Chi-square, and Pearson’s r (correlation).
  • Obtain the necessary tools to calculate the index or indices selected. Some indices can be calculated “by hand,” but there are a variety of free-standing software packages, software extensions/plug-ins, and online calculators available. The cost, interface design, and even accuracy of these tools vary, along with requirements regarding the organization of the data to be analyzed. A brief calculation sketch follows this list.
  • Select an appropriate minimum acceptable level of reliability for the index or indices to be used. Coefficients of .90 or greater are nearly always acceptable, .80 or greater is acceptable in most situations, and .70 may be appropriate in some exploratory studies for some indices. Criteria should be adjusted depending on the characteristics of the index.
  • Assess reliability informally during coder training. Refine the instrument and continue training until the coders reach acceptable levels of independent agreement.
  • Assess reliability formally in a pilot test. Unless there are compelling reasons, use at least 30 randomly selected units of text not part of the study sample; coding must be done independently and without consultation or guidance. If reliability levels for each variable meet the criteria established earlier, proceed to coding the full sample; otherwise, conduct additional training and refine the coding instrument and procedures.
  • Assess reliability formally during coding of the full sample. Use a representative subsample that all coders evaluate; the resulting reliability coefficients are those reported for the study. The appropriate sample size depends on many factors but should not be less than 50 units or 10% of the full sample, and it rarely will need to be greater than 300 units. Again, all coding must be independent and without consultation or guidance.
  • Select and follow an appropriate procedure for incorporating the coding of the reliability sample into the coding of the full sample. Disagreements can be resolved by randomly selecting among the coders’ decisions, by applying a “majority” decision rule (when there is an odd number of coders), or by having the coders discuss and resolve them.
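
To illustrate the index-selection, tooling, and criterion steps above, the following sketch computes Krippendorff’s alpha for three coders with one missing decision and checks the result against a preselected criterion of .80. It assumes the third-party Python package krippendorff (one of many available calculators); the data, numeric category codes, and criterion are hypothetical.

    # A minimal sketch, assuming the third-party "krippendorff" package
    # (pip install krippendorff); data and threshold are hypothetical.
    import numpy as np
    import krippendorff

    # Rows are coders, columns are units from the reliability subsample;
    # np.nan marks a missing decision (e.g., a unit one coder skipped).
    reliability_data = np.array([
        [1, 2, 3, 3, 2, 1, 4, 1, 2, np.nan],   # coder 1
        [1, 2, 3, 3, 2, 2, 4, 1, 2, 5],        # coder 2
        [1, 2, 3, 3, 2, 1, 4, 1, 2, 5],        # coder 3
    ])

    alpha = krippendorff.alpha(reliability_data=reliability_data,
                               level_of_measurement="nominal")

    CRITERION = 0.80  # minimum acceptable level chosen before coding
    print(f"Krippendorff's alpha = {alpha:.2f}")
    if alpha < CRITERION:
        print("Below criterion: retrain coders and refine the instrument.")
    else:
        print("Criterion met: proceed with coding the full sample.")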

Report intercoder reliability in a careful, clear, and detailed manner in all research reports, and include the following:

...
