Content Analysis: An Introduction to Its Methodology

Name: Content Analysis: An Introduction to Its Methodology
ISBN: 9781071878781

Klaus Krippendorff

doi:10.4135/9781071878781


What matters in people’s social lives? What motivates and inspires our society? How do we enact what we know? Since the first edition published in 1980, Content Analysis has helped shape and define the field. In the highly anticipated Fourth Edition, award-winning scholar and author Klaus Krippendorff introduces readers to the most current method of analyzing the textual fabric of contemporary society. Students and scholars will learn to treat data not as physical events but as communications that are created and disseminated to be seen, read, interpreted, enacted, and reflected upon according to the meanings they have for their recipients. Interpreting communications as texts in the contexts of their social uses distinguishes content analysis from other empirical methods of inquiry. Organized into three parts, Content Analysis first examines the conceptual aspects of content analysis, then discusses components such as unitizing and sampling, and concludes by showing readers how to trace the analytical paths and apply evaluative techniques. The Fourth Edition has been completely revised to offer readers the most current techniques and research on content analysis, including new information on reliability and social media. Readers will also gain practical advice and experience for teaching academic and commercial researchers how to conduct content analysis.


Glossary

Abductive inference:

The process of proceeding from true propositions in one logical domain to propositions in another logical domain, believed to be true on account of presumed empirical relationships between them—that is, from particulars to particulars without generalizations governing both. (2.4.5)

Accuracy:

One of four kinds of Reliability. Accuracy is a measure of the extent to which a data-making instrument (e.g., a Coding instruction, measuring device, observational accounting practice) produces data that agree with a trusted standard. Standards may be established by a panel of experts, a proven method, or a norm. If the standard is considered true, accuracy measures the extent to which the generated data are valid. (12.2.1)

Alpha (α):

A generalized coefficient of agreement used to assess the Reliability of Unitizing, Coding, and Data. Alpha measures the agreement among independent Observers, coders, or analysts relative to what can be expected by chance, that is, the agreement that would be observed if they randomly assigned the values from a population estimate of values to given Units or randomly segmented a given continuum into units of different lengths and kinds. It is applicable to any number of observers, all standard Metrics, missing data, and small sample sizes. Alpha generalizes several special-purpose coefficients but goes far beyond any of them. (12.2.3)
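
The computation at the nominal level can be sketched as follows. This is a minimal sketch that assumes the reliability data arrive as one list per unit, holding the values the observers assigned to that unit, with missing values simply left out. It uses the usual form alpha = 1 − D_o/D_e. The observed disagreement D_o is read off the coincidence matrix, and the expected disagreement D_e comes from its marginal totals. The function name `nominal_alpha` is illustrative. Other metrics would replace the 0/1 nominal difference with the corresponding difference function.

```python
from collections import Counter
from itertools import permutations


def nominal_alpha(units):
    """Alpha for nominal data (illustrative sketch).

    units: one list per unit holding the values observers assigned to it;
    missing values are simply omitted from that unit's list.
    """
    # Coincidence matrix: each ordered pair of values within a unit adds 1/(m_u - 1).
    o = Counter()
    for values in units:
        m = len(values)
        if m < 2:
            continue  # a unit with fewer than two values cannot be paired
        for c, k in permutations(values, 2):
            o[(c, k)] += 1 / (m - 1)
    # Marginal totals n_c and the total number n of pairable values.
    n_c = Counter()
    for (c, _k), weight in o.items():
        n_c[c] += weight
    n = sum(n_c.values())
    if n == 0:
        raise ValueError("no pairable values")
    # Observed and expected disagreement with the nominal difference (0 if equal, else 1).
    d_o = sum(w for (c, k), w in o.items() if c != k) / n
    d_e = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k) / (n * (n - 1))
    if d_e == 0:
        raise ValueError("alpha is undefined when only one value occurs")
    return 1 - d_o / d_e


# Two observers, four units; they disagree only on the second unit.
print(nominal_alpha([["a", "a"], ["a", "b"], ["b", "b"], ["b", "b"]]))  # ≈ 0.533 (8/15)
```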

Analytical construct:

An operationalization (formalization) of the content analyst’s knowledge of how Text is used in a chosen Context. It is the ground on which Abductive inferences from given text to unobserved features of content-analytical interest are justifiable. An analytical construct is akin to a computable model of the network of stable correlations between available texts and the analyst’s Research question. (2.4.4; 4.2.1; 9)

Association:

At least two kinds are distinguished: (a) co-occurrences of words, concepts, or symbols from which cognitive proximities can be inferred (10.6; 11.4.3), and (b) linguistically expressed connections between attitude objects considered either associative (e.g., is, has, likes, belongs) or dissociative (e.g., is not, dislikes, opposes) in degrees. (9.2.3; 11.4.3)

Association structures:

Sets of objects, concepts, or symbols plus their Associations of various strengths between all pairs (11.4.5); not to be confused with Semantic networks, whose relations have diverse meanings. (10.5; 11.4.4)

Attention:

A common Inference from the amount or relative frequency of a source’s coverage of an issue.

Attribution:

The use of language, including adjectives and stories, in declaring objects, events, or people to be of a certain kind or to have certain qualities. (10.5)

CATA:

See Computer-aided text analysis.

Categorizing:

The process of grouping ideas, objects, and data into mutually exclusive sets. In Content analysis, categorizing reduces a diversity of Recording/coding units into convenient kinds—the values of a nominal Variable.

Cluster sample:

A sample that follows a hierarchical conception of the population, proceeding from general to subordinate Units; for example, Sampling newspapers from a list of all major newspapers of a country, then sampling particular issues from the newspapers sampled, then sampling particular articles from the issues sampled, and so on. At each step, all units available on that level should have the same chance of being included. (6.2.5)
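
A multistage draw of this kind might look like the sketch below. It assumes the population is available as nested dictionaries (newspapers → issues → lists of articles) and that a fixed number of units is drawn at each level. At every level, each available unit has the same chance of inclusion. The function name and data layout are hypothetical.

```python
import random


def multistage_sample(hierarchy, sizes, seed=None):
    """Sample down a hierarchy, e.g., newspapers -> issues -> articles.

    hierarchy: nested dicts whose innermost values are lists of units.
    sizes: number of units to draw at each level, top to bottom
    (must have one entry per level of nesting).
    """
    rng = random.Random(seed)

    def draw(node, level):
        if isinstance(node, dict):
            # Equal chance for every unit available at this level.
            keys = rng.sample(sorted(node), min(sizes[level], len(node)))
            return {key: draw(node[key], level + 1) for key in keys}
        return rng.sample(node, min(sizes[level], len(node)))

    return draw(hierarchy, 0)
```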

Coder:

See Observer.

Coding:

The process of Categorizing, describing, evaluating, judging, or measuring descriptively undifferentiated Units of analysis, thereby rendering them analyzable in well-defined terms. (12.2.3)

Coding instructions:

Instructions given to Observers to standardize processes of Recording, categorizing, or valuing phenomena of analytical interest. They enable several independent observers to generate reliable Data (see Reliability), and they allow data analysts and stakeholders in research results to infer from the data what observers saw, read, and recorded.

Coefficient of imbalance:

A measure of the degree to which favorable attributes of an issue outweigh unfavorable ones. (3.3.2)
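
This entry gives no formula. One widely cited version is the Janis-Fadner coefficient of imbalance, and the sketch below assumes that formulation. Here f counts favorable units, u counts unfavorable units, r counts units relevant to the issue, and t counts all units analyzed. Check it against section 3.3.2 before relying on it.

```python
def coefficient_of_imbalance(f, u, r, t):
    """Imbalance coefficient in the Janis-Fadner form (assumed here).

    f: favorable units, u: unfavorable units,
    r: relevant units (favorable + unfavorable + neutral), t: total units.
    Ranges from -1 to +1; positive when favorable units prevail.
    """
    if not (0 <= f and 0 <= u and f + u <= r <= t and t > 0):
        raise ValueError("expected 0 <= f, u; f + u <= r <= t; t > 0")
    if f > u:
        return (f * f - f * u) / (r * t)
    if u > f:
        return (f * u - u * u) / (r * t)
    return 0.0  # favorable and unfavorable units balance out
```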

Coincidence matrix:

A tabulation of the pairwise coincidences of categories or numerical values assigned to each of a given set of units by any number of Observers. Rows and columns of this matrix bear the names of the categories or values used. The matrix is square and symmetrical. Its diagonal cells contain perfect matches and its off-diagonal cells the disagreements. Its entries sum to the number of values paired. It visualizes the Reliability of the Data it tabulates. (12.3.2; 12.4.4) Not to be confused with a Contingency matrix.
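
The tabulation can be sketched as follows, assuming reliability data given as one list of assigned values per unit. Each unit coded by m observers contributes all m(m − 1) ordered pairs of its values, each weighted 1/(m − 1). This makes the matrix symmetric and makes its entries sum to the number of pairable values. The function name is illustrative.

```python
from collections import Counter
from itertools import permutations


def coincidence_matrix(units):
    """Return {(c, k): o_ck} for reliability data given as one list of values per unit."""
    o = Counter()
    for values in units:
        m = len(values)
        if m < 2:
            continue  # a unit coded by fewer than two observers yields no pairs
        for c, k in permutations(values, 2):
            o[(c, k)] += 1 / (m - 1)
    return o


# Three observers code unit 1, two code unit 2, and only one codes unit 3 (not pairable).
o = coincidence_matrix([["a", "a", "b"], ["b", "b"], ["a"]])
print(dict(o))  # {('a', 'a'): 1.0, ('a', 'b'): 1.0, ('b', 'a'): 1.0, ('b', 'b'): 2.0}
```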

Computer-aided text analysis (CATA):

Text analysis in which the analytical task is divided between humans doing what they do best and computers employed for what they do better: searching through huge bodies of literature and processing character strings at high speeds but with limited ability to identify meanings. (11.4)

Concordance:

A literary account of words, phrases, and their contexts and locations in the work of an author. Software designed for creating concordances has been used in Content analysis as well, especially in surveying the vocabulary of a body of Texts in view of their contextual meanings.

Content:

Something that, according to a popular metaphor, is contained in messages or Texts regardless of who reads them. Content analysts avoid the use of this content-objectifying metaphor by acknowledging both the analysts’ informed attention to available texts and the many worlds of readers, writers, and users for whom the analyzed texts could have all kinds of meanings. (2.2)

Content analysis:

A Research technique for making replicable and valid Inferences from Texts (or other meaningful matter) to the Contexts of their use. (for contested definitions, 2.1)

Context:

Generally, the world in which a body of Text plays a role or has meanings to those attending to it. In Content analysis, one distinguishes the many worlds of others in which texts play various roles, from which texts are sampled (decontextualized) for analysis, and the world of the analyst, in which particular Research questions are posed and answered with the help of available texts. The context of the analyst need not coincide with the worlds of others; however, its construction must be justifiable to the content analyst’s peers and users of the content analysis results. (2.4.3) Most specifically, the linguistic context of a word is its textual environment, which enables readers to make context-specific sense of that word.

Context units:

Textual matter that is defined to limit what Observers or readers are to consider when Categorizing or describing Recording/coding units. The size of context units is important. If context units are allowed to be too large, coding tends to become unreliable. If they are defined too small or not at all, recording units may lose the information that stems from their textual environment. For example, personal pronouns are meaningless without reference to the contexts in which they occur. (5.2.3)

Contingency analysis:

Analysis of the co-occurrences of words, concepts, or symbols, especially relative to their co-occurring by chance, in a body of Text to infer the Association structures of authors or readers of that text. (10.6)
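
The comparison of observed co-occurrences with chance can be sketched as below. It assumes each recording unit (for example, a paragraph) has already been reduced to the set of concepts found in it. The chance model is also an assumption: the expected co-occurrence count under independence, n_a · n_b / N, is one common choice rather than the only one. The function name is hypothetical.

```python
from itertools import combinations


def contingencies(units, concepts):
    """Compare observed co-occurrences of concept pairs with chance expectations.

    units: list of sets of concepts found in each recording unit.
    Returns {(a, b): (observed, expected)}, where expected = n_a * n_b / N is the
    count expected if a and b occurred independently of each other.
    """
    n_units = len(units)
    freq = {c: sum(c in u for u in units) for c in concepts}
    result = {}
    for a, b in combinations(sorted(concepts), 2):
        observed = sum(a in u and b in u for u in units)
        expected = freq[a] * freq[b] / n_units
        result[(a, b)] = (observed, expected)
    return result


units = [{"war", "peace"}, {"war", "peace"}, {"war"}, {"trade"}]
print(contingencies(units, {"war", "peace", "trade"})[("peace", "war")])  # (2, 1.5)
```

A pair observed more often than expected, like ("peace", "war") here, suggests an association. Pairs observed less often than expected suggest dissociation.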

Contingency matrix:

A cross-tabulation of observed Units of analysis created in order to analyze statistical dependencies between two categorical or nominal Variables describing these units. Contingencies tend to be analyzed by association statistics, such as χ² or product-moment correlations. When Reliability data are recorded in contingency matrices, these matrices are square, rows and columns refer to different Observers, perfect matches of categories occur in their diagonal cells, and systematic mismatches occur in their off-diagonal cells. However, analyzing reliability with association statistics would be a mistake, as reliability requires observers to be interchangeable, and statistical dependencies among the proclivities of individual observers are misleading indicators of reliability. The entries of contingency matrices sum to the number of units described. (12.3.4.2; 12.3.5) A contingency matrix must not be confused with a Coincidence matrix, whose entries sum to the number of values assigned to units and which is used in evaluating reliability. (12.3.2; 12.4.4)
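
The difference between the two matrices can be shown with two observers. The sketch below uses hypothetical reliability data: two observers coding the same five units. The contingency matrix keeps the observers apart (rows versus columns), so its entries sum to the number of units. The coincidence matrix treats the observers as interchangeable: each unit contributes both ordered pairs, so the matrix is symmetric and its entries sum to the number of values paired.

```python
from collections import Counter

# Hypothetical reliability data: two observers code the same five units.
observer_1 = ["a", "a", "b", "b", "c"]
observer_2 = ["a", "b", "b", "b", "c"]

# Contingency matrix: (observer 1 value, observer 2 value), one entry per unit.
contingency = Counter(zip(observer_1, observer_2))

# Coincidence matrix: each unit adds (c, k) and (k, c), ignoring who assigned which value.
coincidence = Counter()
for c, k in zip(observer_1, observer_2):
    coincidence[(c, k)] += 1
    coincidence[(k, c)] += 1

print(sum(contingency.values()))  # 5 units described
print(sum(coincidence.values()))  # 10 values paired
```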

Conversation:

Fundamentally open and self-governed human interactions that rely, but not exclusively, on linguistic utterances or Texts, each responding to preceding utterances or texts. The prototypical conversation is interpersonal. However, content analysts have extended the notion of conversation when making Inferences from diplomatic exchanges, deliberations in small groups, and therapeutic sessions regarding how relationships develop interactively and interactive processes unfold. (3.6)

Crowdcoding:

A version of crowdsourcing that invites anonymous Internet users to code or evaluate texts for subsequent analysis. Besides saving the cost and time of training observers to generate analyzable data, crowdcoding is often justified by appeal to the wisdom of crowds, the belief that collective judgments are better than individual ones (7.4), a belief that can be tested by the Decisiveness of majorities. (12.4.3)

Data (plural of datum):

Semidurable records taken as the unquestioned basis for reasoning, discussion, or calculation. As such, data are not found but generated: at least by declaration, and more typically by experiments, surveys, or measurements. Data must show some diversity (convey information, hence the plural of datum), must be comparable with each other, and, in the case of scientific data, must be obtained by replicable methods or be validatable. The data of Content analyses are Texts. (4.1; 8)

Data language:

The formal organization of Data—for example, a system of categories and measurements. A data language has a syntax and a semantics. Its syntax must render the data amenable to a proposed analysis (e.g., be computable). Its semantics links the data to the phenomena they are thought to represent, often formalized in Coding instructions that specify the relationship between each datum and what Observers saw or read, or embodied in measuring instruments. (8)

Data reduction:

The simplification of typically large volumes of Data, for example, by counting, abstracting, computing indices, offering statistical accounts of analytical results, or describing a statistical distribution by its parameters. In more qualitative approaches, data reduction means selecting representative quotes or prototypical examples from the analyzed Text, or summarizing or abstracting an analyst’s reading. All content analyses start with volumes of texts and reduce them to the answers to Research questions. (4.1.1; 10)

Decisiveness of majorities or aggregates:

One of four kinds of Reliability. It assesses the degree to which individual observers’ judgments, evaluations, or descriptive accounts of phenomena deviate from their majority or average. It is based on data obtained by replicating data-generating processes, but unlike Replicability, it answers the question of whether a majority can be trusted more than individual observers. (12.4.4)

Deductive inference:

The process of proceeding from a general proposition, considered true, to particular propositions that are logically implied and are therefore considered true as well. (2.4.5)

Dictionary (CATA):

A sometimes user-specifiable computational tagging of single or compound words in a body of Text. Tags have meanings that are simpler and more general than those occurring in the text, ideally preserving the meanings that are relevant for answering a given Research question. CATA dictionaries make use of thesauri, Stemming, Lemmatization, and disambiguation rules that rely on the linguistic environment of tagged Units of text. They aim at Data reduction by providing frequencies of tags in text. The relationship of tags to what they tagged demonstrates their Semantic validity. (11.4.1)
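
A dictionary approach in its simplest form is sketched below. The dictionary is a hypothetical toy that maps word forms to more general tags. Real CATA dictionaries, such as those used with the General Inquirer or LIWC, are far larger and add stemming and disambiguation rules that this sketch omits. The output is a frequency count of tags, the Data reduction the entry describes.

```python
import re
from collections import Counter

# Hypothetical toy dictionary: word form -> more general tag.
DICTIONARY = {
    "war": "CONFLICT", "wars": "CONFLICT", "attack": "CONFLICT",
    "peace": "COOPERATION", "treaty": "COOPERATION", "ally": "COOPERATION",
}


def tag_frequencies(text, dictionary=DICTIONARY):
    """Tag each dictionary word in `text` and return the frequencies of tags."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(dictionary[w] for w in words if w in dictionary)


print(tag_frequencies("The treaty ended two wars; peace followed the attack."))
# Counter({'COOPERATION': 2, 'CONFLICT': 2})
```

Comparing what each tag replaced with the tag's meaning is how a sketch like this would be checked for semantic validity.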

Difference functions:

In the calculation of the agreement coefficient Alpha, difference functions weight the frequencies of coinciding values by the Metric differences between these values. A difference function can be stated mathematically or tabularly. In matrix form, a difference function lists the differences between all pairs of categories, ranks, or values (12.3.4), or between sets of values. (12.7.2)
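
Several of these functions, stated mathematically, can be written as the Python sketch below. These are the squared differences δ²(c, k) commonly given for alpha at the nominal, ordinal, interval, and ratio levels. The ordinal version assumes access to the rank order of categories and to their marginal frequencies n_g from the coincidence matrix. The ratio version assumes non-negative values. The function names are illustrative.

```python
def nominal_delta2(c, k):
    # Categories either match (0) or do not (1).
    return 0.0 if c == k else 1.0


def interval_delta2(c, k):
    return float(c - k) ** 2


def ratio_delta2(c, k):
    # Assumes non-negative values; two zeros are identical.
    if c + k == 0:
        return 0.0
    return ((c - k) / (c + k)) ** 2


def ordinal_delta2(c, k, ranks, n):
    """ranks: categories in rank order; n: marginal frequency n_g of each category."""
    i, j = sorted((ranks.index(c), ranks.index(k)))
    between = sum(n[g] for g in ranks[i:j + 1])  # n_c + ... + n_k
    return (between - (n[c] + n[k]) / 2) ** 2


ranks = ["low", "mid", "high"]
n = {"low": 4, "mid": 2, "high": 6}
print(ordinal_delta2("low", "high", ranks, n))  # 49.0
```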

Direct/indirect indicators:

Analytical constructs in the form of correlations between textual attributes and the phenomena of interest are called direct indicators (see Index), and those that take the form of a network of correlates, including intervening Variables, are called indirect indicators. The history of Content analysis shows a move from direct to indirect indicators of the answers to Research questions. (9.3.3)

Evaluative assertion analysis:

The use of two-valued predicates to infer the valuation of an object from how it is linguistically associated (see also Association). (9.2.3)

Extrapolations:

Inferences about the unobserved gaps between observations, such as between data points (interpolations), or beyond a recorded history of observations (predictions). Extrapolations assume the continuity of observed phenomena, trends, patterns, or differences. (3.2; 9.3.1)
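
Both kinds of extrapolation can be illustrated with a single fitted trend. The sketch assumes that the continuity in question is linear, which is only one possible model; seasonal or nonlinear trends may fit real data better. The yearly counts are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares line through (x, y) points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical yearly counts of articles on an issue; 2013 was not observed.
years = [2010, 2011, 2012, 2014, 2015]
counts = [10, 12, 14, 18, 20]
slope, intercept = fit_line(years, counts)
interpolated_2013 = slope * 2013 + intercept  # a gap between observations
predicted_2017 = slope * 2017 + intercept     # beyond the recorded history
```

Both inferences stand or fall with the assumption that the trend continues unchanged outside the observed points.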

General Inquirer:

The historically first operational dictionary approach to computational Content analysis; accepts several CATA Dictionaries. (11.4.1)

Go-words:

A list of words to be included in an analysis. (11.3.1)

Index:

A Variable that is computed from textual attributes for its correlation with extratextual phenomena or variables of interest. An index empirically relates logically distinct kinds and may therefore be used to infer particular phenomena or compare several Texts in their own terms. (3.4; 9.3.3)

Inductive inference:

The process of proceeding from particular propositions, such as a sample of observations, to general propositions, such as to statistical generalizations of that sample, accounting for these observations in most if not all respects. (2.4.5)

Inference:

The process of passing from accepted propositions, statements, or data to other propositions or statements whose validity is believed to be preserved in that process. One can distinguish Deductive, Inductive, and Abductive inferences. In Content analysis inferences are typically abductive, proceeding from available Text to the analyst’s Research question, which pertains to unavailable features of the analyst’s chosen Context. (2.1; 2.4.5)

Institutions:

Habitual social practices that are enacted within a community and serve normative functions for how members organize themselves and construct the realities they live by. Content analysts may infer how the exchange of textual matter encourages, constructs, or undermines particular institutional realities. (3.7)

KWIC (keyword in context) lists:

The tabulation of character strings of a specified length within which selected keywords or phrases occur in a Text. KWIC lists consist of the contexts in which chosen words or phrases occur in analyzed texts, which suggest their meanings or the roles they play within these texts. (11.3)
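
A basic KWIC list can be produced as sketched below. The sketch assumes a fixed window of characters on each side of every whole-word, case-insensitive match. It right-aligns the left context so that the keywords line up in one column, as in printed concordances. The function name and parameters are hypothetical.

```python
import re


def kwic(text, keyword, width=30):
    """Return keyword-in-context lines with `width` characters on each side of each match."""
    lines = []
    pattern = r"\b" + re.escape(keyword) + r"\b"  # whole words only
    for match in re.finditer(pattern, text, re.IGNORECASE):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines


for line in kwic("Peace talks began. The peace held; peacetime ended.", "peace", width=10):
    print(line)
```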

Latent content:

See its opposite, Manifest content. (2.1)

Lemmatization:

A form of Stemming that reduces a word to its stem or root form after considering the grammatical function of the word in a sentence. (11.3; 11.4.1)

Levels of measurement:

See Metric.

LexisNexis:

A company that offers what is probably the largest English-language electronic text archive available to subscribers. Lexis provides a searchable database of U.S. laws and legal decisions. Nexis provides access to published data from more than 20,000 multilingual news sources, journals, biographical and reference materials, business reports, and regulatory findings.

LIWC (Linguistic Inquiry and Word Count):

A CATA Dictionary approach. (11.4.1)

Manifest content:

Texts that are easy to read, generally understood, unambiguously interpretable, and therefore yield high agreement, even among untrained coders. Early definitions of Content analysis sought to exclude Latent content because of its methodological difficulties: it requires experts to identify and code it. (2.1)

Matrix:

A rectangular array of numbers or symbols, arranged in rows and columns, often summed to its marginal totals. (See Coincidence matrix, Contingency matrix, and Difference functions.)

Meme:

A unitary idea, message, behavior, or style that spreads throughout a community of minds. Analogous to genes, which transmit biological information, memes transmit cultural information and are thought to affect the content and organization of human minds. (11.4.5)

Metric:

An explicit ordering of the values of a Variable, distinguished by the kinds of operations on these values that preserve their ordering (8.7). Most common are binary, nominal, ordinal, interval, and ratio metrics, also called levels of measurement (8.6). Analytical methods tend to make specific demands on the metric of the data they can process. Conversely, the metric of given data limits the choices among analytic/computational techniques available. In the analysis of reliability, metrics are operationalized in the form of Difference functions.

Networked texts:

Whereas writing is continuous, and many social phenomena, ranging from speech to historical developments, proceed linearly, hypertext can be read in sequences that follow their readers’ digressions, and users of social media generate large volumes of messages and dynamically link them across many. All posts by Internet users respond to previously received messages and acquire their meanings by what happens after they are posted, commented on, reproduced, and acted upon. Individual posts are short and, taken alone, reveal little about what is happening. Social media texts are networked, and their content analyses cannot ignore the contexts of their connections. (2.3; 11.3.4)

Observer:

A generic term for a coder, unitizer, transcriber, judge, rater—anyone who has access to undifferentiated phenomena and transforms them into analyzable Data. (qualifications, 7.2; training, 7.3; reliability, 12.2)

Plagiarism:

An established resemblance between a work by one author and a previously published work by another, controlled for common cultural backgrounds or joint exposure to events. (10.5)

Precision:

A measure of the extent to which the items retrieved by a search algorithm are relevant to a research question. This proportion is extended to a reliability coefficient for text retrieval. (12.5)

Propaganda analysis:

In Content analysis, the term applies to three different conceptions: (a) Using printed matter and speeches to identify propagandists in pursuit of doctrinaire intents or in the unacknowledged service of a foreign government. The key to this identification is the use of devious persuasion techniques. (b) Making Inferences from an enemy country’s domestic broadcasts about planned military actions, political support of the governing elite by its population, prediction of leadership succession, and opposition to the government. (c) Assessing the effectiveness of one’s own propaganda efforts. (1.4)

QDA software:

See Qualitative data analysis software.

Qualitative content analysis:

Conventionally contrasted with Quantitative content analysis, qualitative content analysis involves a close reading of textual matter, reorganizing relevant parts of it into analytical categories, and creating interpretations or narratives of scholarly interest relating to the meanings and uses of the analyzed Text. Any interpretation of a text is an Inference that should have Semantic validity; that is, it must be plausibly related to the original text. To demonstrate that connection, qualitative content analysts prefer to support their inferences by quoting from the analyzed text. Qualitative content analysts are not opposed to counting, but their inferences rarely rely on frequencies. (1.7)

Qualitative data analysis (QDA) software:

Software that provides a means to efficiently manage bodies of Text as well as graphical, audio, and video data in the pursuit of text-driven Qualitative content analysis. It supports analyst-introduced unitizing, coding, annotating, retrieving, sorting, counting, and graphing of various relationships among textual elements. (11.5)

Quantitative content analysis:

Conventionally contrasted with Qualitative content analysis—relies heavily on enumerating coded textual matter. However, all Coding of Text involves qualitative judgments or identifications, so that the distinction between qualitative and quantitative approaches is largely one of emphasis, often falsely identified with being interpretive versus scientific. (2.1)

Quantitative newspaper analysis:

A precursor to Content analysis developed around the end of the 19th century. It measured the amount of space in newspapers devoted to various subject matters, largely to objectify and criticize developments in newspaper publishing at that time. (1.2)

Random sample:

A sample of Units of analysis selected from a known population where each unit has the same probability of being included. (6.2.1)
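
Drawing such a sample can be sketched as follows, assuming the population is available as a complete list (a sampling frame) of unit identifiers. The seed parameter only makes a draw reproducible. The function name and the article identifiers are hypothetical.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n units without replacement; each unit has the same inclusion probability."""
    rng = random.Random(seed)
    return rng.sample(population, n)


# Hypothetical sampling frame of 100 article identifiers.
articles = [f"article-{i:03d}" for i in range(100)]
print(simple_random_sample(articles, 5, seed=42))
```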

Recall:

A measure of the extent to which a search algorithm retrieves all relevant items contained in a database. Computing this proportion entails an epistemological paradox: it presupposes knowing every relevant item in the database, yet the sole motivation for using search algorithms is the human inability to know what a database contains. (12.5)
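
Recall and its companion measure, Precision, can be computed as sketched below. The sketch assumes the retrieved items and the items judged relevant are available as sets of identifiers. In practice, the complete relevant set is rarely known, which is the paradox this entry describes. The function name is illustrative.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of a search, given sets of item identifiers.

    retrieved: items the search algorithm returned.
    relevant: items judged relevant to the research question (rarely fully known).
    """
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


print(precision_recall({1, 2, 3, 4}, {2, 4, 5}))  # (0.5, 0.666...)
```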

Recording:

Creating somewhat durable records from transient or otherwise unanalyzable phenomena. Recording ranges from the use of mechanical devices (voice recorders) and transcriptions to Unitizing and Coding of Text. Durability is a prerequisite of all re-search efforts and Data in particular, which moreover require analyzability by available techniques. (7)

Recording/coding units:

Identified within unanalyzed Text, recording units are independently describable, transcribable, categorizable, or codable units. Each recording or coding unit becomes represented by an enumerable record, code, or datum (see Data). To include information from the textual surroundings of such units without abandoning their analytically required independence, content analysts often ask observers to describe them as a function of Context units within which they occur. (5.2.2)

Relational content analysis:

A form of content analysis that seeks to set itself apart from categorical accounts of textual elements, focusing on relations, contingencies, or semantic connections between them. It underlies Contingency analysis, Evaluative assertion analysis, Semantic networks, and Webgraphs.

Reliability:

The attribute of Data on which researchers can rely in answering their Research questions. Unreliability results from two factors: disagreements among independent ways of generating data about the phenomena in question and lack of information (variation) in the resulting data to answer given research questions about these phenomena. In Content analysis, the first is assessed by agreement coefficients, such as Alpha, applied to how several Observers unitize the same continuum and/or record, describe, or code the same set of Units. The second becomes evident by tracing the information in data throughout an analytical process to its results. (When variation is absent, data cannot be correlated with anything; and when undetected noise in the data is not eliminated during an analysis, the answers to Research questions may well be unrelated to the phenomena of interest.) Four kinds of reliability are distinguished: Stability, Replicability, Accuracy or Surrogacy (12.1), and Decisiveness of majorities (12.4). High reliability is a prerequisite of Validity but does not guarantee it. (12.1)

Replicability:

One of four kinds of Reliability. Replicability measures the extent to which data-making instruments (Coding instructions, measuring devices, or observational methods) can be relied upon to generate the same Data from the same set of phenomena in diverse circumstances, employing different and independently working Observers, coders, judges, or measuring instruments. It is a function of the degree to which independently obtained data of the same phenomena agree. (12.2.1)

Re-presentation:

A text-invoked recall of something previously experienced, making that experience present again, as distinct from representation, which suggests some objective relationship between Text and what it is about. (3.5)

Research or re-search:

A repeated and replicable search within Data for patterns that explain them. Content analysts move outside such searches by relying on identifiable properties in Texts to answer Research questions pertaining to the Contexts of these texts.

Research design:

The network of procedural steps a researcher plans to take to proceed from generating Data to research results. Research designs should be explicit so that they can be replicated or critically evaluated for the conclusiveness of the findings. (4; 14.1)

Research questions:

In Content analysis, research questions must satisfy three requirements: (a) They must pertain to currently unknown phenomena in the Context of analyzed Texts, (b) they must allow several possible answers among which the analysis selects, and (c) they must provide the possibility, even if only in principle, of alternative ways to answer the chosen question. Requirement (a) is implicit in the definition of content analysis, (b) prevents researchers from merely generalizing from or confirming what they already know, and (c) ensures that the answers to research questions are validatable in principle by means other than a content analysis, such as subsequent observations, correlations with related variables, or successful actions based on the results. (2.4.2; 14.1.2.1)

Sampling:

The process of selecting a representative sample from a larger population of Units so that an analysis of the sample enables the researcher to draw conclusions about that population. The Inference involved is inductive—not deductive or abductive. (6)

Sampling units:

Units distinguished for selective inclusion in an analysis. (5.2.1)

Sampling validity:

The assurance that a sample is large enough to represent a population of phenomena in its composition so that the sample can be studied in place of that population. Because Content analysis makes Inferences from Texts to nontextual phenomena, two populations are important: (a) the population of texts actually sampled and analyzed, and (b) the population of phenomena to be inferred from the sampled texts. Sampling validity is assured when texts are sampled so that the phenomena of interest are fairly represented in the sample. (13.2.1)

Semantic differential scales:

Seven-point scales between polar-opposite attributes (good-bad, strong-weak, active-passive), usually labeled from -3 to +3, and widely used in assessments of sources’ evaluations of word meanings or attitude objects referred to. (7.4.4)

Semantic networks:

Sets of objects, concepts, or symbols connected by linguistically meaningful binary relations, such as two-place predicates (11.4.4); not to be confused with Association structures, whose binary relations are merely of varying strengths.

Semantic validity:

One Validity criterion unique to content analysis, measuring the degree to which relevant readings of a Text are preserved in processes of Unitizing, Coding, and analyzing the text. (13.2.2)

Sentiment analysis:

A computational Content analysis using a dictionary of pleasant–unpleasant, active–passive, and easy to imagine–hard to imagine attributes of selected attitude objects or issues. It is intended to infer their emotional value. (11.4.1)

Snowball sample:

A multistage sample obtained by examining an initial sample of Units for leads to other Sampling units and so on, until some practical closure is reached. (6.2.6)
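
The procedure can be sketched as a breadth-first traversal, assuming a function that returns the units a given unit leads to (for example, documents it cites or links to). The stopping rule is also an assumption: the traversal stops when no new leads remain or when a size cap is reached, one way to define "practical closure." The function name and the citation data are hypothetical.

```python
from collections import deque


def snowball_sample(seeds, leads, max_size):
    """Follow leads from an initial sample until no new units remain or max_size is reached.

    seeds: initial sample of units.
    leads: function returning the units a given unit leads to (hypothetical).
    """
    sample, seen = [], set()
    queue = deque(seeds)
    while queue and len(sample) < max_size:
        unit = queue.popleft()
        if unit in seen:
            continue  # already reached by another path
        seen.add(unit)
        sample.append(unit)
        queue.extend(lead for lead in leads(unit) if lead not in seen)
    return sample


# Hypothetical citation links between documents.
refs = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": [], "E": ["A"], "F": ["A"]}
print(snowball_sample(["A"], lambda u: refs.get(u, []), max_size=10))  # ['A', 'B', 'C', 'D', 'E']
```

Units that nothing in the sample points to, such as "F" here, are never reached. This is a known limitation of snowball samples.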

Stability:

One of four kinds of Reliability. Stability measures the extent to which a researcher or data-making instrument repeatedly generates the same Data from the same set of phenomena. Stability is not sufficient to establish the reliability of content analysis data, as consistent biases, prejudices, or misunderstandings of Coding instructions would not become manifest in repetitions. Replicability, which can detect individual idiosyncrasies, provides a stronger measure. (12.2.1)

Standards:

Established rules, principles, or measures against which deviations can be measured. Reliability and Validity are two standards against which the quality of a Content analysis is judged (12.1). Reliability determines the degree to which data represent the phenomena to be analyzed and validity the extent to which its inferences are borne out by independent evidence. Content analysis may also use standards in its Research questions, for example, in establishing Plagiarism, to evaluate biases (Coefficient of imbalance), or to judge an account as true or false. (3.3; 9.3.2)

Stemming:

The process of reducing grammatical variations of words to their common stems, bases, or root forms, used to simplify CATA Dictionaries. (11.3)
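
The idea can be illustrated with a deliberately crude suffix stripper. The suffix list and minimum stem length are hypothetical. Established stemmers, such as Porter's algorithm, apply ordered rule sets with conditions on the remaining stem, and this sketch does not reproduce them. Mapping several word forms to one stem means a dictionary needs only one entry for them.

```python
# Hypothetical suffix list, longest first.
SUFFIXES = ("ations", "ation", "ings", "ing", "ies", "ed", "es", "s")


def crude_stem(word, min_stem=3):
    """Strip the first matching suffix if at least `min_stem` characters remain."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


print([crude_stem(w) for w in ["reports", "reporting", "reported", "is"]])
# ['report', 'report', 'report', 'is']
```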

Stop-words:

A list of words to be excluded from an analysis, largely because they are judged irrelevant to the Research question. (11.3.1)
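
Stop-words and Go-words can be applied as filters, as sketched below. The stop-word list is hypothetical; in practice, such lists are tailored to the research question. The optional go-word set keeps only listed words.

```python
import re

# Hypothetical stop-word list.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is"}


def content_words(text, stop_words=STOP_WORDS, go_words=None):
    """Tokenize `text`, drop stop-words, and optionally keep only go-words."""
    words = re.findall(r"[a-z']+", text.lower())
    kept = [w for w in words if w not in stop_words]
    if go_words is not None:
        kept = [w for w in kept if w in go_words]
    return kept


print(content_words("The minister is in favor of the treaty"))  # ['minister', 'favor', 'treaty']
```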

Stratified sample:

A sample whose Units are selected (randomly or systematically) from several known subpopulations (strata) in a population so that each subpopulation is represented in the sample either equally or in proportion to its size. (6.2.3)
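
Both allocation rules can be sketched as follows, assuming the strata are given as a dictionary from stratum names to lists of units and that units are drawn randomly within each stratum. Proportional allocation rounds down here, so the total can fall slightly short of n. Other rounding schemes are possible. The names are hypothetical.

```python
import random


def stratified_sample(strata, n, proportional=True, seed=None):
    """Randomly sample from each stratum (dict: stratum name -> list of units).

    proportional=True allocates n in proportion to stratum size (rounded down);
    otherwise each stratum receives n // number_of_strata units.
    """
    rng = random.Random(seed)
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        k = n * len(units) // total if proportional else n // len(strata)
        sample[name] = rng.sample(units, min(k, len(units)))
    return sample


# Hypothetical population: 600 daily and 400 weekly newspaper issues.
strata = {"daily": list(range(600)), "weekly": list(range(600, 1000))}
print({name: len(units) for name, units in stratified_sample(strata, 50, seed=7).items()})
# {'daily': 30, 'weekly': 20}
```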

Surrogacy:

The degree to which a measure or model can serve as a surrogate of other phenomena. It is computationally equivalent to Accuracy, one of four forms of reliability. (12.2.1; 12.4)

Systematic sample:

A sample that consists of every kth Unit from a serial population or list of textual units. (6.2.2)
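
The following minimal sketch takes every kth unit from a list. Starting at a random position among the first k units is a common convention assumed here. The entry itself does not specify how the starting point is chosen.

```python
import random


def systematic_sample(units, k, seed=None):
    """Return every kth unit, beginning at a random offset in [0, k)."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    start = random.Random(seed).randrange(k)
    return units[start::k]
```

Choosing k carelessly can bias the sample. If k coincides with a cycle in the list, such as k = 7 for a list of daily newspaper issues, every selected unit falls on the same weekday.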

TAT:

See Thematic Apperception Test.

Text analysis:

Provides accounts of the character strings, compositions, coincidences, syntactical structures, and layouts of given Text. Word counts and KWIC lists are basic. Text mining and Webgraphs make use of search engines and more complicated algorithms for analyzing texts. Computer text analysis has little place for meanings and leaves the burden of relating its findings to the given Research question to the content analyst. (11.3)
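
The following minimal sketch shows the two basic outputs the entry mentions: word counts and a key-word-in-context (KWIC) list. The word pattern and the context window of `width` words on each side are illustrative assumptions. Neither output says anything about meanings. Relating such lists to a research question remains the analyst's task.

```python
import re
from collections import Counter


def word_counts(text):
    """Frequency of each (lowercased) word in the text."""
    return Counter(w.lower() for w in re.findall(r"\w+", text))


def kwic(text, keyword, width=3):
    """Every occurrence of keyword, shown with up to `width` words on either side."""
    words = re.findall(r"\w+", text)
    lines = []
    for i, w in enumerate(words):
        if w.lower() == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append(f"{left} [{w}] {right}".strip())
    return lines
```

For "the media shape the news and the news shapes the media", `kwic(text, "news", 2)` yields "shape the [news] and the" and "and the [news] shapes the".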

Text mining:

The largely computerized search and identification of usually rare textual elements in large bodies of Texts. The textual elements searched and found should not be confused with information that could answer the Research questions content analysts ask. Although there are occasions where names of individuals, places, or concepts lead to such answers, arriving at these answers always requires human judgment. (11.3.2)

Texts:

Anything variable and textured that has meaning to somebody, including to the analyst, and can be examined or read, interpreted, and acted upon repeatedly: letters, e-mails, blogs, literature, images, video recordings, transcripts of conversations, political speeches, historical records, police reports, accounts of legal proceedings, corporate statements, advertisements, medical documents, photographic images, posters, cultural artifacts, and so on. (2.4.1)

Thematic Apperception Test (TAT):

A psychological test involving the content analysis of interpretations of ambiguous images by subjects in order to reveal hidden attitudes, prejudices, motives, and personality variables.

Trends:

See Extrapolations.

Unitizing:

Identifying contiguous and nonoverlapping sections within a descriptively undifferentiated continuum of Text or other spatially organized matter, typically generating Recording or coding units—words, sentences, utterances, rhetorical moves, metaphors, themes, images, anything—that may thereafter be coded whole and subsequently analyzed. (5.3; 12.2.3)
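
The following minimal sketch unitizes a text into sentence units. It records each unit's character offsets, so the units stay contiguous and nonoverlapping within the original continuum. The rule that a sentence ends at ".", "!", or "?" is a crude assumption. It will misfire on abbreviations such as "Dr." and does nothing for units like themes or metaphors, which require human judgment.

```python
import re


def unitize_sentences(text):
    """Split text into nonoverlapping sentence units as (start, end, unit) triples."""
    units = []
    # A run of non-terminators followed by terminators or by the end of the text.
    for match in re.finditer(r"[^.!?]+(?:[.!?]+|$)", text):
        segment = match.group()
        stripped = segment.strip()
        if stripped:
            start = match.start() + segment.index(stripped)  # skip leading whitespace
            units.append((start, start + len(stripped), stripped))
    return units
```

Keeping the offsets alongside each unit allows the units to be coded separately and still be traced back to their place in the text.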

Units:

Decontextualized but information-bearing textual wholes that are distinguished within an otherwise undifferentiated continuum and thereafter considered separate from their context and independent of each other (5.1). Content analysts distinguish three kinds of units by the functions they serve within an analysis: Sampling, Recording or Coding, and Context units (5.2). Units may be distinguished physically (5.3.1), syntactically (5.3.2), categorially (5.3.3), propositionally (5.3.4), and thematically (5.3.5).

Units of enumeration:

The units that are actually counted or measured. Although Recording/coding units may be described numerically, for example, in terms of their size, their location, or their statistical properties, enumeration typically follows Coding, and units of enumeration often coincide with Recording/coding units. (5.2.3)

Unobtrusiveness:

The attribute of a Research method that uses Data about which their sources or objects of study have no knowledge, and the collection of which does not affect these sources or objects. Content analysis is mostly unobtrusive, as it relies on Text not generated for the purpose of analysis. By contrast, survey research and psychological experiments are obtrusive, as asking questions or instructing subjects affects those studied. (2.5)

Unstructuredness:

The attribute of phenomena that appear in a form that available analytical techniques cannot (yet) handle. Unstructured phenomena may be transient, as is human speech, or possess properties that are variable and complex, as in most human communications. Much of content analysis involves transforming unstructured matter into analyzable Text. (2.5)

Validating evidence:

Data that substantiate analysts’ results by means other than those used in the analysis. In Content analysis, validating evidence is largely ex post facto, as its Research questions are posed in the very absence of direct evidence for what it seeks to infer from available Text. (2.4.6)

Validity:

The quality of a claim to be as stated, true, or correct. A Content analysis is valid to the extent that its Inferences about the Context of the analyzed Texts withstand the test of independently obtained Validating evidence. (13)

Variable:

A conceptual unity that embraces alternatives, something that can vary. Variables are essential to any Data language. Open variables are placeholders for Units as observed or found in Text. Closed variables have a predefined range of values or consist of a list of conceptually possible values (8.3). The values of a variable may be unordered (8.4), or ordered variously (8.5), and possess diverse Metrics (8.6).

Varying probability sample:

A sample whose Units are selected according to known probabilities of their relevance to a Research question. It compensates for known inequalities in the units’ informativeness, importance, or self-sampling biases. (6.2.4)
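
The following minimal sketch uses Poisson sampling, one specific varying-probability design swapped in here for illustration. Each unit enters the sample independently with its own known probability, and each sampled unit carries the weight 1/p. Weighting estimates by 1/p is one way to compensate for the unequal chances of selection. The input lists are hypothetical.

```python
import random


def varying_probability_sample(units, probabilities, seed=None):
    """Poisson sampling: unit i is included with its known probability p_i.

    Returns (unit, weight) pairs, where weight = 1 / p_i.
    """
    rng = random.Random(seed)
    sample = []
    for unit, p in zip(units, probabilities):
        if not 0 < p <= 1:
            raise ValueError("inclusion probabilities must lie in (0, 1]")
        if rng.random() < p:  # random() is in [0, 1), so p = 1 always includes
            sample.append((unit, 1 / p))
    return sample
```

The sample size in this design varies from draw to draw. Its expected value is the sum of the inclusion probabilities.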

Webgraphs:

Graphical depictions of the links between Web pages available on the Internet. Inasmuch as links are established by website designers for directing readers’ attention to other relevant matter, they represent potential traffic and networks of meanings within communities of users. (11.3.4)
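
The following minimal sketch builds a webgraph from (source page, target page) link pairs and counts incoming links per page. It treats in-degree as a rough indicator of the potential traffic the entry mentions. The input format and the decisions to ignore self-links and duplicate links are assumptions for illustration.

```python
from collections import Counter, defaultdict


def build_webgraph(links):
    """Adjacency sets from (source_page, target_page) hyperlink pairs."""
    graph = defaultdict(set)
    for source, target in links:
        if source != target:  # ignore pages linking to themselves
            graph[source].add(target)  # a set also collapses duplicate links
    return graph


def in_degrees(graph):
    """Number of distinct pages linking to each page."""
    return Counter(target for targets in graph.values() for target in targets)
```

For instance, links a→b, c→b, and b→a (with a→b listed twice) give page b two incoming links and page a one.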

WordNet:

A lexical database for the English language. It groups English words into sets of synonyms, similar to a thesaurus, provides short definitions for these sets, and orders them according to the semantic relations between them. Its organization is based on common uses of language. It supports Text mining and computational linguistics approaches to information processing and has been used in the development of content analysis instruments. (11.3.2)

Note: Numbers in parentheses within and following glossary entries refer to sections or whole chapters of this volume in which the concepts are discussed in depth.
