# Statistical Analysis

Statistical analysis of case data allows researchers to explain quantitative findings in the case study and, in some instances, to make inferences about the population from which the case is drawn. Descriptive techniques present the collected information in summary form, through tables, graphs, and indices of central tendency, or reveal patterns embedded in the data. Inferential techniques enable researchers to estimate one or more characteristics of the larger population from which the case was drawn and to test specific hypotheses.

Quantitative case studies contain numerical data that must be analyzed further before conclusions can be drawn. Most such case studies rely on descriptive techniques. Because inferential techniques require specific types of data, they are used less often, especially in one-shot case studies.

The level of measurement used in a case study determines the choice of data analytic technique. In case studies employing a nominal level of measurement, which uses numbers simply to classify respondents into groups (e.g., male or female), frequency and cross-tabulation tables are most common. Ordinal measurements reflect an inherent rank ordering of observed phenomena (e.g., the work performance of five employees) and are typically summarized with the median and mode; the mean is sometimes reported, but it assumes that the distances between adjacent ranks are equal. The mean denotes the arithmetic average of a distribution, whereas the median and mode represent its middlemost and most frequent values, respectively. When observations on two variables can be ranked, a Spearman rank correlation coefficient can also be computed. A value of 1 indicates perfect agreement between the ranks of two variables, such as teacher effectiveness and student performance; a value of −1 indicates that the ranks of one variable are in exactly the opposite order to the ranks of the other; and a value near zero indicates that there is no monotonic association between the two variables.
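
When there are no tied ranks, the Spearman coefficient can be computed as ρ = 1 − 6Σd²/[n(n² − 1)], where d is the difference between the two ranks of each observation and n is the number of observations. The following is a minimal sketch that assumes no ties (tied values would need average ranks and a correlation computed on those ranks); the rankings of five teachers are hypothetical.

```python
def rank_positions(values):
    """Rank values from 1 (smallest) to n. Assumes there are no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation via 1 - 6*sum(d^2) / (n(n^2 - 1)); valid only without ties."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = rank_positions(x), rank_positions(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


effectiveness = [1, 2, 3, 4, 5]   # hypothetical ranks of five teachers
performance = [2, 1, 4, 3, 5]     # hypothetical ranks of their students' performance

print(spearman_rho(effectiveness, performance))          # 0.8
print(spearman_rho(effectiveness, effectiveness))        # 1.0 (perfect agreement)
print(spearman_rho(effectiveness, effectiveness[::-1]))  # -1.0 (exactly reversed)
```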

With interval-level measurement, the numbers take on new meaning. Ratings on a scale, such as “below average” (1), “average” (2), and “above average” (3), are typically employed here. In this instance, both the order of the observations and the size of the differences between them are meaningful, and measures of central tendency such as the mean, median, and mode can be computed. However, interval scales do not have an absolute zero point; hence, we cannot say that a teacher who earned a score of 2 is twice as good as another who earned a score of 1.

An interval scale with an absolute zero becomes a ratio scale and permits the researcher to perform various mathematical operations (add, subtract, multiply, and divide).

A popular approach to summarizing and analyzing quantitative information is the frequency distribution. For example, in a case study of a school, teachers' attitudes toward mandatory health food outlets in schools can be represented as a frequency distribution that shows the number of teachers (or frequencies) holding a particular attitude (favorable, unfavorable, no opinion). Problems emerge when category definitions are ambiguous or not mutually exclusive. Are there degrees of favorable responses to the question posed to the respondents? If someone favors both health food and fast food in the school, is his or her response comparable to that of another who gave the response “no opinion”? When precise definitions of each category are absent, frequency distributions become difficult to construct and interpret.

When observed values differ markedly, measures of variation become important. In this instance, a single index, such as a mean or median, cannot validly summarize the available information. For example, consider the test scores of three students in each of three classes: Class A = 49, 50, 51; Class B = 40, 50, 60; Class C = 10, 50, 90. Although the mean and median scores in each class are identical, class performance differs vastly. In such instances, additional measures of dispersion are useful. The range measures the distance between the highest and lowest scores in a distribution, and the standard deviation is the square root of the sum of the squared deviations about the mean divided by the number of observations; in other words, it tells the researcher roughly how far, on average, each score deviates from the mean. The standard deviation is the most popular measure of variation.
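
The three-class example can be checked with Python's standard `statistics` module. This is a minimal sketch; it uses `pstdev`, which divides by the number of observations as in the definition above, whereas `statistics.stdev` divides by n − 1 and is the usual choice when estimating a population value from a sample.

```python
from statistics import mean, median, pstdev


def describe(scores):
    """Return central tendency and dispersion for a list of scores.

    pstdev divides by the number of observations (n), matching the
    definition of the standard deviation given in the text.
    """
    return {
        "mean": mean(scores),
        "median": median(scores),
        "range": max(scores) - min(scores),
        "sd": pstdev(scores),
    }


classes = {"A": [49, 50, 51], "B": [40, 50, 60], "C": [10, 50, 90]}
for name, scores in classes.items():
    stats = describe(scores)
    print(name, stats["mean"], stats["median"], stats["range"], round(stats["sd"], 2))
```

The mean and median are 50 in every class, while the range grows from 2 to 20 to 80 and the standard deviation from about 0.82 to about 8.16 to about 32.66.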

When describing socioeconomic or demographic data, it is often helpful to assess the skewness of the data. Consider the following six income levels in a small community: $9,000, $10,000, $10,000, $11,000, $20,000, and $120,000. Five of the six incomes are less than the mean value of $30,000, whereas the median is only $10,500. In such instances, the mean is considerably higher than the median, making the distribution positively skewed. In other instances, the mean is considerably lower than the median, resulting in a negatively skewed distribution. In both instances, the median is the better representative value because it is not affected by extreme values in the distribution. Understanding the shape of the distribution enables the researcher to choose the appropriate index to describe the collected data.
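
Comparing the mean with the median is a quick rule of thumb for the direction of skew, not a formal skewness coefficient. The sketch below applies it to the community incomes, assuming the six values include $10,000 twice, which is the value implied by a mean of $30,000.

```python
from statistics import mean, median

# Six incomes; $10,000 appears twice, the value implied by a mean of $30,000.
incomes = [9_000, 10_000, 10_000, 11_000, 20_000, 120_000]

m = mean(incomes)
md = median(incomes)
below_mean = sum(1 for x in incomes if x < m)

# Rule of thumb: mean > median suggests positive skew, mean < median negative skew.
if m > md:
    shape = "positively skewed"
elif m < md:
    shape = "negatively skewed"
else:
    shape = "roughly symmetric"

print(f"mean={m}, median={md}, below mean={below_mean}, {shape}")
```

The single extreme income of $120,000 pulls the mean far above the median, which is why the median better represents a typical household here.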

In some instances, an index of correlation that reflects the relationship between two variables may be relevant. If the grades students receive are consistently and positively related to their teachers' length of training, one can conclude that the two variables are correlated, that is, that students taught by highly trained teachers are likely to earn better grades. Although a number of correlation indices exist, perhaps the two most popular are the Pearson r, named after Karl Pearson (1857–1936), and Spearman's rank correlation, named after Charles Spearman (1863–1945). Even if two variables are highly correlated, one cannot infer causality between them (i.e., assume that changes in one variable cause changes in the other) in the absence of other conditions.
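
The Pearson r is the sum of the cross-products of deviations from the two means, divided by the square root of the product of the two sums of squared deviations. The following minimal sketch computes it from that definition; the training and grade figures are hypothetical, and a high value says nothing by itself about causation.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # cross-products
    sxx = sum((a - mx) ** 2 for a in x)                    # squared deviations of x
    syy = sum((b - my) ** 2 for b in y)                    # squared deviations of y
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable has no variation")
    return sxy / math.sqrt(sxx * syy)


training_years = [1, 2, 3, 4, 5]   # hypothetical years of teacher training
mean_grades = [2, 4, 5, 4, 5]      # hypothetical class grade averages
print(round(pearson_r(training_years, mean_grades), 3))
```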

In some instances, nonparametric regression, which relaxes the assumption of a linear relationship between variables and replaces it with the weaker assumption of a smooth population regression function, may be viable. This approach is particularly helpful in longitudinal case studies (e.g., a case study assessing women's participation in the labor force over decades).
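
One of the simplest nonparametric regression estimators is the Nadaraya–Watson (local-constant) kernel smoother, which estimates the regression function at each point as a weighted average of nearby observations. It is used below as an illustrative stand-in for the broader family of smoothers (local polynomial methods such as lowess are common alternatives); the Gaussian kernel, the bandwidth, and the decade-level rates are all assumptions made for the example.

```python
import math


def kernel_regression(x, y, x0, bandwidth):
    """Nadaraya-Watson (local-constant) estimate of the regression function at x0.

    Each observation is weighted by a Gaussian kernel of its distance from x0,
    so nearby points count more; no linear form is assumed.
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    weights = [math.exp(-0.5 * ((xi - x0) / bandwidth) ** 2) for xi in x]
    return sum(w * yi for w, yi in zip(weights, y)) / sum(weights)


# Hypothetical, illustrative participation rates (percent) by decade.
decades = [1950, 1960, 1970, 1980, 1990, 2000]
rates = [30, 35, 42, 50, 55, 58]

smoothed = [kernel_regression(decades, rates, d, bandwidth=10) for d in decades]
print([round(s, 1) for s in smoothed])
```

The bandwidth controls the trade-off: a small value follows the data closely, while a large value approaches a single overall mean.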

By their very nature, case studies cannot be used to establish cause–effect relationships. Not only is the sample size insufficient (typically an n of 1), but the very purpose of a case study is not to establish cause–effect linkages. This also means that the generalizability of findings in most case studies is limited.

Although case studies are not intended to establish cause–effect relationships, it may at times be fruitful to employ inferential statistics, especially if the research consists of multiple or longitudinal case studies or if the case study focuses on an organization that possesses a considerable volume of archival data at the ordinal level or higher. The more important approaches to inferential statistics are outlined below.

When the nature of the probability distribution from which a sample is drawn is known, it is possible to test hypotheses about its parameters. Most parametric statistical tests require measurement at least at the interval level. Typical parametric tests include the Z test, Student's t test, the F test, and analysis of variance. Most of these tests assume knowledge of the shape of the distribution and/or its underlying parameters. The focus is typically on testing for true differences in population means or variances on the basis of observed sample data. These tests permit the researcher to make inferences about the population at prespecified levels of confidence. Popular confidence levels are 95% and 99%, which correspond to significance levels of .05 and .01: if there were truly no effect, a test conducted at these levels would wrongly declare one in about 5 or 1 of every 100 samples, respectively. However, even a result that is significant at the 99% confidence level may have no practical significance. This is particularly the case when the findings are not grounded in theory or when results become statistically significant merely because of a large number of observations. For example, a correlation of .3 is statistically significant with 250 observations, even though the independent variable explains only 9% (.3²) of the variance in the dependent variable.
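
The dependence of significance on sample size can be seen from the standard t statistic for testing whether a population correlation is zero, t = r√(n − 2)/√(1 − r²), with n − 2 degrees of freedom. This minimal sketch holds r at .3 and varies only n; the particular sample sizes are chosen for illustration.

```python
import math


def correlation_t(r, n):
    """t statistic for H0: population correlation = 0, with n - 2 degrees of freedom."""
    if n < 3 or not -1 < r < 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r = 0.3
for n in (30, 50, 250):
    print(n, round(correlation_t(r, n), 2))
print("variance explained:", round(r ** 2, 2))
```

Compared with two-tailed .05 critical values of t (about 2.05 for 28 df, 2.01 for 48 df, and 1.97 for 248 df), r = .3 is not significant with 30 observations, is significant with 50, and is far beyond the threshold with 250, while the variance explained stays at 9% throughout.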

In several case studies, it may be difficult or even impossible to make assumptions about the form of the probability distribution of the sample data. Many case studies also measure data only at the nominal or ordinal level. In such circumstances, the researcher can use distribution-free nonparametric statistics, which interpret the data using the signs of differences, the ranks of measurements, or counts of observations falling into specific categories. The more popular nonparametric tests are the median test, the Mann–Whitney U test, the Wilcoxon signed rank test, the Kruskal–Wallis test, and the Kolmogorov–Smirnov test. The median test is used to test the hypothesis that a set of n randomly drawn measurements originated from a population with a specific median. The Mann–Whitney U test is used to determine whether two independent random samples have been drawn from the same population or from different populations. The Wilcoxon signed rank test compares matched samples to test whether the median of their paired differences is zero. The Kruskal–Wallis test extends the Mann–Whitney U test to comparisons of more than two populations. Finally, the Kolmogorov–Smirnov test is used to test goodness of fit to a specified population distribution. Nonparametric tests can also be applied to data measured at the interval or ratio level, although, when the assumptions of the corresponding parametric tests hold, their power is lower than that of parametric tests.
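
The Mann–Whitney U statistic can be computed from ranks alone: pool both samples, rank them (giving tied values the average of their positions), and subtract n₁(n₁ + 1)/2 from the first sample's rank sum. The sketch below returns U for both samples; the scores are hypothetical, and turning U into a p-value still requires tables or a normal approximation (SciPy's `scipy.stats.mannwhitneyu` reports one directly).

```python
def average_ranks(values):
    """Assign ranks 1..n to values, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j are 0-based; ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Return (U_a, U_b): the number of (a, b) pairs in which a wins, ties counting 1/2."""
    ranks = average_ranks(list(a) + list(b))
    rank_sum_a = sum(ranks[: len(a)])
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    u_b = len(a) * len(b) - u_a
    return u_a, u_b


group_a = [3, 5, 8]   # hypothetical scores, group A
group_b = [1, 2, 4]   # hypothetical scores, group B
print(mann_whitney_u(group_a, group_b))  # (8.0, 1.0)
```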

In analyzing case data, comparing the observed pattern of relationships among variables with a theoretically predicted one is a popular strategy. Depending on the volume and quality of data, simple frequency comparisons or correlational analyses may be appropriate. Use of multiple dependent variables assessed using distinct measures and instruments adds robustness to the analysis.

Particularly relevant to exploratory case studies is the explanation-building process, whereby the researcher attempts to make causal links based on existing theory or sound iterative analysis of data. Depending on the nature and volume of the data available, correlational or path-analytic approaches are feasible. In some instances, advanced structural equation models estimated with statistical programs such as LISREL or AMOS may also be possible.

In longitudinal case studies, values of a variable can be recorded over an extended period, making time series analysis viable. Even with a single participant, multiple assessments over time permit the researcher to make comparisons and draw conclusions. For example, a single variable (e.g., student performance) may be recorded over time and patterns noted. The student may undergo some form of intervention (e.g., counseling) in one period while his or her performance is recorded. In another period, without counseling, performance is assessed again, and differences from the treatment period are noted. Statistical tests can help identify factors associated with the pattern. In complex time series designs, multiple variables and multiple trends in the progression of a variable can be tracked simultaneously.
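
A simple starting point for such a baseline-versus-treatment comparison is to contrast the phase means and to compute the percentage of nonoverlapping data (PND), the share of treatment-phase observations that exceed the highest baseline observation. PND is one descriptive index used in single-case research, offered here as an illustration rather than a significance test; the weekly scores and the function name `phase_summary` are hypothetical.

```python
from statistics import mean


def phase_summary(baseline, treatment):
    """Compare a baseline (no intervention) phase with a treatment phase.

    Returns both phase means and the percentage of treatment observations
    that exceed the highest baseline observation (PND).
    """
    if not baseline or not treatment:
        raise ValueError("both phases need at least one observation")
    highest_baseline = max(baseline)
    nonoverlapping = sum(1 for x in treatment if x > highest_baseline)
    return {
        "baseline_mean": mean(baseline),
        "treatment_mean": mean(treatment),
        "pnd": 100 * nonoverlapping / len(treatment),
    }


baseline = [55, 58, 54, 57]        # hypothetical scores without counseling
treatment = [60, 63, 57, 66, 64]   # hypothetical scores during counseling
print(phase_summary(baseline, treatment))
```

Here the treatment mean is 6 points higher and 4 of the 5 treatment scores (80%) exceed every baseline score; a formal time series model would still be needed to account for trend and autocorrelation.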

Categorical data take a finite or countable number of values and usually follow a multinomial distribution, that is, the distribution of a random sample from a population across the categories measured for each individual item in the population. Examples of categorical variables are social class, age group, political party affiliation, and religious affiliation, where each person in a sample can be classified into one of a set of predetermined categories. Contingency tables, in which the observed values of categorical variables are cross-classified in tabular form and the relationships among them investigated, are popular. Statistical tests and measures of association such as chi-square, Fisher's exact test, phi, Cramér's V, and Kendall's tau are the most popular. Correspondence analysis combines the mathematics of the contingency table with a graphical technique to explore the structure of relationships among variables. The result represents the relationships among the variables graphically, with points on the graph representing the categories of the variables under study. Multidimensional contingency tables, in which main and interaction effects can be identified separately, often require log-linear models. When the variable of interest is binary and the explanatory variables are categorical, a logit regression model can be used to identify interdependencies. For example, a researcher may use logit regression to understand the impact of road conditions (good or bad), driver experience (high or low), and vehicle size (large or small) on highway accidents (coded as a binary variable, with 0 indicating not fatal and 1 indicating fatal). When the explanatory variables are continuous, logistic regression models are more useful to the researcher.
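
For a two-way contingency table, the chi-square statistic sums (observed − expected)²/expected over all cells, where each expected count is the row total times the column total divided by the grand total; Cramér's V rescales chi-square to lie between 0 and 1. The sketch below uses a hypothetical 2 × 2 table of road condition by accident outcome and applies no continuity correction (note that SciPy's `scipy.stats.chi2_contingency` applies the Yates correction to 2 × 2 tables by default, so its statistic would differ).

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and Cramér's V for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0]))  # smaller of the number of rows and columns
    cramers_v = math.sqrt(chi2 / (n * (k - 1)))
    return chi2, cramers_v


# Hypothetical counts: rows = road condition (good, bad),
# columns = accident outcome (not fatal, fatal).
accidents = [[10, 20],
             [30, 40]]
chi2, v = chi_square_independence(accidents)
print(round(chi2, 3), round(v, 3))
```

With (2 − 1)(2 − 1) = 1 degree of freedom, the .05 critical value of chi-square is 3.84, so these hypothetical counts show no significant association. When expected counts are small (a common rule of thumb is fewer than 5 per cell), Fisher's exact test is preferred.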

In several health-related, sociological, educational, and developmental case studies, either a phenomenon or its predictor, or both, may be functional in nature, permitting functional adaptive model estimation. For example, in a school that employs two primary pedagogical approaches, the impact of the chosen approach on student achievement can be portrayed functionally. Linear or nonlinear regression models can be formulated depending on the data and assumptions. Functional analysis can be applied with functional predictors and scalar responses (or criteria). Researchers can also attempt nonparametric functional modeling when the assumptions behind the data do not warrant standard functional modeling.

Paul Friesema's study of the impact of natural disasters on surrounding communities offers a good example of the use of statistical tools to analyze case data. Friesema studied the impact of disasters on four communities: (a) Yuba City, California (1955); (b) Galveston, Texas (1961); (c) Conway, Arkansas (1965); and (d) Topeka, Kansas (1966). Extensive time series data on various social and economic indexes were collected, and a complex time series model was used to analyze the data. The statistical analysis revealed that the disasters had little long-term effect on the communities, although they had considerable short-term impact.

Ramsay and Silverman's study of the nondurable goods index in the United States over eight decades spanning 1920 to 2000 offers a good illustration of the use of functional analysis to gain new insights into a phenomenon. Using this approach, they were able to show that the growth rate of the index was especially high from 1960 to 1975, when the Baby Boom generation was in its peak consumption period. In subsequent years, growth was substantially lower, which was attributed to a probable reduction in consumption. Using statistical analysis, the researchers were also able to separate out the effects of World War II and of the end of the Vietnam War in 1974.

John Fox elaborated on the application of nonparametric regression in a number of settings, such as analyzing countries' infant mortality rates, analyzing married women's labor force participation, and rating occupations by education and income levels using census data.

Although many case studies collect statistics, the depth of data analysis varies markedly across them. Careful planning can facilitate the collection of higher quality data, permitting sophisticated data analysis. Regression and correlational analyses are helpful, but one should always recognize the possibility of spurious correlations among variables that are not grounded in theory. The use of multiple sites and longitudinal data enables researchers to compare findings across geographic, occupational, or temporal dimensions, permitting the formulation of richer, more grounded hypotheses. However, even here, causal linkages should be asserted only cautiously and only after controlling for potential confounding variables.
