‘What Is a Case?’ was originally a conference that Howard Becker and I organized almost two decades ago (Ragin and Becker 1992). The basic idea was to invite eight distinguished scholars to offer their answers to a question that seemed both foundational and largely ignored. The eight were chosen to yield a variety of answers, and vary they did. One thing we found was that the ‘case question’ was a Rorschach: a stimulus that provoked diverse and revealing responses. Just as the Rorschach test can reveal subtle aspects of a person's personality, answers to ‘What Is a Case?’ reveal social scientists' varied methodological commitments and their diverse epistemological and ontological positions.
My co-editor for this Handbook, David Byrne, shares with me the conviction that the case concept is central to social research and proposed that we do this Handbook in part as a way to revisit core methodological issues raised, but certainly not resolved, in Ragin and Becker (1992). However, this book goes far beyond its predecessor in several important respects. First, it takes as its starting point the centrality of case-oriented research to contemporary social science and seeks to improve its practice. One implicit goal of Ragin and Becker (1992) was to make the case for case-oriented research by bringing the case concept to the foreground of social science discourse (see also Feagin et al. 1991). Comparing today with the early 1990s, it is clear that the status of case-oriented work has improved, and scholars can describe their work as case-oriented without feeling awkward or vulnerable to attack. Second, unlike its predecessor, which was short on practical advice, this Handbook is both conceptually oriented and practically oriented. It not only revisits conceptual issues addressed in the previous work but also raises an array of new issues, packaged around discussions of a variety of case-oriented techniques. The practical advice offered in the present book spans the entire spectrum of case-oriented inquiry, with a special emphasis on analytic techniques that maintain the integrity of cases through the research process and also provide ways of viewing cases as coherent bundles of aspects and attributes (e.g. cluster analysis, correspondence analysis, single-case probabilities, qualitative comparative analysis). Third, many of the contributions to this Handbook explicitly engage the realist perspective in some way. In essence, to posit cases is to engage in ontological speculation regarding what is obdurately real but only partially and indirectly accessible through social science.
Bringing a realist perspective to the case question deepens and enriches the dialogue, clarifying some key issues while sweeping others aside. For example, while it is clear that cases, at their very best, are invoked by researchers, it is also true that some case constructions are vastly and demonstrably superior to others, especially in terms of their degree of resonance with the objects of study.
This Handbook documents the considerable progress of the last two decades, both in case-based analytic procedures and in the discussion of conceptual issues linked to the case concept. The contributions to this Handbook – like the contributions to ‘What Is a Case?’ – underscore the importance of the case concept to social research and also its Rorschach nature. The case concept gets at the heart of what social scientists do and how they think about what they do. It is difficult to imagine an empirical social science that is also ‘caseless.’
In this chapter, I offer several reflections on key issues raised in this handbook. First, I revisit the concept of ‘casing’ that I first elaborated in Ragin and Becker (1992). Then, I turn to several issues in applying this concept. I explore the issue of casing across several research designs, with special attention to the issue of ‘negative cases’ and the relation of casing to another core but rarely questioned foundational concept, the idea that social scientists study (and make inferences about) ‘populations.’ At various points in this essay, I challenge practices that are central to conventional quantitative social science and elaborate approaches that are more compatible with the realist perspective, especially its synthesis with complexity theory. For example, there is a clear tension between what I call ‘casing by outcome,’ an approach that is more compatible with the realist perspective, and the more conventional ‘casing by population,’ which usually posits more cross-case homogeneity or comparability than may be warranted. A taken-for-granted or ‘given’ population may contain several distinct types of cases. Treating heterogeneous cases in a taken-for-granted population as comparable instances of ‘the same thing’ is little more than asserting comparability by fiat and runs the risk of relegating important case differences to the error vector of linear models. As one alternative to simply assuming comparability, I offer ‘possibility analysis,’ a method that directly examines the degree to which the members of a given set of cases are candidates for the outcome under investigation (see also Mahoney and Goertz 2004).
In Ragin (1992), I proposed an approach to the issue of cases that I hoped would make it more tractable. The approach I suggested was to shift the discussion from its primarily ontological focus (Can cases be specified in some universally coherent and valid manner?) to more practical concerns, especially research practices (How, when, and why do researchers invoke cases?). In this formulation, ‘casing’ is a more-or-less routine research act, especially in the social sciences. Researchers ‘case’ their evidence in order to bring closure to difficult issues in conceptualization and research design and thus allow analysis to proceed. Empirical evidence is infinite in its complexity, specificity, and contextuality. Casing focuses attention on specific aspects of that infinity, highlighting some aspects as relevant and obscuring others. For example, it matters greatly whether a set of actions by a group of individuals is characterized as ‘dissonance reduction,’ ‘collective behavior,’ ‘collective action,’ ‘resource expenditure’ by a ‘social movement organization,’ or ‘incipient institutionalization.’ Different casings provide different blinders, different findings, and different connections to theory, research literatures, and research communities. Casing locates research in the vast domain of social science, linking it to the efforts of some researchers and severing its connections with others.
The idea of casing resonates with the realist perspective in several ways. First, the realist perspective keeps the case concept in the foreground. Rather than assuming that cases are mere observations, much like coin tosses, contained within given populations, the realist perspective supports the idea that cases are real entities and reflect the operation of actual causal mechanisms and processes. Of course, these entities and mechanisms are shrouded, even by our conceptions of them as cases, but still they discipline both our casings and our attempts to conceptualize and understand. Second, the realist perspective resonates strongly with the central idea that casing is a tentative and iterative process. The casing that inspires an investigation is always open to refinement and revision. By the time the research is complete, both the cases and their casing may have shifted substantially. Finally, it is clear that the realist perspective accepts that social phenomena are not knowable in the way that many social scientists would like. These phenomena do not arrive at our doorsteps in neat packages, ready to be opened and admired. Rather, social phenomena are inordinately complex, contingent, and context-specific. When we case social phenomena, we gain brief glimpses of the possibility that social phenomena are ordered in some way that we can grasp.
Complex realism (see especially the chapters by my co-editor David Byrne [Chapter 5] and David Harvey [Chapter 1], in Part 1 of this handbook) also brings important insights to casing, in part explaining why it so often seems fluid and tentative. Complex realism views cases in social research as ‘complex’ as opposed to ‘simple’ systems. Simple systems generally conform to the expectations of linear models, which means that researchers can simply sum the impact of the relevant causal vectors in order to make predictions regarding a system's behavior. Complex systems, by contrast, are composed internally of bundles of interrelated aspects that can be understood only in these terms, as mutually reciprocal influences that together constitute the complex whole. Furthermore, complex systems have the capacity for qualitative, case-wide change, which can follow from routine, small-scale changes that accumulate and cascade through the system as a whole. In part, phase shifts (qualitative changes) occur due to the interconnectedness of system aspects and their reciprocal determination. The complex realist perspective provides important tools for social scientists when confronted with the difficult task of casing, with its emphasis on phase shifts and qualitative change providing clues as to how aspects are interconnected. Cilliers' (1998, 2001) notion that the boundaries of a system (and its ‘caseness’) are analytically inseparable from the processes that constitute the system is very useful here, as well, suggesting that the processes that produce phase shifts are central to the constitution of the system as a whole.
It is always possible to push casing too far – to homogenize social phenomena in ways that contradict their specificity and their integrity. Complex realism offers an important caution not to betray social phenomena, by making comparable cases of things that are not. Casings are invoked; they also can be revoked. Complex realism recognizes the necessity of their invocation, but also celebrates their refinement, their revision, and even their revocation.
One enduring source of confusion in the discussion of casing in the social sciences is that it has a dual nature, for the casing of a study can be based on either the outcome or a larger population of candidates for the outcome. For example, a researcher might study multiple instances of ethnic political mobilization and describe his or her cases in these terms, explicitly casing the study according to the outcome shared by the cases included in the study. However, a second researcher might conduct a study of ethnic political mobilization that includes countries with politically relevant ethnic differences, regardless of whether ethnic political mobilization actually occurred. Thus, the second study incorporates ‘negative cases’ of ethnic political mobilization. In this second framing, the cases of the study are not countries with ethnic political mobilization, but the larger population of countries in which ethnic political mobilization is thought to be possible (which, of course, includes both positive and negative cases). Thus, the question ‘What is the case?’ can have different answers in studies that might appear, at first glance, to have identical casings. In the first study, casing is centered on the phenomenon in question – the set of cases with the outcome. In the second study, casing is centered on a larger population embracing both positive and negative instances.
Before discussing casings as outcomes versus populations further, it is important to address a common misunderstanding in the social sciences regarding the so-called error of ‘selecting on the dependent variable.’ Many case-oriented studies examine only positive cases (e.g. only actual instances of ethnic political mobilization, with negative cases excluded). In fact, the first step in much case-oriented inquiry is to identify the best possible instances of the phenomenon to be explained and then to study these instances in great depth. This research strategy is consistent not only with the realist perspective, but also with Lars Mjøset's ‘contextualist’ definition of cases, which he describes in Chapter 2: an outcome, the processes leading to the outcome, and the enabling features of the setting in which it occurred. Implicit in this formulation is the idea that casing is outcome driven and that instances of the outcome, on the one hand, and the population of relevant cases, on the other, are one and the same. The logic of this approach is straightforward, for it is very difficult, if not impossible, to ‘process trace’ phenomena that do not exist or have yet to occur. Imagine, for example, trying to process trace the emergence of publicly funded paternity leave programs in poor countries.
Despite the obvious value of studying good instances of the phenomenon of interest, this practice is routinely castigated by conventional quantitative researchers. In Designing Social Inquiry, a book hailed by some as a handbook on how to conduct good qualitative research, King et al. (1994) are unabashedly hostile to this practice. Their reasoning is that studies that select on values of the dependent variable attenuate correlations between causal conditions and outcomes. In various publications, I have explained why this critique, while sound enough from a strictly correlational point of view, is fundamentally misguided (Ragin 1997, 2000, 2008). Of course, negative cases are often quite useful, especially when they offer theoretically decisive contrasts with positive cases. The search for negative cases is also important when researchers seek to establish the sufficiency of a given set of causal conditions for an outcome, because all (or virtually all) of the cases that display this set of conditions should also display the outcome (see Ragin 2000). The key point, however, is that it is wrong to label a study flawed simply because it omits negative cases, for there are many good reasons to study positive cases in isolation from negative cases. One obvious justification is the simple fact that it is very difficult to identify ‘candidates’ for an outcome – as any good negative case should be – without first knowing a lot about positive cases (Ragin 1997).
More pertinent to the topic of casing than the so-called error of selecting on the dependent variable is the tension that exists between casing outcomes and casing populations. The issue can be highlighted by examining the practice of casing across a range of research designs.
First, consider the single case study, which is almost always cased according to the outcome in question. For example, a researcher might conduct a case study of the US Civil War as a ‘modernizing revolution,’ using the conceptual framework Barrington Moore, Jr (1966) elaborated in Social Origins of Dictatorship and Democracy. This researcher would probably argue that findings from her study are relevant to modernizing revolutions in general. Of course, this same set of events could be cased in other ways, for example, as part of a worldwide struggle against slavery, as an attempt to purge the US political economy of its neocolonial vestiges, or in some other manner. The point is not the choice of casing; rather, the point is that when researchers case a collection of processes and events, tied to a specific setting, they tend to focus on the outcome. The fact that multiple casings can be applied to a single case makes it a ‘rich’ case and also illustrates one of the central purposes of casing, which is to provide necessary blinders. Different casings highlight different case aspects and downplay or obscure others. Casing makes it possible for researchers to deal with the inherent complexity, specificity, uniqueness, and contextuality of social phenomena – to conduct social science despite all the obvious obstacles.
Next up in the hierarchy of research designs is the study of a set of cases with the same outcome, such as the one mentioned above – the study of positive instances of ethnic political mobilization. Again, casing in this design is primarily in terms of the outcome. However, the notion of casing by population is more in play, because the researcher in this example must cope with the diversity of cases in a well-defined set (e.g. the set of cases with ethnic political mobilization). Note that because there are multiple instances, the casing of the outcome is, in effect, constrained by observable commonalities in the outcome across relevant cases. In other words, the conceptualization of the outcome, and thus the casing, is more empirically constrained in this design than it is in the single case study. Despite the absence of negative cases, this design can be put to a lot of different uses. Is there a single set of causal conditions linked to the outcome? Do they make sense as necessary conditions? Or are there different sets of causal conditions linked to different subsets of cases with the outcome? Might these different sets of causal conditions signal the existence of subtypes of the outcome? Finally, are there meaningful differences across cases in the outcome, linked to differences in causal conditions? A simple example: are there differences in the success of ethnic political mobilization linked to differences in the political strategies adopted by these movements?
Consider next the addition of purposefully selected negative cases to the design just described. The casing of ‘outcomes versus populations’ now shifts even more in the direction of populations. With the addition of explicitly selected negative cases, the focus is on the larger population of candidates for the outcome, whether or not the outcome has occurred. This set is inherently more difficult to define than the set of cases with the outcome. The definition of this set is heavily knowledge and theory dependent, for it is impossible to know where something might have happened without explicit guidance. Note also that any definition of candidacy for the outcome that a researcher might propose could be contested. For example, one researcher might claim that any country with ethnic diversity is a candidate for ethnic political mobilization; another might claim that there must also be ethnic inequality; another might claim that in addition to ethnic inequality the political system must be nonrepressive; and so on. In short, the question of candidacy concerns the kinds of settings capable of sustaining a process that could lead, potentially, to the outcome in question.
Finally, consider a research design where the casing is almost entirely by population. This design goes well beyond the previous, which involves careful specification of relevant negative cases based on their candidacy for the outcome in question. Instead, the researcher relies on a generic casing and uses a given, preconstituted population of cases. The central goal of research of this type is to construct generalizations about that population. For example, researchers studying ethnic political mobilization might argue that their cases include ‘all formally constituted nation states,’ a generic population, and that the research goal is to construct generalizations about that population. In much research of this type, the question of candidacy for the outcome is bypassed altogether. Researchers instead assume that the cases found in generic populations constitute the proper basis for constructing social scientific generalizations. Notice, however, that the outcome may not be possible for many members of a generic population. For example, ethnic political mobilization may not be even a remote possibility in many ‘formally constituted nation states.’ Because the researcher's focus is on the population as a whole, however, this issue is veiled.1
Some variable-oriented quantitative researchers who favor using generic populations would argue additionally that outcomes should be seen simply as dependent variables and not as emergent phenomena. For example, these researchers might substitute the dependent variable ‘level of ethnic political mobilization’ for the qualitative outcome, ‘emergence of political mobilization along ethnic lines.’ In this view, no mobilization (i.e. a score of zero on level of ethnic political mobilization) is simply one of many possible scores on the dependent variable, and the difference between a score of zero and a score of one is the same as the difference between a score of nineteen and a score of twenty. After all, in this view the goal of empirical analysis is to identify the independent variables that are the best predictors of variation in the dependent variable in a given population. It is clear that in this final step of the progression from the case study to the variable-oriented study, the possibility that the qualitative outcome, ‘ethnic political mobilization,’ might constitute a proper way of casing the study has been eliminated. Instead, casing is constituted by the boundaries of a generic population of substitutable observations. These observations, in turn, display the required variation-to-be-explained in the dependent variable, which in turn is now conceived as a variable aspect of members of the designated population. The possibility that a score of zero on the dependent variable might signal that the outcome is impossible for some cases is now veiled by the assumption that the key research question concerns the properties of a population.2
Whereas focusing on the properties of taken-for-granted populations seems to relieve researchers of the problem of identifying relevant negative cases (via inclusion by fiat), this design merely sidesteps the issue without addressing it. The idea that the central goal of research is to study the properties of populations trumps even the concept of cases, downgrading them to the status of mere ‘observations.’ Below, I sketch an alternative approach to negative cases centered on the systematic identification of cases for which the outcome is possible (i.e. the identification of candidates for the outcome). Before elaborating this approach, I establish important groundwork showing how multidimensional state spaces and truth tables can be used to identify kinds of cases.
In Chapter 5, my co-editor, David Byrne, describes complex systems (i.e. cases) in terms of their ‘co-ordinates … in a multidimensional state space, the dimensions of which are the variate measures describing the system.’ Whereas cases have different trajectories in this multidimensional space, they tend to cluster in specific locations. Furthermore, cases are capable of qualitative change, which means that they can move from one location in the variate space to another, even though the second location may be relatively distant from the first. Most regions of the variate space are devoid of cases, and only a relatively small number of well-populated locations or sectors exist. The variate space contains only a small number of well-populated locations (yielding clusters of cases) because of the way case aspects fit together. That is, aspects of cases cohere in meaningful packages that have a syndrome-like (and thus mutually reinforcing) character. These ‘coherencies’ reflect the interconnectedness of case aspects and the fact that only a limited number of combinations of aspects go together well.
This understanding of complex systems resonates well with the concept of ‘limited diversity’ in qualitative comparative analysis (QCA). In both its crisp set and fuzzy set versions, QCA relies on truth tables, which list all logically possible combinations of causally relevant conditions. With crisp sets, there are 2^k logically possible combinations of presence/absence of dichotomies (where k is the number of conditions). With fuzzy sets, a truth table can be used to summarize a k-dimensional vector space, with each row of the truth table corresponding to a specific corner of the k-dimensional space. (There are 2^k corners of the multidimensional space defined by the k fuzzy-set causal conditions and thus 2^k corresponding truth table rows; see Ragin 2000, 2008.) In essence, a truth table lists the different sectors of a ‘multidimensional state space’ and shows the number of cases in each sector.
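The correspondence between truth table rows and corners of the state space can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the membership scores are hypothetical, and corner membership is computed with the minimum rule (fuzzy intersection, with 1 - m for an absent condition), which is the usual fuzzy-set convention rather than something spelled out in the passage above.

```python
from itertools import product


def truth_table_rows(k):
    """Return all 2**k presence/absence combinations for k conditions."""
    return list(product((0, 1), repeat=k))


def corner_membership(memberships, corner):
    """Fuzzy membership of one case in one corner of the k-dimensional space.

    Assumption: membership in a combination is the minimum across conditions,
    using 1 - m wherever the corner codes the condition as absent.
    """
    return min(m if present else 1.0 - m
               for m, present in zip(memberships, corner))


def best_corner(memberships):
    """Return the corner (truth table row) in which the case scores highest."""
    rows = truth_table_rows(len(memberships))
    return max(rows, key=lambda corner: corner_membership(memberships, corner))


# Five conditions yield 2**5 = 32 truth table rows.
rows = truth_table_rows(5)
print(len(rows))

# Hypothetical membership scores for a single case on five fuzzy-set conditions.
case = (0.9, 0.2, 0.7, 0.1, 0.6)
print(best_corner(case))
```

As long as none of its scores is exactly 0.5, a case has membership above 0.5 in exactly one corner, which is what allows each case to be counted in a single truth table row.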
In the typical application of the truth table approach, researchers find that most cases are captured by a relatively small subset of truth table rows, which correspond, in turn, to the most populated sectors of the multidimensional state space. In other words, a common finding in truth table analysis is that case diversity is profoundly limited. Often, only a minority of the logically possible combinations of conditions can claim empirical instances. The examination of limited diversity (i.e. the distribution of cases across truth table rows) shows which ‘coherencies’ are empirically common and which combinations of attributes are uncommon (perhaps even impossible).
For an illustration of these principles, consider Table 31.1, which shows a truth-table summary of a variate space for black males in the United States. The data are from the National Longitudinal Survey of Youth, which documents the connections between various background characteristics, especially educational experiences, and socioeconomic status as an adult. The variate traces used to construct the table are married versus not married, children versus no children, degree of membership in the set of cases with high income parents, degree of membership in the set of cases with college education, and degree of membership in the set of cases with low test scores (the Armed Forces Qualifying Test; AFQT).
Consider the distribution of the cases across the thirty-two sectors of the five-dimensional variate space defined by the causal conditions (see columns 1–5 of Table 31.1). The sectors (locations) are ranked according to their case counts, with the most common combination of conditions listed first. Column 6 shows the number of cases in each sector; column 7 shows the cumulative percentage of cases, moving down the table from the most frequent combination, to the second most frequent, and so on. As the table clearly shows, cases are distributed unevenly. In fact, the five most populated sectors of the thirty-two-sector, five-dimensional variate space capture nearly 65% of the cases. The ten most populated sectors embrace more than 80% of the cases, and the eighteen sectors with at least ten cases each account for almost 95% of the cases. Viewed from the opposite end of the table, the fourteen least populated sectors together snare only about 5% of the cases. In other words, nearly half of the thirty-two sectors of the five-dimensional state space are virtually void of cases.
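The ranking and cumulative-percentage columns described above are straightforward to compute. The sketch below uses made-up counts for an eight-sector space, not the figures in Table 31.1, and the function name is hypothetical.

```python
from itertools import accumulate


def cumulative_percentages(counts):
    """Rank sectors by case count and attach cumulative percentages.

    Returns a list of (count, cumulative_percent) pairs, most populated first.
    """
    ranked = sorted(counts, reverse=True)
    total = sum(ranked)
    if total == 0:
        raise ValueError("no cases to distribute across sectors")
    return [(count, 100.0 * running / total)
            for count, running in zip(ranked, accumulate(ranked))]


# Hypothetical case counts for eight sectors (three dichotomous conditions).
hypothetical_counts = [40, 5, 25, 0, 15, 10, 3, 2]

for count, cum_pct in cumulative_percentages(hypothetical_counts):
    print(f"{count:4d}  {cum_pct:6.1f}%")
```

Reading the cumulative column from the top shows how quickly a handful of sectors absorbs most of the cases; reading it from the bottom shows how little the sparse sectors contribute.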
From the perspective of ‘coherencies,’ it is clear that the most populated sectors combine conditions that are linked to poor prospects for socioeconomic advancement. The five most populated sectors (nearly 65% of the cases), for example, all combine low membership in both high parental income and college education. Three of these five also include high membership in low test scores, and none of the five displays the family configuration most often linked to avoiding poverty – married without children. In short, when viewing black males in the US in terms of the characteristics they most often combine, that is, the aspects that most often ‘cohere,’ it is clear that they face substantial obstacles to advancing their socioeconomic status.
Using truth tables it is also possible to differentiate combinations of conditions that clearly ‘exist’ in the data from those that do not. The key concern here is not the difference between having at least one case versus having no cases, which simply differentiates the bottom four rows of Table 31.1 from the rest. Rather, the interest is in combinations of conditions registering nontrivial counts of cases versus those registering trivial counts. With a large N and individual-level data, it is important to consider the possibility that measurement error generates assignment error and that some combinations may have cases only as a result of such errors. Thus, in an analysis of this type it is useful to specify a frequency threshold that signals which combinations are nontrivial. The cumulative percentage data shown in column 7 of Table 31.1 are useful for establishing such a threshold. For example, defining rows with frequencies of at least ten cases as nontrivial differentiates the eighteen most populated sectors (containing 94.7% of the cases) from the fourteen least populated sectors (containing 5.3% of the cases). Of course, other frequency thresholds are reasonable, within obvious limits. For example, a threshold of at least fifteen cases captures 88.6% of the cases distributed across fourteen of the thirty-two sectors.
The larger point is that multi-dimensional state spaces, and by implication truth tables, provide a good starting point for thinking about and analyzing the ‘kinds’ of cases that exist in a given collection. This way of approaching the question of kinds differs from some of the more inductive approaches discussed in this handbook (e.g. cluster analysis and correspondence analysis) because it starts with a specification of the key dimensions, rather than arriving at them through a bottom-up analysis of case-level similarities and differences. Specifying kinds of cases, based on an encompassing view of their key attributes, is central to the larger process of casing, addressing the question: ‘What are these cases, cases of?’
The identification of the kinds of cases in a given set is an important gateway to possibility analysis. In essence, the specification of a frequency threshold defines which locations in the multi-dimensional state space are well-populated ‘coherencies’ and therefore worthy of further analysis. Using ‘at least ten cases’ as the threshold identifies eighteen kinds of cases. Using ‘at least fifteen cases’ as the threshold identifies fourteen kinds of cases. The choice of threshold depends ultimately on whether the researcher is more interested in a fine-grained representation (using a lower frequency threshold) or a coarse-grained representation (using a higher frequency threshold). After establishing different kinds of cases, the next step is to evaluate each type with respect to the outcome, determining for each type whether the outcome is possible.
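The trade-off between fine-grained and coarse-grained representations can be made concrete with a short sketch. The row counts below are invented for a three-condition truth table and do not reproduce Table 31.1; the function names are hypothetical.

```python
def kinds_of_cases(row_counts, threshold):
    """Keep the truth table rows whose case counts meet the frequency threshold."""
    return {row: n for row, n in row_counts.items() if n >= threshold}


def percent_of_cases_retained(row_counts, threshold):
    """Percentage of all cases that fall in rows passing the threshold."""
    total = sum(row_counts.values())
    kept = sum(kinds_of_cases(row_counts, threshold).values())
    return 100.0 * kept / total


# Hypothetical counts keyed by (condition A, condition B, condition C).
hypothetical_rows = {
    (0, 0, 0): 120, (0, 0, 1): 60, (0, 1, 0): 14, (1, 0, 0): 12,
    (0, 1, 1): 9,   (1, 1, 0): 4,  (1, 0, 1): 1,  (1, 1, 1): 0,
}

for threshold in (10, 15):
    kinds = kinds_of_cases(hypothetical_rows, threshold)
    share = percent_of_cases_retained(hypothetical_rows, threshold)
    print(f"threshold {threshold}: {len(kinds)} kinds, {share:.1f}% of cases")
```

Raising the threshold never increases the number of kinds or the share of cases retained, so the choice is a matter of how much sparsely populated detail the researcher is willing to set aside.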
A key issue in the study of case outcomes, such as ethnic political mobilization, is the question of candidacy, as discussed previously. Does it make sense to include in a study the analysis of cases that cannot be considered candidates for the outcome? In general, the answer is that it is not sensible, except in those situations where the stated goal of the research is to generalize to a given population, that is, to estimate some population-wide effect or property. Furthermore, that population must be known, uncontested, and relatively well bounded. Absent this goal, researchers must address the issue of candidacy when identifying potential negative cases. Should the UK in the twentieth century be included in the set of countries vulnerable to peasant revolution? Certainly not, and it would be a monumental waste of intellectual labor to pursue this question through in-depth study.
Related to the idea of candidacy is the analysis of possibility (Mahoney and Goertz 2004). The key focus in possibility analysis is determining which cases have a possibility of displaying the outcome. For some cases, an outcome may be virtually certain. For example, children of the rich who attend the top prep schools and universities, earn high marks, and score well on achievement tests are almost certain to achieve a middle class life style, or better, as adults. Cases below these, say those from solidly middle class families, have a decent probability of achieving a middle class lifestyle, but this outcome is far from certain or even nearly certain. Instead, depending on the specific circumstances, the outcome is reached in a probabilistic manner; some cases have a higher probability, and some cases have a lower probability. Finally, at the opposite end of the distribution of causal conditions, there may be those for whom a middle class lifestyle is only a dream. They are the flipside of the first group, because their outcome is completely opposite: they are almost certain not to achieve a middle class lifestyle. In effect, the probabilistic range is sandwiched in between two kinds of quasiuniformity – those who are almost certain to succeed, on the one hand, and those who are almost certain not to, on the other.
Conventional quantitative social science focuses exclusively on the realm of probability. That is, the extremely high probabilities of the outcome at one end (i.e. those who are almost certain to display the outcome) and the extremely low probability of the outcome at the other end are viewed simply as part of the range of probabilities (i.e. as mere quantitative variation) and not as signaling possible qualitative discontinuities or breaks in the range of probabilities. By contrast, set-theoretic analysis focuses explicitly on such qualitative breaks. When cases with a specific combination of characteristics (e.g. those displayed by the offspring of the rich) are almost certain to display an outcome (e.g. high socioeconomic status as adults), they constitute a near-perfect subset of the cases with the outcome. Furthermore, the combination of characteristics they share can be considered sufficient for the outcome (that is, assuming this connection resonates with theoretical and substantive knowledge). Likewise, if the cases at the opposite end of the distribution are almost certain not to display the outcome, then these cases constitute a near-perfect subset of those with an absence of the outcome, and the characteristics they share may be considered sufficient for its absence.
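The idea of a 'near-perfect subset' can be made concrete with a small calculation. The following is a minimal sketch in Python, not a procedure taken from this chapter: it assumes a simple consistency score (the share of membership in the condition that also lies in the outcome), and the function name and the data are hypothetical, invented purely for illustration.

```python
def subset_consistency(condition, outcome):
    """Share of membership in `condition` that also lies in `outcome`.

    With crisp (0/1) scores this is the proportion of cases displaying the
    condition that also display the outcome. With fuzzy scores in [0, 1]
    it is sum(min(x, y)) / sum(x). A value close to 1.0 indicates that the
    condition set is a near-perfect subset of the outcome set.
    """
    total = sum(condition)
    if total == 0:
        raise ValueError("no case has any membership in the condition")
    overlap = sum(min(x, y) for x, y in zip(condition, outcome))
    return overlap / total


# Hypothetical crisp data (assumption, not from Table 31.1):
# 20 offspring of the rich, 19 of whom reach high socioeconomic status,
# followed by 30 other cases, 12 of whom reach it.
rich_offspring = [1] * 20 + [0] * 30
high_ses_adult = [1] * 19 + [0] * 1 + [1] * 12 + [0] * 18

consistency = subset_consistency(rich_offspring, high_ses_adult)
print(f"consistency of 'rich offspring' as a subset of 'high SES': {consistency:.2f}")
```

In this invented example the score is high because nearly every case with the condition also displays the outcome. Whether such a score justifies a claim of sufficiency still depends, as noted above, on theoretical and substantive knowledge.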
The qualitative break between ‘almost certain not to display the outcome’ and ‘a non-trivial probability of displaying the outcome’ is an important and vastly under-explored divide in the analysis of social phenomena. Cases with a non-trivial probability of the outcome (which includes those cases for which the outcome is almost certain) are those for whom the outcome is ‘possible.’ Cases with a null or trivial probability of the outcome are those for whom the outcome is virtually impossible.
For illustration of these ideas, consider again the cases presented in Table 31.1. The analysis of these data, using degree of membership in the set of cases achieving a middle class income (or better) as the outcome, reveals that two of the listed combinations (among the eighteen rows that pass the frequency threshold mentioned above) have truly trivial probabilities of displaying the outcome. Cases in these two rows combine the following four characteristics: they are unmarried with children, and they have low membership in college educated and low membership in high parental income. (They may or may not have low AFQT scores.) The results suggest that there is a chasm separating cases in these two truth table rows from those in the other sixteen combinations of conditions, for the row with the next highest probability of the outcome displays a value well above those registered by these two rows. From a purely probabilistic point of view, cases in these two rows are simply those with a very low probability of the outcome, and a conventional statistical analysis might predict these low probabilities with reasonable accuracy. From a set-theoretic point of view, however, the combination of these four conditions is sufficient for virtual exclusion from the outcome. That is, the set-theoretic view differentiates between these cases and the rest in a qualitative manner by focusing on the substantively meaningful difference between trivial and non-trivial probabilities of the outcome.
Expressed as a logical equation, the set-theoretic formulation for exclusion from the outcome is:
∼married • children • ∼high_income_parents • ∼college → ∼possible
where ‘∼’ signals negation or not, ‘•’ signals combined conditions (set intersection – logical and), ‘possible’ is the set of cases for whom the outcome is possible, ‘married’ is the set of married individuals, ‘high_income_parents’ is the set of cases with high income parents, and ‘college’ is the set of cases with college education. By applying De Morgan's Law (see Ragin 1987) to this logical equation, a set-theoretic formulation describing the conditions linked to the possibility of the outcome can be derived:
married + ∼children + high_income_parents + college ← possible
where ‘+’ signals alternate conditions (set union – logical or). The equation indicates that the possession of any one of these four advantages makes a black male in the US a candidate for inclusion in the set of cases with at least middle-class incomes. This conclusion cannot be derived statistically, but follows instead from a set-theoretic analysis of the evidence. The main point of this exercise, however, is not its substantive conclusion, but rather to demonstrate that researchers can exploit the qualitative break between trivial and non-trivial probabilities and thereby explore conditions of possibility. This exploration relies on the understanding of this qualitative break as a basis for conducting set-theoretic analysis.
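The step from the exclusion recipe to the conditions of possibility can be checked mechanically. The sketch below, in Python, is offered only as an illustration under stated assumptions: it treats the four conditions as crisp (true/false) rather than fuzzy, it reads 'not having children' as the fourth 'advantage', and the function names are hypothetical. It enumerates every combination of the four conditions and confirms that the complement of the exclusion recipe is the union of the four advantages, as De Morgan's Law requires.

```python
from itertools import product


def excluded(married, children, high_income_parents, college):
    """Exclusion recipe: ~married AND children AND ~high_income_parents AND ~college."""
    return (not married) and children and (not high_income_parents) and (not college)


def has_an_advantage(married, children, high_income_parents, college):
    """Union of the four advantages: married OR ~children OR high_income_parents OR college."""
    return married or (not children) or high_income_parents or college


# All 2**4 = 16 crisp combinations of the four conditions.
rows = list(product([False, True], repeat=4))

# De Morgan's Law: NOT (A and B and C and D) == (not A) or (not B) or (not C) or (not D).
de_morgan_holds = all(has_an_advantage(*row) == (not excluded(*row)) for row in rows)

excluded_rows = [row for row in rows if excluded(*row)]
candidate_rows = [row for row in rows if has_an_advantage(*row)]

print("De Morgan check passes:", de_morgan_holds)
print("combinations excluded from the outcome:", len(excluded_rows))
print("combinations that remain candidates:", len(candidate_rows))
```

Only one of the sixteen crisp combinations falls under the exclusion recipe; every other combination carries at least one of the four advantages. The sketch ignores AFQT scores, which the text notes may be either high or low in the excluded rows.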
The ‘possibility analysis’ just presented has a direct bearing on the issue of casing. In the absence of an explicit interest in deriving population-specific properties or estimates, possibility analysis shows which cases are not candidates for the outcome and thus are implausible negative cases. In a conventional quantitative analysis, the inclusion of implausible cases simply inflates theory-confirming correlations. Correlations are strong when there are many cases displaying both the cause and the outcome and, simultaneously, there are many cases displaying the absence of both the cause and the outcome. Both kinds of cases contribute equally to the strength of a correlation, for its calculation is completely symmetrical. The inclusion of cases that are not plausible candidates for the outcome in a correlational analysis simply pads the number of cases in the null-null category or sector, and thus also pads the correlation, for such cases lack both the relevant causal conditions and the outcome.
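The padding effect can be seen in a small numerical sketch. The Python code below uses the standard formula for the correlation between two dichotomies (the phi coefficient, which equals Pearson's r for 0/1 variables) computed from a two-by-two table. The cell counts are hypothetical, chosen only to illustrate the argument, and the function name is an assumption of this sketch.

```python
from math import sqrt


def phi(a, b, c, d):
    """Correlation between two dichotomies, computed from a 2x2 table.

    a = cause present, outcome present
    b = cause present, outcome absent
    c = cause absent,  outcome present
    d = cause absent,  outcome absent (the null-null cell)
    """
    denominator = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denominator == 0:
        raise ValueError("correlation is undefined when a row or column is empty")
    return (a * d - b * c) / denominator


# Hypothetical counts for plausible candidates only.
baseline = phi(a=30, b=20, c=20, d=30)

# Same table after adding 100 implausible cases, all of which lack
# both the cause and the outcome and so land in the null-null cell.
padded = phi(a=30, b=20, c=20, d=130)

print(f"correlation among plausible candidates: {baseline:.3f}")
print(f"correlation after padding the null-null cell: {padded:.3f}")
```

Nothing about the relationship among the plausible candidates changes between the two tables; the correlation rises solely because cases that could never have displayed the outcome were added to the analysis.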
It is important to point out that many data sets used by social scientists are not true populations or even samples drawn from true populations, but instead are simply convenient collections of cases. Furthermore, most researchers are not that interested in inferring population properties, per se. More commonly, they are interested in looking at patterns across their cases, usually via the application of correlational analysis or some technique based on correlational analysis (e.g. multiple regression, logistic regression, structural equation models, and so on). Researchers routinely subject these convenient collections of cases to conventional statistical analysis as a way to identify important patterns and relationships. When these analyses include a substantial number of cases that are not candidates for the outcome in question, researchers risk inflating correlations and thereby distorting their results. Thus, while it may seem that reliance on given populations offers a safe haven from the problem of identifying valid negative cases, in fact, given populations are often laden with results-distorting irrelevant cases.
Lurking behind my discussion of negative cases, populations, and possibility analysis is the implication that treating cases as members of given (and fixed) populations and seeking to infer the properties of populations may be a largely illusory exercise. While demographers have made good use of the concept of population, and continue to do so, it is not clear how much the utility of the concept extends beyond their domain. In case-oriented work, the notion of fixed populations of cases (observations) has much less analytic utility than simply ‘the set of relevant cases,’ a grouping that must be specified or constructed by the researcher. The demarcation of this set, as the work of case-oriented researchers illustrates, is always tentative, fluid, and open to debate. It is only by casing social phenomena that social scientists perceive the homogeneity that allows analysis to proceed.
My critique of the idea of given populations also has policy implications. Conventional quantitative researchers largely focus on the estimation of the net causal effect of independent variables across a large, encompassing, given population of observations. Case-oriented researchers, by contrast, are more focused on kinds of cases and their different fates. As I have noted elsewhere (Ragin and Rihoux 2004), elaborating kinds of cases and studying their different fates is more relevant to policy than the estimation of the net effects of causal variables across a broad population (often containing unknown and unacknowledged heterogeneity). After all, a common goal of social policy is to make decisive interventions, not to move average levels or rates up or down by some small increment. A social policy is most capable of decisive intervention when it is grounded in explicit case-oriented knowledge about specific sets of cases. The idea of population and the notion that the goal of social science is to estimate the properties of populations undermine fine-grained attention to types and kinds, and to context and contingency.
This handbook illustrates the substantial progress that has occurred over the past two decades in the field of case-based social research. There have been many important practical advances, and there is now available an array of techniques that address cases as bundles of connected attributes. Today, these techniques can be viewed properly as members of a family of related techniques, all case-centered. Without the ‘case-centered’ label, they appear as separate attempts to analyze social phenomena in ways that diverge from conventional ‘net effects’ approaches (Ragin 2008), where the focus is on parsing each independent variable's unique effect. By joining the different case-centered techniques together, as we have in this handbook, it is possible to see their connections, as well as to imagine new ways of combining them.
Cases are obviously at the foundation of case-centered methods. It is impossible to address case-centered methods without also addressing what is meant by case. Many of the contributions to this handbook address the issue of casing in one way or another, often with the aid of some version of the realist perspective. As I have indicated in this chapter, the process of casing is as varied as the research designs that social scientists contrive. The important underlying commonality is that the process of casing offers a way of seeing, with different casings offering different, though selective and limited, views of the same infinite body of evidence. Casing has the potential to offer glimpses of the underlying processes and entities that are the central focus of any social science that seeks to go beyond the immediate.
1. In many quantitative studies, conditions that might be viewed by case-oriented researchers as those that define the set of relevant cases (e.g. scope conditions; see Goertz and Mahoney, Chapter 17) become independent variables in multivariate analyses and are used to predict variation in the dependent variable. This practice is very common; it is also fundamentally flawed. Suppose, for example, that a quantitative researcher studying variation in ethnic political mobilization in all formally constituted nation states uses ‘degree of ethnic diversity’ as an independent variable, estimating its net effect on the dependent variable. Superficially, this practice appears perfectly reasonable, for there should be a non-trivial correlation between ethnic diversity and ethnic political mobilization across formally constituted nation states. The problem with this approach is that all other predictors of ethnic political mobilization in this analysis should have their effect only when country scores on ‘ethnic diversity’ are nontrivial. In effect, a non-trivial level of ethnic diversity, in this hypothetical analysis, is a necessary condition for ethnic political mobilization and thus should be understood as a condition that must be substantially present for the other causal conditions to have any impact (or at least for the researcher to be able to properly estimate their impact). In practical terms, the implication of this understanding is that ethnic diversity should be included in the model only in multiplicative interaction terms, with ethnic diversity paired with most, if not all, of the other predictors. Of course, a simpler and more straightforward way to address this issue is simply to exclude irrelevant cases (i.e. those with low levels of ethnic diversity). However, quantitative researchers are often reluctant to exclude cases because successful statistical analysis often hinges on having a large number of cases – the larger the N, the better.
2. Of course, there are techniques that address the qualitative break between a score of zero on the dependent variable and a non-zero score, for example, Tobit analysis (see, e.g., Walton and Ragin 1990). In effect, these techniques estimate a model that addresses the differences between zero and non-zero cases and then controls for these effects when estimating the impact of causal conditions on variation in the level of the outcome across the non-zero cases. Although these techniques do address differences between the two basic kinds of cases, the issue is viewed primarily as a data problem and focuses on the technical challenge of estimating the true level of the dependent variable for cases that have zero scores on the outcome, as though these scores have been censored in some way. These techniques do not address the problem of defining the set of relevant cases.