Summary
‘This handbook thoroughly covers all aspects of evaluation, yet isn't too technical to understand. It offers everything an organization needs to know to get the most out of evaluation’ - Nonprofit World. ‘The Handbook succeeds in capturing and presenting evaluation's extensive knowledge base within a global context. In so doing it provides a useful, coherent and definitive benchmark on the field's diverse and dynamic purposes, practices, theories, approaches, issues, and challenges for the 21st century. The Handbook is an essential reference and map for any serious evaluation practitioner, scholar and student anywhere in the world’ - Michael Quinn Patton, author of Utilization-Focused Evaluation. ‘Readers of this volume will find a set of texts that provide an evocative overview of contemporary thinking in the world of evaluation. This is not a book of simple tips. It does justice to the complex realities of evaluation practice by bringing together some of the best practitioners in the world to reflect on its current state. It is theoretically sophisticated yet eminently readable, anchored in evaluation as it is undertaken in a variety of domains. It is the kind of book that startles a little and makes you think. I highly recommend it’ - Murray Saunders, University of Lancaster. In this comprehensive handbook, an examination of the complexities of contemporary evaluation contributes to the ongoing dialogue that arises in professional efforts to evaluate people-related programs, policies and practices. The SAGE Handbook of Evaluation is a unique and authoritative resource consisting of 25 chapters covering a range of evaluation theories and techniques in a single, accessible volume. With contributions from world-leading figures in their fields overseen by an eminent international editorial board, this handbook is an extensive and user-friendly resource organised in four coherent sections:
Introduction: The Evaluation of Policies, Programs, and Practices
Introduction to the Handbook
A handbook is an ambitious enterprise, but one that also needs delimiting and framing. There are simply too many good ideas and useful perspectives to include in any one volume. The commitments and framework that informed the development of this Handbook are introduced in this chapter. We also offer an overview of and perspective on evaluation.
Evaluation as a Human Activity
Evaluation is a natural part of humans’ everyday life. People make evaluations, in the form of judgments of how good or bad, how desirable or undesirable something is, almost nonstop in the ordinary course of their daily lives. How good was breakfast at the restaurant? How did the meeting with a client go? How good was the play, symphony, or disk jockey last night? Not only is evaluation commonplace; it also appears to be fairly fundamental. Evidence from psychology indicates that, when asked to make an evaluative judgment about some object, people respond more quickly than when asked to make a descriptive statement about the same object (Maio and Olson, 2000; Musch and Klauer, 2003). This may indicate that making evaluative judgments about the world is even more basic than making sense of the world descriptively. Emerging evidence from neuroscience also indicates that making evaluative judgments involves different parts of the brain than making descriptive judgments (Maio and Olson, 2000; Musch and Klauer, 2003). Humans may be hardwired to look at the world evaluatively.
Even though it is natural for humans to make evaluations as part of their everyday lives, both personally and professionally, this kind of everyday, informal evaluative judgment often does not suffice. It has long been known that an individual's informal evaluations can be affected by her or his expectations and preferences (e.g., Shaw, 1984). With everyday, informal evaluation, it usually is not clear to others (and perhaps not even to the individual making the judgment) what has led to a particular evaluative conclusion. Systematic, formal evaluation can, at the very least, make explicit the evidence and criteria on which evaluative judgments are based. Systematic methods of evaluation also can, if not completely eliminate biases, at least help make it clearer or easier to detect what the sources of possible judgmental or evaluative biases are.
In addition, everyday, informal evaluations are typically made by individuals. And any given individual may not have as much information as one would like for making an evaluative judgment that could inspire confidence in others. For instance, staff members working in a social program almost certainly make evaluative judgments about it. But they often don't see what happens to the clients who drop out of the program; they probably don't know what happens to most clients a year after they have completed the program; and they rarely have a good way to gauge what would have happened to clients if they had not been in the program. Systematic evaluation can offer a way to go beyond the evidence available to any individual, as well as to facilitate evaluative processes that are collective and not simply individual.
Disciplined and systematic means of evaluation also allow for evaluative processes to be designed so as to achieve different ends. As is demonstrated throughout this Handbook, systematic evaluation can be aimed at a variety of purposes, including policy making and public accountability, program and organizational improvement, knowledge development, advancement of social justice, and the enhancement of practical wisdom and good practice judgments. The ideas, first, that systematic evaluation can be directed towards different purposes and, second, that the choice of a specific purpose can have implications for how evaluation should be done, permeate this Handbook. Indeed, parts of the Handbook, in particular Part 1 and some of Part 3, are organized around alternative evaluation purposes.
Having made a distinction here between everyday, informal evaluation, on the one hand, and more formal, systematic evaluation, on the other, we should recognize that there is no consensus in the evaluation or the wider social science communities about the relationship between the different kinds of knowledge that are entailed in more formal and more informal evaluative judgments. Some would claim that one is based on the logic of science and the other on the fundamentally different logic of everyday reasoning (e.g., Hammersley, 2003; Polkinghorne, 2000). Others, while developing and emphasizing the significance of alternative forms of knowledge, see them as potentially complementary (e.g., Schmidt, 1993). We do not mean to suggest a chasm between these two forms of evaluation. Everyday evaluation and more systematic evaluation can and should have much in common. To the extent that there are differences between the two, they are differences of degree. And, whenever more formal, systematic evaluation is done, it inevitably rests on and interacts with everyday evaluation activities. To take but one example, the findings reported from a systematic evaluation almost certainly are interpreted in the context of the evaluation users’ pre-existing, everyday evaluations of the policy, program, or practice in question.
More generally, we think there are perhaps three different ways in which informal and formal evaluation have been related. First, many see formal evaluation as an improvement on informal evaluation, strengthening it and offering more explicit and usable knowledge. In general form, this position is similar to that held by advocates of critical thinking and evidence-based policy and practice. Second, some would regard informal and formal knowledge as different but, potentially at least, complementary (e.g., Schmidt, 1993). Third, some have seen these as essentially interactive forms of knowledge, such that informal and formal evaluations mutually challenge and question each other in a non-hierarchical way. In spite of such complexities, the distinction between everyday, informal evaluation and more formal, systematic evaluation is useful. In particular, readers should recognize that this Handbook focuses, albeit not exclusively, on more formal, systematic approaches to evaluation, rather than on informal, everyday evaluation. To those who make their living as evaluators, and to others aware of the professional activity of evaluators, this focus is not surprising. Regardless, keeping in mind the distinction between everyday evaluation and more formal evaluation is important. Professional evaluators should remember that everyday, informal evaluation takes place without them, that their systematic evaluation work occurs in a world filled with natural evaluative judgments, and that it is but a small fraction of that larger evaluative world.
The Scope of the Handbook
In focusing on more formal evaluation, the Handbook describes and critiques the kind of evaluation carried out by professional evaluators and others who may not label themselves as “evaluators” but who do similar work. We have aimed throughout to respect the diversity of evaluators’ social roles and to avoid unquestioningly treating the evaluation history of any one country or discipline as normative or regulative for any others. Evaluation, whether more formal and systematic or more informal and everyday, can be applied to almost anything. Michael Scriven (1991, 1999), a major figure in the evaluation literature, has referred to evaluating the “six Ps”: programs, policies, performance, products, personnel, and proposals. Indeed, evaluation itself can be evaluated - evaluators have come to call this meta-evaluation.
This Handbook focuses on the evaluation of programs, policies, and practices. We believe it is infeasible to discuss in detail the evaluation of everything in a single handbook. Personnel evaluation, for instance, is a sufficiently specialized area to have its own handbooks (e.g., Anderson, 2001; Evers, Anderson, & Voskuijl, 2005; Lombardi, 1988). The present Handbook addresses “three Ps” - programs, policies, and practices - largely because there is enough commonality across these three in terms of the purposes, methods, and uses of evaluation and because a narrower focus could contribute to a kind of myopia. A secondary reason for focusing on programs, policies, and practices is that the communities of evaluators that examine these three categories have considerable overlap. For example, evaluative activities designed to improve programs often actually involve the evaluation of specific practices being carried out by program staff. As another example, policies are often important precisely because of the programs they stimulate. Policies, programs, and practices are often intertwined.
To achieve greater coherence, the Handbook's focus is restricted to directly people-related programs, policies, and practices. It does not address, for example, energy policies, space exploration programs, defence policy, or animal husbandry practices. Programs, policies, and practices that are not directly people-related can of course be evaluated. Indeed, there are evaluators who specialize in such areas of work. But that is not the focus of the current Handbook.
In this Handbook, we broadly consider policies as legislative and political statements of governmental or organizational intent. A policy sets direction for how resources in a given domain (e.g., education) will be allocated and what substantive foci will receive priority (e.g., a policy promoting the inclusion of diverse people in science and mathematics careers). A program then is a particular enactment of a policy. It offers one concrete representation of how a given policy can be realized, in terms of particular activities and materials provided to a particular target population (e.g., a program designed to attract diverse students to a university campus for scientific laboratory training). And a practice refers to the specific professional interactions that take place within programs (e.g., the teaching or mentoring that diverse students receive during their laboratory training).
Aspirations for the Handbook
This Handbook is written for practicing evaluators, academics, advanced post-graduate students, and evaluation clients. It is intended to offer a definitive, benchmark statement on evaluation theory and practice for the first decades of the twenty-first century. In developing this Handbook, we strove to offer a coherent picture of the nature and role of evaluative inquiry in contemporary twenty-first century societies around the globe. The resulting picture is necessarily pluralistic, because evaluation has many countenances, multiple vested audiences, and diverse ideologies. The picture of evaluation presented in this Handbook is also necessarily dynamic - tracing historical evolutions and projecting future pathways - because evaluation is changing and evolving. The end chapters in each of the four parts specifically address the implications of these issues. This dynamic nature exists partly because evaluation is intrinsically linked to changing societal and scientific ideas and ideals. And the picture of evaluation painted in this volume is of necessity targeted, for even a relatively comprehensive Handbook must leave out some perspectives, some ways of knowing, and some domains of evaluation.
Despite the need to be targeted, this Handbook is also intended to be reasonably comprehensive. The volume includes many diverse evaluation stances. But these are directed toward greater theoretical coherence and practical value than is typical of independent descriptions of alternative perspectives. To achieve a comprehensive statement, the Handbook aspires to an integration of theory, research and practice within each chapter (although the relative mix of theory, research and practice varies from chapter to chapter). Contributors were asked to discuss the strengths and weaknesses of alternative perspectives, including their own - a tough act, requiring them to avoid the Scylla of blandness and the Charybdis of partisanship. Contributors were also asked to consider how evaluation practice is part of a changing social and political context. They look backward as well as forward as they overview the field. They also explore linkages with other disciplines and fields of practice.
While reflecting the highly pluralistic field of evaluation, the Handbook also strives for coherence. Coherence is enhanced through the Handbook's organizing framework (with four parts that address, respectively, the role and purpose of evaluation in society; evaluation as a social practice; the practice of evaluation, including methods; and a relatively full sampling of evaluation in various people-related domains). Moreover, in each of the four parts, a final chapter offers a critical synthesis of key ideas related to the contents of that part. These synthesis chapters are not intended as summaries or reviews of the chapters within each part; rather, they are relatively independent assessments of key issues related to the concerns raised.
Another decision made in service of coherence was the limitation of the Handbook's coverage to the evaluation of people-related policies, programs, and practices. The policies, programs, and practices discussed in the Handbook may be international or local in scope, public or private in origin, established or innovative in demeanor. However, they share the commonalities of involving people as participants and of somehow seeking a valued improvement in the life quality of these participants.
Development of the Handbook
The process by which this project has been developed reflects these aspirations for a Handbook which, while reflecting the diversity of the field of evaluation, is both coherent and comprehensive. The three editors represent different disciplinary backgrounds, various domains of evaluation practice, alternative methodological proclivities, and different theoretical stances. A relatively small but international and eminent Editorial Advisory Board was constituted, again reflecting a range of perspectives and experiences. If anything, the cast of chapter authors is even more diverse, on several dimensions, than the editorial team or the Board. That many chapters are co-authored is no mere pragmatic solution, but rather incorporates our intention to represent multiple perspectives and backgrounds. The Handbook in a sense represents a conversation - in part between chapters, and in several instances within chapters.
The three editors developed a detailed outline of the Handbook, including a list of possible authors for each chapter. The Editorial Board provided comments on the overall outline, as well as suggestions about contributors. A detailed “Briefing” was then developed for contributors (running to 14 single-spaced pages). The briefing included an overall description of the project, an introduction to each of the four parts, and suggestions about the content of most chapters. Contributors were encouraged to include exemplars where possible, and to avoid national or other forms of ethnocentrism. We also asked contributors to include critical, reflexive assessments of the viewpoints they described, including positions with which they themselves are associated. They were asked to consider the overall gains and deficits in the area about which they wrote and, where appropriate, to set out their aspirations as to what developments would make for substantial gains in the near and medium-term future.
Contributors first submitted an abstract outlining the expected content of their chapters. The three editors responded collectively to the abstracts, aiming to support the development of a first chapter draft that would both represent the intended scope and tone of the chapter, and also ensure completeness for the overall Handbook “map.” A lead editor was appointed for each chapter. In most cases, draft chapters were reviewed by a member of the Editorial Advisory Board and the lead editor. Feedback in most instances was considerable. Of course, contributors varied somewhat in the extent to which they complied with the initial briefing or were responsive to the editorial feedback. In some cases they responsively enriched the editors’ original concept for that chapter. In general, we believe the contributors’ work has resulted in a picture of evaluation that is definitive, diverse, comprehensive, and coherent.
Perspectives on Evaluation
In this Handbook, systematic evaluation is conceptualized as a social and politicized practice that nonetheless aspires to some position of impartiality or fairness, so that evaluation can contribute meaningfully to the well-being of people in that specific context and beyond.
What is Evaluation?
Paraphrasing an old joke, if you ask 10 evaluators to define evaluation, you'll probably end up with 23 different definitions. Given that evaluation is diverse, with multiple countenances, it should not be surprising that varying definitions exist.
“Evaluate,” or at least its root word “value,” finds its origin in the Old French value and valoir and the Latin valēre, which had the sense of “to be worth (something)” and “to work out the value of (something).” Even in present everyday usage this has a double meaning, of finding a numerical expression for, and estimating the worth of. And of course, “worth” carries several distinct meanings, including personal worthiness, status, how we personally estimate someone, judgments of importance, and intrinsic worth.
The merit/worth distinction has sometimes been pressed into service to make distinctions by evaluation writers. Lincoln & Guba, for example, use the term “merit” to refer to the intrinsic, context-free qualities that accompany the evaluand from place to place and are relatively invariant. For example, a curriculum may have a value independent of any local application. “Worth” refers to the context-determined value, which varies from one context to another. For example, a curriculum may have a certain value for teaching a particular child in a given setting. Based on these distinctions they define evaluation as:
a type of disciplined inquiry undertaken to determine the value (merit and/or worth) of some entity - the evaluand - such as a treatment, program, facility, performance, and the like - in order to improve or refine the evaluand (formative evaluation) or to assess its impact (summative evaluation). (Lincoln & Guba, 1986a, p. 550)
One key way that definitions of evaluation differ is in terms of the components they include. Some definitions of evaluation focus on the general function evaluation serves. The most common functional definition would indicate that evaluation involves judgments of value, determinations of the merit, worth or significance of something. This kind of definition is associated with Michael Scriven, among others. It is a kind of definition that bridges the more systematic and the more everyday forms of evaluation. Scriven's definition follows:
Evaluation refers to the process of determining the merit, worth, or value of something, or the product of that process. … The evaluation process normally involves some identification of relevant standards of merit, worth, or value; some investigation of the performance of the evaluands on these standards; and some integration or synthesis of the results to achieve an overall evaluation or set of associated evaluations. (Scriven, 1991, p. 139)
Other definitions of evaluation include as a core aspect of the definition a specification of evaluation purpose (e.g., providing information for policy making or for program improvement). Given the multiplicity of evaluation purposes, several variations on this theme exist. Michael Patton's commitment to evaluation use in multiple forms (that is, to multiple purposes, depending on users’ intentions) is evident in his definition:
Program evaluation is the systematic collection of information about the activities, characteristics, and outcomes of programs to make judgments about the program, improve program effectiveness, and/or inform decisions about future programming. (Patton, 1997, p. 23)
This can be contrasted with Barry MacDonald's democratic commitment to evaluation as providing important information about educational programs to multiple, diverse audiences, each with a different stake in the program.
Democratic Evaluation is an information service to the community about the characteristics of an educational program…. The democratic evaluator recognises value pluralism and seeks to represent the range of interests in his [or her] issue formulation. The basic value is an informed citizenry, and the evaluator acts as a broker in exchanges of information between groups who want knowledge of each other. (MacDonald, 1987, p. 45)
Further, at least some historically important definitions have also specified evaluation methods, as illustrated by Peter Rossi and Howard Freeman's definition:
Evaluation research [considered the same as evaluation] is the systematic application of social research procedures in assessing the conceptualization and design, implementation, and utility of social intervention programs. In other words, evaluation research involves the use of social research methodologies to judge and to improve the planning, monitoring, effectiveness, and efficiency of health, education, welfare, and other human service programs. (Rossi & Freeman, 1985, p. 19)
Given the many faces of evaluation, and given its dynamic nature, it is not surprising that no single definition has taken hold among all evaluators.
Different Kinds of Approaches to Evaluation
Definitions of evaluation provide one way of thinking about what evaluation involves. Yet another way of trying to gain a sense of what the field encompasses is to look instead at evaluation theories, approaches, and models (these terms are often used interchangeably in the evaluation literature). While it is not feasible here to recount a wide range of evaluation theories, we will glance in passing at one or two selected “meta-models” that aspire to capture some of the key similarities and differences across evaluation theories. This may be helpful in understanding some aspects of the “lay of the land” of contemporary evaluation, and links with Stevenson and Thomas's review of the intellectual contexts of evaluation in Chapter 9.
For example, Shadish, Cook & Leviton (1991), in a comprehensive review of the work of seven program evaluation theorists, offered a stage model meant to describe the development of major evaluation theories to that point in time. During the first stage, according to Shadish et al., “Evaluation started with theories that emphasized a search for truth about effective solutions to social problems” (p. 67). In other words, Stage 1 theorists emphasized the use of procedures to get valid, unbiased answers to questions about the performance of social programs (Donald Campbell and Michael Scriven represent Stage 1 theorists, according to Shadish et al.). During Stage 2, evaluation theorists “generated many alternatives predicated on detailed knowledge of how organizations in the public-sector operate, aimed at producing politically and socially useful results” (p. 67). That is, Shadish et al.'s Stage 2 theorists (Carol Weiss, Joseph Wholey, and Robert Stake) attempted to go beyond previous work by examining the details of organizational processes and decision-making, and trying to shape evaluation to fit these organizational realities. Stage 3, represented by Lee Cronbach and Peter Rossi, “then produced theories that tried to integrate the alternatives generated in the first two stages” (p. 67).
Shadish et al.'s (1991) stage model has limits, as does any attempt to capture the complex and multifaceted state of evaluation theory (or practice) with a simple system. For example, it tends to over-individualize evaluation theory, and to underemphasize structural and societal factors. It is a “great man” approach. Indeed, the term “great man” generally applies, with Carol Weiss being the only woman included. The model thus discloses (without explicit discussion) the gendered nature of evaluation's development, at least in the USA. And this is its third limitation - it is a North American model and history. However, Shadish et al.'s stage model is valuable in several ways. For example, it helpfully demonstrates that evaluation is dynamic and changing, and successfully avoids overemphasizing differences. Their model also reminds us that some perspectives on evaluation are less comprehensive than others. For example, the Stage 3 theorists attempted to build on selected developments from the previous stages. Stage 1 theorists, in contrast, tended to emphasize a single approach to evaluation. In addition, Shadish et al.'s stage model reminds us that different approaches to evaluation have varying emphases. A more recent framework concentrates even more on this point.
In a book entitled Evaluation Roots: Tracing theorists’ views and influences, Marvin Alkin (2004b) asked a number of evaluation theorists both to summarize their own views and to describe which evaluators and other sources have most influenced their work. In an effort to capture the historical and conceptual relationships among these evaluation theorists, Alkin & Christie (2004) developed the “evaluation theory tree,” a graphical representation modeled after a tree with three main branches. Each theorist was placed on one of the three branches, to indicate his or her relative degree of emphasis on three issues: “(a) issues related to the methodology being used; (b) the manner in which data are to be judged or valued; and (c) the user focus of the evaluation effort” (Alkin, 2004a, p. 8; the tree diagram is reproduced in Chapter 2).
In other words, Alkin & Christie (2004) suggest that there are three distinguishable streams, or traditions, in evaluation theory. One has focused on evaluation methods, another on values or value judgments, and the third on users and use. These three streams, as Alkin & Christie acknowledge, have intersected at times, and many evaluation theorists address all three issues (methods, values, and use) to some extent. Again, though, placement on one or another branch of the evaluation tree is intended to reflect the “relative degree of emphasis” on these three concerns. Contributors to the Evaluation Roots book were asked to comment on the tree itself. Many reported finding the evaluation tree useful and were satisfied with their placement on it. Others offered criticism of the tree model, their placement on it, or both.
The evaluation tree, in our view, is useful in highlighting some of the major conceptual emphases in the field of evaluation, as well as in capturing how many major figures in evaluation have influenced or been influenced by others. At the same time, a unidimensional model with three categories cannot capture all of the important aspects of the diversity of the field. For example, there are broader intellectual traditions that have influenced evaluation that are not reflected in the evaluation tree (to take but one example, see the discussion in Chapter 4 of the influence of continental philosophy).
Our own view is that any attempt to classify alternative approaches to evaluation will necessarily have shortcomings, especially so with simpler frameworks. Nevertheless, these frameworks do provide alternative views of the “big picture” of the field of evaluation. At the very least, readers of this Handbook may find it useful, as they read various chapters, to think about how a given approach would fall in terms of Shadish et al.'s stage model and Alkin & Christie's evaluation tree.
Histories of Evaluation
It seems almost a requirement in a Handbook of Evaluation to discuss the history of evaluation. However, we find it important to avoid taking a stance that appears (usually implicitly) in most of the brief discussions of history one finds in evaluation textbooks and elsewhere. That is, most often these presentations suggest that there is a history of evaluation. To the contrary, we believe there are multiple histories of evaluation. There are somewhat different histories depending upon one's discipline and domain of evaluation work. For example, in the United States, discussions of educational evaluation often suggest a pivotal historical role of the work of Ralph Tyler; in contrast, discussions of non-educational social programs often ignore Tyler completely and emphasize the explosion of evaluation activity during the 1960s and 1970s, often referring to the Johnson administration's “Great Society.” Moreover, the work of Stafford Hood (1998) and others shows that most histories of evaluation in the United States ignore the contributions of African American evaluation scholars, likely due to the discrimination endemic to US society.1
The history of evaluation also varies, to a greater or lesser extent, with geography and governments. The Great Society stimulated program evaluation in the United States, but not worldwide. Other sources of influence exist elsewhere. For example, one stream that has contributed to evaluation in the United Kingdom is the policy research tradition that goes back to the Fabian impact on British welfare state policy after the Second World War, with the associated presence of social policy as a distinct discipline in the social sciences. More recently, the development of the European Union has had important impacts on the presence and nature of evaluation in many countries. In so-called developing countries, the growth and style of evaluation have drawn heavily on liberation ideology, such as the writings of Paulo Freire (1970). Again, there are multiple histories of evaluation, which collectively have contributed to the growing diversity in the field. This diversity is reflected in several of the co-written Handbook chapters, where the range of issues often lies beyond any individual writer. It is also reflected in the substantive scope of the Handbook, as, for example, in Chapter 25, which deals not only with development but also with the rapidly growing experience of evaluation in the context of complex humanitarian emergencies.
Despite the multiplicity of evaluation histories, we believe it is possible to offer at least a tentative assessment of the current status of evaluation, as follows. In less than one generation, evaluation has become an internationally recognized practice that is positioned at the intersection of broad social and economic aspirations, contested political ideologies, and individual life quality. Evaluations of people-related policies, programs, and practices can and do contribute to the betterment of life for individual people and to the defensibility and effectiveness of varied national policy decisions and directions. Evaluation is a player on the world stage of debate about how best to address contemporary critical concerns of the international community - concerns such as extreme poverty, inadequate health care, environmental degradation and global warming, ethnic and religious strife and warfare, access to education, and fair distribution of resources.
The developing history of evaluation, and the emergence of social and cultural forces that will influence evaluation, continue. We mention but a few of the relevant forces here. Major economic shifts, political watersheds, the emergence of faith-based governments, the characteristics of late modernity - each of these has fundamental consequences for the nature and very existence of evaluation. Politically, the dissolution of the USSR, and more recently the emergence on the world stage of China, both presage the diversification and probably the expansion of evaluation in association with national policies, programs and practice. The characteristics of late modernity are also shaping evaluation. For instance, the contemporary “evidence-based practice” movement has increased pressure for evaluation, and also has tended to shape evaluation in some directions rather than others. This widespread force is mentioned by several of the Handbook contributors, most obviously in Part 4. Rationality in policy making also drives the Cochrane Collaboration in medical practice and the Campbell Collaboration in social and educational programs and practice. Both involve the use of meta-analysis, or quantitative synthesis of comparative evaluations, with the goal of identifying effective courses of action. Both are about evaluation for decision-making. And they both represent a social organization that goes beyond individual evaluators or evaluation organizations, as well as a striking preference for some kinds of evaluation designs and evidence over others (a preference that is not without controversy; see, e.g., Lipsey, 2000a, 2000b; Schwandt, 2000a, 2000b). The long-term consequences of these endeavors, especially the Campbell Collaboration, are yet to be seen. Widespread preoccupations with information use and application also show little sign of abating. For instance, Carol Weiss and her colleagues have recently identified what they see as a new form of evaluation use, which they call “imposed use” (Weiss et al., 2005). Imposed use occurs, for example, when local schools believe (accurately or not) that they will receive state funding for a particular kind of program only if they adopt a program from the state's approved list. To the extent that imposed use becomes more common, it may change the nature of evaluation's influence, with possible implications for evaluators’ standards of evidence.
Our discussion here of contemporary forces that may affect the future history of evaluation is necessarily brief and selective, although sufficient to illustrate that, for evaluation, “all the world's a stage.”2 It is a stage, however, on which the evaluation community is merely a player, with its exits and entrances, playing many parts. Such forces receive additional attention in many of the chapters in this Handbook; they are the primary focus of Chapter 6 and recur throughout Parts 2 and 3.
Having addressed the definitions, shape, and histories of evaluation in this section, in the next section of this introductory chapter we highlight our perspectives on some of the critical facets of contemporary evaluation theory and practice.
Critical Dimensions and Issues in Evaluation
We have observed that the character and import of evaluation's contributions to social betterment remain necessarily pluralistic, as evaluation has many countenances, multiple vested audiences, and diverse ideologies. Part of this pluralism is indeed ideological, as evaluation has been at the center of a generation of controversy on the meanings of defensible social inquiry. Part of this pluralism is temporal, as evaluation is intrinsically linked to changing societal and international ideals and aspirations. And part of this pluralism is spatial, as evaluation is inherently embedded in its contexts, which themselves vary in multiple ways, both within a given program and more dramatically around the globe.
In this Handbook, we intentionally engage the pluralism of social policy, program, and practice evaluation today. We engage this pluralism by featuring diverse evaluation traditions, methods, and practices throughout the Handbook. We engage this pluralism by respecting and highlighting critical differences, as well as commonalities, among different approaches to and rationales for evaluation. And we engage this pluralism by aspiring to counter ethnocentrism of various kinds. The field of evaluation has often been limited by boundaries of discipline, professional interest and paradigmatic location. National boundaries also have often served to introduce an unnecessary parochialism into the development of the field. We seek to resist such ethnocentric tendencies through the cultivation of a critical (rather than polemical) and open stance. The Handbook contributors stand against naïve pragmatism on such matters and have included in their chapters critical, reflexive assessments of positions with which they themselves are associated.
We further engage the pluralism of the field of evaluation today in this introductory essay, though in doing so we regard “pluralism,” to borrow Karl Popper's term, as a “bucket word.” In its strict philosophical meaning it refers to a system of thought which recognizes more than one ultimate principle, over against philosophical monism. Reflecting back, our mention of strong ways of distinguishing between formal and informal evaluation, and our subsequent discussion of contingency models, can both be read as more strictly pluralist in this sense.
But more commonly, reference to pluralism seems to refer to the distribution of power in (western) society. Lee Cronbach and Ernie House have offered sharp examples of this sense of pluralist. Cronbach's pluralism echoed his conception of the policy context in which evaluation is located. He and his colleagues complained that evaluation theory has “been developed almost wholly around the image of command” and the assumption that managers and policy makers have a firm grip on the control of decision making. However, “most action is determined by a pluralist community not by a lone decision maker” (Cronbach et al., 1980, p. 84). Hence, evaluation enters a context of government which is typically one of accommodation rather than command, and “a theory of evaluation must be as much a theory of political interaction as it is a theory of how knowledge is constructed” (1980, pp. 52–3). Cronbach drew on this form of pluralism to attack goal-setting and accountability models of evaluation, and to portray the role of the evaluator as adopting “a critical, scholarly cast of mind” to serve the cause of relatively piecemeal, multi-partisan advocacy (1980, pp. 67, 157).
Cronbach's position is quite different from a tendency to soft relativism that is present in some forms of the pluralist argument. House, for example, criticizes pluralism on the grounds that in essence it “confuses issues of interests with conflicts of power. It can balance only those interests that are represented - typically those of the powerful” (House, 1991, p. 240). The risk not only of relativism but also of political conservatism may be evident when Lincoln & Guba some time ago concluded that “all ideologies should have an equal chance of expression in the process of negotiating recommendations” (Lincoln & Guba, 1986b, p. 79), and when Guba says “it is the mind that is to be transformed, not the real world” (Guba, 1990, p. 27).
The editors represent different positions in the field and we have crafted this essay to reflect our own differences, as well as commonalities. These are captured in various stances presented throughout the chapter. Yet, our highlighting of the pluralistic character of evaluation today does not signal a purely relativistic stance. We do not believe that any evaluation practice is as good as any other. Further, we all agree that good evaluation practice engages important societal concerns along with the political controversies that usually accompany such concerns, is methodologically and ethically defensible, is responsive and useful to key stakeholders both inside and external to the context being evaluated, and contributes to some form of social betterment. Yet, as the old adage suggests, the devil is in the details. And we do not fully agree on all details, as signaled within this introductory essay.
In the remainder of this section, we address four sets of questions that roughly parallel the four main sections of the Handbook:
- What are the purposes of evaluation today? What are the most important questions that evaluation can address?
- Who conducts and who participates in evaluation, and what is the character of that participation in varied evaluation contexts?
- What methodologies characterize evaluation practice?
- Who are the audiences for evaluation in varied domains of practice? What uses do they make of evaluative work?
We do not attempt to provide definitive answers to these questions. Indeed, given the many faces of evaluation, no single answer could be adequate. Instead, we provide and discuss these questions for three reasons. First, they serve as a kind of “advance organizer.” That is, they provide a framework and a set of issues that readers can apply as they examine and respond to the various chapters in the Handbook. In other words, we think it beneficial if readers, in examining each of the chapters, ask themselves what evaluation purpose or purposes the author is emphasizing, who would participate in the evaluation and how, what methodologies would be employed, and who the intended audience would be. Second, as noted, the questions correspond roughly to the four parts of the Handbook. Thus, addressing the questions provides us with an opportunity to overview briefly the contents of each part. Third, considering this set of questions allows us as editors to highlight various issues. Many of these are issues that have been raised by one or more of the contributors to the Handbook, but that warrant more focused attention.
That said, let us turn to the four sets of questions.
(1) What Are the Purposes of Evaluation Today? What Are the Most Important Questions that Evaluation Can Address?
A striking characteristic of contemporary evaluation theory and practice is the recognition that there are alternative purposes toward which evaluation can be directed. Put differently, evaluators and evaluation can fill various roles. Is the purpose of evaluation to generate information that can contribute to improved decision-making by those in policy positions? Should evaluation instead have as its purpose improvement of everyday program operations and practices? Or is the evaluator's role, not unlike that of the social researcher, to develop and test general knowledge about social problems and their solutions? Should evaluators strive to enhance democratic processes and help achieve democratic ideals? Or is the purpose of evaluation to enhance the practical wisdom with which practitioners engage the choices they face?
The chapters in Part 1 explicitly examine a set of alternative evaluation purposes. The contributors of the first five chapters were briefed to regard the chapters as organized around distinctive intellectual traditions and ways of thinking and knowing within evaluation, with each tradition representing a particular role or purpose for evaluation in society. For each tradition, contributors were asked to consider:
- Philosophy and paradigm. What is the philosophical framework justifying this intellectual tradition?
- Theory. What major evaluation approaches characterize this intellectual tradition?
- Practice. What does evaluation practice within this intellectual tradition look like? Whose interests does it serve? What major questions does it answer?
- Critique. What are important critiques of this tradition in evaluation? What are its particular benefits and limitations? What are important future areas for refinement and development?
- Exemplars. What do exemplars of this approach look like?
The authors were also asked to concentrate on the broad, macro picture of evaluation practice, rather than try to detail all specific methods for that tradition. Of course, the authors of these chapters responded in different ways to the complex request facing them. Thus, the chapters in Part 1 vary in their relative emphasis on the five considerations listed above.
We invited Eleanor Chelimsky to discuss evaluation for decision-making and public accountability. In Chapter 1, she addresses these interrelated evaluation purposes, but casts a broader net by examining the role of evaluation in democracies. In that context, she describes three purposes of evaluation, two of which overlap in part with Chapters 2 and 3. However, Chelimsky's attention to knowledge development and organizational improvement is narrower than in the subsequent chapters. She always frames these other purposes in ways that ultimately serve accountability and decision-making. Regardless, Chelimsky's chapter reminds us that the boundaries between evaluation purposes are not fixed or rigid, and that an evaluator can have an overarching mission and role but still needs to address different evaluation purposes at times.
In Chapter 2, Stewart Donaldson and Mark Lipsey address evaluation that has as its purpose the generation or advancement of knowledge about social problems and their solutions. Although they address other approaches, they focus primarily on the role of theory in efforts to facilitate knowledge development through evaluation. Patricia Rogers and Bob Williams, in Chapter 3, consider practice improvement and organizational learning as closely related purposes of evaluation. After discussing a set of general issues related to these evaluation goals, they describe and critique nine approaches that evaluators have used toward these ends.
Chapter 4 focuses on evaluation as an enterprise concerned with representing the experience of those involved in a program or practice. In that chapter, Thomas Schwandt and Holli Burgon discuss and illustrate approaches to evaluation that attend to lived experience and to engagement in practice. Jennifer Greene, in Chapter 5, discusses evaluation approaches intended to contribute to democracy, not primarily by infusing good information into existing decision-making processes; rather, these approaches take evaluation itself as a way of enhancing and even constituting democratic processes and ideals. Finally, in Chapter 6, Peter Dahler-Larsen begins by describing the current state of evaluation in terms of three characteristics. He then discusses five issues that he suggests may help shape the nature of evaluation in the future.
Although evaluation purposes are most explicitly related to the organization of Part 1, they appear elsewhere in this volume. Notably, the same set of purposes also helps organize some of Part 3. In Chapters 14 and 15, a range of evaluation methods are discussed. The authors of these chapters were asked to focus on methods that can be used in service of two or three of the evaluation purposes addressed in Part 1. Thus, these Part 3 chapters complement the corresponding Part 1 chapters, and offer more detail regarding method. In addition, the chapters in Part 4 include numerous examples of evaluation aimed at alternative purposes, insofar as professions and services are associated with alternative social arguments. The Handbook is in part an exploration of how the range of issues about contexts, methods, and domains of practice in evaluation can be pictured when evaluation purposes are made the gateway.
Forces Influencing Preferences Among Evaluation Purposes
The contents of the Part 1 chapters, along with the rest of the volume, should be helpful to readers as they deal with the question of which evaluation purpose(s) to emphasize, under what conditions. However, because we asked the Part 1 authors to focus on a specific tradition in evaluation, they may not have gone into as much detail as they might have about why alternative evaluation approaches have developed that emphasize different purposes. Nor have they gone into great detail about what an evaluator or evaluation client might want to consider as they choose which particular purpose to emphasize in a given evaluation. Given the diversity of the field, we do not expect that evaluators would always agree about their role or about an evaluation's purpose in a particular case. Perhaps we would agree more about the fit and congruency between purpose and the context, method or domain in a given evaluation.
Presumed Leverage Points. Historically, we believe, much of the divergence across evaluation traditions has been based on judgments, sometimes implicit, about where leverage exists for evaluation to make a contribution. Cronbach and colleagues, in their influential book Toward Reform of Program Evaluation (1980), employed the idea of leverage. “Leverage is the bottom line. Leverage refers to the probability that the information - if believed - will change the course of events” (p. 265). Let us take as examples of different presumptions about where leverage exists the contributions to evaluation of Donald Campbell and Joseph Wholey.
Campbell is best known among evaluators as an advocate of using quasi-experiments and randomized experiments, to assess the effect of a policy or program on various outcomes of interest while attempting to minimize various validity threats. Implicit in most of Campbell's writings on evaluation is the idea that an important leverage point exists when democracies or organizations make choices among program or policy alternatives. For example, schools face decisions about whether to implement a new math curriculum or retain the old one; cities face decisions about whether to ban smoking in public places; states face decisions about whether to adopt new welfare policies; and so on. Campbell's approach to evaluation assumed that such decisions - where there are forks in the road, so to speak - are an important leverage point for evaluative information.
Wholey, in contrast, has emphasized the benefits of providing program managers with information about program performance, to allow better decision-making on an ongoing basis. Wholey observed public sector managers who did not have the benefit of information on how their organizations were doing (versus, say, private sector managers who had access to information on sales, profits, and the like). In other words, Wholey emphasized a different leverage point than Campbell. Wholey focused on the potential of evaluative information, in the form of performance measurement systems, to influence the operations of ongoing programs and agencies.
Leverage can also be conceptualized as contingent upon context (a notion elaborated on subsequently). That is, from this view, determining what is an appropriate purpose for evaluation depends on the context at hand. Leverage, and therefore, choice of evaluation purpose and method, can be affected by considerations such as the character and longevity of the program, the political contentiousness of its ambitions, and the funding dynamics of the program and the evaluation. Highly context-sensitive evaluators may well shift purpose dramatically from one context to the next, as each presents different opportunities for and constraints on leverage.
The role of leverage in identifying an evaluation purpose seems most evident in the first three chapters of Part 1. Campbell's influence is visible in some of the work discussed in Chapter 1, for example, while Wholey's influence is present in the discussion of performance measurement in Chapter 3. In addition, Chapter 2, on knowledge development and theory testing, offers a relatively pragmatic justification for this approach, which could easily be translated into the language of leverage points.
Value Commitments. To make a judgment about where leverage exists for evaluation is to be rather pragmatic. Alternatively, the choice of evaluation purpose or evaluator role may be more heavily influenced by values, specifically values about the relationship evaluation should have in and with the world. It appears that advocates of some evaluation approaches are driven more by such values than they are by assumptions about where an effective leverage point exists. This is not to say that choice of evaluation purpose involves either leverage or values; rather we mean to suggest a difference across evaluation approaches in terms of the relative priority given to values (about evaluation's role in and with the world) and to assumed leverage points.
For example, what have come to be called democratic approaches to evaluation, as discussed in Chapter 5, appear to be driven largely by value positions regarding the role evaluation should have in democracies. Valuing some sort of system change also tends to characterize several of these democratic evaluation approaches more than, say, the approaches of Campbell or Wholey. The furtherance and representation of certain values through evaluation also seems evident in the evaluation approaches discussed in Chapter 4, where some of the approaches to capturing lived experience are based on considerations of social power or transformation.
Of course, considerations other than leverage and values influence judgments about evaluation purpose and evaluator's role. For instance, much has often been made of philosophical differences across evaluators (including alternative assumptions about the nature of the world and about how evaluators and others can generate defensible claims about how things are). Indeed, such differences were often taken as central in the so-called paradigm war, or qualitative-quantitative debate that consumed much of the intellectual capital of evaluators for years (House, 1994).
Undoubtedly, philosophical assumptions and a host of other factors influence preferences for different evaluation purposes. Pluralism about such choices (and even, perhaps, in views about the forces that influence such preferences) is a fact of life in contemporary evaluation. Nevertheless, the importance of presumed leverage points and values, we feel, deserves highlighting.
Contingencies Among Alternative Purposes
Although the organization of Part 1 required authors to focus primarily on one evaluation purpose or another, some of the models that evaluators have developed instead emphasize, to some degree, ways of choosing among alternative evaluation purposes. These “contingency models” differ in important ways, and we cannot fully summarize them all here. Instead, we offer a selective review of three contingency models, which provide very different views of the primary reason for choosing one evaluation purpose versus another in a given evaluation. By way of overview, the three models we mention suggest, respectively, that the choice of an evaluation purpose should be based primarily on the preferences of intended users, or on program stage, or on a kind of policy analytic assessment of the likely contribution of alternative evaluation purposes to social betterment. Familiarity with these three kinds of “drivers” of contingent decision-making may be more advantageous than strict adherence to one model or another. Again, the three contingency models focus on different factors that may help guide choices about which evaluation purpose to emphasize in a given situation.
Michael Patton (1997) has long been a visible advocate of what he calls utilization-focused evaluation. According to Patton, “the focus in utilization-focused evaluation is on intended use by intended users” (p. 20). In other words, any purpose of evaluation, any evaluator role, is appropriate - if it satisfies the intended users’ intended use of the evaluation (and if it is consistent with personal morality and professional standards; Patton, 1997, p. 364).
In contrast, several evaluation models have focused on how the stage of a program should influence the purpose of evaluation (see Cronbach et al., 1980). Chen (2005) has recently articulated a relatively thorough stage model. According to Chen, “Stakeholders’ evaluation needs vary across the stages of program growth…. Evaluators can best understand stakeholders’ evaluation needs if the evaluators are provided with information on the stage(s) the stakeholders are interested in evaluating” (p. 49). (Chen also provides a useful discussion of the trade-offs for evaluation design that can occur across timeliness, rigor, thoroughness, and cost.) A common, simple version of a stage model suggests that evaluation should emphasize program improvement and organizational learning early in a program's life, and shift toward a more summative, accountability focus later on.
Yet a third kind of contingency model exists in the work of Mark, Henry, & Julnes (2000), who suggest that evaluators (with stakeholders) should scan the policy environment, and assess the likely contribution of different evaluation purposes to social betterment. Mark and colleagues suggest, in essence, an analysis of the extent to which leverage exists for each of the alternative evaluation purposes that might be considered.
In practice there often may be more overlap across these three approaches to contingent selection of evaluation purpose than the simplified descriptions here would suggest. More importantly, we suggest that these should be seen not as competitors, but as complementary ways of thinking about choices among evaluation purposes. Each may be more useful in some situations than others, and there is considerable value in evaluators being “multilingual.” In addition, all three of these contingency models have limitations, though we will not attempt to list them all here. But there is a shared limitation that should be acknowledged: almost by definition, none of the three deals with value-driven choices of evaluation purpose to the degree represented, say, in Chapters 4 and 5.
Responding to the Diversity Among Evaluation Approaches
There are various ways that evaluators respond to the multiplicity of evaluation purposes, approaches, and roles. Some evaluators endorse (and advocate) a given approach to evaluation. In essence, they argue for a “winner” among the diversity of evaluation approaches. Others suggest we embrace diversity within the field of evaluation as but one characteristic of the essential uncertainty of postmodernity. Still others suggest that flexibility may not be adequate, that something is needed to guide efforts to choose which approaches might best apply under different circumstances, and thus prefer integrative models such as the contingency models just described.
Donaldson and Lipsey, in Chapter 2, have captured this state of affairs nicely, as they discuss a symposium (Donaldson & Scriven, 2003) in which a set of prominent evaluation theorists described their vision for the future of evaluation, along with a set of responses:
[One] major theme that emerged from this discourse on future evaluation practice was the challenge posed by the vast diversity represented in modern evaluation theories. Mark (2003) observed that each vision for the future of evaluation gave a central place to one theory of evaluation practice and left scant room for others. One way out of the continuous polarization of perspectives that results from these diverse theoretical allegiances is the development of higher order frameworks that integrate evaluation theories (Mark, 2003). Another option presented for achieving a peaceful and productive future for evaluation involved a strategy for embracing diversity in evaluation theory, recognizing that fundamental differences cannot be reconciled or integrated, and that clients may benefit most from having a diverse smorgasbord of options to choose from for evaluating their programs and policies.
But how well can clients choose from a “smorgasbord of options”? How effectively can evaluators truly embrace diversity of perspectives, without giving way to skirmishes or outright resumption of updated versions of the paradigm wars? We may not completely agree about the answers to these questions, but the reality is that there is likely to continue to be a multiplicity of ways that evaluators react to the diversity in our field. And in our view, tolerance and acceptance - yet with deliberative dialog among our different ideas - will do more to further the visibility and potential reach of evaluation than skirmishes and debates about winners and losers.
Research on Evaluation. Evaluations can contribute to democratic governance, to social betterment, to organizational learning, and more. But how do we know which types of evaluation are more likely to make what kinds of contributions? The idea that it would be helpful to increase the evidence base about evaluation practices and their consequences is not new. For example, Shadish et al. (1991) state that “In the long run, … evaluation will be better served by increasing the more systematic empirical evaluation of its theories, by treating evaluation theory like any other scientific theory, subjecting its problems and hypotheses to the same wide scientific scrutiny to which any theory is subjected. We do not lack for hypotheses worth studying” (p. 483). More recently, Marvin Alkin and Christina Christie have argued in several publications for empirical assessments of the theoretical claims of evaluators. With such empirical data, the evaluation community could construct a “descriptive theory” of evaluation, that is, an empirically based account of what evaluation looks like under different conditions and what kinds of consequences result from various approaches to evaluation (e.g., Alkin, 2003).3 Viewed from this perspective, a growing evidence base about evaluation could help answer questions such as: Which approaches to evaluation, implemented how, and under what conditions, lead to what kind of improvements in policy deliberations, program operations, or client outcomes?
In addition to providing better advice for evaluation practitioners (Shadish et al., 1991), a larger evidence base about evaluation might have several other benefits (Mark, in press). For instance, increasing the evidence base of evaluation might:
- improve the terms of debate among evaluators, by helping to substitute some degree of empirical evidence for rhetorical style;
- allow us to document and understand evaluation's current and past contributions;
- facilitate appropriate claims about what evaluation can do, perhaps most often by moving evaluators in the direction of modesty (Weiss, 1991);4
- stimulate efforts to improve evaluation practice, in part by identifying circumstances in which evaluation demonstrably fails to meet its promise;
- increase a sense of professionalism among evaluators, by making it clear that evaluation itself is worthy of systematic study; and
- help move the field past generic and relatively abstract standards and guiding principles, to more empirically supported guidance about the relative benefits of different evaluation practices.
Research on evaluation will not be a magic bullet. It will not immediately transform the field. It will not replace all judgments about wise evaluation practice - but instead can aid such judgment to a significant extent. Research on evaluation, like evaluation itself, will raise questions about generalizability, contextuality, and applicability to specific situations. Research on evaluation will at least some of the time be ignored - even in cases where it could be useful. In short, the various problems and limits we know about regarding evaluation (and research) itself and its use will also arise in the context of research on evaluation.
(2) Who Conducts and Who Participates in Evaluation, and What Is the Character of that Participation in Varied Evaluation Contexts?
Evaluation is a social practice. This notion has shaped the Handbook to the considerable extent that Part 2 brings together the social, relational, political, and ethical dimensions of evaluation as a practice engaged with people and their programs, policies, and own practices. Indeed, one early anonymous reviewer of the Handbook proposal took the view in this regard that we had majored on participatory approaches at all costs! Just how we interpret the scope, importance, and indeed meaning of the social character of evaluation is itself one of the centrally recurring conversations within the evaluation community. But social, intellectual, political, and ethical matters, and issues of use, voice, and audience, will be inescapable, whether we align ourselves with a strict accountability model of evaluation or with a dialogical and hermeneutical stance.
In Chapter 7 we asked Phil Davies, Kathryn Newcomer, and Haluk Soydan to focus on evaluation and government. These authors examine why governments fund evaluations and how they use them, and also examine alternative structural arrangements and processes through which evaluation is carried out. Other structural contexts are dealt with fully elsewhere in the Handbook, for example, in Chapter 16 and throughout Part 4. In Chapter 8, Tineke Abma explores the relational dimensions of evaluation as a social practice. She probes the ways in which internal stakeholder relationships and stakeholder-evaluator relationships matter in different approaches to evaluation practice and for different aspects of our work, most notably, the character of our knowledge claims.
John Stevenson and David Thomas paint a large canvas for their comprehensive review of the intellectual contexts of evaluation in Chapter 9. We asked these writers to set evaluation in its intellectual contexts and discuss what the evaluator brings to those contexts. Various disciplinary frames impinge on evaluation, including sociology, social policy, psychology, economics, education, and political science. The editors and writers recognize that the intellectual contexts are wide, and that they include contexts that are not narrowly disciplinary in nature (e.g., evaluation draws upon intuition and common sense, along with shared experience, education, history, philosophy, and, of course, literary and arts criticism). We have dealt in part with this wider intellectual context in this editorial essay.
In Chapter 10, Ove Karlsson Vestman and Ross Conner analyze the nature and role of politics and related values, both as critical contextual features of evaluation practice and as intertwined with epistemological claims of knowledge. They examine varying conceptions of politics, and describe and critique alternative models which they believe characterize the differing views evaluators have about the relationship between evaluation and politics. This chapter links closely with Chapter 5 in its relevance to questions of social justice as a purpose of evaluation. In Chapter 11, Helen Simons brings her experience and interest to bear on the professional, personal, and public dimensions of ethics in evaluation practice. This discussion traces the development of ethical standards and guidelines in evaluation communities; describes and critiques the different purposes, theoretical links and political stances associated with ethical guidelines in evaluation; and offers an elaborated discussion of ethics as inscribed in evaluation practice. In Chapter 12, Brad Cousins and Lyn Shulha review work on evaluation utilization as compared to its “cognate fields,” particularly of research and knowledge utilization. They aspire in this chapter to “situate current trends in the study of evaluation use within the broader intellectual landscape of research and knowledge utilization.” And they focus this comparative discussion around six themes common to various domains of inquiry on utilization: (1) epistemological challenges, (2) emphasis on context for use, (3) focus on user characteristics, (4) linking knowledge production and use, (5) utilization as process, and (6) methodological challenges. This chapter is thus related to the discussion on evaluation and utilization in Chapter 3, but offers a very different lens on the issues.
Elliot Stern, in writing the forward-gazing chapter for Part 2 (Chapter 13), describes and reflects on the complex contextual landscape of evaluation as a profession, occupation, community, and discipline. He touches on discussions of standards, the role of journals, and professional associations in the field. He also addresses challenges of teaching and learning for evaluation, demands for evaluation competencies, and challenges of building evaluation capacity. More interestingly still, he writes in a way that opens up the sense of “context,” both portraying the texture of context and crossing local frontiers to explore a globalizing world.
Part 2 poses some big and complex issues, issues that are embedded in the very fabric of evaluation as a human endeavor. These issues are invoked in significant part by the topics that constitute this section - evaluation's structural contexts, social relationships, disciplinary legacies, politics and values, ethical considerations, and utilization. But the issues do not reside within these topics; rather they permeate all other aspects of evaluative thinking and evaluation practice. These issues significantly include the plurality and diversity of contemporary evaluation theory and practice and the challenges this diversity presents to considerations of what constitutes “good” evaluation practice. Related are the challenges of forging respectful and constructive engagement with the differences of intellectual persuasion and political commitment that sometimes still polarize the evaluation community, and then signaling this respectful engagement to our clients. And there are the continuing challenges of utilization and dissemination of what is learned from evaluation studies. Paralleling the integrated character of these issues in evaluation practice, our own engagement with these issues is not separated out in this Part 2 discussion but rather permeates this introductory essay. In this way we seek to mirror our view that the social dimensions of evaluation powerfully influence its character and its potential reach.
(3) What Methodologies Characterize Evaluation?
As a member of the social science family, evaluation integrally involves the systematic and defensible use of methods for gathering or generating, analyzing and interpreting, and representing - usually in written and oral form - information about the quality of the program, practice or policy being evaluated. In the United States and other western countries, in fact, early evaluators were disciplinary social scientists (sociologists, economists, educators, psychologists) who endeavored to apply their methodological expertise to real world, social policy settings. And while the challenges of this form of applied social science were substantially greater than anticipated (Cook, 1985), evaluation remains a method-driven field, especially in its public perception and in the formal training many evaluators receive.
Part 3 of the Handbook engages discussions about the methods that evaluators use in their work. However, the Handbook as a whole represents our view that methodology is but one facet of the practice of evaluation. This view is evident, not only in Part 3, but also in the inclusion of Part 2 of the Handbook - which highlights the complex contexts and contours of evaluation that both shape its character as a social practice and are, in turn, reshaped by the particular conduct of an evaluation study. As just described, other facets highlighted in Part 2 include the socio-political and economic characteristics of the context in which evaluation is conducted, the character of the interactions and relationships that take place among stakeholders and evaluators, the intellectual traditions that inform an evaluation study, the politics and values embedded in a given evaluation approach, as well as the ethical considerations and the varied commitments to utilization that importantly guide evaluation practice.
Part 3 was envisioned as the Handbook location for discussions of the actual practice of evaluation, or “how to” do evaluation and why. Again, given our conceptualization of evaluation practice as more than methodology, the six chapters in Part 3 encompass practical issues related to evaluation management, communication, and quality in addition to issues of method within various evaluation traditions. Specifically, the Part 3 chapter authors were guided by the following suggested themes:
- How evaluation practice is shaped by evaluative purpose.
- The role of the evaluator.
- Important practice-ethics decisions in the field.
- How practical exemplars demonstrate evaluation practice in the field.
- Reasoning about evaluation method, or where methods decisions are located in the evaluation process and what influences methods decisions.
As with every part, chapter authors in Part 3 differentially responded to this guidance, resulting in a diverse set of discussions on evaluation practice.
As noted above, the first two chapters in Part 3 explicitly connect evaluation methodology with evaluation purpose. In Chapter 14, Melvin Mark and Gary Henry present contemporary methodological approaches to evaluations conducted for purposes of decision-making, accountability, and knowledge generation (corresponding to Chapters 1 and 2 in Part 1). In Chapter 15, Elizabeth Whitmore, Irene Guijt, Donna Mertens, Pamela Imm, Matthew Chinman, and Abraham Wandersman present methodologies for evaluations conducted for purposes of improving practice, understanding lived experience, and enhancing democratization or social justice (corresponding to Chapters 3, 4, and 5 in Part 1). To fulfill such an ambitious agenda, the authors of Chapter 15 offer three different case examples of evaluation as well as an analysis of the themes and commitments common to all. Then in Chapter 16, Robert Walker and Michael Wiseman offer perspectives on and examples of how to manage evaluation, especially within complex policy contexts. Interestingly, these authors echo the relational themes of Part 2, especially the concentrated focus on evaluative relationships presented in Chapter 8. Chapter 17, by Marvin Alkin, Christina Christie, and Mike Rose, surveys the considerable evaluation literature on communications in evaluation. Among the topics engaged in this chapter are when and how to communicate with stakeholders during the course of an evaluation and how to structure and write an evaluation report for maximum impact. Chapter 18, by Robert Stake and Thomas Schwandt, addresses matters of the recognition and representation of quality in evaluation - the quality of that which is being evaluated and the quality of evaluation itself. As evaluation is at root an enterprise dedicated to discerning quality, this chapter is an integral part of a conceptualization of evaluation practice. Finally, in Chapter 19, Lois-ellin Datta offers wise and thoughtful reflections on evaluation practice in the three-part form of challenges met, unfinished business, and new challenges ahead.
The thoughtful discussions in the Part 3 chapters offer a rich and multihued portrait of evaluation practice. In this introductory essay, we briefly take up three issues commonplace in discussions of methodology: (1) the intellectual traditions that have informed and shaped evaluation methodologies; (2) what, if anything, is distinctive about the methodologies and methods evaluators employ; and (3) implications for evaluation training.
The Intellectual Traditions that Have Informed Evaluation Methodologies
Evaluation is generally understood, both within and outside the evaluation community, as an “applied” social science. The intellectual traditions that have informed evaluation methodologies are thus (a) the disciplinary perspectives of sociology, economics, and psychology and also, more recently, anthropology, women's studies, and cultural studies, and (b) the accompanying perspectives of various philosophies of science, often referred to as scientific paradigms.5 While our disciplinary legacies are more implicitly embedded in our practice than is usually acknowledged or discussed (although see Chapter 9), our varied philosophical or paradigmatic allegiances have been at the center of considerable controversy and debate throughout the last quarter of the twentieth century.
During the 1970s and 1980s in particular, evaluators were engaged in a broad debate that permeated all corners of the social scientific community regarding the primary rationales, roles, and character of social science in society. Most popularly dubbed the “quantitative-qualitative debate,” this controversy centered around the legitimacy and relative superiority of “quantitative” versus “qualitative” methodologies for evaluation. Prior to this debate, most evaluation practice had followed the standard social science conventions of the era, which were largely “quantitative.” Evaluation practice at that time, like most social research, focused largely on assessing the strength of causal relationships (in evaluation, between treatments and outcomes) via experimental or quasi-experimental designs and standardized measurements. The debate surrounding social science methodology first arose within the philosophy of science, where prior commitments to extreme versions of objectivity and remaining notions of methodological infallibility were seriously challenged, and where convictions about the relevance and applicability of objectivist and realist frameworks (or paradigms) for the study of human phenomena had gradually eroded during the twentieth century (Schwandt, 2000c). In evaluation, the debate also arose because the initial use of conventional methodologies had demonstrated mixed success, at best, in assessing the quality and effectiveness of public policies and programs. The door was thus opened to alternative ways of understanding human phenomena and their accompanying alternative paradigms.
Thus entered a variety of “qualitative” paradigms and methodologies, collectively under the banner of the “interpretive turn” in social science.6 “Qualitative” paradigms emphasize the interpreted and constructed (often versus real) nature of the social world, and thus emphasize the value-laden and contextual nature of social knowledge (versus quantitative approaches’ emphasis more on the objective and generalizable). Qualitative methodologies generally focus on the meaningfulness of human activity, while quantitative methods usually address its frequency or magnitude (such as estimates of the size of a program's effects). Qualitative methods represent human phenomena in words or images or symbols, rather than in numbers and ordered dimensions.7
As revealed by its moniker, this “quantitative-qualitative” debate was centered on social-scientific inquiry designs and methods, but it was especially charged because it also involved politics and values and thereby fundamental definitions and understandings of the role of (social science and) evaluation in society. That is, “quantitative” and “qualitative” philosophical traditions generally take different stances on the place of values in social inquiry. In “quantitative” traditions, a long-standing view was that values can be empirically studied but are otherwise outside the boundaries of science and more properly in the domains of ethics, morality, religion, and politics: Scientists provide the facts; politicians and priests debate their significance. In contrast, in “qualitative” traditions, values have always been seen as inherent in all human action and all knowledge generated about that action. Values inevitably and for many properly refract the lens of the scientist and that which he or she is studying. There is no value-free knowledge nor any value-free methodology, argue proponents of “qualitative” traditions (a position with which many if not most contemporary quantitatively oriented evaluators agree, though often to somewhat different effect). And because the debate was as much about underlying values and commitments as about the right methodology, the stakes of this debate were high and passions were engaged as often as reasoned argument.8
The debate was most intense in evaluation during the 1980s, followed by a period of rapprochement in the 1990s, signaling an acceptance of the legitimacy of multiple methodological traditions in the evaluation community with attendant turns to multiplism and to mixed methodological thinking (Greene, Caracelli, & Graham, 1989). Yet, controversy about what constitutes legitimate or “good” practice still persists in many domains of social science, including evaluation, as underscored by the pluralist stance taken in this Handbook.
We endorse the rapprochement that characterized the end of the “quantitative-qualitative debate” and support a vibrant evaluation community filled with different ways of knowing and different ways of practicing our craft. We believe that this debate, in the end, was educative in important ways, as evaluators of all stripes had to engage seriously with their own assumptions, stances, and beliefs, and had to learn how to assume responsibility for making thoughtful and informed choices among the multiplicity of approaches to evaluation that emerged during this era. That is, this was a richly generative era in evaluation that yielded multiple, diverse evaluation theories and practices - a diversity which remains today.
Amidst this diversity, as stated above, we also believe that continued dialog and even debate on our differences - and on our commonalities - can and should help to quiet worries about a wildly relativistic community of practice. In the tradition of the “quantitative-qualitative debate,” but with greater civility and acceptance of others’ viewpoints, we encourage continued conversation about just what constitutes good evaluation practice.
What is Distinctive About Evaluation?
The question of what is distinctive to evaluation may refer either to methods of inquiry or to the fundamental nature of evaluation. As for the first, there are a few distinctive evaluation methods, strategies, and techniques, examples of which are sprinkled throughout this volume, including the “Most Significant Change” approach featured in Chapter 15. Other “methods” distinctive to evaluation9 include the multi-attribute utility technique, log frames and logic models (e.g., United Way of America, 1996), values inquiry (Mark et al., 2000), and Scriven's (1991) version of the weight-and-sum technique. Mostly, however, evaluators make liberal use of methodologies and methods from other disciplines and traditions in the social sciences. These include large-scale statistical modeling methods like hierarchical linear modeling and on-site, up-close methods like participant observation and individual interviewing.
What is perhaps distinctive to evaluation practice is that we use methodologies and methods from multiple and diverse social sciences and that we unhesitatingly use them side-by-side, or in the best of mixed method evaluation practice, actually attain some kind of integration of originally quite different ways of making sense of human phenomena. In general, that is, the community of evaluators welcomes multiple ways of understanding the very complex phenomena we study. Our practice is thus quite multidisciplinary in character. This is especially so because the nature of our practice includes aspects of management, communication, and utilization, which draw on fields of research and practice outside the social science core.
What is perhaps also somewhat distinctive to evaluation practice is that the requirements of methods do not reign supreme but rather must be negotiated with other important features of our work. Notably, evaluation is fundamentally judgmental; we are charged with discerning and assessing the quality of the program, policy, or practice we are evaluating. Judgments of quality rest on particular conceptualizations of “goodness,” “effectiveness,” or “success,” which themselves are often contested. Further, evaluation takes place in politicized contexts with multiple legitimate stakeholders and interests. Good evaluators skillfully honor and respectfully help to negotiate these multiple interests. And also, evaluation is fundamentally relational, anchored in the kinds of relationships established among stakeholders and with the evaluator in particular contexts. These characteristics of our practice place particular burdens on our methods. Our methods inevitably bump into the various commitments to quality, interests and values, and dynamic and complex relationships that inhabit a given evaluation context, and methods usually must accommodate these values, interests and relationships in service of the larger evaluative mission or purpose. Good evaluation practice thus involves a delicate and negotiated balance among the contours of a particular context, the public interests at hand, and the methodological requirements for rigor and defensibility.
When we explore the distinctive character of evaluation, it is important to recognize that we are compressing three different questions.
- First, can evaluation be distinctive?
- Second, is evaluation distinctive?
- Third, should evaluation be different from or similar to research?
The first question is theoretical, the second empirical, and the third normative. We will end up in murky waters if we treat the question as if it were empirical when in fact it is usually normative or theoretical.
Some have concluded that research, evaluation, and policy analysis are different forms of disciplined inquiry. Hence, Lincoln (1990, p. 76) states that it makes “no sense to refer to ‘evaluation research', save as research on evaluation methods or models.” We acknowledge that, on average, there are important differences between evaluation and research, which follow from the differing clusters of purposes that direct each form of inquiry. Nevertheless, making over-simple generalizations about homogeneous entities of “research” and “evaluation” risks contributing to rhetoric rather than good practice. Thus, in the UK or the Nordic countries, for example, to talk of “evaluation research” or “policy research” may make good sense, and involve no confusion of categories, whereas for many - but certainly not all - in the USA it may be treated as involving some confusion of terms.
Implications for Evaluator Training
Many educational and training programs in evaluation include coursework or readings in evaluation theory, evaluation practice, and social science methods, both qualitative and quantitative. Training programs in the USA often also include an evaluation practicum, in which novice evaluators get invaluable experience in the field. There are, of course, many people who practice evaluation without any particular form of professional training. But the important question here is: what should be included in such training for those who choose to participate?
A thorough answer to this question is well beyond the scope of this introductory essay. We wish to suggest, however, that, even in a relatively developed circumstance such as the USA, there are important gaps in current training programs for evaluators - gaps in such areas as management, communication, and ethics, as well as gaps in processes like negotiation, dialog, deliberation, and conflict resolution. We all recognize the contested character of the contexts in which we work. However, as evaluators we have yet to incorporate this recognition into our programs for training evaluators of the future. Finally, we also wish to encourage readers of this Handbook to consider issues of evaluator training as they pursue the chapters that follow. What implications do these various discussions have for how we should train new entrants into our field?
(4) Who Are the Audiences for Evaluation in Varied Domains of Practice? What Uses Do They Make of Evaluative Work?
In fundamental ways, the audiences of evaluation, or the stakeholder groups and individuals for whom the evaluation is being conducted, are connected to evaluation purposes. Generally speaking, evaluations intended to provide accountability or to support decision-making serve the interests of stakeholders responsible for policy and program decisions, including legislators and other policy makers. Evaluations designed to generate knowledge about the phenomena being studied speak to the interests of program developers and theoreticians with expertise in that area. Evaluations designed for program improvement or organizational learning serve the interests of stakeholders responsible for administering the program or managing the organization. Evaluations that seek critical insight into the professional practices of program staff or in-depth understanding of the lived experiences of participants likely engage staff and participant interests and concerns. And evaluations with an ideological agenda, like democratization or empowerment, are intended to address most directly the interests of those at the margins of society. These purpose-audience connections reflect, in part, different perspectives on critical leverage points and different value commitments, as discussed above.
The concept of evaluation audience is further connected to issues of utilization and dissemination of evaluation findings, which likely vary across domains of practice. For whom is evaluation conducted in various domains of practice? How does audience affect the conceptualization and implementation of evaluation in various domains of human professional practice?
Part 4 of the Handbook explores the conceptual, methodological, and practical issues of evaluation in different domains of professional and occupational practice, including attention to issues of audience and utilization. Contributors to this section were asked to include in their discussion:
- The history of evaluation in their field.
- Evaluation purposes, social and disciplinary contexts, and practice issues that have most powerfully influenced evaluation in their field.
- The descriptive character of evaluation in their field, and how evaluation has helped to shape key discussions in the field.
- Likely future developments in evaluation for their field.
We noted at the outset that this part is where the risk of ethnocentrism is at its greatest. It is for readers to judge how well we have avoided the danger. Representing a relatively comprehensive though not exhaustive sample of important human endeavors, Part 4 includes six chapters. David Nevo discusses evaluation in the field of education (Chapter 20), Andrew Long in the field of health (Chapter 21), Ian Shaw for human services and social work (Chapter 22), Nick Tilley and Alan Clarke for criminal justice (Chapter 23), and Osvaldo Feinstein and Tony Beck for the combined fields of humanitarian relief and international development (Chapter 24). Finally, in Chapter 25, Alan Clarke, in closing the Handbook, offers his perspectives on challenges and new directions for evaluation in occupational domains, including thoughts about interprofessional evaluation practice.
Part 4 is important for several reasons. First, more than elsewhere, the chapters flesh out the recurring motif of the Handbook, of “policies, programs, and practice.” It is in these domains of evaluation practice that we see the enactment of legislative and political statements of governmental or organizational intent, and the evaluation of policies as setting the direction for how resources in a given domain will be allocated and what substantive foci will receive priority. For each domain we witness the evaluation of programs as particular enactments of a policy, which offer concrete representations of how a given policy can be realized, in terms of particular activities and materials provided to a particular target population. And we see diverse instances of practices - the specific professional interactions that take place within programs.
Second, and perhaps serendipitously as we did not plan it thus, Part 4 contains the Handbook's most sustained discussions and reflections on the place of evidence-based drivers in the evaluation of health, human services, education, and the like. Alan Clarke knits much of this together in his closing essay.
Third, and following from the first point, we witness the diversity and commonalities of contextual issues, which embody the important issues addressed in Part 2. In Part 4 we witness structural contexts, social relations, intellectual contexts, politics and values, evaluation ethics, and utilization practices working out in national, state or local domains. This introductory essay now briefly takes up the final issue of utilization and dissemination as relevant to evaluation audience. We again hope that these prefatory comments will help sensitize Handbook readers and encourage their own reflections on these issues as they read the chapters in Part 4.
Connections to Utilization
As is well discussed in this Handbook (see Chapters 3, 12, and 17), evaluators are deeply concerned about being useful. “To whom and for what purposes?” are the questions that immediately come to mind. Audiences are multiple. Uses characteristically take instrumental, conceptual, or symbolic form. Instrumental uses involve direct, visible contributions to decisions about the evaluated program, policy, or practice. Conceptual uses take a more educational form, contributing, for example, to the direction of a policy conversation or to enlightenment about heretofore unacknowledged dimensions relevant to that conversation (Weiss, 1998). And symbolic uses are more political, for example, using evaluation primarily to signal attention to a particular program area.10
Some evaluators emphasize the need to plan for a particular form of use, by specific, identifiable users (e.g., Patton, 1997). Others believe that, in most cases, conceptual use is more likely to occur (e.g., Weiss, 1991). It may well be that when situated in and viewed from within particular domains of professional practice, evaluation uses become primarily conceptual and users or audiences become theorists, program developers, and other content experts in that field. Evaluations in the domain of health care, for example, may importantly inform other health care researchers and theorists about the specific ways in which and extent to which the intervention studied influenced the health outcomes of interest.
To the extent this is true, what distinguishes evaluation from other forms of social research? And does this matter? Moreover, what does this knowledge-oriented, conceptualization of use and audience suggest about the importance of generalizability in evaluation practice? Many evaluation studies are conducted with explicit, context-bound purposes and identifiable, local audiences in mind. A perspective that privileges conceptual and knowledge-oriented use and audiences may exert unwarranted pressure on evaluation studies for results that are applicable to other contexts. Does this reduce the likelihood of important local uses? Or does it increase the likelihood that evaluation is influential and contributes to social betterment?
We raise these issues, once again, not to provide answers, but instead to sensitize the reader to issues of audience and use - and their implications for evaluation design and methods - as they read the chapters about evaluation in professional practice in Part 4.
Connections to Dissemination
In recent years, many evaluators have expanded their efforts to disseminate findings. In the USA it is increasingly common for large evaluation firms to have specialists in communication, for instance. More generally, the interest in kinds of knowledge used by policymakers and professionals (an issue touched upon in our opening discussion of evaluation as a human activity) stimulates and enriches the debate on how knowledge is utilized.
The kinds of dissemination efforts we have been seeing more often in practice include: multiple forms of reports, typically with at least a more user-friendly and a more technical version, as well as versions for peer-reviewed journals and perhaps releases for practitioner newsletters; press briefings, as well as briefings for identifiable decision-makers and major stakeholder groups; and websites on the evaluation and its results. In addition, some evaluators are active in the “issues networks” that exist around various policy issues. One rationale for these extended dissemination efforts is that, given the costs of evaluations (including not only financial costs, but also the time and other burdens borne by study participants), those who conduct social experiments should try to facilitate use, at the very least through extensive dissemination (cf. Rosenthal, 1994).
Concern about effective dissemination is, of course, not new. Cronbach and his colleagues some time ago urged the importance of tellable stories. The evaluator faces a mild paradox. “All research strives to reduce reality to a tellable story,” but “thorough study of a social problem makes it seem more complicated” (Cronbach et al., 1980, p. 184). Cronbach and his associates’ resolution of this paradox lay in urging that evaluators should seek constant opportunity to communicate with the policy-shaping community throughout the evaluation. They believed that “much of the most significant communication is informal, not all of it is deliberate, and some of the largest effects are indirect” (p. 174). Their recommendations - unnervingly contemporary - were:
- Be around.
- Talk briefly and often.
- Tell stories. Always be prepared with a stock of anecdotes regarding the evaluation.
- Talk to the manager's sources.
- Use multiple models of presentation.
- Provide publicly defensible justifications for any recommended program changes. These will be very different from scientific arguments.
Cronbach was strongly opposed to holding on until all the data are in and conclusions are firm. Influence and precision will be in constant tension, and Cronbach held that if in doubt we should always go for influence. Live, informal, quick overviews, responsiveness to questions, the use of film and sound clips, and personal appearances are the stuff of influence. On this view of things, the final report thus acts as an archival document.
The impotence that comes with delay … can be a greater cost than the consequences of misjudgment. The political process is accustomed to vigorous advocacy … (and) is not going to be swept off its feet by an ill-considered assertion even from an evaluator. (Cronbach et al., 1980, pp. 179–80)
Of course, as with other concerns related to evaluation, we recognize that a diversity of views exists about dissemination. Once again, we invite the reader to consider this issue when engaging with the chapters to follow.
Conclusion
We began this chapter with the idea that evaluation is a natural human activity. Throughout this chapter we have also noted many of the challenges and complexities that arise in professional efforts to evaluate people-related programs, policies, and practices. Among these complexities is the reality that contemporary evaluation has many faces, histories, approaches and views on such issues as evaluation's purpose, the role of values, the most likely leverage points for evaluation influence, and the preferred form of use.
These multiplicities can pose challenges, for example, by overwhelming some evaluators’ capacity to choose thoughtfully. Alternatively, these multiplicities can create opportunities, for example, by providing a rich array of options from which to choose, and by contributing to ongoing dialog and exchanges among evaluators that lead to even better approaches to practice. We hope this Handbook might contribute to such exchanges.
Notes
1. Perhaps because of these multiple and relatively early developments in the United States, and perhaps because of the relatively earlier creation of professional evaluation associations in the United States, American writers and practitioners for some time have had a disproportionate influence in the field.
2. Shakespeare, As You Like It, Act 2, Scene VII, lines 134–142.
3. Similar arguments for an empirical approach to issues like evaluators’ epistemology and evaluation utilization have been advanced in the human services field by Kirk & Reid (2002).
4. For example, the word “transformative” has been bandied about by many evaluators in recent years, and it would be good either to have evidence that evaluation is indeed transformative or instead to embrace the value of more incremental consequences of evaluation.
5. Paradigms are constellations of interrelated assumptions about the nature of the social world, the nature of the knowledge we can have about that social world, what's most important to know, and how best to attain or generate this knowledge.
6. There are numerous accounts of this shift, perhaps most familiar in Norman Denzin's various incarnations of the historical “moments” of qualitative inquiry (Denzin & Lincoln, 2004). Clifford & Marcus's book is often taken as the fulcrum of this shift (Clifford & Marcus, 1986).
7. We hasten to add that, in capturing the differences between qualitative and quantitative, it is possible to exaggerate differences, easy to ignore moderate voices from the center, and difficult to avoid language that may strike someone as favoring one “side” or the other.
8. See Cook & Reichardt (19 79) and Reichardt & Rallis (1994) for additional discussions of the “qualitative-quantitative” debate in the evaluation community.
9. “Distinctive” in this context may mean methods stemming from and unique to the practice of evaluation, or, less strongly (and perhaps more likely), methods that the evaluation field has distinctively contributed to the wider family of social science methods.
10. Again, as noted above, uses may also now take an “imposed” form (Weiss et al., 2005).