National Research Council (US) Panel on the Evaluation of AIDS Interventions; Coyle SL, Boruch RF, Turner CF, editors. Evaluating AIDS Prevention Programs: Expanded Edition. Washington (DC): National Academies Press (US); 1991.

1 Design and Implementation of Evaluation Research

Evaluation has its roots in the social, behavioral, and statistical sciences, and it relies on their principles and methodologies of research, including experimental design, measurement, statistical tests, and direct observation. What distinguishes evaluation research from other social science is that its subjects are ongoing social action programs that are intended to produce individual or collective change. This setting usually engenders a great need for cooperation between those who conduct the program and those who evaluate it. This need for cooperation can be particularly acute in the case of AIDS prevention programs because those programs have been developed rapidly to meet the urgent demands of a changing and deadly epidemic.

Although the characteristics of AIDS intervention programs place some unique demands on evaluation, the techniques for conducting good program evaluation do not need to be invented. Two decades of evaluation research have provided a basic conceptual framework for undertaking such efforts (see, e.g., Campbell and Stanley [1966] and Cook and Campbell [1979] for discussions of outcome evaluation; see Weiss [1972] and Rossi and Freeman [1982] for process and outcome evaluations); in addition, similar programs, such as the antismoking campaigns, have been subject to evaluation, and they offer examples of the problems that have been encountered.

In this chapter the panel provides an overview of the terminology, types, designs, and management of evaluation research. The following chapter provides an overview of program objectives and the selection and measurement of appropriate outcome variables for judging the effectiveness of AIDS intervention programs. These issues are discussed in detail in the subsequent, program-specific Chapters 3-5.

Types of Evaluation

The term evaluation implies a variety of different things to different people. The recent report of the Committee on AIDS Research and the Behavioral, Social, and Statistical Sciences defines the area through a series of questions (Turner, Miller, and Moses, 1989:317-318):

Evaluation is a systematic process that produces a trustworthy account of what was attempted and why; through the examination of results—the outcomes of intervention programs—it answers the questions, "What was done?" "To whom, and how?" and "What outcomes were observed?" Well-designed evaluation permits us to draw inferences from the data and addresses the difficult question: "What do the outcomes mean?"

These questions differ in how difficult they are to answer. An evaluation that tries to determine the outcomes of an intervention and what those outcomes mean is a more complicated endeavor than an evaluation that assesses the process by which the intervention was delivered. Both kinds of evaluation are necessary because they are intimately connected: to establish a project's success, an evaluator must first ask whether the project was implemented as planned and then whether its objective was achieved. Questions about a project's implementation usually fall under the rubric of process evaluation. If the investigation involves rapid feedback to the project staff or sponsors, particularly at the earliest stages of program implementation, the work is called formative evaluation. Questions about effects or effectiveness are variously called summative evaluation, impact assessment, or outcome evaluation; the panel uses the last term.

Formative evaluation is a special type of early evaluation that occurs during and after a program has been designed but before it is broadly implemented. Formative evaluation is used to understand the need for the intervention and to make tentative decisions about how to implement or improve it. During formative evaluation, information is collected and then fed back to program designers and administrators to enhance program development and maximize the success of the intervention. For example, formative evaluation may be carried out through a pilot project before a program is implemented at several sites. A pilot study of a community-based organization (CBO), for example, might be used to gather data on problems involving access to and recruitment of targeted populations and the utilization and implementation of services; the findings of such a study would then be used to modify (if needed) the planned program.

Another example of formative evaluation is the use of a "story board" design of a TV message that has yet to be produced. A story board is a series of text and sketches of camera shots that are to be produced in a commercial. To evaluate the effectiveness of the message and forecast some of the consequences of actually broadcasting it to the general public, an advertising agency convenes small groups of people to react to and comment on the proposed design.

Once an intervention has been implemented, the next stage of evaluation is process evaluation, which addresses two broad questions: "What was done?" and "To whom, and how?" Ordinarily, process evaluation is carried out at some point in the life of a project to determine how and how well the delivery goals of the program are being met. When intervention programs continue over a long period of time (as is the case for some of the major AIDS prevention programs), measurements at several times are warranted to ensure that the components of the intervention continue to be delivered by the right people, to the right people, in the right manner, and at the right time. Process evaluation can also play a role in improving interventions by providing the information necessary to change delivery strategies or program objectives in a changing epidemic.

Research designs for process evaluation include direct observation of projects, surveys of service providers and clients, and the monitoring of administrative records. The panel notes that the Centers for Disease Control (CDC) is already collecting some administrative records on its counseling and testing program and community-based projects. The panel believes that this type of evaluation should be a continuing and expanded component of intervention projects to guarantee the maintenance of the projects' integrity and responsiveness to their constituencies.

The purpose of outcome evaluation is to identify consequences and to establish that consequences are, indeed, attributable to a project. This type of evaluation answers the questions, "What outcomes were observed?" and, perhaps more importantly, "What do the outcomes mean?" Like process evaluation, outcome evaluation can also be conducted at intervals during an ongoing program, and the panel believes that such periodic evaluation should be done to monitor goal achievement.

The panel believes that these stages of evaluation (i.e., formative, process, and outcome) are essential to learning how AIDS prevention programs contribute to containing the epidemic. After a body of findings has been accumulated from such evaluations, it may be fruitful to launch another stage of evaluation: cost-effectiveness analysis (see Weinstein et al., 1989). Like outcome evaluation, cost-effectiveness analysis also measures program effectiveness, but it extends the analysis by adding a measure of program cost. The panel believes that consideration of cost-effectiveness analysis should be postponed until more experience is gained with formative, process, and outcome evaluation of the CDC AIDS prevention programs.

Evaluation Research Design

Process and outcome evaluations require different types of research designs, as discussed below. Formative evaluations, which are intended to both assess implementation and forecast effects, use a mix of these designs.

Process Evaluation Designs

To conduct process evaluations on how well services are delivered, data need to be gathered on the content of interventions and on their delivery systems. Suggested methodologies include direct observation, surveys, and record keeping.

Direct observation designs include case studies, in which participant-observers unobtrusively and systematically record encounters within a program setting, and nonparticipant observation, in which long, open-ended (or "focused") interviews are conducted with program participants. 1 For example, "professional customers" at counseling and testing sites can act as project clients to monitor activities unobtrusively; 2 alternatively, nonparticipant observers can interview both staff and clients. Surveys—either censuses (of the whole population of interest) or samples—elicit information through interviews or questionnaires completed by project participants or potential users of a project. For example, surveys within community-based projects can collect basic statistical information on project objectives, what services are provided, to whom, when, how often, for how long, and in what context.

Record keeping consists of administrative or other reporting systems that monitor use of services. Standardized reporting ensures consistency in the scope and depth of data collected. To use the media campaign as an example, the panel suggests using standardized data on the use of the AIDS hotline to monitor public attentiveness to the advertisements broadcast by the media campaign.
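To illustrate what such standardized records make possible, the sketch below tabulates weekly call counts from a hypothetical hotline log. The log format, field names, and dates are invented for illustration and are not drawn from any actual reporting system.

```python
# Hypothetical sketch: tabulating weekly AIDS hotline call volume from a
# standardized call log. Records and field names are invented for illustration.
from collections import Counter
from datetime import date

call_log = [
    {"date": date(1990, 6, 4), "topic": "testing"},
    {"date": date(1990, 6, 5), "topic": "condom use"},
    {"date": date(1990, 6, 11), "topic": "testing"},
    {"date": date(1990, 6, 12), "topic": "testing"},
    {"date": date(1990, 6, 13), "topic": "partner notification"},
]

# Count calls by ISO week; a jump in volume after a broadcast suggests the
# advertisements are reaching and engaging the public.
calls_per_week = Counter(record["date"].isocalendar()[1] for record in call_log)
for week, count in sorted(calls_per_week.items()):
    print(f"Week {week}: {count} calls")
```

Because the same fields are recorded for every call, the counts are comparable from week to week and from site to site, which is the point of standardized reporting.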

These designs are simple to understand, but they require expertise to implement. For example, observational studies must be conducted by people who are well trained in how to carry out on-site tasks sensitively and to record their findings uniformly. Observers can either complete narrative accounts of what occurred in a service setting or they can complete some sort of data inventory to ensure that multiple aspects of service delivery are covered. These types of studies are time consuming and benefit from corroboration among several observers. The use of surveys in research is well understood, although they, too, require expertise to be well implemented. As the program chapters reflect, survey data collection must be carefully designed to reduce problems of validity and reliability and, if samples are used, an appropriate sampling scheme must be devised. Record keeping or service inventories are probably the easiest research designs to implement, although preparing standardized internal forms requires attention to detail about salient aspects of service delivery.

Outcome Evaluation Designs

Research designs for outcome evaluations are meant to assess principal and relative effects. Ideally, to assess the effect of an intervention on program participants, one would like to know what would have happened to the same participants in the absence of the program. Because it is not possible to make this comparison directly, inference strategies that rely on proxies have to be used. Scientists use three general approaches to construct proxies for use in the comparisons required to evaluate the effects of interventions: (1) nonexperimental methods, (2) quasi-experiments, and (3) randomized experiments. The first two are discussed below, and randomized experiments are discussed in the subsequent section.

Nonexperimental and Quasi-Experimental Designs 3

The most common form of nonexperimental design is a before-and-after study. In this design, pre-intervention measurements are compared with equivalent measurements made after the intervention to detect change in the outcome variables that the intervention was designed to influence.

Although the panel finds that before-and-after studies frequently provide helpful insights, the panel believes that these studies do not provide sufficiently reliable information to be the cornerstone for evaluation research on the effectiveness of AIDS prevention programs. The panel's conclusion follows from the fact that the postintervention changes cannot usually be attributed unambiguously to the intervention. 4 Plausible competing explanations for differences between pre- and postintervention measurements will often be numerous, including not only the possible effects of other AIDS intervention programs, news stories, and local events, but also the effects that may result from the maturation of the participants and the educational or sensitizing effects of repeated measurements, among others.

Quasi-experimental and matched control designs provide a separate comparison group. In these designs, the control group may be selected by matching nonparticipants to participants in the treatment group on the basis of selected characteristics. It is difficult to ensure the comparability of the two groups even when they are matched on many characteristics because other relevant factors may have been overlooked or mismatched or they may be difficult to measure (e.g., the motivation to change behavior). In some situations, it may simply be impossible to measure all of the characteristics of the units (e.g., communities) that may affect outcomes, much less demonstrate their comparability.

Matched control designs require extraordinarily comprehensive scientific knowledge about the phenomenon under investigation in order for evaluators to be confident that all of the relevant determinants of outcomes have been properly accounted for in the matching. Three types of information or knowledge are required: (1) knowledge of intervening variables that also affect the outcome of the intervention and, consequently, need adjustment to make the groups comparable; (2) measurements on all intervening variables for all subjects; and (3) knowledge of how to make the adjustments properly, which in turn requires an understanding of the functional relationship between the intervening variables and the outcome variables. Satisfying each of these information requirements is likely to be more difficult than answering the primary evaluation question, "Does this intervention produce beneficial effects?"

Given the size and the national importance of AIDS intervention programs and given the state of current knowledge about behavior change in general and AIDS prevention, in particular, the panel believes that it would be unwise to rely on matching and adjustment strategies as the primary design for evaluating AIDS intervention programs. With differently constituted groups, inferences about results are hostage to uncertainty about the extent to which the observed outcome actually results from the intervention and is not an artifact of intergroup differences that may not have been removed by matching or adjustment.

Randomized Experiments

A remedy to the inferential uncertainties that afflict nonexperimental designs is provided by randomized experiments. In such experiments, one singly constituted group is established for study. A subset of the group is then randomly chosen to receive the intervention, with the other subset becoming the control. The two groups are not identical, but they are comparable: because they are two random samples drawn from the same population, they are not systematically different in any respect, and this holds for all variables—both known and unknown—that can influence the outcome. Dividing a singly constituted group into two random and therefore comparable subgroups cuts through the tangle of causation and establishes a basis for the valid comparison of respondents who do and do not receive the intervention. Randomized experiments provide for clear causal inference by solving the problem of group comparability, and they may be used to answer the evaluation questions "Does the intervention work?" and "What works better?"

Which question is answered depends on whether the controls receive an intervention or not. When the object is to estimate whether a given intervention has any effects, individuals are randomly assigned to the project or to a zero-treatment control group. The control group may be put on a waiting list or simply not get the treatment. This design addresses the question, "Does it work?"

When the object is to compare variations on a project—e.g., individual counseling sessions versus group counseling—then individuals are randomly assigned to these two regimens, and there is no zero-treatment control group. This design addresses the question, "What works better?" In either case, the control groups must be followed up as rigorously as the experimental groups.
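The mechanics of such an assignment are simple, as the sketch below shows: a single pool of enrolled participants is shuffled at random and split into two arms. Depending on whether the second arm receives nothing or an alternative intervention, the same procedure supports either the "Does it work?" or the "What works better?" comparison. The participant identifiers and group size are invented for illustration.

```python
# Hypothetical sketch of random assignment: one singly constituted group is
# split at random into two comparable arms. Participant IDs are invented.
import random

participants = [f"P{i:03d}" for i in range(1, 101)]  # 100 enrolled participants

rng = random.Random(42)        # fixed seed so the assignment is reproducible and auditable
rng.shuffle(participants)

midpoint = len(participants) // 2
arm_a = participants[:midpoint]   # e.g., enhanced counseling (or the new intervention)
arm_b = participants[midpoint:]   # e.g., standard counseling (or a zero-treatment control)

print(len(arm_a), len(arm_b))     # 50 and 50; the two arms now differ only by chance
```

Because membership in each arm is determined by chance alone, any systematic difference later observed between the arms can be attributed to the treatment rather than to how the groups were composed.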

A randomized experiment requires that individuals, organizations, or other treatment units be randomly assigned to one of two or more treatments or program variations. Random assignment ensures that the estimated differences between the groups so constituted are statistically unbiased; that is, that any differences in effects measured between them are a result of treatment. The absence of statistical bias in groups constituted in this fashion stems from the fact that random assignment ensures that there are no systematic differences between them, differences that can and usually do affect groups composed in ways that are not random. 5 The panel believes this approach is far superior for outcome evaluations of AIDS interventions to the nonrandomized and quasi-experimental approaches. Therefore,

To improve interventions that are already broadly implemented, the panel recommends the use of randomized field experiments of alternative or enhanced interventions.

Under certain conditions, the panel also endorses randomized field experiments with a nontreatment control group to evaluate new interventions. In the context of a deadly epidemic, ethics dictate that treatment not be withheld simply for the purpose of conducting an experiment. Nevertheless, there may be times when a randomized field test of a new treatment with a no-treatment control group is worthwhile. One such time is during the design phase of a major or national intervention.

Before a new intervention is broadly implemented, the panel recommends that it be pilot tested in a randomized field experiment.

The panel considered the use of experiments with delayed rather than no treatment. A delayed-treatment control group strategy might be pursued when resources are too scarce for an intervention to be widely distributed at one time. For example, a project site that is waiting to receive funding for an intervention would be designated as the control group. If it is possible to randomize which projects in the queue receive the intervention, an evaluator could measure and compare outcomes after the experimental group had received the new treatment but before the control group received it. The panel believes that such a design can be applied only in limited circumstances, such as when groups would have access to related services in their communities and when conducting the study would be likely to lead to greater access or better services. For example, a study cited in Chapter 4 used a randomized delayed-treatment experiment to measure the effects of a community-based risk reduction program. However, such a strategy may be impractical for several reasons, including:

  • sites waiting for funding for an intervention might seek resources from another source;
  • it might be difficult to enlist the nonfunded site and its clients to participate in the study;
  • there could be an appearance of favoritism toward projects whose funding was not delayed.

Although randomized experiments have many benefits, the approach is not without pitfalls. In the planning stages of evaluation, it is necessary to contemplate certain hazards, such as the Hawthorne effect 6 and differential project dropout rates. Precautions must be taken either to prevent these problems or to measure their effects. Fortunately, there is some evidence suggesting that the Hawthorne effect is usually not very large (Rossi and Freeman, 1982:175-176).

Attrition is potentially more damaging to an evaluation, and it must be limited if the experimental design is to be preserved. If sample attrition is not limited in an experimental design, it becomes necessary to account for the potentially biasing impact of the loss of subjects in the treatment and control conditions of the experiment. The statistical adjustments required to make inferences about treatment effectiveness in such circumstances can introduce uncertainties that are as worrisome as those afflicting nonexperimental and quasi-experimental designs. Thus, the panel's recommendation of the selective use of randomized design carries an implicit caveat: To realize the theoretical advantages offered by randomized experimental designs, substantial efforts will be required to ensure that the designs are not compromised by flawed execution.
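One piece of routine bookkeeping that supports this caveat is tracking follow-up rates by arm as the study proceeds. The sketch below uses invented counts to show how differential attrition between arms can be flagged early, while there is still time to intensify follow-up efforts.

```python
# Hypothetical sketch: monitoring attrition by arm during follow-up.
# All counts are invented for illustration.
enrolled = {"treatment": 250, "control": 250}
followed_up = {"treatment": 214, "control": 181}

for arm in enrolled:
    retention = followed_up[arm] / enrolled[arm]
    print(f"{arm}: {retention:.0%} of participants retained at follow-up")

# A gap in retention between arms (here roughly 86% vs. 72%) signals
# differential attrition, which can bias the treatment comparison.
gap = followed_up["treatment"] / enrolled["treatment"] - followed_up["control"] / enrolled["control"]
if abs(gap) > 0.05:
    print(f"Warning: differential attrition of {abs(gap):.0%} between arms")
```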

Another pitfall of randomization is its appearance of unfairness or unattractiveness to participants and the controversial legal and ethical issues it sometimes raises. Often, what is being criticized is the control of project assignment of participants rather than the use of randomization itself. In deciding whether random assignment is appropriate, it is important to consider the specific context of the evaluation and how participants would be assigned to projects in the absence of randomization. The Federal Judicial Center (1981) offers five threshold conditions for the use of random assignment.

  • Does present practice or policy need improvement?
  • Is there significant uncertainty about the value of the proposed regimen?
  • Are there acceptable alternatives to randomized experiments?
  • Will the results of the experiment be used to improve practice or policy?
  • Is there a reasonable protection against risk for vulnerable groups (i.e., individuals within the justice system)?

The parent committee has argued that these threshold conditions apply in the case of AIDS prevention programs (see Turner, Miller, and Moses, 1989:331-333).

Although randomization may be desirable from an evaluation and ethical standpoint, and acceptable from a legal standpoint, it may be difficult to implement from a practical or political standpoint. Again, the panel emphasizes that questions about the practical or political feasibility of the use of randomization may in fact refer to the control of program allocation rather than to the issues of randomization itself. In fact, when resources are scarce, it is often more ethical and politically palatable to randomize allocation rather than to allocate on grounds that may appear biased.

It is usually easier to defend the use of randomization when the choice has to do with assignment to groups receiving alternative services than when the choice involves assignment to groups receiving no treatment. For example, in comparing a testing and counseling intervention that offered a special "skills training" session in addition to its regular services with a counseling and testing intervention that offered no additional component, random assignment of participants to one group rather than another may be acceptable to program staff and participants because the relative values of the alternative interventions are unknown.

The more difficult issue is the introduction of new interventions that are perceived to be needed and effective in a situation in which there are no services. An argument that is sometimes offered against the use of randomization in this instance is that interventions should be assigned on the basis of need (perhaps as measured by rates of HIV incidence or of high-risk behaviors). But this argument presumes that the intervention will have a positive effect—which is unknown before evaluation—and that relative need can be established, which is a difficult task in itself.

The panel recognizes that community and political opposition to randomization to zero treatments may be strong and that enlisting participation in such experiments may be difficult. This opposition and reluctance could seriously jeopardize the production of reliable results if it is translated into noncompliance with a research design. The feasibility of randomized experiments for AIDS prevention programs has already been demonstrated, however (see the review of selected experiments in Turner, Miller, and Moses, 1989:327-329). The substantial effort involved in mounting randomized field experiments is repaid by the fact that they can provide unbiased evidence of the effects of a program.

Unit of Assignment

The unit of assignment of an experiment may be an individual person, a clinic (i.e., the clientele of the clinic), or another organizational unit (e.g., the community or city). The treatment unit is selected at the earliest stage of design. Variations of units are illustrated in the following four examples of intervention programs.

  1. Two different pamphlets (A and B) on the same subject (e.g., testing) are distributed in an alternating sequence to individuals calling an AIDS hotline. The outcome to be measured is whether the recipient returns a card asking for more information.

  2. Two instruction curricula (A and B) about AIDS and HIV infections are prepared for use in high school driver education classes. The outcome to be measured is a score on a knowledge test.

  3. Of all clinics for sexually transmitted diseases (STDs) in a large metropolitan area, some are randomly chosen to introduce a change in the fee schedule. The outcome to be measured is the change in patient load.

  4. A coordinated set of community-wide interventions—involving community leaders, social service agencies, the media, community associations, and other groups—is implemented in one area of a city. Outcomes are knowledge as assessed by testing at drug treatment centers and STD clinics and condom sales in the community's retail outlets.

In example (1), the treatment unit is an individual person who receives pamphlet A or pamphlet B. If either "treatment" is applied again, it would be applied to a person. In example (2), the high school class is the treatment unit; everyone in a given class experiences either curriculum A or curriculum B. If either treatment is applied again, it would be applied to a class. The treatment unit is the clinic in example (3), and in example (4), the treatment unit is a community.

The consistency of the effects of a particular intervention across repetitions justly carries a heavy weight in appraising the intervention. It is important to remember that repetitions of a treatment or intervention are the number of treatment units to which the intervention is applied. This is a salient principle in the design and execution of intervention programs as well as in the assessment of their results.

The adequacy of the proposed sample size (number of treatment units) has to be considered in advance. Adequacy depends mainly on two factors:

  • How much variation occurs from unit to unit among units receiving a common treatment? If that variation is large, then the number of units needs to be large.
  • What is the minimum size of a possible treatment difference that, if present, would be practically important? That is, how small a treatment difference is it essential to detect if it is present? The smaller this quantity, the larger the number of units that are necessary.

Many formal methods for considering and choosing sample size exist (see, e.g., Cohen, 1988). Practical circumstances occasionally allow choosing between designs that involve units at different levels; thus, a classroom might be the unit if the treatment is applied in one way, but an entire school might be the unit if the treatment is applied in another. When both approaches are feasible, the use of a power analysis for each approach may lead to a reasoned choice.
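As a rough illustration of how these two factors drive the required number of units, the calculation below applies the standard normal-approximation formula for comparing two proportions; this is a generic textbook calculation rather than a procedure drawn from this report, and the baseline and target rates are hypothetical.

```python
# Rough sketch of a sample-size calculation for comparing two proportions,
# using the standard normal approximation. The rates below are hypothetical.
import math
from statistics import NormalDist

def units_per_arm(p_control: float, p_treatment: float,
                  alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of treatment units needed in each arm."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treatment) ** 2)

# Greater unit-to-unit variation and smaller detectable differences both
# push the required number of units up sharply.
print(units_per_arm(0.30, 0.20))  # detect a 10-point drop: about 291 units per arm
print(units_per_arm(0.30, 0.25))  # detect a 5-point drop: about 1,248 units per arm
```

Halving the detectable difference roughly quadruples the required number of units, which is why the minimum difference of practical importance must be settled before the design is fixed.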

Choice of Methods

There is some controversy about the advantages of randomized experiments in comparison with other evaluative approaches. It is the panel's belief that when a (well executed) randomized study is feasible, it is superior to alternative kinds of studies in the strength and clarity of whatever conclusions emerge, primarily because the experimental approach avoids selection biases. 7 Other evaluation approaches are sometimes unavoidable, but ordinarily the accumulation of valid information will go more slowly and less securely than in randomized approaches.

Experiments in medical research shed light on the advantages of carefully conducted randomized experiments. The Salk vaccine trials are a successful example of a large, randomized study. In a double-blind test of the polio vaccine, 8 children in various communities were randomly assigned to two treatments, either the vaccine or a placebo. By this method, the effectiveness of Salk vaccine was demonstrated in one summer of research (Meier, 1957).

A sufficient accumulation of relevant, observational information, especially when collected in studies using different procedures and sample populations, may also clearly demonstrate the effectiveness of a treatment or intervention. The process of accumulating such information can be a long one, however. When a (well-executed) randomized study is feasible, it can provide evidence that is subject to less uncertainty in its interpretation, and it can often do so in a more timely fashion. In the midst of an epidemic, the panel believes it proper that randomized experiments be one of the primary strategies for evaluating the effectiveness of AIDS prevention efforts. In making this recommendation, however, the panel also wishes to emphasize that the advantages of the randomized experimental design can be squandered by poor execution (e.g., by compromised assignment of subjects, significant subject attrition rates, etc.). To achieve the advantages of the experimental design, care must be taken to ensure that the integrity of the design is not compromised by poor execution.

In proposing that randomized experiments be one of the primary strategies for evaluating the effectiveness of AIDS prevention programs, the panel also recognizes that there are situations in which randomization will be impossible or, for other reasons, cannot be used. In its next report the panel will describe at length appropriate nonexperimental strategies to be considered in situations in which an experiment is not a practical or desirable alternative.

The Management of Evaluation

Conscientious evaluation requires a considerable investment of funds, time, and personnel. Because the panel recognizes that resources are not unlimited, it suggests that they be concentrated on the evaluation of a subset of projects to maximize the return on investment and to enhance the likelihood of high-quality results.

Project Selection

Deciding which programs or sites to evaluate is by no means a trivial matter. Selection should be carefully weighed so that projects that are not replicable or that have little chance for success are not subjected to rigorous evaluations.

The panel recommends that any intensive evaluation of an intervention be conducted on a subset of projects selected according to explicit criteria. These criteria should include the replicability of the project, the feasibility of evaluation, and the project's potential effectiveness for prevention of HIV transmission.

If a project is replicable, it means that the particular circumstances of service delivery in that project can be duplicated. In other words, for CBOs and counseling and testing projects, the content and setting of an intervention can be duplicated across sites. Feasibility of evaluation means that, as a practical matter, the research can be done: that is, the research design is adequate to control for rival hypotheses, it is not excessively costly, and the project is acceptable to the community and the sponsor. Potential effectiveness for HIV prevention means that the intervention is at least based on a reasonable theory (or mix of theories) about behavioral change (e.g., social learning theory [Bandura, 1977], the health belief model [Janz and Becker, 1984], etc.), if it has not already been found to be effective in related circumstances.

In addition, since it is important to ensure that the results of evaluations will be broadly applicable,

The panel recommends that evaluation be conducted and replicated across major types of subgroups, programs, and settings. Attention should be paid to geographic areas with low and high AIDS prevalence, as well as to subpopulations at low and high risk for AIDS.

Research Administration

The sponsoring agency interested in evaluating an AIDS intervention should consider the mechanisms through which the research will be carried out as well as the desirability of both independent oversight and agency in-house conduct and monitoring of the research. The appropriate entities and mechanisms for conducting evaluations depend to some extent on the kinds of data being gathered and the evaluation questions being asked.

Oversight and monitoring are important to keep projects fully informed about the other evaluations relevant to their own and to render assistance when needed. Oversight and monitoring are also important because evaluation is often a sensitive issue for project and evaluation staff alike. The panel is aware that evaluation may appear threatening to practitioners and researchers because of the possibility that evaluation research will show that their projects are not as effective as they believe them to be. These needs and vulnerabilities should be taken into account as evaluation research management is developed.

Conducting the Research

To conduct some aspects of a project's evaluation, it may be appropriate to involve project administrators, especially when the data will be used to evaluate delivery systems (e.g., to determine when and which services are being delivered). To evaluate outcomes, the services of an outside evaluator 9 or evaluation team are almost always required because few practitioners have the necessary professional experience or the time and resources necessary to do evaluation. The outside evaluator must have relevant expertise in evaluation research methodology and must also be sensitive to the fears, hopes, and constraints of project administrators.

Several evaluation management schemes are possible. For example, a prospective AIDS prevention project group (the contractor) can bid on a contract for project funding that includes an intensive evaluation component. The actual evaluation can be conducted either by the contractor alone or by the contractor working in concert with an outside independent collaborator. This mechanism has the advantage of involving project practitioners in the work of evaluation as well as building separate but mutually informing communities of experts around the country. Alternatively, a contract can be let with a single evaluator or evaluation team that will collaborate with the subset of sites that is chosen for evaluation. This variation would be managerially less burdensome than awarding separate contracts, but it would require greater dependence on the expertise of a single investigator or investigative team. ( Appendix A discusses contracting options in greater depth.) Both of these approaches accord with the parent committee's recommendation that collaboration between practitioners and evaluation researchers be ensured. Finally, in the more traditional evaluation approach, independent principal investigators or investigative teams may respond to a request for proposal (RFP) issued to evaluate individual projects. Such investigators are frequently university-based or are members of a professional research organization, and they bring to the task a variety of research experiences and perspectives.

Independent Oversight

The panel believes that coordination and oversight of multisite evaluations is critical because of the variability in investigators' expertise and in the results of the projects being evaluated. Oversight can provide quality control for individual investigators and can be used to review and integrate findings across sites for developing policy. The independence of an oversight body is crucial to ensure that project evaluations do not succumb to the pressures for positive findings of effectiveness.

When evaluation is to be conducted by a number of different evaluation teams, the panel recommends establishing an independent scientific committee to oversee project selection and research efforts, corroborate the impartiality and validity of results, conduct cross-site analyses, and prepare reports on the progress of the evaluations.

The composition of such an independent oversight committee will depend on the research design of a given program. For example, the committee ought to include statisticians and other specialists in randomized field tests when that approach is being taken. Specialists in survey research and case studies should be recruited if either of those approaches is to be used. Appendix B offers a model for an independent oversight group that has been successfully implemented in other settings—a project review team, or advisory board.

Agency In-House Team

As the parent committee noted in its report, evaluations of AIDS interventions require skills that may be in short supply for agencies invested in delivering services (Turner, Miller, and Moses, 1989:349). Although this situation can be partly alleviated by recruiting professional outside evaluators and retaining an independent oversight group, the panel believes that an in-house team of professionals within the sponsoring agency is also critical. The in-house experts will interact with the outside evaluators and provide input into the selection of projects, outcome objectives, and appropriate research designs; they will also monitor the progress and costs of evaluation. These functions require not just bureaucratic oversight but appropriate scientific expertise.

This is not intended to preclude the direct involvement of CDC staff in conducting evaluations. However, given the great amount of work to be done, it is likely that a considerable portion of it will have to be contracted out. The quality and usefulness of the evaluations done under contract can be greatly enhanced by ensuring that there are an adequate number of CDC staff trained in evaluation research methods to monitor these contracts.

The panel recommends that CDC recruit and retain behavioral, social, and statistical scientists trained in evaluation methodology to facilitate the implementation of the evaluation research recommended in this report.

Interagency Collaboration

The panel believes that the federal agencies that sponsor the design of basic research, intervention programs, and evaluation strategies would profit from greater interagency collaboration. The evaluation of AIDS intervention programs would benefit from a coherent program of studies that should provide models of efficacious and effective interventions to prevent further HIV transmission, the spread of other STDs, and unwanted pregnancies (especially among adolescents). A marriage could then be made of basic and applied science, from which the best evaluation is born. Exploring the possibility of interagency collaboration and CDC's role in such collaboration is beyond the scope of this panel's task, but it is an important issue that we suggest be addressed in the future.

Costs of Evaluation

In view of the dearth of current evaluation efforts, the panel believes that vigorous evaluation research must be undertaken over the next few years to build up a body of knowledge about what interventions can and cannot do. Dedicating no resources to evaluation will virtually guarantee that high-quality evaluations will be infrequent and the data needed for policy decisions will be sparse or absent. Yet, evaluating every project is not feasible simply because there are not enough resources and, in many cases, evaluating every project is not necessary for good science or good policy.

The panel believes that evaluating only some of a program's sites or projects, selected under the criteria noted in Chapter 4 , is a sensible strategy. Although we recommend that intensive evaluation be conducted on only a subset of carefully chosen projects, we believe that high-quality evaluation will require a significant investment of time, planning, personnel, and financial support. The panel's aim is to be realistic—not discouraging—when it notes that the costs of program evaluation should not be underestimated. Many of the research strategies proposed in this report require investments that are perhaps greater than has been previously contemplated. This is particularly the case for outcome evaluations, which are ordinarily more difficult and expensive to conduct than formative or process evaluations. And those costs will be additive with each type of evaluation that is conducted.

Panel members have found that the cost of an outcome evaluation sometimes equals or even exceeds the cost of actual program delivery. For example, it was reported to the panel that randomized studies used to evaluate recent manpower training projects cost as much as the projects themselves (see Cottingham and Rodriguez, 1987). In another case, the principal investigator of an ongoing AIDS prevention project told the panel that the cost of randomized experimentation was approximately three times higher than the cost of delivering the intervention (albeit the study was quite small, involving only 104 participants) (Kelly et al., 1989). Fortunately, only a fraction of a program's projects or sites need to be intensively evaluated to produce high-quality information, and not all will require randomized studies.

Because of the variability in kinds of evaluation that will be done as well as in the costs involved, there is no set standard or rule for judging what fraction of a total program budget should be invested in evaluation. Based upon very limited data 10 and assuming that only a small sample of projects would be evaluated, the panel suspects that program managers might reasonably anticipate spending 8 to 12 percent of their intervention budgets to conduct high-quality evaluations (i.e., formative, process, and outcome evaluations). 11 Larger investments seem politically infeasible and unwise in view of the need to put resources into program delivery. Smaller investments in evaluation may risk studying an inadequate sample of program types, and may also invite compromises in research quality.

The nature of the HIV/AIDS epidemic mandates an unwavering commitment to prevention programs, and the prevention activities require a similar commitment to the evaluation of those programs. The magnitude of what can be learned from doing good evaluations will more than balance the magnitude of the costs required to perform them. Moreover, it should be realized that the costs of shoddy research can be substantial, both in their direct expense and in the lost opportunities to identify effective strategies for AIDS prevention. Once the investment has been made, however, and a reservoir of findings and practical experience has accumulated, subsequent evaluations should be easier and less costly to conduct.

References

  • Bandura, A. (1977) Self-efficacy: Toward a unifying theory of behavioral change. Psychological Review 84:191-215.
  • Campbell, D. T., and Stanley, J. C. (1966) Experimental and Quasi-Experimental Design and Analysis. Boston: Houghton-Mifflin.
  • Centers for Disease Control (CDC) (1988) Sourcebook presented at the National Conference on the Prevention of HIV Infection and AIDS Among Racial and Ethnic Minorities in the United States (August).
  • Cohen, J. (1988) Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Hillsdale, N.J.: L. Erlbaum Associates.
  • Cook, T., and Campbell, D. T. (1979) Quasi-Experimentation: Design and Analysis for Field Settings. Boston: Houghton-Mifflin.
  • Federal Judicial Center (1981) Experimentation in the Law. Washington, D.C.: Federal Judicial Center.
  • Janz, N. K., and Becker, M. H. (1984) The health belief model: A decade later. Health Education Quarterly 11(1):1-47.
  • Kelly, J. A., St. Lawrence, J. S., Hood, H. V., and Brasfield, T. L. (1989) Behavioral intervention to reduce AIDS risk activities. Journal of Consulting and Clinical Psychology 57:60-67.
  • Meier, P. (1957) Safety testing of poliomyelitis vaccine. Science 125(3257):1067-1071.
  • Roethlisberger, F. J., and Dickson, W. J. (1939) Management and the Worker. Cambridge, Mass.: Harvard University Press.
  • Rossi, P. H., and Freeman, H. E. (1982) Evaluation: A Systematic Approach. 2nd ed. Beverly Hills, Calif.: Sage Publications.
  • Turner, C. F., Miller, H. G., and Moses, L. E., eds. (1989) AIDS, Sexual Behavior, and Intravenous Drug Use. Report of the NRC Committee on AIDS Research and the Behavioral, Social, and Statistical Sciences. Washington, D.C.: National Academy Press.
  • Weinstein, M. C., Graham, J. D., Siegel, J. E., and Fineberg, H. V. (1989) Cost-effectiveness analysis of AIDS prevention programs: Concepts, complications, and illustrations. In C. F. Turner, H. G. Miller, and L. E. Moses, eds., AIDS, Sexual Behavior, and Intravenous Drug Use. Washington, D.C.: National Academy Press.
  • Weiss, C. H. (1972) Evaluation Research. Englewood Cliffs, N.J.: Prentice-Hall.

Notes

1. On occasion, nonparticipants observe behavior during or after an intervention. Chapter 3 introduces this option in the context of formative evaluation.

2. The use of professional customers can raise serious concerns in the eyes of project administrators at counseling and testing sites. The panel believes that site administrators should receive advance notification that professional customers may visit their sites for testing and counseling services and should provide their consent before this method of data collection is used.

3. Parts of this section are adapted from Turner, Miller, and Moses (1989:324-326).

4. This weakness has been noted by CDC in a sourcebook provided to its HIV intervention project grantees (CDC, 1988:F-14).

5. The significance tests applied to experimental outcomes calculate the probability that any observed differences between the sample estimates might result from random variations between the groups.

6. Research participants' knowledge that they were being observed had a positive effect on their responses in a series of famous studies conducted at the Western Electric Company's Hawthorne Works near Chicago (Roethlisberger and Dickson, 1939); the phenomenon is referred to as the Hawthorne effect.

7. Participants who self-select into a program are likely to be different from nonrandom comparison groups in terms of interests, motivations, values, abilities, and other attributes that can bias the outcomes.

8. A double-blind test is one in which neither the person receiving the treatment nor the person administering it knows which treatment (or no treatment) is being given.

9. As discussed under "Agency In-House Team," the outside evaluator might be one of CDC's personnel. However, given the large amount of research to be done, it is likely that non-CDC evaluators will also need to be used.

10. See, for example, Chapter 3, which presents cost estimates for evaluations of media campaigns. Similar estimates are not readily available for other program types.

11. For example, the U.K. Health Education Authority (that country's primary agency for AIDS education and prevention programs) allocates 10 percent of its AIDS budget for research and evaluation of its AIDS programs (D. McVey, Health Education Authority, personal communication, June 1990). This allocation covers both process and outcome evaluation.


11.1 Evaluation research

Learning Objectives

  • Describe how to conduct evaluation research
  • Define inputs, outputs, and outcomes
  • Identify the three goals of process assessment

As you may recall from the definition provided in Chapter 1, evaluation research is research conducted to assess the effects of specific programs or policies. Evaluation research is often used when some form of policy intervention is planned, such as welfare reform or a school curriculum change. The focus on interventions and social problems makes it a natural fit for social work researchers. It might be used to assess the extent to which intervention is necessary by attempting to define and diagnose social problems in social workers’ service areas, and it might also be used to understand whether their agencies’ interventions have had their intended consequences. Evaluation research is becoming more and more necessary for agencies to secure and maintain funding for their programs. The main types of evaluation research are needs assessments, outcomes assessments, process assessments, and efficiency analyses such as cost-benefit or cost-effectiveness analyses. This section discusses three of these types: outcomes assessments, process assessments, and needs assessments.


Outcomes Assessments

An outcomes assessment is an evaluation designed to discover if a program achieved its intended outcomes. Much like other types of research, it comes with its own peculiar terminology.  Inputs are the resources needed for the program to operate. These include physical location, any equipment needed, staff (and experience/knowledge of those staff), monetary funding, and most importantly, the clients. Program administrators pull together the necessary resources to run an intervention or program. The program is the intervention your clients receive—perhaps giving them access to housing vouchers or enrolling them in a smoking cessation class. The outputs of programs are tangible results of the program process. Outputs in a program might include the number of clients served, staff members trained to implement the intervention, mobility assistance devices distributed, nicotine patches distributed, etc. By contrast, outcomes speak to the purpose of the program itself.  Outcomes are the observed changes, whether intended or unintended, that occurred due to the program or intervention. By looking at each of these domains, evaluation researchers can obtain a comprehensive view of the program.

Let’s run through an example from the social work practice of the wife of Matt DeCarlo, who wrote the source material for much of this textbook. She runs an after-school bicycling club called Pedal Up for children with mental health issues. She has a lot of inputs in her program. First, there are the children who enroll; the volunteer and paid staff members who supervise the kids (and their knowledge about bicycles and children’s mental health); the bicycles and equipment that all clients and staff use; the community center room they use as a home base; the paths of the city where they ride their bikes; and the public and private grants they use to fund the program. Next, the program itself is a twice-weekly after-school program in which children learn about bicycle maintenance and bicycle safety for about 30 minutes each day and then spend at least an hour riding around the city on bicycle trails.

In measuring the outputs of this program, she has many options. She would probably include the number of children  participating in the program or the number of bike rides or lessons given. Other outputs might include the number of miles logged by the children over the school year, the number of bicycle helmets or spare tires distributed, etc. Finally, the outcomes of the programs might include each child’s mental health symptoms or behavioral issues at school.
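The relationship among these terms can be laid out as a simple logic model. The sketch below records the Pedal Up example in that structure; the counts and field values are invented for illustration, not taken from an actual evaluation.

```python
# A minimal, hypothetical logic model for the Pedal Up example.
# All names and values are illustrative, not real evaluation data.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class LogicModel:
    inputs: List[str]        # resources needed for the program to operate
    program: str             # the intervention clients receive
    outputs: Dict[str, int]  # tangible, countable results of program activity
    outcomes: List[str]      # observed changes the program is meant to produce

pedal_up = LogicModel(
    inputs=["enrolled children", "volunteer and paid staff", "bicycles and helmets",
            "community center room", "city bike paths", "grant funding"],
    program="Twice-weekly after-school club: 30 minutes of maintenance and "
            "safety instruction followed by at least an hour of riding",
    outputs={"children served": 25, "bike rides completed": 60, "helmets distributed": 25},
    outcomes=["children's mental health symptoms", "behavioral issues at school"],
)

# Outputs count program activity; outcomes describe the changes the program exists to produce.
print(f"{len(pedal_up.outputs)} output measures, {len(pedal_up.outcomes)} outcome measures")
```

Keeping outputs and outcomes in separate fields makes the distinction hard to blur: a program can rack up impressive outputs while still failing to move its outcomes.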

Process Assessments

Outcomes assessments are performed at the end of a program or at specific points during the grant reporting process. What if a social worker wants to assess earlier on in the process if the program is on target to achieve its outcomes? In that case a process assessment is recommended, which evaluates a program in its earlier stages. Faulkner and Faulkner (2016) describe three main goals for conducting a process evaluation.

The first is program description, in which the researcher simply tries to understand what the program looks like in everyday life for clients and staff members. In our Pedal Up example, assessing program description might involve measuring, in the first few weeks, the hours children spent riding their bikes, the number of children and staff in attendance, and so on. These data will give those in charge of the program an idea of how their plans have translated from the grant proposal to the real world. If, for example, not enough children are showing up or if children are only able to ride their bikes for ten minutes each day, it may indicate that something is wrong.

Another important goal of process assessment is program monitoring . If you have some social work practice experience already, it’s likely you’ve encountered program monitoring. Agency administrators may look at sign-in sheets for groups, hours billed by clinicians, or other metrics to track how services are utilized over time. They may also assess whether clinicians are following the program correctly or if they are deviating from how the program was designed. This can be an issue in program evaluations of specific treatment models, as any differences between what the administrators conceptualized and what the clinicians implemented jeopardize the internal validity of the evaluation. If, in our Pedal Up example, we have a staff member who does not review bike safety each week or does not enforce helmet laws for some students, we could catch that through program monitoring.
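A small amount of code can turn session logs into this kind of monitoring. The sketch below uses invented Pedal Up session records to summarize attendance and flag sessions where the safety review was skipped; the record format and numbers are hypothetical.

```python
# Hypothetical sketch of program monitoring for the Pedal Up example:
# summarizing attendance and a fidelity flag from invented session logs.
sessions = [
    {"week": 1, "children_attending": 18, "safety_review_done": True},
    {"week": 1, "children_attending": 20, "safety_review_done": True},
    {"week": 2, "children_attending": 12, "safety_review_done": False},
    {"week": 2, "children_attending": 15, "safety_review_done": True},
]

average_attendance = sum(s["children_attending"] for s in sessions) / len(sessions)
fidelity_rate = sum(s["safety_review_done"] for s in sessions) / len(sessions)

print(f"Average attendance per session: {average_attendance:.1f}")
print(f"Sessions with the required safety review: {fidelity_rate:.0%}")

# A low fidelity rate (e.g., skipped safety reviews) is caught here, prompting
# follow-up with staff before the deviation threatens the evaluation's validity.
```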

The final goal of process assessments is quality assurance. At its most simple level, quality assurance may involve sending out satisfaction questionnaires to clients and staff members. If there are serious issues, it’s better to know about them early in a program so the program can be adapted to meet the needs of clients and staff. It is important to solicit staff feedback in addition to consumer feedback, as staff have insight into how the program is working in practice and into areas in which it may be falling short of what it should be. In our example, we could spend some time talking with parents when they pick their children up from the program or hold a staff meeting to provide opportunities for those most involved in the program to give feedback.

Needs Assessments

A third type of evaluation research is a needs assessment. A needs assessment can be used to demonstrate and document a community or organizational need, and it should be carried out in a way that builds understanding of the context in which the need arises. Needs assessments focus on gaining a better understanding of a gap within an organization or community and developing a plan to address that gap. They often precede the development of a program or organization and are often used to justify the necessity of a program or organization to fill a gap. Needs assessments can be general, such as asking members of a community or organization to reflect on how the community or organization is functioning, or they can be specific, asking community or organization members to respond to an identified gap within a community or agency.

Needs assessments should respond to the following questions:

  • What is the need or gap?
  • What data exist about the need or gap?
  • What data are needed in order to develop a plan to fill the gap?
  • What resources are available to do the needs assessment?
  • Who should be involved in the analysis and interpretation of the data?
  • How will the information gathered be used and for what purpose?
  • How will the results be communicated to community partners?

In order to answer these questions, needs assessments often follow a four-step plan. First, researchers must identify a gap in a community or organization and explore potential avenues for addressing it. This involves taking stock of what is already known about the needs within the community or organization and determining the scope and direction of the needs assessment. The researcher may partner with key informants in the community to define the need and to design the research methods for the assessment.

Second, the researcher will gather data to better understand the need. Data could be collected from key informants within the community, community members themselves, members of an organization, or records from an agency or organization. This involves designing a research study in which a variety of data collection methods could be used, such as surveys, interviews, focus groups, community forums, and secondary analysis of existing data. Once the data are collected, they will be organized and analyzed according to the research questions guiding the needs assessment.
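A minimal sketch of one analysis step, using entirely invented survey responses: counting how often community members name each candidate gap so that the most frequently mentioned needs stand out. The need labels and data structure are hypothetical.

```python
# Minimal sketch with invented data: tallying which gaps community members
# mention most often in a needs-assessment survey.
from collections import Counter

responses = [
    ["after-school programs", "transportation"],
    ["transportation"],
    ["mental health services", "after-school programs"],
    ["after-school programs"],
]

need_counts = Counter(need for answer in responses for need in answer)

for need, count in need_counts.most_common():
    print(f"{need}: mentioned by {count} of {len(responses)} respondents")
```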

Third, information gathered during data collection will be used to develop a plan of action to fill the needs. This could be the development of a new community agency to address a gap of services within the community or the addition of a new program at an existing agency. This agency or program must be designed according to the results of the needs assessment in order to accurately address the gap.

Finally, the newly developed program or agency must be evaluated to determine if it is filling the gap revealed by the needs assessment. Evaluating the success of the agency or program is essential to the needs assessment process.

Evaluation research is part of every social worker's toolkit. It tests whether social work interventions achieve their intended effects, which protects our clients and helps ensure that money and other resources are not spent on programs that do not work. Evaluation research uses the skills of quantitative and qualitative research to ensure clients receive interventions that have been shown to be successful.

Key Takeaways

  • Evaluation research is a common research task for social workers.
  • Outcomes assessments evaluate the degree to which programs achieve their intended outcomes.
  • Outputs differ from outcomes.
  • Process assessments evaluate a program in its early stages, so changes can be made.
Glossary

  • Inputs: resources needed for the program to operate
  • Outcomes: the issues the program is trying to change
  • Outcomes assessment: an evaluation designed to discover whether a program achieved its intended outcomes
  • Outputs: tangible results of the program process
  • Process assessment: an evaluation conducted during the earlier stages of a program or on an ongoing basis
  • Program: the intervention clients receive



The Evaluation of Research in Social Sciences and Humanities

Lessons from the Italian Experience

  • © 2018
  • Editor: Andrea Bonaccorsi, DESTEC, University of Pisa, Pisa, Italy


  • Examines very important issues in research evaluation in the Social Sciences and Humanities
  • Presents a systemic review of theoretical issues influencing the evaluation of Social Sciences and Humanities
  • Paints a broad and colorful picture of Italian research assessment and the debate it has generated across many disciplines


About this book


  • Research in Social Science and Humanities
  • Epistemic Theory of Evaluation in Social Sciences and Humanities
  • Peer Review in Social Sciences and Humanities
  • Guideline Managing Peer Review
  • Examining Data on Individual Research Products
  • Research Quality Criteria
  • Evaluation of Books in Social Sciences and Humanities
  • Reliability and Quality of Online Library Catalogs
  • Linked Data Technology in Catalog Data
  • Evaluation Process and Books
  • Role of Books in Research Practices of Humanities
  • Role of Legal Monographs in France, UK and the Netherlands
  • Role of Legal Monographs of Scholars of Italian Universities
  • Validity and Robustness of Journal Rating
  • Strategies for Journal Publication in SSH
  • Scientific Societies and Journal Rating
  • Potential of Indicators and Google Scholar
  • Social Sciences and Humanities in Italy
  • The Impact of Research in Social Sciences and Humanities

Table of contents (17 chapters)

Front Matter

Towards an Epistemic Approach to Evaluation in SSH

Andrea Bonaccorsi

Research Quality Criteria in SSH

Mapping the Role of the Book in Evaluation at the Individual and Department Level in Italian SSH. A Multisource Analysis

  • Chiara Faggiolani, Giovanni Solimine

Guidelines for Peer Review. A Survey of International Practices

  • Andrea Capaccioni, Giovanna Spina

Peer Review in Social Sciences and Humanities. Addressing the Interpretation of Quality Criteria

Research Quality Evaluation: The Case of Legal Studies

  • Ginevra Peruginelli, Sebastiano Faro

The Role of Books and Monographs in SSH Research and Their Evaluation

More, Less or Better: The Problem of Evaluating Books in SSH Research

  • Geoffrey Williams, Antonella Basso, Ioana Galleron, Tiziana Lippiello

Research Quality Criteria in the Evaluation of Books

  • Carla Basili, Luca Lanzillo

Quality Evaluation of Online Library Catalogues, Advanced Discovery Tools and Linked Data Technologies

  • Maria Teresa Biagetti, Antonella Iacono, Antonella Trombone

A Survey on Legal Research Monograph Evaluation in Italy

  • Ginevra Peruginelli, Sebastiano Faro, Tommaso Agnoloni

Journal Classification and Rating

Publication Strategies in SSH: Empirical Evidence and Policy Implications

  • Domenica Fioredistella Iezzi

Journal Ratings as Predictors of Article Quality in Arts, Humanities, and Social Sciences: An Analysis Based on the Italian Research Evaluation Exercise

  • Andrea Bonaccorsi, Antonio Ferrara, Marco Malgarini

Exploring New Indicators for the Multilevel Assessment of Research

Google Scholar as a Citation Database for Non-bibliometric Areas: The EVA Project Results

  • Alfio Ferrara, Stefano Montanelli, Stefano Verzillo

Assessing the Reliability and Validity of Google Scholar Indicators. The Case of Social Sciences in Italy

  • Ferruccio Biolcati-Rinaldi, Francesco Molteni, Silvia Salini

Is the Diffusion of Books in Library Holdings a Reliable Indicator in Research Assessment?

The Social Impact Assessment in Social Sciences and Humanities: Methodological Issues from the Italian Experience

  • Luca Lanzillo

Bibliographic Information

Book Title : The Evaluation of Research in Social Sciences and Humanities

Book Subtitle : Lessons from the Italian Experience

Editors : Andrea Bonaccorsi

DOI : https://doi.org/10.1007/978-3-319-68554-0

Publisher : Springer Cham

eBook Packages : Social Sciences , Social Sciences (R0)

Copyright Information : Springer International Publishing AG 2018

Hardcover ISBN : 978-3-319-68553-3 Published: 15 January 2018

Softcover ISBN : 978-3-319-88620-6 Published: 06 June 2019

eBook ISBN : 978-3-319-68554-0 Published: 04 January 2018

Edition Number : 1

Number of Pages : XX, 416

Number of Illustrations : 22 b/w illustrations, 39 illustrations in colour

Topics : Research Methodology , Philosophy of the Social Sciences , Assessment, Testing and Evaluation

Evaluation Methods

Evaluations of social programs are vital to understand what works and what doesn’t, in order to make the best use of available resources. This video describes the various types of evaluations and illustrates these using examples from evaluation research conducted in ISSR.


Develop skills to conduct and commission evaluations


The one-day MFSAS Program Evaluation Course introduces key evaluation concepts and techniques. It provides participants with the foundational skills to plan or commission an evaluation.


  • Mixed Methods
  • Physiological Measurement and Observation Methods
  • Predictive Risk Modelling Methods
  • Survey Methods
  • Behavioural Economics, Experimental and Quasi-experimental Methods


Article Contents

  • 1. Introduction
  • 2. Theoretical background
  • 3. Analytical framework
  • 4. Review of 10 evaluation methods
  • 5. Analysis of evaluation methods
  • 6. Reflection and discussion
  • Acknowledgements

The production of scientific and societal value in research evaluation: a review of societal impact assessment methods


Jorrit P Smit, Laurens K Hessels, The production of scientific and societal value in research evaluation: a review of societal impact assessment methods, Research Evaluation , Volume 30, Issue 3, July 2021, Pages 323–335, https://doi.org/10.1093/reseval/rvab002


Over the past two decades, several methods have been developed to evaluate the societal impact of research. Compared to the practical development of the field, the conceptual development is relatively weak. This review article contributes to the latter by elucidating the theoretical aspects of the dominant methods for evaluating societal impact of research, in particular, their presuppositions about the relationship between scientific and societal value of research. We analyse 10 approaches to the assessment of the societal impact of research from a constructivist perspective. The methods represent different understandings of knowledge exchange, which can be understood in terms of linear, cyclical, and co-production models. In addition, the evaluation methods use a variety of concepts for the societal value of research, which suggest different relationships with scientific value. While some methods rely on a clear and explicit distinction between the two types of value, other methods, in particular Evaluative Inquiry, ASIRPA, Contribution Mapping, Public Value Mapping, and SIAMPI, consider the mechanisms for producing societal value integral to the research process. We conclude that evaluation methods must balance between demarcating societal value as a separate performance indicator for practical purposes and doing justice to the (constructivist) science studies’ findings about the integration of scientific and societal value of research. Our analytic comparison of assessment methods can assist research evaluators in the conscious and responsible selection of an approach that fits with the object under evaluation. As evaluation actively shapes knowledge production, it is important not to use oversimplified concepts of societal value.

Today, evaluation, audits, and accountability are widespread in society ( Power 2000 ; Dahler-Larsen 2011 ). Evaluating productivity and impact has also become an integral part of scientific practice at all levels—from individuals and research groups to departments, faculties, and universities, and from grants and funding programmes to entire disciplinary fields ( Wilsdon 2016 ). Over the past two decades, and in response to practical needs of science policymakers, research funding bodies, and university administrators, science studies scholars have developed a number of methods to evaluate the societal value of research. Several reviews have compared the functionality of these different methods for evaluating societal value ( Bornmann 2013 ; Penfield et al. 2014 ; Miettinen, Tuunainen, and Esko 2015 ; Greenhalgh et al. 2016 ). Most literature about impact evaluation is user-oriented and driven by user needs. Lacking, however, is critical engagement with the methods and their policy context from a theoretical point of view ( Donovan 2019 ; Muhonen, Benneworth, and Olmos-Peñuela 2020 ; Thomas et al. 2020 ; Williams 2020 ). This is urgent because evaluation methods are not passive instruments but actively steer what counts as good, real, and relevant research ( de Rijcke et al. 2016 ). As instances of valuation, research evaluations produce concepts of value that influence decisions about the research lines of individuals and research groups.

In this article, we analyse the theoretical aspects of 10 societal impact evaluation methods to understand how evaluation differentiates between the scientific and societal value of research. The performative nature of evaluation, and the outpacing of theory by fast developments in the practice of research evaluation, motivates us to scrutinize the conceptual presuppositions of these methods. This critical review aims to contribute to the required theoretical reflection on impact assessment, which can inform future evaluation practices.

We probe the conceptual reach and limits of these evaluation methods with a framework that is informed by constructivist studies of scientific practice. While Williams (2020) has recently proposed a sociological theory of power to describe research impact assessment, we understand societal value in a more constructivist way. This perspective will draw attention both to the interactions between academic researchers and societal actors and to the role allotted to evaluators in knowledge production. The focus on the practice of research abandons an upfront distinction between the scientific value and the societal value of research. Instead, a constructivist perspective suggests that a difference between the scientific and societal value of research has to be actively produced. We assume that evaluation practices have visibly contributed to such differentiation. An important purpose of evaluating societal value of research is to emphasize and illustrate the contribution that research activities can make to economic progress, societal well-being, or other public goods distinct from an, arguably more internal, epistemic contribution. This explains why methods for evaluating societal value typically position this type of value explicitly as a category separate from scientific value, which is subject to a longer tradition of evaluation ( Wouters 1999 ).

The central research question for this article is: how is the relationship between scientific and societal value of academic research understood and operationalized in different impact evaluation methods? In Section 2, we discuss science studies literature about the relationship between societal and scientific value. In Section 3, we present our analytical framework and its three critical aspects: actors, exchange mechanisms, and the concept of societal value. Background and characteristics of the 10 evaluation methods are introduced in Section 4. Subsequently, in Section 5, we analyse and compare the methods with respect to the theoretical aspects of actors, mechanisms, and concepts. Ultimately, we will reflect on the possibilities for evaluation practices to balance practical (policy) demands for evaluation with the theoretical understanding of research practice.

In this reflective study, we explore the theoretical assumptions underpinning a set of societal impact evaluation methods. Our perspective builds on a variety of historical, sociological, and philosophical studies that have described scientific research as a socio-material practice, embedded in networks of human and non-human actors. In this view, the production of scientific knowledge co-evolves with the establishment of relations with non-academic actors. This implies that scientific and societal value of research are strongly related.

2.1 Societal and scientific value of research

The most common term to refer to evaluation methods addressing societal benefits or societal value of science is probably societal impact evaluation. In line with this, we will use the term societal impact evaluation to refer to our general object of study, but in our analysis of their theoretical presuppositions, we will focus in particular on the concept of societal value . While impact suggests a limited focus on (intended) changes in behaviour or practices, we regard the concept of societal value as a more open and inclusive concept, which can also refer to the appreciation of particular research outcomes that do not involve tangible effects, such as the cultural value of a better understanding of social phenomena. Informed by the approach in valuation studies that takes value itself as a social construct ( Lamont 2012 ), we do not work with a precise circumscription of societal value but rather explore the way research evaluations construct the value of research activities in relation to a variety of actors in society.

Constructivist science studies have understood research primarily in terms of practices and networks, avoiding an a priori distinction between knowledge producers and knowledge users. Knowledge production takes place in translation networks, in which scientific claims start as ‘fictions’ that only develop the status of ‘truth’ if they receive sufficient interest from others ( Latour 1987 ; Stengers 1997 ). Any new scientific claim can be stabilized by gathering a wide variety of allied ‘actants’—including texts, devices, skills, institutions, and humans, from researchers and technicians to industrialists, politicians and activists ( Callon 1997 ). Stabilizing a claim, or making it reliable, thus requires extension of the network. A strict boundary between the inside (‘producers’) and outside (‘users’) of research cannot be easily drawn in this view. Rather, it pleads for an ontologically open stance towards the actors that potentially contribute to the process of knowledge production.

The diversity of actors involved in knowledge production blurs the distinction between the activities of research and its distribution ‘outside of the laboratory’. Current science policies aiming to improve the societal value of research typically concern themselves with the relations between scientific and societal actors. From a constructivist perspective, these relations appear integral to the network that sustains the primary research process. This implies that one cannot distinguish between active producers and passive recipients of knowledge and that the circulation of knowledge is an interactive process, integral to research as practice. This is of course in conflict with the value-free ideal of research and the linear model of innovation. Ideals of a pure and value-free science suggest that it is possible and necessary to separate the ‘disinterested pursuit of truth’ from economic, political, or moral interests and actors. The linear model adds that knowledge exchange runs only from fundamental research to society, eventually causing technological change and economic growth. Many analysts have shown that both perspectives are academic and rhetorical ideals rather than practical realities ( Proctor 1991 ; Edgerton 2004 ; Godin 2006 ; Douglas 2014 ). For example, research categories like ‘basic’ and ‘applied’, or ‘mode-1’ and ‘mode-2’, do not really describe different methodologies but rather mirror political issues with respect to the practical organization of research ( Godin 1998 ; Shinn 2002 ; Hessels and van Lente 2008 ; Kaldewey and Schauz 2018 ).

Constructivist science studies instead understand fundamental research itself as a value-laden practice that is the result of long chains of translations, flowing in both directions between science and society. In this perspective, the reliability or value of scientific claims partly depends on the heterogeneity and extension of the translation network. This view might induce the relativist reproach that this makes the credibility of knowledge dependent on the views and fads of industry, politicians and society ( Sismondo 2017 ; Lynch 2017 ). The issue at stake, however, is not a choice between scientific or societal value. The constructivist premise is instead that scientific and societal values rely on similar actor networks. The process and results of evaluation could contribute to the understanding of which relations enable, prohibit, or blind the orientation and exchange of research to society.

2.2 Evaluation of research

Evaluations, assessments, and indicators have become widespread in the academic world in the last three decades as a part of a broader trend of accountability in public administration ( Dahler-Larsen 2011 ). Compared to the evaluation of the economic and scientific value of research, evaluating the societal value of research is a recent phenomenon. Economists have modelled and measured the economic value of scientific research from the 1950s onwards, to account for observed growth rates of productivity ( Godin and Doré 2004 ). The scientific value of science has been rendered evaluable by the field of scientometrics, which emerged in the 1960s, and focused on the circulation of knowledge within the scientific community ( Wouters 1999 ). Scientometrics has strongly contributed to the production of a concept of scientific value linked to citation scores of articles, journals, and ultimately, individuals. Evaluations of the societal value of research have been developed in rudimentary forms since the 1980s in North-American and European public R&D funding programmes and have existed since 2000 as a systematic practice in various countries ( Sand and Toulemonde 1993 ; Bozeman and Sarewitz 2011 ; de Jong, Smit, and van Drooge 2016 ; Smith et al. 2020 ). Today, the societal value of research is particularly prominent in science policies of a selection of high-income, European and North-American countries.

In response to the increasing demand from policymakers and society for (evidence of) valuable research, many reviews of the various assessment methods have appeared that aim to provide ‘the basis for the development of robust and reliable methods of societal impact assessment’ ( Bornmann 2013 ). Reliable evidence of societal value matters to the research community to ‘justify expenditure, showcase our work and inform future funding decisions’ ( Penfield et al. 2014 ). In this body of review literature, two contradictory types of problem formulation can be distinguished. Ironically, most review articles problematize either the lack of standardization or the lack of heterogeneity in evaluation methods. Several reviewers observe that there exists no shared accepted framework for evaluation of societal value and that routine capture of data is (therefore) also lacking ( Penfield et al. 2014 ; Miettinen, Tuunainen, and Esko 2015 ). Some explain this by reference to the relative ‘infancy’ of the research field, compared to scientometrics ( Bornmann 2013 ; De Silva and Vance 2017 ). Other reviewers stress instead that existing evaluation practices of societal value lack heterogeneity. Many of them disapprove of the overemphasis on economic impact ( Bornmann 2013 ; Miettinen, Tuunainen, and Esko 2015 ). Some also criticize the dominance of STEM models of ‘good’ research and the focus on short-term proximal value ( Molas-Gallart 2015 ; Greenhalgh et al. 2016 ; Reale et al. 2018 ). These reviewers argue that more sophisticated or additional approaches need to be developed that take into account the heterogeneity of societal value per discipline, especially for research in arts, humanities and social sciences ( Budtz Pedersen, Grønvad, and Hvidtfeldt 2020 ). This is in line with studies that point out that the relations of research with societal actors differ fundamentally between and even within research fields ( Hessels et al. 2011 ).

The authors of most of these reviews provide practical guidance regarding the most appropriate methods or novel approaches to capture societal value by either more diverse or more standardized techniques. For example, Penfield et al. (2014) conclude with a ‘mixed method’ approach to pull all data together; Molas-Gallart (2015) proposes a special methodology to gather evidence for value generated by arts and humanities research; Miettinen, Tuunainen, and Esko (2015) develop a framework for qualitative analysis in distinction to economic, quantitative, and constructivist approaches; De Silva and Vance (2017) suggest that an integration of altmetrics and bibliometrics could provide a ‘near-complete picture of the impact of scientific research’. The authors of one review embrace precisely the lack of standardization and make it their goal to ensure that ‘the most appropriate [method] is selected’ for different situations ( Greenhalgh et al. 2016 ).

Overall, the focus in these reviews is on the policy demands for, and practical requirements of, evaluation of the societal value of research. To date, there is little theoretical analysis or conceptual comparison of the concepts of societal value in evaluation practice ( Muhonen, Benneworth, and Olmos-Peñuela 2020 ; Thomas et al. 2020 ). This is problematic, however, given the active, performative role of evaluation in knowledge production and the steering effects of metrics and indicators on scientific practice ( de Rijcke et al. 2016 ; Wilsdon 2016 ). Evaluation methods embody an implicit or explicit theory of excellence, or ‘good’ research. These theoretical assumptions become most visible when researchers behave strategically in response, but they also influence the activities and priorities of researchers in more subtle ways. For example, the inclusion of a societal impact criterion in research assessment redefines what counts as valuable research ( Oancea 2019 ).

The importance of conceptual presuppositions in evaluation methods can be illustrated by the fierceness and diversity of recent criticism of the impact criterion in the Research Excellence Framework (REF) in the UK ( Smith et al. 2020 ). Scholars have criticized both the ‘implicit optimism’ of the societal impact agenda and its overemphasis on the extraordinary. Derrick et al. plead for the study of ‘grimpact’, extreme examples of negative impact, to question the dominance of non-controversial, economic, and indisputably good versions of impact ( Derrick et al. 2018 ). In addition, Savigny (2019) showed how impact practices, and public engagement in particular, are infused with raced and gendered norms and expectations. Sivertsen and Meijer argue for looking beyond the rare instances where existing or new interactions have unexpected widespread implications. Instead, they call attention to ‘normal’ impact, which follows from everyday active, productive, and responsible relations between academic and other organizations for the conduct of research ( Sivertsen and Meijer 2020 ). It is these kinds of assumptions in evaluation methods about the concepts and mechanisms of the societal value of research that shape the researchers’ responses about, and possibly the practical relations to, societal actors.

To the best of our knowledge, no analytical framework exists for a systematic comparison of the conceptual foundations of evaluation methods dealing with societal value. To fill this gap, we have designed an analytical framework based on a constructivist approach to research and evaluation. The framework consists of four key aspects that enable the comparison of evaluation methods, in particular with respect to the relation between scientific and societal value.

First, we will compare the types and roles of actors that are considered part of the production of knowledge in different methods. According to constructivist science studies, nobody, or even no thing, can be ruled out as a relevant actor in knowledge production. Evaluation can do justice to this insight by adopting an ontologically open attitude to the question ‘who is doing science, after all?’ ( Latour 1987 ). In the practice of evaluating publicly funded research, there is of course a special interest in the role of the staff of the research institution, group, or project that is being assessed. The issue at stake is how evaluation methods balance their interest in the evaluated actors with connected networks of heterogeneous actors.

Second, we will compare the interaction mechanisms that different methods presume fundamental to the creation of societal value . Building on the literature about knowledge utilization, transfer, and exchange ( Jacobson 2007 ; Best and Holmes 2010 ; Ward, House, and Hamer 2017 ), we distinguish three different understandings of the exchange mechanism. Linear models allocate a central place to research in relative isolation from society. Results are subsequently transferred to external parties, who function only as consumers or users. Cyclical models (also named relational or interaction models) describe the importance of recurrent, reciprocal and sometimes highly structured interactions between researchers and external agents for the agenda-setting, production, and dissemination of research. Co-production models (also named system, integration or dynamic models) point to a breakdown of the hierarchy between producers and users, and instead de- and prescribe participatory processes of research in which academic and non-academic actors are both actively involved.

Third, we will compare the concepts of societal value covered by different assessment methods. As we indicated above, we take the societal value of scientific research to be a social construct. Its meaning depends on its use in policy situations, scientific practices, and societal contexts. In actual evaluations, available time and data can restrict the scope of the societal value concept. We will use an existing typology of three uses of the term ‘societal impact’ in evaluation that correspond to distinctive moments in the exchange process ( Bornmann 2013 ; De Jong et al. 2014 ). Impact appears as product in terms of knowledge with potential value (also known as output); as use of this product by stakeholders (also known as outcome); and finally, as the benefits that follow from this use.

Fourth, we will use the three characteristics of actors, mechanisms and concepts to compare the views on the relationships between scientific and societal value of scientific research. Do assessment methods rely on an integrated concept of scientific and societal value or on a separation between these two? Our constructivist position implies that both scientific and societal value are generated in complex processes that can overlap to a large extent. Theoretically, one could argue for one integrated evaluation process of the value of scientific research, including all engaged actors and exchange mechanisms. Practically, most of the evaluation methods that we discuss here have been proposed in addition to a strongly institutionalized evaluation practice of ‘scientific’ value.

We have selected 10 assessment methods for analysis that all aim to capture the societal value of scientific research for evaluation purposes. The selection is based on scientific database queries as well as comparison of the various reviews of evaluation methods (see Section 2.2). In our selection, we have attempted to achieve a reasonable coverage of the diversity in terms of the technical approaches (type of evaluation and data used). Our geographic and disciplinary focus is limited to regions and research fields with advanced science policies and evaluation practices. This makes it possible to critically explore the theoretical consequences of the link between policy, science studies, and scientific practice (see Table 1).

Overview of the assessment methods reviewed in this article

| Method | Evaluation type | Level of analysis | Qualitative data | Quantitative data | Original context |
| --- | --- | --- | --- | --- | --- |
| Payback Framework | Ex post; summative | Programme | Documents, interviews, surveys | | UK medical research |
| Science and Technology Human Capital | Ex post; formative | Research group or programme | Interviews, surveys, diaries, resumes, contracts | Citation and patent patterns | US STEM research |
| Public Value Mapping | Ex ante and ex post; formative | Programme or organization | Case studies, documents, surveys, focus groups, expert opinions | Indicators | US science policy |
| Monetisation | Ex post; summative | Programme or system | | Measures of investment and (health) gains | UK medical research |
| Flows of Knowledge | Ex post; summative | Programme | Case studies, documents, interviews, surveys, focus groups | Bibliometrics | UK research council funding |
| SIAMPI | Ex ante and ex post; formative | Project, programme, or organization | Case studies | Contextual response analysis and indicators of (im)material interactions | Research institutes (ICT, health, SSH, nano) for European Commission |
| Contribution Mapping | Ex post; summative and formative | Project or programme | Interviews with all actors | | Global health sector |
| Impact Narratives (REF) | Ex post; summative | Research group | Structured case studies, (user) expert opinions | Indicators for causal impact | UK assessment of university research (REF) |
| ASIRPA | Ex post; summative | Programme or organization | Standardized case studies | Econometric, bibliometric and statistical methods | French public agricultural research institute |
| Evaluative Inquiry | Ex post; formative | Research group or organization | Documents, interviews, workshop | Contextual scientometrics, contextual response analysis | Dutch assessment of university research (SEP) |

The methods are ordered chronologically based on their key publications.
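As an illustration of how the comparison in Table 1 could support method selection in practice, the sketch below encodes a few rows as simple records and filters them by evaluation type. The encoding and class name are ours, the attribute strings are paraphrased from the table, and nothing here is part of the reviewed methods themselves.

```python
# Minimal sketch: a few rows of Table 1 as records, so an evaluator could filter
# candidate methods by attributes such as formative use. Attribute strings are
# paraphrased from the table; the data structure itself is only an illustration.
from dataclasses import dataclass

@dataclass
class AssessmentMethod:
    name: str
    evaluation_type: str      # e.g. "ex post; summative"
    level_of_analysis: str
    original_context: str

methods = [
    AssessmentMethod("Payback Framework", "ex post; summative",
                     "programme", "UK medical research"),
    AssessmentMethod("Public Value Mapping", "ex ante and ex post; formative",
                     "programme or organization", "US science policy"),
    AssessmentMethod("SIAMPI", "ex ante and ex post; formative",
                     "project, programme, or organization",
                     "research institutes for the European Commission"),
    AssessmentMethod("Impact Narratives (REF)", "ex post; summative",
                     "research group", "UK assessment of university research"),
]

formative = [m.name for m in methods if "formative" in m.evaluation_type]
print("Methods with a formative component:", formative)
```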

Below, we introduce the reviewed evaluation methods. For each method, we highlight its context of origin, disciplinary focus and scope of data collection. We also introduce the actors, mechanisms, and concept of societal value each method includes, the three theoretical aspects on which we will compare the different methods in Section 5. We refer to some key publications and supporting literature for each method.

The Payback Framework was developed in the UK healthcare context for the specific aim of describing the wider social benefits that follow from providing evidence for policy. Subsequently, the framework has been used for, and further developed in, the evaluation of national funding programmes, also in the social and biomedical sciences. Various qualitative data sources are mobilized to analyse paybacks, such as surveys, interviews, and policy documents ( Klautzer et al. 2011 ). The framework consists of a cyclical model of seven stages of research, from the inception of a research idea to the final societal outcomes, with several feedback loops to avoid the suggestion of linearity ( Donovan and Hanney 2011 ). The Payback Framework emphasizes policymakers as main recipients of research, and it includes two interfaces for interaction between researchers, policymakers, and potential users of research, namely in the first stage of ‘project specification’ and the intermediate stage of ‘dissemination’. The value of the research process is expressed in terms of ‘paybacks’, or ‘impacts’, which are classified in five dimensions that correspond with different steps in the model. Klautzer et al. (2011) generalized these impacts to make them applicable also beyond the health sector: knowledge (e.g. academic publications), impacts on future research (e.g. training new researchers), impacts on policy (e.g. at national level or within organizations), impacts on practice (e.g. cost savings in health), and broader social and economic impacts (e.g. commercial spin-offs or public debate).

Science and Technology (S&T) Human Capital has been developed to emphasize the ‘socially embedded nature’ of knowledge production and exchange. Based on Bourdieu’s concept of social capital, this approach focuses on the growth of capacities and capabilities of individuals and, by addition, of groups and projects ( Bozeman, Dietz, and Gaughan 2001 ). Human capital is operationalized by taking stock of individual career trajectories and the different types of knowledge acquired in that process (including tacit, craft, and know-how), as well as the productive social networks that sustain the creation of knowledge. The value of research is described in terms of human and social capital increase. This requires quite specific kinds of data: from activity diaries, resumes, and interviews (to trace individual capacities), to contracts, citation, and patent patterns and questionnaires (to map the social network). The main actors are mobile scientists and engineers, who as such embody knowledge exchange between academia and other environments like start-ups, firms, or other universities. Ultimately, this ‘holistic’ approach has to speak to actors’ own perception of the interconnected nature of all aspects of their work: the scientific, commercial, and social value of research depend on ‘the conjoining of equipment, material resources (including funding), organizational and institutional arrangements for work, and the unique S&T Human Capital embodied in individuals’.

Public Value Mapping ( PVM) was created to highlight the non-economic value of STEM research in US federal and state funding programmes and to align science policy with more diverse public values ( Bozeman 2003 ). This approach builds on the previous method as well as on pragmatist theory. The mapping exercise starts with a case study consisting of the identification of relevant public values through document research and opinion polls. These values are subsequently hypothetically linked to research outputs so that these linkages can be empirically tested with indicators of social impacts ( Bozeman and Sarewitz 2011 ). The latter are defined as the extent to which research contributes to broad social goals, a process in which ‘knowledge value collectives’ play an important role. These collectives include many parties, such as funding agents, end users, citizen groups, and commercial parties. The presupposition is that the production of research oriented at public values and its translation into uses takes place in contact with these broader collectives. In the PVM approach, value of knowledge, ultimately, consists in its use within the collective.

Monetisation methods offer abstract evaluations of the societal value of research investments, in terms of economic returns. The evaluation method was first developed in the context of the UK health system and is mainly used in the highly institutionalized field of medical research. Societal value is defined as improvements to healthcare—in terms of cost-reduction or an increase in ‘quality adjusted life years’ (QALY) ( Glover et al. 2014 ). For example, one can calculate the societal value of cardiovascular research in Canada by relating public investments in this type of research to estimates of the QALY for all the unique users of the different treatments ( de Oliveria et al. 2013 ). The Monetisation method only works with input and output indicators for funding and health gains and contains no explicit understanding of interaction mechanisms that produce societal value, apart from the linear chain from research to treatment to QALY increase. Similarly, no concrete actors are defined explicitly.
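To make the monetisation logic concrete, here is a minimal worked example with entirely hypothetical figures (they are not taken from the UK or Canadian studies cited above): public investment is related to health gains expressed in QALYs and a monetary value assigned per QALY.

```python
# Minimal sketch of a monetisation-style calculation with hypothetical numbers.
public_investment = 50_000_000   # research spending over the period (hypothetical)
qalys_gained = 4_000             # QALYs attributed to resulting treatments (hypothetical)
value_per_qaly = 50_000          # monetary value assigned to one QALY (hypothetical)

monetised_benefit = qalys_gained * value_per_qaly
return_per_unit_invested = monetised_benefit / public_investment

print(f"Monetised health gain: {monetised_benefit:,}")
print(f"Return per unit of public investment: {return_per_unit_invested:.2f}")
```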

The Flows of Knowledge approach was developed for the evaluation of research council programmes that funded, for example psychology and mathematics research in the UK. It is inspired by a ‘linkage-and-exchange’ model that was used for Canadian health services research ( Meagher, Lyall, and Nutley 2008 ). Flows of Knowledge is a multi-method approach including document research, surveys, case studies, semi-structured interviews, occasional bibliometrics and focus groups. It distinguishes between researchers, practitioners, policymakers, and private enterprises and especially highlights institutional and individual intermediaries (from funders and media to consultants and PhDs) in the process of societal value creation. Arrows of interaction generally flow both ways, emphasizing the point that long-term relationships of mutual respect, iterative dialog, and reciprocal benefits are an important proxy for non-academic impact ( Meagher and Lyall 2013 ). Over the years, the concept of impact in the Flows of Knowledge method was diversified to five types (instrumental, conceptual, capacity, cultural, and connectivity) that could be realized by 27 different mechanisms ( Meagher and Martin 2017 ).

SIAMPI (Social Impact Assessment through Productive Interactions) originated in a European funded project for which the production of societal value was studied at research institutes and departments in various European countries and disciplines ( Spaapen and Van Drooge 2011 ). The premise of this approach is that one should not study the impact of research but the processes that function as proxy for impact. Productive interactions are exchanges (direct as well as mediated by material carriers or money) between researchers and stakeholders in which knowledge is produced and stakeholders make an effort to use this. The method prescribes to use field-specific quantitative indicators and qualitative data from case studies. Although this approach takes researchers as the main actors that produce knowledge, it considers a wide range of stakeholders to be a part of this process, including researchers in neighbouring fields, industry, public organizations, the government, and the general public. The SIAMPI approach does not make a clear distinction between productive interactions and societal impact ‘because the transition from interaction to impact is often gradual’ ( Spaapen and Van Drooge 2011 : 212).

Contribution Mapping was first used as a learning tool in the context of global health research and focuses on contributions and processes in order to avoid the overemphasis on impact and knowledge producers in other methods. This approach is inspired by actor-network theory and builds on the Payback Framework . To understand how research leads to action for health, Contribution Mapping focuses on the way users collect and combine knowledge ( Kok and Schuit 2012 ). ‘Process maps’ are iteratively produced, from document analysis and interviews with researchers, potential key users, and other stakeholders, and used for improvement and accountability. Researchers and linked actors (practitioners, policymakers, participants, patient group representatives, opinion leaders) are considered equally involved in collective translation efforts that lead to knowledge utilization. Each of these actors can undertake ‘alignment efforts’ to enhance the likeliness that research contributes to action, for example by engaging linked actors in priority-setting or data interpretation. In Contribution Mapping , societal value is defined in terms of contributions to actor scenarios to stress that the role and meaning of research outcomes also depend on the context of users.

The Impact Narratives method has been developed as part of the Research Excellence Framework (REF) for UK higher education institutes, which included impact as a criterion of assessment for the first time in 2014. This narrative case study approach is based on an elaboration of the Payback Framework with an additional impact rating scale focused on interactions with end users ( Samuel and Derrick 2015 ). The Impact Narratives method of REF is applicable to all disciplines and based on expert review of the case studies. Research units produce exemplary narratives in which they causally relate high quality research to impact on societal stakeholders within detailed timeframes. This presupposes a rather linear process of exchange from clearly defined producers of knowledge to users outside of academia. Impact is defined as ‘an effect on, change or benefit to the economy, society, culture, public policy or services, health, the environment or quality of life, beyond academia’. A large diversity of potential beneficiaries is identified by the researchers themselves ( Grant and Hinrichs 2015 ). Impact also includes ‘reduction or prevention of harm, risk, cost or other negative effects’. Ultimately, this broad definition is assessed according to two outcome-based criteria: ‘significance’ (meaning intensity of the effect) and ‘reach’ (meaning the spread amongst relevant constituencies).

ASIRPA (‘Socio-Economic Analysis of the Impact of Public Agricultural Research’) was developed as an ex post approach for assessing the socio-economic impact of public-sector research organizations, in particular the French National Agricultural Research Institute ( Joly et al. 2015 ). Inspired by actor-network theories of innovation, this method focuses on the process of impact generation. The analysis consists of identifying ‘the chain of translations’, two-way processes that also transform problems, knowledge, and goals ( Matt et al. 2017 ). The method consists of qualitative case studies, for ‘thick description of specific situations’, which are streamlined with the standardized outline of impact pathways so as to enable comparison of the diversity and amplitude of impact. These case studies include all the actors that happen to be involved in the different phases of an impact pathway according to a prescribed classification of academics, firms, extension agencies, public institutions, media, and farmers. Intermediaries, whether people, organizations or artefacts, play an important role in the mechanisms of exchange. For example, technological objects contribute to the dispersion of knowledge from first users to massive utilization. ASIRPA uses as its concept of impact ‘direct and indirect effects of the various components of research on the economy, environment, health, etc. … generated by lengthy and complex processes’ ( Matt et al. 2017 ). From 32 case studies, the ASIRPA authors extract four different impact ideal types based on the involvement of, and effects on, users.

Evaluative Inquiry is a recent approach to the evaluation of research groups and institutes that was developed in the Dutch university research assessment context. Building on PVM , SIAMPI , and ASIRPA , it consists of a mixed-methods approach, including contextualized scientometrics, productive interactions, and impact pathways, tailored to specific research units and evaluation purposes. The methods used may represent research ‘numerically, verbally, and/or visually in ways that make visible the complexity of actual practice and its engagements’ ( de Rijcke et al. 2019 ). The authors of this approach identify networks of people, infrastructures, technologies, and resources as collectives and understand academic achievement as distributed over a host of academic and non-academic actors. Impact is an effect of translations within and between networks of actors that make up academic research and its environments. This method aims to do justice to the complexity and heterogeneity of research practices. Indicators are therefore not pre-defined but aligned with the work and mission(s) of the unit under assessment. Together, the actors under evaluation and the Evaluative Inquiry analysts determine the scope of actors and mechanisms, and the meaning of impact concepts. Evaluative Inquiry represents an explicit ambition to move away from a detached, clear delineation of academic value and to overcome the divide between the academic and the societal.

Table 2 summarizes our analysis of the 10 evaluation methods in terms of their main conceptual assumptions. We will elaborate on this in the following sections.

Comparison of the main theoretical assumptions of 10 evaluation methods

| Method | Actor roles | Interaction mechanisms | Concept of societal value | Relationship societal–scientific value |
| --- | --- | --- | --- | --- |
| Payback Framework | Policymakers and professionals as contractors, agenda-setters and users | 7 stages with interfaces and feedback | Successively as products for, use by or benefits to research, policy, (health) practice and economy | Distinctive, successive categories |
| Science and Technology Human Capital | Scientists and engineers as producers and carriers of knowledge | People mobility | Increase in human capital | Embodied |
| Public Value Mapping | Institutional, social and economic ‘end users’; ‘knowledge value collectives’ as translators of research to new uses | Knowledge value collectives | Tracked backwards from public benefits to societal use and research outcome | Integrated |
| Monetisation | Clinicians as users, patients as beneficiaries | Linear chain | Improvements to healthcare | Implicitly connected |
| Flows of Knowledge | Practitioners and policymakers as specific users; organizations and individuals as intermediaries | Dynamic process of iterative dialogue and reciprocal benefits | 5 types of impact (instrumental, conceptual, capacity, cultural and connectivity) | Distinctive categories |
| SIAMPI | Actors from science, industry, government and non-profits as stakeholders in knowledge use | Productive interactions | (productive interactions) | Not clearly distinguishable |
| Contribution Mapping | Scientific and societal actors (including organizations, objects) engaged in priority-setting, proposal selection; producing, combining and using knowledge | Alignment | Contribution to actor scenarios | Integrated |
| Impact Narratives (REF) | Non-academic actors from society, economy, culture and public policy as (potential) beneficiaries | Linear exchange | Effect, change or benefit beyond academia | Causally related |
| ASIRPA | Academic, economic, knowledge transfer and governmental actors as part of research production and, with media and farmers, as intermediaries and beneficiaries; also objects as intermediaries | Translation networks and iterative learning processes | Effects on economy, environment, health, etc. | Integrated |
| Evaluative Inquiry | Networks of people, technologies and resources connected to research units enable achievement of academic and societal value | Translations within and between networks | Not pre-defined | Integrated |

5.1 Roles of actors in research production, exchange, and evaluation

Unsurprisingly, all methods include researchers as primary actors involved in the process of knowledge production. But evaluation methods construe different objects of evaluation by dealing in diverse ways with other actors. Some methods focus only on the process of knowledge exchange or impact generation. This turns the research process effectively into a black box from which non-scientific actors are excluded. This is the case for example with Impact Narratives , Flows of Knowledge , and S&T Human Capital . The object of evaluation in these cases is impact in terms of the dissemination, use and effects of research results. The roles of non-academic actors tend, accordingly, to be limited to user or beneficiary. In other methods, like Payback , PVM , SIAMPI , Contribution Mapping , ASIRPA , and Evaluative Inquiry , knowledge exchange is situated as one part of a larger process of knowledge production. Such methods construct a more complex object of evaluation as they include the entire process of research in their understanding of impact, allowing more diverse roles for non-academic actors.

Many methods reproduce institutional distinctions in their perspective on relevant actors and their roles (for example, particular investigators or research groups funded by a specific project or part of a public institution) and track ‘forward’ from knowledge production to eventual impacts. Relevant actors are in most cases then identified by the researchers themselves. Few methods work the other way around, tracking ‘backwards’ from societal change to knowledge production: Monetisation starts at health benefits and works backwards to public investments, while Contribution Mapping offers backtracking as one of multiple strategies. PVM adopts backtracking as its central methodological approach, testing hypotheses about connections between public values and previous research contributions. In this way, PVM analysts construct evaluation objects by identifying relevant political, societal, and research actors as part of one collective connected to certain values and a field of knowledge. Similarly, SIAMPI uses ‘productive interactions’ to focus on the process of interaction, from which researchers emerge not as detached first movers but as one stakeholder amongst other actors.
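
To make the backtracking logic of Monetisation concrete, the sketch below shows one way such a calculation could look: a monetised health benefit is attributed in part to a body of research and compared with the public money spent on it. The function name, parameters, and figures are hypothetical illustrations, not values or formulas taken from the monetisation studies cited here.

```python
# Illustrative sketch of the Monetisation logic: start from a monetised societal
# benefit (e.g. the net value of health gains) and work backwards to the public
# research investment that plausibly contributed to it. All figures and the
# attribution share are hypothetical placeholders, not values from the cited studies.

def monetised_return(health_gain_qalys: float,
                     value_per_qaly: float,
                     delivery_costs: float,
                     research_spend: float,
                     attribution_share: float) -> float:
    """Return a simple benefit-cost ratio for a body of research."""
    net_health_benefit = health_gain_qalys * value_per_qaly - delivery_costs
    attributable_benefit = net_health_benefit * attribution_share
    return attributable_benefit / research_spend

# Hypothetical example: 50,000 QALYs valued at 25,000 each, 300m delivery costs,
# 400m of public research spend, 25% of the gain attributed to that research.
ratio = monetised_return(50_000, 25_000, 300e6, 400e6, 0.25)
print(f"Benefit-cost ratio: {ratio:.2f}")
```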

Most methods include users as informants, for example in case studies about specific impacts (with the exception of Monetisation and S&T Human Capital). Only some methods also allow non-research actors to design the evaluation process itself, for example by identifying relevant actors and setting assessment criteria. Bozeman (2003), who designed PVM, considers ‘knowledge users the proper evaluators’, and the authors behind Contribution Mapping state that ‘the roles and functions of those involved in the evaluation are not predetermined’ but a topic of discussion at an early stage (Kok and Schuit 2012). Similarly, in the Evaluative Inquiry approach, ‘audiences are seen not only as (co)producers of knowledge and its impact, but also as (co)producers of the criteria by which such impact is to be evaluated’ (de Rijcke et al. 2019).

A final difference that emerges from the comparison of actor roles is the importance assigned to intermediaries in relation to theories of knowledge production. Three methods explicitly include intermediaries or brokers as key actors in the production and/or exchange of knowledge: PVM, Flows of Knowledge, and ASIRPA. This centrality of intermediary actors cannot be explained by a shared theoretical framework, which suggests that a variety of methodological perspectives supports the importance of intermediaries, or knowledge brokers, in the impact process. When we look, on the other hand, at four methods that do share an explicit theoretical commitment, in this case to actor-network theory, we find that they try to include materials or non-human actors in their model of knowledge production and exchange: ASIRPA (technological objects); SIAMPI (material ‘carriers’ such as texts, exhibitions, models or films); Contribution Mapping (non-human actors in actor scenarios); and Evaluative Inquiry (research outcomes). It remains unclear to what extent this non-human agency amounts to more than a semantic twist to the impact narrative.

5.2 Mechanisms of knowledge exchange

In our sample, we have identified instances of all three mechanisms of knowledge exchange. Two methods work with a linear model of knowledge exchange: Impact Narratives and Monetisation. In their approach, knowledge users feature mainly as recipients of knowledge rather than as active co-producers. These methods have in common that they were designed for summative, rather than formative, purposes. Five methods fit a cyclical model, emphasizing the feedback mechanisms between the production and the application of knowledge. The authors of these methods refer to this with terms such as ‘feedback loop’ (Payback), ‘non-linear pathway’ (ASIRPA), and the ‘churn model’ (PVM). Finally, two methods maintain a co-production model to explain both the production and exchange of knowledge, which allocates more agency to users and intermediaries.

Examples of cyclical knowledge exchange models with several feedback loops are the Payback Framework, Flows of Knowledge, ASIRPA, and SIAMPI. The Payback Framework consists of a detailed model with a number of specific feedback paths: arrows run from outputs, adoption, and final outcomes back to topic identification, project specification, inputs to research, and secondary outputs. Flows of Knowledge harbours an ‘indirect, non-linear’ understanding of research impact and considers bi-directional knowledge flows between researchers, policymakers, and practitioners. The key assumption of ASIRPA is that the impact of research develops ‘over a non-linear pathway’ in five main steps (inputs, outputs, intermediaries, impact 1, and impact 2). One-directional arrows between these steps suggest linear causality, but the approach explicitly emphasizes iterative learning processes. The actors in SIAMPI likewise mutually influence each other, so that the societal value of research is the result of an iterative process between science, government, industry, and non-profit organizations (Molas-Gallart and Tang 2011). When productive interactions are deployed as a method, however, the attention typically goes to the effects of productive interactions in society rather than to scientific knowledge (De Jong et al. 2014; Muhonen, Benneworth, and Olmos-Peñuela 2020).
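
As a schematic illustration of what distinguishes a cyclical model from a linear chain, the sketch below encodes a forward sequence of stages plus the feedback paths mentioned above as a small directed graph and checks that they indeed create loops. The stage labels paraphrase the Payback Framework stages as described in the text; this is not the framework's own specification or tooling.

```python
from collections import defaultdict

# Stage labels paraphrased from the text; a simplification, not the official model.
stages = [
    "topic identification", "project specification", "inputs to research",
    "research process", "primary outputs", "dissemination",
    "secondary outputs", "adoption", "final outcomes",
]

edges = defaultdict(list)
for a, b in zip(stages, stages[1:]):   # forward path through successive stages
    edges[a].append(b)

# Feedback paths named in the text: outputs, adoption and final outcomes
# feed back into earlier stages (again, a simplification of the interfaces).
edges["primary outputs"].append("topic identification")
edges["adoption"].append("project specification")
edges["final outcomes"].append("inputs to research")

def has_feedback_loop(graph) -> bool:
    """Return True if any cycle exists, i.e. the exchange model is non-linear."""
    visited, in_progress = set(), set()

    def visit(node):
        if node in in_progress:
            return True
        if node in visited:
            return False
        visited.add(node)
        in_progress.add(node)
        found = any(visit(nxt) for nxt in graph.get(node, []))
        in_progress.discard(node)
        return found

    return any(visit(node) for node in list(graph))

print(has_feedback_loop(edges))  # True: the feedback edges make the model cyclical
```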

Two methods are based on a model of co-production, in which the mechanisms of knowledge exchange remain relatively unspecified. The Contribution Mapping advocates claim that agency in knowledge utilization is distributed across a number of actors and that eventual change cannot be attributed to a single source. The assumption is that changes in action resulting from research are ‘part of evolving, complex, and open systems in which change is continuous, non-linear, multi-directional and difficult to control’ (Kok and Schuit 2012). However, it is not specified how these systems evolve, for example in terms of feedback loops. Evaluative Inquiry emphasizes the distributed nature of knowledge production and the heterogeneity of the actors involved. The authors stress the active role of stakeholders as (co)producers of knowledge and impact, as opposed to a passive role as ‘audiences’ (de Rijcke et al. 2019).

S&T Human Capital deviates from the pack because it does not consider knowledge users at all. As this method does not define societal value in terms of the usage of knowledge, but in terms of capacities, mobility, and careers of researchers, it puts forward a unique perspective on knowledge exchange in which knowledge travels in the people that embody it, rather than by transfer or interaction between different actors.

5.3 Concepts of societal value

Amongst the methods we have analysed, a variety of terms is used to describe the societal value of research: from impact and payback to public value, contributions, and social capital. We will compare these concepts with respect to the threefold characterization of societal value as product, use, and benefit.

S&T Human Capital is the only method that restricts its view of societal value to a product concept: it gauges scientific research by its potential to contribute to non-academic environments in the sense of capabilities moving between spheres. Societal value as the use or uptake of research by stakeholders is difficult to identify in pure form in our sample of methods. The focus on proxies for societal impact in SIAMPI and Contribution Mapping comes closest to the use concept. Although their concepts of societal value do not imply actual use of knowledge products, they do regard societal value as contributions of research to the (potential for) actions of non-academic stakeholders. Lastly, we identified societal value as ultimate societal benefit or effect in three methods: Impact Narratives, Flows of Knowledge, and Monetisation. Each of these methods ultimately rests on indicators of change beyond academia, in policy, economy, or the environment. Note that there is also variety within this benefit concept of societal value: Flows of Knowledge, for example, describes five types of benefits, ranging from the tangible to the intangible.

Three methods mix all three elements in their concept of societal value. In the Payback Framework, societal value successively emerges as product directly from the research process, as use after dissemination and adoption, and as benefit when the final outcome is reached. Interestingly, this is turned around in PVM: starting from a perceived public value (or benefit), the approach tries to establish a plausible link with a research result (or product) via intermediate users of that knowledge. Lastly, ASIRPA includes the three conceptual aspects of societal value in a matrix of four impact ideal types, ordered according to the level of co-production and the degree to which users are affected (Joly et al. 2015). In Evaluative Inquiry, it is left to each evaluation context to choose a suitable definition of societal value, allowing in principle the product, use, and benefit versions of the concept.

5.4 The relation between societal and scientific values

This brings us to the final element of this analysis: the relation between the societal and scientific value of research. Some methods are not explicit about this relation: the two types of value may be implicitly assumed to be similar, or the precise relation between them is not elaborated. Monetisation, for example, relies on the highly institutionalized field of medicine, where there is a widely shared understanding of societal value (improving healthcare practices).

Other methods do make clear distinctions between scientific and societal value. The Payback Framework distinguishes five payback categories, two of which are situated in close proximity to the research process (knowledge and benefits for future research); the other three relate to societal effects (policy, health, and economic benefits). The scholars behind the Flows of Knowledge approach likewise distinguish clearly between the scientific and societal value of research. In the REF assessments, the two are not only clearly separated from each other but also causally related: Impact Narratives have to be based on societal impacts that can be related to research of ‘high scientific quality’. The risk is that this creates blind spots for societal value that arises from ‘mediocre’, or normal, research. This is particularly relevant because it is not at all well established that scientific excellence is a good predictor of societal value (Buxton 2011). For these three methods, it seems that their origin in a practical request from policy has isolated the production of research from its exchange and use.

Finally, there are several methods that incorporate the view that the networks that produce scientific and societal value coincide, at least partly. The SIAMPI authors do not distinguish in a generic way between the scientific and societal value of research. In this approach, the precise relations between the dimensions of scientific value (‘robustness’) and societal value (‘relevance’) depend on the specific field of research (Spaapen and van Drooge 2011). For methods like ASIRPA, Contribution Mapping, PVM, and Evaluative Inquiry, the mechanisms that produce societal value—respectively, chains of translations, alignment efforts, knowledge value collectives, and socio-technical networks—are integral to the research process as such.

5.5 Comparative analysis

When comparing our analyses of the different methods, there seems to be a correlation between the level of aggregation at which an evaluation method approaches the research process and its concept of societal value in terms of products, use, or benefit. The three methods that have a mixed concept (Payback Framework, PVM, ASIRPA) take entire research organizations, fields, or programmes into account. When methods evaluate researchers and their groups in particular, the societal value concepts also remain closer to the research practice (i.e. product or potential use). With respect to the concept of societal value, two approaches stand out: Impact Narratives and S&T Human Capital. Both hold a product concept of societal value, take researchers as the primary actors, and use a linear model of knowledge exchange. All methods that take a broader set of actors and interactions into account have a use, benefit, or mixed concept of societal value.

Figure 1 illustrates our analysis by visualizing two of the four aspects of our analytical framework: the methods’ knowledge exchange model and their understanding of the relationship between scientific and societal values. When comparing the knowledge exchange models with the perspective on scientific versus societal value, we observe an association only between a co-production model and the integration of both values (see Figure 1). Apparently, the conviction that scientific and societal values are strongly related is not compatible with a traditional view of researchers as the primary actors of knowledge production.
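
The same classification can be written down as plain data. The sketch below paraphrases the comparison table and Figure 1 (with S&T Human Capital omitted, because its ‘embodied’ exchange model fits none of the three categories); it is an illustration of the analytical framework, not a tool used in the analysis.

```python
from itertools import groupby

# Each method mapped to (knowledge exchange model, relation between scientific
# and societal value), paraphrased from the comparison table and Figure 1.
methods = {
    "Impact Narratives (REF)": ("linear", "causally related"),
    "Monetisation": ("linear", "implicitly connected"),
    "Payback Framework": ("cyclical", "distinctive categories"),
    "Flows of Knowledge": ("cyclical", "distinctive categories"),
    "SIAMPI": ("cyclical", "not clearly distinguishable"),
    "PVM": ("cyclical", "integrated"),
    "ASIRPA": ("cyclical", "integrated"),
    "Contribution Mapping": ("co-production", "integrated"),
    "Evaluative Inquiry": ("co-production", "integrated"),
}

# Group the methods by their knowledge exchange model (the horizontal axis of Figure 1).
by_model = sorted(methods.items(), key=lambda item: item[1][0])
for model, group in groupby(by_model, key=lambda item: item[1][0]):
    print(f"{model}: {[name for name, _ in group]}")
```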

Figure 1. A classification of the evaluation methods with respect to the relation between scientific and societal values (vertical axis) and the knowledge exchange model (horizontal axis).

The only method that does not use any qualitative data (Monetisation) corresponds with a linear model of knowledge exchange. This suggests that quantitative data can carry a bias towards a linear model, while qualitative data, like interviews, allow (but do not prescribe) non-linear views on knowledge exchange. This does not imply, however, that quantitative data are of no use to evaluation methods that hold cyclical or co-production models of research. For one, we see that metrics and indicators can supplement qualitative data; moreover, alternative ‘contextual’ types of metrics are employed that do not presuppose a linear model of exchange.

When comparing the purpose of the methods with their conceptual principles, we see that most summative methods hold a linear model of knowledge exchange, while formative methods take either a cyclical or a co-production model. But we find no relationship between the purpose of evaluation and the concept of societal value, or the relationship between scientific and societal values, probably because the summative/formative distinction is not equivalent to a product/process distinction (Scriven 1996).

Over the past few decades, a rich set of tools has been developed to measure, compare and assess the societal value of research. The aim of this review article was to analyse how impact evaluation methods relate to, and operationalize, the distinction between scientific and societal value. Our analysis has shown that different methods construct different objects of evaluation and produce a variety of societal value concepts.

First, because of their theoretical starting points, the methods construct different objects of evaluation. Some focus attention selectively on knowledge exchange, dissemination, or impact generation as activities separate from the research process, while others treat knowledge production, translation, and transformation as one integrated process. Second, the methods also construct different stakeholders of scientific research. Some methods produce a strong contrast between academic and non-academic actors by considering societal stakeholders exclusively as the users of the final outcomes of scientific research. Other methods use a more inclusive concept of stakeholders and evaluate a long chain of connected actors, including intermediaries. Finally, we found that the different methods enable the production and articulation of fundamentally different types of value, as product, use, or benefit. Moreover, the societal value that they make visible and comparable relates to scientific value in various ways. While Contribution Mapping and Evaluative Inquiry articulate societal and scientific value in an integrated way, many methods specify societal value as separate from scientific value. These assessment methods reinforce a distinction between two systems of valuation, even though many authors subscribe to the theoretical principles and empirical findings of constructivist science studies.

The analytical and practical distinction between scientific and societal value in many methods is not surprising per se. The methods reviewed here were driven by a need to make societal value more explicit alongside the scientific value of research, which is traditionally more visible in the form of quantitative indicators and peer review assessments. Many of the methods have been developed at the request of funding or policy bodies, which suggests that a well-delineated object of evaluation existed beforehand. Only in some methods is the starting point the process of societal value creation rather than the actors of interest (publicly funded researchers or institutes in most cases). The balance that evaluation methods strike between societal value as a separate performance indicator and as part of research practice will ultimately depend on the purpose of evaluation. Methods whose aim is comparison (in summative evaluations) employ generic approaches with clearly formulated actors and indicators, and a rigid concept of societal value. For purposes of situational learning (in formative evaluations), methods take more tailored approaches. Hybrid approaches, which navigate between the extremes of standardization and specificity, offer interesting alternatives to the dichotomy of summative and formative evaluation (Lau 2016). We believe that an integrated concept of scientific and societal value, adaptable to the local situation of the department, group, or field under study, will encourage doing what we value most instead of doing what counts (Wouters 2017). Producing a strong distinction between scientific and societal value may stimulate researchers to concentrate on well-delineated activities that either yield peer recognition, like scientific publications, or that can be measured by indicators of societal value, like blog posts, patents, or policy reports. Moreover, a strict separation might limit awareness of the heterogeneity of actors and institutions involved in knowledge production. This is especially urgent as intermediary and boundary actors, such as think tanks, consultancy firms, and civil collectives, play increasingly important roles in science and innovation systems (Etzkowitz and Zhou 2017).

Our analysis therefore invites reflection on the position of evaluation itself in the process of knowledge production. Evaluation methods do not simply describe but also prescribe how the societal value of research is produced (de Rijcke et al. 2016). The designers of some methods clearly show an awareness of the ways in which evaluation intervenes in research policy and practice. Monetisation, for example, aims to advocate for future investments by demonstrating good value, while the PVM approach aims to align science policy with more diverse public values. With respect to research practices, methods like SIAMPI, Flows of Knowledge, and Contribution Mapping consider themselves ‘tools of enlightenment’ that could support organizational learning about the conditions for, and obstacles to, societal impact. The vantage point of Evaluative Inquiry, lastly, explicitly takes evaluation itself as knowledge production, ‘transforming evaluators and analysts into collaborators alongside evaluees’ (de Rijcke et al. 2019). Although this co-productive approach of Evaluative Inquiry is unique, an active role of evaluation in knowledge production applies to all assessment methods discussed here.

For that reason, we hesitate to finish this critical review with a set of recommendations as to which evaluation method is most effective. Ultimately, we would advise policymakers and research managers to use evaluation tools that match both the research practice under evaluation and the theoretical convictions about knowledge production, exchange, and translation in that field. This suggestion implies that a discussion about the fundamentals of knowledge production in a particular field or institute has to take place between all relevant actors as part of the evaluation process. We hope that our analytic overview can help policymakers and research managers select the method that best fits their situation, based on a consideration of policy goals, theoretical convictions, and practical constraints (available data, time, and money).

On a more fundamental note, we recommend that all our fellow science studies scholars keep questioning the theoretical assumptions of the policymakers or research managers who ask for methods and tools. Our professional responsibility is to develop methods that are grounded in the theoretical developments of the field, even when they are at odds with direct practical needs. Evaluation methods that combine the different aspects of societal value and align with the way the various actors in knowledge production perceive value do most justice to the practice of research and impact. It is our conviction that these evaluation methods will contribute most to learning processes that improve the societal value of scientific research.

Note that some methods focus entirely on the evaluation of societal value, while others include this variable next to other aspects under evaluation.

The authors gratefully acknowledge Caroline Wagner for many stimulating discussions and constructive feedback. They thank Stefan de Jong, Tjitske Holtrop, Leonie van Drooge, Sarah de Rijcke, Rodrigo Costas, and participants of the CWTS Seminar at Leiden University for helpful comments on earlier drafts. Lastly, special thanks to the anonymous reviewers for their excellent suggestions.

Part of this work was supported by the Netherlands Organisation for Scientific Research NWO [Grant number: 322-69-011]

Conflict of interest statement. None declared.

References

Best A., Holmes B. (2010) ‘Systems Thinking, Knowledge and Action: Towards Better Models and Methods’, Evidence & Policy: A Journal of Research, Debate and Practice, 6: 145–59.
Bornmann L. (2013) ‘What is Societal Impact of Research and How Can It Be Assessed? A Literature Survey’, Journal of the American Society for Information Science and Technology, 64: 217–33.
Bozeman B. (2003) ‘Public Value Mapping of Science Outcomes: Theory and Method’, in Bozeman B., Sarewitz D., Feinson S., Faladori G., Gaughan M., Gupta A., Sampta B., and Zachary G. (eds) Knowledge Flows and Knowledge Collectives: Understanding the Role of Science and Technology Policies in Development, 2, pp. 3–48. Tempe: Consortium for Science, Policy & Outcomes.
Bozeman B., Dietz J. S., Gaughan M. (2001) ‘Scientific and Technical Human Capital: An Alternative Model for Research Evaluation’, International Journal of Technology Management, 22: 716–40.
Bozeman B., Sarewitz D. (2011) ‘Public Value Mapping and Science Policy Evaluation’, Minerva, 49: 1–23.
Budtz Pedersen D., Grønvad J. F., Hvidtfeldt R. (2020) ‘Methods for Mapping the Impact of Social Sciences and Humanities—A Literature Review’, Research Evaluation, 29: 4–21.
Buxton M. (2011) ‘The Payback of ‘Payback’: Challenges in Assessing Research Impact’, Research Evaluation, 20: 259–60.
Buxton M., Hanney S. (1996) ‘How Can Payback from Health Services Research Be Assessed?’, Journal of Health Services Research & Policy, 1: 35–43.
Callon M. (1997) ‘Four Models for the Dynamics of Science’, in Tauber A. I. (ed.) Science and the Quest for Reality, pp. 249–92. London: Macmillan Press.
Dahler-Larsen P. (2011) The Evaluation Society. Stanford: Stanford University Press.
De Jong S. P. L., Barker K., Cox D., Sveinsdottir T., and Van den Besselaar P. (2014) ‘Understanding Societal Impact through Productive Interactions: ICT Research as a Case’, Research Evaluation, 23: 89–102.
De Jong S. P. L., Smit J. P., van Drooge L. (2016) ‘Scientists’ Response to Societal Impact Policies: A Policy Paradox’, Science & Public Policy, 43: 102–14.
de Oliveira C., Nguyen H. V., Wijeysundera H. C., Wong W. W. L., Woo G., Grootendorst P., Liu P. P., and Krahn M. D. (2013) ‘Estimating the Payoffs from Cardiovascular Disease Research in Canada: An Economic Analysis’, CMAJ Open, 1/2: E83–90.
De Silva P. U. K., Vance C. K. (2017) ‘Assessing the Societal Impact of Scientific Research’, in De Silva P. U. K., Vance C. K. (eds) Scientific Scholarly Communication: The Changing Landscape, pp. 117–32. Cham: Springer International Publishing.
De Rijcke S., Wouters P. F., Rushforth A. D., Franssen T. P., and Hammarfelt B. (2016) ‘Evaluation Practices and Effects of Indicator Use—A Literature Review’, Research Evaluation, 25: 161–9.
De Rijcke S., Holtrop T., Kaltenbrunner W., Zuijderwijk J., Beaulieu A., Franssen T., van Leeuwen T., Mongeon P., Tatum C., and Valkenburg G. (2019) ‘Evaluative Inquiry: Engaging Research Evaluation Analytically and Strategically’, Fteval Journal for Research and Technology Policy Evaluation, 48: 176–82.
Derrick G. E. et al. (2018) ‘Towards Characterising Negative Impact: Introducing Grimpact’, in 23rd International Conference on Science and Technology Indicators (STI 2018), September 12–14, 2018. Leiden, The Netherlands: Centre for Science and Technology Studies (CWTS).
Donovan C. (2019) ‘For Ethical ‘Impactology’’, Journal of Responsible Innovation, 6: 78–83.
Donovan C., Hanney S. (2011) ‘The ‘Payback Framework’ Explained’, Research Evaluation, 20: 181–3.
Douglas H. (2014) ‘Pure Science and the Problem of Progress’, Studies in History and Philosophy of Science Part A, 46: 55–63.
Edgerton D. (2004) ‘‘The Linear Model’ Did Not Exist: Reflections on the History and Historiography of Science and Research in Industry in the Twentieth Century’, in Grandin K., Wormbs N., Widmalm S. (eds) The Science-Industry Nexus: History, Policy, Implications, pp. 1–36. Sagamore Beach, MA: Science History Publications.
Etzkowitz H., Zhou C. (2017) The Triple Helix: University-Industry-Government Innovation and Entrepreneurship. London: Routledge.
Glover M., Buxton M., Guthrie S., Hanney S., Pollitt S., and Grant J. (2014) ‘Estimating the Returns to UK Publicly Funded Cancer-Related Research in Terms of the Net Value of Improved Health Outcomes’, BMC Medicine, 12: 99. doi:10.1186/1741-7015-12-99.
Godin B. (1998) ‘Writing Performative History: The New New Atlantis?’, Social Studies of Science, 28: 465–83.
Godin B. (2006) ‘The Linear Model of Innovation: The Historical Construction of an Analytical Framework’, Science, Technology, & Human Values, 31: 639–67.
Godin B., Doré C. (2004) Measuring the Impacts of Science: Beyond the Economic Dimension. Montreal: Canadian Science and Innovation Indicators Consortium.
Grant J., Hinrichs S. (2015) The Nature, Scale and Beneficiaries of Research Impact: An Initial Analysis of Research Excellence Framework (REF) 2014 Impact Case Studies. HEFCE-Higher Education Funding Council for England.
Greenhalgh T., Raftery J., Hanney S., and Glover M. (2016) ‘Research Impact: A Narrative Review’, BMC Medicine, 14: 78.
Health Economics Research Group (HERG), Office of Health Economics, RAND Europe (2008) Medical Research: What’s It Worth? Estimating the Economic Benefits from Medical Research in the UK. London: UK Evaluation Forum.
Hessels L. K., van Lente H. (2008) ‘Re-Thinking New Knowledge Production: A Literature Review and a Research Agenda’, Research Policy, 37: 740–60.
Hessels L. K., van Lente H., Grin J., and Smits R. E. (2011) ‘Changing Struggles for Relevance in Eight Fields of Natural Science’, Industry and Higher Education, 25: 347–57.
Jacobson N. (2007) ‘Social Epistemology: Theory for the ‘Fourth Wave’ of Knowledge Transfer and Exchange Research’, Science Communication, 29: 116–27.
Joly P., Gaunand A., Colinet L., Larédo P., Lemarié S., and Matt M. (2015) ‘ASIRPA: A Comprehensive Theory-Based Approach to Assessing the Societal Impacts of a Research Organization’, Research Evaluation, 24: 440–53.
Kaldewey D., Schauz D. (2018) Basic and Applied Research: The Language of Science Policy in the Twentieth Century. New York: Berghahn Books.
Klautzer L., Hanney S., Nason E., Rubin J., Grant J., and Wooding S. (2011) ‘Assessing Policy and Practice Impacts of Social Science Research: The Application of the Payback Framework to Assess the Future of Work Programme’, Research Evaluation, 20: 201–9.
Kok M. O., Schuit A. J. (2012) ‘Contribution Mapping: A Method for Mapping the Contribution of Research to Enhance Its Impact’, Health Research Policy and Systems, 10: 21.
Lamont M. (2012) ‘Toward a Comparative Sociology of Valuation and Evaluation’, Annual Review of Sociology, 38: 201–21.
Latour B. (1987) Science in Action: How to Follow Scientists and Engineers through Society. Cambridge: Harvard University Press.
Lau A. M. S. (2016) ‘‘Formative Good, Summative Bad?’ A Review of the Dichotomy in Assessment Literature’, Journal of Further and Higher Education, 40: 509–25.
Lynch M. (2017) ‘STS, Symmetry and Post-Truth’, Social Studies of Science, 47: 593–9.
Matt M., Gaunand A., Joly P.-B., and Colinet L. (2017) ‘Opening the Black Box of Impact—Ideal-Type Impact Pathways in a Public Agricultural Research Organization’, Research Policy, 46: 207–18.
Meagher L., Lyall C. (2013) ‘The Invisible Made Visible: Using Impact Evaluations to Illuminate and Inform the Role of Knowledge Intermediaries’, Evidence & Policy: A Journal of Research, Debate and Practice, 9: 409–18.
Meagher L., Lyall C., Nutley S. (2008) ‘Flows of Knowledge, Expertise and Influence: A Method for Assessing Policy and Practice Impacts from Social Science Research’, Research Evaluation, 17: 163–73.
Meagher L. R., Martin U. (2017) ‘Slightly Dirty Maths: The Richly Textured Mechanisms of Impact’, Research Evaluation, 26: 15–27.
Miettinen R., Tuunainen J., Esko T. (2015) ‘Epistemological, Artefactual and Interactional–Institutional Foundations of Social Impact of Academic Research’, Minerva, 53: 257–77.
Molas-Gallart J. (2015) ‘Research Evaluation and the Assessment of Public Value’, Arts and Humanities in Higher Education, 14: 111–26.
Molas-Gallart J., Tang P. (2011) ‘Tracing ‘Productive Interactions’ to Identify Social Impacts: An Example from the Social Sciences’, Research Evaluation, 20: 219–26.
Muhonen R., Benneworth P., Olmos-Peñuela J. (2020) ‘From Productive Interactions to Impact Pathways: Understanding the Key Dimensions in Developing SSH Research Societal Impact’, Research Evaluation, 29: 34–47.
Oancea A. (2019) ‘Research Governance and the Future(s) of Research Assessment’, Palgrave Communications, 5: 27.
Penfield T. et al. (2014) ‘Assessment, Evaluations, and Definitions of Research Impact: A Review’, Research Evaluation, 23: 21–32.
Power M. (2000) ‘The Audit Society—Second Thoughts’, International Journal of Auditing, 4: 111–9.
Proctor R. (1991) Value-Free Science?: Purity and Power in Modern Knowledge. Cambridge: Harvard University Press.
Reale E., Avramov D., Canhial K., Donovan C., Flecha R., Holm P., Larkin C., Lepori B., Mosoni-Fried J., Oliver E., Primeri E., Puigvert L., Scharnhorst A., Schubert A., Soler M., Soòs S., Sordé T., Travis C., and Van Horik R. (2018) ‘A Review of Literature on Evaluating the Scientific, Social and Political Impact of Social Sciences and Humanities Research’, Research Evaluation, 27: 298–308.
REF (2012) Panel Criteria and Working Methods. UK: REF2014.
Samuel G. N., Derrick G. E. (2015) ‘Societal Impact Evaluation: Exploring Evaluator Perceptions of the Characterization of Impact under the REF2014’, Research Evaluation, 24: 229–41.
Sand F., Toulemonde J. (1993) ‘Changing Criteria in the Evaluation of European R&D Programmes’, in Süss W., Becher G. (eds) Politik und Technologieentwicklung in Europa. Analysen ökonomisch-technischer und politischer Vermittlungen im Prozess der europäischen Integration, pp. 237–60. Berlin: Duncker & Humblot.
Savigny H. (2020) ‘The Violence of Impact: Unpacking Relations between Gender, Media and Politics’, Political Studies Review, 18: 277–93.
Scriven M. (1996) ‘Types of Evaluation and Types of Evaluator’, Evaluation Practice, 17: 151–61.
Shinn T. (2002) ‘The Triple Helix and New Production of Knowledge: Prepackaged Thinking on Science and Technology’, Social Studies of Science, 32: 599–614.
Sismondo S. (2017) ‘Post-Truth?’, Social Studies of Science, 47: 3–6.
Sivertsen G., Meijer I. (2020) ‘Normal versus Extraordinary Societal Impact: How to Understand, Evaluate, and Improve Research Activities in Their Relations to Society?’, Research Evaluation, 29: 66–70.
Smith K., Bandola-Gill J., Meer N., Stewart E., and Watermeyer R. (2020) The Impact Agenda: Controversies, Consequences and Challenges. Bristol: Policy Press.
Spaapen J., van Drooge L. (2011) ‘Introducing ‘Productive Interactions’ in Social Impact Assessment’, Research Evaluation, 20: 211–8.
Stengers I. (1997) Power and Invention: Situating Science. Minneapolis: University of Minnesota Press.
Thomas D. A., Nedeva M., Tirado M. M., and Jacob M. (2020) ‘Changing Research on Research Evaluation: A Critical Literature Review to Revisit the Agenda’, Research Evaluation, 29: 275–88.
Ward V., House A., Hamer S. (2009) ‘Developing a Framework for Transferring Knowledge into Action: A Thematic Analysis of the Literature’, Journal of Health Services Research & Policy, 14: 156–64.
Williams K. (2020) ‘Playing the Fields: Theorizing Research Impact and Its Assessment’, Research Evaluation, 29: 191–202.
Wilsdon J. (2016) The Metric Tide: Independent Review of the Role of Metrics in Research Assessment and Management. London: Sage Publications Ltd.
Wouters P. F. (1999) The Citation Culture. Amsterdam: Universiteit van Amsterdam.
Wouters P. F. (2017) ‘Bridging the Evaluation Gap’, Engaging Science, Technology, and Society, 3: 108–18.



Difference between social research and evaluation

▶️  Introduction

Evaluation and social science research are foundational tools for understanding human behavior, institutions, and societal trends. At their core, both methodologies seek to gather and scrutinize information systematically.

However, their objectives and the nuances of their methodologies often diverge. While evaluation primarily focuses on assessing the effectiveness and impact of a specific program, intervention, or initiative, social science research seeks to unravel deeper insights into human behavior, society, and culture.

This article delves into the intricacies of these two distinct yet intertwined fields, shedding light on their unique purposes and approaches.

▶️ How does evaluation differ from social science research?

The term “evaluation” refers to the process of determining the merit, worth, or value of objects. During the evaluation process, relevant values or standards that apply to what is being assessed are identified, empirical inquiry utilizing methodologies from the social sciences is performed, and, finally, findings drawn from the study are integrated with the standards to provide an overall evaluation or collection of evaluations (Scriven, 1991).
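
As a toy illustration of this integration step, the sketch below combines hypothetical empirical findings with explicit standards (weights and minimum acceptable scores) to reach an overall judgement. Every criterion, weight, and score here is an invented placeholder; the point is only the logic of merging standards with evidence, in the spirit of Scriven's description.

```python
# Toy sketch of the evaluation logic described above: empirical findings are
# combined with explicit standards (criteria, weights, and minimum acceptable
# scores) to reach an overall judgement. All names and numbers are hypothetical.

criteria = {          # standard: (weight, minimum acceptable score on a 0-10 scale)
    "reach of the program": (0.3, 5),
    "behaviour change observed": (0.5, 6),
    "cost per participant": (0.2, 4),
}

findings = {          # results of the empirical inquiry, scored 0-10
    "reach of the program": 7,
    "behaviour change observed": 6,
    "cost per participant": 5,
}

weighted_score = sum(w * findings[c] for c, (w, _) in criteria.items())
meets_all_standards = all(findings[c] >= floor for c, (_, floor) in criteria.items())

print(f"Overall score: {weighted_score:.1f}/10; meets all standards: {meets_all_standards}")
```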

In contrast, research in the social sciences neither attempts to reach evaluative conclusions nor succeeds in doing so. It is limited to empirical study (rather than evaluative research), and it draws its findings only from factual outcomes, defined as information that has been observed, measured, or calculated. Research in the social sciences does not first determine criteria or principles and then combine those with the findings of empirical investigations in order to arrive at judgments. In fact, the dominant theory in social science for many decades took great pride in being value-free. As it stands, research in the social sciences does not include evaluation.¹

In fairness to the field of social science research, however, it must be emphasized that very little evaluation can be done outside the framework of social science methodologies. One cannot, on the other hand, assert that evaluation is simply the application of social science methodologies to the resolution of social issues; it goes well beyond that in scope. A broadly accepted way of thinking about how evaluation and research differ comes from Michael Scriven, an evaluation expert and professor.

▶️ Primary differences between evaluation and social science research

Purpose :

  • Evaluation : Primarily conducted to determine the effectiveness or worth of a program, policy, or intervention. Evaluations answer questions like “Is this program achieving its intended outcomes?” or “What aspects of the program are most effective?”
  • Social Science Research : Seeks to understand, explain, or predict human behavior and societal phenomena. Social science research answers questions like “Why do people behave the way they do?” or “What are the underlying causes of a societal trend?”

Utilization :

  • Evaluation : Often used for decision-making about a particular program or policy. The findings might inform adjustments, continuation, or discontinuation of the program.
  • Social Science Research : Often aims to contribute to the broader body of knowledge in a particular discipline. The findings might inform theory, subsequent research, or broader societal understandings.

Stakeholders :

  • Evaluation : Evaluations often have specific stakeholders such as program managers, funders, or policymakers who commission the evaluation and are invested in the results.
  • Social Science Research : While there might be specific funding sources or interested parties, the primary audience is often the academic community or the broader public.

Scope and Generalizability :

  • Evaluation : Typically focuses on a specific program or intervention. Its findings may or may not be generalizable to other settings or populations.
  • Social Science Research : Often seeks to produce findings that are generalizable to broader populations or settings, beyond the immediate sample or context studied.

Methodological Approach :

  • Evaluation : Might be more flexible in methodological approach, blending qualitative and quantitative methods to best address the evaluation questions. Stakeholder input can shape the methods used.
  • Social Science Research : While diverse in methods, there is often a more rigid adherence to methodological standards dictated by the research question and the discipline.

Time Horizon :

  • Evaluation : This can be short-term (e.g., assessing immediate outcomes) or long-term (e.g., assessing impact after several years), but is generally bound by the life cycle of a particular program or intervention.
  • Social Science Research : Can explore phenomena from historical, contemporary, or future-oriented perspectives without being tied to a specific program’s timeline.

Reporting :

  • Evaluation : Reports are often tailored to stakeholders and may emphasize actionable findings and recommendations.
  • Social Science Research : Reporting typically follows the conventions of academic publishing, prioritizing rigorous documentation of methodology and contribution to theoretical discourse.

A useful visualization of this concept was created by John LaVelle.


▶️ Conclusion

While both evaluation and social science research employ systematic methodologies to gather and analyze data, they diverge in purpose, utilization, stakeholders, scope, methodological approach, time horizon, and reporting conventions.

Evaluations typically focus on assessing the effectiveness or value of specific programs or policies and cater to particular stakeholders seeking actionable insights.

In contrast, social science research endeavors to expand the body of knowledge about human behaviors and societal phenomena, aiming for generalizability and contributing to academic discourse.

Recognizing the distinctions between the two is crucial for understanding their individual contributions to knowledge and decision-making, even as their methodologies sometimes intersect.

Despite these differences, there is an overlap between evaluation and social science research. Evaluative methodologies can be applied in social science research, and rigorous research methods can be used in evaluations. Both can provide valuable insights, but their approaches and primary objectives differ.




Key facts about Americans and guns

A customer shops for a handgun at a gun store in Florida. (Joe Raedle/Getty Images)

Guns are deeply ingrained in American society and the nation’s political debates.

The Second Amendment to the United States Constitution guarantees the right to bear arms, and about a third of U.S. adults say they personally own a gun. At the same time, in response to concerns such as  rising gun death rates  and  mass shootings , the U.S. surgeon general has taken the unprecedented step of declaring gun violence a public health crisis .

Here are some key findings about Americans’ views of gun ownership, gun policy and other subjects, drawn from Pew Research Center surveys. 

Pew Research Center conducted this analysis to summarize key facts about Americans’ relationships with guns. We used data from recent Center surveys to provide insights into Americans’ views on gun policy and how those views have changed over time, as well as to examine the proportion of adults who own guns and their reasons for doing so.

The Center survey questions used in this analysis, along with more information about the surveys’ methodologies, can be found at the links in the text.

Measuring gun ownership in the United States comes with unique challenges. Unlike many demographic measures, there is not a definitive data source from the government or elsewhere on how many American adults own guns.

The Pew Research Center survey conducted June 5-11, 2023, on the Center’s American Trends Panel, used two separate questions to measure personal and household ownership. About a third of adults (32%) say they own a gun, while another 10% say they do not personally own a gun but someone else in their household does. These shares have changed little from surveys conducted in  2021  and  2017 . In each of those surveys, 30% reported they owned a gun.

These numbers are largely consistent with  rates of gun ownership reported by Gallup and those reported by  NORC’s General Social Survey .  

The FBI maintains data on background checks on individuals attempting to purchase firearms in the United States. The FBI reported  a surge in background checks  in 2020 and 2021, during the coronavirus pandemic, but FBI statistics show that the number of federal background checks declined in 2022 and 2023. This pattern seems to be continuing so far in 2024. As of June, fewer background checks have been conducted than at the same point in 2023, according to FBI statistics.

About   four-in-ten U.S. adults say they live in a household with a gun, including 32% who say they personally own one,  according to  a Center survey conducted in June 2023 . These numbers are virtually unchanged since the last time we asked this question in 2021.

A bar chart showing that nearly a third of U.S. adults say they personally own a gun.

There are differences in gun ownership rates by political affiliation, gender, community type and other factors.

  • Party: 45% of Republicans and GOP-leaning independents say they personally own a gun, compared with 20% of Democrats and Democratic leaners.
  • Gender: 40% of men say they own a gun, versus 25% of women.
  • Community type: 47% of adults living in rural areas report owning a firearm, as do smaller shares of those who live in suburbs (30%) or urban areas (20%).
  • Race and ethnicity: 38% of White Americans own a gun, compared with smaller shares of Black (24%), Hispanic (20%) and Asian (10%) Americans.

Personal protection tops the list of reasons gun owners give for having a firearm.  About seven-in-ten gun owners (72%) say protection is a major reason they own a gun. Considerably smaller shares say that a major reason they own a gun is for hunting (32%), for sport shooting (30%), as part of a gun collection (15%) or for their job (7%). 

Americans’ reasons behind gun ownership have changed only modestly since we fielded a separate survey  about these topics in spring 2017. At that time, 67% of gun owners cited protection as a major reason they had a firearm.

A horizontal stacked bar chart showing that nearly three-quarters of U.S. gun owners cite protection as a major reason they own a gun.

Gun owners tend to have much more positive feelings about having a gun in the house than nonowners who live with them do.  For instance, 71% of gun owners say they enjoy owning a gun – but just 31% of nonowners living in a household with a gun say they enjoy having one in the home. And while 81% of gun owners say owning a gun makes them feel safer, a narrower majority of nonowners in gun households (57%) say the same. Nonowners are also more likely than owners to worry about having a gun at home (27% vs. 12%).

Feelings about gun ownership also differ by political affiliation, even among those who personally own a firearm. Republican gun owners are more likely than Democratic owners to say owning one gives them feelings of safety and enjoyment, while Democratic owners are more likely to say they worry about having a gun in the home.

Non-gun owners are split on whether they see themselves owning a firearm in the future.  About half of Americans who don’t own a gun (52%) say they could never see themselves owning one, while nearly as many (47%) could imagine themselves as gun owners in the future.

Among those who currently do not own a gun, attitudes about owning one in the future differ by party and other factors.

A diverging bar chart showing that non-gun owners are divided on whether they could see themselves owning a gun in the future.

  • Party: 61% of Republicans who don’t own a gun say they could see themselves owning one in the future, compared with 40% of Democrats.
  • Gender: 56% of men who don’t own a gun say they could see themselves owning one someday; 40% of women nonowners say the same.
  • Race and ethnicity: 56% of Black nonowners say they could see themselves owning a gun one day, compared with smaller shares of White (48%), Hispanic (40%) and Asian (38%) nonowners.

A majority of Americans (61%) say it is too easy to legally obtain a gun in this country, according to the June 2023 survey. Far fewer (9%) say it is too hard, while another 30% say it’s about right.

A horizontal bar chart showing that about 6 in 10 Americans say it is too easy to legally obtain a gun in this country.

Non-gun owners are nearly twice as likely as gun owners to say it is too easy to legally obtain a gun (73% vs. 38%). Gun owners, in turn, are more than twice as likely as nonowners to say the ease of obtaining a gun is about right (48% vs. 20%).

There are differences by party and community type on this question, too. While 86% of Democrats say it is too easy to obtain a gun legally, far fewer Republicans (34%) say the same. Most urban (72%) and suburban (63%) residents say it’s too easy to legally obtain a gun, but rural residents are more divided: 47% say it is too easy, 41% say it is about right and 11% say it is too hard.

About six-in-ten U.S. adults (58%) favor stricter gun laws. Another 26% say that U.S. gun laws are about right, while 15% favor less strict gun laws.

A horizontal stacked bar chart showing that women are more likely than men to favor stricter gun laws in the U.S.

There   is broad partisan agreement on some gun policy proposals, but most are politically divisive. Majorities of U.S. adults in both partisan coalitions somewhat or strongly favor two policies that would restrict gun access: preventing those with mental illnesses from purchasing guns (88% of Republicans and 89% of Democrats support this) and increasing the minimum age for buying guns to 21 years old (69% of Republicans, 90% of Democrats). Majorities in both parties also  oppose  allowing people to carry concealed firearms without a permit (60% of Republicans and 91% of Democrats oppose this).

A dot plot showing bipartisan support for preventing people with mental illnesses from purchasing guns, but wide differences on other policies.

Republicans and Democrats differ on several other proposals. While 85% of Democrats favor banning both assault-style weapons and high-capacity ammunition magazines that hold more than 10 rounds, majorities of Republicans oppose  these proposals (57% and 54%, respectively).

Most Republicans, on the other hand, support allowing teachers and school officials to carry guns in K-12 schools (74%) and allowing people to carry concealed guns in more places (71%). These proposals are supported by just 27% and 19% of Democrats, respectively.

A diverging bar chart showing that Americans are split on whether it is more important to protect gun rights or to control gun ownership.

The public remains closely divided over whether it’s more important to protect gun rights or control gun ownership, according to an April 2024 survey . Overall, 51% of U.S. adults say it’s more important to protect the right of Americans to own guns, while a similar share (48%) say controlling gun ownership is more important.

Views have shifted slightly since 2022, when we last asked this question. That year, 47% of adults prioritized protecting Americans’ rights to own guns, while 52% said controlling gun ownership was more important.

Views on this topic differ sharply by party. In the most recent survey, 83% of Republicans say protecting gun rights is more important, while 79% of Democrats prioritize controlling gun ownership.

Line charts showing that the public remains closely divided over controlling gun ownership versus protecting gun rights, with Republicans and Democrats holding opposing views.

Americans are slightly more likely to say gun ownership does more to increase safety than to decrease it.  Around half of Americans (52%) say gun ownership does more to increase safety by allowing law-abiding citizens to protect themselves, while a slightly smaller share (47%) say gun ownership does more to reduce safety by giving too many people access to firearms and increasing misuse. Views were evenly divided (49% vs. 49%) when we last asked in 2023.

A diverging bar chart showing that men, White adults and Republicans are among the most likely to say gun ownership does more to increase safety than to reduce it.

Republicans and Democrats differ widely on this question: 81% of Republicans say gun ownership does more to increase safety, while 74% of Democrats say it does more to reduce safety.

Rural and urban Americans also have starkly different views. Among adults who live in rural areas, 64% say gun ownership increases safety, while among those in urban areas, 57% say it  reduces  safety. Those living in the suburbs are about evenly split in their views.

More than half of U.S. adults say an increase in the number of guns in the country is bad for society, according to the April 2024 survey. Some 54% say, generally, this is very or somewhat bad for society. Another 21% say it is very or somewhat good for society, and a quarter say it is neither good nor bad for society.

A horizontal stacked bar chart showing that a majority of U.S. adults view an increase in the number of guns as bad for society.

About half of Americans (49%) see gun violence as a major problem, according to a May 2024 survey. This is down from 60% in June 2023, but roughly on par with views in previous years. In the more recent survey, 27% say gun violence is a moderately big problem, and about a quarter say it is either a small problem (19%) or not a problem at all (4%).

A line chart showing that the share of Americans who view gun violence as a major problem has declined since last year.

A majority of public K-12 teachers (59%) say they are at least somewhat worried about the possibility of a shooting ever happening at their school, including 18% who are very or extremely worried, according to a fall 2023 Center survey of teachers. A smaller share of teachers (39%) say they are not too or not at all worried about a shooting occurring at their school.

A pie chart showing that a majority of teachers are at least somewhat worried about a shooting occurring at their school.

School shootings are a concern for K-12 parents as well: 32% say they are very or extremely worried about a shooting ever happening at their children’s school, while 37% are somewhat worried, according to a fall 2022 Center survey of parents with at least one child younger than 18 who is not homeschooled. Another 31% of K-12 parents say they are not too or not at all worried about this.

Note: This is an update of a post originally published on Jan. 5, 2016.


Katherine Schaeffer is a research analyst at Pew Research Center.


Chaos and Confusion: Tech Outage Causes Disruptions Worldwide

Airlines, hospitals and people’s computers were affected after CrowdStrike, a cybersecurity company, sent out a flawed software update.


A view from above of a crowded airport with long lines of people.

By Adam Satariano, Paul Mozur, Kate Conger and Sheera Frenkel

July 19, 2024

Airlines grounded flights. Operators of 911 lines could not respond to emergencies. Hospitals canceled surgeries. Retailers closed for the day. And the actions all traced back to a batch of bad computer code.

A flawed software update sent out by a little-known cybersecurity company caused chaos and disruption around the world on Friday. The company, CrowdStrike , based in Austin, Texas, makes software used by multinational corporations, government agencies and scores of other organizations to protect against hackers and online intruders.

But when CrowdStrike sent its update on Thursday to its customers that run Microsoft Windows software, computers began to crash.

The fallout, which was immediate and inescapable, highlighted the brittleness of global technology infrastructure. The world has become reliant on Microsoft and a handful of cybersecurity firms like CrowdStrike. So when a single flawed piece of software is released over the internet, it can almost instantly damage countless companies and organizations that depend on the technology as part of everyday business.

“This is a very, very uncomfortable illustration of the fragility of the world’s core internet infrastructure,” said Ciaran Martin, the former chief executive of Britain’s National Cyber Security Center and a professor at the Blavatnik School of Government at Oxford University.

A cyberattack did not cause the widespread outage, but the effects on Friday showed how devastating the damage can be when a main artery of the global technology system is disrupted. It raised broader questions about CrowdStrike’s testing processes and what repercussions such software firms should face when flaws in their code cause major disruptions.


How a Software Update Crashed Computers Around the World

Here’s a visual explanation for how a faulty software update crippled machines.

How the airline cancellations rippled around the world (and across time zones)

Share of canceled flights at 25 airports on Friday

[Chart: airports shown include Bengaluru Kempegowda, Dhaka Shahjalal, Minneapolis-Saint Paul, Stuttgart, Melbourne, Berlin Brandenburg, London City, Amsterdam Schiphol, Chicago O'Hare, Raleigh-Durham, Bradley, Charlotte, Reagan National and Philadelphia; the timeline begins at 1:20 a.m. ET.]

[Chart: CrowdStrike’s stock price so far this year.]


