Program Evaluation in the Public Interest: A New Research Methodology

Merton S. Krause, Ph.D.,* and Kenneth I. Howard, Ph.D.

ABSTRACT: For every social welfare or social control service program there are several parties, each with different interests: patients, clients, staff, management, and sponsors. Evaluation of such a program in the public interest must take the interests of each of these parties into account. To do so requires an untraditional methodology, that of a second-person, or communal, science, which is not above the conflict of parties and their interests in specifying the variables, staffing the research, balancing considerations of intrusion against those of bias, considering the action implications of the data, sequentially staging the research, or even publishing findings. This all makes evaluation in the public interest a highly political process often unlikely to be logically decisive about intervariable relationships, to yield generalizable results, or even to be completed.

It has become customary and proper for public agencies that support service programs to require that these programs be accountable for their funds. This accountability is increasingly cast in terms of demonstrations of the effectiveness (presumably in terms of the public interest) of particular services. At least in part in response to pressures from the scientific community, the results of formal evaluation research often constitute the preferred form of evidence for these demonstrations. We contend that there are issues in the evaluation of service programs in the public interest that require significant modification and extension of the formal scientific model. In explicating this contention we address the following points. First, traditional scientific method does not concern itself with the issue of conflicting interests; however, evaluation of a service program in the public interest (as distinct from evaluation in some special interest) must consider the diverse commitments of the various groups or parties affected by or invested in that program. Second, even aside from conflicting interests, decisions to expand, curtail, modify, continue, or terminate a service program cannot be logically derived merely from the results of controlled experiments, and the bases of this inferential uncertainty in research may differ from party to party. Third, even if a controlled experiment could yield pragmatically decisive results, what is evaluated is qualitatively changed by the introduction of the research, although the intrusion differs for each party. Fourth, and finally, the inherent effect of publication of evaluation research, done in the open, public, honest manner of full disclosure associated with the ethic of science, is often destabilizing for its host action program, because the various interest groups affected by a program can each use the findings "politically" to further their own often conflicting demands on the program or its funding source.

*Dr. Krause is Professor of Psychology, Department of Psychiatry, University of Illinois Medical School. Dr. Howard is Professor, Department of Psychology, Northwestern University. Both are Senior Research Associates of the Illinois Department of Mental Health, Institute for Juvenile Research, 1140 South Paulina, Chicago, Illinois, 60612, where they do their evaluation research. This is a shortened, revised version of a paper presented at the Society for Psychotherapy Research, Philadelphia, Pennsylvania, in June, 1973. A number of colleagues helped with arguments about this topic and the authors' previous efforts at it. They especially wish to acknowledge E. Lessing, R. Krause, T. Bonoma, M. Gordon, D. Druckman, R. Boruch, D. T. Campbell, and H. Riecken. This paper is not intended to represent the policy of the State of Illinois Department of Mental Health with regard to the evaluation of service programs.

Community Mental Health Journal, Vol. 12(3), 1976


PARTIES AND ALIGNMENTS OF INTERESTS

There are different interested parties involved with any service program. Those persons who directly receive the services, treatment, or attention provided by the program, who are the subjects of its direct action, are the patients. The actors in this delivery of service are the program's service staff. The sources of support of the program, in terms of authoritatively allocating public or private resources to it, are its sponsors. The decision makers who authoritatively allocate the resources of the program itself are its management. Those whose interests are explicitly intended, by sponsors or management, to be served by the delivered services are the clients. Those who design, conduct, and interpret evaluative empirical studies of the service program are the researchers. Each of these parties may have different interests in the service program, and also in any evaluation of that program (Krause, 1969).

Some examples might highlight the differential interests of parties within action programs. In a public alcoholism clinic the patients may be working-class male alcoholics; the staff may be master's degree social workers; the sponsor could be a city department of health; the management may be a part-time senior psychiatrist who performs only administrative functions in this setting; and the clients are often the spouses of the patients. In the private practice of psychotherapy the patient is a neurotic of sorts; the staff is generally a psychiatric, psychological, or social work therapist; the sponsor may be an insurance company; the management could be the therapist himself; and the client is the patient. In a high school counseling department the patient is a student; the staff are guidance counselors; the sponsor is the board of education; the management is the school principal; and the client is often a teacher. It is clear from these examples that the various parties in any program will have special and distinct interests and potentially conflicting evaluative criteria.

These parties to a service program are also parties to any evaluation research done on the service program. Not only does each party to a service program have its own (implicit) evaluative criteria and standards of evidence for judging the program, but the parties may also occupy functional roles relative to the research that are the same as, or different from, those they occupy relative to the action program.


Thus those in whose interest the evaluation is being done, the research clients, may be a different group of persons from the action clients. The research clients may be the sponsors of the action, intent on proving their concern for serious accountability for public funds, or on research legitimation, or on some neat basis to use in allocating resources among competing agencies; or the research clients may be the program staff, learning more about the given social problem and its alleviation, or trying to evolve a better program, or engaging in self-serving impression management. Similarly, the sponsorship of both action and research, the subjects of their observation and intervention, those who staff their operations, their operational leadership, and those who favor or deplore them may each be the same or they may be different.

Only when the persons in each of these functional roles in an action program are substantially the same persons in the same roles in the evaluation research on that program are the parties to action and research identically aligned. With identical alignment there is the least conflict of interest between research and action operations (at any given level of total funding), and there is least opportunity to explicate further any conflict of interests among the parties. With perfect alignment of research to action, the evaluative and evidential criteria operative on the supply side of the "market" (that is, of the sponsors, management, or staff) are incorporated into the research, and thereby the success of the action (in terms the program leadership prefer or accept) is evaluated. This is the simplest and the paradigmatic case of formal evaluation research: one party's evaluative criteria must be somewhat explicated, and then such evaluation research is simply the technical problem of successfully deploying the "scientific method" to see how variables X descriptive of the treatment package predict the outcome criteria Y. Although this may do for consultation with private enterprise or for management-oriented evaluation, it will not do for the evaluation of services in the public interest, where by "public interest" we only mean to indicate that the interests of all affected parties ought to be taken into account.

Other alignments of the parties, between service and research, serve to separate the concerns or basis of the research from those of the program leadership. The traditional "misalignment" of which some service people are resentful involves (a) researchers as the primary client (as in embedding basic research in an evaluative study) and as management and staff of the research; (b) service staff and patients as the research subjects; (c) service sponsors and management as secondary research clients; research sponsors who are no party to the service program; and (d) service clients who are no party to the research. Every alignment has problems and advantages, but the only one structurally consistent with evaluation in the public interest involves all-to-each mappings of parties to functions; that is, all parties to the service program are research clients, sponsors, and subjects (for dependent or independent variable assessment), and participate in research management and perhaps even in staffing the research.
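These alignment notions can be made concrete. What follows is a minimal sketch of our own (not the authors' formalism), with invented persons and a hypothetical alcoholism-clinic cast; it only checks whether the role mappings of an action program and its evaluation research satisfy identical alignment or the all-to-each structure just described.

```python
# A minimal sketch (ours, not the authors' formalism) of alignment checks.
# A program is modeled as a mapping from functional role to the set of
# persons (or corporate parties) occupying that role. All names are invented.

ROLES = ["clients", "sponsors", "subjects", "management", "staff"]

action = {
    "clients":    {"spouse_1", "spouse_2"},      # action clients
    "sponsors":   {"city_health_dept"},
    "subjects":   {"patient_1", "patient_2"},    # patients receiving service
    "management": {"senior_psychiatrist"},
    "staff":      {"social_worker_1"},
}

research = {
    "clients":    {"city_health_dept"},          # sponsor as research client
    "sponsors":   {"city_health_dept"},
    "subjects":   {"patient_1", "patient_2", "social_worker_1"},
    "management": {"researcher_1"},
    "staff":      {"researcher_1"},
}

def identically_aligned(action, research):
    """Identical alignment: the same persons occupy each role in both."""
    return all(action[role] == research[role] for role in ROLES)

def all_to_each(action, research):
    """All-to-each mapping: every party to the action program appears in
    every functional role of the research program."""
    action_parties = set().union(*action.values())
    return all(action_parties <= research[role] for role in ROLES)

print(identically_aligned(action, research))  # False
print(all_to_each(action, research))          # False: the traditional misalignment
```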


Any one evaluative criterion is unavoidably partisan in that it probably would have a different position in the criterion value rankings of the several parties. Any finite set of criteria is also unavoidably partisan in what is left out, and so the unintended consequences of an action program may be far more important to certain parties than the few (often vague) intentions of its sponsors or leaders. Even anticipated consequences for some one party may be more important for future public policy, because of that party's power or projected growth or predilections, than they are for a particular action program's sponsors or leaders. For example, it seems that Head Start stimulated expansion of day care services and so freed some indigent mothers for the job market and from economic dependence on parents, sexual partners, or public aid, even though it was not an unqualified success as an educational supplement. The point is that some possible ramifications of a service program often outweigh others, but the weights and the possibilities differ from party to party. There is no such thing as objective evaluation research. The criteria of evaluation and the standards of evidence for it must depend on the values of the parties to an action-research program and on the essentially political and mutually educative process wherein they reach some agreement on these matters.

UNCERTAINTIES AND INDECISIVENESS

Knowing what ultimately matters to each party to a service program is the first problem, but knowing what findings from what research operations would be credible and usable by each party is still another problem (Cain & Hollister, 1972, pp. 135-137). However, given a reasonably representative set of criterion variables Y, random allocation of patients to treatments, and competent execution and statistical analysis, there will still be uncertainties that must be settled before attributing any observed effects to the actual service program. It remains to be shown that (a) the proposed treatment is what really happened, (b) the describable outcome is what really happened, and (c) the treatment was sufficient to cause the outcome. Unfortunately, these uncertainties are ineradicable in principle, even though they can be progressively reduced as a monotonic function of research investment and intrusion.
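The random allocation of patients to treatments presupposed in the preceding paragraph can be sketched briefly. This is our illustration only; patient identifiers, treatment labels, and group sizes are invented.

```python
# A minimal sketch of random allocation of patients to treatment conditions,
# the design precondition assumed in the text. The seed, labels, and counts
# are illustrative, not from the original paper.
import random

def randomly_allocate(patients, treatments, seed=0):
    """Shuffle the patients and deal them round-robin into treatment groups,
    so that assignment is independent of any patient characteristic."""
    rng = random.Random(seed)
    shuffled = patients[:]
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, p in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(p)
    return groups

patients = [f"patient_{i}" for i in range(12)]
print(randomly_allocate(patients, ["psychotherapy", "waiting_list", "no_treatment"]))
```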

The Proposed Treatment

With regard to what was the treatment, we do not know how to construct a set of independent variables X capable of sufficiently depicting or specifying any nonmechanized service program. Consequently, we are unable to guarantee that the treatment has been maintained at a single point in the multivariate space defined by X (Krause, 1972; Boruch, 1974).


In other words, we do not know ex ante how to define, let alone control, all of the important independent variables in the treatment package. In the usual situation, X consists of only one global variable with only two or three values (for example, psychotherapy versus nontreated control versus waiting list group, or enrolled in Head Start versus not enrolled). In this situation a large variability in the outcome measures Y could be attributed either to unscheduled shifts over time in the actual treatment received by each patient (for example, more sessions, introduction of a special educational unit, or decreased contact with caseworkers) or to differences between the actual treatments received by different patients assigned to the same treatment condition. The answer lies in the increased specification of the set of independent variables X that constitutes or characterizes the treatment package. There is no way, of course, to be sure that all of the relevant independent variables in a treatment have been recognized or measured. If, however, the variability on the outcome criteria Y is very small, then we have some indication that the treatment package has been adequately specified; that is, if the results are located in a tight cluster in Y and if they are centrally located in a particular decision region, there is probably no need for further specification of X. It is the scattering of results across decision regions that directly calls even the most detailed ex post specification of X into question.
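The diagnostic just described can be sketched in a few lines. This is our construction, assuming a single outcome dimension and a single decision threshold; the scores and the cutoff are invented for illustration.

```python
# A minimal sketch of the scattering diagnostic: if outcomes Y for one
# nominal treatment fall into different decision regions (here, on both
# sides of one threshold), the treatment package X is probably
# underspecified. Data and threshold are invented.

def scatters_across_regions(y, threshold):
    """True if outcomes from one nominal treatment fall on both sides of a
    decision boundary, i.e., land in different decision regions."""
    return any(v < threshold for v in y) and any(v >= threshold for v in y)

y_tight = [6.1, 6.4, 6.0, 6.3]       # tight cluster: X looks adequately specified
y_scattered = [2.1, 7.9, 3.0, 8.2]   # scattered: X needs further specification

threshold = 5.0  # illustrative boundary between decision regions
print(scatters_across_regions(y_tight, threshold))      # False
print(scatters_across_regions(y_scattered, threshold))  # True
```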

The Outcome of Treatment

With regard to what was the outcome, we do not generally know how to establish the construct validity of operational measures of conceptual variables, even within only the research "community" itself. Consequently, valid measurement of outcome variables referring to such complex concepts as deviance, poverty, illness, incompetence, mastery, and suffering will be very difficult even for researchers. The situation is even more difficult in the measurement of outcome variables in evaluation research, since certain critical values of each variable also define the decision regions concerning the fate of the service program. Here the construct validity issue becomes entangled with the issue of the fate of the service program and contributes uncertainty to any decision concerning the latter. This leaves more room for the parties' a priori attitudes for or against the service program to influence their acceptance of results based on outcome measures of uncertain validity. The ex ante partitioning of the multivariate space defined by the dependent variables into discrete decision regions can reflect the parties' attitudes toward both program and measurements, and so localize the effects of uncertainties about construct validity in, for example, clearly marked buffer regions of indecision. Nevertheless, if a party is an advocate of the program, agrees that enhanced ego strength (for example) is a desired outcome of the program, and is shown evidence that the patients did not change on the "ego strength" scale, he could claim that the scale, in view of these results, could not be an adequate measure of what he meant by "ego strength."


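The ex ante partitioning into decision regions with a buffer of indecision can be sketched concretely. This is our construction, assuming a single outcome dimension; the cutoffs and region labels are illustrative, not negotiated values from the paper.

```python
# A minimal sketch of ex ante decision regions with a buffer region of
# indecision that absorbs construct-validity uncertainty. Cutoffs invented.

def decide(y, terminate_below=4.0, continue_above=6.0):
    """Map an outcome score onto a negotiated decision region. Scores
    between the two cutoffs fall into a buffer region where no decision
    is forced, localizing measurement uncertainty."""
    if y < terminate_below:
        return "terminate"
    if y > continue_above:
        return "continue"
    return "indecision buffer"

for score in (3.2, 5.1, 7.4):
    print(score, "->", decide(score))
```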

Sufficient Treatment

Finally, we do not know ex ante how to include all causally important variables in X. If there is a lot of variance in the outcome measures, then unmeasured "alien" variables excluded from X may be varying out of control, and so the treatment package, as complex as it may be, may not itself be a sufficient condition for change. Plausible alternative explanations can be evaluated by ex post covariance studies of promising alien variables, or by replications controlling them, or by tests of these alternatives on their own. This process can never achieve a logical proof, of course, that the specified treatment caused the outcome, but it can become quite convincing for parties open to being convinced as the residual variance in Y approaches closely enough to zero to satisfy them. Unless it is proper and politically possible to overpower some party to the action, that party must be satisfied by the evaluation research as to all three uncertainties and have its favorite alternatives to the particular treatment program accounted for. Thus decision region boundaries, construct validity problems, and dissatisfaction with the variables included are all considerations, beyond the experimental results, in reaching policy decisions about service programs.
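The ex post covariance study just described can be sketched with ordinary least squares. The data below are fabricated for illustration, and the analysis is our stand-in for the general idea: adding a promising alien variable to the model and watching how much residual variance in Y it removes.

```python
# A minimal sketch, using plain least squares, of an ex post covariance
# study: regress outcome Y on the treatment indicator, then add a
# promising "alien" covariate and compare the residual variance left in Y.
# All data here are fabricated for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 200
treated = rng.integers(0, 2, n)      # randomized treatment indicator
alien = rng.normal(size=n)           # uncontrolled alien variable
y = 1.0 * treated + 2.0 * alien + rng.normal(scale=0.5, size=n)

def residual_variance(design, y):
    """Variance of Y left unexplained by an OLS fit on the design matrix."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return np.var(y - design @ beta)

ones = np.ones(n)
print(residual_variance(np.column_stack([ones, treated]), y))         # large
print(residual_variance(np.column_stack([ones, treated, alien]), y))  # much smaller
```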


INTRUSION AND BIAS

Collecting sufficient, valid, multivariate data on the nature or quality of the program itself and on its effects in a specified time frame (while maintaining good relationships with all interested parties and also minimizing the influence of this collecting and maintaining) is a very demanding matter. Whatever intrusions each party must bear for the data collection may tend to undermine that data collection, by inducing opposition to it or passive-resistive participation in it, unless, as each party sees it, the knowledge expected to be generated by the research makes up for this cost. Somewhat less obvious than the intrusions of observation, and seemingly less desirable in the framework of objective disinterested science, are the intrusions of being impelled to advocacy, that is, of making sure that the data bearing on one's own interests are sympathetically taken and interpreted. Of course, there are gross indicators that can, at some ethical cost, be assessed through the bureaucratic infrastructure: arrests, unemployment compensation, hospital commitments, taxable income, housing status, and so on (Webb, Campbell, Schwartz, & Sechrest, 1966). But there certainly is a remarkable inconsistency between admitting a party into participation in the research and using such indirect (and, prima facie, nonreactive) measures on its members. Obviously no indicator can be nonreactive if the parties are aware of the nature and use of the indicator.

The greater the conflict of interests and the less trust there is among parties, the more each will want to spend on advocacy. This will be a heavier expense as others have more control of the data. For example, self-reports can be modulated to one's own advantage, where the impressions of participant observers with different interests at stake cannot be so easily modulated. Somehow the researchers must make themselves sympathetic, and equally so, to all parties' interests if the parties are to be spared the further intrusion of taking up advocacy roles and thus biasing the data into what is personally desirable. Disinterested researchers cannot do this, because the fairness of their attitude is vacuous for parties with much at stake, and the reality of this fairness depends on the tenuous assumption that the theoretical and methodological interests of the researchers are all the interests they have and that these are orthogonal to all other parties' interests. Certainly the researchers are in competition for resources (the better the research, the more it costs and intrudes); the importance, and so the effort, they allot to the various dimensions in X and Y may be idiosyncratic, as may be the causal paths they expect when nonorthogonal independent variables are to be disentangled in data analysis. They may even become suspect because of the substantive treatment and outcome variables they have proposed and the safeguards against bias they try to ensure (multiple sources of information on a variable, stability-of-assessment checks, and response-set detection devices may be seen as distrust of patients or others).

Linked to the intrusion of having to be an advocate is the risk of damaging exposure. The risk of the other parties' indiscriminate use or publication of data they are privy to is greatest for patients and those clients whose quality of life is also the issue in Y, but the staff's competence and work life are also at issue in X. Dependence on the other parties' discretion can induce secrecy and uncooperativeness in an effort to reduce the risks of overexposure and collusion (or it can, of course, induce some degree of community among the parties). If other parties relate to each other in a notably less communal (or "I-Thou" or "second-person") manner than they do to trusted researchers, that will provide a remarkable contrast and a basis for conflict between the parties. The social relationships and "politics" consistent with valid detailed data collection on persons are thus intrusive on an action program in still another way. Besides being (a) competitive for scarce resources, and promotive of (b) advocacy and (c) secrecy, they are also likely to be (d) inductive of strain on the relationships among the parties in the action-research system.

POLITICS AND INSTABILITY

Every party to an action-research system has important interests at stake. Well-evidenced, policy-decisive findings from the research are only one such interest, although they are usually the justification for doing the research.


The personal intrusion of the research process is another, although unintended, area of interest for each party. In addition to these, there are interests such as attaining program goals, avoiding outside political pressures and litigation, demonstrating fiscal responsibility, securing job protection, achieving career enhancement, making one's ideological or technical point, and doing exemplary research, which will also mean somewhat different things for each party. Where interested parties are excluded from explicit participation in the system politics, or where they fail at that level, they must resort to apathetic resignation, connivance, resistance, or efforts to penetrate the system's power structure if they are to further their own interests. Since there is this diversity of interests and tactics, in addition to the number and diversity of parties (each party having its own internal complexity and effectiveness), recurrent efforts to achieve some balance of interests served will often be necessary. Moreover, regular feedback of evaluative data to all parties will itself regularly call for such efforts. "If things are going well, why not better, or at less cost, or with more patients?" "If things are going badly, why not rectify them as soon as possible, even at the expense of securely evidencing for others how bad things are and why?" Conflicts of interest or issues become realized as, for example, the treatment package gets defined: patients want home visits, but staff prefer their own offices; clients and old inmates (patients) want hospitalization, but sponsors and some patients want community care; staff want professionalization, but sponsors do not want that expense; researchers want to preserve the integrity of the research design, but management (and usually all other parties) want to maintain program flexibility in order to maximize the opportunity for obtaining better outcomes (Marris & Rein, 1967, pp. 121-207).

Public interest evaluation implies participation, and participation implies both political pluralism and a new uncertainty, namely, that the research is continually subject to renegotiation and change. Such change makes replication impossible, which makes generalization unverifiable, and this renders the findings of the evaluation research untestable by other members of the scientific community. Beyond affecting matters of design, even the methods of research can come under recurrent "political" consideration. Methods, as they bear on such matters as measurement bias, unfavorable reactivity, an excessively underserviced control group, a politically unfortunate random sample, long delays between data collection and reports of findings, or the exclusion of ancillary variables, will often become problematic as parties grow more experienced and sophisticated. Such possible penetration of the other parties into the research underlines the increasingly realized interdependence of the parties and thereby distinguishes evaluation research in the public interest from the detached, objective, self-determined model of traditional research.


PRIORITIES

There are some conditions under which public interest evaluation research is viable and many under which it is not. Such research cannot hope to be appended to just any program, and so it cannot be taught as a "game against nature" but rather as the establishment and enactment of a "symbiotic" relationship with multiple other parties. It is as much political and rhetorical as it is objective and logical. The following conditions are, each to some degree, necessary for public interest evaluation research.

1. The more cohesive, defined, politically socialized, representatively led, and structurally stable each of the parties to an action-research program is, the more likely the program and its evaluation are to become stabilized. Patients would seem to have little incentive, or sometimes even ability (children, psychotics, and so on), to organize their party, and clients even less incentive; so these conditions are not likely to develop spontaneously without advocate support (Lipsky & Levi, 1972).

2. Each party should have or develop as thoughtful, representative, articulate, politically able, and stable a cadre as it can (the broader and more unified, the better) to define Y, decision regions in Y, and an acceptable treatment package and research component, if these matters are to become and generally stay settled.

3. The representatives of all parties should, as much as possible, understand each other, be able to argue and bargain well together, and reach agreements accepted by their constituencies, in order to explicate and compromise conflicts of interest; each party should also have strong incentives for having the research completed.

4. Each party to the service program should be represented in every functional role in the research program, so as to make all parties share both knowledge and responsibility.

5. The first phase of formal research should be on the parties' outcome criteria, their weights for those criteria, the nature of promising service programs, and their standards of evidence for deciding the fate of any such service program, because everything else depends on where the parties stand on these issues.

6. When each party's predilections are clarified, a period of mutual education into the meanings of and reasons for these is needed before an initial research design can be formulated as a whole in a manner understood by all parties. This mutual education should also facilitate coming to a political decision on what design to adopt and on the credibility of any research results (Argyris, 1970, pp. 103-127).

7. The more nearly the several parties' decision regions in Y are congruent (so that the same subspaces correspond to "terminate," "continue," and so on, for all of them), the more likely the research results can be politically decisive; so Y, outcome variable validities, and decision regions should be negotiated to this end as far as possible.

8. Costs of active participation, fairness to the interests at stake, risky exposure, and alterations of the quality of life should be made sufficiently explicit and tolerable to all parties. It may be possible to attain this through mutual education and essentially political negotiation.

9. The initial size of X should be at least large enough to include all parties' required candidates, but multistage designs allowing for revisions of the treatment package, and even of Y, in the face of indecisive outcome distributions in Y are essential. Optimum-seeking methods rather than full factorial designs are needed here (Wilde & Beightler, 1967, pp. 215-344); a sketch of one such method follows this list.

10. The negotiation of action-research program changes should be staged at long enough intervals to prove enough about the current treatment package to everyone: a matter of sufficient trust.
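As promised in item 9, here is a minimal sketch of one optimum-seeking method: sequential coordinate ascent over treatment-package settings. It is our illustration in the spirit of Wilde and Beightler, not a method taken from their book or from this paper; the factors, levels, and outcome function are invented stand-ins for a staged evaluation trial.

```python
# A minimal sketch of coordinate ascent over treatment-package settings,
# which typically probes far fewer configurations than a full factorial
# design, especially as factors multiply. Everything here is illustrative.
from itertools import product

levels = {"sessions": [1, 2, 4, 8], "home_visits": [0, 1, 2], "group_size": [4, 8, 12]}

def outcome(x):
    """Invented stand-in for the mean outcome of one research stage."""
    return (-(x["sessions"] - 4) ** 2
            - (x["home_visits"] - 1) ** 2
            - (x["group_size"] - 8) ** 2 / 16)

def coordinate_ascent(levels, start):
    """Vary one factor at a time, keeping the best level found; repeat
    until no single-factor change improves the outcome."""
    x, trials = dict(start), 0
    improved = True
    while improved:
        improved = False
        for factor, opts in levels.items():
            for level in opts:
                trials += 1
                candidate = {**x, factor: level}
                if outcome(candidate) > outcome(x):
                    x, improved = candidate, True
    return x, trials

best, trials = coordinate_ascent(levels, {"sessions": 1, "home_visits": 0, "group_size": 4})
cells = len(list(product(*levels.values())))  # 4 * 3 * 3 = 36 factorial cells
print(best, trials, "trials vs", cells, "factorial cells")
```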

At worst this suggests the politicization of evaluation research in order to have each party represent its interest in the design, execution, and interpretation of research on service programs.


At best it can be a sort of community of interest in certain service objectives translated into a cooperative enterprise to get a mutually satisfactory service program defined, operating, and supported. Mutual concern and respect for each other, honesty, and valuing these relational qualities over the program itself would be required for such a cooperative enterprise. But at its best or at its worst, public interest evaluation research is experimental research profoundly different from traditional experimental science. It does not grant the researcher hegemony over the research, because he cannot accomplish it without sharing power. It does not assume that inductive logic can be decisive in matters of service policy or that providing (or even explaining) the facts is enough, because an even more complex rhetoric is needed for effective communication in the political forum. It does not ignore or unilaterally engineer the impact of the research process on those on whom the research must intrude. It does not even guarantee the researcher a period of closure, when his work should be immune to service (or intrusion) considerations, or a status of methodological privilege. It is more a research among equals, "I and Thou," than an objective research by an elite, "I on It." Such second-person research may well have its own distinctive methodology unlike that of traditional third-person research, as we are arguing. Evaluation research in the public interest is an ideal type and needs to be recognized if the usual evaluation research is to be seen and used for what it is: for special interests, only suggestive of policy in an elaborate framework of a priori assumptions, a powerful intervention on the systems it purports to study, and (with or without results) a potent political gesture.

REFERENCES

Argyris, C. Intervention theory and method. Reading, Mass.: Addison-Wesley, 1970.
Boruch, R.F. Measurements in experiments. In H.W. Riecken & R.F. Boruch (Eds.), Social experimentation. New York: Academic Press, 1974.
Cain, G.G., & Hollister, R.G. The methodology of evaluating social action programs. In P.H. Rossi & W. Williams (Eds.), Evaluating social programs. New York: Seminar Press, 1972.
Krause, M.S. Construct validity for the evaluation of therapy outcomes. Journal of Abnormal Psychology, 1969, 74, 524-530.
Krause, M.S. Experimental control as a sampling problem in counseling and therapy research. Journal of Counseling Psychology, 1972, 19, 340-346.
Lipsky, M., & Levi, M. Community organization as a political resource. In H. Hahn (Ed.), People and politics in urban society. Beverly Hills: Sage, 1972.
Marris, P., & Rein, M. Dilemmas of social reform. New York: Atherton, 1967.
Webb, E.J., Campbell, D.T., Schwartz, R.D., & Sechrest, L. Unobtrusive measures. Chicago: Rand McNally, 1966.
Wilde, D.J., & Beightler, C.S. Foundations of optimization. Englewood Cliffs, N.J.: Prentice-Hall, 1967.
