International Journal of the Addictions

ISSN: 0020-773X (Print) (Online) Journal homepage: http://www.tandfonline.com/loi/isum19

A Critique of Project Evaluations

F. Catherine McCaslin & Daniel H. Ershoff

To cite this article: F. Catherine McCaslin & Daniel H. Ershoff (1978) A Critique of Project Evaluations, International Journal of the Addictions, 13:8, 1263-1284, DOI: 10.3109/10826087809039341

To link to this article: http://dx.doi.org/10.3109/10826087809039341


The International Journal of the Addictions, 13(8), 1263-1284, 1978

A Critique of Project Evaluations

F. Catherine McCaslin, M.A.
Department of Sociology

Daniel H. Ershoff, M.P.H.
School of Public Health
University of California at Los Angeles
Los Angeles, California 90024

Abstract

In recent years an increased stress has been placed on the evaluation of mental health, education, and welfare service programs. The majority of studies readily available to most evaluators represent local project evaluations which usually contain diverse references to different aspects of the evaluative process. For evaluative results to be even minimally useful to other projects, however, certain requirements must be met. These are: (1) internal validity, (2) external validity, (3) specification of the population and treatment being implemented, and (4) standardization of indicators of treatment impact. To determine the extent to which published project impact evaluations meet these criteria, a study was undertaken to “evaluate the evaluations” themselves within heroin addiction treatment programs. Six high-yield journals and 100 random sources were systematically searched for reports of evaluations which provided measures of success in terms of the consumer. Articles were analyzed in regard to our four prerequisites for cross-project comparisons regarding process variables, impact variables, and methodologies. It became clear, however, that our original objectives in evaluating either the usefulness of published project evaluations or testing any specific impact hypotheses were not achievable due to the state of evaluative measurement and reporting practices at this time. The major problems we encountered in our inability to complete a necessary and potentially fruitful comparative assessment of project evaluations are discussed in detail with recommendations for future work.

INTRODUCTION

In recent years an increased stress has been placed on the evaluation of mental health, education, and welfare service programs. Indeed, the evaluation or assessment of program effectiveness has become a required activity for further and continued funding by government agencies who have become “less and less satisfied with unsystematic, subjective, and nondata based judgments about the effectiveness of programs” (Putnam et al., 1973, p. 10). Instead, funding agencies as well as public and community groups have begun to demand systematic and objective evidence of a program’s ability to accomplish or positively approach its stated goals. According to Suchman (1967, p. 31), what is being demanded currently is evaluative research or “specific use of the scientific method for the purpose of making an evaluation.”

One of the complexities in program evaluation arises from the fact that various parties involved are interested in different aspects of the functioning of programs. Distinctions have thus been made between types of evaluative activities. Wholey et al. (1970), for example, distinguish four types of evaluation defined in terms of the specific aspect of a program which is being emphasized. The first of these, program impact evaluation, is defined as “assessment of the overall effectiveness of a national program in meeting its objectives, or assessment of the relative effectiveness of two or more programs in meeting common objectives” (Wholey et al., 1970, p. 25). He cites the Westinghouse Learning Corporation–Ohio University evaluation of the national Head Start program as an example of this type. The second major type of evaluation outlined by Wholey, program strategy evaluation, is the level at which the “relative effectiveness of different techniques used in a national program” is assessed. Third, a project evaluation is an “assessment of the effectiveness of an individual project in achieving its stated objectives. . . .” This evaluative activity is accomplished on a local level within the actual implementation phase of a larger program. Finally, Wholey et al. (1970, p. 25) define the activity of project rating as the “assessment of the relative effectiveness of different local projects in achieving program objectives.” Thus, depending on the focal point of interest, whether on the intervention itself or the effectiveness of the entire effort, on the national or local level, the type of evaluation which is most relevant to the various parties of interest will determine the area and type of data collected.

A second exposition on levels of evaluation has been set forth by Freeman (1974). In his outline Freeman (1974, pp. 5-6) delineates the major types of evaluation as process, impact, and comprehensive evaluation. Process evaluation refers to “whether or not a particular program intervention or treatment was implemented according to its stated guidelines.” This type of evaluation corresponds generally to Wholey’s program strategy evaluation except that Freeman does not specify that it be done on a national level. This area examines the process of the intervention within the program and describes the specific target population being considered. Impact evaluation, in which the program’s achievements are measured according to a predefined set of criteria, refers to “a set of pre-specified operationally defined goals and criteria of success, and measured impact refers to movement or change toward the desired objectives” (Freeman, 1974, p. 6). The third type of evaluation delineated by Freeman is labeled comprehensive evaluation, which incorporates both process and impact levels of study. Freeman advocates this type since only through a combined knowledge of the interventions which were introduced and the subsequent outcomes of these interventions can we with any confidence assess the level of effectiveness of that treatment effort. That is, it is necessary to know the various “strategies” or “processes” that were implemented in order to determine which, if any, led to the observed behavioral changes or, concomitantly, which were ineffective in producing such change.

A third definition of levels of evaluative emphases has been outlined by Tripodi et al. These authors discuss program efforts, effectiveness, and efficiency as the three levels of activity (Tripodi et al., 1971, p. 11) which correspond to those of both Wholey and Freeman. Thus, despite the exact labels applied, we find that evaluations can be defined by (1) activities that occur within the actual program (strategy, process, or efforts); (2) the results of these interventions (impact or effectiveness); and (3) the efficiency of the operation.


These evaluation approaches are summarized schematically in Table 1. Although large-scale evaluations are occasionally undertaken which would qualify as program impact evaluations, by far the large majority of published studies must be described as local project evaluations. The label of “local” or “project” does not necessarily imply that the effort is locally funded in entirety. Nor does this label minimize its potential worth. Rather, the definition concerns the relative size of the evaluative approach. Within the mental health literature, journal articles regularly appear which describe the success of a particular type of treatment intervention for a very specific target group. Articles are frequently encountered which describe the experience of projects such as a crisis intervention center in Chicago, a family drug counseling project in Laguna Beach, California, or an alcoholic rehabilitation center in Boston. Most of these articles reflect a wide diversity in content, reporting style, and generality of issues being dealt with. It is felt that the form and quality of this diversity does not reflect a mere literary variation but rather parallels the extreme specificity of the actual treatment projects themselves.

Given that the majority of studies which are readily available to most evaluators are on the project level and that these usually contain diverse references to different aspects of the evaluative process, certain requirements must be met for the results to be even minimally useful to other projects. Four such requirements are outlined here which must be considered before the results and implications of small-scale project evaluations (and indeed those on the program level) can be utilized by a broader audience.

First, the evaluation must show some evidence that it is internally valid. Ideally, internal validity will be secured through random assignment of patients to treatment and control groups. When this method is either unfeasible, unethical, or otherwise impossible to introduce, a quasi-experimental design can be substituted (Campbell and Stanley, 1963). Second, there must also be some consideration of the evaluation’s external validity or the extent to which the results of one project can be generalized beyond that specific setting. In addressing this problem, Wholey (1970) points out that any between-project comparisons of treatment effectiveness must necessarily control for both environmental contexts within each project as well as the characteristics of the clients being served. Basic demographic variables such as sex, race, and age must therefore be mentioned in addition to other characteristics which are known or suspected to be specifically relevant to future treatment success, such as the length of previous addiction in drug abuse treatment.

Table 1
A Summary of Various Evaluation Approaches

Emphasis: Program strategy evaluation (Wholey et al., 1970); Process evaluation (Freeman, 1974); Program efforts (Tripodi et al., 1971)
Objective: To describe and assess differential effectiveness of treatment strategies and target populations; to determine whether treatment was implemented according to stated guidelines
Population focus: National comparisons within projects; Unspecified
Design and data demands: Use of appropriate environmental, input, and other process variables

Emphasis: Program impact evaluation (Wholey et al.); Impact evaluation (Freeman); Program effectiveness (Tripodi et al.); Project evaluation (Wholey et al.)
Objective: To assess program or project achievement in terms of specified criteria for the purpose of program change or policy-making
Population focus: National; Individual project; Unspecified
Design and data demands: Use of output variables and comparison groups

Emphasis: Project rating (Wholey et al.)
Objective: To assess the relative success of local projects within a national program
Population focus: Individual, local project
Design and data demands: Use of environmental variables and short-term output measures

Emphasis: Program efficiency
Objective: To determine the relative costs of achieving intended outcomes
Population focus: Unspecified
Design and data demands: Use of cost and resources data including environmental information


Thus, for an evaluation to be externally valid, results must be clearly specified in terms of patient background and demographic parameters. These factors should also be reported in terms of treatment subgroups so that the interactive effect of specific treatments with specific groups can be determined.

The third prerequisite necessary for a broad application of project evaluation results concerns the precise specification of each intervention or treatment which is implemented. Any valid interpretation of a project’s success must be based in the procedural aspects of that intervention. Indeed, evaluations of social programs should be directed toward isolating those processes which affect the success of a target group. Merely reporting that counseling was available to the clients, for example, does not give an outside reader any real information as to what actually occurred to bring about the reported behavioral or attitudinal change. Knowledge only that methadone was dispensed to all clients in a heroin treatment program lacks even a minimal description of the total treatment received. Even the inclusion of the fact that group counseling was conducted lacks precision in regard to the type of groups which may have been formed. Thus very specific descriptions of treatment and other process variables deemed to be even partially causal in their effects must be included.

The final prerequisite or minimum requirement for the implementation and form of reporting evaluative results is specific to the measurement of outcome variables. This phase of the evaluation, referred to as the impact or effectiveness stage of the study, is often as imprecisely described as process and environmental aspects. Our fourth prerequisite to the further use of evaluative results, therefore, is that indicators of the effectiveness of treatment efforts must be fairly standardized across similar projects. This should not imply that we advocate the complete standardization of outcome measures without regard to individual needs or local requirements. Rather, we point out the necessity for inclusion of measures which are most relevant to the interpretation of reported success rates. Practical issues such as the availability of evaluation funds will often necessitate a less comprehensive evaluation of overall treatment effectiveness than the evaluator might desire. However, beyond these considerations, whichever measures are chosen, specific indicators of any evaluative area should be similarly developed for maximum comparability across projects. It is impossible to compare, for example, the success rates for criminal activity of one program which are described in terms of the number of arrests per 50 man-years with those which are measured by the absolute number of arrests during the first 9 months of treatment.
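As a minimal illustration of why such indicators cannot be compared as published, the sketch below (in Python, with invented figures and variable names) converts each style of reporting to a common arrests-per-person-year rate; the conversion is possible only when the number of clients and the time they spent at risk are known, which is precisely what a raw arrest count omits.

# Program A reports arrests per 50 man-years; Program B reports only the raw
# number of arrests during the first 9 months of treatment. All figures are hypothetical.
program_a_arrests_per_50_person_years = 12
program_a_rate = program_a_arrests_per_50_person_years / 50.0        # arrests per person-year

program_b_total_arrests = 30
program_b_clients = 80                       # rarely reported alongside the raw count
program_b_person_years = program_b_clients * (9 / 12)                # 9 months of exposure each
program_b_rate = program_b_total_arrests / program_b_person_years

print(f"Program A: {program_a_rate:.2f} arrests per person-year")    # 0.24
print(f"Program B: {program_b_rate:.2f} arrests per person-year")    # 0.50
# Unless the number of clients contributing those 9 months is also reported,
# no common rate can be formed and the two published figures remain incomparable.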


In sum, then, we would require that evaluators not attempt to be innovative in the measurement of outcome indicators when the goal for the use of the evaluation is comparability across similar treatment projects. Rather, measures which can be readily compared and thus put to greatest evaluative use by that project and others should be introduced. These are the minimum requirements which should be met by project evaluations to build a more generalizable and thus more useful body of evaluative data.

To determine the extent to which published project evaluations meet these criteria, a study was undertaken to “evaluate the evaluations” themselves. Heroin addiction treatment was selected for detailed analysis as a representative field of mental health evaluation. This area was chosen after a preliminary review of a number of high-yield journals revealed that both process and impact variables for measuring behavioral and attitudinal change in this area were relatively clear-cut. There also seemed to be a more narrow definition of available types of treatment within heroin addiction treatment programs than in other mental health fields.

Two problems in using published material are important to point out. First, as Bernstein and Freeman (1975) note, a large number of the evaluations of federal programs never reach the general research audience in the form of journal articles. This neglect, due to a number of factors including lack of agency stress on the publication of results, is thus a possible source of bias in our study population. Second, space limitations of journals necessarily preclude a full and comprehensive discussion within the publications of the entire evaluative process. Our evaluation of these articles in terms of the requirements discussed above may reflect in part such editorial policies.

The remainder of this paper will describe the results of this study which examined project evaluations of heroin addiction treatment published between 1970 and 1974. A systematic search of six major high-yield sources was followed by a search of 100 random sources for reports of evaluations which provided some measures of success in terms of the consumer. Four specific research questions were explored:

1. Are the evaluations methodologically sound? That is, are they internally valid?
2. Do the evaluations deal with the problem of external validity? That is, are subgroups specified according to demographic and other patient parameters?
3. Is there a specification of the exact types of treatment interventions utilized within the projects? That is, do the evaluations discuss crucial process variables deemed instrumental in exacting the reported behavioral or attitudinal changes?
4. Which specific output variables are utilized by the programs to measure the desired change? Are the specific indicators of these measurement areas standard across project evaluations or is there a wide range of specificity between projects?

If our four prerequisites for cross-project comparisons discussed above were met, it was planned that an aggregation and analysis of specific levels of outcome effectiveness would be accomplished. This would allow specification of which modes of treatment were most effective in reaching the goals of behavioral change. With this analysis it would be possible to answer specific questions of the following types which would be useful to individual project directors and HEW policy-makers alike:

1. Is one type of treatment intervention more effective than others in reducing criminal activity among heroin addicts? For example, are therapeutic communities more effective in reducing the frequency of arrest among clients than a methadone maintenance program?
2. Are certain treatment modes more successful in producing desired change with clients of specified characteristics? For example, is methadone maintenance versus immediate detoxification more effective in decreasing the rate at which Black male patients with a 5-year or more addiction history return to heroin?
3. Is one type of counseling, such as individual or group, more effective with particular client types in regard to specific measures of success?

The remaining sections of this paper will discuss the design and methodological considerations of the study itself. A detailed analysis of major problems which were encountered in attempting to integrate these sources of evaluations will be given. Finally, a concluding section will offer recommendations and guidelines for improving and standardizing project evaluations which are meant for audiences beyond the specific project setting.


METHODOLOGY

The design of this study was based on the assumption that a prestructured coding system could be developed that would adequately reflect variations in the descriptions of both process and impact phases of evaluations. This assumption was based in part on our prior experience with the Databank of Program Evaluations (DOPE) at UCLA, which has developed one coding scheme to incorporate a wide range of mental health areas as diverse as alcoholism and schizophrenia. [For a description of the Databank of Program Evaluations, see Wilner et al. (1976).] Our system, being concerned with only one area of mental health evaluation, would be capable of describing even more specific information on details of projects and their evaluations.

A coding system was developed to include five subareas of projects treating heroin addicts. These five areas included process variables (treatment modalities utilized and characteristics of the patient population), impact variables (addictive, social, and criminal behavior as well as consumer satisfaction and psychological state), and methodologies (experimental design, types of controls, use of statistical tests of difference, and inclusion and timing of follow-up measures). The entire coding system was organized around one integrating scheme so that each treatment mode could be considered separately. All relevant information was completed for each treatment type being compared in the evaluation. For example, if a project incorporated methadone maintenance, methadone detoxification, and therapeutic community projects within the same report, each type of treatment being compared was coded individually for patient parameters, success rates, and research design to produce a maximal comparability of results between similar projects and treatments. This system was thus devised to serve as a control over the input, conditioning, and output variables since, as Wholey et al. state (1970, p. 97), “The ‘ideal’ evaluation will tell not only whether a program produces effects but also what strategies or components of the program are most important to production of the effects.”
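To make the idea of a prestructured, treatment-mode-by-treatment-mode coding scheme concrete, the following sketch shows one way such a record might be represented in Python. It is an illustration only, not the instrument used in the study; every field name and category value is a hypothetical stand-in for the process, impact, and methodology subareas named above.

# Minimal sketch (assumed field names) of a per-treatment-mode coding record
# covering the three subareas described in the text: process variables,
# impact variables, and methodology.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TreatmentModeRecord:
    # Process variables
    major_treatment: str                                          # e.g., "methadone maintenance"
    secondary_treatments: list = field(default_factory=list)      # ancillary, supportive therapies
    patient_characteristics: dict = field(default_factory=dict)   # sex, age, race, education, ...

    # Impact variables: one entry per coded outcome measure (see the next section)
    outcomes: list = field(default_factory=list)

    # Methodology
    design: Optional[str] = None                                  # e.g., "quasi-experimental"
    control_group: Optional[str] = None
    statistical_tests_used: bool = False
    follow_up_months: Optional[int] = None
    sample_size_admitted: Optional[int] = None
    sample_size_analyzed: Optional[int] = None

# One record is coded for each treatment type compared in a report, so an
# evaluation covering maintenance, detoxification, and a therapeutic community
# yields three separate, directly comparable records.
maintenance = TreatmentModeRecord(
    major_treatment="methadone maintenance",
    secondary_treatments=["individual counseling"],
    patient_characteristics={"percent_male": 60, "mean_age": 27},
    design="quasi-experimental",
    follow_up_months=6,
)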

PROCESS VARIABLES

Two areas for investigating process variables were included: the type of treatments utilized by the projects and the characteristics of the treatment population. Both “major” and any “secondary” types of treatment were included. Major treatment was defined as the type of heroin substitute therapy employed or, as in the case of abstinence and certain forms of therapeutic community modes, the absence of such chemicals. Secondary treatment was then defined as any other ancillary treatment designed as a supportive therapy such as individual counseling, vocational rehabilitation, or family group counseling. At various points throughout the study, definitional problems occurred when the major focus of the evaluation was unclear. These definitions served to circumvent such ambiguities. Also, our definition of major treatment type was based on the criterion that a group of patients receive only one type of major treatment and no other during any particular measurement period. Secondary treatments were then all those supportive therapies which the same group received in addition to the major type. Any patient, for example, cannot be on both methadone maintenance and detoxification schedules at any one time but could be receiving counseling support.

PATIENT CHARACTERISTICS

To increase our ability to make valid cross-program and cross-treatment comparisons, sociodemographic characteristics of each treatment subpopulation were included in the design. These variables were sex, age, race, education, and socioeconomic level. The mean length of addiction was also included.

IMPACT VARIABLES

The design phase which proved to be most difficult to formulate involved impact measures and subsequent success rates. Our major interest was in five measurement areas: addictive, social, and criminal behavior, consumer satisfaction, and psychological state. Within each area specific measures of these outcome variables were developed. For example, criminal behavior was measured by arrest, conviction, and incarceration rates. Success rates were then defined for each specific outcome measure within the evaluations either as an absolute percentage of the group or in terms of statistical significance between groups, whichever method was used in the published reports. The time period at which each measure was taken was also included in order to control for this variable in comparing project effectiveness. Ten time periods were defined, ranging from pretreatment measures through various follow-up periods. This specification was crucial for purposes of comparison between treatments as well as for validation of the effectiveness of an individual program. In this way, baseline measures at pretreatment were standardized for later analysis. An additional piece of information noted for each outcome measure was the data collection technique employed. This inclusion would allow us to later determine whether differential success rates between projects were possibly a function of data collection techniques rather than true project effectiveness.
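As an illustration of what a single coded outcome measure might look like under this scheme, the short sketch below records the elements just described: the measurement area, the specific indicator, how the success rate was expressed, the measurement time period, and the data collection technique. The field names and example values are our own assumptions, not the study's actual codebook.

# Hypothetical coding of one outcome measure within a treatment-mode record.
coded_outcomes = []

coded_outcomes.append({
    "area": "criminal behavior",                   # addictive, social, criminal, satisfaction, or psychological
    "indicator": "arrest rate",                    # e.g., arrest, conviction, or incarceration rate
    "success_reported_as": "percent of group",     # or "statistical significance between groups"
    "value": 45.0,                                 # the published figure, when given as a percentage
    "time_period": "pretreatment",                 # one of ten defined periods, pretreatment through follow-up
    "data_collection": "official arrest records",  # vs. self-report, urinalysis, clinic records, etc.
})

# A matching entry coded at a follow-up period lets the pretreatment baseline and
# the posttreatment value for the same indicator be compared within and across projects.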

METHODOLOGY AND EXPERIMENTAL DESIGN

The final area concerned the methodological design utilized within the evaluations. Our goal here was twofold. First, we could examine the general quality and diversity of methodological procedures within the evaluations surveyed. Subsequently, we could explore the possible relationship between study design and measurement procedures to reported rates of success. The methodological areas examined were experimental design, types of controls, use of statistical tests of difference, and inclusion and timing of follow-up measures. Other more specific methodological considerations included whether or not differences were noted by the evaluator between those who completed treatment and those who had dropped out, the “internal failures” (Lerman, 1968, p. 225), as well as similar possible differences between those located and not located at follow-up. Finally, sample sizes were included to determine whether success rates were based on the complete admissions cohort or whether the sample size on which outcome measures were based reflected only those patients who remained in treatment for a given length of time. This area has been noted by other researchers as a major source of interpretive error (e.g., Maddux and Bowden, 1972; Lerman, 1968).

RESULTS

Our system for evaluating results from project evaluations in the drug treatment field was initially felt to be sufficiently flexible to include all variations of measurement and reporting schemes. It became clear, however, after a series of many pretests and revisions, that there was virtually no method to validly aggregate all information into one integrating scheme due to the extreme specificity of measures employed and variation of format for reporting effectiveness within projects. We concluded that our original objectives in evaluating either the usefulness of published project evaluations as described above or testing any specific impact hypotheses were not achievable without a complete revision of our goals. We determined that this failure was not due to our approach to these important objectives but rather was a function of the state of evaluative measurement and reporting practices in use at this time.

This inability to complete a necessary and potentially fruitful comparative assessment of evaluation results is extremely important in its immediate and long-range implications. Since the implementation and comparison of evaluative data between projects is a major goal in disseminating information in the first place, our negative results are crucial. The remainder of this paper will detail major problems which were encountered and suggest guidelines for standardizing methods both for project evaluations and subsequent reporting schemes. Although some of these problems could theoretically have been alleviated by constructing a much more detailed coding system or by focusing on either process or impact variables separately, such a procedure would have compromised our goals. The absolute number of articles which would have met specific requirements for inclusion under each subject area would have diminished any attempt to retain a representative sample of evaluations. In addition, a more detailed system would not have eliminated basic inadequacies in the evaluations themselves.

Inadequate Description of Treatment Modalities Employed

Although it was recognized that a full and detailed description of treatment interventions employed in the projects was impractical within the space limitations of the media we surveyed, it was assumed that some mention of this area would be made. Even in evaluations which purport to assess only outcome effectiveness, some definition of the project’s services is necessary in order to know what brought about any reported success. As Weiss (1972) points out, treatment descriptions which are stated in broad, general terms may not allow a strong statement of causality, but knowledge that one type of intervention appeared to be more effective than another does, at the very least, point to areas which deserve more attention.


The most basic inadequacy which we found in terms of project strategy was the inaccuracy in specifications of the types of treatment modalities which were introduced. Most evaluations included at least some reference to major interventions such as type of methadone therapies. The difficulty in defining the more exact treatment centered in our attempt to understand variations within these broad intervention categories. Further, when it was specified, for example, that methadone maintenance and detoxification were used, a general lack of consistency was evident in the definitions of these terms. Most evaluations studied rates of success in terms of variations in methadone treatment, and many subdivided these according to maintenance dosage levels and detoxification schedules. Within these broad categories further specification was sometimes made to exact dosage levels in milligrams while other reports did not include this information. Thus treatment groups were described very often according to an author’s individual definition of terms. In addition to these definitional problems with what we have termed “major” treatments, there were more serious inadequacies in definitions of “secondary” treatments. It was often unclear which supportive therapies were involved. Some reports merely specified, for example, that vocational rehabilitation, individual counseling, and family group therapy were available or mentioned only that their treatment profile was “multimodality,” while others described the exact frequency of patient encounters. If we were to compare reported posttreatment employment rates between projects for which we had such differentially detailed information, we would not know which interventions or what it was about these interventions that affected subsequent rates of success.

Nonspecification of Patient Characteristics

The second major problem encountered in our attempt to assess the usefulness of project evaluations concerned the frequent exclusion of sociodemographic characteristics of the patient populations. This problem was encountered on two levels. First, a large number of evaluations failed to include any reference to the characteristics of the patients on whom outcome data were reported. Second, when these were described, few offered the information in terms of specific treatment groups. Since, as Hyman et al. (1962, p. 79) state, a “description of the subjects who were affected by the program is as essential as description of the program . . . ,” these exclusions would affect valid interpretation of treatment outcomes.


That is, when sociodemographic characteristics of the treatment population are omitted, we are unable to (1) generalize results beyond a specific setting, or (2) determine which patients are most affected by or susceptible to a particular type of treatment, combination of treatments, or, indeed, any treatment at all. In addition, there was virtually no attempt in the large majority of these evaluations to report patient characteristics for initial pretreatment levels on dependent or outcome variables. This practice would negate any further analysis of causal relationship in terms of specific patient groups. For example, if we are told that a treatment population consists of 60% male, 75% Black, and 35% high school educated patients and then later find a 45% reduction in the number of arrests at some posttreatment measure, statements can be made about the relationship of these demographic factors and success on this measure, however gross. If a second evaluation does not specify these data and reports a 65% reduction in arrests, with duplicate treatment interventions, to what do we attribute the higher success of the second program? Differential success rates are often a function of many factors. However, if the essential descriptions of treatment interventions are known and can therefore be controlled for in analysis, we must examine the relevant characteristics of the patients themselves. An even more exact procedure would be to specify the rates of outcome on admission to the program in terms of these variables. It would then be possible to determine which type of patient was most affected or changed by the treatment to which he was exposed and which type was least susceptible to this intervention. Again referring to Hyman et al. (1962, p. 81), “Study of differential effects among subjects to determine which types are more or less susceptible is guided by the distributions of characteristics within the total group. . . .”

We find, then, that the most valid and therefore beneficial approach to the description of patient characteristics in project evaluations is first to include population parameters at initial pretreatment measures on the dependent variable involved. Second, when possible, posttreatment levels of stability or change along these same measures should be reported in terms of patient characteristics. Finally, any findings should be at least minimally evaluated according to these characteristics. This approach will provide maximal comparability of projects from the start and will aid in the interpretation of results. Other patient variables relevant to treatment success should be reported as well when these may exert a suspected influence on the results. In heroin addiction treatment these variables will include the length of previous addiction and the definition of admission status as voluntary or involuntary.


It should be noted here that the procedure we have described for reporting project effectiveness in terms of patient characteristics is ideal and one in which other problems of interpretive error may occur, particularly for local evaluations whose sample sizes are often relatively small. Overinterpretation of results can ensue when an initially small sample is further reduced in analysis. Interpretation may then be based on “purified” cases which in reality may be confounded by other variables (Hyman et al., 1962). With larger samples such unknown factors will usually average out. Even when these problems are properly taken into account, we should expect that some description of the patient population would be included if only for qualitatively comparative purposes. Unfortunately, a large number of evaluations failed to make any reference at all to the type of patient being treated. This is a major problem in the level of generalizability and thus in the usefulness of results of project evaluations with important implications for administrators and evaluators alike.

Extreme Variation in the Measurement of Outcome

The variation in standards for reporting conceptually similar outcome measures seriously compromised our efforts to compare results across project evaluations. The major problem encountered here was the extreme specificity and thus lack of standardization of indicators employed to evaluate project effectiveness even within similar measurement areas. In order to compare the success of projects treating similar populations with similar treatment strategies, a group of common measures is necessary. It is difficult if not impossible to compare the effectiveness of two projects if success is measured along different sets of criteria. As Weiss (1970, pp. 34-35) states, “As different evaluation studies use common indicators of outcome (for example, scores on the same test) it becomes possible to begin to make comparisons about the relative effectiveness of one program against another.”

Two major sources of variation were most problematic. First, outcome measures were sometimes reported in terms of the clients themselves and other times according to nonclient criteria. For example, many project evaluations reported success rates for change in patterns of addictive behavior as the percent of clients showing negative urine tests for drugs over a specified time period. Other reports presented these results as the percentage of urine tests themselves which were negative. Since more than one urine test may have been given to each client within the same measurement period, we cannot validly compare these rates.


Second, even when the same data base was used across a number of projects, some measures were so extremely project specific that any level of comparability was negated. Examples of these from the area of addictive behavior include:

Total patient months with clean urine samples (Hoffman, 1970)
Percent of patients abstinent at the 27th or 104th week of treatment (Goldstein and Judson, 1973; Chappell et al., 1973)
Percent of patients abstinent at the 5th and 6th weeks versus the 1st and 2nd weeks of treatment (Nightingale and Goldberg, 1974)
Percent of patients abstinent on any day (Bloom and Sudderth, 1970)
Percent of patients abstinent at time of termination (Cuskey et al., 1971)

These examples concern measures taken during treatment. The variation in follow-up measures after treatment was terminated posed similar problems for comparison. For example:

Percent of patients reporting no drug abuse during 18-month follow-up period (Anderson et al., 1972)
Percent of patients not readmitted for readdiction during 6-month follow-up (Resnick et al., 1970)
Percent of patients narcotic free at 6 months after detoxification (Canada, 1972)

These examples point out the tremendous variation in outcome criteria which we encountered in our survey of project evaluations. This variation was not merely a function of differential time periods at which measures were taken since statistical devices are available for their adjustment to other periods. Rather, the crucial problem was within the conceptualization of measurement itself. Although not advisable, we can convert a success rate originally taken at a 6-month interval and make estimates as to what the rate might have been if taken at 9 months. We cannot, however, adjust measures like “the percentage of patients using heroin” to “the percentage of treatment weeks.”

An additional problem in outcome measurement was that both dynamic and static measures were employed in the evaluations. A dynamic measure is one which follows the same cohort of patients over time as, for example, “20% of the sample entering treatment showed all negative urines over the first 6 months.” Opposed to this is a static measure which reports a one-shot outcome measure for a specific point in time.


It may be reported, for example, that 75% of the treatment sample had negative urine tests in the sixth month of treatment. This is an entirely different measure than one which reports results over 6 months. The results from using these two measures must be interpreted differently since the dynamic measure gives a more inclusive basis for assessing treatment success. Confounding factors which may influence the results tend to average out over a longer period of time. The results from only one static measure, on the other hand, may strongly reflect such confounding variables and thus be merely a function of the choice of measurement time. Most evaluations reported either a dynamic or static measure but rarely included both.
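To make the distinction concrete, the short sketch below computes both kinds of measure from the same hypothetical per-patient urinalysis results: a dynamic rate (the share of the admission cohort with negative urines in every one of the first 6 months) and a static rate (the share negative in the sixth month alone). The data and function names are invented for illustration.

# Monthly urinalysis results for a hypothetical admission cohort:
# True means a clean (negative) urine in that month.
monthly_clean = {
    "patient_01": [True, True, True, True, True, True],
    "patient_02": [True, False, True, True, True, True],
    "patient_03": [False, False, True, True, True, True],
    "patient_04": [True, True, True, True, False, True],
}

def dynamic_rate(results, months=6):
    # Percent of the cohort with negative urines in every one of the first `months` months.
    cohort = list(results.values())
    clean_throughout = sum(all(patient[:months]) for patient in cohort)
    return 100.0 * clean_throughout / len(cohort)

def static_rate(results, month_index=5):
    # Percent of the cohort with a negative urine in one specified month (0-based index).
    cohort = list(results.values())
    clean_that_month = sum(patient[month_index] for patient in cohort)
    return 100.0 * clean_that_month / len(cohort)

print(dynamic_rate(monthly_clean))   # 25.0  -> only patient_01 was clean in all 6 months
print(static_rate(monthly_clean))    # 100.0 -> all four patients were clean in month 6

The two figures diverge sharply even though they summarize the same cohort, which is why a report giving only the static value cannot be compared with one giving only the dynamic value.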

Extensive Neglect to Operationalize Measures Employed

In addition to the extreme specificity of outcome measures, the equally important problem of reporting outcomes in unoperationalized terms was encountered. This neglect was quite evident for measures of social adjustment and psychological state in particular. Examples which were found in the drug abuse literature included such measures as:

1. Good or fair adaptation of patients at the 33rd week of the program (Petursson and Preble, 1970)
2. Percent of clients ultimately achieving personal and social adjustment (Kissin and Sang, 1973)
3. Percent reporting problems were better at follow-up (Perkins and Bloch, 1971)

Unless the specific behaviors or instruments used to measure these larger concepts are known, their use is limited for comparison with other, similar evaluative data. Further, such measures often reflected little concern for the internal validity of the evaluative study by generating statements of effectiveness through subjective opinions of a therapist or counselor with no further mention of tests for reliability.

Other Methodological Weaknesses

The importance of utilizing a research design which maximizes the level of external and internal validity of results has been discussed throughout the methodological literature; e.g., the need for adequate control groups and the inclusion of pretreatment measures. One methodological problem which has not been stressed is the phenomenon of the “shrinking” sample size.


This methodological error involves basing results on a continually diminishing sample rather than on an entire admissions cohort. Patients who are lost to treatment are thus excluded from outcome analysis. Estimates of project effect which are based only on those patients remaining in treatment for a given length of time tend to be inflated since there is no consideration of “internal failures” but merely of “internal successes.” To illustrate this bias, let us say that 100 patients were admitted to methadone treatment and that employment rates were calculated after 6 and 12 treatment months. Let us also assume that 25 patients left treatment during the first 6 months and 25 more during the second 6 months. We might find that 50 patients were employed at the first measure and 40 at the second measure. If we convert these frequencies to percentages of those still in treatment at the measurement periods, that is, on a “shrinking” sample size, rates would be 66% employed at 6 months and 80% at 12 months.
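The arithmetic of this bias can be sketched directly. The example below, in Python, uses the hypothetical figures from the text (100 admissions, 25 dropouts in each 6-month period, 50 and then 40 patients employed) and contrasts rates computed on the shrinking in-treatment sample with rates computed on the full admissions cohort; the variable names are ours, not the authors'.

# Hypothetical cohort from the text: 100 admissions, 25 dropouts in each
# 6-month period, 50 patients employed at 6 months and 40 at 12 months.
admitted = 100
still_in_treatment = {6: 100 - 25, 12: 100 - 25 - 25}   # 75 remain at 6 months, 50 at 12
employed = {6: 50, 12: 40}

for month in (6, 12):
    shrinking_rate = 100.0 * employed[month] / still_in_treatment[month]   # denominator shrinks
    cohort_rate = 100.0 * employed[month] / admitted                       # denominator fixed
    print(f"{month} months: {shrinking_rate:.1f}% of those still in treatment, "
          f"{cohort_rate:.1f}% of all admissions")

# 6 months: 66.7% of those still in treatment, 50.0% of all admissions
# 12 months: 80.0% of those still in treatment, 40.0% of all admissions
# The shrinking-sample rates rise over time even though fewer of the admitted
# patients are actually employed, which is the inflation described above.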
