Program Evaluation Techniques in the Health Services

JACK MEREDITH, PHD

Abstract: This article addresses the issue of program evaluation in the area of health services; examples are drawn from the field of mental health. Current arguments concerning the goals, characteristics, and methodologies of program evaluation are discussed and two generally useful quantitative evaluation models are presented. The models are compared and their advantages for clinicians and administrators are detailed. (Am. J. Public Health 66:1069-1073, 1976)

Introduction

Program evaluation is currently a subject of major attention in the health services field. Much of this attention has centered around the definition of "evaluation" itself. Simply stated, the purpose of evaluation is to find out what worked, what did not work, and why.1 However, the proper use of the evaluation results is frequently a source of disagreement. Levey and Loomba2 distinguish between on-going evaluation and retrospective evaluation as follows: The purpose of on-going evaluation is to measure progress toward program goals so as to direct control over a program. In contrast, retrospective evaluation is conducted to determine the effect of a program so as to facilitate program planning.

If, as is generally accepted, program evaluation is the determination of the degree of progress in achieving program objectives, then evaluation is a tool for control. However, there also exists the opposite view: that the consideration and evaluation of predetermined goals is not only unnecessary but also possibly contaminating.3 The concern here is that the evaluator will develop tunnel vision by accepting the validity of the program goals and thus overlook other, perhaps more important, effects of the program. This viewpoint of evaluation as an unbiased (in terms of the program's goals) analysis thus appears to place evaluation as a planning tool.

There is no need to make a choice between these two viewpoints. Evaluation can, and should, be used for both planning and control and thus serve as an all-important feedback link between these crucial functions.

This paper compares two general-purpose program evaluation models that can be used for both planning and control. The models are illustrated with applications in the complex, ill-defined area of mental health. To illustrate the models it is assumed that psychiatric cases can be specified objectively, outcome criteria can be designated and agreed upon, treatment programs remain stable, and patient selection, treatment selection, and outside influences are controlled or accounted for. Clearly, few situations will satisfy all these assumptions and thus the models' outputs must be tempered with experience, judgment, intuition, and other qualitative factors before a final decision is reached. If so tempered, such models can provide important information for the clinician or administrator.

Address reprint requests to Dr. Jack Meredith, Associate Professor of Management, Florida International University, Tamiami Trail, Miami, FL 33199. This paper was submitted to the Journal November 17, 1975, revised, and accepted for publication June 1, 1976.

AJPH November, 1976, Vol. 66, No. 11

An Index Model

The first model, adapted from Halpern and Binner,4 is an index model that examines a program's effectiveness (how well it achieves its goals) and its efficiency (results per unit cost). Summary ratios for these two measures are developed from program value and program cost data.

Mental health programs may result in changed individual functioning, intervention, and protection of the individual and/or society. However, change in the functioning of the individual is perhaps the most important result. A surrogate measure of the value of individual functioning is suggested by Halpern and Binner, based on the individual's primary means of contributing to society: his economic productivity. Since the patient's most recent economic productivity may not accurately reflect his potential due to his current mental handicap, his economic productivity is based instead on the average of his previous 12 months' earnings and the expected annual earnings of an average member of his educational and occupational peer group. Annual earnings are used, rather than total expected future earnings, to acknowledge the general impermanence of the improved level of functioning due to the program. A lower bound on earnings, the minimum annual wage, is imposed to reflect society's implied minimal worth of all individuals. This obviates the difficulties of inferring a wage for housewives, students, the severely retarded, children, the retired, and other such groups.

The thought of using an individual's personal economic productivity as a surrogate for program value may be anathema to some because this suggests accepting only the rich or the working male into treatment programs in order to obtain the most cost-effective programs. Thus, a more desirable alternative may be to utilize the mean national (or regional) income so as to equally value all patients. Another alternative would be to develop monetary values based on broader concepts than wages. For simplicity's sake, however, we will illustrate the example here with individual economic productivity.

The surrogate measure of improved functioning is then the product of the patient's average annual productivity and an index of change.* The index of change is a rough measure of the effect of the program on the individual.
Table 1 lists some examples of indices of change, on a gross scale, as a function of impairment at admission and discharge. Table 2 presents hypothetical data for an incarceration program serving three individuals to illustrate the determination of program value. The index of change is based on columns (4) and (5). The program value is then based on this index of actual change in column (6) and the average productivity in column (3). The overall program value index computed in the table, 477, is computed on a per patient basis so as to be comparable with other programs. The maximum program value possible in column (9) is found in the same manner as the program value in column (8) but the index of possible change is used instead of the actual. Thus, the index of possible change for Individual 1, whose admission level was moderate, is, from Table 1, 70 per cent (discharge level of none). For Individual 2, the maximum is 100 per cent, and for Individual 3, it is the same value as before, 40 per cent.

*This index may be derived through the use of scales of impairment in functioning whose reliability and validity have been established. Rehabilitation frequently uses such types of scales, for example.

TABLE 1-Specification of Indices of Change (A Model Input)

Admission Level      Discharge Level of Impairment
of Impairment        None      Slight    Moderate  Severe
Slight               +40%      0         -30%      -70%
Moderate             +70%      +40%      0         -30%
Severe               +100%     +70%      +40%      0

NOTE: 100 per cent or 1.0 represents progression from severe impairment to no impairment, 70 per cent or .7 from moderate to none, etc. Level of impairment categorization is preferably determined by place on a standardized scale whose reliability and validity have been established.

Program costs are more straightforward. The only difficulty here is the decision of whether to charge the program with only direct costs or whether to include indirect costs as well. If the patient spent different amounts of time in different cost statuses (hospital, family care, outpatient) within a program, a total cost figure may be obtained by multiplying the days spent in each status by the daily cost of the status and then summing over all statuses.

Table 3 presents the hypothetical summary program evaluation indices for an agency with four drug treatment programs. The incarceration program's values, columns (2) and (3), were derived from Table 2; the values for the remaining programs would be derived in the same manner. These numbers are interpreted in relation to annual wages: the treatment home saves about three times as much in per patient wages as incarceration (1,510 vs 477). The maximum possible value per program is shown in column (3) for comparison with column (2). Thus, incarceration saved 477 out of a possible 4,480 while methadone maintenance saved less, 450, but out of a much smaller possible, 1,753.
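The program value and index arithmetic described above can be sketched in a few lines of Python. This is only an illustrative sketch: the earnings, impairment levels, and indices of change are the hypothetical values of Tables 1 and 2 (with the minimum-wage floor already applied, as in Table 2), the incarceration cost per patient of 583 is the hypothetical figure from Table 3, and all function and variable names are assumptions introduced here rather than part of Halpern and Binner's model. The days and daily costs passed to `total_cost` in the usage note are invented purely for illustration.

```python
# Sketch of the index model arithmetic; all names are illustrative assumptions.

# Table 1: index of change keyed by (admission level, discharge level).
INDEX_OF_CHANGE = {
    ("slight", "none"): 0.40, ("slight", "slight"): 0.0,
    ("slight", "moderate"): -0.30, ("slight", "severe"): -0.70,
    ("moderate", "none"): 0.70, ("moderate", "slight"): 0.40,
    ("moderate", "moderate"): 0.0, ("moderate", "severe"): -0.30,
    ("severe", "none"): 1.00, ("severe", "slight"): 0.70,
    ("severe", "moderate"): 0.40, ("severe", "severe"): 0.0,
}

# Table 2: (previous 12 months' earnings, peer-group annual earnings,
#           admission level, discharge level) for the three individuals.
PATIENTS = [
    (7400, 11000, "moderate", "severe"),
    (4680, 4680, "severe", "moderate"),  # earnings already at the minimum-wage floor
    (5100, 6500, "slight", "none"),
]


def program_values(patients):
    """Return (actual, maximum possible) program value per patient."""
    actual = possible = 0.0
    for own, peer, admitted, discharged in patients:
        productivity = (own + peer) / 2                                    # column (3)
        actual += productivity * INDEX_OF_CHANGE[(admitted, discharged)]   # column (8)
        possible += productivity * INDEX_OF_CHANGE[(admitted, "none")]     # column (9)
    return actual / len(patients), possible / len(patients)


def total_cost(days_by_status, daily_cost_by_status):
    """Sum days in each cost status times that status's daily cost."""
    return sum(days * daily_cost_by_status[status]
               for status, days in days_by_status.items())


value, value_possible = program_values(PATIENTS)
cost_per_patient = 583                      # Table 3, column (4), incarceration
effectiveness = value / value_possible      # Table 3, column (6)
efficiency = value / cost_per_patient       # Table 3, column (5)

print(round(value), round(value_possible))              # 477 4480
print(round(effectiveness, 2), round(efficiency, 2))    # 0.11 0.82
```

A call such as `total_cost({"hospital": 10, "outpatient": 20}, {"hospital": 50.0, "outpatient": 5.0})` shows the cost-by-status sum with made-up figures; real daily costs would have to come from the agency's own accounts.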
TABLE 2-Example of the Determination of Program Value for Incarceration

Columns: (1) Previous 12 months' earnings; (2) Group average annual earnings; (3) Average of (1) and (2); (4) Admission level; (5) Discharge level; (6) Index of change; (7) Index of possible change (discharge level of none); (8) Program value, (3) x (6); (9) Program value possible, (3) x (7).

               (1)     (2)     (3)    (4)     (5)     (6)    (7)     (8)      (9)
Individual 1   7,400   11,000  9,200  Mod.    Severe  -30%   +70%    -2,760   6,440
Individual 2   4,680*  4,680   4,680  Sev.    Mod.    +40%   +100%   1,872    4,680
Individual 3   5,100   6,500   5,800  Slight  None    +40%   +40%    2,320    2,320
Total (T)                                                            1,432    13,440
Avg. = T/3                                                           477      4,480

*Based on a minimum wage of $2.00 per hour.

TABLE 3-Drug Treatment Program Evaluation Indices (Hypothetical)

                           (1)       (2)     (3)         (4)     (5)         (6)
                           No. of    Pgm.    Pgm. Value  Pgm.    Efficiency  Effectiveness
                           Patients  Value   Poss.       Cost    Index       Index
Program                                                          = (2)/(4)   = (2)/(3)
Incarceration              3         477     4,480       583     .82         .11
Treatment Home             57        1,510   2,205       1,300   1.16        .68
Methadone Detoxification   81        975     2,870       760     1.28        .34
Methadone Maintenance      163       450     1,753       590     .76         .26

Thus, methadone maintenance is less effective than incarceration in terms of absolute value but more effective in relation to its potential, as shown by the effectiveness index in column (6). Column (4) gives the program cost per patient and is the basis for the efficiency index in column (5). Of course, by utilizing the maximum possible program value, column (3), a maximum possible efficiency index could have been determined; in some cases it still may not exceed unity, a fact of significant interest.

Comparing the indices in columns (5) and (6), it is seen that not all programs have an efficiency index exceeding unity. This is a reflection of the fact that not all programs are self-sustaining in terms of investment: some are successful, some are not. Those least successful tend to be the pure custodial programs. Similarly, the four programs vary considerably in terms of their effectiveness, with incarceration making a very poor showing and the treatment home looking very good. Note, however, that the most effective program is not the most efficient, nor is the most efficient (methadone detoxification) necessarily very effective. For that matter, the least effective (incarceration) here is not the most inefficient either.

If these were real data we would conclude that the treatment home is not only unusually effective but also quite efficient. Incarceration is neither effective nor exceptionally efficient. Methadone maintenance is weak while methadone detoxification, although not extremely effective in terms of its potential, is an extremely efficient program. We might therefore move to expand treatment homes at the expense of incarceration and try to improve the effectiveness of methadone detoxification.

Table 3 can also give overall efficiency and effectiveness indices for the agency's four programs. These are found by multiplying each program index by the number of patients in that program (column 1), summing, and then dividing by the total group size:

$$\text{Overall Agency Efficiency} = \frac{.82 \times 3 + 1.16 \times 57 + 1.28 \times 81 + .76 \times 163}{3 + 57 + 81 + 163} = .97$$

$$\text{Overall Agency Effectiveness} = \frac{.11 \times 3 + .68 \times 57 + .34 \times 81 + .26 \times 163}{3 + 57 + 81 + 163} = .36$$

The poor performance of incarceration is minimized in the overall indices due to the small number of patients in that program. In terms of efficiency, the agency is saving almost as much, in terms of patient productivity, as the program cost. The effectiveness of .36 indicates that the agency is achieving about one-third of the maximum achievable, not necessarily a poor showing.

This example has compared different programs but the same model can also be applied to the same program at different time periods or to groups of patients categorized by means other than treatment programs, e.g., by sex, age, or diagnosis.

Halpern and Binner4 point out one of the inevitable arguments against the index evaluation model when they state that ". . . the administrator of a mental health program may not be able to maximize his return on investment. In fact, he may have to follow strategies that lower his return, if he is to serve those who need his help most." This, of course, raises a fundamental question: Who should receive the benefit of the program's limited resources? For example, should the patient who can utilize them the most receive the resources or the patient who needs them the most? There are no methodological or technical answers to such moral and ethical questions.

The Markov Chain Model

Eyman's5,6 statistical evaluation model, the Markov Chain, differs significantly from that of Halpern and Binner in that it focuses exclusively on the functional level of the patient, using it as the measure of change and the basis for evaluation. The Markov model delineates, via contingency tables, the movement of the patient (or groups of patients) along a scale in terms of the patient's initial position on the scale. Eyman used the Markov model to evaluate the effectiveness of a school program5 and an intensive treatment program6 in a hospital for the mentally retarded.

However, the Markov model is much more powerful than these limited applications suggest. For example, as will be shown, program costs can be included in the model so that it has the potential of being combined with the index model advanced by Halpern and Binner. In addition, under very general conditions the model has been used7,8 to predict the probable time-varying outcome of a treatment program on a patient. Furthermore, the model can handle more than one treatment program at a time so a sequential variation of treatments can also be analyzed.

The basis of the Markov model is a one-period (e.g., one-year) matrix which tabulates the probabilities of a patient changing ability or status levels over the duration of the period. These probabilities are typically obtained from historical cohort data concerning patients' changes in self-help abilities such as arm-hand use, toilet training, communication, etc. An example from a hospital for the mentally retarded is given in Table 4.
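As a rough illustration of how such a one-period matrix might be tabulated from cohort records, the following Python sketch counts each patient's ability group at the start and end of the period and converts the counts to row proportions. The records, the function name, and the choice to treat death as a separate end state (so that each row sums to one) are assumptions made for this sketch; they are not data or procedures taken from the studies cited.

```python
from collections import Counter

GROUPS = ["I", "II", "III", "IV", "V", "VI"]
END_STATES = GROUPS + ["Death"]


def transition_matrix(records):
    """Estimate one-period transition probabilities from (start, end) pairs.

    Returns {start_group: {end_state: probability}}. A start group with no
    observed patients gets an empty row rather than a guessed one.
    """
    counts = Counter(records)
    row_totals = Counter(start for start, _ in records)
    matrix = {}
    for start in GROUPS:
        total = row_totals[start]
        if total:
            matrix[start] = {end: counts[(start, end)] / total for end in END_STATES}
        else:
            matrix[start] = {}
    return matrix


# Hypothetical cohort: each tuple is (group at start of year, state at end of year).
records = (
    [("I", "I")] * 7 + [("I", "II")] * 2 + [("I", "Death")] * 1
    + [("II", "I")] * 1 + [("II", "II")] * 5 + [("II", "III")] * 4
)

matrix = transition_matrix(records)
print(matrix["I"]["I"], matrix["II"]["III"])  # 0.7 0.4
```

With real data, small starting groups would give unstable proportions, which is one reason the reliability of the underlying ability scale and the size of the cohort matter.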

TABLE 4-Markov Matrix of Probabilities of Moving between Ability Levels in One Year in "Standard Care" Program

1970 Ability     1971 Ability Group (percentage distribution)
Group            I     II    III   IV    V     VI
I                72    20    6     0     0     0
II               14    49    31    4     0     0
III              1     20    51    24    2     0
IV               0     2     28    50    17    1
V                0     0     5     35    47    11
VI               0     0     0     0     4     94

The horizontal rows in Table 4 correspond to groups of patients with successive levels of general ability, Group VI being the highest level. The time period covered in this example was July 1970 to July 1971 and the treatment program being evaluated was "standard" (custodial) care. The table indicates, for example, that 49 per cent of the patients in Group II in 1970 neither progressed nor regressed in their general ability level by 1971. However, 14 per cent of the group regressed and of the 35 per cent who progressed, 4 per cent advanced all the way to Group IV. Note that the largest numbers in each row are those along the diagonal, thus indicating the high probability of remaining in the starting group at the end of the period. Larger numbers above the main diagonal than below it (e.g., 1970 Group I) indicate improvement in group functional ability. Conversely, larger numbers below the diagonal than above it (e.g., 1970 Group V) indicate deterioration. By adding the probabilities across each row it will be noticed that 2 per cent of each starting group is unaccounted for; this is the annual death rate for the particular group of patients used in this study.8

To compare treatment programs, the probability matrices can be statistically compared to determine if significant differences exist; this was Eyman's5 objective. However, as mentioned earlier, the matrices may also be used, under certain general conditions,* to predict the probable outcome and cost-effectiveness of a series of treatment programs on a patient. Consider, for example, the set of programs and corresponding matrix shown in Table 5.

*These conditions are discussed in Kemeny and Snell.9

TABLE 5-Varying Treatment Program Matrix with Costs

1970 Ability                       1971 Ability Group (percentage distribution)   Cost
Group         Program              I     II    III   IV    V     VI    Death      ($/Pat./Yr.)
I             Std. Care            72    20    6     0     0     0     2          7,675
II            Behav. Mod.          11    43    37    7     0     0     2          8,600
III           School               0     14    51    30    3     0     2          10,030
IV            Intensive Training   0     0     1     48    48    1     2          12,220
V             School               0     0     3     32    50    13    2          10,030
VI            Placement            0     0     0     0     4     94    2          2,016
Death                              0     0     0     0     0     0     100        0

Note: Each ability group is placed in the specific program indicated.

Table 5 shows the probabilities for an assumed set of treatment programs and their costs as a function of the ability level of the patients. Note that death is now explicitly considered so that all enumerations of the prognosis of the initial cohort (or, equivalently, the probabilities of movement for the individual patient) will be exhaustive.

The result of using the set of programs in Table 5 is shown in Tables 6 and 7. The comparison of tables such as 6 and 7 for different sets of treatment programs (such as in Table 5) thus shows the administrator which program to use for each ability level. The mathematics involved in obtaining Tables 6 and 7, although not complex, will not be presented here since they are not relevant to the purpose of this paper. A simple summary of the methods may be found in references 7, 8, or 9.

Table 6 presents an example of the first set of supplementary information. This table shows the expected change in the distribution of groups due to the programs listed in Table 5 between 1970 and some future year; 1974 was used here as an example. As can be seen by comparing the column and row of total patients, the distribution is expected to shift considerably toward the higher groups in the four-year period. The numbers from each group expected to regress, progress, and die are detailed in the cells of the matrix. This information then tells the administrator what patient distribution he will have in any future year he desires, from which he may determine what resources he will need to service those patients; of course, distant projections have less reliability than near projections. Table 6 was generated, specifically for the 1970 cohort shown, from a four-year probability matrix similar to that of Table 5.

TABLE 6-Expected Change in Cohort Distribution from 1970 to 1974 with Varying Treatment Programs

1970 Ability     1974 Ability Group                          Total
Group            I     II    III   IV    V     VI    D       Patients
I                73    46    46    27    11    0     18      221
II               12    16    26    27    19    3     9       112
III              4     9     18    34    32    8     9       114
IV               0     1     3     30    36    18    8       96
V                0     0     2     14    18    14    4       52
VI               0     0     0     1     2     15    2       20
D                0     0     0     0     0     0     0       0
Total Patients   89    72    95    133   118   58    50

The next three sets of supplementary information are all shown in Table 7. This table lists the number of years a patient is expected to spend in each of the groups before either dying or being placed in a community foster home (limited to Group VI as per Table 5) for the first time. The total cost the patient is expected to incur up to this event is also shown and was derived by multiplying the years spent in each group by the annual cost of the program selected for that group (from Table 5). Lastly, the table gives the probability of each of the outcomes occurring first.

Table 7 shows, for example, that a patient in the lowest ability group (I) is expected to spend 4.4 years in that group, including time spent there due to regressing from higher groups, about two and one-half years in Groups II and III each, and about four and one-half years in Groups IV and V each before he either dies or is placed in a foster home, for a total of 18.3 years spent in the hospital at a total cost of almost $180,000. The likelihood is almost twice as great that he will be placed in a foster home rather than dying first, a very positive prognosis for a patient in this group.

TABLE 7-Expected Stay Times in Years with Varying Treatment Programs Until Either First Placement or Death

1970 Ability   Years in Ability Group           Total    Total Cost   Outcome Probability
Group          I     II    III   IV    V        Years    ($/Pat.)     Placement   Death
I              4.4   2.2   2.8   4.6   4.5      18.3     179,500      .64         .36
II             1.1   2.8   2.6   4.9   4.8      16.2     166,400      .68         .32
III            0.4   0.9   3.2   5.1   5.1      14.6     155,400      .71         .29
IV             0.1   0.2   0.6   5.6   5.4      11.9     130,900      .76         .24
V              0.1   0.2   0.6   3.9   5.8      10.5     113,000      .79         .21
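The article does not present the mathematics behind Table 7. One standard way to obtain quantities of this kind, following the absorbing Markov chain treatment in reference 9, is through the fundamental matrix N = (I - Q)^-1, where Q holds the one-year transition probabilities among the non-terminal groups. The Python sketch below applies that approach to the Table 5 probabilities and costs, treating first placement (Group VI) and death as absorbing outcomes. It is an illustration under that assumption, not necessarily the exact procedure used to produce Table 7, and the function and variable names are introduced here for illustration only.

```python
# Table 5, rows for Groups I-V (the transient states), as fractions.
Q = [  # transitions among Groups I..V
    [0.72, 0.20, 0.06, 0.00, 0.00],
    [0.11, 0.43, 0.37, 0.07, 0.00],
    [0.00, 0.14, 0.51, 0.30, 0.03],
    [0.00, 0.00, 0.01, 0.48, 0.48],
    [0.00, 0.00, 0.03, 0.32, 0.50],
]
R = [  # transitions to the absorbing outcomes: (placement in Group VI, death)
    [0.00, 0.02],
    [0.00, 0.02],
    [0.00, 0.02],
    [0.01, 0.02],
    [0.13, 0.02],
]
COST = [7675, 8600, 10030, 12220, 10030]  # $/patient/year for Groups I..V


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


n = len(Q)
i_minus_q = [[(1.0 if i == j else 0.0) - Q[i][j] for j in range(n)] for i in range(n)]

# N[i][j]: expected years spent in group j by a patient who starts in group i.
N = invert(i_minus_q)

total_years = [sum(row) for row in N]
total_cost = [sum(years * cost for years, cost in zip(row, COST)) for row in N]
# outcome_prob[i] = [P(placement first), P(death first)] for starting group i.
outcome_prob = [[sum(N[i][k] * R[k][o] for k in range(n)) for o in range(2)]
                for i in range(n)]

for label, years, cost, probs in zip(["I", "II", "III", "IV", "V"],
                                     total_years, total_cost, outcome_prob):
    print(f"Group {label}: {years:.1f} years, ${cost:,.0f}, "
          f"placement {probs[0]:.2f}, death {probs[1]:.2f}")
```

Swapping a different program into one of the groups amounts to replacing that group's row in Q and R and its entry in COST, then recomputing; comparing the resulting totals is the kind of program-by-level comparison the text describes.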

Lastly, the Markov model is not limited to evaluating only those programs with which an agency has experience. For instance, if a new mode of operant conditioning becomes available or a modification to an existing program is contemplated, subjective estimates of the transition probabilities could equally well be used in the transition probability matrix. The effect of the new or modified program could then be determined in a manner similar to Tables 6 and 7. This would also give the effect on the total cost to the hospital and thus ascribe an equivalent worth to the program.
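A projection of this kind can be sketched as repeated application of a one-year matrix to a cohort. The Python example below assumes that the Table 5 probabilities stay constant from year to year, so that a four-year projection is the one-year matrix applied four times, and it uses the 1970 cohort sizes from the Total Patients column of Table 6. Because the article says only that Table 6 came from a four-year matrix "similar to" Table 5, the output should not be expected to reproduce Table 6 exactly. To examine a new or modified program, the corresponding row of P would be replaced with subjective estimates; the names used here are illustrative.

```python
# One-year transition probabilities from Table 5.
# States, in order: Groups I, II, III, IV, V, VI, and Death.
P = [
    [0.72, 0.20, 0.06, 0.00, 0.00, 0.00, 0.02],
    [0.11, 0.43, 0.37, 0.07, 0.00, 0.00, 0.02],
    [0.00, 0.14, 0.51, 0.30, 0.03, 0.00, 0.02],
    [0.00, 0.00, 0.01, 0.48, 0.48, 0.01, 0.02],
    [0.00, 0.00, 0.03, 0.32, 0.50, 0.13, 0.02],
    [0.00, 0.00, 0.00, 0.00, 0.04, 0.94, 0.02],
    [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 1.00],
]

# 1970 cohort sizes by state (Total Patients column of Table 6).
COHORT_1970 = [221, 112, 114, 96, 52, 20, 0]


def project(cohort, matrix, years):
    """Expected number of patients in each state after the given number of years."""
    dist = list(cohort)
    for _ in range(years):
        dist = [sum(dist[i] * matrix[i][j] for i in range(len(dist)))
                for j in range(len(matrix[0]))]
    return dist


projected_1974 = project(COHORT_1970, P, 4)
print([round(x) for x in projected_1974])
```

Running the same projection with an alternative matrix and comparing the two expected distributions, together with the associated program costs, gives a simple what-if comparison between the current and the proposed program.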

Discussion

Although the Markov model and the Halpern and Binner model require a considerable amount of effort to use, even more effort is required to provide accurate, meaningful input data for the models. Clearly, the statistical manipulation of inaccurate data is worse than useless; it may well be misleading. Thus, to use these models for the purpose for which they were intended requires an enormous amount of careful work ascertaining the reliability and validity of the raw data. In addition, an appropriate scoring and recording system must be designed, monitored, and properly utilized. And finally, the results of the model analyses must be interpreted and very carefully explained lest incorrect inferences be drawn.

Furthermore, any evaluation model is only one tool in the total program evaluation process. The Markov model and the Halpern and Binner model appear to be especially useful in this regard. Undoubtedly, more evaluation models for the health services will be developed and become available as time progresses, but during the interim the two models described here appear to possess the most general applicability for quantitative program evaluation. When used in conjunction with additional qualitative information and real world constraints, these models should prove extremely helpful to the clinicians and health administrators faced with the task of program evaluation.

REFERENCES

1. Weikel, K. Evaluation of national health programs. Am. J. Public Health 61:1801, 1971.
2. Levey, S. and Loomba, N. P. Health Care Administration, p. 421. Lippincott Co., 1973.
3. Scriven, M. Goal-free evaluation. Evaluation 1:62, 1973.
4. Halpern, J. and Binner, P. R. A model for an output value analysis of mental health programs. Administration in Mental Health 1:40-51, 1972.
5. Eyman, R. K., Tarjan, G., and McGunigle, D. The Markov Chain as a method of evaluating schools for the mentally retarded. Am. J. Mental Deficiency 72:435-444, 1967.
6. Eyman, R. K., Tarjan, G., and Cassady, M. Natural history of acquisition of basic skills by hospitalized retarded patients. Am. J. Mental Deficiency 75:120-129, 1970.
7. Meredith, J. A Markovian analysis of a geriatric ward. Management Science 19:604-612, 1973.
8. Meredith, J. Program evaluation in a hospital for the mentally retarded. Am. J. Mental Deficiency 78:471-481, 1974.
9. Kemeny, J. C. and Snell, J. L. Finite Markov Chains. D. Van Nostrand, 1960.
