Journal of Gerontology 1977, Vol. 32, No. 1, 89-96

Interrupted Time Series Analysis: A Research Technique for Evaluating Social Programs for the Elderly¹

After arguing that treatment programs for the elderly need to be evaluated with better research designs, the authors illustrate how interrupted time series analysis can be used to evaluate programs for the elderly when random assignment to experimental and control groups is not possible. Examples of how to use time series analysis for evaluating programs in institutions and community settings, as well as for post hoc analysis of social policy changes, are provided. The authors explain how time series analysis minimizes threats to internal validity by making multiple assessments of the outcome variable both before and after introduction of the treatment. Other topics discussed in the paper include the further advantages of multiple time series analysis and the statistical tests used in assessing treatment results using this evaluation design.

Although many innovative programs for helping the elderly have been introduced in the past 10 years, very few of them have been evaluated by means of true experiments employing random assignment of clients to control and experimental conditions. The authors agree with Fairweather (1967) that a higher percentage of social programs, including those for the elderly, could and should be evaluated by means of true experiments. However, like Campbell (1969), we are equally convinced that a number of good "quasi-experimental" techniques exist which would provide fairly unambiguous results with regard to a program's effectiveness in those situations where random assignment is not feasible. One such technique is interrupted time series analysis, which has been described by Campbell in a number of publications (Campbell & Stanley, 1966; Campbell, 1969). Glass, Wilson, and Gottman (1972) probably provide the most complete presentation of interrupted time series analysis. A computer program for performing time series analysis can be obtained from Glass for a nominal charge.⁵ The present paper presents a brief overview of interrupted time series analysis and provides illustrations of the use of the technique in a variety of situations. The examples include post hoc analysis of changes in law or other policies as well as planned interventions.

¹ This article is a revision of a manuscript presented at the 28th annual convention of the Gerontological Society in Louisville, October, 1975.
² Assistant Professor of Psychology, Univ. of Missouri-St. Louis, 8001 Natural Bridge Rd., St. Louis 63121.
³ Research Associate, Dept. of Psychology, Michigan State Univ., Project Manager, MSU-NIMH Innovation Diffusion Project.
⁴ Project Director, Nursing Home Consultation and Training Project, St. Lawrence Hospital CMHC, Lansing, Mich. Doctoral candidate, Dept. of Psychology, MSU.
⁵ Persons interested in obtaining the computer program for interrupted time series analysis should contact Dr. Gene V. Glass, Laboratory of Education Research, Univ. of Colorado, Boulder 80302.

TIME SERIES AND LONGITUDINAL ANALYSIS

Definition. — In interrupted time series analysis the researcher makes multiple assessments of the outcome (dependent) variable prior to the introduction of a treatment in order to establish a baseline for comparison. Then the treatment (independent) variable is introduced and followed by additional assessments of the dependent variable to see if the treatment altered the baseline pattern on the dependent variable in terms of either slope or mean level. A control group is not essential, although the use of a control strengthens the

Downloaded from http://geronj.oxfordjournals.org/ at University of New South Wales on July 10, 2015

Robert J. Calsyn, PhD,² Esther O. Fergus, PhD,³ and Jonathan L. York, MA⁴


CALSYN, FERGUS, AND YORK

Time series analysis, on the other hand, is a technique for evaluating the effectiveness of a planned treatment intervention. Consequently, the true experiment is the standard against which time series analysis is compared.

THE LOGIC OF INTERRUPTED TIME SERIES ANALYSIS

The primary function of any research design, including time series analysis, is to rule out alternative explanations (rival hypotheses) of treatment effects, i.e., to make certain that any observed changes in people exposed to the program can be attributed to the efforts of the program and not to extraneous factors. Campbell and Stanley (1966) provide a list of eight potential threats to the internal validity of experiments which, if present in any research design, render suspect the conclusions from the research. Campbell and Stanley's definitions of the eight threats are listed below.

1. History, the specific events occurring between the first and second measurement in addition to the experimental variable.
2. Maturation, processes within the respondents operating as a function of the passage of time per se (not specific to the particular events), including growing older, growing hungrier, growing more tired, and the like.
3. Testing, the effects of taking a test upon the scores of a second testing.
4. Instrumentation, in which changes in the calibration of a measuring instrument or changes in the observers or scorers used may produce changes in the obtained measurements.
5. Statistical regression, operating where groups have been selected on the basis of their extreme scores.
6. Biases resulting in differential selection of respondents for the comparison groups.
7. Experimental mortality, or differential loss of respondents from the comparison groups.
8. Interactions with selection. In some designs the method of selection can interact with the other threats to validity and be confounded with the effect of the experimental treatment.

The preferred strategy for controlling threats to internal validity and ruling out alternative explanations is random assignment to experimental and control groups. However, when the researchers cannot get an adequate control group, they often use the treatment group as its own control group. First, prior to initiating the treatment, an observation on the dependent variable is made; then the treatment is administered, followed by a later posttreatment observation on the dependent variable. This design has been labeled the one group pretest-


conclusions which can be made from time series analysis. The term multiple time series is used when time series analysis using a control is implied. If the modifier "multiple" is not present, single group time series analysis is implied.

Advantages of longitudinal data. — Time series analysis is clearly in the tradition of longitudinal data analysis in contrast to cross-sectional data analysis. Although the advantages and disadvantages of longitudinal versus cross-sectional studies have been discussed in greater detail elsewhere (Fairweather, 1967; Riley, Johnson, & Foner, 1972), it seems appropriate to summarize these issues relative to both research with the elderly and time series analysis.

The disadvantages of longitudinal data analysis are more pragmatic inconveniences than conceptual weaknesses. It is usually more costly in terms of time and money to collect longitudinal data. There is also the problem of how to handle loss of subjects (attrition). Finally, the policy implications of a longitudinal study may be minimal by the time the study is completed because a program has been discontinued due to funding problems or for other reasons. Even in these cases longitudinal data often can be used to justify or reject new program efforts.

Longitudinal data are clearly preferred over cross-sectional data if a primary research goal is to study change. This is often the case in gerontological research, since the elderly are subject to a multitude of biological and social changes. Longitudinal data are also indicated when the goal of the research is to evaluate the efficacy of a particular intervention or treatment. In designing programs to deal with the problems of the elderly, researchers and practitioners need to know: (1) if there is a time delay before a treatment takes effect and/or (2) whether the effectiveness of a given treatment subsides over time.
Although time series analysis is a longitudinal data analysis technique, it is important to distinguish it from other longitudinal techniques such as the study of cohorts (Riley et al., 1972) and panel studies (Kenny, 1975). These latter techniques are correlative techniques more appropriate for studying complex systems where no deliberate treatment has been introduced. The distinction between independent and dependent variables has little meaning for these types of analyses.

INTERRUPTED TIME SERIES

THREATS TO VALIDITY OF TIME SERIES EXPERIMENTS

According to Campbell and Stanley, only instrumentation and history are potential threats to the internal validity of time series designs. As mentioned previously, an instrumentation threat to the internal validity of a research project can occur if the method of measuring the dependent variable changes during the course of the research project. Researchers using archival data such as medical records or social security records for their time series analysis must be on the lookout for instrumentation threats, since program administrators often want to introduce reforms in record-keeping simultaneously with the introduction of a new treatment program. For example, a nursing home administrator interested in increasing family visits by instituting an orientation program for the families of residents might be tempted to revise the method for recording visits to get a more accurate count. Such a revision would be unwise, because it would be impossible to determine whether any change in the number of family visits was caused by the orientation program or by the change in the record-keeping system. In most instances it is better to retain a somewhat poorer measurement device than to change the measuring instrument and risk not being able to interpret the result of the time series analysis. Instrumentation threats to validity can also occur with observational ratings if one observer is used for making observations before the treatment and a different observer is used for making posttreatment observations. In summary, while instrumentation threats to validity are potentially present in interrupted time series analysis, they usually can be avoided with some planning by the researcher.

History is the greatest threat to the internal validity of time series analysis. As indicated before, history refers to specific events, other than the experimental treatment, occurring between the pretest observations and the posttest observations which might account for any observed change in the dependent variable. For example, consider a campaign by a local health planning group to reduce hospital costs of the elderly by persuading doctors to release patients earlier. If there is a reduction in the average number of days hospitalized, a number of alternative explanations must be ruled out before concluding that the persuasion campaign caused the reduction in days hospitalized. Did insurance companies change their policies regarding the number of days of hospitalization they will pay for? Was there an epidemic which created a shortage of hospital beds, resulting in earlier release dates? Without a control group only an exhaustive historical analysis will rule out these alternative explanations. Ross, Campbell, and Glass (1970) provide an excellent example of this type of analysis in their study of the effect on traffic fatalities of the British Road Safety Act of 1967.

Although mortality is not listed by Campbell and Stanley as a normal threat to the internal validity of time series experiments, experimental mortality in gerontological research due to death or serious illness is a ubiquitous phenomenon, and precautions must be taken to guard against bias. The direction of bias (if any) caused by experimental mortality is not always the same. Experimental mortality would probably bias an evaluation in favor of finding a significant treatment effect (even if none actually exists) if participants who are initially low on the dependent variable drop out of the study. In other situations, where participants who are initially high on the dependent variable drop out of the sample, experimental mortality might mask a significant treatment effect.
The authors recommend analyzing the data in two ways. The first analysis would include only those participants who had complete data for all of the pre- and postmeasuring points. While this procedure may restrict the generalizability of the findings somewhat, it is the best way to ensure internal validity. A second time series analysis can then be run including all of the participants regardless of whether they have complete data or not. Should results


posttest design (Design 2) by Campbell and Stanley. Unfortunately, although this research design is frequently used in human services research, including gerontological research, six of the eight threats to internal validity are potentially present (only selection and mortality are not potential threats). Time series analysis also uses the treatment group as its own control group, but unlike the one group pretest-posttest design the multiple assessments of the dependent variable make it possible to reduce the number of potential threats to internal validity to only two.


Using Multiple Time-Series to Control Effect of History

As pointed out above, history is the major threat to the internal validity of time series studies, since procedures exist for controlling mortality and instrumentation threats. If a control group can be found, even the potential history threat can be eliminated from time series studies. For example, in the previous example on reducing days of hospitalization, control groups might have been formed by finding hospitals in neighboring cities or states which were also exposed to the epidemic and the changes in insurance policies but did not receive the persuasion campaign. The control group for this multiple time series technique need not have been formed by random assignment to be useful, since multiple time series analysis does not examine differences in mean values between control and experimental groups but rather looks at changes in level and slope within each group. Thus, in our example it would not be necessary for the control hospital to have the same level or slope of the length of stay prior to treatment as the experimental hospital to do multiple time series analysis. Regardless of initial differences in length of stay between programs, the persuasion program would be considered effective only if there was a change in the level or slope of the length of stay measure in the experimental hospital but not in the control hospital.

POSSIBLE TIME SERIES OUTCOMES

Fig. 1 is a diagrammatic presentation of some possible outcomes from an interrupted time series analysis. Points O1 to O4 are measurements on some dependent variable taken prior to the introduction of treatment T (represented by a dotted vertical line). Points O5 to O8 are measurements on the dependent variable taken after the initiation of the treatment. To conclude that the treatment had an effect there must be either a change in mean level or a change in the slope of the dependent variable after the treatment. Thus, assuming statistical significance, the researcher would be tempted to say that the



Fig. 1. Some possible outcome patterns from the introduction of an experimental variable at point T into a time series of measurements, O1-O8.


of this second analysis replicate the findings of the first analysis, the researchers can state their conclusions confidently with no loss in generalizability due to experimental mortality. If the results of the two analyses do not confirm each other, more confidence should be placed in the first analysis since the potential threat to internal validity due to experimental mortality has been controlled for. To further determine what group differences might account for the discrepant results of the two analyses, those people who had complete data can be compared against those with missing data on initial level in the outcome variable, demographic characteristics, and other available information. Thus, while the researcher must be cognizant of experimental mortality, procedures do exist to eliminate this bias from time series analysis.
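The two-analysis procedure can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure or the Glass program: the participant labels, the scores, the function name `series_means`, and the use of `None` to mark a missing observation are all assumptions made for the example. No participant's score changes over time, yet the series built from all available data appears to rise once a low scorer is lost.

```python
from statistics import mean

def series_means(scores, complete_only):
    """Mean of the dependent variable at each measurement point.

    scores maps a participant label to a list of observations;
    None marks a missing observation (hypothetical convention).
    """
    rows = list(scores.values())
    if complete_only:
        # First analysis: keep only participants with complete data.
        rows = [r for r in rows if None not in r]
    n_points = len(rows[0])
    means = []
    for k in range(n_points):
        observed = [r[k] for r in rows if r[k] is not None]
        means.append(mean(observed))
    return means

# Hypothetical data: four measurement points, nobody's score changes.
scores = {
    "p1": [2.0, 2.0, 2.0, 2.0],
    "p2": [2.0, 2.0, None, None],  # low scorer lost after the second point
    "p3": [6.0, 6.0, 6.0, 6.0],
    "p4": [6.0, 6.0, 6.0, 6.0],
}

completers = series_means(scores, complete_only=True)      # first analysis
all_available = series_means(scores, complete_only=False)  # second analysis

print("completers only:", completers)
print("all participants:", all_available)
```

In this constructed case the completers-only series is flat, while the all-participants series rises after the dropout, which is the discrepancy the comparison of the two analyses is meant to expose.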


mental hospitals and traffic accidents. With only two observations a program will appear effective or ineffective depending on the point in the cycle at which the treatment is introduced. The multiple observations required for time series analysis avoid this problem.

STATISTICS

A complete discussion of the assumptions of various statistical methods used for analyzing time series data is beyond the scope of this paper. The reader is referred to Glass et al. for a complete mathematical presentation of the various statistical models and their assumptions. However, a more intuitive presentation may aid the potential user of time series analysis in selecting the appropriate statistical model.

Pattern F in Fig. 1 illustrates why a simple comparison of the average pretest score with the average posttest score provides an invalid conclusion. The mean of points O5 to O8 would obviously be greater than the mean of O1 to O4, erroneously indicating a significant treatment effect. Thus, a statistic is needed which examines the slope of the time series to see if it has changed as a result of the treatment. Patterns A and C in Fig. 1 illustrate why the statistic must also be able to detect a jump in the level of the dependent variable after the initiation of the treatment. There is no change in slope in patterns A and C, but there is a decided jump in the level of the dependent variable. In patterns A and C, if the line formed by the pretreatment observations were extrapolated to intercept at T, it would not coincide with an extrapolation from the posttreatment observations to T. Thus, the statistic used in time series analysis must include both a slope parameter and an intercept parameter.

Similarly, the statistic should also include a parameter which takes into account the fact that adjacent observations will often be more similar than nonadjacent observations. For example, O5 will generally be more like O4 than O1. The statistics discussed by Glass et al. include such a parameter. However, there will be those cyclical situations where nonadjacent points will be more similar than adjacent points. For example, the number of nursing home residents receiving visitors around Christmas is probably more similar to the number of residents receiving visitors the previous Christmas than to the number of visitors received 2 weeks prior to Christmas. The usual strategy in


treatment had an effect in patterns A through E in Fig. 1, but no effect in patterns F and G. Pattern A depicts a situation where the treatment caused a change in the mean level of the dependent variable but no change in slope. Pattern B is similar to pattern A except the effect of the treatment dissipated by the time of the second posttreatment measurement. Pattern C is also similar to pattern A except the effect of the treatment is not immediate and does not appear until the second posttest measurement. Pattern D depicts the situation where the treatment caused a change in the slope of the dependent variable. The authors expect this pattern would be very common in gerontological research. For example, reality orientation programs might not be able to restore patients to their previous level of cognitive functioning, but the program might be able to slow down the rate at which the cognitive functioning of patients was decreasing. Pattern E depicts a situation where the treatment affects the mean level and the slope of the dependent variable. An example of this type of pattern would be a reality orientation program that not only decreased the rate of deterioration, but also resulted in a partial restoration of cognitive functioning which was maintained over time.

Close examination of patterns F and G clearly illustrates the necessity for multiple measurements when there is no control group. If the researchers had used the one group pretest-posttest design and made only one observation before the treatment (e.g., O4) and one observation after the treatment (e.g., O5), they would have mistakenly concluded that the treatment had an effect on the dependent variable. However, with the additional observations prior to O4 the researcher can see that O5 is not an effect of treatment and merely represents a continuation of the pattern which existed prior to the initiation of the treatment.
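The contrast between a single pretest-posttest difference and an analysis of level and slope can be illustrated with a small sketch. The code below is an assumption-laden illustration rather than the statistical model of Glass et al.: it fits an ordinary least-squares segmented regression, ignores the interdependence of adjacent observations, performs no significance test, and uses invented values for patterns A, D, and F. The names `segmented_fit` and `pre_post_difference` are hypothetical, and the treatment T is assumed to fall midway between O4 and O5.

```python
import numpy as np

def segmented_fit(y, n_pre):
    """Least-squares fit of a line with a possible jump and slope change at T.

    Model: y = b0 + b1*t + b2*post + b3*post*(t - T), where T is placed
    halfway between the last pretest and the first posttest observation.
    """
    y = np.asarray(y, dtype=float)
    t = np.arange(1, len(y) + 1, dtype=float)
    post = (t > n_pre).astype(float)        # 0 before treatment, 1 after
    since = post * (t - (n_pre + 0.5))      # time elapsed since T, 0 before
    X = np.column_stack([np.ones_like(t), t, post, since])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    names = ["baseline_level", "baseline_slope", "level_change", "slope_change"]
    return dict(zip(names, coef))

def pre_post_difference(y, n_pre):
    """One group pretest-posttest comparison: first posttest minus last pretest."""
    return y[n_pre] - y[n_pre - 1]

# Invented values for three of the patterns (O1-O4 before T, O5-O8 after).
patterns = {
    "A (jump in level)": [2, 2, 2, 2, 5, 5, 5, 5],
    "D (change in slope)": [20, 18, 16, 14, 12.5, 11.5, 10.5, 9.5],
    "F (continuing trend)": [1, 2, 3, 4, 5, 6, 7, 8],
}

for name, y in patterns.items():
    fit = segmented_fit(y, n_pre=4)
    print(name,
          "| O5 - O4 =", pre_post_difference(y, 4),
          "| level change =", round(float(fit["level_change"]), 2),
          "| slope change =", round(float(fit["slope_change"]), 2))
```

For the invented pattern F series the O5 minus O4 difference is positive even though the fitted level and slope changes are essentially zero, whereas patterns A and D show a level change and a slope change respectively.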
Thus, pattern F might depict a physical therapy exercise group for stroke patients which would be deemed quite successful if one looked only at points O4 and O5. However, time series analysis would reveal that the improvement was a natural process which had not been accelerated in any way by the physical therapy program. Pattern G depicts the dangers of using the one group pretest-posttest design with a dependent variable subject to seasonal fluctuations. Examples of variables which are frequently subject to seasonal fluctuations include admissions to


time series analysis is to remove any cyclical trends before performing any tests of statistical significance. Although there are a number of mathematical models which could be used to represent time series data, a brief presentation of one of the models (the integrated moving average model with deterministic drift) presented by Glass, Tiao, and Maguire (1971) may be informative.

t-1 dyIi=l

(i,

z_t is the value of the variable observed at time t, L is a fixed but unknown location parameter, γ is a parameter descriptive of the degree of interdependence of the observations in the time series and takes values 0 < γ
