Statistical Analyses for Research in Arthritis When Assessing Change

John E. Hewett, Sharon K. Anderson, and Marian A. Minor

John E. Hewett, PhD, is Professor of Statistics and Internal Medicine and a member of the Biostatistics Group; Sharon K. Anderson, MA, is a member of the Biostatistics Group; and Marian A. Minor, PhD, is Assistant Professor in Physical Therapy. All three are participants in the Missouri Arthritis Rehabilitation Research and Training Center, University of Missouri, Columbia, MO 65211. Address correspondence to John E. Hewett, Department of Statistics, 222 Mathematical Sciences Building, University of Missouri, Columbia, MO 65211. Submitted for publication July 15, 1991; accepted January 10, 1992. © 1992 by the Arthritis Health Professions Association.

Clinical trials are designed to compare the effects of two or more treatments on chronic diseases such as rheumatoid arthritis. A two-factor (group, time) experimental design with repeated measures on one factor (time) is often employed in these clinical trials. Here the group factor represents the different treatments. There are a variety of statistical methods that can be used in these designs to (1) compare the effects of the treatments and (2) assess changes over time. This article presents the more commonly used methods and discusses their appropriateness for the problem of comparing the effects of the treatments only. Two specific designs will be used in the presentation. In Design 1 the group factor consists of two independent groups (treatment and control, or two different treatments), and the time factor consists of two levels (pre- and posttreatment). Subjects are randomly assigned to the two groups. Table 1 contains a diagram of this design. For this design we will discuss methods suitable for determining if the effects of the treatments are different. In some cases the goal will be to determine whether one treatment is better than the other, e.g., whether the treatment is better than the control. In other cases we just want to know if the effects of the two treatments are different.

TABLE 1
Design 1

Group      Pre    Treatment      Post
Group 1    X1     Treatment 1    Y1
Group 2    X2     Treatment 2    Y2

In Design 2 the group factor also consists of two independent groups (treatment and control, or two different treatments), but the time factor consists of three levels (pretreatment, posttreatment, and "long-term" follow-up). Table 2 contains a diagram of this design. For this design we are interested in determining whether the effects of the treatments are different not only at the posttreatment time but also at the "long-term" follow-up time. Over the years, many articles have been published that contain descriptions as well as illustrations of a variety of different methods that pertain to the questions and designs of interest in this article. We propose to focus our attention on a few of them. One approach is to test the null hypothesis of no interaction between the group and time factors. The basic idea here is that if there is no interaction, then what is true for the prevalue is probably true for the postvalue. This implies that if there is no difference in prevalues, then there is probably no difference in postvalues either. The disadvantage of this approach is that if there is a significant interaction, the investigator still does not know the answer to the research question of whether the treatment effect is different for the two groups. Additional analyses must still be done in order to answer this question. Without further analyses, this approach also cannot answer the question of whether a specified treatment is better than the other.


TABLE 2
Design 2

Group      Pre    Treatment      Post    Follow-up
Group 1    X1     Treatment 1    Y1      Z1
Group 2    X2     Treatment 2    Y2      Z2

A second approach is to compute a change score (Y - X, where X and Y are the pre- and postmeasurements, respectively) and test the null hypothesis of no group differences relative to the mean change score. This can be done with either a parametric or a nonparametric analysis. Computing this change score does seem like the natural thing to do. However, Cronbach and Furby [1] argued that change scores are seldom useful. After their article appeared, many articles followed, with some authors supporting the use of change scores and others agreeing with Cronbach and Furby. Knapp [2] presents additional arguments against the use of change scores. The primary argument against change scores is the claim that they are unreliable. However, Rogosa and Willett [3] suggest that change scores may in fact be quite useful. Rogosa et al. [4] explained that change scores are unreliable only in extreme but frequently cited examples. They argue that when subjects do not all change alike, the change score is a good measure of change. The interested reader is encouraged to read the many articles on this topic cited in our list of references. One disadvantage of the change score is that it does not take into account the magnitude of the prevalue. That is, in experiments where there is a ceiling effect, if X is large, then Y - X does not have a chance to be large. A third approach is to test the null hypothesis that the postmeans are not different. This can be done with either a parametric or nonparametric analysis. Again, this does not take into account the prevalue. Even though subjects may have been randomized to the two groups, with relatively small samples the randomization may not be as effective as might be needed. A fourth approach is to employ a parametric analysis of covariance (ANCOVA). The parametric ANCOVA of interest here is one in which a regression model is employed with the postmeasurement (Y) as the dependent variable, the premeasurement (X) as an independent variable, and a second independent variable W included in the model, where W is defined by

    W = 0 for treatment 1
    W = 1 for treatment 2.

Egger et al. [5], Samuels [6], Egger and Reading [7], Laird [8], and Crager [9] all discuss the usefulness of this method. The fifth approach is to employ a nonparametric ANCOVA such as that proposed by Islam and Hewett [10], in which a regression line is fit to the set of data points from both groups and the Wilcoxon Rank Sum Test is applied to the resulting residuals. The coordinates of the data points are the pre- and postvalues. Both of the ANCOVA methods are desirable because they directly answer the question of whether the groups differ on the postvalue while taking into account the prevalue. Some investigators, such as Menlo and Johnson [11] and Kaiser [12], have looked at percent change, i.e., 100(Y - X)/X, as a way of adjusting for the prevalue. This approach has some serious problems. (Y - X)/X is the ratio of two random variables that are usually correlated, and the resulting probability distribution is seldom symmetric. Furthermore, if X is large, Y - X may not have the potential to be very large; hence, inferences could be quite misleading. In general, we include nonparametric methods in this article to cover situations in which the investigator is unable to argue that the data originate from normally distributed populations.
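As a rough sketch of how this nonparametric ANCOVA could be carried out in SAS (the data set name exercise and the variable names group, pre, and post are illustrative assumptions, not part of the original report; a sketch of the assumed data layout is given in the next section):

proc reg data=exercise;
  * Fit one regression line of post on pre to the combined groups;
  * and keep the residuals in an output data set.;
  model post = pre;
  output out=resids r=res;
run;

proc npar1way data=resids wilcoxon;
  * Compare the two groups with the Wilcoxon Rank Sum Test on the residuals.;
  class group;
  var res;
run;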

METHODS AND ILLUSTRATIONS

The data set that we use to illustrate the methods was taken from Minor's previously published data [13] from a study designed to compare the effects of different modes of exercise on arthritis patients. We use only a subset of the data generated from the study. The two groups of interest here are the water exercise group and a group we call the control group. The two sample sizes are 19 and 18, respectively. The variable of interest is peak oxygen uptake. We include the data set so the interested reader can use it to evaluate his or her own analyses. We also give the pre-, post-, and follow-up means and standard deviations for the two groups (Tables 3 and 4). We note that the groups do not differ relative to the prevalue. In particular, the two-sample t test yields a p value of 0.13, and the Wilcoxon Rank Sum Test yields a p value of 0.07. In the remainder of this section, we consider Designs 1 and 2 separately. For each design we will pose the research question and then list analyses that are appropriate for answering the question.


TABLE 3
Peak Oxygen Uptake Raw Data
(values are listed by column; ? marks a digit that is illegible in the source)

Treatment group (n = 19)
  Pre-:       ?0.1, 21.7, ?8.8, ?9.7, 17.3, 25.7, 12.4, ?9.0, 19.0, ?6.0, ?6.0, ?5.0, 20.0, 26.0, 14.0, 16.0, ?7.0, 17.0, 19.0
  Post-:      22.4, 24.8, 19.5, 23.0, 19.9, 25.0, 19.0, 25.0, 21.0, 16.0, 18.0, 26.0, 23.0, 26.0, 19.0, 23.0, 22.0, 22.0, 26.0
  Follow-up:  19.6, 19.1, 24.7, 24.2, 27.0, 28.0, 21.0, 21.0, 19.0, 17.0, 20.0, 18.2, 23.0, 29.0, 20.0, 22.0, 23.0, 25.0, 27.0

Control group (n = 18)
  Pre-:       17.2, 29.0, 12.5, 9.2, 26.8, 18.1, 13.6, 15.7, 17.2, 16.8, 14.0, 18.0, 16.8, 19.0, 16.0, 18.0, 17.0, 16.0
  Post-:      17.4, 17.8, 16.3, 16.6, 20.0, 17.4, 20.1, 19.0, 19.2, 16.6, 15.4, 20.0, 14.0, 21.0, 20.0, 18.0, 19.0, 21.0
  Follow-up:  16.7, 21.9, 13.3, 18.5, 18.9, 17.8, 13.3, 21.0, 24.0, 21.0, 20.0, 19.0, 17.0, 23.0, 19.0, 16.0, 16.0, 23.0

Each method is illustrated with a numerical example. Although we list only procedural commands from SAS, other packages such as SPSS, BMDP, etc. could also be used.
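The SAS fragments shown below assume a data set laid out roughly as follows; the data set name, variable names, and the two rows shown are illustrative assumptions rather than the authors' original code:

data exercise;
  * Assumed layout: one record per subject, group code plus the three measurements.;
  input group $ pre post followup;
  change    = post - pre;        * pre-to-post change score;
  change_fu = followup - pre;    * pre-to-follow-up change score;
  datalines;
W 20.0 22.0 23.0
C 17.0 18.0 19.0
;
run;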

TABLE 4
Mean ± Standard Deviation of Peak Oxygen Uptake

Group      Pre-            Post-           Follow-up
Water      19.98 ± 5.87    22.13 ± 2.97    22.72 ± 3.39
Control    17.29 ± 4.53    18.27 ± 1.98    19.12 ± 2.84

Design 1: Two Independent Groups with Pre- and Postmeasures

The research question of interest is whether the two groups (treatment and control) differ relative to the change produced. We will discuss five analyses that could be used here. The question of whether the treatments are effective is not addressed. (1) Test the null hypothesis of no group by time interaction. This hypothesis is tested by computing the ratio of certain sums of squares that has an F distribution if the null hypothesis is true. If this null hypothesis is rejected, we know only that the experimental and control group means are not consistent with each other across the two time periods.

We do not know which posttreatment mean is larger, or whether the changes produced by the two treatments are different. This test can be executed by PROC GLM in SAS. (2) Test the null hypothesis that the two mean changes are the same by employing the two-sample t test on the change scores. Here we compute Y - X and use these differences as the data points in the analysis. A significant result indicates that the mean change occurring in the experimental group differs from that of the control group. The direction and magnitude of the mean change scores indicate where the difference occurred. It should be noted that this test is mathematically equivalent to the test of no interaction discussed in (1). However, a one-sided alternative that cannot be addressed with the test for interaction can be considered here. The procedures PROC TTEST, GLM, and NPAR1WAY in SAS, using the change scores as data points, can all be used to execute this analysis. (3) Test the null hypothesis that the posttreatment means are the same using ANCOVA with the pretreatment value as a covariate. Note that other covariates can also be included. If the group effect is significant, we conclude, after adjusting for the pretreatment value, that the posttreatment means are different. The post-pre change can also be used as the comparison variable in the ANCOVA, but this yields the identical result as just using the postvalue. PROC GLM in SAS can be used to execute this analysis. (4) Test the null hypothesis of no difference in the treatment and control change score distributions. The Wilcoxon Rank Sum Test is used to do this. This is a nonparametric approach to the null hypothesis and test given in (2). If the test is significant, we conclude that the two change score distributions are different. A one-sided alternative can also be considered here; if the resulting test is significant, we conclude that the specified treatment is the more effective. The two-sided test can be executed by using PROC NPAR1WAY of SAS with the change score as the comparison variable. (5) Test the null hypothesis that the joint distribution of the pre- and postmeasurements is the same in the treatment and control groups, with the alternative of interest being that the conditional distribution of the postscore given the prescore differs between the two groups for some value of the prescore. This is a nonparametric version of (3). This test is executed by fitting one regression line to the data points (X, Y) for the combined groups and comparing the groups by conducting a Wilcoxon Rank Sum Test on the residuals. See Islam and Hewett [10] for a complete description of this test. We emphasize the reporting of the p value in what follows.
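The following are minimal SAS sketches of analyses (1) through (4), assuming the illustrative exercise data set described above; a sketch of the nonparametric ANCOVA in (5) was given earlier. The exact statements an investigator uses may differ:

proc glm data=exercise;
  * (1) Group-by-time interaction, with time as the repeated factor.;
  class group;
  model pre post = group;
  repeated time 2;
run;

proc ttest data=exercise;
  * (2) Two-sample t test on the change score.;
  class group;
  var change;
run;

proc glm data=exercise;
  * (3) Parametric ANCOVA: postvalue as response, prevalue as covariate.;
  class group;
  model post = pre group;
run;

proc npar1way data=exercise wilcoxon;
  * (4) Wilcoxon Rank Sum Test on the change score.;
  class group;
  var change;
run;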


However, for a given α level (the probability of a type I error), if p ≤ α we reject H0 in favor of HA at that particular α. (1) The value of the F statistic for testing the null hypothesis of no group by time interaction is 0.48 (1 and 35 degrees of freedom). This yields a p value of p = 0.49. The conclusion is that there is not a significant interaction, which implies that the relationship of the two group means does not change between pre and post. (2) If one uses an F statistic (the square of the t statistic) to test the null hypothesis of no difference in mean change, a value of 0.48 is obtained with 1 and 35 degrees of freedom. Note that this is identical to the result given in (1), as it should be. The p value is p = 0.49. We conclude that the mean change scores are not different. Note that this is a more direct answer to the research question than what is obtained from the interaction test of (1). (3) The value of the F statistic for testing the null hypothesis that the posttreatment means are the same using ANCOVA with the pretreatment value as a covariate is 17.36 (1 and 34 degrees of freedom). This yields a p value of p = 0.0002. We conclude that the postvalues tend to be different for the two groups if we adjust for the prevalue. When the change score is used as the dependent variable, we get exactly the same values for the test statistic and p value. This happens because the variability of the prevalue has essentially been removed. (4) The p value resulting from the Wilcoxon Rank Sum Test of the null hypothesis of no difference in the two change score distributions is p = 0.166. Thus, we conclude that the two change score distributions do not differ. (5) The nonparametric ANCOVA for testing the null hypothesis that the joint distribution of the pre- and postscores is the same for both groups yields a p value of p = 0.0005. Thus, we conclude that the two conditional distributions of the postvalues given the prevalues are different. What does it all mean? The interaction test of (1), the F test of (2) for comparing the mean changes, and the Wilcoxon Rank Sum Test of (4) for comparing the change score distributions all yield p values greater than 0.05, which we interpret to be nonsupportive of a group difference, i.e., nonsignificant. It should be noted that none of these tests makes use of the prescore. The change score, although it is a function of the prevalue, does not incorporate the level of the prevalue into the analysis. However, both the parametric and nonparametric covariate analyses yielded quite small p values, which we interpret as evidence that the postscore distributions are different when the prevalue is taken into account.



Figure 1. Plot of (pre-, post-) exercise data for the treatment (P) and control (R) groups.

The plot of the (pre-, post-) data in Figure 1, with the fitted regression line whose equation is

    Post = 16.13 + 0.22 Pre,

gives an indication of why the five tests yield their respective p values. In the plot, P represents a point from the water exercise group and R represents a point from the control group. If you look at the relative positions of the points in the two-dimensional picture, you note that for given values of the prevalue, the P's tend to be higher on the plot than the R's.
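For example, at a prevalue of 20 the combined-groups line predicts a postvalue of about 16.13 + 0.22 × 20 ≈ 20.5. Roughly speaking, at comparable prevalues the water-group points (P) tend to fall above this common line and the control points (R) below it, and it is this separation that the two covariate analyses detect.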

Design 2

As in Design 1, we do not deal with the question of whether the treatments are effective. There are two primary questions to be answered with this design: (1) Do the groups differ on the change that occurred from pre to post? That is, is the treatment effect different for the two groups? The five tests discussed for Design 1 are relevant for answering this question. (2) Are group differences that occurred from pre to post (if any) maintained through the follow-up period? Another way to look at this second question is to ask whether the groups changed differently between the post- and the follow-up measurements, with the hope that they did not.


However, this is not the correct approach, because statistical tests are designed to detect change rather than to establish that no change took place. Thus, we are faced with the interesting question of how to answer (2). We now discuss some possible answers. (1) Test the null hypothesis that the group by time interaction is zero. If this test is significant, additional analyses are necessary to determine what differences exist. Several p values are involved before a complete answer is given, so this is not a very direct way to answer the question. PROC GLM in SAS can be used. (2) Test the null hypothesis that the mean change (follow-up minus pre) score is not different between the two groups. This directly answers the question of whether the pre- to follow-up change is different for the two groups. Note that a one-sided alternative can also be used here. PROC TTEST, GLM, and NPAR1WAY in SAS can be used. (3) Conduct the analysis of covariance as described in (3) of Design 1, with the follow-up measurement used in the analysis rather than the postvalue. The prevalue is still the covariate. This answers the question of whether the groups differ relative to the follow-up value after adjusting for the prevalue. Again, note that the same result will be achieved whether the dependent variable is the follow-up measurement or the change (follow-up minus pre). (4, 5) As has already been illustrated in (2) and (3), analyses (4) and (5) for Design 2 are the same as (4) and (5) of Design 1 if we replace the postmeasurement with the follow-up measurement. These analyses answer questions analogous to those answered in (4) and (5) of Design 1. The same computer packages can also be used. The following numerical results illustrate these tests, again using the data from the Minor experiment. (1) The value of the F statistic for testing the null hypothesis of no group by time interaction is 0.34 (2 and 70 degrees of freedom). This yields a p value of p = 0.71. If we use the REPEATED statement in SAS, which makes use of the multivariate nature of the data, we get a value of the F statistic of 1.23 with 2 and 34 degrees of freedom and a p value of p = 0.79. Note that this analysis makes use of a change score. We again conclude that there are no differences. (2) The value of the F statistic for testing the null hypothesis of no difference in the mean change (follow-up minus pre) is 0.29, which yields a p value of 0.59. Again, no difference is detected in the mean changes. (3) The analysis of covariance F statistic for testing the null hypothesis that the follow-up treatment means are the same, adjusting for the prevalue, is 9.01, which yields a p value of 0.005.


(4) The p value for the Wilcoxon Rank Sum Test of the null hypothesis of no difference in the two change score distributions is p = 0.447, which implies that there is no difference in the change score distributions. (5) The nonparametric ANCOVA for testing the null hypothesis that the joint distribution of the pre- and follow-up scores is the same for both groups yields a p value of p = 0.016, which implies that the follow-up scores are different between the two groups after adjusting for the prevalue. As was the case with Design 1, the analyses did not show the groups to be different when the prescore was not used as a covariate. However, when the prevalue is used as a covariate, there appears to be a difference between the two groups relative to the follow-up variable. The difference is in the same direction as it was in the analysis using the postvalue, so we can conclude that the difference that occurred at the posttreatment time was in fact maintained at the follow-up.
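Sketches of the corresponding SAS steps for Design 2, again assuming the illustrative exercise data set (with change_fu defined as followup minus pre), might look as follows:

proc glm data=exercise;
  * (1) Group-by-time interaction across the three measurement times.;
  class group;
  model pre post followup = group;
  repeated time 3;
run;

proc ttest data=exercise;
  * (2) Two-sample t test on the pre-to-follow-up change.;
  class group;
  var change_fu;
run;

proc npar1way data=exercise wilcoxon;
  * (4) Wilcoxon Rank Sum Test on the pre-to-follow-up change.;
  class group;
  var change_fu;
run;

proc glm data=exercise;
  * (3) ANCOVA on the follow-up value with the prevalue as covariate.;
  class group;
  model followup = pre group;
run;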

COMMENTS

As indicated in the Introduction, the references contain many different discussions about the pros and cons of using change scores. Our purpose is to recommend what we feel are the most appropriate statistical methods to answer the research questions of interest. Our recommendation is based on the premise that the statistical analysis that most directly answers the question is preferred. We first remind the reader of some facts. Note that for Design 1, the F test for interaction is equivalent to the t test for comparing the mean changes. We interpret this to mean that if you do not believe in change scores, you should not do the interaction test either. Furthermore, the test for a significant interaction does not directly answer the question. In our example neither of these produced significant differences. Recall that Egger and Reading [7] recommended the use of the ANCOVA with the prevalue as the covariate. Laird [8] also supports this approach when the treatment assignment is random. When we apply this methodology to our data, we discover that there are group differences.

Recommendation

For both Designs 1 and 2, it is imperative when making group comparisons to use the information contained in the prevalues when the prevalue is correlated with the postvalue.



The most efficient way of doing this is to employ an ANCOVA. Thus, our recommendation for the reader is to employ either a parametric or nonparametric ANCOVA when testing to determine whether the two groups differ in the two-factor design that we have been discussing.

We wish to thank Dixie L. Fingerson for conducting the computer-assisted literature search on change scores, which was a great help to us in compiling our list of references.

REFERENCES

1. Cronbach LJ, Furby L: How we should measure "change"-or should we? Psychol Bull 74:68-80, 1970
2. Knapp T: The (un)reliability of change scores in counseling research. Measurement and Evaluation in Guidance 13:149-157, 1980
3. Rogosa DR, Willett JB: Demonstrating the reliability of the difference score in measurement of change. J Educational Measurement 20:335-343, 1983
4. Rogosa DR, Brandt D, Zimowski M: A growth curve approach to the measurement of change. Psychol Bull 90:726-748, 1982
5. Egger M, Coleman M, Ward J, Reading J, Williams H: Uses and abuses of analysis of covariance in clinical trials. Controlled Clin Trials 6:12-24, 1985
6. Samuels M: Use of analysis of covariance in clinical trials: a clarification. Controlled Clin Trials 7:325-329, 1986
7. Egger M, Reading J: Uses and abuses of analysis of covariance: further discussion. Controlled Clin Trials 7:330-331, 1986
8. Laird N: Further comparative analyses of pretest-posttest research designs. The American Statistician 37:329-330, 1983
9. Crager M: Analysis of covariance in parallel-group clinical trials with pretreatment baselines. Biometrics 43:895-901, 1987
10. Islam MZ, Hewett JE: Tests for equality of populations with multivariate data for ordered alternatives. Mathematical Sciences Technical Report No. 139, Department of Statistics, University of Missouri, Columbia, MO 65211, 1987
11. Menlo A, Johnson M: The use of percentage gain as a means toward the assessment of individual achievement. California J Educational Research 22:193-200, 1971

12. Kaiser L: Adjusting for baseline: change or percentage change? Stat Med 8:1183-1190, 1989
13. Minor M: Efficacy of physical conditioning exercise in rheumatoid arthritis and osteoarthritis. Arthritis Rheum 32:1396-1405, 1989
14. Benjamin L: Remarks on behalf of change scores and associated correlational statistics: a response to the Etaughs. Dev Psychol 8:180-183, 1973

15. Bereiter C: Some persisting dilemmas in the measurement of change. In Harris CW (ed): Problems in Measuring Change. Madison, University of Wisconsin Press, 1963
16. Berger-Gross V: Difference score measures of social perceptions revisited: a comparison of alternatives. Organizational Behavior and Human Performance 29:279-285, 1982
17. Blomqvist N: On the relation between change and initial value. J Am Stat Assoc 72:746-749, 1977
18. Bohrnstedt GW: Observations on the measurement of change. In Borgatta EF, Bohrnstedt GW (eds): Sociological Methodology. San Francisco, Jossey-Bass, 1969
19. Bond L: On the base-free measure of change proposed by Tucker, Damarin, and Messick. Psychometrika 44:351-355, 1979
20. Brogan DR, Kutner MH: Comparative analyses of pretest-posttest research designs. Am Stat 34:229-232, 1980
21. Corder-Bolz C: The evaluation of change. Educational and Psychological Measurement 38:959-976, 1978
22. Cox DR, McCullagh P: Some aspects of analysis of covariance. Biometrics 38:541-554, 1982
23. Davis C: The effect of regression to the mean in epidemiologic and clinical studies. Am J Epidemiol 104:493-498, 1976
24. Delaney H, Maxwell S: On using analysis of covariance in repeated measures designs. Multivariate Behavioral Research 16:105-123, 1981
25. Etaugh AF, Etaugh CF: Overlap: hypothesis or tautology? Dev Psychol 6:340-342, 1972
26. Fleiss JL: Comment on Overall and Woodward's asserted paradox concerning the measurement of change. Psychol Bull 83:774-775, 1976
27. Fortune J, Hutson B: Selecting models for measuring change when true experimental conditions do not exist. J Educ Res 77:197-206, 1984
28. Gardner R, Neufeld R: Use of the simple change score in correlation analyses. Educational and Psychological Measurement 47:849-864, 1987
29. Glass GV: Response to Traub's "Note on the Reliability of Residual Change Scores." J Educational Measurement 5:265-267, 1968
30. Grieve AP: Letter to the editor. Am Stat 35:177-178, 1981
31. Gupta J, Srivastava A, Sharma K: On the optimum predictive potential of change measure. J Experimental Education 56:124-127, 1988
32. Harris CW: Problems in Measuring Change. Madison, University of Wisconsin Press, 1963
33. Howard GS, Ralph KM, Gulanick NA, Maxwell SE, Nance SW, Gerber SK: Internal invalidity in pretest-posttest self-report evaluations and reevaluations of retrospective pretests. Applied Psychological Measurement 3:1-23, 1979
34. Huck SW, McLean RA: Using a repeated measures ANCOVA to analyze the data from a pretest-posttest design: a potentially confusing task. Psychol Bull 82:511-518, 1975
35. James KE: Regression toward the mean in uncontrolled clinical studies. Biometrics 29:121-130, 1973
36. Johns G: Difference score measures of organizational behavior variables: a critique. Organizational Behavior and Human Performance 27:443-463, 1981
37. Kenny DA: A quasi-experimental approach to assessing treatment effects in the nonequivalent control group design. Psychol Bull 82:345-362, 1975
38. Kessler RC: The use of change scores as criteria in longitudinal survey research. Quality and Quantity 11:43-66, 1977
39. Levin JR: Letters to the editor. The American Statistician 35:178-179, 1981
40. Levin JR, Marascuilo LA: Post hoc analysis of repeated measures interactions and gain scores: whither the inconsistency? Psychol Bull 84:247-248, 1977
41. Linn RL, Slinde JA: The determination of the significance of change between pre- and posttesting periods. Rev Educational Research 47:121-150, 1977
42. Lord FM: The measurement of growth. Educational and Psychological Measurement 16:421-437, 1956
43. Lord FM: Further problems in the measurement of change. Educational and Psychological Measurement 18:437-454, 1958
44. Lord FM: Elementary models for measuring change. In Harris CW (ed): Problems in Measuring Change. Madison, University of Wisconsin Press, 1963
45. Lord FM: A paradox in the interpretation of group comparisons. Psychol Bull 68:304-305, 1967
46. Lord FM: Statistical adjustments when comparing preexisting groups. Psychol Bull 72:336-337, 1969
47. Marks E, Martin CG: Further comments relating to the measurement of change. Am Educational Research J 10:179-191, 1973
48. Maxwell S, Johnson M: The use of percentage gain as a means toward the assessment of individual achievement. California J Educational Research 22:193-200, 1971
49. Messick S: Denoting the base-free measure of change. Psychometrika 46:215-217, 1981
50. O'Connor EF Jr: Extending classical test theory to the measurement of change. Rev Educational Research 42:73-97, 1972
51. O'Connor EF Jr: Response to Cronbach and Furby's "How We Should Measure 'Change'-or should we?" Psychol Bull 78:159-160, 1972
52. Overall JE, Woodward JA: Unreliability of difference scores: a paradox for measurement of change. Psychol Bull 82:85-86, 1975
53. Overall JE, Woodward JA: Reassertion of the paradoxical power of tests of significance based on unreliable difference scores. Psychol Bull 83:776-777, 1976
54. Schafer WD: Letter to the editor. The American Statistician 35:179, 1981
55. Sharma KK, Gupta JK: Optimum reliability of gain scores. J Experimental Education 54:105-108, 1986
56. Thorndike EL: The influence of chance imperfections of measures upon the relationship of initial score to gain or loss. J Experimental Psychology 7:225-232, 1924
57. Traub RE: A note on the reliability of residual change scores. J Educational Measurement 4:253-256, 1967
58. Tucker LR, Damarin F, Messick S: A base-free measure of change. Psychometrika 31:457-473, 1966
59. Van Belle G, Uhlmann R, Hughes J, Larson E: Reliability of estimates of changes in mental status test performance in senile dementia of the Alzheimer type. J Clin Epidemiol 43:589-595, 1990
60. Williams RH, Zimmerman DW: The reliability of difference scores when errors are correlated. Educational and Psychological Measurement 37:679-689, 1977
61. Williams R, Zimmerman W: The comparative reliability of simple and residualized difference scores. J Experimental Education 51:94-97, 1982
62. Williams RH, Zimmerman DW: Empirical estimates of the validity of four measures of change. Percept Mot Skills 58:891-896, 1984
63. Zimmerman DW, Williams RH: Gain scores in research can be highly reliable. J Educational Measurement 19:149-154, 1982
64. Zimmerman D, Andrews D, Robinson D, Williams R: A note on nonparallelism of pretest and posttest measures in assessing change. J Experimental Education 53:234-236, 1985
65. Zimmerman D, Williams R: The relative error magnitude in three measures of change. Psychometrika 47:141-147, 1982

727KB Sizes 0 Downloads 0 Views