NIMH Collaborative Research on Treatment of Depression

To the Editor. The long-awaited article on the National Institute of Mental Health (NIMH) research program comparing interpersonal psychotherapy (IPT), cognitive behavior therapy (CBT), placebo case management (PLA-CM), and imipramine case management (imipramine-CM) has finally arrived!1 Outstanding is the lack of lucidity with regard to just what specific hypotheses are being tested. Since the meaningful pairwise contrasts are not stipulated, counterproductive and arbitrary significance level "adjustments" were inflicted on the data, thus obscuring the findings.

It should be noted that all six contrasts between the four treatments are of clinical interest and public health importance. We want to know the relative merits of all the treatments. The logic of multigroup data analysis often proves difficult for the nonstatistically sophisticated reader, so I present a parable indicating why the Bonferroni correction, selected as the familywise type I error control strategy in this report, interferes with seeing real differences between the treatments. (A type I error results when chance variation is considered real.)

Let us say you are hired to conduct a study comparing IPT vs CBT. You contrast the treatments on 40 variables and find that on 15 variables you have significant differences at the preset .05, two-tailed, type I error rate per comparison. Plainly this is fiction. You are about to happily write up your discoveries of real substantive differences, when your boss tells you that you are actually part of a larger study. Down in the basement, patients from the same pool were randomized to receive imipramine-CM. Therefore, you are actually part of a three-group study. You should therefore Bonferroni adjust your P values to a critical value of .017 (.05/3) to preserve the .05 familywise α rate. Unfortunately, of your 40 comparisons, you only have 3 at the .017 level. But this still exceeds chance, so you write up a more conservative report that still affirms a few real differences between the two psychotherapies.
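The arithmetic of the parable so far can be sketched in a few lines of Python. This is an illustrative sketch only: the function names `pairwise_contrasts` and `bonferroni_threshold` are hypothetical, and the chance expectations assume the 40 variables are tested independently, which the letter does not claim.

```python
from math import comb

def pairwise_contrasts(n_groups: int) -> int:
    """Number of distinct pairwise contrasts among n_groups treatments."""
    return comb(n_groups, 2)

def bonferroni_threshold(alpha: float, n_groups: int) -> float:
    """Per-comparison critical P value that keeps the familywise rate at alpha."""
    return alpha / pairwise_contrasts(n_groups)

N_VARIABLES = 40  # outcome variables in the parable
ALPHA = 0.05      # familywise type I error rate to preserve

for n_groups in (2, 3):
    threshold = bonferroni_threshold(ALPHA, n_groups)
    # Assumption: independent variables, so expected chance hits = n * threshold.
    expected_by_chance = N_VARIABLES * threshold
    print(f"{n_groups} groups: {pairwise_contrasts(n_groups)} contrast(s), "
          f"critical P = {threshold:.3f}, "
          f"about {expected_by_chance:.2f} of {N_VARIABLES} variables expected by chance")
```

Under these assumptions, 15 differences at the .05 level and 3 at the .017 level both exceed the number expected by chance alone, which is the sense in which the parable says the result "still exceeds chance."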

However, your amnesic boss returns to say that he meant to tell you, up in the attic they were also conducting a PLA-CM component of this trial. Since the patients were randomized to four different cells allowing six contrasts, you should only accept significant pairwise contrasts at the .008 (.05/6) level. Well, sad to say, you do not have any pairwise contrasts significant at the .008 level. Therefore, you tearfully burn your first manuscript, which presented your trailblazing discovery that the two psychotherapies were really different from each other, and now write a less interesting (and probably less valid) report indicating your inability to demonstrate differences. Mind you, this is not because your data have changed, but because there were other patients in the attic and basement. Further, it does not matter what actually happened to these other patients. Does that make sense?

Since in the NIMH study every pairwise comparison is meaningful, only a per-comparison type I error rate is important. The Bonferroni familywise "correction" simply loses power, with no compensating benefit. I believe a per-comparison basis for the analysis would have shown a real overall superiority of medication to psychotherapy as well as the superiority of both psychotherapies to the PLA-CM condition.

One might wonder when you should ever use a familywise error rate. Let us say you are manufacturing a motor that has 20 crucial components. If any of those components fail, the motor will not work. The economics are such that 1 in 20 defective motors is the maximum acceptable number. What should the acceptable failure rate per component be? Obviously it should not be 5%. The chance that any one component will not fail is .95. If the components fail independently, the chance that all the components will not fail is $.95^{20}$, which is .36. Therefore, the chance that at least 1 component fails is 1 - .36 or .64. If you accept a 5% component failure rate, you will end up with a 64% motor failure rate. How can you get around this? Well, you can Bonferroni correct the .05 acceptable overall motor failure rate by dividing .05 by 20, yielding .0025. That means that the allowable chance that the individual component will not fail is very high, .9975, but $.9975^{20}$ is .951; therefore, only 4.9% of the motors will not work, so you are where you want to be.

If one overall decision will be made on the basis of numerous group comparisons (and if any group comparison was falsely positive, the entire decision would be erroneous), then you had better have very stringent rules for your group comparisons. But that is not the case when comparing the merits of multiple treatments since no overall judgment is being made, but rather a number of pairwise contrasts of relative value.

Further, it is certainly not the case in a treatment trial that you have only one measure. Rather, you have numerous, only partially redundant, measures. If you were to accept the notion that every contrast is crucial, and you had four groups and five independent measures, a Bonferroni correction to .00167 (.05/30) would ensure an extraordinary inflation of the type II error rate, thus missing the boat. However, consistent findings on multiple independent measures are one of our best defenses against type I errors.
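The motor example can be checked numerically. The sketch below is illustrative only: the helper name `familywise_failure_rate` is hypothetical, and it assumes, as the text does, that the components fail independently.

```python
def familywise_failure_rate(per_item_rate: float, n_items: int) -> float:
    """Chance that at least one of n_items independent items fails."""
    return 1.0 - (1.0 - per_item_rate) ** n_items

N_COMPONENTS = 20
TARGET = 0.05  # maximum acceptable share of defective motors

# Accepting a 5% failure rate for each component.
uncorrected = familywise_failure_rate(TARGET, N_COMPONENTS)
# Bonferroni-corrected rate per component: .05 / 20 = .0025.
corrected = familywise_failure_rate(TARGET / N_COMPONENTS, N_COMPONENTS)

print(f"5% per component:    {uncorrected:.2f} of motors fail")
print(f"0.25% per component: {corrected:.3f} of motors fail")

# Four groups give 6 pairwise contrasts; with 5 measures that is 30 tests.
n_tests = 6 * 5
print(f"Bonferroni critical P for {n_tests} tests: {TARGET / n_tests:.5f}")
```

The same function applies to any set of independent tests, which is why the correction suits a single overall decision that any one false positive would ruin.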

Another important complication is the question of severity. It is commonplace that for depressive outpatients there is a strong tendency to get well without treatment. Further, this is most apparent in those who are the least ill. Therefore, a difference between an active treatment and placebo, apparent in the severely ill patients, may not exist in the mildly ill patients.

This is immediately addressed by analysis of covariance (ANCOVA), using baseline scores as a covariate. One looks at the slope, within each treatment, of the final scores against the initial scores. If the treatment is largely inactive, those who started out badly end up relatively badly. Those who started out better end up relatively better. In contrast, if you had a wonderfully active treatment, everyone would end up well regardless of the initial degree of illness. In the first placebolike case, the slope of the regression would approximate 1, whereas in the second case, the slope would approximate 0. Whether treatment benefit relates to the initial severity is made obvious by the first analysis that occurs in an ANCOVA, carried out for the detection of slope differences. If slope heterogeneity is found, this is no tragedy, but a strong finding. Now, you not only know that the treatment is active, you can estimate the degree of initial severity necessary before the treatment should be employed.

In a letter to Elkin et al1 written some time before their data were analyzed, I pointed out that the issue of severity was all important for the analysis. A priori I suggested that in the patients with less severe illness no differences would be found, whereas in the more severe cases, there would be a substantial difference between medication and placebo. It remained to be seen just where the psychotherapies would fall.

In statistical analysis, even if you wish to address a specific pairwise contrast, it usually pays to embed this contrast within the entire four-group analysis, since this gives you a narrower estimate of your pooled error variance. Therefore, although the technical statistical maneuver might be an overall four-group analysis, nonetheless, one decomposes this analysis to address the specific pairwise contrasts.

I suggest the following plan for analysis. The historically primary question was comparing the two psychotherapies.

Comparison One. Does IPT differ from CBT? This is a straight pairwise comparison. Doing an ANCOVA, using initial status as the covariate, automatically provides the important preliminary analysis as to whether there is a relationship between initial status and treatment effect.

In this analysis you get two F tests. The logically prior F test is for heterogeneity (nonparallelism) of slope. The second F test is for group differences, adjusted for the initial score, but assuming parallel slopes. A basic issue is that ANCOVA assumes that the slopes of initial status against final status are equal between treatment groups.
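The logically prior test for slope heterogeneity can be sketched as follows. This is a minimal illustration in plain Python, not the analysis used in the NIMH report: the function name `slope_heterogeneity_f`, the group labels, and all scores are invented for the example, and only the F statistic and its degrees of freedom are computed (no P value).

```python
def _centered_sums(baseline, final):
    """Return n and the centered sums of squares and cross-products."""
    n = len(baseline)
    mean_x = sum(baseline) / n
    mean_y = sum(final) / n
    sxx = sum((x - mean_x) ** 2 for x in baseline)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(baseline, final))
    syy = sum((y - mean_y) ** 2 for y in final)
    return n, sxx, sxy, syy

def slope_heterogeneity_f(groups):
    """F test for nonparallel slopes of final score on baseline score.

    groups maps a treatment label to a (baseline_scores, final_scores) pair.
    Returns (F, df_numerator, df_denominator, per-group slopes).
    """
    stats = {name: _centered_sums(x, y) for name, (x, y) in groups.items()}
    k = len(stats)
    n_total = sum(s[0] for s in stats.values())

    # Full model: each group has its own intercept and its own slope.
    sse_separate = sum(syy - sxy ** 2 / sxx for _, sxx, sxy, syy in stats.values())

    # Reduced model: separate intercepts, one pooled within-group slope.
    sxx_w = sum(s[1] for s in stats.values())
    sxy_w = sum(s[2] for s in stats.values())
    syy_w = sum(s[3] for s in stats.values())
    sse_common = syy_w - sxy_w ** 2 / sxx_w

    df_num = k - 1
    df_den = n_total - 2 * k
    f_stat = ((sse_common - sse_separate) / df_num) / (sse_separate / df_den)
    slopes = {name: s[2] / s[1] for name, s in stats.items()}
    return f_stat, df_num, df_den, slopes

# Hypothetical scores, invented purely for illustration.
baseline = [14, 16, 18, 20, 22, 24, 26, 28]
noise = [1, -1, 0, 1, -1, 0, 1, -1]
groups = {
    # Inactive treatment: final score tracks baseline (slope near 1).
    "placebolike": (baseline, [b - 2 + e for b, e in zip(baseline, noise)]),
    # Very active treatment: everyone ends up well (slope near 0).
    "active": (baseline, [6 + e for e in noise]),
}

f_stat, df_num, df_den, slopes = slope_heterogeneity_f(groups)
print(f"slopes: {slopes}")
print(f"F({df_num}, {df_den}) = {f_stat:.1f}")
```

In these made-up data the placebolike slope is near 1 and the active slope near 0, so the heterogeneity F is large. A P value would come from the F distribution with the returned degrees of freedom.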

Hays2 states that you do not use the usual .05 level (as Elkin et al1 did across all four groups). You use a much more generous level because you wish to challenge the assumption that the slopes are parallel. If there is a good indication they are not parallel, an analysis to take this into account is necessary.

Hays suggests that a P
