The Power of Statistical Studies in Consultation-Liaison Psychiatry

JOSEPH BROWN, PH.D.
MAHLON S. HALE, M.D.

Several authors recently have proclaimed the need for empirically based research articles in consultation-liaison psychiatry. The authors report that although the proportion of empirically based studies published in Psychosomatics increased 148% from 1979 to 1989, the power of statistical analyses and the deleterious effect of multiple tests were often neglected. A power analysis of empirical studies published in the 1989 volume year of Psychosomatics is reported, showing statistical power to be low for all but the most robust of effect sizes.

Received April 17, 1991; revised August 30, 1991; accepted September 6, 1991. From the Department of Psychiatry, University of Connecticut School of Medicine, Farmington. Address reprint requests to Dr. Brown, Dept. of Psychiatry, University of Connecticut School of Medicine, Farmington, CT 06032. Copyright © 1992 The Academy of Psychosomatic Medicine.

Within the past few years, several authors have proclaimed the need for more empirically based research articles in consultation-liaison psychiatry.1-3 In fact, publication of such research in the major consultation-liaison psychiatry journal, Psychosomatics, has more than doubled in the last decade. In the 1989 journal year, 31 of 67 articles (46.3%) published were categorized as research articles by the editor of Psychosomatics. In contrast, only 14 of the 75 articles (18.7%) published in 1979 could be so classified. (In fairness, it must be mentioned that Psychosomatics had an appreciably different readership in 1979, with a different editor, which might mean that good research studies might have been rejected or not even submitted during this time.) Increased confidence in empirically based research can be harmful, however, when it rests only on a vague belief in the numbers without a more complete understanding of the process of statistical hypothesis testing. Articles that focus on these issues from the perspective of psychological research have recently appeared.4,5 Similarly, many articles addressing some of these issues from a general psychiatry research perspective have been published,6-11 and a particularly good review of methodological issues from the consultation-liaison psychiatry perspective is also available.12 After a review of some rudimentary statistical principles, we examine the power of statistical procedures as employed in recent consultation-liaison research.

HYPOTHESIS TESTING REVISITED

We conduct research with the hope of being able to infer something meaningful from the data we have so arduously collected. In the process of making these inferences we often need to test hypotheses. Regardless of the topic studied or the statistical procedure used, we usually test the null hypothesis. That is, we generally test the hypothesis that there is no difference caused in some dependent measure or variable by the different conditions of the independent variables we have chosen to study. In essence, we decide either to reject the null hypothesis or to fail to reject it. Of course, for most researchers, rejection of the null hypothesis is the ultimate goal of the research endeavor. By rejecting the hypothesis that there is no difference in the dependent variable attributable to the various levels of our independent variables, we are suggesting that these independent variables (which we have so wisely chosen to study) are indeed important and worthy of study.

Figure 1 depicts the simplest example of statistical decision-making in a two-by-two table. The columns of this table represent the two possible decisions we can make regarding the null hypothesis: 1) to reject the null hypothesis or 2) to fail to reject the null hypothesis. The rows of this table represent the actual facts of reality: the null hypothesis is either 1) true or 2) false. Of course we are rarely privy to this latter information, for if we were we would not need to rely upon statistical decision-making aids.

In making the decision not to reject the null hypothesis, we like to have some idea of the probability that this decision is correct. This probability is equal to one minus the level of significance, or alpha level (1 - α), and defines the probability of being in the second cell of Figure 1. That is, it is the probability of failing to reject the null hypothesis when in fact the null hypothesis is true. An error in which the null hypothesis is falsely rejected is referred to as a Type I error.

FIGURE 1. Graphic presentation of the statistical decision-making process

                                      STATISTICAL DECISION MADE
                                Reject Null Hypothesis       Fail to Reject Null Hypothesis
REALITY
  Null Hypothesis True          Cell 1                       Cell 2
                                Probability = α level        Probability = 1 - α level
  Null Hypothesis False         Cell 3                       Cell 4
                                Probability = 1 - β level    Probability = β level
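
Stated in symbols, the four cells of Figure 1 correspond to the following conditional probabilities (a direct restatement of the figure, with H0 denoting the null hypothesis):

    P(reject H0 | H0 true)           =  α        (Cell 1: Type I error)
    P(fail to reject H0 | H0 true)   =  1 - α    (Cell 2)
    P(reject H0 | H0 false)          =  1 - β    (Cell 3: power of the test)
    P(fail to reject H0 | H0 false)  =  β        (Cell 4: Type II error)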

This state of affairs is represented in cell 1 of Figure 1. We run the risk of making a Type I error only when we reject the null hypothesis. The probability of rejecting a true null hypothesis is equal to the alpha level chosen. It is the only type of statistical error over which we exert complete control: we choose the alpha level for each statistical test, and most often we choose 0.05 as a matter of convenience and tradition.

There is an analogous error that can be made only when one fails to reject the null hypothesis. This is the Type II error, depicted in cell 4 of Figure 1. It occurs when we fail to reject a false null hypothesis. The probability of making this error is defined by a numerical value labeled beta (β). It is not an easy error to control, in that we cannot fix it simply by choosing a beta level. Instead, it is indirectly controlled by the alpha level chosen, the number of subjects employed in the study, and the size of the effect we are hoping to discover.

Cell 3 of Figure 1 represents those instances in which a false null hypothesis is correctly rejected. The probability of being within cell 3 is the power of the statistical test. Type II error is an important concept because it helps to define the power of the statistical test, since power is equal to one minus beta (1 - β). The ability to estimate beta, therefore, helps define the power of an analysis to correctly reject a false null hypothesis. Since rejecting the null hypothesis is often the goal of research, one would think that knowing the probability of missing an opportunity to correctly reject the null hypothesis would be of great interest to researchers. The larger the alpha level, and thus the Type I error rate we are willing to accept, the lower beta will be for any given combination of sample size and underlying effect size and, consequently, the higher will be the power of our analysis to correctly reject a false null hypothesis. Studies of statistical power in biological psychiatry research6,7 and psychology research13 have consistently found that an actual small or medium effect would be detected in only about one-fifth to one-half of all instances.
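
To make the dependence of power on sample size, alpha level, and effect size concrete, the sketch below computes the power of a nondirectional two-sample t-test at the 0.05 level for Cohen's small, medium, and large standardized effect sizes (d = 0.2, 0.5, and 0.8). It is illustrative only; the assumed group size of 30 subjects per group and the use of Python with scipy are our assumptions, and this is not the table-based procedure used in the analysis reported below.

    # Illustrative sketch: power of a nondirectional (two-sided) two-sample
    # t-test, computed from the noncentral t distribution, at Cohen's small,
    # medium, and large effect sizes.  The group size of 30 per group is an
    # assumption made only for this example.
    import numpy as np
    from scipy import stats

    def two_sample_t_power(d, n_per_group, alpha=0.05):
        """Power of a two-sided two-sample t-test with equal group sizes."""
        df = 2 * n_per_group - 2
        nc = d * np.sqrt(n_per_group / 2.0)      # noncentrality parameter
        t_crit = stats.t.ppf(1 - alpha / 2, df)  # two-tailed critical value
        # P(|T| > t_crit) when T follows a noncentral t distribution
        return (1 - stats.nct.cdf(t_crit, df, nc)) + stats.nct.cdf(-t_crit, df, nc)

    for label, d in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
        print(f"{label:6s} effect (d = {d}): power = {two_sample_t_power(d, 30):.2f}")

With 30 subjects per group, only the large effect exceeds the conventional 0.80 benchmark; a small effect would be detected less than one time in five, the same pattern found in the power surveys cited above.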


POWER ANALYSIS OF RECENT CONSULTATION-LIAISON PSYCHIATRY RESEARCH

An examination of the empirical research published in Psychosomatics during 1989 was conducted to determine the mean statistical power of each investigation. In such a power analysis it is the published data that are studied rather than the original raw data of each article. Thus, the 31 empirical studies published in Psychosomatics during 1989 constitute the subjects of this power analysis.

Statistical tests reported in these studies were classified as being either central to the hypotheses being tested by the research or peripheral to these hypotheses. Since much of the research was exploratory in nature, the vast majority of statistical tests were classified as being central to the articles. The power of all reported analyses, excluding approximately 130, was calculated based upon the number of subjects in each comparison and assuming an alpha level of 0.05 and universal use of nondirectional tests. (The approximately 130 statistical tests not included in the power analyses came from four constituent studies. Nine tests were not included because they used statistical procedures for which we had no formulae available for the computation of power. Approximately 121 analyses were not included from a single constituent study because results were reported in such a way as to obscure the number of tests completed and the degrees of freedom comprising each test. In fact, the need to approximate this number speaks to the poor reporting of this article's results. Suffice it to say that of an estimated 128 statistical analyses from this single study, only 8 were reported, because they were deemed to be statistically significant. This is only about one more significant finding than one would expect to find by chance alone among so many tests at the traditional alpha level of 0.05.)

The power of each statistical test was determined by reference to the standard statistical reference for computing power values.14 The power of each test is determined by the sample size studied, the alpha level used, and the actual underlying effect size being studied. Sample size was readily available from each study, and an alpha level of 0.05 was assumed for all studies. The only variable remaining to be defined to permit a power calculation is the underlying effect size. Simply put, the effect size is "the degree to which the phenomenon is present in the population."14 Cohen has provided effect size values for a wide variety of statistical tests that serve to operationally define what are considered to be small, medium, and large effect sizes.14 In this power analysis, the power of each statistical test was calculated for each of these three possible effect sizes. Finally, the mean power of all tests performed in each study was calculated for each of the three possible effect sizes. It is these mean power values that are the topic of this study.
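
The procedure can be summarized schematically as follows. The sketch below is an assumption-laden illustration, not the original analysis: it treats every reported comparison as a nondirectional two-sample t-test with known group sizes, whereas the actual analysis covered the full mix of reported tests and took power values from the standard reference cited above. The group sizes in the usage example are invented for illustration.

    # Schematic illustration (not the original analysis) of the per-study
    # averaging described above: each reported comparison is treated as a
    # nondirectional two-sample t-test at alpha = 0.05, its power is computed
    # at Cohen's small, medium, and large effect sizes, and the power values
    # are averaged within each study.  Group sizes below are hypothetical.
    import numpy as np
    from scipy import stats

    EFFECT_SIZES = {"small": 0.2, "medium": 0.5, "large": 0.8}  # Cohen's d conventions

    def two_sample_t_power(d, n1, n2, alpha=0.05):
        """Two-sided two-sample t-test power via the noncentral t distribution."""
        df = n1 + n2 - 2
        nc = d * np.sqrt(n1 * n2 / (n1 + n2))    # noncentrality parameter
        t_crit = stats.t.ppf(1 - alpha / 2, df)
        return (1 - stats.nct.cdf(t_crit, df, nc)) + stats.nct.cdf(-t_crit, df, nc)

    def mean_power_per_study(comparisons):
        """comparisons: list of (n1, n2) group sizes, one entry per reported test."""
        return {label: float(np.mean([two_sample_t_power(d, n1, n2)
                                      for n1, n2 in comparisons]))
                for label, d in EFFECT_SIZES.items()}

    # A hypothetical study reporting three comparisons
    print(mean_power_per_study([(25, 25), (40, 38), (15, 20)]))

    # The multiple-testing point made above: among 128 tests run at
    # alpha = 0.05, about 128 * 0.05 = 6.4 "significant" results are
    # expected by chance alone even if every null hypothesis is true.
    print("expected chance findings among 128 tests:", 128 * 0.05)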

Table 1 shows the power characteristics of these empirical studies. Of the six studies reporting at least one peripheral statistical test, a total of 39 peripheral tests were performed, a mean ± SD of 6.5 ± 4.04 peripheral tests per study employing peripheral tests. The mean power of all peripheral tests was calculated on a study-by-study basis for three possible levels of the underlying effect size.

TABLE 1. Power characteristics of consultation-liaison psychiatry research

Power of Peripheral Statistical Tests
  Number of publications reporting at least one peripheral test: 6
  Total number of peripheral tests performed: 39

  Strength of Underlying Effect Size    Mean Power    SD
  Small                                 0.128         0.083
  Medium                                0.507         O.NI
  Large                                 0.847         0.170

Power of Central Statistical Tests
  Number of publications reporting at least one central test: 24
  Total number of central tests performed: 962

  Strength of Underlying Effect Size    Mean Power    SD
  Small                                 0.195         0.196
  Medium                                0.600         0.286
  Large                                 0.839         0.199

Power of Statistical Studies

These power values are 0.128, 0.507, and 0.847 for small, medium, and large underlying effect sizes, respectively. Of the 23 studies reporting at least one central statistical test, a total of 962 central tests were performed, a mean of 40.1 ± 42.54 central tests per study employing central tests. The mean power of all central tests was calculated on a study-by-study basis for three possible levels of the underlying effect size. These power values are 0.195, 0.600, and 0.839 for small, medium, and large underlying effect sizes, respectively. Table 2 shows the frequency distribution, cumulative percentage, and summary statistics of the power for central hypothesis tests in the 23 articles. Small, medium, and large population effect sizes are considered under nondirectional 0.05-level conditions.

DISCUSSION

Only when it is assumed that a large effect size exists does the mean power of these analyses approach a reasonable power level of 0.839. Assuming a medium effect size results in only a mean power of 0.600.
