The Journal of Foot & Ankle Surgery xxx (2014) 1–2


Investigators’ Corner

Counting Your Chickens before They’re Hatched: Power Analysis

Daniel C. Jupiter, PhD
Assistant Professor, Department of Preventive Medicine and Community Health, The University of Texas Medical Branch, Galveston, TX

Keywords: hypothesis testing; power analysis; sample size; statistical significance; Type II error

Abstract: How does an investigator know that he has enough subjects in his study design to have the predicted outcomes appear statistically significant? In this Investigators’ Corner I discuss why such planning is necessary, give an intuitive introduction to the calculations needed to determine required sample sizes, and hint at some of the more technical difficulties inherent in this aspect of study planning.

© 2014 by the American College of Foot and Ankle Surgeons. All rights reserved.

Financial Disclosure: None reported.
Conflict of Interest: None reported.
Address correspondence to: Daniel C. Jupiter, PhD, 2401 South 31st Street, Temple, TX 76508. E-mail address: [email protected]

In earlier Investigators’ Corners, I discussed the logic of the p value and how to construct statistical proof (1). I explained how to plan studies that look for significant similarity rather than significant difference (2). And I considered the meaning of non-rejection of the null hypothesis (3), showing that in non-rejection lies a conundrum, a situation in which one can say nothing conclusive: either there is no difference, or one simply did not see the difference that is actually there. If the former 2 Investigators’ Corners concerned proper planning of study design, the latter was disaster control, the guide to follow when the best plans fail.

In this Investigators’ Corner I attempt to bring these threads together and show how to use knowledge of the p value to avoid situations in which researchers falsely fail to reject the null hypothesis (or make a Type II error, in the technical language of statistics). The question we researchers face, then, is this: How can we, properly using the p value to guide our design, ensure with high probability that we declare as statistically significant the differences that actually exist? Or, in the case of non-inferiority or equivalence studies: How can we properly use the p value to ensure, with high probability, that we declare as significant the equivalences that actually exist? This planning exercise is called power analysis.

In order to gain a feel for what is needed in a power analysis, without getting mired in technical details, I present a simple example. In a study with 2 arms, say 2 possible diabetic ulcer healing protocols, the outcome measure is a continuous variable: length of time to healing (let’s assume, for simplicity’s sake, that all wounds heal within a finite time and that, within the timeframe of the imagined study, there is no need for Kaplan-Meier curves or censoring). The study authors, through pilot studies, are fairly confident that the 2 protocols differ in length of time to healing; in fact, under 1 protocol, wounds heal, on average, 5 weeks faster than under the other. The authors also have a sense of the variability of time to healing within both populations. The randomized clinical trial, then, will be to use the 2 protocols in 2 distinct groups of suitably chosen patients and to test the difference in time to healing.
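As a sketch in symbols (the notation is mine, not the column’s), the planned trial is an ordinary two-sample comparison of means,

\[
H_0\colon \mu_1 = \mu_2 \quad \text{vs.} \quad H_1\colon \mu_1 \neq \mu_2,
\]

with an anticipated true difference of \(\delta = |\mu_1 - \mu_2| = 5\) weeks and a common within-group standard deviation \(\sigma\) taken from the pilot studies. No numeric value for \(\sigma\) appears in the example, so the symbol stands in for whatever the pilots suggest.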


The statistical test used in this situation is Student’s t test.

Now, let’s approach the t test a little differently than usual. We think of this test, like many of our statistical tools, as a black box into which we pour numbers and out of which comes a p value. The truth, of course, is subtler and more mysterious. An examination of the computations involved in producing a p value from the t test reveals that the test does not need raw data; the formula for the t value, and through it the p value, includes only the following:

1. The mean of the measurements in each of the 2 study groups
2. The standard deviation of the measurements in each of the 2 study groups
3. The number of subjects in each of the 2 study groups

No raw data at all. Now let’s work backward from the formula. We know what p value we want (p ≤ .05) and thus what t value we need, and we know, as just described, the means and standard deviations in each group. Let’s simply unwind the t value formula and extract from it the number of subjects! When we do so, we have, using our assumptions about the data and the results that we expect to see, determined how many subjects we need in each group in order to see that difference as statistically significant. The final piece of the planning puzzle has just slid into place, and we can approach our regulatory and granting agencies with a solid notion of the logistical requirements of our study.
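To make the unwinding concrete, here is the standard normal-approximation sketch of the calculation; it is an approximation, not the exact t-based computation, which is slightly more involved. The two-sample t statistic with n subjects per group and pooled standard deviation \(s_p\) is

\[
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{2/n}},
\]

and solving the corresponding test for n, at two-sided significance level \(\alpha\) and desired power \(1 - \beta\), gives approximately

\[
n \;\approx\; \frac{2\,(z_{1-\alpha/2} + z_{1-\beta})^2\,\sigma^2}{\delta^2}
\]

subjects per group, where \(\delta\) is the anticipated difference between means and \(\sigma\) the common standard deviation. With \(\alpha = .05\), 80% power, \(\delta = 5\) weeks, and a purely illustrative \(\sigma = 8\) weeks, this gives \(n \approx 2\,(1.96 + 0.84)^2 \times 64/25 \approx 40\) subjects per group; exact t-based answers run slightly higher.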

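In practice this unwinding is handed to software. Below is a minimal sketch in Python using the statsmodels package (my choice of tool; the column prescribes none), plugging in the 5-week difference from the example, the same illustrative 8-week standard deviation as above, and, in the second call, deliberately hedged inputs of the sort recommended at the end of this column.

```python
# Minimal power-analysis sketch for the two-protocol healing example.
# Assumptions: the 8-week standard deviation is illustrative only (the
# column gives none), and statsmodels is my choice of tool, not the
# column's.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

delta = 5.0   # anticipated difference in mean weeks to healing
sigma = 8.0   # illustrative within-group standard deviation, in weeks

# Solve the two-sided two-sample t test for the per-group sample size
# at alpha = .05 and 80% power; effect_size is Cohen's d = delta/sigma.
n = analysis.solve_power(effect_size=delta / sigma,
                         alpha=0.05, power=0.80,
                         alternative='two-sided')
print(f"Per-group n at the point estimates: {n:.1f}")  # about 41

# Hedged inputs: underestimate the difference, overestimate the
# variability, and accept the larger enrollment that results.
n_hedged = analysis.solve_power(effect_size=4.0 / 10.0,
                                alpha=0.05, power=0.80,
                                alternative='two-sided')
print(f"Per-group n with hedged inputs: {n_hedged:.1f}")  # about 100
```

Shrinking the assumed difference from 5 to 4 weeks and inflating the standard deviation from 8 to 10 weeks roughly doubles the required enrollment; guessing high in this way is exactly the bet-hedging the closing advice below describes.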
This thought experiment is, naturally, an oversimplification: the extraction of the number of subjects from the t test formula is not quite as trivial a computation as suggested. But the overall flavor and the general outline of the procedure are intuitively correct. For other statistical tests (e.g., ANOVA, linear regression, paired Student’s t test, Fisher’s exact test, tests of proportions, Cox regression) the derivation of the desired number of subjects will, of course, differ, with the general approach remaining the same.

Not mentioned as yet are 2 technical concerns with power analysis. The first is that most statistical tests make distributional assumptions about the data they analyze. The t test, for example, requires that, within a certain tolerance, the data being tested are normally distributed. In planning a study and executing power analyses, the investigator must keep in mind whether his data meet the requirements of the tests he plans to use. The second, closely related, concern is that of nonparametric statistical tests. These, the subject of a future Investigators’ Corner, are sometimes used when the data do not meet the aforementioned distributional requirements. Powering these tests is a challenging task, one that will be discussed in that column.

In the 2-armed example study presented earlier, I considered a case in which I was simply looking for the appearance of any difference, however small, between the 2 groups. As mentioned in an earlier Investigators’ Corner (2), I often focus on finding clinically significant differences, and in that case I would want to see whether the 2 groups differ by at least a given fixed amount. Using the same general outline as in the example study, this contingency can be planned for, as can non-inferiority or equivalence. The computations are slightly more involved but of a piece with my simple example.

Here is a final note concerning counting chickens. It may feel strange to predict your results before actually doing the study that produces them. Power analysis must therefore be done with great care and with complete honesty about our estimates. Look only for real differences and, in doing power analysis, underestimate the difference you actually think the data will demonstrate and overestimate the variability within the data. In doing so, you will hedge your bets, guess high on the number of subjects needed, and avoid the trap of non-rejection of the null.

References

1. Jupiter D. Mind your p values [investigators’ corner]. J Foot Ankle Surg 52:138–139, 2013.
2. Jupiter D. Anything you can do I can do better [investigators’ corner]. J Foot Ankle Surg 53:252–253, 2014.
3. Jupiter D. Turning a negative into a positive [investigators’ corner]. J Foot Ankle Surg 52:556–557, 2013.
