Letter to the Editor

Comments on number-needed-to-treat derived from ordinal scales

Statistical Methods in Medical Research 2014, Vol. 23(1) 107–110 © The Author(s) 2014 Reprints and permissions: sagepub.co.uk/journalsPermissions.nav DOI: 10.1177/0962280212469202 smm.sagepub.com

Helmuth Zimmermann and Volker W Rahlfs

A well-known effect size measure in evidence-based medicine is the number-needed-to-treat (NNT), the number of patients who must be treated to obtain one more responder or one more patient with improvement. It is defined as the reciprocal of the simple risk difference between the two groups being compared (a test group receiving the innovative product and a control group). The NNT has been largely accepted in the scientific community, so it was desirable to extend it to other well-known effect size measures beyond pure risk differences. It was in this Journal that Kraemer¹ advocated using the Mann–Whitney difference superiority measure P(X < Y) − P(Y < X) as a generalized risk difference and interpreted its reciprocal as the NNT. Kraemer and coworkers have recommended this interpretation in several further papers.²⁻⁴ Several researchers in the biostatistical field accepted this definition or re-invented it with a different rationale. Quite recently, a publication showed that the Kraemer method does not agree with the usual procedure of a responder analysis, in which responders are obtained by dichotomizing an ordinal or continuous scale:⁵ the NNT from the Kraemer method was simply much too small compared with the NNT from the well-known responder analysis. In the following we show that the derivation rule currently used for the NNT is not correct and should be replaced by another one. With this new rule, the implausible difference between the two NNT derivations presented by Furukawa and Leucht⁵ vanishes. We also clarify the relation between a risk difference and the Mann–Whitney superiority measure P(X < Y) + 0.5 P(X = Y), in some publications also called the area under the curve (AUC) because it is identical to the AUC of the receiver operating characteristic (ROC) graph, well known in the field of medical diagnostics.
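As a minimal illustration of the definition above, the following Python sketch computes the NNT as the reciprocal of a risk difference. The responder rates are hypothetical values, and the function name `nnt` and its refusal to handle non-positive differences are assumptions made for this example only.

```python
def nnt(difference):
    """Number needed to treat: reciprocal of a (generalized) risk difference."""
    if difference <= 0:
        # Assumption of this sketch: an NNT is only reported for a benefit.
        raise ValueError("NNT is only computed here for a positive difference")
    return 1.0 / difference


# Hypothetical responder rates in the test and control groups.
rate_test, rate_control = 0.70, 0.40
risk_difference = rate_test - rate_control

print("RD  =", round(risk_difference, 2))
print("NNT =", round(nnt(risk_difference), 2))
```

The same function applies to any quantity that is interpreted as a risk difference, which is why the choice of that quantity determines the resulting NNT.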
Our arguments will become clearer when using the percentile-percentile (P-P) plot,⁶ which in principle is identical with the ROC graph. The P-P plot is a graph of the empirical distribution function (EDF) of one group against that of another group. Figure 1 gives the P-P plot for a binary scale with the observed risk difference (RD), as could be shown in a two-by-two table. Note that the length of the vertical line from the P-P function to the diagonal is identical with the risk difference. Figure 1 shows the area of the triangle (O, P, I) [= area of the triangle (ATR)], which is exactly half the risk difference RD.

idv-Data Analysis and Study Planning, 82152 Krailling/Munich, Konrad-Zuse-Bogen 17, Germany Corresponding author: Volker Rahlfs, Germany. Email: [email protected]

Downloaded from smm.sagepub.com by guest on April 12, 2015


Figure 1. P-P plot for binary scale; one risk difference (RD).

The rationale is as follows. The area of the triangle (O, P, I) can be seen as the sum of very small risk-difference parts. As these run from (G − F) down to zero in both directions, their sum must be half of G − F, with weights F and (1 − F), respectively. Thus:

\[
F \cdot \frac{(G - F)}{2} + (1 - F) \cdot \frac{(G - F)}{2} = \frac{(G - F)}{2}
\]

Thus the area MW − 0.5 is identical with RD/2 (see also the results by Hou et al.⁷). Therefore RD = 2 ATR = 2(MW − 0.5) = Mann–Whitney difference (MWD), but only in the dichotomous case. Many researchers have assumed that a generalized risk difference (GRD) based on an ordinal scale should also be equal to the MWD. However, for scales with more than two categories (ordinal scales) this is not true. Figure 2 gives the P-P plot for an additional category and demonstrates the fact: for a convex graph the area 2 ATR is larger than either of the two RDs, as it consists of the original area ATR for binary data plus an additional triangle ATR1 = (O, P1, P).
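The binary identity and its failure for a third category can be checked numerically. The sketch below is a minimal illustration, not the authors' computation: the category probabilities are hypothetical, the function names `mann_whitney` and `risk_differences` are assumptions, and it assumes that higher categories are better, with X the control group and Y the test group (so the risk difference at a cut point is written here as control CDF minus test CDF).

```python
from itertools import accumulate


def mann_whitney(p_control, p_test):
    """Return (MW, MWD) for two groups on the same ordered categories.

    MW  = P(X < Y) + 0.5 * P(X = Y)
    MWD = P(X < Y) - P(Y < X)
    with X drawn from the control group and Y from the test group.
    """
    pairs = [(i, j, px * py)
             for i, px in enumerate(p_control)
             for j, py in enumerate(p_test)]
    p_less = sum(w for i, j, w in pairs if i < j)
    p_greater = sum(w for i, j, w in pairs if i > j)
    p_tie = sum(w for i, j, w in pairs if i == j)
    return p_less + 0.5 * p_tie, p_less - p_greater


def risk_differences(p_control, p_test):
    """Risk difference at every cut point between adjacent categories."""
    cdf_control = list(accumulate(p_control))[:-1]
    cdf_test = list(accumulate(p_test))[:-1]
    return [f - g for f, g in zip(cdf_control, cdf_test)]


# Hypothetical binary scale (non-responder, responder).
binary_control, binary_test = [0.6, 0.4], [0.3, 0.7]
mw_bin, mwd_bin = mann_whitney(binary_control, binary_test)
rd_bin = risk_differences(binary_control, binary_test)

# Hypothetical ternary scale (worse, unchanged, improved).
ternary_control, ternary_test = [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]
mw_ter, mwd_ter = mann_whitney(ternary_control, ternary_test)
rd_ter = risk_differences(ternary_control, ternary_test)

print("binary : RDs =", rd_bin, "MWD =", mwd_bin)
print("ternary: RDs =", rd_ter, "MWD =", mwd_ter, "MW - 0.5 =", mw_ter - 0.5)
```

With these hypothetical numbers the binary MWD coincides with the single RD, whereas in the ternary case the MWD (about 0.39) exceeds both RDs (about 0.30 each) and MW − 0.5 is half the MWD.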


Figure 2. P-P plot for ternary scale (ordinal scale); two risk differences (RDs).

In the special case of shifted normal distributions with equal variances and means zero and δ, respectively (δ expressed in units of the common standard deviation), the MW-measure is given by

\[
P(X < Y) = \int_0^1 G(F)\,dF = \Phi\!\left(\delta/\sqrt{2}\right)
\]

and thus

\[
\mathrm{MWD} = P(X < Y) - P(Y < X) = \Phi\!\left(\delta/\sqrt{2}\right) - \Phi\!\left(-\delta/\sqrt{2}\right)
\]

(with Φ(·) denoting the cumulative standard normal distribution). On the other hand, the largest risk difference is RDmax = Φ(δ/2) − Φ(−δ/2), which is smaller than the RD calculated as MWD. Thus every RD at any threshold is smaller than MWD. As an example: assuming δ = 2, the maximum available risk difference is RD = 0.683, but the Mann–Whitney difference is 0.843. For continuous data this can be written in statistical terms as an expected risk difference:

\[
E(\mathrm{RD}) = E(G - F) = E(G) - E(F) = \int_0^1 G\,dF - \int_0^1 F\,dF = P(X < Y) - 1/2
\]
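The shifted-normal example and the expected-risk-difference identity can be reproduced with a few lines of Python. This is a sketch under stated assumptions rather than the authors' computation: unit variances, a shift named `delta`, responders defined as values above a threshold, and a simple midpoint rule over the control-group percentiles to approximate the average; all function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution


def rd_at(threshold, delta):
    """Risk difference when responders are the values above `threshold`.

    Control ~ N(0, 1), test ~ N(delta, 1) (assumed unit variances).
    """
    return PHI.cdf(threshold) - PHI.cdf(threshold - delta)


def rd_max(delta):
    """Largest risk difference, attained at the midpoint threshold delta/2."""
    return rd_at(delta / 2.0, delta)


def mw(delta):
    """Mann-Whitney measure P(X < Y) for the shifted-normal case."""
    return PHI.cdf(delta / sqrt(2.0))


def mwd(delta):
    """Mann-Whitney difference P(X < Y) - P(Y < X)."""
    return 2.0 * mw(delta) - 1.0


def average_rd(delta, n=20000):
    """Midpoint-rule average of the RD over control-group percentiles."""
    total = 0.0
    for i in range(n):
        u = (i + 0.5) / n  # percentile of the control distribution
        total += rd_at(PHI.inv_cdf(u), delta)
    return total / n


delta = 2.0
print("RDmax      =", round(rd_max(delta), 4))
print("MWD        =", round(mwd(delta), 4))
print("average RD =", round(average_rd(delta), 4))
print("MW - 0.5   =", round(mw(delta) - 0.5, 4))
```

Under these assumptions the numerically averaged RD agrees with MW − 0.5 and is half of the MWD, while the MWD itself lies above the largest RD attainable at any single threshold.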


Thus the area between the graph and the diagonal (= MW − 0.5) is identical to the average risk difference estimable from the whole graph.

We wanted to demonstrate that the centered Mann–Whitney measure (MW − 0.5) can be interpreted as an average risk difference for ordinal scales (>2 categories) and that the Mann–Whitney difference MWD = P(X < Y) − P(Y < X) = (MW − 0.5) × 2 is almost always too large. We will speak of an average RD (ARD) and consequently of a resulting average NNT. Based on this new definition, an average RD of 6 percentage points corresponds to MW = 0.56, which according to Cohen's effect size definition is a small difference. We emphasize that the NNT definition should be based on the average RD, which is simply half of the Mann–Whitney difference (the Mann–Whitney difference is larger than any observed risk difference, at least for convex graphs).

Of note, Furukawa and Leucht⁵ recently showed, against the background of published clinical trial data, that the old method (here called Kraemer's method) leads to very low NNT numbers when compared with NNT numbers based on dichotomized scales (control event rate, CER). This is easily explained by the reasoning shown above. When all NNT numbers of Kraemer's method are doubled, the resulting NNTs fit quite well with the numbers of the Furukawa and Leucht method. The only difference is in the meaning of the parameters: Furukawa and Leucht give NNT numbers for different CER values, whereas NNT numbers based on the Mann–Whitney effect size (MW − 0.5) give global or overall superiority across the whole scale.

References

1. Kraemer HC. Correlation coefficients in medical research: from product moment correlation to the odds ratio. Stat Meth Med Res 2006; 15: 525–545.
2. Kraemer HC, Morgan GA, Leech NL, et al. Measures of clinical significance. J Am Acad Child Adolesc Psych 2003; 42: 1524–1529.
3. Kraemer HC and Kupfer DJ. Size of treatment effects and their importance to clinical research and practice. Biol Psychiatry 2006; 59: 990–996.
4. Kraemer HC. Toward non-parametric and clinically meaningful moderators and mediators. Stat Med 2008; 27: 1679–1692.
5. Furukawa TA and Leucht S. How to obtain NNT from Cohen's d: comparison of two methods. PLOS One 2011; 6: 1–5.
6. Wilk MB and Gnanadesikan R. Probability plotting methods for the analysis of data. Biometrika 1968; 55: 1–17.
7. Hou Y, Ding V, Li K, et al. Two new covariate adjustment methods for non-inferiority assessment of binary clinical trials data. J Biopharm Statist 2011; 21: 77–93.
