Randomised controlled trials: understanding power.

BMJ 2015;350:h3229 doi: 10.1136/bmj.h3229 (Published 18 June 2015)


Endgames

ENDGAMES STATISTICAL QUESTION

Philip Sedgwick, reader in medical statistics and medical education

Institute for Medical and Biomedical Education, St George’s, University of London, London, UK

Correspondence to: P Sedgwick

The effects of manual lymph drainage on the development of lymphoedema related to breast cancer were investigated using a randomised controlled trial.1 The intervention was a six month treatment programme consisting of guidelines about prevention of lymphoedema, exercise therapy, and manual lymph drainage. Control treatment consisted of the same programme as the intervention but without manual lymph drainage. Participants were consecutive patients with breast cancer and unilateral axillary lymph node dissection. The length of follow-up was 12 months after surgery. The setting was hospitals in Belgium.

The outcome measures included the cumulative incidence of arm lymphoedema by follow-up. Arm lymphoedema was defined as an increase in arm volume of 200 mL or more compared with the value before surgery. The sample size was based on having 80% power to detect a difference between treatment groups of 20% in the cumulative incidence of arm lymphoedema, assuming a cumulative incidence of 30% for the control group at follow-up. The sample size calculation assumed a two sided hypothesis test and a critical level of significance of 0.05 (5%). In total, 146 patients were required. To account for an estimated dropout rate of 10%, the required sample size was adjusted to 160 patients.

In total, 160 patients were recruited, with 79 allocated to the intervention and 81 allocated to control. Overall, 154 (96.3%) patients completed follow-up, with four patients in the intervention group and two in the control group lost to follow-up. At 12 months after surgery, the percentage of patients with arm lymphoedema was higher in the intervention group than in the control group, although the difference was not significant (24% (n=18) versus 19% (n=15); difference 5%, 95% confidence interval −8% to 18%; P=0.45). It was concluded that there was no evidence that manual lymph drainage in addition to guidelines and exercise therapy after axillary lymph node dissection for breast cancer reduced the incidence of arm lymphoedema in the short term.
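As a rough check on the reported result, the Python sketch below recomputes the percentages, the difference, an approximate 95% confidence interval, and a P value. It assumes the denominators are the patients who completed follow-up (75 and 79, inferred from the reported losses). It also assumes a Wald interval and a pooled two sample z test, which may not be the method the trial used. The function name `two_proportion_summary` is illustrative only.

```python
import math


def two_proportion_summary(events_a, n_a, events_b, n_b, z_crit=1.959964):
    """Return proportions, their difference (a minus b), a Wald 95% CI,
    and a two sided P value from a pooled z test (assumed methods)."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    diff = p_a - p_b

    # Unpooled standard error, used for the confidence interval.
    se_ci = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    ci = (diff - z_crit * se_ci, diff + z_crit * se_ci)

    # Pooled standard error, used for the test of "no difference".
    p_pool = (events_a + events_b) / (n_a + n_b)
    se_test = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = diff / se_test
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two sided normal tail area

    return p_a, p_b, diff, ci, p_value


# Completers per group: allocated minus lost to follow-up (assumption).
n_intervention = 79 - 4
n_control = 81 - 2

p_int, p_con, diff, (lower, upper), p_value = two_proportion_summary(
    18, n_intervention, 15, n_control
)

print(f"intervention {p_int:.0%}, control {p_con:.0%}")
print(f"difference {diff:.0%}, 95% CI {lower:.0%} to {upper:.0%}")
print(f"P = {p_value:.2f}")
```

Under these assumptions the output agrees with the published figures after rounding. The confidence interval includes zero, which is consistent with the non-significant P value.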

Which of the following statements, if any, are true?

a) The proposed difference of 20% between treatment groups in the primary outcome used to calculate the sample size was the smallest effect of clinical interest.

b) To have 100% statistical power would require sampling the entire population.

c) The trial was overpowered for the statistical test of the primary outcome.

d) Because the difference between treatment groups in the primary outcome was not significant, it can be assumed the intervention is equally as effective as the control.

Answers

Statements a, b, and c are true, whereas d is false.

The above trial was a superiority trial by design; the aim was to establish whether the intervention was superior in effectiveness to the control treatment and reduced the cumulative incidence of arm lymphoedema, or whether the control treatment was superior. Superiority trials have been described in a previous question.2 Although it was anticipated that the intervention would reduce the cumulative incidence of arm lymphoedema compared with the control treatment, results are sometimes unexpected, and it was important that statistical hypothesis testing allowed for the possibility of the control treatment being superior. Therefore, traditional statistical hypothesis testing with a two sided alternative hypothesis was used to compare treatment groups in the primary outcome.3

When calculating the required sample size for the above trial, it was necessary to stipulate that the alternative hypothesis was two sided, because this influences the required sample size. It was also necessary to specify the critical level of significance; the standard level of 0.05 (5%) is typically used when calculating the required sample size for trials.

It was essential that the researchers calculated the optimal sample size before starting the trial. The required number of participants was based on the clinical significance of the difference between treatment groups in the primary outcome. It was assumed, on the basis of previous research, that the cumulative incidence of arm lymphoedema at 12 months would be 30% for the control group. For the intervention to be considered clinically effective and superior to the control treatment, the intervention group was required to demonstrate a 20% reduction in the cumulative incidence of arm lymphoedema at follow-up. This difference in the primary outcome is called the smallest effect of clinical interest (a is true). The smallest effect of clinical interest was proposed by the researchers on the basis of clinical experience or previous research. Obviously, larger differences between treatment groups would show clinical superiority, whereas smaller differences would not. If the control group demonstrated a reduction of 20% or more in the cumulative incidence of arm lymphoedema when compared with intervention, statistical significance would also be demonstrated.
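The ingredients of the sample size calculation described above (two sided test, 5% significance, 80% power, 30% incidence in the control group, and a 20% difference) can be combined in a short Python sketch. It assumes the 20% is an absolute difference, that is, 30% versus 10%. It uses a standard normal approximation for comparing two proportions, with an optional continuity correction. The trial report does not state which formula was used, so the function `n_per_group` and the simple dropout inflation are illustrative assumptions, not the researchers' method.

```python
import math
from statistics import NormalDist


def n_per_group(p_control, p_intervention, alpha=0.05, power=0.80,
                continuity=True):
    """Approximate sample size per group for a two sided comparison of
    two proportions with equal allocation (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    delta = abs(p_control - p_intervention)
    p_bar = (p_control + p_intervention) / 2

    # Variance under the null (pooled) and under the alternative (unpooled).
    null_term = z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
    alt_term = z_beta * math.sqrt(
        p_control * (1 - p_control) + p_intervention * (1 - p_intervention)
    )
    n = (null_term + alt_term) ** 2 / delta ** 2

    if continuity:
        # Continuity correction for equal group sizes.
        n = n / 4 * (1 + math.sqrt(1 + 4 / (n * delta))) ** 2
    return n


# Assumed interpretation: 30% in control versus 10% in intervention.
uncorrected = math.ceil(n_per_group(0.30, 0.10, continuity=False))
corrected = math.ceil(n_per_group(0.30, 0.10, continuity=True))
print("per group, no continuity correction:", uncorrected)
print("per group, with continuity correction:", corrected)
print("total, with continuity correction:", 2 * corrected)

# Simple inflation of the reported 146 for 10% dropout (assumed rule).
inflated = 146 * 1.10
print("146 inflated by 10%:", round(inflated, 1))
```

With these assumptions the corrected total is 144, close to but not exactly the 146 patients reported, and inflating 146 by 10% gives about 161 compared with the 160 reported. Small discrepancies of this kind are expected when the exact formula or software is unknown. Changing `alpha`, `power`, or the assumed proportions shows how strongly the required sample size depends on each choice.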

The smallest effect of clinical interest may not exist for the population. That is, the difference in cumulative incidence of arm lymphoedema between treatment groups at 12 months' follow-up that would be seen if the treatments were applied to the entire population may be less than 20%. However, if the smallest effect of clinical interest does exist for the population, then the probability that it is observed in the trial as statistically significant needs to be maximised. To achieve this, an optimal sample size was required. This underlies the concept of statistical power.

Statistical power is based on the hypothetical situation of repeating the above trial an infinite number of times under the same conditions; in particular, the samples would be of the same size. Random sampling would be employed to select the samples from the population, and therefore the samples would give different sample estimates of the population parameter, the difference between treatments in the primary outcome. Each trial would involve a statistical hypothesis test, resulting in a P value for the comparison of treatments in effectiveness. These P values would vary in magnitude. The percentage of these repeated samples that would demonstrate the smallest effect of clinical interest (if it existed in the population) as a statistically significant difference (P
