
Endgames

ENDGAMES STATISTICAL QUESTION

Sample size: how many participants are needed in a cohort study?

Philip Sedgwick, reader in medical statistics and medical education

Institute for Medical and Biomedical Education, St George’s, University of London, London, UK

Researchers estimated the prevalence of pertussis infection in school age children after implementation of the preschool pertussis booster vaccination in the UK in October 2001. A prospective cohort study design was used. Children aged 5-15 years who presented with persistent cough in primary care were recruited. Recruitment took place between November 2010 and December 2012. The primary outcome was pertussis infection, as diagnosed by an oral fluid anti-pertussis toxin IgG titre.1

The sample size was based on an anticipated 20% prevalence of pertussis in the study population. The precision of the estimate needed to be within five percentage points as assessed by the 95% confidence interval for the population prevalence—that is, a 95% confidence interval of 15% to 25%. The required sample size was 246. To allow for a potential 20% failure rate in obtaining oral fluid samples with sufficient total IgG for analysis, the required sample size was increased to 300.

In total, 294 children were recruited of whom 279 (94.9%) had sufficient total IgG for analysis. It was reported that 56 (20.1%, 95% confidence interval 15.4% to 24.8%) children had evidence of recent pertussis infection, including 39 (18.1%, 13.0% to 23.3%) of 215 children who had been fully vaccinated. It was concluded that a fifth of school age children who present in primary care with persistent cough have pertussis. Furthermore, the authors suggested that these findings might help inform consideration of the need for a pertussis booster vaccination during adolescence in the UK.

Which of the following statements, if any, are true?

a) An improvement in the precision of the sample estimate for the population prevalence would result in a 95% confidence interval of reduced width

b) An improvement in the precision of the sample estimate for the population prevalence would require a larger sample size

c) The failure rate in obtaining oral fluid samples with sufficient total IgG for analysis was overestimated

Answers

Statements a, b, and c are all true.

The aim of the study was to estimate the population prevalence of pertussis infection in school age children. A prospective cohort study design, described in a previous question,2 was used. A sample size calculation was performed before starting the study. This necessitated a statement of the anticipated population prevalence, plus the required precision of the estimate as assessed by the 95% confidence interval for the population prevalence. The anticipated population prevalence of 20% was based on previous research. The required precision of the sample estimate was to within five absolute percentage points; that is, a 95% confidence interval from 15% to 25%. A sample size of 246 was needed.

The 95% confidence interval represents the inaccuracy of the sample prevalence in estimating the population parameter.3 It is an interval estimate, and there is a probability of 0.95 that the population parameter is contained between the limits of the interval. In the study above, a precision of within five absolute percentage points for an anticipated prevalence of 20% was needed, giving a 95% confidence interval from 15% to 25%. An improvement in precision for the above study (for example, a sample estimate within three percentage points) would result in a narrower 95% confidence interval of 17% to 23% (a is true). An improvement in precision requires an increased sample size (b is true); for example, to have precision within three percentage points for the estimated prevalence of 20% would require a sample size of 683 participants.
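The required sample size of 246 is consistent with the usual normal approximation for estimating a single proportion, n = z² × p × (1 − p) / d², where p is the anticipated prevalence, d is the absolute margin of error, and z = 1.96 for 95% confidence. The article does not state which formula the researchers used, so the Python sketch below is an assumption, and the function name is purely illustrative.

```python
import math


def sample_size_for_proportion(p, margin, z=1.96):
    """Participants needed to estimate a proportion p to within +/- margin.

    Assumes the normal approximation n = z^2 * p * (1 - p) / margin^2;
    `margin` is an absolute value (0.05 means five percentage points).
    """
    n = (z ** 2) * p * (1 - p) / margin ** 2
    # Round up, because a fraction of a participant cannot be recruited.
    return math.ceil(n)


# Anticipated prevalence of 20%, precision within five percentage points.
print(sample_size_for_proportion(0.20, 0.05))  # 246

# Tighter precision (three percentage points) needs a larger sample.
print(sample_size_for_proportion(0.20, 0.03))  # 683
```

Because the margin enters the formula squared, halving the margin roughly quadruples the number of participants needed.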

BMJ 2014;349:g6557 doi: 10.1136/bmj.g6557 (Published 31 October 2014)

When calculating the required sample size for a study, particularly for prospective designs, it is important to consider the potential for participants being lost to follow-up. If necessary the sample size should be adjusted. The amount by which the sample size should be adjusted is typically based on results from previous studies or possibly an informed approximation. In the study above, the required sample size was increased to 300 to account for a potential 20% failure rate in obtaining samples with sufficient total IgG for analysis. In total, 294 children were recruited to the study and 279 (94.9%) of them provided samples with sufficient total IgG for analysis. The failure rate was therefore 5.1%, and had been overestimated by the researchers when calculating the required sample size (c is true). The eventual sample size was greater than needed to estimate a population prevalence of 20% to within five percentage points.
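The adjustment for unusable samples can be sketched in Python as follows. The article does not say how 246 was increased to 300. Two common conventions are shown here as assumptions: inflating by the anticipated failure rate, or dividing by the expected proportion of usable samples. They give slightly different targets, so the published figure of 300 is presumably rounded. All function names are illustrative.

```python
import math


def inflate_by_rate(n_required, failure_rate):
    """Add failure_rate * n_required extra participants (assumed convention)."""
    return math.ceil(n_required * (1 + failure_rate))


def inflate_by_division(n_required, failure_rate):
    """Recruit enough that the usable fraction still yields n_required."""
    return math.ceil(n_required / (1 - failure_rate))


def observed_failure_rate(recruited, usable):
    """Proportion of recruited participants without a usable sample."""
    return (recruited - usable) / recruited


print(inflate_by_rate(246, 0.20))      # 296
print(inflate_by_division(246, 0.20))  # 308

# Figures reported in the study: 294 recruited, 279 with sufficient total IgG.
rate = observed_failure_rate(294, 279)
print(f"{rate:.1%}")                   # 5.1%
print(279 >= 246)                      # True: usable sample exceeded the requirement
```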

The sample size calculation was based on an estimated population prevalence of 20%, with precision to within five absolute percentage points. Sometimes sample size calculations for a proportion or percentage are based on relative precision. For example, it may have been required that the sample estimate had a relative precision to within 10% of the estimated population prevalence of 20% for pertussis. A relative precision of 10% of the estimated prevalence is two absolute percentage points, and therefore the 95% confidence interval for the population parameter would be 18% to 22%.

The sample estimate of prevalence was similar in magnitude to the expected prevalence of 20%. It is unlikely that this was a coincidence; the authors were no doubt well informed through their own clinical experience and the experience of other researchers. If the magnitude of the sample estimate was appreciably different from the predicted prevalence of 20%, the precision of the sample estimate would differ from that originally specified. Sample size calculations should not be viewed as an exact science, but rather as providing a “ballpark” figure of the number of participants needed. More generally, a sample size calculation provides an indication of the potential length of the study and therefore the costs involved.
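Two points from this discussion can be illustrated with a short Python sketch: converting a relative precision into an absolute margin, and checking the precision actually achieved once the data are in. It assumes the normal (Wald) approximation throughout, which the article does not confirm was the authors' method, and the function names are illustrative.

```python
import math


def sample_size_for_proportion(p, margin, z=1.96):
    """Normal-approximation sample size for an absolute margin of error."""
    return math.ceil((z ** 2) * p * (1 - p) / margin ** 2)


def absolute_margin(p, relative_precision):
    """Convert a relative precision (0.10 = 10% of p) to an absolute margin."""
    return p * relative_precision


def achieved_margin(p_hat, n, z=1.96):
    """Half-width of the Wald 95% confidence interval for an observed proportion."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


# A relative precision of 10% around a prevalence of 20% is two percentage points.
margin = absolute_margin(0.20, 0.10)
print(sample_size_for_proportion(0.20, margin))  # 1537

# Precision achieved in the study: 56 of 279 children had recent infection.
p_hat = 56 / 279
half_width = achieved_margin(p_hat, 279)
print(f"{p_hat - half_width:.1%} to {p_hat + half_width:.1%}")  # 15.4% to 24.8%

# Had the observed prevalence been, say, 40% (a hypothetical value),
# the same 279 samples would have given a wider interval.
print(achieved_margin(0.40, 279) > half_width)   # True
```

The last comparison reflects the point made above: the precision specified at the design stage holds only if the observed prevalence is close to the anticipated one.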

It was essential to calculate the required sample size when designing the above study. If the sample size had been too small it may have produced an estimate for the population prevalence of pertussis infection that was too imprecise to be useful for public health planning and prevention. Too large a sample may have used too many resources and taken an unacceptably long time to collect. Sample size calculations are as important for observational studies as they are for randomised controlled trials. For clinical trials, sample size calculations include consideration of statistical power and the smallest effect of clinical interest,4 5 whereas for observational studies, as in the study above, they may be based on the statistical precision of a sample estimate. Sample size requirements for observational studies that depend on, for example, the risk of a disease in groups exposed and unexposed to a risk factor will be described in a future question.

Competing interests: None declared.

1 Wang K, Fry NK, Campbell H, Amirthalingam G, Harrison TG, Mant D, et al. Whooping cough in school age children presenting with persistent cough in UK primary care after introduction of the preschool pertussis booster vaccination: prospective cohort study. BMJ 2014;348:g3668.
2 Sedgwick P. Prospective cohort studies: advantages and disadvantages. BMJ 2013;347:f6726.
3 Sedgwick P. Understanding confidence intervals. BMJ 2014;349:g6051.
4 Sedgwick P. Sample size: how many participants are needed in a trial? BMJ 2013;346:f1041.
5 Sedgwick P. Cluster randomised controlled trials: sample size calculations. BMJ 2013;346:f2839.

Cite this as: BMJ 2014;349:g6557

© BMJ Publishing Group Ltd 2014

For personal use only: See rights and reprints http://www.bmj.com/permissions
