Sample size: how many participants are needed in a cohort study?

BMJ 2014;349:g6557 doi: 10.1136/bmj.g6557 (Published 31 October 2014)

ENDGAMES STATISTICAL QUESTION

Philip Sedgwick, reader in medical statistics and medical education
Institute for Medical and Biomedical Education, St George’s, University of London, London, UK

Researchers estimated the prevalence of pertussis infection in school age children after implementation of the preschool pertussis booster vaccination in the UK in October 2001. A prospective cohort study design was used. Children aged 5-15 years who presented with persistent cough in primary care were recruited. Recruitment took place between November 2010 and December 2012. The primary outcome was pertussis infection, as diagnosed by an oral fluid anti-pertussis toxin IgG titre.1

The sample size was based on an anticipated 20% prevalence of pertussis in the study population. The precision of the estimate needed to be within five percentage points as assessed by the 95% confidence interval for the population prevalence—that is, a 95% confidence interval of 15% to 25%. The required sample size was 246. To allow for a potential 20% failure rate in obtaining oral fluid samples with sufficient total IgG for analysis, the required sample size was increased to 300.
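The figure of 246 can be reproduced with a worked equation. This is a sketch that assumes the usual normal approximation for a single proportion; the text above does not state which formula the researchers used. The symbols are introduced here for illustration: p is the anticipated prevalence, d is the required half-width of the 95% confidence interval, and z is the standard normal quantile 1.96.

```latex
n = \frac{z^{2}\, p\,(1-p)}{d^{2}}
  = \frac{1.96^{2} \times 0.20 \times 0.80}{0.05^{2}}
  = \frac{0.6147}{0.0025}
  \approx 245.9
  \quad\Rightarrow\quad n = 246
```

How the 20% allowance was applied is not stated either. Multiplying 246 by 1.2 gives about 295 and dividing it by 0.8 gives about 308, so the reported target of 300 is consistent with rounding under either convention.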

In total, 294 children were recruited, of whom 279 (94.9%) had sufficient total IgG for analysis. It was reported that 56 (20.1%, 95% confidence interval 15.4% to 24.8%) children had evidence of recent pertussis infection, including 39 (18.1%, 13.0% to 23.3%) of 215 children who had been fully vaccinated. It was concluded that a fifth of school age children who present in primary care with persistent cough have pertussis. Furthermore, the authors suggested that these findings might help inform consideration of the need for a pertussis booster vaccination during adolescence in the UK.

Which of the following statements, if any, are true?

a) An improvement in the precision of the sample estimate for the population prevalence would result in a 95% confidence interval of reduced width

b) An improvement in the precision of the sample estimate for the population prevalence would require a larger sample size

c) The failure rate in obtaining oral fluid samples with sufficient total IgG for analysis was overestimated

Answers

Statements a, b, and c are all true.

The aim of the study was to estimate the population prevalence of pertussis infection in school age children. A prospective cohort study design, described in a previous question,2 was used. A sample size calculation was performed before starting the study. This required a statement of the anticipated population prevalence, plus the required precision of the estimate as assessed by the 95% confidence interval for the population prevalence. The anticipated population prevalence of 20% was based on previous research. The required precision of the sample estimate was to within five absolute percentage points, that is, a 95% confidence interval from 15% to 25%. A sample size of 246 was needed.

The 95% confidence interval represents the inaccuracy of the sample prevalence in estimating the population parameter.3 It is an interval estimate, and there is a probability of 0.95 that the population parameter is contained between the limits of the interval. An improvement in precision for the above study (for example, a sample estimate within three percentage points) would result in a narrower 95% confidence interval of 17% to 23% (a is true). An improvement in precision requires an increased sample size (b is true); for example, precision within three percentage points for the estimated prevalence of 20% would require a sample size of 683 participants, by the same normal approximation that gives 246 for five percentage points.
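The trade-off between precision and sample size can be checked numerically. The sketch below assumes the standard normal approximation for a single proportion, which is not stated in the text but reproduces the quoted 246. The function name `sample_size_for_precision` is illustrative, not taken from the study. Under this assumption, precision within three percentage points needs 683 participants and precision within two percentage points needs 1537.

```python
import math

Z_95 = 1.96  # standard normal quantile for a two-sided 95% interval


def sample_size_for_precision(prevalence: float, half_width: float, z: float = Z_95) -> int:
    """Smallest n for which the normal-approximation 95% CI has the given half-width.

    Assumption (not stated in the article): n = z^2 * p * (1 - p) / d^2, rounded up.
    """
    n_exact = z ** 2 * prevalence * (1.0 - prevalence) / half_width ** 2
    return math.ceil(n_exact)


anticipated = 0.20  # anticipated prevalence used in the study

# Required sample size for precision within 5, 3, and 2 percentage points.
required = {d: sample_size_for_precision(anticipated, d) for d in (0.05, 0.03, 0.02)}

for d, n in required.items():
    print(f"within {d * 100:.0f} percentage points -> n = {n}")
# within 5 percentage points -> n = 246
# within 3 percentage points -> n = 683
# within 2 percentage points -> n = 1537
```

Because the required size grows with the inverse square of the half-width, tightening the interval from five to three percentage points nearly triples the number of participants.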

When calculating the required sample size for a study, particularly for prospective designs, it is important to consider the potential for participants being lost to follow-up. If necessary the sample size should be adjusted. The amount by which the sample size should be adjusted is typically based on results from previous studies or possibly an informed approximation. In the study above, the required sample size was increased to 300 to account for a potential 20% failure rate in obtaining samples with sufficient total IgG for analysis. In total, 294 children were recruited to the study and 279 (94.9%) of them provided samples with sufficient total IgG for analysis. The failure rate was therefore 5.1%, and had been overestimated by the researchers when calculating the required sample size (c is true). The eventual sample size was greater than needed to estimate a population prevalence of 20% to within five percentage points.
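The observed failure rate and the achieved precision can be verified from the reported counts. This sketch assumes a simple normal-approximation (Wald) interval; the method the study authors used is not stated here, although this approach reproduces the reported limits of 15.4% to 24.8%. The function name `wald_interval` is illustrative.

```python
import math

recruited = 294    # children recruited (reported in the article)
analysable = 279   # samples with sufficient total IgG (reported)
cases = 56         # children with evidence of recent pertussis infection (reported)

# Observed failure rate in obtaining an analysable sample.
failure_rate = (recruited - analysable) / recruited


def wald_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation 95% CI for a proportion (an assumed method)."""
    p = successes / n
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return p - half_width, p + half_width


lower, upper = wald_interval(cases, analysable)
achieved_half_width = (upper - lower) / 2

print(f"failure rate: {failure_rate:.1%}")           # failure rate: 5.1%
print(f"95% CI: {lower:.1%} to {upper:.1%}")         # 95% CI: 15.4% to 24.8%
print(f"half-width: {achieved_half_width:.1%}")      # half-width: 4.7%
```

With 279 analysable samples rather than the 246 required, the achieved half-width is slightly under the five percentage points that were planned.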

The sample size calculation was based on an estimated population prevalence of 20%, with precision to within five absolute percentage points. Sometimes sample size calculations for a proportion or percentage are based on relative precision. For example, it may have been required that the sample estimate had a relative precision to within 10% of the estimated population prevalence of 20% for pertussis. A relative precision of 10% of the estimated prevalence of 20% is two percentage points, and therefore the 95% confidence interval for the population parameter would be 18% to 22%.

The sample estimate of prevalence was similar in magnitude to the expected prevalence of 20%. It is unlikely that this was a coincidence; the authors were no doubt well informed through their own clinical experience and the experience of other researchers. If the magnitude of the sample estimate had been appreciably different from the predicted prevalence of 20%, the precision of the sample estimate would have differed from that originally specified. Sample size calculations should not be viewed as an exact science, but rather as providing a “ballpark” figure of the number of participants needed. More generally, a sample size calculation provides an indication of the potential length of the study and therefore the costs involved.
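Both points can be illustrated with a short calculation: converting relative precision to absolute precision, and seeing how the achieved precision shifts when the true prevalence differs from the anticipated one. This is a sketch under the normal approximation, which is an assumption not stated in the text. The alternative prevalences of 10% and 40% are hypothetical values chosen only for illustration.

```python
import math


def half_width(prevalence: float, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation 95% CI for a proportion (assumed method)."""
    return z * math.sqrt(prevalence * (1.0 - prevalence) / n)


anticipated = 0.20         # anticipated prevalence from the study
relative_precision = 0.10  # 10% relative precision, as in the example above

# Relative precision is a fraction of the estimate, so 10% of 20% is 2 percentage points.
absolute_precision = relative_precision * anticipated
interval = (anticipated - absolute_precision, anticipated + absolute_precision)
print(f"absolute precision: {absolute_precision:.2f}")         # 0.02
print(f"interval: {interval[0]:.0%} to {interval[1]:.0%}")     # 18% to 22%

# Achieved half-width with the planned n = 246 if the true prevalence
# turned out to be 10%, 20%, or 40% (10% and 40% are hypothetical).
n_planned = 246
achieved = {p: half_width(p, n_planned) for p in (0.10, 0.20, 0.40)}
for p, hw in achieved.items():
    print(f"prevalence {p:.0%}: within {hw * 100:.1f} percentage points")
# prevalence 10%: within 3.7 percentage points
# prevalence 20%: within 5.0 percentage points
# prevalence 40%: within 6.1 percentage points
```

The planned precision of five percentage points is met only when the prevalence is close to the anticipated 20%, which is one reason to treat the calculated sample size as a ballpark figure.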

It was essential to calculate the required sample size when designing the above study. If the sample size had been too small it may have produced an estimate for the population prevalence of pertussis infection that was too imprecise to be useful for public health planning and prevention. Too large a sample may have used too many resources and taken an unacceptably long time to collect. Sample size calculations are as important for observational studies as they are for randomised controlled trials. For clinical trials, sample size calculations include consideration of statistical power and the smallest effect of clinical interest,4 5 whereas for observational studies, as in the study above, they may be based on the statistical precision of a sample estimate. Sample size requirements for observational studies that depend on, for example, the risk of a disease in groups exposed and unexposed to a risk factor will be described in a future question.

Competing interests: None declared.

1 Wang K, Fry NK, Campbell H, Amirthalingam G, Harrison TG, Mant D, et al. Whooping cough in school age children presenting with persistent cough in UK primary care after introduction of the preschool pertussis booster vaccination: prospective cohort study. BMJ 2014;348:g3668.
2 Sedgwick P. Prospective cohort studies: advantages and disadvantages. BMJ 2013;347:f6726.
3 Sedgwick P. Understanding confidence intervals. BMJ 2014;349:g6051.
4 Sedgwick P. Sample size: how many participants are needed in a trial? BMJ 2013;346:f1041.
5 Sedgwick P. Cluster randomised controlled trials: sample size calculations. BMJ 2013;346:f2839.

Cite this as: BMJ 2014;349:g6557 © BMJ Publishing Group Ltd 2014

