Current Commentary

The Numbers Game: Evaluation of Statistics by Obstetrics & Gynecology

Roy M. Pitkin, MD, James R. Scott, MD, and Leon F. Burmeister, PhD

Statistical analysis has become integral to the planning, conduct, and reporting of modern medical research. Attention to the statistical aspects of manuscripts submitted to Obstetrics & Gynecology goes back approximately 40 years, and the process used in their evaluation has evolved over that time. For the past 20 years, submissions containing any type of statistics that are being seriously considered for acceptance have routinely been reviewed by a Statistical Editor, who judges the work on a number of statistical and design characteristics. Findings of the statistical design review (which has been done by one Statistical Editor over the entire 20-year period) are integrated into the editorial decision about acceptance. The statistical review generally leads to rejection of approximately 16–25% of manuscripts; in a larger proportion, it identifies less serious problems, the correction of which improves the final product. (Obstet Gynecol 2014;123:353–5) DOI: 10.1097/AOG.0000000000000079

From the Departments of Obstetrics and Gynecology, David Geffen School of Medicine at UCLA, Los Angeles, California, and University of Utah School of Medicine, Salt Lake City, Utah; and the College of Public Health, University of Iowa, Iowa City, Iowa.

Corresponding author: Roy M. Pitkin, MD, 78900 Rancho La Quinta Drive, La Quinta, CA 92253; e-mail: [email protected].

Financial Disclosure: The authors did not report any potential conflicts of interest.

© 2014 by The American College of Obstetricians and Gynecologists. Published by Lippincott Williams & Wilkins. ISSN: 0029-7844/14

Over the past 50 years or so, the use of statistics has assumed ever-increasing importance in the planning, conduct, interpretation, and reporting of medical research. Clearly, the point has been reached where statistical analysis is integral and essential in all but the simplest of observational reports (eg, case reports). Thus, peer-reviewed journals such as Obstetrics & Gynecology have needed to include evaluation of statistics as an integral part of their assessment of manuscripts submitted for consideration of publication. The purpose of this article is to trace the evolution of the process of evaluating statistics in this Journal.

A detailed review of the 50-year history of Obstetrics & Gynecology1 identified recurring concern with statistics in meetings of the Editorial Board as far back as 1973, when the Editor was authorized to obtain statistical consultation on manuscripts when it was felt to be necessary. How often, indeed whether at all, this system of ad hoc consultation with a statistician was used is not known. Statistical consultation was formalized in 1986 with the appointment of a designated Statistical Consultant to review statistical and design characteristics of manuscripts referred by the Editor. The consultant's second duty was to present an educational program to the Editorial Board at its annual meeting; this program was designed as a 4-year cycle so that each member would receive the full program during his or her time on the Board. The person appointed to the position (Leon F. Burmeister) would serve the Journal for 27 years under a number of titles: Statistical Consultant 1986–1999, Assistant Editor (Statistics) 1999–2002, and Associate Editor (Statistics) 2001–2013.

SELECTIVE EVALUATION

The system initiated in 1986 involved formal evaluation of statistics in manuscripts identified during the standard review process as needing it. Extramural peer reviewers were advised that the Journal had a Statistical Consultant and that, if they felt anything about a manuscript's design, methodology, results, or interpretation warranted this kind of specialized evaluation, they should so indicate in their comments to the Editor. Additionally, the consultant became involved if the Editorial Board member or the Editor primarily responsible for the manuscript regarded formal statistical review as indicated. By this process of selective evaluation, some 5–6% of submitted manuscripts were examined by the consultant, whose findings were incorporated into the Editor's disposition letter. A few submissions were judged to be so flawed with respect to design, statistical analysis, or both that they were declined for publication on this basis alone, and in more, needed revisions were identified that improved the work.

The system seemed sound, and there was a general sense that it addressed the issue appropriately. However, a suggestion arose that this confidence might be overly optimistic. Editorial Board members traditionally gave short talks at their last Board meeting, and in 1991 Susan Johnson gave as her valedictory an analysis of the statistics and design of articles published over 1 year. She found that only 72% of articles using statistics identified the analytical techniques used and that problems such as multiple comparisons and failure to relate predictive values to disease prevalence (see the illustrative calculation at the end of this section) were encountered occasionally. These observations and other concerns suggested that the selective system might not be functioning optimally. Therefore, in 1993, it was decided to conduct an internal study of routine statistical screening.

For the study, 100 consecutive manuscripts reporting any type of statistical data or analysis were sent to the Statistical Consultant at the same time they were sent for review by external referees and Editorial Board members. The consultant screened the papers using a checklist that included such items as definition of the study population, justification of sample size, assignment to interventions, and plan of statistical analysis. The results were surprising: fully one-fourth of submissions were judged to have statistical or design flaws serious enough that the consultant concluded they should be rejected on that basis alone. More importantly, these flaws went unrecognized by the standard review process in 15 of the 25 cases. Moreover, more than half of the manuscripts returned to the authors with an invitation to revise carried important but nonfatal flaws identified by the statistical screen, the correction of which improved the final product.

These results indicated clearly that formal screening of manuscripts for statistical and design aspects makes unique contributions by 1) identifying fatal flaws that would otherwise go undetected and 2) pointing out important improvements needed in papers ultimately accepted and published. The Editorial Board considered the matter at its 1994 meeting and decided to establish a policy of routine screening of all manuscripts before acceptance.
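To illustrate the prevalence point noted above (an illustrative calculation added here for clarity; the sensitivity and specificity values are hypothetical rather than drawn from any reviewed manuscript): for a test with fixed sensitivity and specificity, the positive predictive value follows from Bayes' theorem as

\[
\mathrm{PPV} = \frac{\mathrm{sensitivity} \times \mathrm{prevalence}}{\mathrm{sensitivity} \times \mathrm{prevalence} + (1 - \mathrm{specificity})(1 - \mathrm{prevalence})}.
\]

With sensitivity and specificity both 0.90, a prevalence of 20% gives a PPV of approximately 69%, whereas a prevalence of 2% gives a PPV of only approximately 16%; this is why reporting predictive values without stating the underlying prevalence can mislead.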

ROUTINE STATISTICAL SCREENING

On July 1, 1994, the Journal initiated its new policy of routine screening.2 At the time of initial disposition (ie, when the Editor analyzes the manuscript and the reviews by expert referees and the Editorial Board), if it appears that the work is potentially acceptable, it is sent to the Statistical Editor. The results of the screen are integrated into the editorial decision on initial manuscript disposition. Sometimes the screening finds one or more fatal flaws, leading the editors to decline the paper for publication on that basis alone. More often, the screening identifies important ways of improving the manuscript, and these are incorporated into the Editor's letter inviting revision.

Experience with the first 8 months of this program clearly supported the value of the approach.3 Some 16% of manuscripts screened were found to have statistical or design flaws serious enough to prompt rejection, an astonishing figure given that all had "passed" the standard peer review process. In 65% of manuscripts screened, a need for important improvements was identified; common deficiencies were inadequate description of the study population, failure to justify the sample size (see the illustrative calculation below), use of statistical tests without presenting evidence that their underlying assumptions were satisfied, and inappropriate use of the term "randomized." Gratifyingly, the process did not prolong the review time appreciably.
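As a rough illustration of what justifying a sample size typically entails (the numbers here are hypothetical and not taken from any reviewed manuscript), one common normal-approximation formula for comparing two proportions p1 and p2 with two-sided significance level alpha and power 1 - beta is

\[
n \approx \frac{(z_{1-\alpha/2} + z_{1-\beta})^2 \left[ p_1(1-p_1) + p_2(1-p_2) \right]}{(p_1 - p_2)^2} \quad \text{per group.}
\]

For example, to detect a reduction in an outcome rate from 30% to 20% with alpha = 0.05 (z = 1.96) and 80% power (z = 0.84), this gives roughly 7.84 × 0.37 / 0.01 ≈ 290 participants per group; a manuscript that justifies its sample size would report such a calculation (or an equivalent) along with the assumptions behind it.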

REFINEMENTS (2001–2013)

A new Editor-in-Chief and editorial team, which took office in 2001, continued a similar approach: all manuscripts containing any statistical data and being seriously considered for publication underwent formal statistical review. This review followed a format covering 10 essential points of design and analysis, with additional comments as appropriate, and the Statistical Editor's report was reviewed by each of the editors at a weekly telephonic conference at which manuscript disposition was decided. If a study was judged unacceptable by the Statistical Editor because of poor design or other statistical issues that could not be corrected, it was usually rejected. With less serious and potentially remediable statistical or design issues, the manuscript was returned for possible revision; if the editorial team felt the authors had addressed these issues satisfactorily in revision, the paper usually was accepted.

During the 12 years from 2001 to 2013, 5,305 manuscripts underwent statistical review. A recent tabulation of 18 months' experience (July 2011 to January 2013), involving 719 consecutive manuscripts, reveals that the relation between the Statistical Editor's recommendation and the ultimate manuscript disposition was quite close (Table 1). Most importantly, 85% of papers the Statistical Editor recommended be rejected were ultimately declined for publication.

Table 1. Statistical Editor Recommendation and Manuscript Acceptance*

Recommendation            Accepted and Published
Accept (n=63)             48 (76)
Minor revision (n=313)    233 (74)
Major revision (n=138)    87 (63)
Reject (n=205)            31 (15)

Data are n (%).
* Seven hundred nineteen consecutive manuscripts, July 2011 to January 2013.

Additionally, the Statistical Editor's educational lecture at the annual Editorial Board meeting clarified controversial topics and questions that had arisen during the past year. Among the subjects covered were meta-analysis, decision analysis, and cost-effectiveness analysis; data "torturing" or overanalysis; limitations and consequences of various sampling designs; interpretation of relative risks and odds ratios; and failed assumptions in multivariable analyses.

In addition to emphasizing formal statistical evaluation of manuscripts, Obstetrics & Gynecology has been among the first medical journals to adopt newly promulgated guidelines for the proper reporting of specific types of studies.4 Each of these sets of guidelines has its own requirements for reporting statistics and includes a checklist to be filled out by the authors and submitted with the manuscript. These international standards, and the years of their adoption by Obstetrics & Gynecology, are: Consolidated Standards of Reporting Trials (CONSORT) for reporting randomized trials, adopted in 1996; Quality of Reporting of Meta-analyses (QUOROM) for reporting systematic reviews and meta-analyses of randomized trials, adopted in 2000 and replaced by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines in 2009; Meta-analysis of Observational Studies in Epidemiology (MOOSE) for reporting meta-analyses of observational studies, adopted in 2001; Standards for the Reporting of Diagnostic Accuracy studies (STARD) for reporting studies of diagnostic accuracy, adopted in 2004; and Strengthening the Reporting of Observational studies in Epidemiology (STROBE) for reporting the results of observational studies in epidemiology, adopted in 2007.

In 2002, a Consultant Editor for Epidemiology (David A. Grimes, MD) joined the editorial team. An editorial introducing this new appointment emphasized the importance of evidence-based medicine, which "seeks to replace unproved clinical practices with a more effective and scientific approach to patient care."5 During his tenure, from 2002 to 2012, the Epidemiology Editor participated in the assessment of certain complex manuscripts and also contributed educational lectures on epidemiologic topics at the annual Editorial Board meetings.

CONCLUSION

Statistical aspects of manuscripts submitted to Obstetrics & Gynecology have been a subject of attention for approximately 40 years. Over this period the methodology has evolved, and for approximately 20 years some form of routine screening by a Statistical Editor has been part of the system. In addition, the closely related field of epidemiology has been incorporated into the evaluation process, whose ultimate goal is to provide readers with the most reliable information possible.

REFERENCES

1. Pitkin RM. The Green Journal: fifty years on. Washington (DC): American College of Obstetricians and Gynecologists; 2003.
2. Pitkin RM. Statistical evaluation of manuscripts: it's all in the numbers. Obstet Gynecol 1994;83:1043–4.
3. Pitkin RM, Burmeister LF. Routine statistical screening revisited. Obstet Gynecol 1995;86:124–5.
4. Begg C, Cho M, Eastwood S, Horton R, Moher D, Olkin I, et al. Improving the quality of reporting of randomized controlled trials: the CONSORT statement. JAMA 1996;276:637–9.
5. Scott JR. Show me the evidence. Obstet Gynecol 2002;100:403–4.
