Journal of Clinical Epidemiology 68 (2015) 601–602

EDITORIAL

Cluster randomized trials must be better designed and reported

Cluster randomized trials are increasingly important as public attention to interventions to improve the quality of health care, health services, environmental health, and health policy grows and becomes more critical. For example, to study the effectiveness of interventions to improve the performance of clinicians, practices, or health care centers in treating patients, it is generally necessary to avoid intra-clinician, intra-practice, or intra-center contamination. Randomization of clinicians, practices, or centers, each representing a cluster of patients, is then needed. Other examples are randomization of postal codes or even regions in studying public health, environmental, or policy interventions, as such interventions often cannot be confined and randomly assigned to individual subjects. In some cases, cluster randomization can be part of multilevel randomization, when the investigators are interested in studying the effects of interventions at various levels within one comprehensive study.

As cluster randomized trials (CRTs) pose specific and quite complex methodological challenges, their design and methods deserve close attention. Given these additional complexities, it is also important to evaluate how CRTs are being carried out and reported. For this purpose, guidance has been provided by an extension of the CONSORT Statement [1,2]. Special attention is needed at every step of the research protocol: defining the research question and the contrast of interest, the comparison(s) to be made, the procedure of consent and the method of randomization [3], measurements, sample size estimation, and data analysis and interpretation.

In a systematic review of a random sample of 300 CRTs, Wright et al. studied the use of covariates in randomization, the reporting of covariates, and the way these were dealt with in the data analysis.
They found a number of marked discrepancies between practice and guidance on the use of covariates in the design, analysis, and reporting of CRTs. In a second article, Rutterford et al. evaluated the reporting of sample size estimation in CRTs. They conclude that the reporting of important sample size elements in the context of CRTs remains suboptimal and that editors and peer reviewers should be stricter in requiring implementation of the CONSORT recommendations.

In other fields, too, weak reporting is still a frequent phenomenon. Since abstracts of research articles are essential for efficient identification and selection of published studies, Korevaar and co-workers reviewed the informativeness of abstracts of diagnostic accuracy studies, using 21 items deemed important based on guidance for adequate reporting. These investigators found that, even in high-impact journals, crucial information on key design characteristics was not reported in most cases. They suggest that reporting guidelines for abstracts of diagnostic accuracy studies be developed.

An interesting question is whether including librarians and information specialists in research teams is associated with higher quality of reporting. Rethlefsen and her group reviewed systematic reviews published in high-impact general internal medicine journals, focusing on search quality characteristics and reporting quality. They showed that, while search quality and reporting remain poor, librarian and information specialist participation was correlated with higher quality of reporting. They suggest that editors and peer reviewers encourage librarian engagement in performing and authoring systematic reviews to help improve the documentation of search strategies. Durao and colleagues developed and improved a search strategy to identify diet and nutrition trials. Using a reference set, they determined the relative recall (sensitivity) of the search and then, based on the identified set, modified the strategy with additional terms to reach a final sensitivity of 89%. The authors suggest that their method can be useful for establishing a nutrition trials register to support future research and reviews.

In data analysis, a challenging question remains how many subjects per variable are required in regression analysis. Austin and Steyerberg studied this for multiple linear regression, using Monte Carlo simulations. They found that as few as two subjects per variable tend to permit accurate estimation of regression coefficients in a linear regression model, far fewer than are needed for logistic regression and Cox proportional hazards regression models.
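The logic of such a subjects-per-variable simulation can be sketched as follows. This is an illustrative re-implementation, not the authors' code: the settings (10 predictors, a common true coefficient of 0.5, standard normal predictors and noise) are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility

def spv_simulation(n_vars=10, spv=2, true_coef=0.5, n_sims=500):
    """Illustrative Monte Carlo check: fit ordinary least squares with
    n = spv * n_vars subjects and return the mean estimated coefficient
    across simulations. If OLS is (approximately) unbiased at this
    sample size, the result should be close to true_coef."""
    n = spv * n_vars
    estimates = []
    for _ in range(n_sims):
        X = rng.standard_normal((n, n_vars))
        y = X @ np.full(n_vars, true_coef) + rng.standard_normal(n)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        estimates.append(beta.mean())
    return float(np.mean(estimates))

# With only two subjects per variable, the average estimate stays near
# the true coefficient, although individual estimates are noisy.
print(spv_simulation())
```

Note that the individual coefficient estimates at two subjects per variable remain highly variable; the simulation only illustrates that their average is close to the true value, which is the sense in which such small samples can still "permit accurate estimation."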
Another challenging data-analytic issue is how to deal with missing items in multi-item questionnaires. This was studied by Eekhout et al., who demonstrated, in two empirical data sets, two novel methods that had previously been examined in a simulation study. These methods incorporate item-level information in the background while the study outcomes are estimated simultaneously. The investigators found that including a summary (parcel) of the available item scores as an auxiliary variable is an efficient method, and they recommend its use when item scores are missing in multi-item questionnaires used as outcome measures in longitudinal studies.
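As an illustration of the parcel idea, a summary of the available item scores can be computed as a simple mean over the non-missing items for each subject. This sketch is only one plausible reading of such a summary, not the authors' implementation, and the questionnaire data below are hypothetical.

```python
import numpy as np

def parcel_score(item_scores):
    """Mean of the available (non-missing) item scores for each subject.
    This parcel summary can then serve as an auxiliary variable when the
    questionnaire outcome is modeled despite missing items."""
    arr = np.asarray(item_scores, dtype=float)
    return np.nanmean(arr, axis=1)  # ignores NaN entries row by row

# Hypothetical 5-item questionnaire, three subjects, NaN = missing item.
items = [[3, 4, np.nan, 2, 3],
         [1, np.nan, np.nan, 2, 1],
         [4, 4, 5, 4, np.nan]]
print(parcel_score(items))
```

For the first subject, the parcel score is the mean of the four observed items, (3 + 4 + 2 + 3) / 4 = 3.0; missing items simply drop out of the average.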


Other methodological issues in outcome measurement also keep us busy. Vanier et al. considered that, according to response shift (RS) theory, a change in health-related quality of life (HRQL) scores in RCTs may result not only from treatment effects but also from a change in how people appraise their HRQL. Based on their methodological analysis of this issue, they suggest that the impact of the RS effect on HRQL scores may be avoided or reduced if instruments are designed with a rule of "the least semantic and psychometric complexity" in mind.

Since estimates of minimal clinically important differences in outcome measures can be affected by the anchor used, Ward and his group compared the construct validity of domain-specific and global transition questions as anchors for measures in a domain. Based on a prospective study of patients with rheumatoid arthritis, the group found that domain-specific transition questions were more strongly correlated with changes in clinical measures than either global disease or general health transition questions. The authors therefore recommend that domain-specific questions be used as anchors for clinical measures focused on a single domain.

A new questionnaire to measure computer vision syndrome at the workplace was developed by Segui et al., based on a literature review, and validated through expert discussion, test-retest evaluation, and Rasch analysis. The investigators conclude that the computer vision syndrome questionnaire (CVS-Q) showed acceptable psychometric properties and can be used in clinical trials and outcomes research. Rasch analysis, in addition to classical test theory, was also applied by Chang and co-authors to evaluate the psychometric properties of the Affiliate Stigma Scale, in comparison with three other instruments, among caregivers of relatives with mental illness. It was found that the three domains of the Affiliate Stigma Scale can be used separately and can be considered suitable for their purpose.
Consent to use data is a classic requirement for research, but it is a challenging issue in an era of increasing opportunities for record linkage. Cruise et al. investigated variation in consent to health record linkage, based on data from a large UK general-purpose household survey. Most respondents consented to record linkage, and there was little evidence of systematic variation across demographic and socioeconomic characteristics. However, consent was lower in non-white ethnic groups, and there was marked variation between countries within the UK. The authors make recommendations for further exploration and improvement.

Participation rates of general practitioners (GPs) in surveys are important for studying aspects of quality of care. The impact of unconditional and conditional financial incentives on the participation of GPs in an online survey about cancer care was evaluated by Young et al. in a randomized trial. It turned out that both unconditional (book voucher mailed with the letter of invitation) and conditional (book voucher mailed on completion of the survey) incentives did better than no incentive, with the former being more effective and the latter more cost-effective. However, as response rates were still very low in all groups, more effective response-aiding strategies should be identified.

It is good to see that the GRADE (Grading of Recommendations Assessment, Development and Evaluation) guidelines [4,5] are being frequently used to evaluate clinical recommendations and guidelines. Nasser et al. assessed the influence of the strength of recommendations included in World Health Organization (WHO) HIV and tuberculosis guidelines on the uptake of those recommendations in the national guidelines of 20 low- and middle-income countries in Africa and Southeast Asia. This investigation showed that the uptake of WHO recommendations in national guidelines is high and associated with the strength of recommendation and the quality of evidence. Given their finding that strong recommendations are more frequently adopted than conditional recommendations, the authors emphasize the importance of ensuring that such recommendations are justified.

The GRADE approach was also used by Elraiyah and co-workers to assess how well the black box warnings issued by the United States Food and Drug Administration (FDA) present and communicate evidence consistently with evidence-based practice. For this purpose, the authors critically appraised the teriparatide black box warning for osteosarcoma based on human and animal studies. It was found that the black box warning was not based on solid evidence. The investigators suggest that black box warnings be supplied with a rating of the evidence and a guide for implementation.

J. Andre Knottnerus
Peter Tugwell
Editors
E-mail address: [email protected] (J.A. Knottnerus)

References

[1] Campbell MK, Elbourne DR, Altman DG, CONSORT Group. CONSORT statement: extension to cluster randomised trials. BMJ 2004;328:702–8.
[2] Campbell MK, Piaggio G, Elbourne DR, Altman DG, CONSORT Group. Consort 2010 statement: extension to cluster randomised trials. BMJ 2012;345:e5661.
[3] Schellings R, Kessels AG, Ter Riet G, Sturmans F, Widdershoven GA, Knottnerus JA. Indications and requirements for the use of prerandomization. J Clin Epidemiol 2009;62:393–9.
[4] Guyatt GH, Oxman AD, Vist GE, Kunz R, Falck-Ytter Y, Alonso-Coello P, et al. GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ 2008;336:924–6.
[5] Guyatt GH, Oxman AD, Schünemann HJ, Tugwell P, Knottnerus A. GRADE guidelines: a new series of articles in the Journal of Clinical Epidemiology. J Clin Epidemiol 2011;64:380–2.
