REVIEWS

Bringing rigour to translational medicine

David W. Howells, Emily S. Sena and Malcolm R. Macleod

Abstract | Translational neuroscience is in the doldrums. The stroke research community was among the first to recognize that the motivations inherent in our system of research can cause investigators to take shortcuts, and can introduce bias and reduce generalizability, all of which leads ultimately to the recurrent failure of apparently useful drug candidates in clinical trials. Here, we review the evidence for these problems in stroke research, where they have been most studied, and in other translational research domains, which seem to be bedevilled by the same issues. We argue that better scientific training and simple changes to the way that we fund, assess and publish research findings could reduce wasted investment, speed drug development, and create a healthier research environment. For ‘phase III’ preclinical studies—that is, those studies that build the final justification for conducting a clinical trial—we argue for a need to apply the same attention to detail, experimental rigour and statistical power in our animal experiments as in the clinical trials themselves.

Howells, D. W. et al. Nat. Rev. Neurol. 10, 37–43 (2014); published online 19 November 2013; doi:10.1038/nrneurol.2013.232

Introduction

Florey Institute of Neuroscience and Mental Health, 245 Burgundy Street, Heidelberg, Vic 3084, Australia (D. W. Howells). Department of Clinical Neurosciences, University of Edinburgh, Western General Hospital, Edinburgh EH4 2XU, UK (E. S. Sena, M. R. Macleod). Correspondence to: D. W. Howells david.howells@unimelb.edu.au

Competing interests: The authors declare no competing interests.

Each year, member countries of the Organization for Economic Cooperation and Development spend, on average, 9.5% of their gross domestic product, or US$3,387 per person, on health care (2011 figures). The figure for the USA is 17.7%, or $8,811 per person.1 Because brain and mind disorders are chronic and debilitating, they account for a disproportionately large share of these costs. In Europe, brain diseases are estimated to account for 35% of all disease burden,2 costing €798 billion in 2010.3 Mood disorders, dementia, psychotic disorders, anxiety disorders, addiction and stroke alone accounted for a total of €516.7 billion in 2010 (€113.4 billion, €105.2 billion, €93.9 billion, €74.4 billion, €65.7 billion and €64.1 billion, respectively).3 These costs are high because many brain diseases are common, and we lack good treatments for the majority of these conditions. Currently, we have no treatments to prevent or cure most forms of dementia.4 The symptoms of psychosis can be reduced in many cases, but not without appreciable adverse effects.5 Also, given that addiction treatments still tend to be discussed in a judicial rather than a clinical context,6 we cannot really claim to be on top of the problem. The situation is not much better for mood disorders. A 2003 US National Institute of Mental Health report stated that “even when they do receive treatment, only slightly more than half of all patients respond well to therapy, defined as experiencing a 50% or greater reduction from baseline symptom severity. If complete symptom remission or restoration of function is the outcome, then the proportion is even lower.”7 In the case of stroke, we have seen substantial advances in risk reduction in recent years. Blood pressure control,

antiplatelet and anticoagulant treatment, and cholesterol lowering have each been shown to reduce the risk of stroke. However, patient adherence to treatment is poor,8 and not all the improvements that are theoretically possible have been realized.9 Thus, despite important advances, stroke is still the second most common cause of death and disability. Approximately 15 million people have a stroke each year, with 6 million dying as a result.10 Around 30% of the 62 million stroke survivors worldwide have marked neurological impairment. We do have one very powerful acute therapy, thrombolysis with tissue plasminogen activator (tPA), but a narrow window of opportunity for therapy and risk of bleeding mean that only a small proportion of patients with stroke benefit from this potent drug.11 Moreover, despite thrombolysis, around half of patients still remain dependent or die.12

In this Review, we explore the barriers to the development and implementation of effective treatments for neurological disorders, focusing predominantly on the stroke field. We highlight the inherent deficiencies and conflicts of interest in our current systems of preclinical and clinical research, and suggest ways in which scientific rigour might be improved.

Drug development—have we lost our way?
The example of stroke indicates that we are making progress in developing new treatments, but from a modern perspective the pace of this progress is frustratingly slow. Admittedly, our expectations may be unrealistic: it took 183 years between Jenner’s demonstration that vaccination could induce immunity to smallpox and the official eradication of the disease in 1979. Surgery advanced only slowly after the introduction of anaesthesia in 1846 and Lister’s description of the virtues of cleanliness in 1867. The first blood bank was not opened until 1937, and the first heart transplant was performed in 1967, only 2 years

NATURE REVIEWS | NEUROLOGY

VOLUME 10  |  JANUARY 2014  |  37 © 2014 Macmillan Publishers Limited. All rights reserved

Key points
■■ The costs of treating brain diseases are high, because many of these conditions are common, and we lack good treatments for the majority
■■ Drug development for these diseases is hampered by a paucity of experimental and reporting rigour and a lack of statistical power, leading to overoptimistic interpretation of the literature
■■ Recognition that we have a problem is the first step towards finding a solution
■■ We have started to provide transparency in experimental design and rigour of reporting in preclinical publications, and we now need to bring similar quantifiable measures to funding decisions
■■ Multicentre, randomized, blinded, powered and appropriately governed preclinical trials conducted by expert consortia offer an opportunity to ensure that only the very best drug candidates reach clinical trial

before NASA landed men on the Moon. Even chemotherapy, which can be traced back to the 1919 discovery that mustard gas suppresses haematopoiesis,13 is still very much a work in progress.14 In this context, the 17 years since the successful National Institute of Neurological Disorders and Stroke (NINDS) trial15 first supported the use of tPA is not such a long time.

While increasing societal pressures may contribute to unrealistic expectations, many in the pharmaceutical industry argue that medicine has already plucked the low-hanging fruit.16 The pharmaceutical industry is undoubtedly in the doldrums: fewer drugs are being approved for use,17 while it is claimed that development costs are increasing steadily.18 Have we really run out of therapeutic targets, or have we just lost our way?

Some have argued eloquently for the former proposition, asserting that the law of diminishing returns explains why we have to work harder for each new drug.16 However, each new year brings further enrichment to our understanding of neurobiology. To optimists, this presents a world of therapeutic opportunity. To pessimists, the complexity of the brain is perhaps a tempting excuse for failure. However, a more plausible and testable explanation for the slowing of drug development is that our systems of scientific discovery, and of drug discovery and testing, have been compromised and we have indeed lost our way.

Deficiencies in study design
The belief that “everything works in animals but nothing works in man” came early to stroke researchers. The Stroke Therapy Academic Industry Roundtable (STAIR) first met in 1999 to discuss the translational roadblock and to proffer solutions.19 From the very beginning, STAIR stressed the importance of key study design factors such as randomization and blinding, and over the years these important meetings have drawn attention to deficiencies in clinical trial design, including inappropriate extrapolation of animal data to the human setting, and a narrow approach to animal modelling that might limit generalizability in the clinic.19–25 By 2006, however, it was clear that despite these efforts, the situation was not improving rapidly. With our colleagues, we demonstrated that the agents selected for clinical trials were not necessarily those that were most effective in animals, and that within the animal data, the more a drug was tested,

the less effective it seemed to be.26 Systematic review and meta-analysis of the published data for individual drugs selected because they were considered good candidates for clinical trial revealed a persistent pattern of failure to report a priori sample size calculations, or to take measures to reduce the risk of bias. Importantly, where comparisons have been made, studies that do consider these issues usually report less benefit.27–35 Analyses in the wake of the failure of the SAINT II randomized controlled trial of the free radical trapping agent NXY‑059 highlighted the potential magnitude of the impact of bias on reported effect size.32

Bias and lack of statistical power
To our knowledge, only one study has used a randomized, blinded and appropriately powered approach to re-examine efficacy in animals of drugs that seemed to have therapeutic potential for stroke, and the results failed to confirm the promise of the published literature.36 These problems are clearly not unique to the study of stroke. The NINDS Facilities of Research Excellence in Spinal Cord Injury (FORE-SCI) programme, which sponsored independent replication studies of novel strategies to treat spinal cord injury, confirmed beneficial effects for only half of the strategies studied.37 In studies of transgenic models of motor neuron disease, it now seems likely that apparent improvements in survival were not mediated by drug effects, but were a manifestation of bias and lack of statistical power.38

Most experimental stroke studies use fewer than 10 experimental animals in each cohort (Figure 1). Statisticians are uneasy about the retrospective analysis of statistical power because, in testing their hypotheses, investigators have usually established de facto whether the study was adequately powered. However, it is reasonable to ask whether, across a field of research, studies are adequately powered to detect the effects that they report. If not, it implies, first, that selective reporting (an excess of significance) could have occurred; second, that the positive predictive value of those studies that report statistically significant effects is reduced; and, last, that a number of studies falsely conclude that an effect is not present (false-negative studies).

In studies modelling stroke, the efficacy of tPA, the only drug that currently works in both animals and humans, is around 25%.33 This is, therefore, a clinically relevant threshold to guide power calculations in experiments testing other drugs.
Along with the average reported SD of 30% for experiments using Sprague Dawley rats (the strain most commonly used in preclinical stroke studies),39 this suggests that for 80% power, the required cohort size is more than 20 animals. If the threshold effect size is inflated by publication bias,40 the corrected target effect size of 16% requires over 50 rats, and if it is further inflated by bias arising from methodological issues (for example, randomization and blinding27–35), then the cohort sizes will need to be yet larger. Variance may be reduced by using rat strains that produce more-consistent infarct sizes, such as the spontaneously hypertensive (SHR) rat (average SD per experiment = 20%; Figure 2, Table 1),39 or by using models (for example, Tamura or photothrombosis) that provide more-consistent vascular lesions. However, if consistency comes at the expense of generalizability (for instance, through use of models with a very small ischaemic penumbra),41,42 these strategies will prove to be false economies. Evidence of publication bias and methodological bias also exists for other disease models, such as experimental autoimmune encephalomyelitis, and should be taken into account in power calculations.43

Figure 1 | Numbers of animals used in experimental stroke studies. Graph shows number of Sprague Dawley rats in each control and treatment cohort in 446 neuroprotection experiments extracted from a database published by O’Collins et al. (2006).26
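The cohort sizes quoted above can be checked with the standard normal-approximation formula for a two-sample comparison of means. The sketch below is our own illustration (the function name and rounding are ours, not the paper's); it reproduces the in-text figures for a 25% effect with 30% SD and for the bias-corrected 16% effect.

```python
from math import ceil
from statistics import NormalDist  # standard library, Python 3.8+


def cohort_size(effect, sd, alpha=0.05, power=0.80):
    """Animals per cohort for a two-sample comparison of mean infarct
    volume, via the normal approximation n = 2*((z_a + z_b) / (d/sd))^2.
    `effect` and `sd` must be on the same scale (here, % of control)."""
    z = NormalDist().inv_cdf
    return ceil(2 * ((z(1 - alpha / 2) + z(power)) / (effect / sd)) ** 2)


# 25% effect (the efficacy of tPA) with SD 30%: just over 20 rats per cohort
print(cohort_size(25, 30))  # -> 23
# Publication-bias-corrected 16% effect: more than 50 rats per cohort
print(cohort_size(16, 30))  # -> 56
```

These are per-cohort numbers; dedicated power software using the t-distribution rather than the normal approximation would add roughly one animal per group.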

Lack of generalizability
The heterogeneity of drug response in humans is compounded by well-known comorbidities and the fragility that accompanies ageing, and we know that these same problems worsen outcomes in our animal models.44 However, only rarely do we choose to study candidate drugs in experimental settings that are important for generalizability in the clinic; instead, the majority of animal experiments are performed in the young healthy male animals most likely to provide a positive signal.45 For stroke, therefore, an additional source of bias is found in the experiments that the stroke community chooses to perform.

The strengths and weaknesses of the models available to stroke researchers and the impact of comorbidities on outcome in experimental animals and humans have been reviewed elsewhere.24,44,46–48 No single experiment—nor, indeed, single experimental model—is sufficient to explore all the variables that might limit a drug’s utility. It is logical to start the process of drug evaluation in a setting that is cost-effective, and in which a signal is easy to detect (for example, immediate therapy after thread occlusion in young, male, normotensive Wistar Kyoto rats). To reduce the risks of failure before proceeding to clinical trial, however, we need to know whether the drug works in a clinically useful time window in aged male and female animals. We would also like to know whether

the drug works if these same animals have hypertension and/or diabetes, or if they have metabolic syndrome or are exposed to cigarette smoke. In addition, does the drug retain its activity when given with other drugs such as anti-clotting agents and tPA, as would be expected in patients with stroke?

We know that we can perform such experiments; for example, we can induce stroke in aged SHR and diabetic (streptozotocin-treated) rats,49 and can combine therapies with tPA.50 Moreover, detection of broadly the same limits to the efficacy of tPA in experimental animals and humans suggests that although each of the models used by the stroke community has different strengths and weaknesses, they are predictive of outcome in humans.33 Finally, a drug need not satisfy all of the safety and efficacy criteria; for example, tPA is contraindicated in hypertensive patients but still has an important place in the clinic.12

Implausible analyses
Some of the publications describing the in vitro and early in vivo experiments that lay the groundwork for translational research present implausible analyses. Often, such publications report a sequence of five or six experiments using independent samples, each testing a different stage or prediction of a single mechanistic hypothesis. If that hypothesis is correct, each experiment may either support this hypothesis (true positive) or appear to refute it (false negative). Over a series of publications, the proportion of true positive studies should reflect the statistical power of those experiments. We might then calculate the probability that a sequence of experiments will all be true positives if the mechanistic hypothesis is true. Assuming that individual studies are powered at 60%, then two experiments would both be true positives in 36% of publications, and four experiments would all be true positives in 13% (roughly one-eighth) of publications. In our experience, the vast majority of publications report experiments that all reach statistical significance.


For 80% of publications describing a sequence of four experiments, to find them all truly positive would require that individual experiments were powered at 95%, which is simply not plausible. Alternatively, more experiments might have been conducted but not reported; again, with power of individual experiments of 60%, seven independent experiments testing aspects of that hypothesis would provide at least four true positives suitable for publication 70% of the time.

Selective analysis and reporting might explain the excess of significant studies seen in the in vivo modelling of neurological disease.51 In the same way that it is not reasonable to conduct multiple statistical tests on data from the same cohort of animals without correcting for type I errors, it is also unreasonable to penalize a series of independent studies testing the same hypothesis in the same publication; journals should encourage the use of alternative statistical approaches that might address this issue, and be more tolerant of manuscripts in which some experiments do not give a statistically significant result. Indeed, journals should be suspicious of manuscripts in which a large number of positive experiments are presented. Even if the hypothesis is correct, such an eventuality is highly unlikely to have occurred without some help. For authors, it may be helpful, before a series of experiments, to specify which experiment is to be considered as the primary test of the hypothesis, or to use tests of significance that consider the totality of evidence presented.

Figure 2 | Infarct volume variability in different strains of rat after MCAo. Graph shows infarct volumes after transient or permanent MCAo for 1.5 h or 2 h in three rat strains: Sprague Dawley, Wistar Kyoto and spontaneously hypertensive. Abbreviations: MCAo, middle cerebral artery occlusion; p, permanent; t, transient.

Table 1 | Cohort sizes required for preclinical drug testing in animal models of stroke

Rat strain | Occlusion time | Cohort size* at 50% / 40% / 30% / 20% / 10% effect size‡
Sprague Dawley | 1.5 h (young) | 43 / 67 / 119 / 268 / 1,062
Sprague Dawley | 2 h (young) | 56 / 87 / 154 / 345 / 1,389
Sprague Dawley | 24 h (young) | 29 / 45 / 80 / 179 / 715
Wistar Kyoto | 2 h (aged) | 12 / 19 / 33 / 74 / 295
Spontaneously hypertensive | 2 h (young) | 2 / 3 / 5 / 11 / 41
Spontaneously hypertensive | 2 h (aged) | 4 / 5 / 9 / 20 / 80

*Calculated to provide a power of 80% and α = 0.05 under different experimental circumstances. ‡Effect size of drug under investigation.
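The probabilities in the preceding discussion follow directly from the binomial distribution. This short sketch (ours, for illustration only) reproduces them under the stated assumption of 60% power per experiment.

```python
from math import comb


def all_positive(power, k):
    """Chance that k independent experiments are all true positives."""
    return power ** k


def at_least(n, k, power):
    """Chance of at least k true positives among n experiments
    (upper tail of a binomial distribution)."""
    return sum(comb(n, i) * power**i * (1 - power)**(n - i)
               for i in range(k, n + 1))


print(round(all_positive(0.60, 2), 2))  # 0.36 -> 36% of publications
print(round(all_positive(0.60, 4), 2))  # 0.13 -> roughly one-eighth
print(round(all_positive(0.95, 4), 2))  # 0.81 -> ~80% requires 95% power
print(round(at_least(7, 4, 0.60), 2))   # 0.71 -> ~70% of the time
```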

Implications
Taken together with evidence of failure to avoid experimental bias27–35 and widespread publication bias,40,51 it seems likely that at best the true effect of many candidate drugs for stroke and other indications is significantly less than suggested by the literature, and that some might be completely inert.36 These biases in the animal literature may result in biologically inert or even harmful substances being taken forward to clinical trials, thus exposing patients to unnecessary risk and wasting scarce research funds. Chalmers and colleagues have estimated that we might waste as much as 85% of our research effort.52

For another estimate of the costs of research failure, we can assess the impact on the market capitalization of pharmaceutical companies when an apparently promising drug fails in clinical trials. The day before publication of the neutral results of the SAINT II trial, AstraZeneca shares were trading at $66.49, but within a few days had fallen to $58.68. Given the number of shares in circulation, we can estimate that the market’s valuation of a positive trial result was at least $12 billion.53

Overall, the evidence suggests that we have indeed lost our way. If this is the case, how do we find our way back to the true path?
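The $12 billion figure can be recovered from the quoted share prices. The sketch below simply inverts the calculation; the implied share count is our inference from the quoted numbers, not a figure reported in the trial analyses.

```python
price_before = 66.49  # share price the day before SAINT II results ($)
price_after = 58.68   # price a few days later ($)
drop_per_share = price_before - price_after  # about $7.81 lost per share

# A fall in market capitalization of at least $12 billion at $7.81 per
# share implies roughly 1.5 billion shares in circulation.
implied_shares = 12e9 / drop_per_share
print(round(implied_shares / 1e9, 2))  # ~1.54 (billion shares)
```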

Improving research conduct
Removing incentives that compromise conduct
The problems that we face are multifaceted, and lasting solutions will require the cooperation of educators, scientists, funders, publishers and the community at large. In many regards, we reward ‘quick and dirty science’, favouring for promotion and funding those who report ‘exciting’ findings in high-impact journals, with less attention paid to the veracity or long-term value of the findings. NINDS-sponsored studies have failed to replicate many exciting findings in spinal cord injury,37 and while genetic association studies reporting novel disease associations have been published in high-impact journals, their subsequent refutation with better-quality data has not received such favourable treatment.54 Conversely, as the biomedical sciences industry grows, we are confronted with a plethora of new journals, which ensure that almost all articles can eventually find a home.

One part of the solution would be to empower editors and reviewers to make publishing decisions on the basis of quantifiable criteria. We should judge novelty according to evidence from systematic search strategies. We should judge the veracity of the data on the basis of demonstrated experimental rigour and evidence of institutional oversight. We should judge the completeness of




the story, and only then—if at all—consider the purely subjective potential for ‘impact’ in the future. Under such a schema, journal impact and citations would have true value, and a scientist’s publication record would facilitate informed decisions on the risks involved in funding and employment. Expensive data-capture schemes to help governments rank institutions and scientists are perhaps missing the point: it may be more cost-effective to fix the way in which data enter the literature than to manipulate existing but compromised data. In a world where electronic rather than paper publishing is becoming the norm, peer review needs to add more value if scientists are to continue to expend their time and effort on the process.

Better publication policies are on their own unlikely to solve the problem. Researchers inhabit a complex world of rewards and challenges relating to granting, publishing and promotion systems. The challenge is to provide incentives that reward the best science and scientists. The quantitative criteria outlined to assess publication quality might contribute to this process if they were also used to assess the quality of prior work in considering funding applications; those scientists whose work had been read and cited most widely and had been replicated by others would have greater access to funding and, thus, opportunity for continued success. Applications from groups with established systems for institutional oversight and monitoring, audit of research quality, and provision for the continuing professional development and appraisal of the research workforce might similarly be favoured.

Practical steps
The beginnings of the broad structural changes outlined above are in place and are gaining momentum. Perhaps most important is the recognition that we have a large problem, which requires a cooperative and multinational solution. In the USA, NINDS has sponsored replication studies and symposia to discuss these issues. Major stroke journals on either side of the Atlantic have carried editorials making the case for bringing the rigour of randomized, controlled, multicentre clinical trialling to the preclinical setting.55,56 Both clinical and basic-science-focused meetings have made the same plea, and the European Union has provided seed funding to develop Multi-PART, the first randomized controlled trial network for preclinical stroke research.

Multicentre animal studies are not at present a routine part of drug development. In the early 1980s, the US National Heart, Lung and Blood Institute was concerned about inconsistencies in findings from different laboratories that were testing the same hypotheses using very similar experimental designs. To address this issue, they funded a study, Animal Models for the Protection of Ischemic Myocardium, which involved five groups testing the efficacy of ibuprofen and verapamil in dogs (four centres) and rats (one centre).57 As well as demonstrating the feasibility of this approach, the study prompted concerns about differences in the data from different centres; the detailed statistical analysis that followed revealed that an investigator in one centre had been fabricating data.58

Since multicentre animal studies were proposed for focal ischaemia,59 other disease communities have advocated a similar approach. Operation Brain Trauma Therapy describes a network of laboratories that would systematically collect evidence to support a decision to move to clinical trial. However, while this consortium advocates central coordination of their research effort, they do not propose that the same experiment should be conducted across a number of sites, or that opportunities for central randomization, data monitoring and statistical analysis should be leveraged.60

More recently, the International League Against Epilepsy/American Epilepsy Society (ILAE/AES) Working Groups joint meeting to optimize preclinical epilepsy therapy discovery suggested that translational efforts in epilepsy might be enhanced by adopting what they describe as a phase II multicentre trial paradigm employing an approach similar to human multicentre clinical trials.61 They argue that this approach “would reduce biases related to individual laboratory practices and conditions, implement rigorous blinding and statistical design, and incorporate independent monitoring of data collection and analysis.” In many ways, this sentiment echoes the efforts of the stroke community in establishing Multi-PART. However, the ILAE/AES group claims that there may be circumstances where sufficient evidence is available from single-centre studies to justify a decision to proceed to clinical trial without a confirmatory multicentre animal study. Their use of the term ‘double-blinded’ is also unclear, as it implies that steps would be taken to ensure that the animal subjects were not aware of treatment group allocation. Knowledge of treatment group allocation on the part of the investigator might affect how the animal is handled, thereby possibly affecting corticosterone levels and, thus, animal behaviour, but it would be more precise to describe this as blinding of the investigator to treatment group allocation rather than implying that the animals were in some way blinded.

Conclusions and future prospects
Preclinical and translational journals have embraced the principles of rigour and transparency by adopting publication guidelines62–66 similar to those that allowed clinical journals to improve the rigour of clinical trials after publication of the CONSORT statement in 1996.67,68 The benefits of this development, in terms of the quality of published research, should start to become apparent in the near future.

The presence of a substantial publication bias must be addressed. Less-exciting findings might be left on the investigator’s hard drive, published in journals that are not readily available to the community, or published in languages other than English. To know which research has been done, repositories of work that has been funded or received ethical approval should be established, and deposition in these repositories should be a precondition of funding, ethical approval and publication.

Investigators might measure a number of outcomes and report only the most interesting or those that


REVIEWS support their hypothesis; and statistical analysis plans might be adjusted after the event in light of the data. Both of these practices will introduce bias, and could be avoided by prior publication (or deposition) of the study protocol. At first, this information might only be available to those reviewing manuscripts for publication, with a sunrise clause that after a given period, or on pub­ lication of study results, it becomes more widely avail­ able. Publication of all outcomes will inevitable mean that those manuscripts contain some findings that do not unequivocally support the study hypothesis; jour­ nals and reviewers should be tolerant of these blemishes, and consider the merits and interpretation of a body of work as a whole. The adoption of meaningful sample size calculations, and the avoidance of models with low variance but restricted generalizability, will mean that more animals need to be used in each experiment. Funders and ethics committees must understand that it is better to fund a large study that gives useful and reliable results than to fund any number of smaller studies that might appear to provide a reduction in animal numbers but in fact provide data of limited use. We believe that multicentre animal studies might provide the capacity required for such studies. The experience of attempted replication suggests that a systematic approach to confirming findings is as important as the initial research. There should be funding and recognition for those carrying out such studies. In areas where translational failure is apparent or where clinical trials are based on findings from animal studies, replication, in adequately powered studies at low risk of bias, should be a precondition for further clinical development. The Nature and Science journals have taken impor­ tant steps to improve the rigour of the work that they publish.69,70 Risk of bias might be reduced further if all journals publishing in vivo research were to take action 1.

2.

3.

4.

5.

6.

7.

8.

OECD Health Data 2012. OECD [online], http:// stats.oecd.org/index.aspx?DataSetCode= HEALTH_STAT (2012). Olesen, J. & Leonardi, M. The burden of brain diseases in Europe. Eur. J. Neurol. 10, 471–477 (2003). Olesen, J. et al. The economic cost of brain disorders in Europe. Eur. J. Neurol. 19, 155–162 (2012). Rafii, M. S. & Aisen, P. S. Recent developments in Alzheimer’s disease therapeutics. BMC Med. 7, 7 (2009). McDonagh, M., Peterson, K., Carson, S., Fu, R. & Thakurta, S. Drug Class Review: Atypical Antipsychotic Drugs: Final Update 3 Report (Oregon Health and Science University, 2010). National Institute on Drug Abuse. Principles of Drug Addiction Treatment: A Research-Based Guide 19 (NIH Publication No. 12-4180, 2012). National Institute of Mental Health. Breaking Ground, Breaking Through: The Strategic Plan for Mood Disorders Research 85 (NIH Publication No. 03-5121, 2003). Wang, Y. R., Alexander, G. C. & Stafford, R. S. Outpatient hypertension treatment, treatment intensification, and control in Western Europe

42  |  JANUARY 2014  |  VOLUME 10

9.

10.

11.

12.

13.

14.

to reach the same standards. The next challenge will be to ensure that the measures taken to reduce the risk of bias within the laboratory were appropriate. For instance, coin-toss randomization is much less rigorous than the use of computerized randomization systems. In most professions, there is an expectation of continu­ ing development, such that individuals do not come to a role fully formed. Scientists are by nature ‘improvers’, but there is a dearth of organized activities to aid their career progression. Such activities might include opportuni­ ties for developing new skills in relevant areas, time set aside for audit and quality control activities, and oppor­ tunities for mentoring and reflection at all levels. Other professions have instituted formal programmes for con­ tinuing professional development; while some research establishments are developing similar programmes for their workforce, the focus is still largely on grants in and papers out. Many of the concerns highlighted above have arisen through a combination of translational research and empirical research that has explored the epistemology of translational medicine: how we know what we know. There is an ongoing need to enrich our knowledge of what makes for effective translation, including the risk of bias and how to address it, and to study the impact of the changes suggested here and elsewhere. This effort will require research, an appropriately trained research workforce, and appropriate funding. Review criteria We selected articles for inclusion in this Review on the basis of our existing knowledge. We did not employ a formal search strategy but, rather, selected articles that illustrate the points we thought relevant to the subject matter being discussed. While we have attempted to provide a fair summary of the relevant literature, there is likely to be an unconscious bias in favour of publications that support our point of view.

8. [...] and the United States. Arch. Intern. Med. 167, 141–147 (2007).
9. Weber, M. A. & Giles, T. D. Inhibiting the renin–angiotensin system to prevent cardiovascular diseases: do we need a more comprehensive strategy? Rev. Cardiovasc. Med. 7, 45–54 (2006).
10. Strong, K., Mathers, C. & Bonita, R. Preventing stroke: saving lives around the world. Lancet Neurol. 6, 182–187 (2007).
11. Donnan, G. A. et al. How to make better use of thrombolytic therapy in acute ischemic stroke. Nat. Rev. Neurol. 7, 400–409 (2011).
12. Lees, K. R. et al. Time to treatment with intravenous alteplase and outcome in stroke: an updated pooled analysis of ECASS, ATLANTIS, NINDS, and EPITHET trials. Lancet 375, 1695–1703 (2010).
13. Krumbhaar, E. B. & Krumbhaar, H. D. The blood and bone marrow in Yellow Cross gas (mustard gas) poisoning: changes produced in the bone marrow of fatal cases. J. Med. Res. 40, 497–508.3 (1919).
14. Haddad, R. et al. Induction chemotherapy followed by concurrent chemoradiotherapy (sequential chemoradiotherapy) versus concurrent chemoradiotherapy alone in locally advanced head and neck cancer (PARADIGM): a randomised phase 3 trial. Lancet Oncol. 14, 257–264 (2013).
15. [No authors listed] Tissue plasminogen activator for acute ischemic stroke. The National Institute of Neurological Disorders and Stroke rt-PA Stroke Study Group. N. Engl. J. Med. 333, 1581–1587 (1995).
16. Stockwell, B. R. The Quest for the Cure: The Science and Stories Behind the Next Generation of Medicines (Columbia University Press, 2011).
17. Kling, J. Fresh from the biologic pipeline—2010. Nat. Biotechnol. 29, 197–200 (2011).
18. Morgan, S., Grootendorst, P., Lexchin, J., Cunningham, C. & Greyson, D. The cost of drug development: a systematic review. Health Policy 100, 4–17 (2011).
19. Stroke Therapy Academic Industry Roundtable (STAIR). Recommendations for standards regarding preclinical neuroprotective and restorative drug development. Stroke 30, 2752–2758 (1999).
20. Stroke Therapy Academic Industry Roundtable II (STAIR-II). Recommendations for clinical trial evaluation of acute stroke therapies. Stroke 32, 1598–1606 (2001).


21. Fisher, M. & Stroke Therapy Academic Industry Roundtable. Recommendations for advancing development of acute stroke therapies: Stroke Therapy Academic Industry Roundtable 3. Stroke 34, 1539–1546 (2003).
22. Fisher, M. et al. Enhancing the development and approval of acute stroke therapies: Stroke Therapy Academic Industry Roundtable. Stroke 36, 1808–1813 (2005).
23. Fisher, M. et al. Recommendations from the STAIR V meeting on acute stroke trials, technology and outcomes. Stroke 38, 245–248 (2007).
24. Fisher, M. et al. Update of the Stroke Therapy Academic Industry Roundtable preclinical recommendations. Stroke 40, 2244–2250 (2009).
25. Saver, J. L. et al. Stroke Therapy Academic Industry Roundtable (STAIR) recommendations for extended window acute stroke therapy trials. Stroke 40, 2594–2600 (2009).
26. O'Collins, V. E. et al. 1,026 experimental treatments in acute stroke. Ann. Neurol. 59, 467–477 (2006).
27. Macleod, M. R., O'Collins, T., Howells, D. W. & Donnan, G. A. Pooling of animal experimental data reveals influence of study design and publication bias. Stroke 35, 1203–1208 (2004).
28. Macleod, M. R., O'Collins, T., Horky, L. L., Howells, D. W. & Donnan, G. A. Systematic review and meta-analysis of the efficacy of melatonin in experimental stroke. J. Pineal Res. 38, 35–41 (2005).
29. Macleod, M. R., O'Collins, T., Horky, L. L., Howells, D. W. & Donnan, G. A. Systematic review and meta-analysis of the efficacy of FK506 in experimental stroke. J. Cereb. Blood Flow Metab. 25, 713–721 (2005).
30. Macleod, M. R. et al. Sources of bias in animal models of neurological disease. J. Neurol. Neurosurg. Psychiatry 77, 135–136 (2006).
31. van der Worp, H. B., Sena, E. S., Donnan, G. A., Howells, D. W. & Macleod, M. R. Hypothermia in animal models of acute ischaemic stroke: a systematic review and meta-analysis. Brain 130, 3063–3074 (2007).
32. Macleod, M. R. et al. Evidence for the efficacy of NXY‑059 in experimental focal cerebral ischaemia is confounded by study quality. Stroke 39, 2824–2829 (2008).
33. Sena, E. S. et al. Factors affecting the apparent efficacy and safety of tissue plasminogen activator in thrombotic occlusion models of stroke: systematic review and meta-analysis. J. Cereb. Blood Flow Metab. 30, 1905–1913 (2010).
34. Jerndal, M. et al. A systematic review and meta-analysis of erythropoietin in experimental stroke. J. Cereb. Blood Flow Metab. 30, 961–968 (2010).
35. Janssen, H. et al. An enriched environment improves sensorimotor function post-ischemic stroke. Neurorehabil. Neural Repair 24, 802–813 (2010).
36. O'Collins, V. E. et al. Preclinical drug evaluation for combination therapy in acute stroke using systematic review, meta-analysis, and subsequent experimental testing. J. Cereb. Blood Flow Metab. 31, 962–975 (2011).
37. Steward, O., Popovich, P. G., Dietrich, W. D. & Kleitman, N. Replication and reproducibility in spinal cord injury research. Exp. Neurol. 233, 597–605 (2012).
38. Scott, S. et al. Design, power, and interpretation of studies in the standard murine model of ALS. Amyotroph. Lateral Scler. 9, 4–15 (2008).
39. O'Collins, V., Donnan, G., Macleod, M. & Howells, D. in Animal Models for the Study of Human Disease (ed. Conn, M.) 532–569 (Academic Press, 2013).
40. Sena, E. S., van der Worp, H. B., Bath, P. M., Howells, D. W. & Macleod, M. R. Publication bias in reports of animal stroke studies leads to major overstatement of efficacy. PLoS Biol. 8, e1000344 (2010).
41. McCabe, C. et al. Differences in the evolution of the ischemic penumbra in stroke-prone spontaneously hypertensive and Wistar-Kyoto rats. Stroke 40, 3864–3868 (2009).
42. Letourneur, A., Roussel, S., Toutain, J., Bernaudin, M. & Touzani, O. Impact of genetic and renovascular chronic arterial hypertension on the acute spatiotemporal evolution of the ischemic penumbra: a sequential study with MRI in the rat. J. Cereb. Blood Flow Metab. 31, 504–513 (2011).
43. Vesterinen, H. M. et al. Improving the translational hit of experimental treatments in multiple sclerosis. Mult. Scler. 16, 1044–1055 (2010).
44. Ankolekar, S., Rewell, S., Howells, D. W. & Bath, P. M. The influence of stroke risk factors and comorbidities on assessment of stroke therapies in humans and animals. Int. J. Stroke 7, 386–397 (2012).
45. O'Collins, V., Donnan, G., Macleod, M. & Howells, D. Hypertension and experimental stroke therapies. J. Cereb. Blood Flow Metab. 33, 1141–1147 (2013).
46. Howells, D. W. et al. Different strokes for different folks: the rich diversity of animal models of focal cerebral ischemia. J. Cereb. Blood Flow Metab. 30, 1412–1431 (2010).
47. Wang-Fischer, Y. (Ed.) Manual of Stroke Models in Rats (CRC Press, 2008).
48. Stroke Therapy Academic Industry Roundtable (STAIR). Recommendations for standards regarding preclinical neuroprotective and restorative drug development. Stroke 30, 2752–2758 (1999).
49. Rewell, S. S. et al. Inducing stroke in aged, hypertensive, diabetic rats. J. Cereb. Blood Flow Metab. 30, 729–733 (2010).
50. Zhu, H. et al. Annexin A2 combined with low-dose tPA improves thrombolytic therapy in a rat model of focal embolic stroke. J. Cereb. Blood Flow Metab. 30, 1137–1146 (2010).
51. Tsilidis, K. et al. Evaluation of excess significance bias in animal studies of neurological diseases. PLoS Biol. 11, e1001609 (2013).
52. Chalmers, I. & Glasziou, P. Avoidable waste in the production and reporting of research evidence. Lancet 374, 86–89 (2009).
53. Howells, D. W., Sena, E. S., O'Collins, V. & Macleod, M. R. Improving the efficiency of the development of drugs for stroke. Int. J. Stroke 7, 371–377 (2012).
54. Ioannidis, J. P., Ntzani, E. E., Trikalinos, T. A. & Contopoulos-Ioannidis, D. G. Replication validity of genetic association studies. Nat. Genet. 29, 306–309 (2001).
55. Dirnagl, U. & Fisher, M. REPRINT: International, multicenter randomized preclinical trials in translational stroke research: it is time to act. Stroke 43, 1453–1454 (2012).
56. Dirnagl, U. & Fisher, M. International, multicenter randomized preclinical trials in translational stroke research: it's time to act. J. Cereb. Blood Flow Metab. 32, 933–935 (2012).
57. Reimer, K. A. et al. Animal models for protecting ischemic myocardium: results of the NHLBI Cooperative Study. Comparison of unconscious and conscious dog models. Circ. Res. 56, 651–665 (1985).
58. Bailey, K. R. Detecting fabrication of data in a multicenter collaborative animal study. Control. Clin. Trials 12, 741–752 (1991).
59. Bath, P. M., Macleod, M. R. & Green, A. R. Emulating multicentre clinical stroke trials: a new paradigm for studying novel interventions in experimental models of stroke. Int. J. Stroke 4, 471–479 (2009).
60. Kochanek, P. M. et al. A novel multicenter preclinical drug screening and biomarker consortium for experimental traumatic brain injury: operation brain trauma therapy. J. Trauma 71 (Suppl.), S15–S24 (2011).
61. O'Brien, T. J. et al. Proposal for a "phase II" multicenter trial model for preclinical new antiepilepsy therapy development. Epilepsia 54 (Suppl. 4), 70–74 (2013).
62. Macleod, M. R. et al. Good laboratory practice: preventing introduction of bias at the bench. Stroke 40, e50–e52 (2009).
63. Macleod, M. R. et al. Reprint: Good laboratory practice: preventing introduction of bias at the bench. J. Cereb. Blood Flow Metab. 29, 221–223 (2009).
64. Macleod, M. R. et al. Reprint: Good laboratory practice: preventing introduction of bias at the bench. Int. J. Stroke 4, 3–5 (2009).
65. Kilkenny, C., Browne, W. J., Cuthill, I. C., Emerson, M. & Altman, D. G. Improving bioscience research reporting: the ARRIVE guidelines for reporting animal research. PLoS Biol. 8, e1000412 (2010).
66. Landis, S. C. et al. A call for transparent reporting to optimize the predictive value of preclinical research. Nature 490, 187–191 (2012).
67. Begg, C. et al. Improving the quality of reporting of randomized controlled trials. The CONSORT statement. JAMA 276, 637–639 (1996).
68. Schulz, K. F., Altman, D. G., Moher, D. & the CONSORT Group. CONSORT 2010 statement: updated guidelines for reporting parallel group randomized trials. Ann. Intern. Med. 152, 726–732 (2010).
69. [No authors listed] Announcement: reducing our irreducibility. Nature 496, 398 (2013).
70. Kelner, K. Medicine: playing our part. Sci. Transl. Med. 5, 190 (2013).

Author contributions All three authors researched the data for the article, provided a substantial contribution to discussions of the content, wrote the article, and reviewed and/or edited the manuscript before submission.

