JCM Accepted Manuscript Posted Online 24 February 2016 J. Clin. Microbiol. doi:10.1128/JCM.02664-15 Copyright © 2016, American Society for Microbiology. All Rights Reserved.
Role of Clinicogenomics in Infectious Disease Diagnostics and Public Health Microbiology
Lars F. Westblade, Ph.D.1, Alex van Belkum, Ph.D.2, Adam Grundhoff, Ph.D.3,4, George M. Weinstock, Ph.D.5, Eric G. Pamer, M.D.6, Mark J. Pallen, M.D.7, Wm. Michael Dunne, Jr., Ph.D.8#
Department of Pathology and Laboratory Medicine, Weill Cornell Medical College, New York, NY, USA1; bioMérieux, Inc., La Balme, France2; Heinrich Pette Institute, Leibniz Institute for Experimental Virology, Hamburg, Germany3; German Center for Infection Research, Partner Site Hamburg-Lübeck-Borstel, Hamburg, Germany4; The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA5; Memorial Sloan-Kettering Cancer Center, New York, NY, USA6; Warwick Medical School, University of Warwick, Coventry, UK7; bioMérieux, Inc., Durham, NC, USA8
To whom correspondence should be addressed:
Wm. Michael Dunne, Jr., Ph.D.
100 Rodolphe Street,
Durham, NC, 27712, USA
E-mail: [email protected]
Key words: Clinicogenomics, Next-Generation Sequencing, Infectious Disease Diagnostics, Public Health
Abstract
Clinicogenomics is the exploitation of genome sequence data for diagnostic, therapeutic, and public health purposes. Central to this field is the high-throughput DNA sequencing of genomes and metagenomes. The role of clinicogenomics in infectious disease diagnostics and public health microbiology was the topic of discussion during a recent symposium (session 161) presented at the 115th general meeting of the American Society for Microbiology held this past Spring in New Orleans, LA, USA. What follows is a collection of the most salient and promising aspects from each presentation at the symposium.
Introduction
The explosion of microbiome research is driven by high-throughput DNA sequencing (so-called “next-generation sequencing”, or NGS) technologies that allow the genomic content of entire microbial communities (bacterial, viral, and eukaryotic organisms) to be described. Although much of this work is aimed at describing the structure of “commensal” communities, the methodology works equally well to identify pathogens in clinical samples. The key concept in using NGS methodology is that detection of microbes is independent of culture and is not limited to targets used for polymerase chain reaction (PCR) assays. Rather, it is a process of generating large-scale sequence data sets that adequately sample a specimen for microbial content and then applying computational methods to resolve the sequences into individual species, genes, pathways, or other features.
Most microbiome analyses have focused on describing bacterial content, and this is usually performed by sequencing the 16S rRNA gene. PCR primers with degenerate sequences are used to amplify all or part of the 16S rRNA gene from a broad range of species in the sample. The mix of amplicons generated from different organisms in the community is then sequenced, and the abundance of each species is determined by the number of sequences found for its respective 16S rRNA gene. Although this is useful for defining communities, it also affords the identification of pathogens with unique 16S rRNA sequences.
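The counting step described above can be sketched in a few lines of Python. This is a minimal illustration, assuming each read has already been assigned a taxon label by some upstream classifier; the function name `relative_abundance` and the example labels are hypothetical and not taken from the symposium material.

```python
from collections import Counter


def relative_abundance(read_assignments):
    """Return {taxon: fraction of reads} from a list of per-read taxon labels."""
    counts = Counter(read_assignments)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical per-read classifications (illustrative only, not real data).
reads = ["Bacteroides"] * 6 + ["Prevotella"] * 3 + ["Clostridium"] * 1
profile = relative_abundance(reads)

# Print taxa from most to least abundant.
for taxon, fraction in sorted(profile.items(), key=lambda kv: -kv[1]):
    print(f"{taxon}: {fraction:.1%}")
```

Note that read fractions are only a proxy for organism abundance, because the number of rRNA gene copies per genome differs among bacterial species.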
The sensitivity and specificity of this method are determined in large part by the NGS technology. Before NGS, the full-length 16S rRNA gene was sequenced with the high-quality, 700-base-long reads of Sanger, or chain termination, sequencing (sometimes referred to as “first-generation” sequencing technology). This was laborious and expensive, and deep sampling was not possible. When NGS became available, most work was done on the FLX sequencing instrument (a second-generation sequencing technology) from 454 Life Sciences (Roche Diagnostics, Indianapolis, IN). This permitted only 400-base-long sequencing reads, so only a portion of the 16S rRNA gene was sequenced. The 16S rRNA gene has nine hypervariable regions that provide much of the specificity in species identification. With 454 sequencing, typically only three of these regions could be sequenced. Nevertheless, this allowed detection to the genus level of most taxa. This methodology can correctly identify pathogens in stool samples from patients with diarrhea, as compared to culture results (GW unpublished results). In addition, in 15% of the samples this NGS approach identified an additional pathogen that was not reported by the diagnostic laboratory.
Recently, 16S rRNA gene sequencing has moved to the MiSeq and HiSeq sequencing instruments from Illumina (San Diego, CA). This is in part due to the closing of the 454 Life Sciences company and the higher data production and lower cost of the Illumina instruments. These instruments produce shorter reads (100-300 bases) and thus further limit the amount of the 16S rRNA gene that can be sampled, often to a single hypervariable region. However, organism identification is possible as a result of shotgun sequencing of several hypervariable regions.
A new alternative to Illumina has been developed using the Pacific Biosciences RSII sequencing platform, which is often referred to as a third-generation sequencing technology (PacBio, Menlo Park, CA). With PacBio sequencing, much longer sequence reads are possible and full-length 16S rRNA gene sequencing can now be accomplished at higher data output, lower cost, and much greater convenience than was possible with Sanger sequencing. This methodology is still more expensive than Illumina’s platform but bodes well for continued improvement in the use of 16S rRNA gene sequencing for microbiome analysis.
The alternative to focusing on the 16S rRNA gene for microbiome analysis is shotgun sequencing of the sample so that all parts of the genome are sequenced. Whereas 16S rRNA gene sequencing chiefly profiles bacteria, shotgun sequencing is agnostic, and archaea, viruses, and eukaryotic microbes are also sampled. This is often referred to as metagenomic shotgun sequencing since all genomes (the metagenome) are sequenced. This approach requires many more sequencing reads than 16S rRNA gene sequencing to adequately sample the genomes, and thus only the sequencing platforms that produce the most data are used (Illumina HiSeq and NextSeq instruments). This methodology is significantly more expensive than 16S rRNA gene sequencing, which has also limited its use. But metagenomic shotgun sequencing also allows for antibiotic resistance genes to be detected, as well as virulence factors and other features that could help distinguish a pathogen at the strain level from other non-pathogenic members of a species. Shotgun sequencing is also used for analysis of RNA, either to identify RNA viruses or for transcriptional analysis. In this case, complementary DNA is generated and then NGS is performed. Metagenomic transcriptional analysis is particularly noteworthy as this method determines which organisms are actively growing and/or whether a gene of interest (e.g., an antibiotic resistance determinant) is expressed and thus contributes to the organism’s phenotype.
Although use of metagenomic shotgun sequencing is limited by the output and cost required, trends in DNA sequencing technology continue to emphasize instruments that are smaller, faster, and lower in cost. The MinION™ instrument from Oxford Nanopore Technologies (Oxford, UK) is a handheld sequencing instrument, and although it is still in the development phase, it has been used to sequence bacterial and viral samples (1,2). Thus one can expect continued development in this area and more frequent use of these methods in routine diagnostic microbiology in the future.
Unbiased Infectious Disease Diagnostics
Conventional diagnostic methods such as PCR, serology, or microbial culture have been validated and standardized over decades, and continue to represent the gold standard for infectious disease diagnostics. However, while generally cost-effective and robust, these methods share a common limitation: they represent targeted detection approaches and require an accurate initial hypothesis as to the type of pathogen(s) that may be present in a sample of interest. Their narrow scope, especially for PCR- and serology-based methods, is likely one of the reasons why conventional diagnostic tests fail to detect a causative agent in a significant number of cases (3-5). Recently established mass spectrometry-based approaches are less biased, but in most cases still require culture of the infectious agent, thus precluding identification of viruses or other pathogens that are difficult to grow in culture. In contrast, with the advent of NGS technologies it is now possible to perform direct sequencing of DNA or RNA isolated from primary diagnostic material. Hence, metagenomic shotgun sequencing has the potential to fundamentally improve infectious disease diagnostics by allowing broad-range detection of bacterial, viral, fungal, or parasitic agents in a single assay (Figure 1) (6-10). Moreover, it opens the exciting possibility of detecting pathogen sequences with only distant homology to existing database entries, or even of identifying entirely novel infectious agents.
In recent years, the steadily decreasing cost of NGS infrastructure and reagents, as well as the development of increasingly simplified library preparation workflows, has made the establishment of NGS platforms in clinical labs technically feasible. However, a number of challenges still hinder the widespread use of this technique in infectious disease diagnostics. One of the most fundamental requirements is the development of analysis software that is streamlined towards the needs of diagnostic laboratories. Although a number of open-source analysis pipelines for NGS-based pathogen detection are available, their use often requires a significant degree of bioinformatic expertise that is typically not available in clinical laboratories. To facilitate clinically actionable diagnostics, appropriate software solutions must also strike a reasonable balance between analytical depth and processing time, and deliver results within hours rather than days (or even weeks). Furthermore, whereas samples subject to truly hypothesis-free clinical diagnostics will require pathogen identification across all taxa, the majority of existing pipelines are designed with an emphasis on either viral or bacterial sequences. Currently available commercial software solutions are likewise limited to the analysis of amplicon sequencing of conserved bacterial genes (e.g., the 16S rRNA gene) and therefore are generally unable to detect viral, fungal, or parasitic agents. One of the few publicly available pipelines that has been specifically designed for use in clinical diagnostics is SURPI, a platform for the unbiased detection of infectious agents in shotgun sequencing data that has been used to identify viral or bacterial agents in primary diagnostic material (11-13). Clearly, further refinement of this and other pipelines, preferably with a graphical user interface that facilitates interpretation by non-informatics personnel, will be a pivotal requirement for the future implementation of NGS in infectious disease diagnostics.
At present, there is also a profound lack of harmonization and universally recognized standards for NGS-based microbial diagnostics, a fact which is not surprising given that NGS is still a relatively young technique. While a number of studies have proven the technique’s ability to identify diverse pathogens directly from clinical material, and in some instances in a clinically actionable timeframe (11-16), substantially more empirical data will have to be collected to address a number of open questions. For example, given that shotgun sequencing usually recovers only snippets of genomic information rather than whole genomes, what are the requirements to call the presence of a specific infectious agent to a given taxonomic level? Since it is often not possible to unequivocally assign fragments to a single species, and since current second-generation high-throughput DNA sequencers utilize PCR amplification and thus can deliver only relative rather than absolute abundance values, how should one arrive at a reasonably meaningful abundance estimation for individual infectious agents? How should one deal with potential contaminants, especially those nucleic acids that are frequently introduced via library preparation kits (17)? Considering that not only the choice of the sequencing platform, but also library preparation methods and sample matrix composition can have a dramatic impact on the ability to recover infectious agent sequences, what are the read depths at which different diagnostic sample entities should be sequenced, and what are the limits of detection that should be expected for individual pathogens? Resolving these questions and other issues will not only take time, but also require a significant number of systematic multi-center studies with large sample cohorts. Establishment of novel databases that are rigorously annotated and provide either primary read or assembled contig sequences together with clinical metadata would also be an invaluable resource, as they would greatly facilitate the identification of ‘unusual’ sequence signatures that could indicate the presence of putative pathogens, even if such sequences do not exhibit any recognizable homology to taxonomically classified infectious agents.
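One way to reason about the read-depth and limit-of-detection questions raised above is a simple sampling model. The sketch below assumes, as a simplification that is not stated in the source, that reads are drawn independently, so that the number of pathogen-derived reads is approximately Poisson distributed with mean equal to the read depth multiplied by the pathogen's fraction of the nucleic acid in the library. The function names, the calling threshold `min_reads`, and the example values are all hypothetical.

```python
import math


def expected_pathogen_reads(total_reads, pathogen_fraction):
    """Mean number of pathogen reads under the assumed sampling model."""
    return total_reads * pathogen_fraction


def detection_probability(total_reads, pathogen_fraction, min_reads=1):
    """P(at least `min_reads` pathogen reads), assuming Poisson-distributed counts."""
    lam = expected_pathogen_reads(total_reads, pathogen_fraction)
    # P(X >= k) = 1 - sum over i < k of exp(-lam) * lam**i / i!
    p_below = sum(
        math.exp(-lam) * lam**i / math.factorial(i) for i in range(min_reads)
    )
    return 1.0 - p_below


# Illustrative values only: one million reads, pathogen at one part per million.
for threshold in (1, 3):
    p = detection_probability(1_000_000, 1e-6, min_reads=threshold)
    print(f"P(at least {threshold} pathogen read(s)) = {p:.3f}")
```

Real specimens depart from this idealized model (for example through extraction bias, amplification bias, and kit contaminants), so such numbers can only guide, not replace, the empirical studies called for above.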
Given the number of issues that still need to be addressed, conventional methods for routine diagnostics are unlikely to be completely replaced by unbiased NGS anytime soon. For the investigation of challenging clinical cases or outbreak samples, however, it has already become an invaluable complement to conventional tests. In view of its tremendous potential and the rapid technological developments, including steadily increasing throughput of second-generation sequencers and the availability of the first third-generation sequencing units that are small enough to be taken into the field (1), it is clear that unbiased NGS will become an essential instrument in the toolbox of clinical infectious disease diagnostics.
Antimicrobial Susceptibility Testing Using Next-Generation Methods
Over the past century, antimicrobial susceptibility testing (AST) has been dominated by phenotypic approaches. Assays are largely based on the detection of microbial growth. These strategies utilize solid or liquid culture media in which the concentration of antimicrobial agent is adjusted to permit definition of minimum bactericidal or bacteriostatic (collectively, inhibitory) concentrations. Formats for such measurements include agar dilution, broth microdilution (BMD), antibiotic gradient diffusion, selective chromogenic media, and ultimately, automated systems such as the Beckman Coulter MicroScan Walkaway™ (Brea, CA, US), the Becton, Dickinson and Company Phoenix™ (Sparks, MD, US), and the bioMérieux VITEK®2 (Marcy l'Étoile, France).
Recently, new approaches have been adapted to growth-based AST technology, and most deal with innovative means of distinguishing growing from inhibited or dead microorganisms. These include the use of microfluidics (nanodrop BMD), mass spectrometry (including MALDI-TOF), cantilever technology, micro-calorimetrics, nuclear magnetic resonance and magnetic bead rotation, real-time microscopy, and intrinsic fluorescence, to name a few (for a recent review see reference 18). All these approaches are promising and beyond the proof-of-principle stage, but none has entered the current in vitro diagnostic market.
Whether nucleic acid-based methods can serve as a proxy for growth-based AST methods has yet to be thoroughly vetted for many clinically relevant species (19). These methods excel in resistance gene detection, but equating a resistance gene to an actual minimum inhibitory concentration value is still a work in progress. This may change as high-throughput genomics, including NGS and transcriptomics, becomes increasingly accessible, with transcriptomic analysis of stress marker expression (e.g., the SOS response) potentially offering an opportunity to relate molecular AST to phenotypic susceptibility data (20).
Recent studies exploring the potential value of NGS for AST have shown that associations between phenotypic resistance profiles (antibiograms) and genotypic resistance predicted from whole genome sequencing (WGS) data can be accurately defined. Using genome sequence information, an inventory of all known antibiotic resistance determinants, including mutations within protein-coding and noncoding regions (e.g., regulatory elements), can be obtained (21). This generates a global view of the bacterial “resistome” that can be used to assess the presence or absence of such genes and mutations in de novo microbial genome sequences. When the Staphylococcus aureus resistome was compared to a comprehensive reference antibiogram for a development set of >1,000 strains and an equally sized validation set, the documented percentages of major errors (ME: predicted to be resistant but phenotypically susceptible) and very major errors (VME: predicted to be susceptible but phenotypically resistant) associated with genotypic antibiotic resistance prediction were 0.2% and 1.1%, respectively (unpublished data). This is in the same range as, or better than, that demonstrated for commercial AST systems. Additional studies have demonstrated the applicability of this approach for other organisms, but for species that are genetically more heterogeneous than S. aureus, the levels of ME and VME were higher (22). At present, from a routine laboratory workflow and regulatory standpoint, automated AST systems are better suited for clinical diagnostics; however, with ever-decreasing overheads and further maturation of resistome databases, WGS AST may become increasingly competitive and prominent in the clinical management of patients (23). In addition, these approaches could promote the discovery and characterization of new and emerging antibiotic resistance mechanisms, which will improve the reliability of WGS AST and could stimulate the discovery of novel antibiotics.
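The ME and VME definitions given above translate directly into a small calculation. The sketch below is illustrative only: it assumes the conventional denominators (ME relative to phenotypically susceptible isolates, VME relative to phenotypically resistant isolates), which the source does not spell out, and the function name `error_rates` and the example calls are hypothetical rather than data from the studies cited.

```python
def error_rates(predicted, phenotypic):
    """Return (ME rate, VME rate) for paired 'R'/'S' calls.

    predicted:  genotype-based calls, one per isolate
    phenotypic: reference antibiogram calls, same order
    """
    if len(predicted) != len(phenotypic):
        raise ValueError("call lists must have the same length")
    pairs = list(zip(predicted, phenotypic))
    n_susceptible = sum(1 for _, ph in pairs if ph == "S")
    n_resistant = sum(1 for _, ph in pairs if ph == "R")
    # Major error: predicted resistant but phenotypically susceptible.
    major = sum(1 for g, ph in pairs if g == "R" and ph == "S")
    # Very major error: predicted susceptible but phenotypically resistant.
    very_major = sum(1 for g, ph in pairs if g == "S" and ph == "R")
    me_rate = major / n_susceptible if n_susceptible else 0.0
    vme_rate = very_major / n_resistant if n_resistant else 0.0
    return me_rate, vme_rate


# Hypothetical calls for 12 isolates (8 susceptible, 4 resistant by phenotype).
phenotypic_calls = ["S"] * 8 + ["R"] * 4
predicted_calls = ["S"] * 7 + ["R"] + ["R"] * 3 + ["S"]
me, vme = error_rates(predicted_calls, phenotypic_calls)
print(f"ME rate: {me:.1%}, VME rate: {vme:.1%}")
```

In practice such rates would be computed per antibiotic, since each drug has its own set of resistance determinants and its own balance of susceptible and resistant isolates.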
Despite the obvious optimism surrounding NGS AST platforms, several important aspects must be addressed prior to their routine implementation in the clinical setting: i) establishment of tightly regulated genomic databases, which will need continuous updating and perhaps supplementation with phenotypic, metabolomic, clinical, and outcome data to accommodate the emergence of antimicrobial resistance; ii) implementation of robust, reproducible testing methodologies that generate data in a clinically actionable time frame; iii) development of interpretative guidelines specific for these data (24); iv) approval by various regulatory bodies; and v) the expense of such testing compared to phenotypic AST. Clearly, there must be extensive collaboration between academic, corporate, and regulatory bodies to ensure NGS-based AST moves into practice to combat the frightening frequency with which multi- and pan-drug-resistant isolates are recovered (25). Importantly, WGS AST will also provide the identity of the offending microorganism, its virulence potential, and epidemiological typing.
Human Microbiome as a Diagnostic and Prognostic Marker of Disease
With the advent of benchtop high-throughput DNA sequencing platforms and accessible computational tools, definition of the composition and abundance of microbes and their genomes (i.e., the microbiome) in a given anatomical environment has been greatly facilitated. Utilizing these high-throughput DNA sequencing platforms, numerous studies have linked the structure of the microbiome, in particular the gastrointestinal microbiome, with human disease, including obesity (26), type 2 diabetes (27), bacterial infection (28), and cancer (29), and with malnutrition (30) and the metabolism of drugs (31). Consequently, survey of an individual’s microbiome using high-throughput DNA sequencing methodologies could be diagnostic for a given disorder and, possibly, prognostic of the likely disease course. However, to account for the extensive microbial variation within and between individuals, it is essential these data are controlled by comparison with microbiome data obtained from healthy and diseased persons spanning a wide geographic and ethnic range.
The mammalian gastrointestinal microbial flora performs a number of key functions, not least the development of the immune system (32) and protection against colonization by antibiotic-resistant microorganisms (33). Administration of antibiotics can perturb this fragile ecological niche (34), resulting in colonization with antibiotic-resistant organisms or enhanced risk of intestinal infection with Clostridium difficile (33). Microbes that undergo marked expansion in the intestine as a result of antibiotic exposure have been associated with invasive bloodstream infection. To explore a possible relationship between dense intestinal colonization and bloodstream invasion in humans, investigators performed NGS of DNA extracted from fecal specimens obtained from subjects undergoing allogeneic hematopoietic stem cell transplantation (allo-HSCT) (28). Remarkably, domination of the gut by a single predominant antibiotic-resistant species, vancomycin-resistant Enterococcus faecium, preceded bloodstream invasion in this cohort. Enterococci, streptococci, and various Proteobacteria, which include members of the family Enterobacteriaceae, were found to undergo expansion in the gut following antibiotic treatment. Enterococcal intestinal domination was associated with prior metronidazole administration, and increased the risk of vancomycin-resistant Enterococcus bacteremia nine-fold. Similarly, proteobacterial domination resulted in a five-fold increase in the risk of Gram-negative bacteremia, while dominance was reduced 10-fold by fluoroquinolone treatment.
In an extension of this work, the diversity of the intestinal microbiota was demonstrated to be predictive of mortality in allo-HSCT recipients (35). By analyzing the microbiota of fecal specimens collected from 80 subjects at the time of stem cell engraftment, it was possible to stratify subjects into high, intermediate, and low microbial diversity groups. Strikingly, overall survival three years after allo-HSCT was 36%, 60%, and 67% for the low, intermediate, and high diversity groups, respectively, implying that high intestinal microbial diversity is prognostic of favorable clinical outcomes. Additionally, commensal members of the families Lachnospiraceae and Actinomycetaceae were associated with survival, while Gram-negative bacteria from the phylum Proteobacteria were positively correlated with mortality.
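Stratifying subjects by microbial diversity, as described above, requires a diversity index and cutoffs. The source names neither, so the sketch below makes its own assumptions: it uses the inverse Simpson index (1 divided by the sum of squared taxon proportions), and the cutoffs `low_cutoff` and `high_cutoff` are hypothetical placeholders rather than the values used in the cited study.

```python
def inverse_simpson(counts):
    """Inverse Simpson index, 1 / sum(p_i ** 2), from per-taxon read counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("at least one read is required")
    return 1.0 / sum((c / total) ** 2 for c in counts if c > 0)


def diversity_group(index, low_cutoff=2.0, high_cutoff=4.0):
    """Assign a diversity group; the cutoffs are illustrative assumptions."""
    if index < low_cutoff:
        return "low"
    if index <= high_cutoff:
        return "intermediate"
    return "high"


# Hypothetical per-taxon read counts for three specimens (not real data).
specimens = {
    "dominated": [97, 1, 1, 1],   # one taxon accounts for 97% of reads
    "even_4": [25, 25, 25, 25],   # four equally abundant taxa
    "even_5": [20, 20, 20, 20, 20],
}
for name, counts in specimens.items():
    index = inverse_simpson(counts)
    print(f"{name}: index {index:.2f}, group {diversity_group(index)}")
```

A specimen dominated by a single taxon scores close to 1 on this index, which is why low diversity and intestinal domination tend to go together.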
Exposure to antibiotics is related to C. difficile infection (33,36), a major cause of infectious diarrhea in hospitalized patients (37). To combat this public health threat, high-throughput DNA sequencing of the fecal microbiota of mice and hospitalized patients treated with antibiotics was utilized to identify bacterial species associated with resistance to C. difficile infection (36). The species with the strongest resistance correlation was Clostridium scindens, which dramatically reduced C. difficile infection, and attendant weight loss and mortality, in an animal model when transferred alone or as part of a microbial consortium post-antibiotic exposure. The mechanism of C. difficile inhibition centers on the C. scindens-dependent conversion of primary into secondary bile acids in the cecum and colon. These data suggest C. scindens offers promise as an alternative treatment option for C. difficile-mediated intestinal disease.
In addition to its capacity as a marker for intestinal disease, the gut microbiome has potential as a diagnostic and prognostic marker for systemic diseases, such as rheumatoid arthritis (38). To identify and validate microbial species associated with rheumatoid arthritis, high-throughput 16S rRNA gene sequencing of DNA extracted from 114 stool specimens obtained from patients with rheumatoid arthritis and controls was performed (39). In the setting of untreated new-onset rheumatoid arthritis, Prevotella copri was considerably more abundant than in healthy individuals, signifying that P. copri could play a role in the pathogenesis of rheumatoid arthritis. The increase in Prevotella correlated with a reduction in Bacteroides and a loss of reportedly beneficial microbes. Similarly, the gut microbiota of patients with psoriatic arthritis and skin psoriasis was observed to be less diverse than that of healthy controls (40). Whereas some genera were less abundant in both conditions, psoriatic arthritis patients had a lower abundance of reportedly beneficial microbes. Taken together, these data suggest that interrogation of the gut microbiome could be of diagnostic and prognostic utility for arthritis and other systemic ailments.
The Role of Clinicogenomics in Public Health Microbiology
Over the past 50+ years, public health microbiology (“public health microbiology version 1.0”) was constrained by complex and labor-intensive workflows and protocols for microbial culture, identification, growth-based phenotypic susceptibility testing, and strain typing (41). Recently, high-throughput DNA sequencing, particularly benchtop sequencing, has brought many new opportunities to this field (42-45) and allows bacterial genomics to be integrated into what might be called “public health microbiology version 2.0” (v2.0) through whole-genome sequencing (WGS) of cultured isolates to provide simultaneous information on organism identity, epidemiology, and antimicrobial therapy (Figure 2).
As a practical example of public health microbiology v2.0, a recent case study describes how WGS was applied to a protracted hospital outbreak of multi-drug-resistant Acinetobacter baumannii in Birmingham, England (46). The results showed that the outbreak strain was distinct from previously genome-sequenced strains and enabled the identification of seven major genotypic clusters within the outbreak. WGS also allowed the investigative team to rule out 17 initially suspicious isolates as unrelated to the outbreak strain. Analysis of genomic data documented within-host diversity in several patients, including mixtures of unrelated strains and within-strain genetic diversity. Using WGS data and conventional epidemiology, the study team was able to reconstruct potential transmission events that linked all but seven of the patients and could also associate patient isolates with those recovered from the environment. WGS focused attention on a contaminated bed and on a burns unit as sources and sites of transmission, catalyzing improvements in decontamination protocols. This approach has also been adopted for the WGS of Mycobacterium tuberculosis isolates (47).
To fast forward into the near future (public health microbiology v2.1), it is plausible that culture of bacterial isolates might in some settings be replaced by shotgun metagenomic sequencing of clinical samples. There are several potential advantages of “diagnostic metagenomics” (10): it represents a one-size-fits-all approach to all bacteria that contrasts with the need for many different laboratory media and atmospheric conditions in conventional bacteriology; it avoids the onerous optimization of target-specific assays needed for amplification- or probe-based diagnosis; and it is unbiased and open-ended, i.e., not restricted to finding only what one expected to find. A second case study highlights this approach: metagenomics was applied to fecal samples obtained from patients with diarrhea during the 2011 outbreak of Shiga-toxin-producing Escherichia coli (STEC) O104:H4 in Germany (16). The investigative team obtained the genome of the STEC outbreak strain from ten samples at greater than ten-fold coverage and from over two dozen samples at greater than one-fold coverage. In several samples, they found an increased coverage of the Shiga toxin bacteriophage genome relative to other STEC sequences. From some samples, they recovered sequences from Clostridium difficile, Campylobacter jejuni, and Salmonella enterica, and from one, they recovered sequences from the emerging human pathogen Campylobacter concisus, illustrating the ability of metagenomics to deliver unexpected results.
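Fold coverage, as used above, can be estimated as the number of sequenced bases assigned to a genome divided by the length of that genome. The sketch below is a minimal illustration of that arithmetic; the read count, read lengths, and the genome size of roughly 5.3 Mb are illustrative assumptions, not figures from the study, and the function names are hypothetical.

```python
import math


def mean_coverage(n_reads, read_length, genome_length):
    """Average fold coverage: total assigned bases / genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return n_reads * read_length / genome_length


def reads_needed(target_coverage, read_length, genome_length):
    """Smallest whole number of reads that reaches the target mean coverage."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return math.ceil(target_coverage * genome_length / read_length)


# Illustrative values: a genome of about 5.3 Mb and 100-base reads.
genome_length = 5_300_000
coverage = mean_coverage(n_reads=530_000, read_length=100, genome_length=genome_length)
print(f"Mean coverage: {coverage:.1f}-fold")
print("Reads needed for ten-fold coverage with 150-base reads:",
      reads_needed(10, 150, genome_length))
```

In a metagenomic sample only a fraction of all reads derive from the organism of interest, so the total sequencing depth required is correspondingly larger than the count returned by `reads_needed`.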
Metagenomic analysis has also been applied to the recovery of M. tuberculosis genomes from both historical and contemporary human samples, and the results have shown that mixed infections were common in 18th-century Europe. Further, in a proof-of-principle study, the same process was used to identify and characterize pathogenic mycobacteria in modern sputum samples (48-50). There have been several other recent proof-of-principle studies demonstrating the utility of this diagnostic approach (13, 15, 51, 52).
We can envisage an even more ambitious vision for public health microbiology v3.0, in which long-read single-molecule nanopore sequencing will enable an integrated approach to “macromolecular monitoring”, combining analysis of DNA, RNA, and proteins shed in urine and feces together with characterization of informational macromolecules circulating in the bloodstream to provide information not just on infection but also on, for example, cancer and the health of the fetus or of organ transplants (53-57).
However, there will be a need for a new computational infrastructure to cope with the demands of big data in clinical microbiology, including a role for cloud computing (58), as illustrated by the CLIMB (CLoud Infrastructure for Microbial Bioinformatics) project supported by the UK’s Medical Research Council (59).
Conclusion
Based on the discussions above, next-generation sequencing will steadily work its way into
369
routine diagnostic use within the clinical and public health laboratories over the coming years.
370
This prediction, albeit not entirely in the near future, is based on the universality of the science,
371
i.e., its applicability to the diagnosis of infectious processes and resistance markers in an
372
unbiased fashion for all manner of microorganisms be they viral, bacterial, fungal, or parasitic.
373
Furthermore, it will allow for the ability to monitor changes in the human (or animal) microbiome
374
that forecasts potential risk for, or the existence of other, noninfectious disease processes thus
375
allowing earlier intervention or avoidance – perhaps even alternative treatment modalities.
376
While most of this review centers on the use of NGS and all the analytical permutations that have been developed in conjunction with it, we can likely expect more user-friendly distillations of these studies (e.g., multiplex PCR assays) to appear in clinical laboratories in the near future. This road will provide a fascinating journey indeed.
References
1. Quick J, Ashton P, Calus S, Chatt C, Gossain S, Hawker J, Nair S, Neal K, Nye K, Peters T, De Pinna E, Robinson E, Struthers K, Webber M, Catto A, Dallman TJ, Hawkey P, Loman NJ. 2015. Rapid draft sequencing and real-time nanopore sequencing in a hospital outbreak of Salmonella. Genome Biol 16:114.
2. Judge K, Harris SR, Reuter S, Parkhill J, Peacock SJ. 2015. Early insights into the potential of the Oxford Nanopore MinION for the detection of antimicrobial resistance genes. J Antimicrob Chemother 70:2775-2778.
3. Ambrose HE, Granerod J, Clewley JP, Davies NW, Keir G, Cunningham R, Zuckerman M, Mutton KJ, Ward KN, Ijaz S, Crowcroft S, Brown DW, UK Aetiology of Encephalitis Study Group. 2011. Diagnostic strategy used to establish etiologies of encephalitis in a prospective cohort of patients in England. J Clin Microbiol 49:3576-3583.
4. Denno DM, Shaikh N, Stapp JR, Qin X, Hutter CM, Hoffman V, Mooney JC, Wood KM, Stevens HJ, Jones R, Tarr PI, Klein EJ. 2012. Diarrhea etiology in a pediatric emergency department: a case control study. Clin Infect Dis 55:897-904.
5. Louie JK, Hacker JK, Gonzales R, Mark J, Maselli JH, Yagi S, Drew WL. 2005. Characterization of viral agents causing acute respiratory infection in a San Francisco University Medical Center during the influenza season. Clin Infect Dis 41:822-828.
6. Barzon L, Lavezzo E, Costanzi G, Franchin E, Toppo S, Palu G. 2013. Next-generation sequencing technologies in diagnostic virology. J Clin Virol 58:346-350.
7. Chiu CY. 2013. Viral pathogen discovery. Curr Opin Microbiol 16:468-478.
8. Dunne WM Jr, Westblade LF, Ford B. 2012. Next-generation and whole genome sequencing in the diagnostic clinical microbiology laboratory. Eur J Clin Microbiol Infect Dis 31:1719-1726.
9. Miller RR, Montoya V, Gardy JL, Patrick DM, Tang P. 2013. Metagenomics for pathogen detection in public health. Genome Med 5:81.
10. Pallen MJ. 2014. Diagnostic metagenomics: potential applications to bacterial, viral, and parasitic infections. Parasitology 141:1856-1862.
11. Greninger AL, Naccache SN, Messacar K, Clayton A, Yu G, Somasekar S, Federman S, Stryke D, Anderson C, Yagi S, Messenger S, Wadford D, Xia D, Watt JP, van Haren K, Dominguez SR, Glaser C, Aldrovandi G, Chiu CY. 2015. A novel outbreak enterovirus D68 strain associated with acute flaccid myelitis cases in the USA (2012-2014): a retrospective cohort study. Lancet Infect Dis 15:671-682.
12. Naccache SN, Federman S, Veeraraghavan N, Zaharia M, Lee D, Samayoa E, Bouquet J, Greninger AL, Luk KC, Enge B, Wadford DA, Messenger SL, Genrich GL, Pellegrino K, Grard G, Leroy E, Schneider BS, Fair JN, Martinez MA, Isa P, Crump JA, DeRisi JL, Sittler T, Hackett J Jr, Miller S, Chiu CY. 2014. A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples. Genome Res 24:1180-1192.
13. Wilson MR, Naccache SN, Samayoa E, Biagtan M, Bashir H, Yu G, Salamat SM, Somasekar S, Federman S, Miller S, Sokolic R, Garabedian E, Candotti F, Buckley RH, Reed KD, Meyer TL, Seroogy CM, Galloway R, Henderson SL, Gern JE, DeRisi JL, Chiu CY. 2014. Actionable diagnosis of neuroleptospirosis by next-generation sequencing. N Engl J Med 370:2408-2417.
14. Fischer N, Indenbirken D, Meyer T, Lutgehetmann M, Lellek H, Spohn M, Aepfelbacher M, Alawi M, Grundhoff A. 2015. Evaluation of unbiased next-generation sequencing of RNA (RNA-seq) as a diagnostic method in influenza virus-positive respiratory samples. J Clin Microbiol 53:2238-2250.
15. Fischer N, Rohde H, Indenbirken D, Gunther T, Reumann K, Lutgehetmann M, Meyer T, Kluge S, Aepfelbacher M, Alawi M, Grundhoff A. 2014. Rapid metagenomic diagnostics for suspected outbreak of severe pneumonia. Emerg Infect Dis 20:1072-1075.
16. Loman NJ, Constantinidou C, Christner M, Rohde H, Chan JZ, Quick J, Weir JC, Quince C, Smith GP, Betley JR, Aepfelbacher M, Pallen MJ. 2013. A culture-independent sequence-based metagenomics approach to the investigation of an outbreak of Shiga-toxigenic Escherichia coli O104:H4. JAMA 309:1502-1510.
17. Salter SJ, Cox MH, Turek EM, Calus ST, Cookson WO, Moffatt MF, Turner P, Parkhill J, Loman NJ, Walker AW. 2014. Reagent and laboratory contamination can critically impact sequence-based microbiome analysis. BMC Biol 12:87.
18. van Belkum A, Dunne WM Jr. 2013. Next-generation antimicrobial susceptibility testing. J Clin Microbiol 51:2018-2024.
19. Cangelosi GA, Meschke JS. 2014. Dead or alive: molecular assessment of microbial viability. Appl Environ Microbiol 80:5884-5891.
20. Barczak AK, Gomez JE, Kaufmann BB, Hinson ER, Cosimi L, Borowsky ML, Onderdonk AB, Stanley SA, Kaur D, Bryant KF, Knipe DM, Sloutsky A, Hung DT. 2012. RNA signatures allow rapid identification of pathogens and antibiotic susceptibilities. Proc Natl Acad Sci U S A 109:6217-6222.
21. Walker TM, Kohl TA, Omar SV, Hedge J, Del Ojo Elias C, Bradley P, Iqbal Z, Feuerriegel S, Niehaus KE, Wilson DJ, Clifton DA, Kapatai G, Ip CL, Bowden R, Drobniewski FA, Allix-Béguec C, Gaudin C, Parkhill J, Diel R, Supply P, Crook DW, Smith EG, Walker AS, Ismail N, Niemann S, Peto TE; Modernizing Medical Microbiology (MMM) Informatics Group. 2015. Whole-genome sequencing for prediction of Mycobacterium tuberculosis drug susceptibility and resistance: a retrospective cohort study. Lancet Infect Dis. 2015 Jun 23. pii: S1473-3099:62-66.
22. Holt KE, Wertheim H, Zadoks RN, Baker S, Whitehouse CA, Dance D, Jenney A, Connor TR, Hsu LY, Severin J, Brisse S, Cao H, Wilksch J, Gorrie C, Schultz MB, Edwards DJ, Nguyen KV, Nguyen TV, Dao TT, Mensink M, Minh VL, Nhu NT, Schultsz C, Kuntaman K, Newton PN, Moore CE, Strugnell RA, Thomson NR. 2015. Genomic analysis of diversity, population structure, virulence, and antimicrobial resistance in Klebsiella pneumoniae, an urgent threat to public health. Proc Natl Acad Sci U S A 112:E3574-E3581.
23. Wright GD. 2007. The antibiotic resistome: the nexus of chemical and genetic diversity. Nat Rev Microbiol 5:175-186.
24. Kahlmeter G. 2015. The 2014 Garrod Lecture: EUCAST - are we heading towards international agreement? J Antimicrob Chemother 70:2427-2439.
25. Nathan C, Cars O. 2014. Antibiotic resistance--problems, progress, and prospects. N Engl J Med 371:1761-1763.
26. Turnbaugh PJ, Hamady M, Yatsunenko T, Cantarel BL, Duncan A, Ley RE, Sogin ML, Jones WJ, Roe BA, Affourtit JP, Egholm M, Henrissat B, Heath AC, Knight R, Gordon JI. 2009. A core gut microbiome in obese and lean twins. Nature 457:480-484.
27. Qin J, Li Y, Cai Z, Li S, Zhu J, Zhang F, Liang S, Zhang W, Guan Y, Shen D, Peng Y, Zhang D, Jie Z, Wu W, Qin Y, Xue W, Li J, Han L, Lu D, Wu P, Dai Y, Sun X, Li Z, Tang A, Zhong S, Li X, Chen W, Xu R, Wang M, Feng Q, Gong M, Yu J, Zhang Y, Zhang M, Hansen T, Sanchez G, Raes J, Falony G, Okuda S, Almeida M, LeChatelier E, Renault P, Pons N, Batto JM, Zhang Z, Chen H, Yang R, Zheng W, Li S, Yang H, Wang J, Ehrlich SD, Nielsen R, Pedersen O, Kristiansen K, Wang J. 2012. A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature 490:55-60.
28. Taur Y, Xavier JB, Lipuma L, Ubeda C, Goldberg J, Gobourne A, Lee YJ, Dubin KA, Socci ND, Viale A, Perales MA, Jenq RR, van den Brink MR, Pamer EG. 2012. Intestinal domination and the risk of bacteremia in patients undergoing allogeneic hematopoietic stem cell transplantation. Clin Infect Dis 55:905-914.
29. Ahn J, Sinha R, Pei Z, Dominianni C, Wu J, Shi J, Goedert JJ, Hayes RB, Yang L. 2013. Human gut microbiome and risk for colorectal cancer. J Natl Cancer Inst 105:1907-1911.
30. Smith MI, Yatsunenko T, Manary MJ, Trehan I, Mkakosya R, Cheng J, Kau AL, Rich SS, Concannon P, Mychaleckyj JC, Liu J, Houpt E, Li JV, Holmes E, Nicholson J, Knights D, Ursell LK, Knight R, Gordon JI. 2013. Gut microbiomes of Malawian twin pairs discordant for kwashiorkor. Science 339:548-554.
31. Haiser HJ, Gootenberg DB, Chatman K, Sirasani G, Balskus EP, Turnbaugh PJ. 2013. Predicting and manipulating cardiac drug inactivation by the human gut bacterium Eggerthella lenta. Science 341:295-298.
32. Cebra JJ. 1999. Influences of microbiota on intestinal immune system development. Am J Clin Nutr 69:1046S-1051S.
33. Buffie CG, Pamer EG. 2013. Microbiota-mediated colonization resistance against intestinal pathogens. Nat Rev Immunol 13:790-801.
34. Dethlefsen L, Huse S, Sogin ML, Relman DA. 2008. The pervasive effects of an antibiotic on the human gut microbiota, as revealed by deep 16S rRNA sequencing. PLoS Biol 6:e280.
35. Taur Y, Jenq RR, Perales MA, Littmann ER, Morjaria S, Ling L, No D, Gobourne A, Viale A, Dahi PB, Ponce DM, Barker JN, Giralt S, van den Brink M, Pamer EG. 2014. The effects of intestinal tract bacterial diversity on mortality following allogeneic hematopoietic stem cell transplantation. Blood 124:1174-1182.
36. Buffie CG, Bucci V, Stein RR, McKenney PT, Ling L, Gobourne A, No D, Liu H, Kinnebrew M, Viale A, Littmann E, van den Brink MR, Jenq RR, Taur Y, Sander C, Cross JR, Toussaint NC, Xavier JB, Pamer EG. 2015. Precision microbiome reconstitution restores bile acid mediated resistance to Clostridium difficile. Nature 517:205-208.
37. Rupnik M, Wilcox MH, Gerding DN. 2009. Clostridium difficile infection: new developments in epidemiology and pathogenesis. Nat Rev Microbiol 7:526-536.
38. Scher JU, Abramson SB. 2011. The microbiome and rheumatoid arthritis. Nat Rev Rheumatol 7:569-578.
39. Scher JU, Sczesnak A, Longman RS, Segata N, Ubeda C, Bielski C, Rostron T, Cerundolo V, Pamer EG, Abramson SB, Huttenhower C, Littman DR. 2013. Expansion of intestinal Prevotella copri correlates with enhanced susceptibility to arthritis. eLife 2:e01202.
40. Scher JU, Ubeda C, Artacho A, Attur M, Isaac S, Reddy SM, Marmon S, Neimann A, Brusca S, Patel T, Manasson J, Pamer EG, Littman DR, Abramson SB. 2015. Decreased bacterial diversity characterizes the altered gut microbiota in patients with psoriatic arthritis, resembling dysbiosis in inflammatory bowel disease. Arthritis Rheumatol 67:128-139.
41. Didelot X, Bowden R, Wilson DJ, Peto TE, Crook DW. 2012. Transforming clinical microbiology with bacterial genome sequencing. Nat Rev Genet 13:601-612.
42. Loman NJ, Constantinidou C, Chan JZ, Halachev M, Sergeant M, Penn CW, Robinson ER, Pallen MJ. 2012. High-throughput bacterial genome sequencing: an embarrassment of choice, a world of opportunity. Nat Rev Microbiol 10:599-606.
43. Pallen MJ, Loman NJ. 2011. Are diagnostic and public health bacteriology ready to become branches of genomic medicine? Genome Med 3:53.
44. Pallen MJ, Loman NJ, Penn CW. 2010. High-throughput sequencing and clinical microbiology: progress, opportunities and challenges. Curr Opin Microbiol 13:625-631.
45. Robinson ER, Walker TM, Pallen MJ. 2013. Genomics and outbreak investigation: from sequence to consequence. Genome Med 5:36.
46. Halachev MR, Chan JZ, Constantinidou CI, Cumley N, Bradley C, Smith-Banks M, Oppenheim B, Pallen MJ. 2014. Genomic epidemiology of a protracted hospital outbreak caused by multidrug-resistant Acinetobacter baumannii in Birmingham, England. Genome Med 6:70.
47. Heart of England NHS Foundation Trust. 2014. TB genomics service pilot project. http://www.heftpathology.com/item/tb-genomics-pilot-scheme.html
48. Chan JZ, Sergeant MJ, Lee OY, Minnikin DE, Besra GS, Pap I, Spigelman M, Donoghue HD, Pallen MJ. 2013. Metagenomic analysis of tuberculosis in a mummy. N Engl J Med 369:289-290.
49. Doughty EL, Sergeant MJ, Adetifa I, Antonio M, Pallen MJ. 2014. Culture-independent detection and characterisation of Mycobacterium tuberculosis and M. africanum in sputum samples using shotgun metagenomics on a benchtop sequencer. PeerJ 2:e585.
50. Kay GL, Sergeant MJ, Zhou Z, Chan JZ, Millard A, Quick J, Szikossy I, Pap I, Spigelman M, Loman NJ, Achtman M, Donoghue HD, Pallen MJ. 2015. Eighteenth-century genomes show that mixed infections were common at time of peak tuberculosis in Europe. Nat Commun 6:6717.
51. Andersson P, Klein M, Lilliebridge RA, Giffard PM. 2013. Sequences of multiple bacterial genomes and a Chlamydia trachomatis genotype from direct sequencing of DNA derived from a vaginal swab diagnostic specimen. Clin Microbiol Infect 19:E405-E408.
52. Hasman H, Saputra D, Sicheritz-Ponten T, Lund O, Svendsen CA, Frimodt-Moller N, Aarestrup FM. 2014. Rapid whole-genome sequencing for detection and characterization of microorganisms directly from clinical samples. J Clin Microbiol 52:139-146.
53. Acharya S, Edwards S, Schmidt J. 2015. Research highlights: nanopore protein detection and analysis. Lab Chip 15:3424-3427.
54. Ayub M, Stoddart D, Bayley H. 2015. Nucleobase Recognition by Truncated alpha Hemolysin Pores. ACS Nano 9:7895-7903.
55. Daly KP. 2015. Circulating donor-derived cell-free DNA: a true biomarker for cardiac allograft rejection? Ann Transl Med 3:47.
56. Ignatiadis M, Dawson SJ. 2014. Circulating tumor cells and circulating tumor DNA for precision medicine: dream or reality? Ann Oncol 25:2304-2313.
57. Liao GJ, Gronowski AM, Zhao Z. 2013. Non-invasive prenatal testing using cell-free fetal DNA in maternal circulation. Clin Chim Acta
58. Drake N. 2015. How to catch a cloud. Nature 522:115-116.
59. CLIMB consortium. 2015. Cloud Infrastructure for Microbial Bioinformatics. http://www.climb.ac.uk
Figure Legends
Figure 1. Next-generation sequencing for clinical infectious disease diagnostics.
(A) Schematic depiction of diagnostic NGS workflows. Nucleic acids isolated from primary diagnostic material are directly queried by either shotgun or amplicon sequencing. Amplicon sequencing uses PCR amplification with primers that target conserved regions (e.g., the bacterial 16S rRNA gene). Clustered amplicon sequences are then compared to appropriate databases (e.g., Greengenes or SILVA) to identify clusters of so-called “operational taxonomic units” (OTUs) at different taxonomic levels. Amplicon sequencing is sensitive, fast, and cost-effective, but because it relies on specific PCR primers it is also strongly biased compared to random shotgun sequencing. Shotgun sequencing reads are usually first aligned to the human (or an appropriate animal host) genome to eliminate reads of host origin (digital subtraction). The remaining reads are then either directly mapped to sequence databases or first assembled into longer contiguous sequences (contigs) that are subsequently aligned to the database. De novo assembly considerably increases computational overhead and analysis time, but it also significantly decreases classification bias by facilitating the identification of pathogens that exhibit little or no sequence homology to known infectious agents.
(B) Whereas the term ‘metagenomics’ in its literal sense suggests the analysis of full genome sequences, the throughput of current NGS technologies usually allows only partial recovery of individual infectious agent genomes, especially in complex diagnostic samples (e.g., stool or respiratory samples). Thus, diagnostic NGS requires bioinformatic approaches that sort sequence fragments (or tags) into taxonomic bins to evaluate the composition of clinical samples.
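The shotgun branch described in this legend (digital subtraction of host reads, followed by sorting of the remaining reads into taxonomic bins) can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the workflow used in any of the cited studies: exact k-mer matching stands in for read alignment, and the function names, the value of k, and the short sequences are all hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_match(read, index, k):
    """Return the reference sharing the most k-mers with the read, or None if none are shared."""
    read_kmers = kmers(read, k)
    best_name, best_hits = None, 0
    for name, ref_kmers in index.items():
        hits = len(read_kmers & ref_kmers)
        if hits > best_hits:
            best_name, best_hits = name, hits
    return best_name


def bin_reads(reads, host_seqs, taxon_seqs, k=8):
    """Subtract host-like reads, then count the remaining reads per taxonomic bin."""
    host_index = {name: kmers(seq, k) for name, seq in host_seqs.items()}
    taxon_index = {name: kmers(seq, k) for name, seq in taxon_seqs.items()}
    host_removed = 0
    bins = Counter()
    for read in reads:
        # Digital subtraction: discard any read that shares a k-mer with the host.
        if best_match(read, host_index, k) is not None:
            host_removed += 1
            continue
        # Taxonomic binning: assign the read to its best-matching reference, if any.
        taxon = best_match(read, taxon_index, k)
        bins[taxon if taxon is not None else "unclassified"] += 1
    return host_removed, bins


# Hypothetical toy sequences, invented purely for illustration.
HOST = {"host": "ACGTACGTTAGCTAGCTAGGATCCGATCGATCGTACG"}
TAXA = {
    "taxon_a": "TTGACCGGTTAACCGGTTTGACCAAGGTTCCAAGG",
    "taxon_b": "GGGCCCAAATTTGGGCCCAAATTTCGCGCGATATAT",
}
READS = [
    HOST["host"][5:25],      # read of host origin
    TAXA["taxon_a"][:20],    # read from the first reference
    TAXA["taxon_b"][10:30],  # read from the second reference
    "A" * 20,                # read with no match in any reference
]

host_removed, bins = bin_reads(READS, HOST, TAXA)
print(host_removed, dict(bins))
```

Reads that match no reference fall into an "unclassified" bin, which mirrors the point made in the legend: direct mapping cannot classify agents with little or no homology to database entries, whereas de novo assembly may help recover them at extra computational cost.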
Figure 2. Progressive integration of genomics and metagenomics into public health microbiology. As time progresses, we anticipate that the 19th-century techniques of microscopy and culture will give way to sequence-based approaches, which will also lead to closer integration with the rest of laboratory medicine.
[Figure 2 panel text. Public Health Microbiology 1.0: microscopy, culture, susceptibility; onerous and complex workflow for phenotypic characterisation of isolates. Public Health Microbiology 2.0: whole-genome sequencing; identification, epidemiology, and susceptibilities of cultured isolates. Public Health Microbiology 2.1: diagnostic metagenomics; culture-independent diagnosis of infection using a bench-top sequencer. Public Health Microbiology 3.0: macromolecular monitoring; nanopore sequencing of DNA, RNA, and proteins to monitor infection, cancer, and the health of the microbiome, fetus, and transplants.]
Lars Westblade, Ph.D., is an Assistant Professor in Pathology and Laboratory Medicine at Weill Cornell Medical College and the Associate Director for Microbiology at New York-Presbyterian Hospital (Weill Cornell Campus). Prior to joining Weill Cornell Medical College, he was an Assistant Professor at Emory University. He completed his training in Medical and Public Health Microbiology under the direction of Dr. Michael Dunne and Dr. Carey-Ann Burnham at Washington University School of Medicine in St. Louis. Dr. Westblade is a Diplomate of the American Board of Medical Microbiology.