Review

Omics: Fulfilling the Promise

Exploring bacterial epigenomics in the next-generation sequencing era: a new approach for an emerging frontier

Poyin Chen1,2, Richard Jeannotte1,2,3, and Bart C. Weimer1,2

1 Department of Population Health and Reproduction, School of Veterinary Medicine, University of California, Davis, CA, USA
2 Universidad de Tarapacá, Avenida General Velásquez N°1775, Arica, Chile
3 Facultad de Ciencias, Universidad de Tarapacá, Arica, Chile

Epigenetics has an important role in the persistence of foodborne pathogens in diverse host niches. Substantial challenges exist in linking DNA methylation to situation-specific phenotypic traits. DNA modification, mediated by restriction-modification systems, functions as an immune response against antagonistic external DNA, whereas bacteriophage-acquired methyltransferases (MTases) and orphan MTases (those lacking the cognate restriction endonuclease) facilitate the evolution of new phenotypes by modulating gene expression through DNA and RNA modifications, including methylation and phosphorothioation. The recent establishment of large-scale genome sequencing projects will result in a significant increase in genome availability, which will lead to new demands for data analysis, including new predictive bioinformatics approaches that can be verified with traditional scientific rigor. Coupling sequencing technologies that detect modifications with mass spectrometry to discover new adducts is a powerful tactic for studying bacterial epigenetics, a field poised to make novel and far-reaching discoveries that link biological significance and the bacterial epigenome.

Increasing bacterial genomes and the epigenome
The field of epigenetics is poised to change with the use of multi-omics approaches to examine the epigenome. The routine availability of thousands of bacterial genomes enables new approaches for finding novel genes associated with epigenomics. This review focuses on the integration of omics approaches as applied to the high-throughput study of DNA modification. The reader will be referred to specific detailed reviews for the nuances of specific genes so that this review can highlight the integration of next-generation sequencing (NGS), metabolomics, and bioinformatics.
Corresponding author: Weimer, B.C. ([email protected]).
Keywords: multi-omics; epigenetics; DNA adduct; phosphorothioation; methylation.
0966-842X/ © 2014 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.tim.2014.03.005
Trends in Microbiology, May 2014, Vol. 22, No. 5

The bacterial epigenome is a dynamic feature that changes during growth in response to external stimuli, thereby facilitating adjustment to varying environmental conditions and controlling gene exchange, transcription,
and genome stability on a broad scale. The epigenome consists of modifications to the nucleotides with small molecules, such as methylation, or to atoms between nucleotides, such as phosphorothioation (PT) [1]. These modifications also change protein–DNA binding, thereby altering the biochemical landscape of DNA in ways that directly impact the phenotype.

DNA modifications were initially identified during bacteriophage transmissibility studies [2]. This discovery heralded the study of the influence of DNA modifications on foreign DNA recognition and ushered in the understanding that DNA modification in bacteria is important. Many techniques have been developed to quantify methylation and the number of modified nucleotides, and to improve detection resolution (global, site-specific, and genome-wide). Bisulfite sequencing was the first method used for determining DNA methylation via sequencing. Mass spectrometry analysis of digested DNA remains the only approach enabling the discovery of new modifications [3–5].

With the advent of NGS technologies, bacterial genomes are being sequenced at an exponential pace, with epigenome detection not far behind. Among NGS technologies, single molecule real time (SMRT) DNA sequencing technology allows simultaneous acquisition of both genomic and epigenomic information at the nucleotide level [6]. NGS has paved the way for numerous large-scale sequencing efforts, which will probably advance the discovery of methylation density, location, strandedness, and catalytic enzymes. With this in mind, a new approach is possible: examining the genome for genes used in methylation events that lack specific homology but contain conserved domains that preserve functional activity.

The 100K Genome Project (Genomics England) aims to sequence the genomes of 100 000 patients with a focus on cancer, rare diseases, and infectious diseases (http://www.genomicsengland.co.uk/). The United Kingdom Food Standards Agency will sequence 1000 Campylobacter isolates that will contribute to the characterization of the genomic diversity of Campylobacter in the UK (https://fsaesourcing.eurodyn.com/epps/cft/prepareViewCfTWS.do?resourceId=52167). The 100K Pathogen Genome Project in the USA is a collaborative effort among the FDA, the University of California Davis, and Agilent Technologies. The US-based 100K Pathogen Genome Project will sequence 100 000 foodborne pathogens, 1000 of which will be done using SMRT sequencing, allowing for identification of novel modifications and of the extent to which these genomes are modified. These genomes will be released on the 100K Food Pathogen Bioproject web page (http://www.ncbi.nlm.nih.gov/bioproject/186441). The FDA also created the GenomeTrakr bioproject to routinely sequence foodborne pathogens across the USA, releasing the genomes as they are sequenced (http://www.ncbi.nlm.nih.gov/bioproject/183844). Between the US-based projects, >4500 genomes have been deposited at the National Center for Biotechnology Information (NCBI) in 6 months. The current scale of sequencing will produce an additional 10 000 genomes in 2014. Collectively, these projects will radically increase the number of available genomes with the long-term goal of improving public health.

It is very likely that, with this scale of genome availability, new restriction modification (RM) systems and methyltransferases (MTases) will be discovered. The sheer scale of these data requires new methods for epigenetic studies to mine the information for occurrence and localization, information that can be examined in isolates using traditional approaches to verify bioinformatic predictions. Importantly, these projects will probably uncover new orphan MTases, redefine MTase diversity, and discover new DNA modifications. Use of such large datasets also enables population-based comparisons of domain conservation, in addition to gene sequence homology, for the discovery of new genes used in epigenome modification, both single genes and the protein networks used to catalyze DNA modifications. The explosive production of epigenomes in the coming years foreshadows the need for new computational tools and platforms to conduct large-scale data analysis for genome and epigenetic annotation.
Prediction of interaction networks is now possible on a population scale, which will provide strength of estimation based on gene and network distribution. With the likely discovery of new genes for DNA modification and of new modifications, it will become increasingly important to use population genetics in bacteria to gain insights into gene distribution and genetic diversity, and to generate new DNA modification hypotheses. The Genomic Encyclopedia of Bacteria and Archaea (GEBA) project sequenced 3500 genomes at specific phylogenetic branches and expanded the Tree of Life [7]. With 50 new genomes, 1060 new protein families were found. Jacobsen et al. [8] increased the number of gene families using 36 Salmonella genomes. Accordingly, it is very likely that the nearly 4500 genomes added by the US-based projects will contain many new gene families, which are yet to be examined for genes related to epigenetics.

New high-throughput bioinformatic strategies to predict these features are needed on a wide scale. Current informatics tools are limited in their ability to conduct multi-genome comparisons. Metabolic networks can be created individually using several resources [9,10]. Currently, databases for high-throughput analysis of physical and functional protein–protein interactions include STRING, which predicts protein–protein interaction networks for a single genome (Box 1). REBASE is a tool


Box 1. Databases and tools

REBASE is an online database (http://rebase.neb.com/rebase/rebase.html) containing the available information on known and putative RM system-associated proteins. It includes recognition sequences, cleavage sites, and source, as well as commercial availability, sequence data, crystal structure information, isoschizomers, and methylation sensitivity. All genomic sequences uploaded to GenBank are analyzed for RM systems via data mining techniques followed by manual confirmation, and the systems found are curated in REBASE. Whereas MTases can be predicted with relative ease and accuracy, REase genes are more highly divergent and as such are usually predicted by proximity to, or co-occurrence with, the MTase [12].

STRING is a public access tool (http://string.embl.de/) that draws together experimental, predicted, and transferred interactions, including interactions predicted through text mining, to provide a putative protein–protein interaction map. The premise for STRING is that protein–protein interactions extend beyond physical interactions. Included in protein–protein networks are catalytic interactions in metabolic, transcriptional, and translational pathways, as well as proteins that contribute to a larger unit without ever directly interacting. STRING users may input single or multiple queries and preset the target organism in which to build the network. Each node in the resulting network represents an interactor identified by STRING prediction algorithms, chosen on the basis of neighborhood, gene fusion, co-occurrence, co-expression, experimental evidence, databases, text mining, and homology. Supplementary information is provided for each node, including protein sequence, structure, domains, and homologs. Networks may be expanded or recentered on the desired node, and nodes may be categorized by biological relevance such as biological processes, molecular function, and cellular components [11].
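Box 1 notes that REase genes are usually predicted by their proximity to, or co-occurrence with, an MTase gene. The following minimal Python sketch illustrates that idea only; it is not REBASE's actual procedure. The `Gene` record, the function names, the example coordinates, and the 500-base window (`max_gap`) are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical annotation record; coordinates are 1-based and inclusive."""
    name: str
    start: int
    end: int
    is_mtase: bool = False  # assumed to come from an upstream MTase prediction


def gap_between(a: Gene, b: Gene) -> int:
    """Number of bases separating two genes (0 if they overlap or abut)."""
    return max(0, max(a.start, b.start) - min(a.end, b.end) - 1)


def candidate_rease_genes(genes, max_gap=500):
    """Pair each predicted MTase with every non-MTase gene within max_gap bases.

    The window size is an illustrative assumption, not a published threshold.
    """
    mtases = [g for g in genes if g.is_mtase]
    others = [g for g in genes if not g.is_mtase]
    return [
        (m.name, g.name)
        for m in mtases
        for g in others
        if gap_between(m, g) <= max_gap
    ]


# Made-up annotation for a single contig.
genes = [
    Gene("orfA", 100, 900),
    Gene("mtaseX", 1000, 2200, is_mtase=True),
    Gene("orfB", 2250, 3400),
    Gene("orfC", 9000, 9800),
]
print(candidate_rease_genes(genes))
```

Candidates produced this way are only hypotheses: a neighboring gene is not necessarily an REase, so each pair would still need the manual confirmation and experimental verification described in the text.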

specific to RM systems that is available and can be used for multi-genome comparisons in a table format (Box 1) [11,12]. Strategic use of these types of databases will allow scientists to assemble information from large-scale genome projects and create new network predictions with many new genomes. Advances in sequencing technology and data processing will help to elucidate DNA modifications and the possible network of proteins that catalyze them. Tools for rapid assessment of the biological importance of these genes, both for current modifications and for those yet to be discovered, are still lacking.

Methylation and restriction modification systems
The most comprehensively studied DNA modification is methylation. It has been extensively reviewed [2,13–15]; therefore, only a brief summary of pertinent information from those reviews is included here to set the stage for the discussion of NGS and high-throughput methods to analyze genomes for these traits. DNA methylation is well established as a part of bacterial RM systems, of which 43 650 RM enzymes are cataloged in over 3600 bacterial isolates (http://REBASE.neb.com/REBASE/REBASE.html). Only a fraction of the newly released genomes from the US-based 100K Pathogen Genome Project and the FDA GenomeTrakr have been analyzed, leading to anticipation that a dramatic increase in these enzymes will occur shortly. The challenge will be to experimentally verify the predictions and understand their relative biological importance.

RM systems were first described in a series of experiments exploring bacteriophage (phage) infection [2]. Phage DNA was later identified as the source of the host-induced variability, where DNA methylation was defined as the key factor for restriction site recognition. The discovery of methionine, via S-adenosyl-L-methionine (SAM), as the donor for production of methylated DNA linked sulfur metabolism and DNA modification [16,17]. These experiments shaped the general consensus that RM systems evolved to function as a rudimentary bacterial immune system for protection against exogenous DNA. Subsequently, RM system components were found to broadly influence additional cell processes including cell cycle, virulence, and gene expression [18].

RM systems require a DNA recognition domain, an MTase, and an endonuclease (REase). Four RM types are described and cataloged in REBASE [2,12,19–22]. Whereas Types I, II, and III RM systems recognize methylated DNA, the Type IV RM system recognizes phosphorothioated DNA in addition to methylated DNA [23]. Exploration of PT modification is relatively new and understudied (see the section on 'PT modification'). MTases selectively methylate bases to form 4-methylcytosine (m4C), 5-methylcytosine (m5C), and 6-methyladenine (m6A) within specific sequence motifs along the chromosome located by the RM recognition domain [24]. If the corresponding recognition domain does not detect methylation, the RM system catalyzes DNA cleavage by the REase. To circumvent host restriction of phage DNA, phages often introduce additional MTases during infection, and these are often horizontally transferred via plasmids. Owing to the nature of RM enzyme–DNA dynamics, these MTases may be retained by the host following phage infection, giving rise to orphan MTases lacking a reciprocal restriction enzyme [13,18]. Early experiments focused on the manipulation of RM systems to produce viable cells with R+M+ and R−M+ phenotypes.
Interestingly, the R+M− phenotype is lethal, suggesting that in the absence of DNA methylation, the REase will digest self-DNA, resulting in cell death [25]. In studying post-segregation cell death by RM systems, Ichige and Kobayashi [26] observed a larger number of MTases relative to REases in steady state cells. Dysregulation of cellular REase expression in proportion to MTase levels also increases cell death due to REase-induced double-stranded breaks in the chromosome. Easy acquisition and retention of foreign MTases (termed orphan MTases) by host bacteria contributes to the increased diversity of MTases relative to REases, with horizontal gene transfer as a possible source of MTases [13].

Orphan methyltransferases
Orphan MTases can arise from internal loss of function in the REase of an RM system. Externally, phage infection and acquisition of foreign DNA via multiple mechanisms, including horizontal gene transfer via mobile elements, can change the MTase content. Orphan MTases can effectively bypass the RM system to impart additional functions that methylation controls in unpredictable directions. This leads to individual isolate variation with unique characteristics. Many MTases contain multiple target recognition domains, allowing for methylation of various motifs that ultimately result in numerous variations of phenotypes [13] (Table 1). Orphan MTases regulate gene expression to vary phenotypes within a population and increase survival probability [13].

Assembly and annotation via current homology-based methods identify conserved MTases that include ccrM, dam, and dcm; these are conserved within the Rhizobiales taxon, which contains pathogens and environmental microbes alike. Interestingly, the protein interaction networks used by MTases are not conserved, yet these organisms all have DNA methylation (Figure 1). This predicted interaction network provides insight into the possible molecular configurations used for modification, and also allows postulation of a putative interaction network for these well-studied orphan MTases. This approach lends biological meaning to the conservation of the critical MTases. For example, CcrM is conserved and the network correctly predicts its interaction with the cell cycle protein CtrA, confirming experimental evidence, yet the predicted interacting partners vary among organisms. Although it is beyond the scope of this review to speculate as to the associations between CcrM and the listed interactors for all listed Rhizobiales, it would be interesting to consider the role of ccrM in gene expression of bacteria inhabiting drastically different niches (e.g., Caulobacter vs Brucella). As the STRING network of Brucella abortus ccrM predicts, this orphan MTase is associated with numerous cellular processes and pathways including, but not limited to, replication (dnaA), tryptophan biosynthesis (trpE), hydrogen peroxide resistance (ohr), and response to DNA damage and repair (lexA and radA). All of these are known to be linked to methylation regulation in bacteria generally, yet the estimated methylation-related network is not clearly supported by experimental evidence. As a pathogen well suited for survival in intracellular niches of host phagocytic cells, B. abortus is adapted for survival in the presence of oxidative stress and DNA damage [27]. Although ccrM from B. abortus has yet to be experimentally implicated in the regulation of these pathways, its participation is not inconceivable because methylation frequency increases on exposure to self-induced and environmental stresses [28–30]. SMRT sequencing of Caulobacter, another member of the Rhizobiales taxon, illustrates the diversity of genes under transcriptional regulation by CcrM as well as the dynamic quality of the methylome as a function of growth phase, including fully methylated and hemimethylated states [31,32], as predicted in part using this approach.

Based on sequence homology (Figure 1), dam is highly conserved among members of the Enterobacteriaceae family. The consequences of dam deletion are well studied in Salmonella, with a particular emphasis on the impacts on virulence. Although Δdam deletion mutants created in Salmonella enterica ssp. enterica serovar Typhimurium do not exhibit growth-related deficiencies, Dam-deficient mutants in other Salmonella serotypes exhibit a 10 000-fold increase in LD50 in mice [33]. Transcriptional profiling of Dam-deficient Salmonella attributes attenuation to an induction of spvB, along with >35 other virulence genes, coupled with a reduction in sipABC transcripts [34]. The estimated dam STRING network in Salmonella has numerous experimentally confirmed predictions, providing a

Table 1. Phenotypic changes associated with epigenetic modification of DNA or RNA in bacteria

Phenotype: RM systems
  MTase (recognition site) a: Dam (GATC); Dcm (CCWGG)
  Function: Phage infectivity and host range; RM recognition; Phage head packing

Phenotype: Genome structure and stability
  MTase (recognition site) a: Dam, CcrM (GANTC), Dcm; yhdJ
  Function: DNA replication; Conjugation; DNA structure; Mismatch repair; Histone-like proteins; Nucleoid segregation; Transposition; Single nucleotide polymorphism accumulation

Phenotype: Stress response
  MTase (recognition site) a: Dam, CcrM, DpnM (GATC); yhdJ
  Function: SOS; Redox; pH; Nutrient availability; Osmolarity; Culturability; Persistence

Phenotype: Growth and metabolism
  MTase (recognition site) a: Dam, CcrM, Dcm; yhdJ
  Function: Motility; Biofilm formation; Aggregation; SAM production (i.e., sulfur metabolism); Amino acid catabolism; Lipid production; Co-factor production; Transporter expression

Phenotype: Virulence
  MTase (recognition site) a: Dam, Dcm; yhdJ
  Function: Serotype expression [lipopolysaccharide (LPS) and O-antigen]; Host association and adaptation; Infectious dose; Effector molecule production; Antibiotic resistance; Phase variation

Associated proteins b (in row order):
  Mom, Cre; HsdS; P9; Ori, SeqA, DnaA, CtrA; Lrp, H-NS; RecA; MutHLS, VSR, UvrD; H-NS; SeqA; RNA polymerase, transposase, IS10, IS50, Tn10, IS3, Tn903, Tn5; MutHLS
  Gph, RadA, LexA; OxyR; Pap, CAP, GalK; Hlp; Hlp; StdA, Eae, Tir, Stx, Yops, Omp, Inv, Ail, LPS genes; OxyR, Ag43, ModA13; Ag43; MtnM, SpeD; Nir, MetFK, TrpCDE, Aro, Fol
  StdA, Eae, Tir, Stx, Yops, Omp, Inv, Ail, LPS genes
  StdA, Eae, Tir, Stx, Yops, Omp, Inv, Ail, LPS genes; SPI-1; Yop; Mod, Cmr; Lrp, OxyR, RpoS, Pap

a The recognition site is listed at the first occurrence of the MTase.
b Associated proteins are linked to the specific function in the function column by biological experimentation or found in the predicted interaction network for the associated MTase in Figure 1.
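The recognition sites in Table 1 (GATC, CCWGG, GANTC) use IUPAC degenerate bases, where W stands for A or T and N for any base. As a minimal sketch of how such sites could be located in a sequence, the Python below converts a motif into a regular expression and reports every start position, including overlapping ones. The function names and the example sequence are hypothetical, only one strand is scanned, and finding a site says nothing about whether it is actually methylated.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_to_regex(motif: str) -> "re.Pattern[str]":
    """Compile an IUPAC motif into a lookahead regex so overlaps are reported."""
    body = "".join(IUPAC[base] for base in motif.upper())
    return re.compile("(?=(" + body + "))")


def find_sites(sequence: str, motif: str) -> list:
    """Return 0-based start positions of every motif occurrence on this strand."""
    pattern = motif_to_regex(motif)
    return [match.start() for match in pattern.finditer(sequence.upper())]


# Made-up sequence containing one site for each motif class in Table 1.
sequence = "GATCCAGGATTCGATC"
for name, motif in [("Dam", "GATC"), ("Dcm", "CCWGG"), ("CcrM", "GANTC")]:
    print(name, motif, find_sites(sequence, motif))
```

Site positions found this way could then be compared with modification calls from sequencing to estimate what fraction of candidate sites is modified.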

basis to extrapolate the possibility of Dam involvement in additional cellular functions, including F-pilus (finO and finP), oxidative stress response (gph), and its established role in mismatch repair (mutHLS and uvrD), all of which are also predicted. In contrast to ccrM, the predicted STRING network for this orphan MTase and its interactors is well conserved in Enterobacteriaceae, particularly in pathogens including Escherichia coli, Klebsiella, Shigella, and Citrobacter. This conservation across pathogenic enteric bacteria raises questions about the role of Dam in virulence and survival in the host. Methylome determination using SMRT sequencing [34–36] can be tied to biological relevance with refined predictions of possible biological impact based on the interaction network (Figure 1).

Phosphorothioation modification in bacteria
PT is another recently discovered modification, in which a sulfur atom replaces a non-bridging oxygen in the phosphate backbone. PT-linked dinucleotides include d(APSA), d(APSC), d(CPSA), d(CPSC), d(GPSA), d(GPSG), d(GPST), d(TPSA), and d(TPSC) (Figure 3) [1,37,38]. A primary challenge for newly discovered DNA modifications is routine detection using sequencing strategies. Using an electrophoresis-based genome smearing assay, Ou et al. [39] found that the dnd locus was responsible for the modification and was widely, but not universally, spread around the bacterial phylogenetic tree. Although this method is simple, it is unreliable, which often makes the modification difficult to find. Even though this trait has been known for some time, its biological role in bacteria is not clear. Xie et al. [40] found that the stereospecific sulfur linkage acts as an antioxidant for DNA. When Salmonella was challenged with H2O2, an increase in PT modification was correlated with resistance to the oxidant. Howard et al. [41] found PT modification in nearly half of the Mycobacterium abscessus clinical isolates from the USA. They went on to find that dnd is on a mobile genomic island that they hypothesized to be

[Figure 1: STRING network panels for Dam, CcrM, and Dcm, each with an occurrence plot across Enterobacteriaceae, Gammaproteobacteria, Rhizobiales, and Proteobacteria.]

Figure 1. Predicted protein interaction networks for orphan methyltransferases (MTases). Networks were generated with the STRING v9.1 web service (http://string.embl.de) [11]; the organism used for the seed is specific to each MTase. The title protein in each panel was used as the seed by searching in the protein mode and is indicated by a red ball in the plot. Experimentally verified interactions are indicated by filled stars (★), whereas interactions experimentally verified in a different organism are indicated by filled triangles (▲). The occurrence plot indicates the phylogenetic distribution of the seed gene (red arrow) and the initial interaction partners. Increasing red intensity indicates increasing homology of the specific seed gene and the network members. Dog-eared boxes indicate that some members of the phylogenetic group lack the specific protein. Each occurrence plot is expanded in the most relevant phylogenetic clade, and the near neighbors, where the seed protein was discovered or has the most well-characterized role. Each network is a dynamic link and when clicked on it will lead the reader directly to the interaction network at STRING. Where possible the same organism was used: Dam, Salmonella enterica ssp. enterica serovar Typhimurium LT2 (http://string.embl.de/newstring_cgi/show_network_section.pl?all_channels_on=1&additional_network_nodes=10&interactive=yes&network_flavor=confidence&targetmode=proteins&identifier=99287.STM3484); CcrM, Brucella abortus 9941 (http://string.embl.de/newstring_cgi/show_network_section.pl?all_channels_on=1&additional_network_nodes=10&interactive=yes&network_flavor=evidence&targetmode=proteins&identifier=262698.BruAb1_0513); Dcm, S. Typhimurium LT2 (http://string.embl.de/newstring_cgi/show_network_section.pl?all_channels_on=1&additional_network_nodes=10&interactive=yes&network_flavor=evidence&targetmode=proteins&identifier=99287.STM1992).

related to human disease, but lacked the evidence to directly link it to virulence. Although these findings are exciting and have some basis in biological function, it was not possible to quickly identify the exact location of the PT modification. However, SMRT sequencing can detect this modification along with methylation in a single sequencing run to provide the specific location and density of the PT modification. Additionally, An et al. [42] used an informatics approach to determine that the dnd locus was lacking a critical enzyme (cysteine desulfurase), but that four existed in the genomes. Using a bacterial two-hybrid system, they found that only IscS interacts with dnd to catalyze the PT modification. This is an example of the power of informatics coupled with biological validation to verify specific gene networks. However, it is impractical to conduct this type of study for each isolate at the pace of genome production today. As noted above, it is possible to quickly derive comparative information using advanced sequencing and bioinformatics methods to bring about relevant biological hypotheses. Fully utilizing this information requires knowing the exact base and location in the genome to assess biological importance. Placement of these modifications in relative importance, using large-scale network comparisons to inform the biological function, is on the horizon.

Increased availability, reduction in cost, and detection of DNA modification using SMRT sequencing open the door to linking specific modifications at the base pair level with specific genes and ultimately biological importance. Until recently, discovery of these systems was slow due to the inadequacy

[Figure 2 workflow boxes:
Bacterial culture
DNA extraction
Whole genome sequencing (SMRT, bisulfite labeling)
DNA hydrolysis or digestion (thermal, mild acidic, enzymatic)
Enrichment (size exclusion, liquid extraction, solid phase extraction, preparative LC)
Quantification and identification: chromatography/mass spectrometry (GC/MS, LC/MS, DART, MALDI)
Combined output to provide location, abundance, and type of epigenetic modification in DNA or RNA]

Figure 2. Scheme depicting a pipeline to discover nucleotide modifications using genome sequencing and advanced analytical methods.
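The sequencing arm of this pipeline includes bisulfite labeling, in which unmethylated cytosine is deaminated to uracil while methylated cytosine is left intact. The minimal Python sketch below illustrates the logic of reading methylation status from such data. It is idealized and rests on stated assumptions: conversion is complete, uracil is read as T after amplification, only one strand is considered, and the read is already aligned base-for-base to the reference. Function names and sequences are hypothetical.

```python
def bisulfite_convert(sequence: str, methylated_positions) -> str:
    """Simulate idealized bisulfite treatment of one strand.

    Unmethylated C is reported as T (uracil read as T after amplification);
    C at a methylated position is left unchanged.
    """
    methylated = set(methylated_positions)
    return "".join(
        "T" if base == "C" and index not in methylated else base
        for index, base in enumerate(sequence)
    )


def call_methylated_cytosines(reference: str, converted_read: str) -> list:
    """Return 0-based reference positions where a C is still read as C."""
    if len(reference) != len(converted_read):
        raise ValueError("reference and read must be aligned and equal in length")
    return [
        index
        for index, (ref_base, read_base) in enumerate(zip(reference, converted_read))
        if ref_base == "C" and read_base == "C"
    ]


# Made-up reference with a single methylated cytosine at position 2.
reference = "ACCGTCGA"
read = bisulfite_convert(reference, methylated_positions=[2])
print(read)
print(call_methylated_cytosines(reference, read))
```

Real data would also require handling incomplete conversion, sequencing errors, and both strands, which is why the text notes that bisulfite sequencing relies on bioinformatics to assess the modification and its location.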

of analytical methods required to find the exact location on the genome. However, increased use of large-scale genomic tools that detect these changes will enable rapid discovery of additional systems, especially in cooperation with analytical metabolomics, as new modifications will probably be found that are not immediately identifiable by sequencing. To solve this, one must couple biochemistry and adduct detection with NGS.

DNA modification detection
Emerging sequencing technologies, such as SMRT sequencing, allow high-throughput analysis of known and new DNA modifications with strand specificity and nucleotide level resolution. These technologies also expand our capacity to study the biological significance of these modifications in bacteria. However, sequencing technologies require the genomic location and type of modification. Analytical chemistry is required to define the specific chemical group before it can be directly assigned by sequencing. Three main challenges exist in analyzing DNA for modification: (i) determining the exact base after completely digesting the DNA; (ii) determining the modification group; and (iii) determining the location in the genome. Combining DNA sequencing with mass spectrometry (MS) methods will permit detection and mapping of modifications directly onto the genome location and base.

Multiple approaches were developed to study DNA methylation [3,4]. These methods offer different levels of resolution (global, region- or site-specific, or genome-wide). They are also based on one of three main pretreatment techniques: bisulfite conversion, restriction enzyme digestion, or affinity enrichment. Bisulfite conversion is based on the deamination of cytosine to uracil, leaving methylated cytosine intact. Bisulfite sequencing is a common approach to determine DNA methylation status and also relies on bioinformatics to assess the modification and location of the methyl group [3,4,43–46].

Global DNA modification detection using analytical chemistry tools
After extraction, purified DNA is chemically digested or sheared before MS analysis (Figure 2) [1,5]. After DNA extraction and purification, thermal, chemical, or enzymatic hydrolysis, with nucleases and phosphatases, is used to

Figure 3. Chemical structures of DNA modifications found in bacteria. Abbreviations: m6A, 6-methyladenine; m5C, 5-methylcytosine; m4C, 4-methylcytosine; PT, phosphorothioate nucleotides.


digest DNA and release individual nucleosides or nucleobases. Analysis of nucleosides is more sensitive, accurate, and precise compared with direct analysis of DNA oligonucleotides [5]. The resulting DNA monomers can be enriched using liquid/liquid extraction, solid phase extraction (SPE), immunoaffinity, or preparative liquid chromatography (LC) [5]. Ultrafiltration is an alternative purification step that can eliminate proteins and reagents used during hydrolysis [1]. Low DNA modification frequency (10⁻³ to 10⁻⁸ per base) requires a large amount of DNA (micrograms) for adduct signal detection [1,5]. Recently, a novel TiO2-based SPE strategy was developed to efficiently eliminate ribonucleotide contamination after digestion [47]. This strategy is based on a specific mechanism of interaction between TiO2 and cis-diol containing ribonucleosides, and it eliminates purification steps to remove RNA. A multidimensional platform for the purification of noncoding RNA was recently developed for system level analysis of modified ribonucleosides [48].

Because hundreds of possible DNA epigenetic modifications and stress-induced damage adducts exist, the selection of the optimal analytical equipment is a critical consideration. Methods that require little to no sample preparation or LC separation could be used to screen for new modifications if enough DNA can be obtained. Nucleosides can be directly analyzed using direct analysis in real time (DART) coupled with time-of-flight (TOF) MS [49]. Inductively coupled plasma mass spectrometry (ICP-MS) is also used to quantify, at trace levels, metals, metalloids, and several non-metals that could modify DNA [50]. Codetection of the atoms involved in the modification with phosphorus permits estimation of the rate of modification in a bacterial genome as a proportion of the total genome, because this provides an orthogonal quantitative measure of the phosphate bonds in the genome.
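The codetection idea can be made concrete with a short calculation. The Python sketch below assumes that every atom of the modifying element (sulfur, in the case of PT) measured in the purified DNA originates from the modification, and that DNA carries approximately one phosphate per nucleotide. The function names, the example molar amounts, and the genome size are hypothetical values chosen only to show the arithmetic.

```python
def modification_fraction(modifier_moles: float, phosphorus_moles: float) -> float:
    """Fraction of phosphate linkages carrying the modification.

    Assumes all of the measured modifier element comes from the DNA
    modification itself and none from contaminants.
    """
    if phosphorus_moles <= 0:
        raise ValueError("phosphorus amount must be positive")
    return modifier_moles / phosphorus_moles


def expected_sites_per_genome(fraction: float, genome_size_bp: int) -> float:
    """Approximate number of modified linkages in one double-stranded genome.

    Uses roughly two phosphates per base pair (one per nucleotide per strand).
    """
    return fraction * 2 * genome_size_bp


# Hypothetical co-measured amounts of sulfur and phosphorus in one DNA sample.
fraction = modification_fraction(modifier_moles=3.0e-12, phosphorus_moles=1.0e-8)
sites = expected_sites_per_genome(fraction, genome_size_bp=4_800_000)
print(f"modified fraction of phosphates: {fraction:.1e}")
print(f"approximate modified linkages per genome: {sites:.0f}")
```

Such a ratio gives a genome-wide average only; locating individual modified positions still requires the sequencing-based methods discussed in this review.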
Often, gas chromatography (GC) or LC is used with MS, with LC being widely adopted. GC requires derivatization of DNA monomers to increase their volatility. Tretyakova et al. [5] and Balbo et al. [51] have extensively reviewed analytical instrumentation and sample preparation methods to determine DNA modification. Essentially, MS instruments such as single quadrupoles operated in selected ion monitoring mode and triple quadrupoles can be used to sensitively quantify known modified bases. A sensitive method using LC/MS triple quadrupole was developed to quantify 16 phosphorothioate modifications in bacteria [1]. Ion trap MS can also be used to quantify modifications, but this approach is better suited to obtaining structural information about new DNA modifications. To detect unknown adducts, it is essential to determine the molecular formula from accurate mass data acquired on the novel DNA adducts and to acquire MS/MS fragmentation information for structural elucidation. The new generation of (linear trap or quadrupole) orbital trap and quadrupole-TOF instruments is capable of these analyses. Vibrational spectroscopy could also be used to characterize DNA modifications [52,53]. MicroRaman spectroscopy detects changes in global DNA methylation levels in humans with polymorphism in methylenetetrahydrofolate reductase [52]. This technique directly analyzes undigested DNA extract in a

Trends in Microbiology May 2014, Vol. 22, No. 5

non-destructive manner, but it has yet to be applied to bacterial samples.

Emerging technologies to detect DNA modifications by sequencing

These new technologies enable the simultaneous acquisition of both genomic and epigenomic information at the nucleotide level (reviewed in [6]). Among them, SMRT DNA sequencing technology from Pacific Biosciences (Menlo Park, CA, USA) has produced numerous methylomes simultaneously with genomic DNA sequences in a variety of bacteria [35,54–57]. In SMRT sequencing [6,58], nanostructures (called zero-mode waveguides) are used to isolate, for optical analysis, single molecules of a phi29-derived DNA polymerase. Each polymerase is bound to a single, circular molecule of template DNA. DNA synthesis from the template is performed using nucleotides carrying a distinct fluorophore per base. The incorporation of a nucleotide generates a fluorescent 'pulse', and thousands of reactions from the activity of the DNA polymerase are monitored in parallel, thus generating the primary sequence data. Monitoring of the DNA synthesis kinetics yields two important parameters that can be used to identify the possible presence of modifications: pulse width (PW, the length of time a nucleotide is retained in the active site of the polymerase) and interpulse duration (IPD, the interval of time between nucleotide-bound states). The presence of modifications in template DNA modulates these parameters when compared with an unmodified control template [6,58,59]. Flusberg et al. [59] demonstrated that modified nucleotides (m6A, m5C, and 5-hydroxymethylcytosine) yield characteristic variations in PW and IPD. These kinetic signatures (KVs) were used to identify DNA modifications in metal-reducing Shewanella oneidensis MR-1 [54], Helicobacter pylori [35], Mycoplasma genitalium, Mycoplasma pneumoniae [56], Geobacter metallireducens GS15 [57], Chromohalobacter salexigens [57], Vibrio breoganii [57], Bacillus cereus [57], Campylobacter jejuni [57], and a pathogenic strain of E. coli [55]. SMRT technology was also used to characterize the specificity of several bacterial DNA methyltransferases [60]. Clark et al. [61] also characterized the KVs of damaged bases such as 8-oxoguanine, 8-oxoadenine, O6-methylguanine, 1-methyladenine, O4-methylthymine, 5-hydroxycytosine, 5-hydroxyuracil, 5-hydroxymethyluracil, or thymine dimers. These base modifications were shown to be detectable with single-modification resolution and DNA strand specificity. Circular DNA templates allow analysis of the same modified base multiple times, thus facilitating robust statistical analysis of the sequence and associated kinetics data. SMRT sequencing technology coupled with advanced analytical chemistry tools will enable discovery of new modifications and enzymatic activities. Combining these technologies represents a powerful approach to unravel the complexities of the epigenetic systems in bacteria and the biological roles of these modifications. As sequencing costs continue to decline, the increase in bacterial genome availability and the rapid expansion of analytical chemistry tools will enable new discoveries. Coupling DNA sequencing to analytical chemistry will provide a powerful option for discovery and confirmation of new epigenetic events.
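The comparison of interpulse durations against an unmodified control, described above, can be sketched as a per-position IPD ratio. This Python example is a simplified illustration and not the instrument vendor's analysis pipeline: the function names, the ratio threshold of 2.0, and the toy IPD values are assumptions chosen for demonstration. Each list stands for repeated observations of one template position, as obtained from multiple passes over a circular template.

```python
from statistics import mean


def ipd_ratios(native_ipds, control_ipds):
    """Return {position: mean native IPD / mean control IPD}.

    Both arguments map a template position to a list of IPD observations.
    Positions missing from either data set, or with no observations, are skipped.
    """
    ratios = {}
    for position, native in native_ipds.items():
        control = control_ipds.get(position)
        if not native or not control:
            continue
        control_mean = mean(control)
        if control_mean > 0:
            ratios[position] = mean(native) / control_mean
    return ratios


def candidate_sites(ratios, threshold=2.0):
    """List positions whose IPD ratio meets an (assumed) threshold."""
    return sorted(pos for pos, ratio in ratios.items() if ratio >= threshold)


# Toy data in arbitrary time units; position 102 mimics a slowed polymerase.
native = {101: [0.9, 1.1, 1.0], 102: [4.2, 3.8, 4.0], 103: [1.0, 1.2]}
control = {101: [1.0, 1.0, 1.0], 102: [1.0, 0.9, 1.1], 103: [1.1, 1.1]}

ratios = ipd_ratios(native, control)
print(candidate_sites(ratios))
```

A fixed threshold is used here only to keep the sketch short. A real analysis would apply a statistical test across the repeated observations and take sequence context into account, because kinetics vary with the neighboring bases.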

Concluding remarks

The rapid increase in bacterial genomes enables a new era of discovery and opportunities to redefine epigenetics and its role in the bacterial life cycle. Use of DNA sequencing strategies that produce specific details about the epigenome, such as the exact base modification, the specific site in the genome, and the density of modification, provides information that can be readily linked to function. New DNA adducts from biological activity will probably be discovered at an increasing pace. This requires NGS to be coupled with the study of metabolism using high-throughput metabolomics so as to inform researchers about the molecular changes that directly cause changes in bacterial behavior. Creating bioinformatic tools that integrate the genome sequence, the epigenome, and metabolism is an unmet need that is critical for the future of functional genomics. With the capacity of NGS to provide genomes and with strategic use of additional omics resources, the field of bacterial epigenetics is poised to make breakthroughs in understanding the molecular mechanisms and dynamic changes involved in the bacterial life cycle, including variable events that are not coded in the genome, such as mutation rate, growth rate, metabolic shifts, DNA repair, virulence, environmental persistence, host adaptation, and zoonotic disease transmission. The future challenges are exciting and filled with many opportunities to make new discoveries that define the role of epigenomics in the bacterial life cycle in the era of multi-omics integration.

Acknowledgments

The US-based 100K Pathogen Genome Project thanks the FDA (Washington, DC, USA; especially Steve Musser, Eric Brown, and Marc Allard) and Agilent Technologies (Santa Clara, CA, USA; especially Paul Zavitsanos and the entire food team) for co-founding this group and project well before it was popular or believed that this level of sequencing could be accomplished. Special thanks also go to the 100K Project Steering Committee for their support, guidance, and efforts on behalf of the group. We also thank the entire consortium for funding, strains, and production in genome sequencing. Special thanks go to the UC Davis 100K Genome Project Team for their tireless effort to get the job done.

References

1 Wang, L. et al. (2011) DNA phosphorothioation is widespread and quantized in bacterial genomes. Proc. Natl. Acad. Sci. U.S.A. 108, 2963–2968
2 Loenen, W.A. et al. (2014) Highlights of the DNA cutters: a short history of the restriction enzymes. Nucleic Acids Res. 42, 3–19
3 Fraga, M.F. and Esteller, M. (2002) DNA methylation: a profile of methods and applications. Biotechniques 33, 632, 634, 636–649
4 Mansego, M.L. et al. (2013) Techniques of DNA methylation analysis with nutritional applications. J. Nutrigenet. Nutrigenomics 6, 83–96
5 Tretyakova, N. et al. (2013) Mass spectrometry of structurally modified DNA. Chem. Rev. 113, 2395–2436
6 Korlach, J. and Turner, S.W. (2012) Going beyond five bases in DNA sequencing. Curr. Opin. Struct. Biol. 22, 251–261
7 Wu, D. et al. (2009) A phylogeny-driven genomic encyclopaedia of bacteria and archaea. Nature 462, 1056–1060
8 Jacobsen, A. et al. (2011) The Salmonella enterica pan-genome. Microb. Ecol. 62, 487–504
9 Kanehisa, M. (2002) The KEGG database. Novartis Found. Symp. 247, 91–101 discussion 101–103, 119–128, 244–252
10 Karp, P.D. et al. (2005) Expansion of the BioCyc collection of pathway/genome databases to 160 genomes. Nucleic Acids Res. 33, 6083–6089
11 Franceschini, A. et al. (2013) STRING v9.1: protein–protein interaction networks, with increased coverage and integration. Nucleic Acids Res. 41, D808–D815


12 Roberts, R.J. et al. (2010) REBASE – a database for DNA restriction and modification: enzymes, genes and genomes. Nucleic Acids Res. 38, D234–D236
13 Murphy, J. et al. (2013) Bacteriophage orphan DNA methyltransferases: insights from their bacterial origin, function, and occurrence. Appl. Environ. Microbiol. 79, 7547–7555
14 Vasu, K. and Nagaraja, V. (2013) Diverse functions of restriction-modification systems in addition to cellular defense. Microbiol. Mol. Biol. Rev. 77, 53–72
15 Makarova, K.S. et al. (2013) Comparative genomics of defense systems in archaea and bacteria. Nucleic Acids Res. 41, 4360–4377
16 Lark, C. and Arber, W. (1970) Host specificity of DNA produced by Escherichia coli. 13. Breakdown of cellular DNA upon growth in ethionine of strains with r plus-15, r plus-P1 or r plus-N3 restriction phenotypes. J. Mol. Biol. 52, 337–348
17 Peakman, L.J. et al. (2003) S-Adenosyl methionine prevents promiscuous DNA cleavage by the EcoP1I type III restriction enzyme. J. Mol. Biol. 333, 321–335
18 Labrie, S.J. et al. (2010) Bacteriophage resistance mechanisms. Nat. Rev. Microbiol. 8, 317–327
19 Rao, D.N. et al. (2014) Type III restriction-modification enzymes: a historical perspective. Nucleic Acids Res. 42, 45–55
20 Youell, J. and Firman, K. (2012) Mechanistic insight into type I restriction endonucleases. Front. Biosci. (Landmark Ed.) 17, 2122–2139
21 Pingoud, A. et al. (2005) Type II restriction endonucleases: structure and mechanism. Cell. Mol. Life Sci. 62, 685–707
22 Roberts, R.J. et al. (2003) A nomenclature for restriction enzymes, DNA methyltransferases, homing endonucleases and their genes. Nucleic Acids Res. 31, 1805–1812
23 Liu, G. et al. (2010) Cleavage of phosphorothioated DNA and methylated DNA by the type IV restriction endonuclease ScoMcrA. PLoS Genet. 6, e1001253
24 Wilson, G.G. (1991) Organization of restriction-modification systems. Nucleic Acids Res. 19, 2539–2566
25 Arber, W. (1965) Host-controlled modification of bacteriophage. Annu. Rev. Microbiol. 19, 365–378
26 Ichige, A. and Kobayashi, I. (2005) Stability of EcoRI restriction-modification enzymes in vivo differentiates the EcoRI restriction-modification system from other postsegregational cell killing systems. J. Bacteriol. 187, 6612–6621
27 Tian, M. et al. (2013) Microarray-based identification of differentially expressed genes in intracellular Brucella abortus within RAW264.7 cells. PLoS ONE 8, e67014
28 Kahramanoglou, C. et al. (2012) Genomics of DNA cytosine methylation in Escherichia coli reveals its role in stationary phase transcription. Nat. Commun. 3, 886
29 Mruk, I. and Kobayashi, I. (2014) To be or not to be: regulation of restriction-modification systems and other toxin–antitoxin systems. Nucleic Acids Res. 42, 70–86
30 Aloui, A. et al. (2011) The effect of methylation on some biological parameters in Salmonella enterica serovar Typhimurium. Pathol. Biol. (Paris) 59, 192–198
31 Kozdon, J.B. et al. (2013) Global methylation state at base-pair resolution of the Caulobacter genome throughout the cell cycle. Proc. Natl. Acad. Sci. U.S.A. 110, E4658–E4667
32 Gonzalez, D. et al. (2014) The functions of DNA methylation by CcrM in Caulobacter crescentus: a global approach. Nucleic Acids Res. http://dx.doi.org/10.1093/nar/gkt1352
33 Low, D.A. et al. (2001) Roles of DNA adenine methylation in regulating bacterial gene expression and virulence. Infect. Immun. 69, 7197–7204
34 Garcia-Del Portillo, F. et al. (1999) DNA adenine methylase mutants of Salmonella typhimurium show defects in protein secretion, cell invasion, and M cell cytotoxicity. Proc. Natl. Acad. Sci. U.S.A. 96, 11578–11583
35 Krebes, J. et al. (2014) The complex methylome of the human gastric pathogen Helicobacter pylori. Nucleic Acids Res. 42, 2415–2432
36 Kumar, R. and Rao, D.N. (2013) Role of DNA methyltransferases in epigenetic regulation in bacteria. In Epigenetics: Development and Disease (Kundu, T.K., ed.), pp. 81–102, Springer
37 Wang, L. et al. (2007) Phosphorothioation of DNA in bacteria by dnd genes. Nat. Chem. Biol. 3, 709–710
38 Chen, S. et al. (2010) Twenty years hunting for sulfur in DNA. Protein Cell 1, 14–21

39 Ou, H.Y. et al. (2009) dndDB: a database focused on phosphorothioation of the DNA backbone. PLoS ONE 4, e5132
40 Xie, X. et al. (2012) Phosphorothioate DNA as an antioxidant in bacteria. Nucleic Acids Res. 40, 9115–9124
41 Howard, S.T. et al. (2013) Insertion site and distribution of a genomic island conferring DNA phosphorothioation in the Mycobacterium abscessus complex. Microbiology 159, 2323–2332
42 An, X. et al. (2012) A novel target of IscS in Escherichia coli: participating in DNA phosphorothioation. PLoS ONE 7, e51265
43 Benoukraf, T. et al. (2013) GBSA: a comprehensive software for analysing whole genome bisulfite sequencing data. Nucleic Acids Res. 41, e55
44 Booth, M.J. et al. (2013) Oxidative bisulfite sequencing of 5-methylcytosine and 5-hydroxymethylcytosine. Nat. Protoc. 8, 1841–1851
45 Hebestreit, K. et al. (2013) Detection of significantly differentially methylated regions in targeted bisulfite sequencing data. Bioinformatics 29, 1647–1653
46 Suzuki, M. and Greally, J.M. (2013) Genome-wide DNA methylation analysis using massively parallel sequencing technologies. Semin. Hematol. 50, 70–77
47 Wang, S.T. et al. (2013) TiO2-based solid phase extraction strategy for highly effective elimination of normal ribonucleosides before detection of 2′-deoxynucleosides/low-abundance 2′-O-modified ribonucleosides. Anal. Chem. 85, 10512–10518
48 Chionh, Y.H. et al. (2013) A multidimensional platform for the purification of non-coding RNA species. Nucleic Acids Res. 41, e168
49 Curtis, M. et al. (2010) Direct analysis in real time (DART) mass spectrometry of nucleotides and nucleosides: elucidation of a novel fragment [C5H5O]+ and its in-source adducts. J. Am. Soc. Mass Spectrom. 21, 1371–1381


50 Wrobel, K. et al. (2009) Epigenetics: an important challenge for ICP-MS in metallomics studies. Anal. Bioanal. Chem. 393, 481–486
51 Balbo, S. et al. (2014) DNA adductomics. Chem. Res. Toxicol. 27, 356–366
52 Chapsky, A. et al. (2012) Detection of polymorphism in the methylenetetrahydrofolate reductase gene by Raman spectroscopy. J. Raman Spectrosc. 43, 1083–1088
53 Kelly, J.G. et al. (2011) Characterisation of DNA methylation status using spectroscopy (mid-IR versus Raman) with multivariate analysis. J. Biophotonics 4, 345–354
54 Bendall, M.L. et al. (2013) Exploring the roles of DNA methylation in the metal-reducing bacterium Shewanella oneidensis MR-1. J. Bacteriol. 195, 4966–4974
55 Fang, G. et al. (2012) Genome-wide mapping of methylated adenine residues in pathogenic Escherichia coli using single-molecule real-time sequencing. Nat. Biotechnol. 30, 1232–1239
56 Lluch-Senar, M. et al. (2013) Comprehensive methylome characterization of Mycoplasma genitalium and Mycoplasma pneumoniae at single-base resolution. PLoS Genet. 9, e1003191
57 Murray, I.A. et al. (2012) The methylomes of six bacteria. Nucleic Acids Res. 40, 11450–11462
58 Davis, B.M. et al. (2013) Entering the era of bacterial epigenomics with single molecule real time DNA sequencing. Curr. Opin. Microbiol. 16, 192–198
59 Flusberg, B.A. et al. (2010) Direct detection of DNA methylation during single-molecule, real-time sequencing. Nat. Methods 7, 461–465
60 Clark, T.A. et al. (2012) Characterization of DNA methyltransferase specificities using single-molecule, real-time DNA sequencing. Nucleic Acids Res. 40, e29
61 Clark, T.A. et al. (2011) Direct detection and sequencing of damaged DNA bases. Genome Integr. 2, 10
