bs_bs_banner

Environmental Microbiology Reports (2014) 6(6), 767–775

doi:10.1111/1758-2229.12201

Intra-genomic variation in G + C content and its implications for DNA stable isotope probing Nicholas D. Youngblut and Daniel H. Buckley* Department of Crop and Soil Sciences, Cornell University, 306 Tower Road, Ithaca, NY 14853, USA. Summary Combining deoxyribonucleic acid (DNA-based) stable isotope probing (DNA-SIP) with high-throughput sequencing provides a powerful culture-independent means to link microbial metabolic function to genomic information and taxonomic identity. DNA buoyant density (BD) in isopycnic gradients is dependent on both isotope incorporation and G + C content. G + C content varies across a genome but is constrained at rrn operons; hence, the ability to resolve isotopically labelled DNA from unlabelled DNA in SIP may vary between small subunitribosomal nucleic acid (SSU rRNA) amplicon and shotgun-read sequencing applications. We tested this hypothesis by evaluating the G + C content of genomic DNA fragments that encompassed either an SSU rRNA template (‘amplicon-fragments’) or a shotgun read template (‘shotgun-fragments’). We find that, contrary to expectations, the BD distribution of amplicon-fragments is non-normal and can be highly skewed. Furthermore, the BD distribution of amplicon-fragments can differ substantially from that of shotgun-fragments from the same genome. Our findings demonstrate the impact of G + C content on the downstream applications of DNA-SIP, which will aid in proper experimental design and the development of statistical tests to accurately identify sequences derived from isotopically labelled DNA. Introduction To understand how changes in microbial diversity influence biogeochemical cycling, it is necessary to link the metabolic functioning of microorganisms to their taxonomic identities (Radajewski et al., 2000; Neufeld et al., 2007c). Most microbial taxa have yet to be cultivated (Staley and Konopka, 1985), hindering the study of

Received 14 May, 2014; accepted 8 August, 2014. *For correspondence. E-mail [email protected]; Tel. (607) 255 1716; Fax (607) 255-2644.

© 2014 Society for Applied Microbiology and John Wiley & Sons Ltd

microbial physiological diversity through traditional culture-dependent techniques. In addition, metabolic characteristics determined for isolates cultivated in controlled laboratory environments may not reflect their in situ activities. Stable isotope probing (SIP) of nucleic acids (i.e. DNA-SIP and RNA-SIP) is now a well-developed culture-independent technique for linking metabolic functioning to taxonomic identity in experimental systems that approximate in situ conditions (Radajewski et al., 2003; Dumont and Murrell, 2005; Friedrich, 2006; Neufeld et al., 2007a,b,c). The technique has seen a diverse range of applications, which include the assessment of 13 C-methanol, 13C-cellulose or 15N2 assimilation in soils (Radajewski et al., 2000; Buckley et al., 2007; El Zahar Haichar et al., 2007), 13C-methane in sediments (Dumont et al., 2011), 13C-napthalene in a shallow aquifer (Jeon et al., 2003) and 13C-15N-threonine in the mouse gut (Berry et al., 2013). The basic premise of DNA-SIP is to exploit differences in the buoyant density (BD) of isotopically labelled (‘heavy’) and unlabelled (‘light’) DNA. However, the interpretation of DNA-SIP experiments is convoluted by a number of factors that influence the BD of labelled DNA. For instance, incomplete labeling of DNA can result from isotope dilution, which is often circumvented using concentrations of labelled substrate far exceeding in situ levels (Radajewski et al., 2000). The choice of isotopic label (e.g. 15N, 13C, 18O) will also impact the potential BD shift of labelled DNA (Birnie and Rickwood, 1978; Schwartz, 2007). In addition, cross-feeding and trophic cascades can result in secondary labeling of organisms (Lueders et al., 2004; 2006). Hence, interpreting the outcome of SIP experiments requires a thorough understanding of factors that impact the BD of DNA. A DNA-SIP experiment is further complicated by the direct relationship between DNA G + C content and the BD of DNA (Rolfe and Meselson, 1959; Sueoka, 1959). For double-stranded DNA, this relationship can be described by the equation ρ = 0.098[G + C] + 1.66, where ρ is buoyant density in g ml−1 and ‘[G + C]’ is the mole fraction of G + C content (Birnie and Rickwood, 1978). Microbial genome G + C content ranges from low G + C endosymbionts such as Candidatus Zinderia insecticola (13.5% G + C) (McCutcheon and Moran, 2010) to high G + C taxa such as Cellulomonas fimi (74.7% G + C)

DNA-SIP and intra-genomic variation in G + C content (Christopherson et al., 2013). Consequently, unlabelled genomic DNA (gDNA) from a complex community could span a density range of up to 0.06 g ml−1. However, 100% assimilation of the commonly used isotopes 13C and 15N can only produce shifts in BD of 0.036 and 0.016 g ml−1 respectively (Birnie and Rickwood, 1978). G + C content can also vary considerably across a microbial genome, with significant contrasts of G + C content (e.g. 5–15%) often occurring across relatively short regions (i.e. 1–50 kb) (Bernaola-Galván et al., 2004; Navarre et al., 2006). Thus, individual gDNA fragments can vary substantially in G + C content relative to the whole genome G + C, with variance increasing as fragment size decreases. Ideally, the complication of G + C content heterogeneity can be accounted for by employing controls using exclusively unlabelled substrate. Deoxyribonucleic acid fragments from taxa that have assimilated the labelled substrate will display a detectible shift in BD relative to the control (Radajewski et al., 2000). However, accurately detecting this shift may be convoluted by experimental noise due to incomplete labelling of DNA, lack of sequencing depth or the application of methods that have low taxonomic resolution (as described in Chen and Murrell, 2010). Combining DNA-SIP with high-throughput sequencing makes it possible to map DNA comprehensively across both control and treatment gradients and identify all taxa that assimilate the isotopic label. Targeted sequencing of the 16S rRNA gene can be used to link thousands of taxa to their in situ activities in SIP experiments, whereas shotgun-metagenomic sequencing can provide access to their genome information (Chen and Murrell, 2010). Recently, Zemb and colleagues developed a mathematical model that utilizes high-throughput sequencing data to identify taxa that have assimilated isotopically labelled substrate and also to quantify the amount of label assimilated by each taxon (Zemb et al., 2012). A key assumption of this model is that the BD of small subunitribosomal ribonucleic acid (SSU rRNA) fragments is normally distributed for each taxon. This assumption may hold for the RNA-SIP data analysed by Zemb and colleagues, as G + C content is constrained within SSU-rRNA. However, DNA-SIP involves the gradient fractionation of gDNA fragments and not transcripts. DNASIP applications that analyse SSU rRNA gene amplicons are dependent on the G + C content of gDNA fragments that encompass the amplicon (i.e. ‘amplicon-fragments’) rather than the G + C content of the amplicon itself. The BD of an amplicon-fragment is a function of both the G + C content of the rrn operon, which is constrained, as well as the G + C content of flanking genomic regions, which are more variable in G + C content. In contrast, DNA-SIP applications utilizing a shotgun metagenomic

768

approach target random gDNA fragments (i.e. ‘shotgunfragments’) that vary in G + C content as a function of the G + C skew of the genome. We therefore hypothesize that the BD distribution of amplicon-fragments and shotgunfragments from the same genome can vary substantially. In this study, we use simulations to explicitly test this hypothesis. Our findings will help guide DNA-SIP experimental design and the development of robust mathematical models for quantifying the assimilation of isotopically labelled substrate. Experimental design We used a set of 1161 and 121 bacterial and archaeal genomes to determine how intra-genomic variation in G + C content affects the potential BD distribution of gDNA fragments across a cesium chloride (CsCl) gradient in a DNA-SIP experiment. These genomes were selected to represent every species with at least one genome in GenBank (NCBI; Benson et al., 2008). The National Center for Biotechnology Information Taxonomy database was used to determine species classifications. In the case when multiple genome entries were available for a species, a pseudo-random number generator was used to select one genome to represent the species. We compared the BD distribution of genome fragments harbouring either the template of a 16S rRNA gene amplicon (‘amplicon-fragments’) or a shotgun read (‘shotgunfragments’) (Fig. S1). Amplicons from 16S rRNA genes were created by in silico polymerase chain reaction with modified versions of the primers 515F and 927R (forward: 5′-GTGYCAGCMGCMGCGGTRA-3′; reverse: 5′-CCGYC AATTYMTTTRAGTTT-3′), which produced amplicons for 1061 and 101 bacterial and archaeal genomes respectively. This same subset of genomes was used for shotgun read simulations. Altogether, these genomes encompassed 27 bacterial and four archaeal phyla. Genome G + C content among bacterial genomes varied from 13% (Candidatus Zinderia insecticola CARI) to 74% (Anaeromyxobacter dehalogenans 2CP-C), whereas archaeal genomes varied from 27% (Methanosphaera stadtmanae DSM 3091) to 67% (Halobacterium sp. NRC1). GRINDER v0.5.3 (Angly et al., 2012) was used to simulate 1000 amplicon reads and 10 000 shotgun reads per genome, with amplicon-fragments and shotgunfragments randomly selected to span each sequencing read template (Fig. S1). Genome fragment size was selected from a distribution that approximated the gDNA fragment sizes that result from bead beating cell lysis methods, in which fragment sizes approximate a leftskewed normal distribution with a mean of ∼ 12–13 kb (Figs S2 and S3) (e.g. Kauffmann et al., 2004; Roh et al., 2006; Thakuria et al., 2008). We also used a fragment size distribution with a mean of approximately 6 kb to

© 2014 Society for Applied Microbiology and John Wiley & Sons Ltd, Environmental Microbiology Reports, 6, 767–775

769 N. D. Youngblut and D. H. Buckley simulate a more intense bead beating method (Figs S2 and S3) (Miller et al., 1999). The start location of each genome fragment was selected from a uniform distribution constrained by the fragment size with the requirement that each fragment encompass the sequencing read template (amplicon or shotgun). The time to reach equilibrium BD for DNA fragments in a CsCl gradient rises sharply at small fragment sizes (Fig. S4), limiting the methodological feasibility of reaching equilibrium for fragments of these sizes (Birnie and Rickwood, 1978). Therefore, we simulated a size selection step to obtain fragments ≥ 4 kb prior to gradient fractionation, which would require ∼ 66 h to reach equilibrium with a TLA1110 rotor, at 55 000 rpm, an average gradient buoyant density of 1.7 and no intercalating agents (Buckley et al., 2007). The GITHUB repository located at https://github.com/ nyoungb2/deltaGC contains (i) IPYTHON notebooks detailing the analysis pipeline, (ii) all PERL scripts developed for this study and (iii) a text file containing the accession numbers of all genomes used in this study. The R packages (R Core Team, 2014) and PERL modules used include: GGPLOT2 v0.9.3.1 (Wickham, 2009), GRIDEXTRA v0.9.1 (Auguie, 2012), DATA.TABLE v1.9.2 (Dowle et al., 2014), and BIOPERL v1.006901 (Stajich et al., 2002).

Results and discussion G + C conservation in rrn operons We first explored G + C content of genomic neighbourhoods surrounding rrn operons by performing sliding window calculations of G + C content for each genome. For this analysis, all 16S rRNA gene copies in each genome were identified with RNAMMER v1.2 (Lagesen et al. 2007), which has been shown to accurately identify 16S rRNA genes in complete microbial genomes (Kembel et al. 2012). Bacterial genomes sampled here possessed on average 3.7 16S rRNA gene copies, with a maximum of 15 copies. In contrast, archaeal genomes possessed an average of 1.6 16S rRNA gene copies, with a maximum of four. Across all bacterial genomes, we found rrn operons to be highly conserved in G + C content relative to the flanking genomic regions (Fig. 1A). The 16S rRNA gene had a median G + C content of 54.5% ± 2.9% (SD), whereas the downstream 23S rRNA gene had a median G + C content of 52.5% ± 3% (SD) (Fig. 1A). As previously observed (Young et al. 1979), the G + C content of the 16S-23S ITS region was more AT rich than the flanking genes, with a median G + C content of 44.1% ± 8.8% (SD). The region immediately upstream of the 16S rRNA gene was also AT rich, likely caused by the UP element in the upstream activation region (Arnvig et al., 2005). Both upstream and down-

stream regions flanking the rrn operon were highly variable in G + C content, with median values slightly below 50% and standard deviations of ∼ 12% (Fig. 1A). Similar results were obtained for the archaeal genomes (Fig. S5). Examination of G + C content around the rrn operons in individual bacterial genomes with low, medium or high G + C content (Clostridium ljungdahii DSM 13528, Escherichia coli APEC 078 and Streptomyces griseus NBRC 13350 respectively) revealed significant intragenomic variability of the upstream and downstream regions flanking different copies of the operon (Fig. 1B). As depicted by Lactobacillus gasseri ATCC 33323, a subset of rrn operons can occur ina tandem arrangement (e.g. operons 2 and 5 as indicated by arrows in Fig. 1B), which caused the G + C content of flanking DNA to vary substantially among rrn operons. For both high and low G + C genomes, tandem rrn operons could considerably alter the G + C content of each rrn operon’s genomic neighbourhood (Fig. 1B). The occurrence of tandem rrn operons varied between domains, with 11.9% of bacterial genomes and only 1.7% of archaeal genomes possessing rrn operons within 5 kb of each other. The conservation of G + C content for the rrn operon, along with substantial variability in G + C content of regions flanking the operon, indicated that factors such as fragment size and position in the genome could substantially influence the G + C content of fragments encompassing the template for a 16S rRNA gene amplicon. For instance, fragments extending from the 16S rRNA gene amplicon template to the upstream region of the rrn operon would likely show more G + C content variation than fragments falling entirely within the rrn operon. In contrast, shotgun reads do not have the constraint of encompassing the 16S rRNA gene template, and thus their G + C content and BD profile in a CsCl gradient may not reflect the G + C content and BD profile of amplicon-fragments. Fragment G + C content conservation We calculated the BD of simulated amplicon-fragments and shotgun-fragments for the taxa shown in Fig. 1B and plotted their BD distributions in a CsCl gradient (Fig. 2). Amplicon-fragment BD distributions differed substantially from shotgun-fragment BD distribution (Fig. 2). The amplicon-fragment BD distributions were biased by the G + C constraint imposed by the presence of the rrn operon (Fig. 1), and this bias was greater for high and low G + C genomes (Figs 1 and 2). This bias resulted in amplicon-fragments having G + C distributions (and corresponding BD distributions) that ranged more towards 50% G + C than those of corresponding shotgunfragments (Fig. 2).

© 2014 Society for Applied Microbiology and John Wiley & Sons Ltd, Environmental Microbiology Reports, 6, 767–775

DNA-SIP and intra-genomic variation in G + C content

770

Fig. 1. A. Sliding window analysis demonstrates conservation in G + C content of rrn operons across 1061 bacterial genomes. The window size was 500 bp with a jump size of 100 bp. Points indicate median G + C content of all genomes and vertical lines signify the standard deviation. B. Select examples of G + C content surrounding rrn operons for four genomes including examples from low and high G + C content genomes. Different line colours are used to identify different rrn operons. Red arrows highlight examples of tandem rrn operons. The strains used in the analysis are as follows: C. ljungdahii, Clostridium ljungdahii (DSM13528); E. coli, Escherichia coli (APEC078); S. griseus, Streptomyces griseus (NBRC13350); L. gasseri, Lactobacillus gasseri (ATCC33323).

To assess the fragment BD distributions of all bacterial and archaeal genomes, we grouped ampliconfragment and shotgun-fragment BD values into bins of 0.001 g ml−1 and determined counts per bin (Fig. 3, Figs S6 and S7). The amplicon-fragment BD distributions deviated from normality (Fig. 3), which reflected our initial observations of individual taxa (Fig. 2). For bacteria, the BD range spanned by amplicon-fragments from individual genomes differed considerably among taxa, with fragments spanning BD ranges varying from 0.002 g ml−1 up to 0.028 g ml−1. The range spanned was generally greatest for low and high G + C taxa, which can be explained by the difference in G + C content between the entire genome and the rrn operon (Fig. 3). In contrast, BD values for shotgun-fragments from each genome largely followed a normal distribution (median kurtosis = 5.99), although most genomes contained a minor component of shotgun-fragments with a secondary

distribution that skewed towards 50% G + C content (median absolute skewness = 0.98). These data suggest that a subset of shotgun-fragments derived from genomic islands, rrn operons, or other aberrant genomic regions that tended to favour a balanced G + C content. In contrast, the positional constraint of the amplicon-fragments manifested as BD distributions dependent on: (i) fragment overlap with the amplicon template, (ii) presence of tandem rrn operons and (iii) the G + C content of the genomic neighborhood to the rrn operon. Amplicon-fragments from archaeal genomes showed less intra-genomic variance in BD compared with bacterial genomes (Fig. 4 and Fig. S6). The lower rrn copy number per genome likely played a role. In contrast, the shotgunfragment BD distributions of archaeal genomes largely reflected those from bacterial genomes (Fig. S6). To simulate the effects of shearing during DNA extraction, we decreased the size distribution of fragments by

© 2014 Society for Applied Microbiology and John Wiley & Sons Ltd, Environmental Microbiology Reports, 6, 767–775

771 N. D. Youngblut and D. H. Buckley

Fig. 2. The BD distributions of gDNA fragments vary between amplicon-fragments (red) and shotgun-fragments (blue) and the variance changes in relation to overall genome G + C content (dashed vertical lines). As in Fig. 1B, the strains used in the analysis are as follows: C. ljungdahii, Clostridium ljungdahii (DSM13528); E. coli, Escherichia coli (APEC078); S. griseus, Streptomyces griseus (NBRC13350); L. gasseri, Lactobacillus gasseri (ATCC33323).

approximately half (mean of ∼ 6 kb; Figs S3 and S4). Decreasing the fragment size distribution significantly reduced the intra-genomic variance in amplicon-fragment BD for both bacteria and archaea (one-sided t-test; p-value < 0.004; Fig. 4). Conversely, the intra-genomic variance in shotgun-fragment BD significantly increased for both domains (one-sided t-test; p-value < 1e-9; Fig. 4). These data show that gDNA fragment size can significantly affect both amplicon-fragment and shotgunfragment BD variability. Thus, careful consideration should be given to the fragment size distribution of gDNA for DNA-SIP experiments that employ amplicon sequencing or shotgun metagenomics. We used our predicted amplicon-fragment and shotgun-fragment BD distributions to quantitatively assess the effectiveness of SIP for resolving isotopically labelled from unlabelled DNA. We focused on 13C and 15 N, given their popularity and applicability. The shift in buoyant density is a linear function of percent isotope incorporation, with 100% incorporation of 13C or 15N producing a shift of 0.036 and 0.016 g ml−1 respectively (Birnie and Rickwood, 1978; Buckley et al., 2007). We plotted the theoretical BD distribution of fragments with 100% incorporation of either substrate alongside the BD distribution of unlabelled DNA (on the right in Fig. 3, Figs S6 and S7). Of note, the distributions representing 100% incorporation of 15N are nearly equivalent

Fig. 3. Shotgun-fragments have more potential than amplicon-fragments for colocalization of isotopically labelled and unlabelled DNA in isopycnic gradients. The heat maps depict BD distributions predicted for amplicon-fragments and shotgun-fragments from 1061 bacterial genomes. The colour scale indicates the number of gDNA fragments predicted at each BD value (bin size of 0.001 g ml−1). Kernel density curves on the right show the BD distributions of all fragments under a scenario of no label incorporation (blue), 100% incorporation of 15N (red) and 100% incorporation of 13C (green). Overlap between the curves indicates potential for isotopically labelled and unlabelled DNA to coexist at the same BD value.

© 2014 Society for Applied Microbiology and John Wiley & Sons Ltd, Environmental Microbiology Reports, 6, 767–775

DNA-SIP and intra-genomic variation in G + C content

Fig. 4. Shotgun-fragments had greater BD range than amplicon-fragments in BD for each genome. Also, the BD range for amplicon-fragments decreased as fragment size decreased, while the converse was true for shotgun-fragments. The values plotted represent the BD range of all fragments from each bacterial and archaeal genome. The horizontal line represents the median value. Boxes indicate the interquartile range (IQR) (25th to 75th percentiles). Whiskers indicate 1.5 * IQR, and points represent values > 2.5 * IQR. The intra-genomic fragment BD range of amplicon-fragments decreased significantly with a decreased fragment size distribution for bacterial genomes (one-sided t-test; Cohen’s d = 0.46; p-value < 2.2e-16) and archaeal genomes (one-sided t-test; Cohen’s d = 0.38; p-value < 0.004). Alternatively, the intra-genomic fragment BD range of shotgun-fragments increased significantly with a decreased fragment size distribution for bacterial genomes (one-sided t-test; Cohen’s d = −0.71; p-value < 2.2e-16) and archaeal genomes (one-sided t-test; Cohen’s d = −0.88; p-value < 1e-9).

to a 50% incorporation of 13C, since the BD shift is linear with increasing substrate incorporation. We found unlabelled amplicon-fragments to be almost completely separated from 100% 13C-labelled fragments (0.8% mean overlap of fragments), regardless of fragment size distribution or domain (Table 1). Thus, only unlabelled amplicon-fragments from very high G + C organisms

772

should collocate with 100% 13C-labelled ampliconfragments from very low G + C organisms. Shotgunfragments showed a substantially higher fragment overlap (mean = 28.3%). The lower possible BD shift from 15N incorporation resulted in considerably greater fragment overlaps for both amplicon-fragments and shotgun-fragments, with an average 28.3% and 56.1% of fragments overlapping for amplicon-fragments and shotgun-fragments respectively. Although this issue of poor resolution with 15N-SIP has previously been observed (Cadisch et al., 2005), and improvements to the resolution have been developed (Buckley et al., 2007), our simulations provide an explicit description of the resolving power for 15N-SIP coupled to the downstream applications of 16S rRNA gene amplicon and shotgun metagenomic sequencing. Furthermore, our 15NSIP simulations also depict a scenario of ∼ 50% 13C assimilation, which can readily occur in a DNA-SIP experiment due to short incubation times, cross-feeding, trophic cascades or using low concentrations of isotopically labelled substrate. In summary, we have provided a theoretical depiction of how intra-genomic variation in G + C content governs the BD distribution of DNA fragments in isopycnic gradients. By extrapolating from known characteristics of DNA, and contrary to expectations, we calculate that 16S rRNA genes are not normally distributed across BD gradients (Fig. 2). Furthermore, the BD distribution of genome fragments containing 16S rRNA gene amplicons is different from the BD distribution of the rest of the genome (Fig. 2). This has important implications for SIP studies that seek to couple information from 16S rRNA gene amplicons to information obtained from functional genes or shotgun metagenomic DNA. Based on our findings, we provide the following recommendations and cautions. First, it is clear that 16S rRNA gene amplicon-fragments can have a skewed BD distribution that causes fragment relative abundance for unlabelled DNA to increase in heavier fractions (e.g. L. gasseri in Fig. 2). Hence, comparison of amplicon relative abundance between a ‘heavy’ fraction

Table 1. Overlap in BD of unlabelled, 100% 15N-labelled or 100% 13C-labelled DNA fragments. Fragment overlap was quantified as the intersection between histograms of BD values for labelled and unlabelled fragments (Fig. 3).

Domain

Mean fragment size (kb)

Fragment type

Fragment overlap (%; unlabelled and

Bacteria Bacteria Bacteria Bacteria Archaea Archaea Archaea Archaea

12.5 12.5 6 6 12.5 12.5 6 6

amplicon shotgun amplicon shotgun amplicon shotgun amplicon shotgun

33.9 56.8 23.5 57.3 29.5 54.3 26.2 55.9

15

N)

© 2014 Society for Applied Microbiology and John Wiley & Sons Ltd, Environmental Microbiology Reports, 6, 767–775

Fragment overlap (%; unlabelled and 2.1 19.9 0.3 20.4 0.4 10.2 0.3 11.3

13

C)

773 N. D. Youngblut and D. H. Buckley and a ‘light’ fraction cannot provide conclusive evidence for isotope incorporation in DNA-SIP experiments. Evidence for isotopic incorporation must be obtained by comparing identical BD fractions between isotopically enriched treatments and unlabelled controls. Furthermore, it is clear that isotopically labelled DNA can vary substantially in BD and therefore occupies a wide range of gradient fractions encompassing both ‘heavy’ and ‘light’ fractions (Fig. 3). As a result, we recommend that SIP experiments analyse a wide BD range for both isotopically enriched treatments and unlabelled controls. Second, care should be exercised when analysing the BD distribution of 16S rRNA genes in relation to functional genes or metagenomic sequences, since these DNA fragments can have very different density profiles in isopyncnic gradients (Figs 2 and 3, Figs S6 and S7). Third, 16S rRNA gene amplicon-fragments have a non-normal BD distribution in isopycnic gradients, and hence analytical models that assume a normal distribution (e.g. Zemb et al., 2012) are inappropriate for analysis of DNA-SIP data. Fourth, DNA fragment size impacts variance in genome fragment G + C content and therefore also impacts variance in the BD of genomic DNA. Larger DNA fragments will reduce BD variance of genomic DNA; in contrast, smaller DNA fragments reduce BD variance for 16S rRNA gene amplicon fragments. The benefits of smaller DNA fragments for analysis of 16S rRNA gene amplicons will be offset by the fact that reducing DNA fragment size increases time to reach equilibrium in isopycnic gradients (Fig. S2). Although rotor geometry and centrifuge run conditions alter the time required for DNA to reach equilibrium (Birnie and Rickwood, 1978), we expect that DNA fragments of 2–6 kbp are likely optimal for most analyses of 16S rRNA genes in SIP experiments.

Acknowledgements This material is based upon work supported by the Department of Energy Office of Science, Office of Biological & Environmental Research Genomic Science Program under Award Numbers DE-SC0004486 and DE-SC0010558. This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied or assumes any legal liability or responsibility for the accuracy, completeness or usefulness of any information, apparatus, product or process disclosed or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process or service by trade name, trademark, manufacturer or otherwise does not necessarily constitute or imply its endorsement, recommendation or favoring by the United States Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.

References Angly, F.E., Willner, D., Rohwer, F., Hugenholtz, P., and Tyson, G.W. (2012) Grinder: a versatile amplicon and shotgun sequence simulator. Nucleic Acids Res 40: e94. Arnvig, K.B., Gopal, B., Papavinasasundaram, K.G., Cox, R.A., and Colston, M.J. (2005) The mechanism of upstream activation in the rrnB operon of Mycobacterium smegmaaRtis is different from the Escherichia coli paradigm. Microbiology 151: 467–473. Auguie, B. (2012) gridExtra: functions in Grid graphics. R package version 0.9.1. URL http://CRAN.R-project.org/ package=gridExtra Benson, D.A., Karsch-Mizrachi, I., Lipman, D.J., Ostell, J., and Wheeler, D.L. (2008) GenBank. Nucleic Acids Res 36: D25–D30. Bernaola-Galván, P., Oliver, J.L., Carpena, P., Clay, O., and Bernardi, G. (2004) Quantifying intrachromosomal GC heterogeneity in prokaryotic genomes. Gene 333: 121–133. Berry, D., Stecher, B., Schintlmeister, A., Reichert, J., Brugiroux, S., Wild, B., et al. (2013) Host-compound foraging by intestinal microbiota revealed by single-cell stable isotope probing. Proc Natl Acad Sci USA 110: 4720– 4725. Birnie, G.D., and Rickwood, D. (1978) Centrifugal Separations in Molecular and Cell Biology. Boston, MA, USA: Butterworths. Buckley, D.H., Huangyutitham, V., Hsu, S.-F., and Nelson, T.A. (2007) Stable isotope probing with 15N achieved by disentangling the effects of genome G + C content and isotope enrichment on DNA density. Appl Environ Microbiol 73: 3189–3195. Cadisch, G., Espana, M., Causey, R., Richter, M., Shaw, E., Morgan, J.A.W., et al. (2005) Technical considerations for the use of 15N-DNA stable-isotope probing for functional microbial activity in soils. Rapid Commun Mass Spectrom 19: 1424–1428. Chen, Y., and Murrell, J.C. (2010) When metagenomics meets stable-isotope probing: progress and perspectives. Trends Microbiol 18: 157–163. Christopherson, M.R., Suen, G., Bramhacharya, S., Jewell, K.A., Aylward, F.O., Mead, D., and Brumm, P.J. (2013) The genome sequences of Cellulomonas fimi and ‘Cellvibrio gilvus’ reveal the cellulolytic strategies of two facultative anaerobes, transfer of ‘Cellvibrio gilvus’ to the genus Cellulomonas, and proposal of Cellulomonas gilvus sp. nov. PLoS ONE 8: e53954. Dowle, M., Short, T., Lianoglou, S., Srinivasan, A., with contributions from Saporta, R., and Antonyan, E. (2014) data.table: Extension of data.frame. R package version 1.9.2. URL http://CRAN.R-project.org/package=data.table Dumont, M.G., and Murrell, J.C. (2005) Stable isotope probing – linking microbial identity to function. Nat Rev Microbiol 3: 499–504. Dumont, M.G., Pommerenke, B., Casper, P., and Conrad, R. (2011) DNA-, rRNA- and mRNA-based stable isotope probing of aerobic methanotrophs in lake sediment. Environ Microbiol 13: 1153–1167. El Zahar Haichar, F., Achouak, W., Christen, R., Heulin, T., Marol, C., Marais, M.-F., et al. (2007) Identification of cellulolytic bacteria in soil by stable isotope probing. Environ Microbiol 9: 625–634.

© 2014 Society for Applied Microbiology and John Wiley & Sons Ltd, Environmental Microbiology Reports, 6, 767–775

DNA-SIP and intra-genomic variation in G + C content Friedrich, M.W. (2006) Stable-isotope probing of DNA: insights into the function of uncultivated microorganisms from isotopically labeled metagenomes. Curr Opin Biotechnol 17: 59–66. Jeon, C.O., Park, W., Padmanabhan, P., DeRito, C., Snape, J.R., and Madsen, E.L. (2003) Discovery of a bacterium, with distinctive dioxygenase, that is responsible for in situ biodegradation in contaminated sediment. Proc Natl Acad Sci USA 100: 13591–13596. Kauffmann, I.M., Schmitt, J., and Schmid, R.D. (2004) DNA isolation from soil samples for cloning in different hosts. Appl Microbiol Biotechnol 64: 665–670. Kembel, S.W., Wu, M., Eisen, J.A., and Green, J.L. (2012) Incorporating 16S gene copy number information improves estimates of microbial diversity and abundance. PLoS Comput Biol 8: e1002743. Lagesen, K., Hallin, P., Rødland, E.A., Stærfeldt, H.-H., Rognes, T., and Ussery, D.W. (2007) RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res 35: 3100–3108. Lueders, T., Wagner, B., Claus, P., and Friedrich, M.W. (2004) Stable isotope probing of rRNA and DNA reveals a dynamic methylotroph community and trophic interactions with fungi and protozoa in oxic rice field soil. Environ Microbiol 6: 60–72. Lueders, T., Kindler, R., Miltner, A., Friedrich, M.W., and Kaestner, M. (2006) Identification of bacterial micropredators distinctively active in a soil microbial food web. Appl Environ Microbiol 72: 5342–5348. McCutcheon, J.P., and Moran, N.A. (2010) Functional convergence in reduced genomes of bacterial symbionts spanning 200 my of evolution. Genome Biol Evol 2: 708– 718. Miller, D.N., Bryant, J.E., Madsen, E.L., and Ghiorse, W.C. (1999) Evaluation and optimization of DNA extraction and purification procedures for soil and sediment samples. Appl Environ Microbiol 65: 4715–4724. Navarre, W.W., Porwollik, S., Wang, Y., McClelland, M., Rosen, H., Libby, S.J., and Fang, F.C. (2006) Selective silencing of foreign DNA with low GC content by the H-NS protein in Salmonella. Science 313: 236–238. Neufeld, J.D., Dumont, M.G., Vohra, J., and Murrell, J.C. (2007a) Methodological considerations for the use of stable isotope probing in microbial ecology. Microb Ecol 53: 435–442. Neufeld, J.D., Vohra, J., Dumont, M.G., Lueders, T., Manefield, M., Friedrich, M.W., and Murrell, J.C. (2007b) DNA stable-isotope probing. Nat Protoc 2: 860–866. Neufeld, J.D., Wagner, M., and Murrell, J.C. (2007c) Who eats what, where and when? Isotope-labelling experiments are coming of age. ISME J 1: 103–110. R Core Team (2014) R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. URL http://www.R-project.org Radajewski, S., Ineson, P., Parekh, N.R., and Murrell, J.C. (2000) Stable-isotope probing as a tool in microbial ecology. Nature 403: 646–649. Radajewski, S., McDonald, I.R., and Murrell, J.C. (2003) Stable-isotope probing of nucleic acids: a window to the function of uncultured microorganisms. Curr Opin Biotechnol 14: 296–302.

774

Roh, C., Villatte, F., Kim, B.-G., and Schmid, R.D. (2006) Comparative study of methods for extraction and purification of environmental DNA from soil and sludge samples. Appl Biochem Biotechnol 134: 97–112. Rolfe, R., and Meselson, M. (1959) The relative homogeneity of microbial DNA. Proc Natl Acad Sci USA 45: 1039–1043. Schwartz, E. (2007) Characterization of growing microorganisms in soil by stable isotope probing with H218O. Appl Environ Microbiol 73: 2541–2546. Stajich, J.E., Block, D., Boulez, K., Brenner, S.E., Chervitz, S.A., Dagdigian, C., et al. (2002) The Bioperl toolkit: Perl modules for the life sciences. Genome Res 12: 1611– 1618. Staley, J.T., and Konopka, A. (1985) Measurement of in situ activities of nonphotosynthetic microorganisms in aquatic and terrestrial habitats. Annu Rev Microbiol 39: 321– 346. Sueoka, N. (1959) A statistical analysis of deoxyribonucleic acid distribution in density gradient centrifugation. Proc Natl Acad Sci USA 45: 1480–1490. Thakuria, D., Schmidt, O., Mac Siúrtáin, M., Egan, D., and Doohan, F.M. (2008) Importance of DNA quality in comparative soil microbial community structure analyses. Soil Biol Biochem 40: 1390–1403. Wickham, H. (2009) ggplot2: Elegant Graphics for Data Analysis, 1st edn. 2009. Corr. 3rd printing 2010 edition. New York, USA: Springer. Young, R.A., Macklis, R., and Steitz, J.A. (1979) Sequence of the 16 S-23 s spacer region in two ribosomal RNA operons of Escherichia coli. J Biol Chem 254: 3264–3271. Zemb, O., Lee, M., Gutierrez-Zamora, M.L., Hamelin, J., Coupland, K., Hazrin-Chong, N.H., et al. (2012) Improvement of RNA-SIP by pyrosequencing to identify putative 4-n-nonylphenol degraders in activated sludge. Water Res 46: 601–610.

Supporting information Additional Supporting Information may be found in the online version of this article at the publisher’s web-site: Fig. S1. A depiction of the methodology employed in this study. For each genome, 1000 16S rRNA amplicons (red) were predicted by in silico PCR (A), and 10 000 shotgun reads (green) were simulated with a random distribution across the genome (B). A template genomic DNA fragment (brackets) was then simulated for each amplicon (‘amplicon – fragment’) or shotgun read (‘shotgun – fragment’). The G + C content of each fragment was used to calculate the theoretical BD of the fragment in a CsCl gradient, assuming equilibrium has been reached. Fig. S2. Histograms describing the distributions of gDNA fragment length are used to generate amplicon – fragments and shotgun – fragments for bacterial genomes. Fig. S3. Histograms describing the distributions of gDNA fragment length are used to generate amplicon – fragments and shotgun – fragments for archaeal genomes. Fig. S4. Theoretical time for a DNA fragment to reach equilibrium in a CsCl gradient depending on fragment length and G + C content. Lines of different colour indicate results for DNA with different G + C content (as indicated by colour

© 2014 Society for Applied Microbiology and John Wiley & Sons Ltd, Environmental Microbiology Reports, 6, 767–775

775 N. D. Youngblut and D. H. Buckley scale). The inset plot shows the x-axis range of 0−5 kb from the main plot. Calculations assume a TLA110 rotor (Beckman Coulter, Miami, FL), 55 000 rpm, an average gradient particle buoyant density of 1.7, and no intercalating agents. An IPython notebook describing all calculations can be found at (https://github.com/nyoungb2/deltaGC). Fig. S5. Sliding window analysis demonstrates conservation in G + C content of rrn operons across 101 archaeal genomes. The window size was 500 bp with a jump size of 100 bp. Points indicate median G + C content of all genomes and vertical lines signify the standard deviation. Fig. S6. Shotgun fragments have more potential than amplicon fragments to have BD values that overlap between isotopically labelled and unlabelled DNA, and the degree of overlap varies for shotgun fragments in response to DNA

fragment size. Buoyant density distribution predicted for amplicon fragments and shotgun fragments from 101 archaeal genomes. Fragment size distributions had a mean of ∼ 12.5 kb or ∼ 6 kb (Fig. S3). The colour scale indicates the number of gDNA fragments predicted at each BD value (bin size of 0.001 g ml−1). Kernel density curves on the right show fragment BD distribution under a scenario of no label incorporation (blue), 100% incorporation of 15N (red) and 100% incorporation of 13C (green). Overlap between the curves indicates potential for isotopically labelled and unlabelled DNA to coexist at the same BD value. Fig. S7. The figure is identical to Fig. 3, except a fragment size distribution with a mean of ∼ 6 kb was used (see Fig. S2), while Fig. 3 shows results for a fragment size distribution having a mean of ∼ 12.5 kb.

© 2014 Society for Applied Microbiology and John Wiley & Sons Ltd, Environmental Microbiology Reports, 6, 767–775

Copyright of Environmental Microbiology Reports is the property of Wiley-Blackwell and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use.

Intra-genomic variation in G + C content and its implications for DNA stable isotope probing.

Combining deoxyribonucleic acid (DNA-based) stable isotope probing (DNA-SIP) with high-throughput sequencing provides a powerful culture-independent m...
555KB Sizes 0 Downloads 4 Views