Epigenetics

ISSN: 1559-2294 (Print) 1559-2308 (Online) Journal homepage: http://www.tandfonline.com/loi/kepi20

Mouse olfactory bulb methylome and hydroxymethylome maps reveal noncanonical active turnover of DNA methylation Qin Ma, Huan Lu, Ziying Xu, Yuanyuan Zhou & Weimin Ci To cite this article: Qin Ma, Huan Lu, Ziying Xu, Yuanyuan Zhou & Weimin Ci (2017): Mouse olfactory bulb methylome and hydroxymethylome maps reveal noncanonical active turnover of DNA methylation, Epigenetics, DOI: 10.1080/15592294.2017.1356958 To link to this article: http://dx.doi.org/10.1080/15592294.2017.1356958

View supplementary material

Published online: 25 Sep 2017.

Submit your article to this journal

View related articles

View Crossmark data

Full Terms & Conditions of access and use can be found at http://www.tandfonline.com/action/journalInformation?journalCode=kepi20 Download by: [Australian Catholic University]

Date: 25 September 2017, At: 19:50

EPIGENETICS 2017, VOL. 0, NO. 0, 1–7 https://doi.org/10.1080/15592294.2017.1356958

RESEARCH PAPER

Mouse olfactory bulb methylome and hydroxymethylome maps reveal noncanonical active turnover of DNA methylation Qin Ma, Huan Lu, Ziying Xu, Yuanyuan Zhou, and Weimin Ci

Downloaded by [Australian Catholic University] at 19:50 25 September 2017

Key Laboratory of Genomics and Precision Medicine, Beijing Institute of Genomics, University of Chinese Academy of Sciences, Beijing, China

ABSTRACT

ARTICLE HISTORY

Hydroxylation of 5-methylcytosine (5mC) to 5-hydroxymethylcytosine (5hmC) by TET enzymes presents a particular regulatory mechanism in the mammalian brain. However, although methylation and hydroxymethylation of cytosines in non-CpG contexts have been reported, these mechanisms remain poorly understood. Here, we applied TAB-seq and oxBS-seq selectively to detect 5hmC and 5mC at base resolution in olfactory bulb derived from female mice. We found that active turnover of 5mC to 5hmC occurred in both CpG and non-CpG contexts. Strikingly, we identified a different sequence preference for 5mC and 5hmC in a CH context, in which H D A, C, or T, TNCA/TC for 5mC and NNCA/T/CN for 5hmC. More importantly, we found that genes showing 5mC to 5hmC turnover showed only limited overlap in CpG and CH contexts, and that olfactory receptor genes were marked with higher turnover of 5mC to 5hmC in non-CpG context. Collectively, we identified an unexpected sequence preference for non-CpG hydroxymethylation and distinct target genes regulated by the turnover of 5mC to 5hmC in CpG and CH contexts.

Received 12 May 2017 Revised 7 July 2017 Accepted 13 July 2017

Introduction Gene regulation involving DNA and chromatin modifications enables the fine-tuning of the nervous system. This is illustrated by 2 examples: the control of gene expression for clustered Pcdh genes and olfactory receptor (OR) gene choice. A common feature of the 2 gene families is that their regulation generates vast diversity among neurons, which allows them to adopt specialized functional identities.1,2 However, the underlying mechanism remains poorly understood; exploring the epigenetic pattern is fundamental to understand the fine-tuning of nervous system gene expression, including these 2 clusters of genes. As a heritable epigenetic mark of cellular memory, DNA methylation was believed to maintain a cell unique gene expression pattern. Recently, it has been shown that hydroxylation of 5mC to 5hmC presents a particular regulatory mechanism during mammalian brain development.3-5 Thus, research interest in methylomes and/or hydroxymethylomes in the context of CpG has grown rapidly during the past several years.6-10 Notably, whole-genome bisulfite sequencing (WGBS) has been used to show that non-CpG methylation is prevalent in pluripotent stem cells, neuron cells, and oocytes.10,11 However, the extent and functional role of hydroxylation of 5mC to 5hmC in the non-CpG context remains unknown. Recently, it has been shown that 5hmC occurs specifically in sites that are de novo methylated by DNMT3a and DNMT3b.12 Additionally, several studies have shown that asymmetric methylation in a CH context (where H represents A, T or C) was also maintained by DNMT3a and DNMT3b.13,14 Thus, we CONTACT Weimin Ci [email protected] Supplemental data for this article can be accessed on the publisher’s website. © 2017 Taylor & Francis Group, LLC

hypothesize that the distribution of both 5mC and 5hmC in the CH context should target DNMTs targeted regions. Therefore, elucidating the single base resolution profiles of both 5mC and 5hmC is fundamental to understanding the roles of 5mC and 5hmC in fine-tuning brain gene regulation and the potential roles of active turnover of 5mC to 5hmC in CpG or non-CpG contexts. Here, we applied oxBS-seq and TAB-seq along with mRNA-seq to profile 5mC and 5hmC at the single nucleotide level in the mouse olfactory bulb. We found that active turnover of 5mC to 5hmC occurred in both CpG and non-CpG contexts. Strikingly, we identified an unexpected sequence preference for non-CpG hydroxymethylation and found distinct target genes, such as ORs, regulated by the turnover of 5mC to 5hmC in the CH context.

Results 5hmC in the CpG context is a stable mark enriched in neuronal development-related genes We performed TAB-seq and oxBS-seq on olfactory bulb tissue derived from female mice. We sequenced to a depth of 26- and 30-fold for TAB-seq and oxBS-seq, respectively (Table S1). Thus, we created genome-wide, single-nucleotide-resolution 5mC and 5hmC maps of the female mouse olfactory bulb, along with allele-specific mRNA components of the transcriptome (Table S2). Consistent with previous findings, we identified pronounced 5mC and 5hmC modifications in both CpG and non-CpG contexts (Fig. 1a, b; Fig. S1a).

Downloaded by [Australian Catholic University] at 19:50 25 September 2017

2

Q. MA ET AL.

Figure 1. Active turnover of 5mC to 5hmC in CpG sites. (a) Percentages of 5hmC or 5mC sites for the olfactory bulb tissue in the contexts of CpG, CA, CT, and CC. (b) Average 5hmC and 5mC levels in the whole genome and various genomic elements as determined by TAB-seq and oxBS-seq, respectively. (c) Relative 5hmC/5mC levels within gene bodies (y-axis) as a function of gene expression (x-axis), with transcript abundance increasing from left to right. (d) Venn plot of the co-occurrence of 5hmC and 5mC at the same CpG site in olfactory bulb. (e) Heatmap of 5hmC and 5mC levels in CpG sites that were simultaneously called as methylated and hydroxymethylated. The scale bar from blue to red represents the density of modified sites from low to high. (f) DAVID functional annotation for the top 500 genes enriched in the active turnover of CpG sites in olfactory bulb tissue. (g) Graphical representation of the association between gene expression and the average 5hmC and 5mC level in the ppp3ca gene locus.

Next, we focused on the modifications on CpG sites. Consistent with previous studies, 5hmC modifications are enriched in gene body regions and have a positive correlation with gene expression (Fig. S1b). Since 5hmC is generated from 5mC by TET enzymes, we observed an expected negative correlation between 5mC level and gene expression (Fig. S1b). Consistently, the 5hmC/5mC level within gene body regions also positively associated with gene expression (Fig. 1c). Importantly, we found that most of the called 5hmC CpG sites frequently simultaneously carry 5mC modifications in the CpG context (Fig. 1d). The relative 5hmC and 5mC levels of simultaneously modified sites are shown in Fig. 1e. Given that 5mC and 5hmC are binary for any cytosine in a cell, the cooccurrence of the 2 modifications reflects active turnover of 5mC to 5hmC in a subset of cells within populations of cells. Thus, we hypothesized that 5hmC turnover from 5mC in a subset of cells is a stable mark with functional importance in brain tissue. Consistent with this scenario, we found that the active turnover CpG sites were not randomly distributed in brain

tissue. DAVID functional annotation of genes enriched in active turnover sites in CpG context revealed genes involved in multiple neuronal development-related KEGG pathways, such as Alzheimer disease, Huntington’s disease, and Parkinson disease (Fig. 1f). Additionally, a representative locus at the ppp3ca gene in the Alzheimer disease pathway further indicated that active turnover of 5mC to 5hmC is associated with actively transcribed genes (Fig. 1g). Collectively, 5hmC is a stable modification in the CpG context at brain development-related genes in brain tissue. Maintenance of DNA methylation and hydroxymethylation in the CH context Next, we analyzed 5mC and 5hmC modifications in non-CpG sites. As shown in Fig. 1a, we detected abundant 5mC and 5hmC in non-CpG contexts, comprising almost 25% and 7% of all cytosines at which DNA methylation and DNA hydroxymethylation were identified, respectively. Ninety-five percent of

Downloaded by [Australian Catholic University] at 19:50 25 September 2017

EPIGENETICS

the CmH sites were 5–60% methylated, whereas 99 percent of the ChH sites were 5–40% hydroxymethylated (Fig. S2a and S2b). At the current sequencing depth, 12,551,743 and 1,805,817 methylated and hydroxymethylated CH sites, respectively, were identified (Table S3). Additionally, we found that the average methylation level in the non-CpG context followed CmA > CmT > CmC in the olfactory bulb, which indicated that the maintenance efficiency of CmH by DNMTs follows CA > CT > CC (Fig. 2a). However, the turnover of 5mC to 5hmC tended to be the most efficient in the CC context (ChA/CmA < ChT/CmT < ChC/CmC, Fig. 2b). Theoretically, if all CmH-modified DNA molecules turn over to ChH, this would indicate that the efficiency of TET enzymes is 100% resulting in 0% correlation between CmH and ChH pattern. Thus, stronger turnover was indicative of a stronger derivation of the 5hmC pattern from the 5mC pattern and a higher efficiency of TET enzymes. In contrast to the active turnover at CpG sites, we found the turnover was either 100% or 0% at CH sites, which explains the limited overlap between 5mC and 5hmC CH sites (Fig. 2c). Additionally, as shown in Table S4, we found that the correlation score between ChH and CmH also follows CA > CT > CC. Collectively, these results suggested that DNMTs and TETs had different sequence preferences. Next, by single-nucleotide resolution mapping of both 5mC and 5hmC, we could analyze the genome sequence proximal to sites of non-CpG 5mC and 5hmC sites. Strikingly, we identified a 5-bp motif difference in non-CpG 5mC sites compared with non-CpG 5hmC sites; one contained CmH enriched in a TNCA/ TC motif, and the other contained ChH enriched in a NNCA/T/ CN motif (where N is any base, Fig. 2d). The identified sequence preference was consistent with previous reports that DNMT3 enzyme activity follows CG > CA > CT > CC.8,15 However,

3

TET enzymes, or the catalysis complex, have different sequence preferences, which may favor C at the C1 position next to the C sites. However, further functional studies are needed to confirm these findings. Different sets of genes show methylation turnover in CpG and CH contexts Next, we asked whether the turnover in both contexts has a synergistic role in regulating gene expression. Surprisingly, among the top 1000 genes that showed the strongest turnover, genes with turnover in the CpG and CH contexts only slightly overlapped (Fig. 3a). Additionally, CH-specific turnover genes are strongly enriched in olfactory receptor (OR) and vomeronasal receptor (VR) gene clusters (Fig. 3a). The distribution of active turnover loci in the CpG and CH contexts along chromosomes 2, 7, and 9, which contain large OR genomic clusters, is depicted in the heatmap format in Fig. 3b, where genes are ordered by chromosomal position. We found that OR gene clusters are enriched in methylation turnover in the CH context but depleted in methylation turnover in the CpG context. Similarly, heatmaps generated by unsupervised clustering also showed that a distinct set of genes is enriched in active turnover in either the CpG or CH context (Fig. S3a). To present data in a visually comprehensive manner, we randomly selected 1000 genes and ranked them based on the average signal intensity of ChH/CmH over their gene body regions. OR and VR genes constituted the majority of genes with significant enrichment for the turnover of ChH/CmH (Fig. 3c). For example, in Fig. S3b, the OR and VR gene clusters are depleted of turnover in the CpG context.

Figure 2. Non-CpG 5mC and 5hmC in olfactory bulb tissue. (a) Average 5hmC and 5mC levels in the genome in the contexts of CA, CT, and CC, as determined by TAB-seq and oxBS-seq, respectively. (b) Relative ChH/CmH levels within gene bodies in CH context. (c) Venn plot of the co-occurrence of 5hmC and 5mC at the same CH site in the olfactory bulb. (d) Log plots of the sequences proximal to CmH and ChH sites. Background sequences proximal to the CH sites are shown in the left panel.

Q. MA ET AL.

Downloaded by [Australian Catholic University] at 19:50 25 September 2017

4

Figure 3. 5hmC and 5mC in CpG and non-CpG sites and their roles in regulating gene expression. (a) Venn diagram showing the number of overlapping genes from among the top 1000 genes with the highest 5hmC/5mC ratios in the CpG and CH contexts. DAVID functional annotation (Protein_Domains) was applied to identify the protein families with higher 5hmC/5mC levels within the gene bodies in the CpG and CH contexts. (b) Positional heatmaps of chromosomes 2, 7, and 9 are shown. Each row represents one gene with the relative level of ChH/CmH within the gene body. Two states are shown as adjacent columns: the relative level of ChH/CmH and relative level of ChG/CmG. Arrows indicate OR, V1R, and V2R clusters found on these chromosomes. (c) Ranked heatmap illustrating 1000 randomly selected genes. Each raw score represents one gene with the relative level of ChH/CmH within the gene bodies increasing from green to red. ORs and VRs represent olfactory receptors and vomeronasal receptors, respectively. OR and VR genes are depicted by blue and yellow lines next to the heatmap. (d) For genes with alternative start sites, the isoform with the highest expression and highest 5hmC5UTR/5mC5UTR ratio was considered a positive correlation. Statistical significance was evaluated by t-tests,  represents P value <0.0001. (e) Wiggle tracks represent the 5hmC pattern in Pcdh-a gene clusters on chromosome 19 in olfactory bulb tissue. Constant exons are highly enriched in 5hmC in gene body regions and depleted of 5hmC in promoter regions.

EPIGENETICS

The key feature of ORs in approximately 5% of mouse genes is that their regulation generates vast diversity among neurons through monoallelic expression in specific types of neurons, allowing the neurons to adopt specialized functional identities. Recently, it has been shown that the heterochromatin markers H3K9me3 and H4K20me3 are enriched in OR genes.1 Our results illustrate that turnover in the CH context may also enable fine-tuning of monoallelic expression in ORs. Together, we found that distinct target genes, such as ORs, are regulated by the turnover of 5mC to 5hmC in the CH context.

Downloaded by [Australian Catholic University] at 19:50 25 September 2017

Active turnover of 5mC to 5hmC in the CpG context is associated with expression of different isoforms for genes with alternative start sites Previous human transcriptome studies have shown that brain tissues exhibit a more divergent splicing program compared with other tissues.16 Thus, we hypothesized that active turnover to 5hmC, as part of nucleosome architecture, may facilitate the fine-tuning of gene expression in brain tissue. First, we found that 5hmC and 5mC within gene body regions, especially in the 50 UTR, are positively and negatively associated with gene expression, respectively (Fig. S3c and S3d). Next, we examined whether the active turnover to 5hmC within the 50 UTR can be associated with expression of different isoforms for genes with alternative transcriptional start sites. We found that the isoform with the strongest turnover within the 50 UTR tends to be expressed at the highest level among isoforms (Fig. 3d). As illustrated in Fig. S3e, of the 2 isoforms of the Idh1 gene— NM_001111320 and NM_010497—, NM_0104970 is preferentially expressed in the olfactory bulb. Next, we looked at Pcdh-a clusters, as the single-cell identity of the mammalian neuron is thought to be provided by the clustered Pcdh family genes; that is, each neuron will express one unique isoform of the Pcdh gene.17 Individual Pcdh-a mRNAs are assembled from one of 14 “variable” exons and 3 “constant” exons. We found that isoforms with constant exons carried high levels of 5hmC and low levels of 5mC in gene body regions, while variable exons carried low levels of 5hmC and high levels of 5mC (Fig. 3e). These results suggest that active turnover of 5mC to 5hmC may impact pre-mRNA splicing. Collectively, active turnover of 5mC to 5hmC in the CpG context could be involved in the fine-tuning of gene expression in the nervous system.

Discussion Here, using single-nucleotide resolution mapping of 5mC and 5hmC simultaneously, we found active turnover of 5mC to 5hmC in both CpG and non-CpG contexts. Compared to previous studies in mouse embryonic stem cells (mESC) (7) and human brain tissue,18 we identified a higher level of 5hmC in the non-CpG context in this study than in previous ones. Consistently, the global 5hmC level is higher in the mouse brain than in mESC cells. Additionally, the number of CpG sites in the human genome is higher than the number of sites in the mouse genome (approximately 28 million CpG sites vs. 22 million CpG sites). Thus, the extent and functional role of active turnover of 5mC to 5hmC, in both CpG and non-CpG contexts, must be considered in an organism- and cell-dependent manner.

5

Strikingly, genes showing 5mC to 5hmC turnover in CpG and CH contexts presented limited overlap. The turnover may enable fine-tuning of gene regulation in brain tissue, which is illustrated by 2 examples: 1) Pcdh genes, which are located in euchromatin, showed turnover in the CpG context; and 2) olfactory receptor genes, which are located in heterochromatin, showed turnover in the CH context. However, this tissue includes a mixed populations of cells, and each neuron expresses a unique Pcdh gene isoform and a unique OR gene. Our study provides an epigenetic glimpse into the 5mC and 5hmC patterns in both CpG and non-CpG contexts, but additional experiments at the single neuron level will be needed to gain mechanistic insight into the regulation of gene choice. It has been shown that hemi-hydroxymethylated DNA is not a good substrate for DNMT1.19,20 Thus, hemi-hydroxymethylated DNA generated by DNA replication is probably maintained by other mechanisms. Recently, it has been shown that DNMT3 has approximately equal activity on hemi-methylated, hemi-hydroxymethylated, and unmodified CpG dinucleotides. Thus, 5hmC may be stably maintained through proper coordination between DNMT3 and TET proteins during cell replication. Several studies also showed that asymmetric non-CpG methylation was maintained by DNMT3a and DNMT3b.13,14 Additionally, if the turnover from non-CpG 5mC to non-CpG 5hmC is more efficient, the pattern of non-CpG 5hmC will show stronger deviation from the non-CpG 5mC pattern. Consistent with this scenario, our results illustrated that the TET enzyme, or the catalysis complex, exhibits a sequence preference for the CC context in the non-CpG context. Strikingly, recent evidence using the BS-seq method suggests that most human tissue types showed enriched CH methylation for the TNCAC or NNCAN motif.10 This signature may be due to the different amounts of 5hmC and 5mC present in different tissues, since the BS-seq method cannot distinguish 5mC from 5hmC.21,22 Our study further supports the notion that studies of 5mC reprogramming need to be re-evaluated in the context of 5hmC. Collectively, these analyses raise the intriguing possibility that active turnover of 5mC to 5hmC in CpG and non-CpG contexts plays a distinct role in the fine-tuning of gene expression in the mammalian brain.

Materials and methods Genomic DNA and RNA extraction Olfactory bulbs were dissected from 8-week-old female C57Bl/6 mice. The tissues for the technical replicates were from independent mice and at least 2 replicates were included in each assay, as indicated. Genomic DNA (Qiagen, Cat#: 51306) and RNA (Life Technologies, Cat#: 15596-018) were extracted according to the manufacturers’ instructions. Library preparation for oxBS-seq and sequencing Sequencing libraries were constructed as described previously23 with minor modifications. Briefly, genomic DNA (0.5 to 1 mg) was sonicated, end-repaired, adenylated, and ligated to

6

Q. MA ET AL.

methylated adaptors. Then, fragments were purified using BioRad Micro Bio-Spin P-6 SSC columns (SSC buffer). After purification, samples were denaturized using 1 M NaOH and oxidized using oxidant solution (15 mM KRuO4 in 0.05 M NaOH). Then, samples were subjected to bisulfite conversion (EZ DNA Methylation-Gold Kit, Zymo Research). After the purification step, the libraries were sequenced on an Illumina HiSeq X Ten. Paired reads were mapped uniquely to the reference genome (mm10, UCSC) by Bismark. The efficiency of converting 5hmC to uracil was calculated using a spiked 5hmC control.

Downloaded by [Australian Catholic University] at 19:50 25 September 2017

Library preparation for TAB-seq and sequencing Sequencing libraries were constructed as described in7 with minor modification. First, genomic DNA with spike-in controls was sonicated to an average size of 300 bp. Then, samples were glycosylated and oxidized using a kit from Wisegene (Cat#: K001). After that, samples were bisulfite converted (EZ DNA Methylation-Gold Kit). Libraries were sequenced using an Illumina HiSeq X Ten. Paired reads were mapped uniquely to the reference genome (mm10, UCSC) by Bismark. The efficiency of converting unmodified cytosine to uracil and the efficiency of converting 5mC to 5caU/U were calculated using spiked M. SssI-treated lambda DNA. RNA-seq and statistics Libraries were prepared using the NEB UltraTM Directional RNA Library Prep Kit according to the manufacturer’s protocol and sequenced using an Illumina HiSeq 2000. First, raw FASTQ data files generated by the Illumina HiSeq 2000 platform were mapped to the mouse reference sequence (mm10) for mRNA analysis using the TopHat program (TopHat v2.0.13)24 with default parameters. Then, normalized fragments per kilobase of transcript per million mapped read levels of each sample with default parameters were calculated using the Cufflinks program (Cufflinks v2.2.1). Next, protein coding genes were defined as expressed (FPKM  1) or unexpressed (FPKM < 1). Quantification of 5hmC and 5mC levels in the CpG context The levels of 5hmC and 5mC for each CpG site were calculated as described.25 Briefly, a binomial distribution model was performed to calculate the significance for the hydroxymethylation or methylation. Only sites with Benjamini-Hochberg corrected binomial P values  0.05 and read coverages  5 were considered hydroxymethylated or methylated. The hydroxymethylation and methylation levels were estimated by dividing the number of sequenced “C” bases by the number of sequenced “C” bases plus “T” bases. Quantification of 5hmC and 5mC levels in the CH context The levels of 5hmC and 5mC for each CH site were calculated using the same methods as for the CpG sites. Only sites with P values  0.05 and read coverages  5 were considered hydroxymethylated or methylated. Then, non-CpG 5hmCs and 5mCs were divided into CA, CT, and CC, according to the context.

Association between 5mC or 5hmC level and gene expression in olfactory bulb tissue Genes were aligned from transcription start site (TSS) to transcription end site (TES). For each gene, the region between TSS and TES was divided into 100 bins, and the 50 -UTR and 30 UTR represented 5 bins each. The 5mC and 5hmC levels were calculated for each bin/region. The genes were stratified into 4 groups by gene expression in the indicated sample. Genes with FPKMs

Mouse olfactory bulb methylome and hydroxymethylome maps reveal noncanonical active turnover of DNA methylation.

Hydroxylation of 5-methylcytosine (5mC) to 5-hydroxymethylcytosine (5hmC) by TET enzymes presents a particular regulatory mechanism in the mammalian b...
1MB Sizes 0 Downloads 12 Views