Downloaded from genome.cshlp.org on October 18, 2014 - Published by Cold Spring Harbor Laboratory Press

Widespread contribution of transposable elements to the innovation of gene regulatory networks Vasavi Sundaram, Yong Cheng, Zhihai Ma, et al. Genome Res. published online October 15, 2014 Access the most recent version at doi:10.1101/gr.168872.113

Supplemental Material P30% of the binding sites for the tumor suppressor protein, TP53 (Wang et al. 2007). Similarly, ~20% of POU5F1 and NANOG binding sites were contributed by lineage-specific TEs in humans and mice (Kunarso et al. 2010), and a rodent-specific expansion of CTCF binding sites was also connected to retrotransposons (Schmidt et al. 2012). Interestingly, Kunarso et al. (2010) also showed that the lineage-specific TEs wired new genes into the human pluripotency transcriptional program. Likewise, a eutherian-specific TE, MER20, was suggested to have wired ~13% of pregnancy-related genes involved in signaling pathways related to implantation into endometrial stromal cells during the evolution of pregnancy in placental mammals (Lynch et al. 2011). Beyond TF binding, recent reports have shown that TEs contribute DNase I hypersensitive sites (Jacques et al. 2013) and may also contribute to the evolution and expression of lncRNAs (Kapusta et al. 2013). Additionally, hypomethylated TEs have been shown to associate with tissue-specific enhancers (Xie et al. 2013). Taken together, TEs form an effective model for rewiring gene regulatory networks. The large portion of mammalian genomes that TEs represent is thought to have provided raw material for the evolution of cis-regulatory elements, possibly via binding site turnover or via spreading of binding sites when the TE transposes (Feschotte 2008). We have conducted here by far the most comprehensive study of interactions between TFs and TEs. In summary, our study


Genome Research www.genome.org

made the following major discoveries. We found that TEs have contributed on average ~20% of TF binding sites in two representative cell lines in human and mouse. Across the 26 TFs we analyzed here, the extent to which TEs contribute to TF binding peaks exhibited TF-specific differences. Although certain TE subfamilies were shared between the human and mouse genomes, only a very small portion of peaks were conserved between human and mouse; most of the TE-derived binding peaks were species-specific. Importantly, our data support the view that epigenetic regulation of TEs may be much more dynamic than previously thought (Morgan et al. 1999; Bird 2002; Slotkin and Martienssen 2007; Ekram et al. 2012; Rebollo et al. 2012a; Xie et al. 2013). TEs that were bound by TFs were also enriched for enhancer epigenetic marks, such as increased H3K4me1 and H3K27ac and reduced DNA methylation, often in a cell type-specific manner. Binding site motifs were strongly associated with TF binding, but they did not perfectly predict binding. In this regard, TE-derived sequences behave no differently from non-TE genomic sequences. The interplay between sequence features, TF binding, and epigenetic modification of TE sequences can only be elucidated with additional experimentation. Taken together, our results support the model of Britten and Davidson (1969) of TEs contributing to the evolution of TF binding sites and potentially rewiring gene regulatory networks. TE-derived TF binding peaks in human and mouse shared general characteristics such as their epigenetic profiles, but differed


greatly in their genomic distribution, such that the majority of these peaks were species-specific. Interestingly, we identified distinct functional enrichment of genes associated with species-specific binding peaks, suggesting that TEs might have contributed to the evolution of species-specific regulatory functions and perhaps to the phenotypic differences between species. A fundamental question that needs to be addressed next is how many of these TE-derived TF binding peaks are biologically functional. The binding events we report here indicate a biochemical activity of TF-DNA association, but whether these TE-derived TF binding peaks can influence the expression of genes remains to be investigated. Alternatively, these TE sequences could function in nonconventional ways. For example, they might not directly result in a transcriptional readout, but could provide a buffer of extra binding sites to trap transcription factors or serve as a “landing pad” that allows transcription factors to quickly attach to and scan DNA. TEs clearly provided material for evolving new binding sites and represent an efficient mechanism for rapid TF binding site turnover. We note that a functionally conserved binding site of a TF that resides in unique genomic sequence could also be derived from a TE, but the event may be difficult to identify as TE-derived if either the sequence context of the binding site has degenerated or the event is simply too ancient to detect using sequence comparison, which more easily detects younger TEs (de Koning et al. 2011). Therefore, the number of TE-derived TF binding sites we report here is likely a lower bound. Because these reported TE-derived binding sites are generally quite young, they may also be transient in the context of evolutionary time. 
The majority of them may be functionally neutral and disappear as the species continue to evolve, but a select few might stand the test of evolution if they convey fitness advantage for the species. In this regard, perhaps at least one function of TE-derived binding sites is to provide material from which regulatory innovation can be evolved.

Methods

ChIP-seq

Chromatin immunoprecipitation was carried out as previously described (Landt et al. 2012). Cultured cells for biological replicates were grown in separate batches and at separate times. Briefly, 5 × 10^7 cells were grown to a density of 0.6–0.8 × 10^6/mL and then cross-linked in 1% formaldehyde for 10 min at room temperature. Nuclear lysates were sonicated using a Branson 250 Sonifier (power setting 7, 100% duty cycle for 12 × 20-s intervals), such that the chromatin fragments ranged from 50 to 2000 bp. Protein-DNA complexes were captured on Protein A/G agarose beads (Millipore #16-156/16-266) and eluted in 1% SDS TE buffer at 65°C. Following cross-link reversal and purification, the ChIP DNA sequencing libraries were prepared as described before (Kasowski et al. 2010) and sequenced on an Illumina Genome Analyzer II. All the data sets had reads that were 36 bp long.

Data sources and data processing

TF ChIP-seq data sets for the 26 TFs in the two human-mouse pairs of cell lines were processed by a uniform processing pipeline (Landt et al. 2012) and obtained from Cheng et al. (2014) via the consortia’s Data Coordination Center. Reads were mapped by BWA (Li and Durbin 2009), and only reads that could be mapped to exactly one location in the genome were retained. We used the SPP peak caller (Kharchenko et al. 2008) to identify and score (rank) potential occupancy peaks. To obtain optimal thresholds, we used the Irreproducible Discovery Rate (IDR) framework (https://sites.google.com/site/anshulkundaje/projects/idr) to determine high-confidence occupancy events by leveraging the reproducibility and rank consistency of the identified peaks across replicate experiments of a data set. As recommended by the ENCODE working group, a cutoff of 0.2 was used with IDR.

We analyzed the extent of TE-mediated expansion of TF binding sites using merged peaks from the two cell lines in human and in mouse. To merge data sets, we used the mergeBed function, with default parameters, from the BEDTools package (Quinlan and Hall 2010). We also analyzed the extent of TE-derived TF binding specifically in each cell line. For this, we merged the data sets from the various institutions that exist for each cell line into one data set.

To determine the epigenetic state of our regions of interest, we used ChIP-seq data sets for six histone marks: H3K4me1, H3K4me3, H3K9me3, H3K27ac, H3K27me3, and H3K36me3. We used the density of the aligned reads, available in the Downloads section of the ENCODE data hub. Each data set had replicates that were averaged for the final analyses. We also assayed the DNA methylation levels in all four cell lines (see below).
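The peak-merging step described above can be pictured with a short pure-Python sketch. It is only a stand-in for mergeBed with default parameters, under the assumption that peaks are half-open `(chrom, start, end)` tuples and that overlapping or directly adjacent intervals on the same chromosome are collapsed; the function name `merge_peaks` is hypothetical and not part of BEDTools.

```python
from collections import defaultdict

def merge_peaks(peaks):
    """Merge overlapping or adjacent (chrom, start, end) intervals per chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        current_start, current_end = None, None
        for start, end in sorted(by_chrom[chrom]):
            if current_start is None:
                # First interval on this chromosome opens a new merged block.
                current_start, current_end = start, end
            elif start <= current_end:
                # Overlaps or touches the open block: extend it.
                current_end = max(current_end, end)
            else:
                # Gap found: close the open block and start a new one.
                merged.append((chrom, current_start, current_end))
                current_start, current_end = start, end
        merged.append((chrom, current_start, current_end))
    return merged
```

Pooling the peak lists from two cell lines and passing them through such a function would yield one non-redundant set of regions per TF and species.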

Enrichment of TF ChIP-seq binding peaks in TEs

We used an enrichment calculation to evaluate the extent to which TF binding peaks were derived from TEs. To identify TE-derived TF binding peaks, we required that the centers of the peaks overlapped with TE fragments, which were annotated using RepeatMasker (Smit et al. 1996–2010) in the human (hg19) and mouse (mm9) assemblies (files were downloaded from the UCSC Genome Browser). We used the intersectBed tool (with default parameters) from the BEDTools package (Quinlan and Hall 2010) to calculate the intersection. Enrichment of TF binding peaks in TEs was defined as

\[
\mathrm{LOR}_{i,j} = \log_2\left(\frac{\text{Number of TF } i \text{ peak centers in TE subfamily } j \;/\; \text{Length of TE subfamily } j\ (\mathrm{kb})}{\text{Number of TF } i \text{ peaks in the genome} \;/\; \text{Genome length}\ (\mathrm{kb})}\right)
\]

To identify TE subfamilies that were enriched for TF binding peaks, we used an LOR threshold of 1.5, which represents approximately a threefold enrichment (2^1.5 ≈ 2.8). To exclude TF–TE candidates whose high enrichment values resulted from very few TE instances, we required that (1) the number of genomic fragments of a TE subfamily be greater than 30, and (2) the number of peaks overlapping the fragments of a TE subfamily be greater than 10.
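As a rough illustration of the enrichment score and the two count filters described above, the following sketch computes the log odds ratio from four summary numbers. The function names are hypothetical, and treating the 1.5 threshold as inclusive (`>=`) is an assumption, since the text does not say whether the comparison is strict.

```python
import math

def peak_lor(peaks_in_subfamily, subfamily_kb, peaks_in_genome, genome_kb):
    """log2 of peak density in a TE subfamily relative to the genome-wide peak density."""
    density_te = peaks_in_subfamily / subfamily_kb
    density_genome = peaks_in_genome / genome_kb
    return math.log2(density_te / density_genome)

def is_enriched(lor, n_fragments, n_peaks,
                lor_cutoff=1.5, min_fragments=30, min_peaks=10):
    """Apply the LOR threshold plus the fragment-count and peak-count filters."""
    # Assumption: the LOR cutoff is inclusive; the two count filters are strict
    # ("greater than 30" fragments, "greater than 10" peaks).
    return lor >= lor_cutoff and n_fragments > min_fragments and n_peaks > min_peaks
```

For example, 40 peak centers in a subfamily covering 1000 kb, against 10,000 peaks in a 2,000,000-kb genome, gives a density ratio of 8 and hence an LOR of 3.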

Enrichment of ChIP-seq reads for TF binding data in TEs

In an attempt to overcome issues of poor mappability of TEs, we adapted our recently published repeat-alignment pipeline (RAP) (Xie et al. 2013). The adapted pipeline uses all (including nonunique) sequencing reads and maps them to the TE consensus. Each TE subfamily has a consensus sequence that was used for the genome-wide annotation of TEs, curated by RepeatMasker (Smit et al. 1996–2010). This provides a normalized signal (RPKM) for each TE subfamily, based on the alignment to the TE consensus:

\[
\mathrm{RPKM} = \frac{R \times 10^{9}}{K \times M}
\]

where R is the number of reads that map to a TE consensus sequence, K is the length of the TE consensus sequence, and M is the total number of mapped reads in the data set.


We calculated the RPKM for each TE subfamily using both ChIP-seq reads and input reads. The enrichments were calculated by the following equation:

\[
\mathrm{LOR}_{i,j} = \log_2\left(\frac{\text{RPKM of TF } i \text{ reads that map to TE subfamily } j}{\text{RPKM of input reads that map to TE subfamily } j}\right)
\]
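The read-based enrichment can be worked through numerically with a small sketch. It assumes the read counts per TE consensus are already available from the alignment step; the function names are hypothetical, and no pseudocount is applied because the text does not describe one.

```python
import math

def rpkm(reads_on_consensus, consensus_length_bp, total_mapped_reads):
    """Reads per kilobase of consensus per million mapped reads: R * 1e9 / (K * M)."""
    return reads_on_consensus * 1e9 / (consensus_length_bp * total_mapped_reads)

def read_lor(chip_rpkm, input_rpkm):
    """log2 ratio of ChIP RPKM to input RPKM for one TF and one TE subfamily."""
    return math.log2(chip_rpkm / input_rpkm)

# Worked example with made-up counts for a 1-kb consensus sequence.
chip = rpkm(500, 1000, 10_000_000)    # 50.0
inp = rpkm(250, 1000, 20_000_000)     # 12.5
enrichment = read_lor(chip, inp)      # log2(4) = 2.0
```

Because both RPKM values are normalized by consensus length and library size, the ratio compares ChIP and input signal on the same scale.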

Occupancy conservation of TE-derived TF binding sites

To identify conserved TF binding, we used a one-to-one nucleotide mapper called bnMapper (O Denas, R Sandstorm, Y Cheng, K Beal, J Herrero, RC Hardison, and J Taylor, in prep.) (https://bitbucket.org/james_taylor/bx-python/wiki/bnMapper) and mapped ChIP-seq binding peaks from human to mouse and vice versa. The mapping strategy in this tool is bijective, which means that genomic regions from one species are mapped to only one region in the other species. Therefore, the reverse mapping of a mapped nucleotide will return the original nucleotide. The tool ignores mapped regions that span multiple blocks of different chains or map to multiple chromosomes. In order to unambiguously map features from the human genome to the mouse genome and vice versa, we used a reciprocal-best chain, as the human-mouse alignment provided by UCSC (based on BLASTZ pairwise alignment) is not symmetric. The reciprocal-best chain was created using a netting procedure and chaining only the first layer to make the original human-mouse alignment reciprocal. The reciprocal chain files can be downloaded at http://bx.mathcs.emory.edu/~odenas/mapper_comparisons/UCSC/UCSC_reciprocal.

Once we identified the orthologous regions of TF binding peaks in one species, we overlapped the orthologous regions with the binding peaks of the same TF in the other species to see if the orthologous region was actually bound by the same TF (i.e., conserved occupancy). For this, we used the intersectBed tool (Quinlan and Hall 2010) and required that at least half the peak region overlapped the mapped orthologous region. Using these occupancy-conserved regions, we determined whether or not the TE-derived TF binding peaks were conserved. If occupancy-conserved regions overlapped the same TE (i.e., peak center overlapping the TE) in both species, we called the binding event a shared, TE-derived TF binding event. 
We also identified several cases in which the occupancy was conserved, but the TE annotation was different or missing. Additionally, there were many TE-derived TF binding events that were not conserved (defined by the lack of binding in the other species, or being unmappable).
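The occupancy test described in this section can be sketched as follows. The sketch assumes that the orthologous region has already been obtained from the mapper, that intervals are half-open `(chrom, start, end)` tuples, and that the "at least half" criterion is measured relative to the length of the peak in the target species; that last reading is an assumption, and the function names are hypothetical.

```python
def overlap_bp(a, b):
    """Number of bases shared by two half-open (chrom, start, end) intervals."""
    if a[0] != b[0]:
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))

def is_occupancy_conserved(mapped_region, target_peaks, min_fraction=0.5):
    """True if any target-species peak has >= min_fraction of its length in the mapped region."""
    for peak in target_peaks:
        peak_length = peak[2] - peak[1]
        if peak_length > 0 and overlap_bp(mapped_region, peak) >= min_fraction * peak_length:
            return True
    return False
```

A mapped region for which this returns `False`, or a peak with no mapped region at all, would fall into the non-conserved category described above.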

Sequence identity of TE-derived TF binding events

To determine the sequence identity of occupancy-conserved TE-derived TF binding peaks, we used the chain files (described above) to identify alignable regions of the genomes. Using these, we measured the sequence identity between the pairs of alignable regions. For comparison, we randomly picked 1000 TEs from RepeatMasker-annotated TEs (Smit et al. 1996–2010) in the human (hg19) and mouse (mm9) genome assemblies.

Sequence conservation of TE-derived TF binding peaks (phastCons)

We used phastCons scores (Siepel et al. 2005) to examine the sequence constraint on the TEs that had shared and species-specific TF binding events derived from them. For conserved binding events (i.e., cases in which the binding event is encoded on TE subfamilies either annotated as the same in both species, or where the annotation is not the same in both species), we used the TE


sequences that the TF binding peak was derived from. For non-TE-based conserved binding events, representing unmappable (i.e., no orthologous region identifiable) and unoccupied (i.e., the orthologous region lacked TF binding) binding events, we used the peak regions. We downloaded the vertebrate phastCons data from the UCSC Genome Browser for human (phastCons46way) and mouse (phastCons30way). We defined 6-kb regions, centered on each TE or peak, and profiled the phastCons score over each region. We then averaged the scores across the genomic regions in each category of occupancy conservation.
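The profiling and averaging step described above can be sketched in a few lines. The sketch assumes that per-base scores for each 6-kb window have already been extracted into equal-length lists, with `None` marking positions that have no score; both function names are hypothetical.

```python
def window_around(center, width=6000):
    """Half-open (start, end) window of the given width centered on a position."""
    half = width // 2
    return center - half, center + half

def mean_profile(windows):
    """Average equal-length per-base score lists position by position, ignoring None."""
    if not windows:
        return []
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    profile = []
    for position in range(length):
        values = [w[position] for w in windows if w[position] is not None]
        # A position with no scored bases in any window stays undefined.
        profile.append(sum(values) / len(values) if values else None)
    return profile
```

Computing one such profile per conservation category gives curves that can be compared directly, since every window is aligned on its own center.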

De novo prediction of TF binding motifs

Because several TFs analyzed here do not have known TF binding motifs, we used the HOMER software (Heinz et al. 2010) for de novo binding site prediction from TF binding peaks. We ran HOMER on unique ChIP-seq binding regions (we excluded TEs and repetitive sequences) for each TF in each species. Once we identified binding motifs for each TF in the human and mouse genomes, we took advantage of the orthologous data to select binding motifs. From the top five ranked HOMER motifs, we selected the highest ranked motif that was the same between the two species (Supplemental Table 5). With this criterion, we were able to identify binding motifs for 19 of the 26 TFs. Since we trained the de novo motif predictor on sequences lacking repetitive sequences, we tested the prediction on TE sequences that we knew had binding peaks overlapping them; as expected, the training and testing data sets were both enriched for the motif (Fig. 5A).
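The motif selection rule described above can be illustrated with a small sketch. The text does not specify how two motifs were judged to be "the same" or which species' ranking took precedence, so this sketch assumes that motifs are compared by a shared label and that the human ranking is walked first; the function name and both assumptions are hypothetical.

```python
def select_shared_motif(human_ranked, mouse_ranked, top_n=5):
    """Return the best-ranked human motif label that also appears in the mouse top N."""
    mouse_top = set(mouse_ranked[:top_n])
    for motif in human_ranked[:top_n]:
        if motif in mouse_top:
            return motif
    # No motif is shared within the top N of both species.
    return None
```

A TF for which this returns `None` would be one of the cases in which no motif could be assigned under the stated criterion.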

Enrichment of TF binding motifs in TEs

To measure the enrichment of TF binding motifs in TEs, we first scanned the human and mouse genomes with the de novo predicted motifs, using FIMO (Grant et al. 2011). We then overlapped the TF binding motifs in the genome with RepeatMasker-annotated TE fragments (Smit et al. 1996–2010), using the intersectBed tool (Quinlan and Hall 2010) to identify TE-derived TF binding sites. Enrichment of TF binding motifs in TEs was defined as

\[
\mathrm{LOR}_{i,j} = \log_2\left(\frac{\text{Number of TF } i \text{ binding motifs in TE subfamily } j \;/\; \text{Length of TE subfamily } j\ (\mathrm{kb})}{\text{Number of TF } i \text{ binding motifs in the genome} \;/\; \text{Genome length}\ (\mathrm{kb})}\right)
\]

To compare the enrichment of motifs in TEs with the enrichment of peaks in TEs, we required (as mentioned earlier) that an enriched TE subfamily had more than 30 genomic copies.

Sequence alignment of TE subfamilies enriched for TF peaks and motifs

To evaluate the motif conservation in TEs, we first downloaded the sequences of the TE subfamily and the subfamily’s consensus from Repbase (Jurka et al. 2005). We then chose TE subfamilies that were enriched for TF binding motifs and TF binding peaks, and aligned the sequences of the TE fragments using Clustal Omega (Sievers et al. 2011). We further removed columns in the alignment that contributed gaps in the consensus sequence.
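The final trimming step, removing alignment columns where the consensus has a gap, can be sketched as below. It assumes the alignment is available as equal-length strings with the consensus as a separate row and `-` as the gap character; the function name is hypothetical.

```python
def drop_consensus_gap_columns(consensus, fragments, gap="-"):
    """Keep only alignment columns in which the consensus row is not a gap."""
    if any(len(seq) != len(consensus) for seq in fragments):
        raise ValueError("all aligned sequences must match the consensus length")
    keep = [i for i, base in enumerate(consensus) if base != gap]
    trimmed_consensus = "".join(consensus[i] for i in keep)
    trimmed_fragments = ["".join(seq[i] for i in keep) for seq in fragments]
    return trimmed_consensus, trimmed_fragments
```

After trimming, every remaining column corresponds to one consensus position, so motif positions can be compared across fragments in consensus coordinates.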

Determining the epigenetic state of TEs

To evaluate the epigenetic profiles of TEs that encode TF binding events, we used histone data sets for the human-mouse pairs of lymphoblast and leukemia cell lines, which are available


from the Data Coordination Center (DCC) of the ENCODE and Mouse ENCODE Consortia. The data sets represented assays for six histone marks: H3K4me1, H3K4me3, H3K9me3, H3K27ac, H3K27me3, and H3K36me3. To profile the histone signal on TEs or peaks, we chose a 10-kb region centered on the region of interest (TE or TF binding peak) and calculated the normalized read density in 50-bp bins. To overlap the epigenetic data sets with the regions of interest, we used the intersectBed tool (with default parameters) from the BEDTools package (Quinlan and Hall 2010). We averaged the signal from all regions and the replicates and plotted the enrichment of the histone signal over the input data set.
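The binning described above can be sketched as follows. The text does not state how the read density was normalized, so scaling to reads per million is an assumption here, as are the representation of each read by a single position and the function name `binned_density`.

```python
def binned_density(read_positions, center, total_reads, width=10_000, bin_size=50):
    """Reads-per-million density in fixed-size bins across a window centered on a position."""
    start = center - width // 2
    counts = [0] * (width // bin_size)
    for position in read_positions:
        offset = position - start
        # Only reads falling inside the half-open window contribute.
        if 0 <= offset < width:
            counts[offset // bin_size] += 1
    # Assumption: normalize by library size, expressed per million reads.
    scale = 1_000_000 / total_reads
    return [count * scale for count in counts]
```

With the defaults, each region yields 200 bins; dividing the ChIP profile by the matching input profile bin by bin would give an enrichment curve of the kind described above.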

Assaying the methylation state of the TEs

To profile the DNA methylation patterns in the human-mouse pairs of lymphoblast and leukemia cell lines, we performed two complementary assays, MeDIP-seq and MRE-seq, on the cells, as described earlier (Maunakea et al. 2010). We aligned the sequencing reads from these assays back to the human (hg19) and mouse (mm9) genome assemblies, using BWA (Li and Durbin 2009). We overlapped the signal from each of these assays with the 10-kb regions of interest (as described above) and estimated the average assay signal.

Data access

All data from this study have been submitted to the Mouse ENCODE Data Coordination Center (DCC; http://www.mouseencode.org). All data sets used in this study along with their NCBI Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo/) accession numbers or ENCODE DCC data set identifiers are listed in Supplemental Table 6.

Acknowledgments

We thank collaborators of the ENCODE and Mouse ENCODE Consortia who have generated and processed data that were used in this project. V.S. is supported in part by the Cancer Biology Pathway, Washington University. M.P.S. is supported by Mouse ENCODE Consortium grant 3RC2HG005602. T.W. is supported by the Basil O’Connor Starter Scholar Research Award 5-FY10-491 from the March of Dimes Foundation, the Edward Jr. Mallinckrodt Foundation, American Cancer Society grant RSG-14-049-01-DMC, and NIH grants 5U01ES017154, R01HG007354, R01HG007175, and R01ES024992.

References

Adams MD. 2000. The genome sequence of Drosophila melanogaster. Science 287: 2185–2195. Ames BN, Shigenaga MK, Hagen TM. 1993. Oxidants, antioxidants, and the degenerative diseases of aging. Proc Natl Acad Sci 90: 7915–7922. Bejerano G, Lowe CB, Ahituv N, King B, Siepel A, Salama SR, Rubin EM, Kent WJ, Haussler D. 2006. A distal enhancer and an ultraconserved exon are derived from a novel retroposon. Nature 441: 87–90. Bird A. 2002. DNA methylation patterns and epigenetic memory. Genes Dev 16: 6–21. Bourque G. 2009. Transposable elements in gene regulation and in the evolution of vertebrate genomes. Curr Opin Genet Dev 19: 607–612. Bourque G, Leong B, Vega VB, Chen X, Lee YL, Srinivasan KG, Chew JL, Ruan Y, Wei CL, Ng HH, et al. 2008. Evolution of the mammalian transcription factor binding repertoire via transposable elements. Genome Res 18: 1752–1762. Britten RJ, Davidson EH. 1969. Gene regulation for higher cells: a theory. Science 165: 349–357. Cheng Y, Ma Z, Kim BH, Cayting P, Boyle AP, Wu W, Sundaram V, Xing X, Li J, Euskirchen G, et al. 2014. Principles of regulatory information conservation between mouse and human. Nature (in press).

Chuong EB, Rumi MA, Soares MJ, Baker JC. 2013. Endogenous retroviruses function as species-specific enhancer elements in the placenta. Nat Genet 45: 325–329. Cohen CJ, Lock WM, Mager DL. 2009. Endogenous retroviral LTRs as promoters for human genes: a critical assessment. Gene 448: 105–114. de Koning AP, Gu W, Castoe TA, Batzer MA, Pollock DD. 2011. Repetitive elements may comprise over two-thirds of the human genome. PLoS Genet 7: e1002384. de Souza FSJ, Franchini LF, Rubinstein M. 2013. Exaptation of transposable elements into novel cis-regulatory elements: is the evidence always strong? Mol Biol Evol 30: 1239–1251. Demetrius L. 2005. Of mice and men. EMBO Rep 6: 39–44. Doolittle WF, Sapienza C. 1980. Selfish genes, the phenotype paradigm and genome evolution. Nature 284: 601–603. Dunn CA, Medstrand P, Mager DL. 2003. An endogenous retroviral long terminal repeat is the dominant promoter for human β1,3-galactosyltransferase 5 in the colon. Proc Natl Acad Sci 100: 12841–12846. Ekram MB, Kang K, Kim H, Kim J. 2012. Retrotransposons as a major source of epigenetic variations in the mammalian genome. Epigenetics 7: 370–382. The ENCODE Project Consortium. 2012. An integrated encyclopedia of DNA elements in the human genome. Nature 489: 57–74. Feschotte C. 2008. Transposable elements and the evolution of regulatory networks. Nat Rev Genet 9: 397–405. Feschotte C, Gilbert C. 2012. Endogenous viruses: insights into viral evolution and impact on host biology. Nat Rev Genet 13: 283–296. Grant CE, Bailey TL, Noble WS. 2011. FIMO: scanning for occurrences of a given motif. Bioinformatics 27: 1017–1018. Heinz S, Benner C, Spann N, Bertolino E, Lin YC, Laslo P, Cheng JX, Murre C, Singh H, Glass CK. 2010. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell 38: 576–589. International Human Genome Sequencing Consortium. 2001. Initial sequencing and analysis of the human genome. 
Nature 409: 860–921. Jacques PE, Jeyakani J, Bourque G. 2013. The majority of primate-specific regulatory sequences are derived from transposable elements. PLoS Genet 9: e1003504. Johnson DS, Mortazavi A, Myers RM. 2007. Genome-wide mapping of in vivo protein-DNA interactions. Science 316: 1497–1502. Jordan IK, Rogozin IB, Glazko GV, Koonin EV. 2003. Origin of a substantial fraction of human regulatory sequences from transposable elements. Trends Genet 19: 68–72. Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J. 2005. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res 110: 462–467. Kapusta A, Kronenberg Z, Lynch VJ, Zhuo X, Ramsay L, Bourque G, Yandell M, Feschotte C. 2013. Transposable elements are major contributors to the origin, diversification, and regulation of vertebrate long noncoding RNAs. PLoS Genet 9: e1003470. Kasowski M, Grubert F, Heffelfinger C, Hariharan M, Asabere A, Waszak SM, Habegger L, Rozowsky J, Shi M, Urban AE, et al. 2010. Variation in transcription factor binding among humans. Science 328: 232–235. Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D. 2002. The Human Genome Browser at UCSC. Genome Res 12: 996–1006. Kharchenko PV, Tolstorukov MY, Park PJ. 2008. Design and analysis of ChIP-seq experiments for DNA-binding proteins. Nat Biotechnol 26: 1351–1359. Kunarso G, Chia NY, Jeyakani J, Hwang C, Lu X, Chan YS, Ng HH, Bourque G. 2010. Transposable elements have rewired the core regulatory network of human embryonic stem cells. Nat Genet 42: 631–634. Landt SG, Marinov GK, Kundaje A, Kheradpour P, Pauli F, Batzoglou S, Bernstein BE, Bickel P, Brown JB, Cayting P, et al. 2012. ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Res 22: 1813–1831. Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25: 1754–1760. Lynch VJ, Leclerc RD, May G, Wagner GP. 2011. 
Transposon-mediated rewiring of gene regulatory networks contributed to the evolution of pregnancy in mammals. Nat Genet 43: 1154–1159. Maunakea AK, Nagarajan RP, Bilenky M, Ballinger TJ, D’Souza C, Fouse SD, Johnson BE, Hong C, Nielsen C, Zhao Y, et al. 2010. Conserved role of intragenic DNA methylation in regulating alternative promoters. Nature 466: 253–257. McClintock B. 1950. The origin and behavior of mutable loci in maize. Proc Natl Acad Sci 36: 344–355. McClintock B. 1956. Controlling elements and the gene. Cold Spring Harb Symp Quant Biol 21: 197–216.


McLean CY, Bristor D, Hiller M, Clarke SL, Schaar BT, Lowe CB, Wenger AM, Bejerano G. 2010. GREAT improves functional interpretation of cis-regulatory regions. Nat Biotechnol 28: 495–501. Medstrand P, Landry JR, Mager DL. 2001. Long terminal repeats are used as alternative promoters for the endothelin B receptor and apolipoprotein C-I genes in humans. J Biol Chem 276: 1896–1903. Merkenschlager M, Odom DT. 2013. CTCF and cohesin: linking gene regulatory elements with their targets. Cell 152: 1285–1297. Mestas J, Hughes CCW. 2004. Of mice and not men: differences between mouse and human immunology. J Immunol 172: 2731–2738. Morgan HD, Sutherland HG, Martin DI, Whitelaw E. 1999. Epigenetic inheritance at the agouti locus in the mouse. Nat Genet 23: 314–318. Mouse Genome Sequencing Consortium. 2002. Initial sequencing and comparative analysis of the mouse genome. Nature 420: 520–562. Nitzsche A, Paszkowski-Rogacz M, Matarese F, Janssen-Megens EM, Hubner NC, Schulz H, de Vries I, Ding L, Huebner N, Mann M, et al. 2011. RAD21 cooperates with pluripotency transcription factors in the maintenance of embryonic stem cell identity. PLoS ONE 6: e19470. Oliver KR, Greene WK. 2011. Mobile DNA and the TE-thrust hypothesis: supporting evidence from the primates. Mob DNA 2: 8. Phillips JE, Corces VG. 2009. CTCF: master weaver of the genome. Cell 137: 1194–1211. Pi W, Zhu X, Wu M, Wang Y, Fulzele S, Eroglu A, Ling J, Tuan D. 2010. Long-range function of an intergenic retrotransposon. Proc Natl Acad Sci 107: 12992–12997. Polavarapu N, Mariño-Ramírez L, Landsman D, McDonald JF, Jordan IK. 2008. Evolutionary rates and patterns for human transcription factor binding sites derived from repetitive DNA. BMC Genomics 9: 226. Quinlan AR, Hall IM. 2010. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26: 841–842. Rebollo R, Miceli-Royer K, Zhang Y, Farivar S, Gagnier L, Mager DL. 2012a. 
Epigenetic interplay between mouse endogenous retroviruses and host genes. Genome Biol 13: R89. Rebollo R, Romanish MT, Mager DL. 2012b. Transposable elements: an abundant and natural source of regulatory sequences for host genes. Annu Rev Genet 46: 21–42. Roman AC, Benitez DA, Carvajal-Gonzalez JM, Fernandez-Salguero PM. 2008. Genome-wide B1 retrotransposon binds the transcription factors dioxin receptor and Slug and regulates gene expression in vivo. Proc Natl Acad Sci 105: 1632–1637. Samuelson LC, Wiebauer K, Snow CM, Meisler MH. 1990. Retroviral and pseudogene insertion sites reveal the lineage of human salivary and pancreatic amylase genes from a single gene during primate evolution. Mol Cell Biol 10: 2513–2520. Sasaki T, Nishihara H, Hirakawa M, Fujimura K, Tanaka M, Kokubo N, Kimura-Yoshida C, Matsuo I, Sumiyama K, Saitou N, et al. 2008. Possible involvement of SINEs in mammalian-specific brain formation. Proc Natl Acad Sci 105: 4220–4225.


Schmidt D, Schwalie PC, Wilson MD, Ballester B, Gonçalves A, Kutter C, Brown GD, Marshall A, Flicek P, Odom DT. 2012. Waves of retrotransposon expansion remodel genome organization and CTCF binding in multiple mammalian lineages. Cell 148: 335–348. Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, et al. 2005. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res 15: 1034–1050. Sievers F, Wilm A, Dineen D, Gibson TJ, Karplus K, Li W, Lopez R, McWilliam H, Remmert M, Söding J, et al. 2011. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol 7: 539. Slotkin RK, Martienssen R. 2007. Transposable elements and the epigenetic regulation of the genome. Nat Rev Genet 8: 272–285. Smit AFA, Hubley R, Green P. 1996–2010. RepeatMasker Open-3.0. http://www.repeatmasker.org. Stevens M, Cheng JB, Li D, Xie M, Hong C, Maire CL, Ligon KL, Hirst M, Marra MA, Costello JF, et al. 2013. Estimating absolute methylation levels at single-CpG resolution from methylation enrichment and restriction enzyme sequencing methods. Genome Res 23: 1541–1553. Visel A, Blow MJ, Li Z, Zhang T, Akiyama JA, Holt A, Plajzer-Frick I, Shoukry M, Wright C, Chen F, et al. 2009. ChIP-seq accurately predicts tissue-specific activity of enhancers. Nature 457: 854–858. Wang T, Zeng J, Lowe CB, Sellers RG, Salama SR, Yang M, Burgess SM, Brachmann RK, Haussler D. 2007. Species-specific endogenous retroviruses shape the transcriptional network of the human tumor suppressor protein p53. Proc Natl Acad Sci 104: 18613–18618. Wendt KS, Yoshida K, Itoh T, Bando M, Koch B, Schirghuber E, Tsutsumi S, Nagae G, Ishihara K, Mishiro T, et al. 2008. Cohesin mediates transcriptional insulation by CCCTC-binding factor. Nature 451: 796–801. Xie D, Chen CC, Ptaszek LM, Xiao S, Cao X, Fang F, Ng HH, Lewin HA, Cowan C, Zhong S. 2010. 
Rewirable gene regulatory networks in the preimplantation embryonic development of three mammalian species. Genome Res 20: 804–815. Xie M, Hong C, Zhang B, Lowdon RF, Xing X, Li D, Zhou X, Lee HJ, Maire CL, Ligon KL, et al. 2013. DNA hypomethylation within specific transposable element families associates with tissue-specific enhancer landscape. Nat Genet 45: 836–841. Zhang B, Zhou Y, Lin N, Lowdon RF, Hong C, Nagarajan RP, Cheng JB, Li D, Stevens M, Lee HJ, et al. 2013. Functional DNA methylation differences between tissues, cell types, and across individuals discovered using the M&M algorithm. Genome Res 23: 1522–1540.

Received October 27, 2013; accepted in revised form April 18, 2014.
