Chromosome Res (2013) 21:673–684 DOI 10.1007/s10577-013-9382-8

REVIEW

Long noncoding RNAs as metazoan developmental regulators

Jamila I. Horabin

Published online: 23 October 2013 © Springer Science+Business Media Dordrecht 2013

Abstract The study of long noncoding RNAs (lncRNAs) is still in its infancy, with more putative RNAs identified than RNAs with ascribed functions. lncRNAs are defined as transcripts longer than 200 nucleotides that lack a coding sequence; their numbers are on the rise and may well challenge protein coding transcripts in number and diversity. lncRNAs are often expressed at low levels and their sequences are frequently poorly conserved, making it unclear whether they are transcriptional noise or bona fide effectors. Despite these limitations, inroads into their functions are being made, and it is clear that they contribute to regulating all aspects of biology. The early verdict on their activity, however, suggests that the majority function as chromatin modifiers. A good proportion show a connection to disease, highlighting their importance and the need to determine their function. The focus of this review is on lncRNAs that influence developmental processes, which in itself covers a large range of known activities.

Keywords lncRNA · development · transcription · chromatin modifiers · scaffolds · RNAP II pausing

Abbreviations
BRD4 Bromodomain protein 4
CDK7 Cyclin dependent kinase 7
CoREST Corepressor of RE1 silencing transcription factor
CTD C-terminal domain
DCC Dosage compensation complex
DNA Deoxyribonucleic acid
HOTAIR HOX antisense intergenic RNA
HOTTIP HOXA transcript at the distal tip
lncRNAs Long noncoding RNAs
mRNA Messenger RNA
ORF Open reading frame
P-TEFb Positive transcription elongation factor b
PcG Polycomb group
PRC2 Polycomb repressive complex 2
PRE Polycomb response elements
RNA Ribonucleic acid
RNAP II RNA polymerase II
STAU1 Staufen 1 protein
TINCR Terminal differentiation-induced ncRNA
TrxG Trithorax group
Xist X-inactive specific transcript

Responsible Editor: Brian P. Chadwick, Kristin C. Scott, and Beth A. Sullivan

Pervasive transcription

J. I. Horabin (*) Department of Biomedical Sciences, College of Medicine, Florida State University, Rm 3300-G, 1115 W. Call St., Tallahassee, FL 32306-4300, USA e-mail: [email protected]

The transcriptional output from the genomes of higher eukaryotes is far more complex than previously predicted. Evidence for transcription across at least 83.7% of the human genome firmly establishes the reality of this pervasive transcription (Djebali et al. 2012), and a considerable fraction of the noncoding sequences that are conserved in the human and fruit fly genomes are transcribed (Carninci et al. 2008; modENCODE Consortium 2010). It has been argued that, with its regulatory functions, this extensive noncoding transcriptome accounts for the evolution of organismal complexity (Clark et al. 2011). It is increasingly clear that the intergenic regions of the genome, once referred to as “junk DNA” or “dark matter,” encode functional elements. Some of these regions perform their biological roles purely as DNA elements, but most of them are now known to be transcribed. These noncoding transcripts are classified as small (<200 nt) or long (>200 nt) RNAs. This size cutoff distinguishes long noncoding RNAs (lncRNAs), the subject of this review, from small RNAs such as the microRNAs, PIWI-interacting RNAs, small nucleolar RNAs, and small interfering RNAs.

What defines a lncRNA

lncRNA is a broad term that encompasses different classes of RNA transcripts, including enhancer RNAs (Darrow and Chadwick 2013), intergenic transcripts, and overlapping transcripts in either the sense or antisense orientation (Fig. 1). Despite their growing numbers, only a few have been analyzed to any significant extent, so the functional contribution to biology of most lncRNAs remains unknown. The majority of the characterized lncRNAs are generated by the same transcriptional machinery that generates messenger RNAs (mRNAs), as substantiated by RNA polymerase II (RNAP II) occupancy and the histone modifications normally associated with transcription initiation and elongation (Guttman et al. 2009). These lncRNAs have


a 5′ terminal methylguanosine cap and are often spliced and polyadenylated. There are alternate pathways which also contribute to the generation of known lncRNAs, but they are generally poorly characterized and the RNAs frequently are not polyadenylated. Most are likely expressed from RNAP III promoters (Kapranov et al. 2007; Dieci et al. 2007), excised during splicing and from small nucleolar RNA production (Yin et al. 2012). Current annotations suggest that the actual number of lncRNAs exceeds that of protein coding genes (Derrien et al. 2012). Additionally, multiple studies have shown that lncRNA expression is more cell type specific than that of protein coding genes (Cabili et al. 2011; Djebali et al. 2012; Ravasi et al. 2006). Studied more intensely in human and mammalian cell lines than any other model system, many of the current definitions and categorizations of lncRNAs come from these data bases. Whereas the number of known human protein coding genes has remained stable over recent years, the list of functional noncoding RNA genes within intergenic regions has continued to grow. As of 2013, approximately 10,000 lncRNA genes have been annotated in humans (Derrien et al. 2012; Hangauer et al. 2013). As the number of known lncRNAs continues to accumulate, it may well turn out that lncRNAs will surpass protein coding genes in number and diversity. Generally, lncRNAs are spliced and/or polyadenylated and tend to contain a smaller number of exons than protein coding genes (Rinn et al. 2003; Ponting and Belgard 2010). They are expressed in a tissue and/or developmental stage specific manner (Ponting et al. 2009; Ponting and Belgard 2010; Young et al. 2012) and often are in lower amounts relative to their protein coding counterparts, making it difficult to accurately detect and assemble their complex transcript structures (Derrien et al. 2012; Cabili et al. 2011). They are often found in the vicinity of protein coding genes and more

Fig. 1 Pervasive transcription. Sources of potential RNAs which can impact genes in cis or trans. Bent arrows depict promoters. For cis activity, some of the lncRNAs can be expected to traverse the entire coding region

lncRNAs and development

than half of mammalian coding genes have complementary noncoding antisense transcription (Katayama, et al. 2005). This may be in addition to overlapping, intronic and bidirectional noncoding transcription which generates an organization of complex transcription at loci with both coding and noncoding transcripts (Kapranov et al. 2005). lncRNAs predominantly localize to the nucleus, although several examples of cytoplasmic lncRNAs exist (a couple are described below). The conservation of lncRNAs between species has been found to be significantly better than putative neutrally evolving sequences, such as ancestral repeats in mammals (Ponjavic et al. 2007; Guttman et al. 2009; Marques and Ponting 2009) or small introns in Drosophila (Young et al. 2012). However, in all these organisms, lncRNA sequences have been shown to diverge far more rapidly than protein coding sequences. Interestingly, their promoters display levels of selection comparable to the promoters of protein coding genes (Derrien et al. 2012; Guttman et al. 2009; Haerty and Ponting 2013; Marques and Ponting 2009; Ørom et al. 2010; Ponjavic et al. 2007).

Challenges of identifying lncRNAs

Besides their sheer numbers, one of the challenges in identifying lncRNAs comes from the fact that their biogenesis is shared with mRNAs as well as many of the other classes of noncoding RNAs. Additionally, they frequently are expressed or maintained at a low level and do not need to be polyadenylated to be functional (Wilusz et al. 2008). This makes it hard to distinguish them from incompletely spliced products, intron material, or spuriously transcribed RNA. Alternatively spliced products and transcripts from pseudogenes, which may or may not encode proteins, also contribute to the complexity. At present, there is no defining biochemical feature that can be exclusively ascribed to lncRNAs. Most frequently, the lack of an extended open reading frame (ORF) provides the speculative evidence that the many identified transcripts function as RNAs (Dinger et al. 2008). Ribosome footprinting and the assignment of peptide fragments by mass spectrometry, which test whether the RNA is translated, provide further classification criteria (Ingolia et al. 2011; Banfai et al. 2012). Cross-species comparisons that reveal conservation of an extended ORF, especially when the codon nucleotides suggest selective or species specific pressure, are frequently used to distinguish coding from noncoding transcripts (Lin et al. 2011). There are, of course, exceptions where short or noncanonical peptides are encoded in transcripts; this list of bifunctional mRNAs with ncRNA activity is small but growing (reviewed in Dinger et al. 2011).
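The two simplest criteria mentioned above, the 200 nt size cutoff and the absence of an extended ORF, can be sketched as a toy classifier. This is only an illustrative sketch: the function names and the 100-codon ORF threshold are assumptions made for the example and are not taken from the studies cited, and real pipelines also weigh conservation, ribosome footprinting, and mass spectrometry evidence.

```python
"""Toy transcript classifier: size cutoff plus longest-ORF check.

All names and the ORF threshold are hypothetical, chosen for illustration.
"""

STOP_CODONS = {"TAA", "TAG", "TGA"}
MIN_LNCRNA_LENGTH = 200   # nt; the size cutoff used in this review
MAX_NONCODING_ORF = 100   # codons; assumed threshold, not from the source


def longest_orf_codons(seq: str) -> int:
    """Longest ATG-to-stop ORF, in codons (stop excluded), over the
    three forward reading frames. ORFs without a stop are ignored."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None                   # close it and keep scanning
    return best


def classify(seq: str) -> str:
    """Label a transcript by length and coding potential only."""
    if len(seq) <= MIN_LNCRNA_LENGTH:
        return "small RNA"
    if longest_orf_codons(seq) >= MAX_NONCODING_ORF:
        return "putative coding"
    return "putative lncRNA"
```

For example, `classify("GCT" * 100)` labels a 300 nt transcript with no start codon as a putative lncRNA, whereas a 456 nt transcript carrying a 151-codon ORF is labeled putative coding. The bifunctional transcripts noted above are exactly the cases such a simple rule would misassign.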

Activities and functions

Despite the challenges, the growing list of lncRNAs with ascribed functions is shedding light on their diversity and range of activities. Current evidence suggests the largest class may be that of chromatin modifiers. In a study which identified more than 3,000 lncRNAs from several human cell types (Khalil et al. 2009), approximately 24% were found to bind the polycomb repressive complex 2 (PRC2), which is required for histone H3 lysine 27 trimethylation and transcriptional silencing. When additional chromatin modifying complexes were included (e.g., corepressor of RE1 silencing transcription factor (CoREST) and SMCX), this number rose to 38%, suggesting that the majority of lncRNAs may function as modulators of chromatin. Besides modifying chromatin, lncRNAs have numerous other activities, several of which are described below. Quite often, the function of a lncRNA is mediated simply by the act of transcription rather than the actual RNA transcript (Yoo et al. 2012; Latos et al. 2012). Through their ability to base pair with other RNAs, lncRNAs can also act as highly specific sequence sensors.
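The idea of a lncRNA as a sequence sensor can be illustrated with a minimal sketch that scans a target RNA for sites able to pair with a short lncRNA segment. The function names are hypothetical, and the sketch assumes perfect antiparallel Watson-Crick pairing only; G-U wobble pairs and the imperfect duplexes seen in vivo are deliberately ignored.

```python
# Watson-Crick complement table for RNA (no wobble pairing).
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence (T is read as U)."""
    return rna.upper().replace("T", "U").translate(COMPLEMENT)[::-1]


def find_pairing_sites(lncrna_segment: str, target: str) -> list[int]:
    """0-based start positions in `target` where `lncrna_segment`
    could form a perfect antiparallel duplex."""
    site = reverse_complement(lncrna_segment)
    target = target.upper().replace("T", "U")
    positions = []
    start = target.find(site)
    while start != -1:
        positions.append(start)
        start = target.find(site, start + 1)   # allow overlapping sites
    return positions
```

With these assumptions, `find_pairing_sites("AUGC", "AAGCAUGGGCAU")` reports the two positions at which the target carries the complementary sequence GCAU.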

cis versus trans-acting

A proviso: trans-acting lncRNAs act through the RNA product itself, while cis-acting lncRNAs can act by two different modes. The first mode depends on the RNA, as for trans-acting lncRNAs. A major chromosome-wide example is X chromosome inactivation by the X-inactive specific transcript (Xist) lncRNA in female mammals (reviewed in Froberg et al. 2013). Xist is expressed from one of the two X chromosomes and induces silencing of the chromosome from which it is expressed. At the other end of the spectrum is locus specific regulation, where enhancer RNAs or lncRNAs activate neighboring genes in cis, via the RNA product. The human HOTTIP lncRNA, which is expressed from the human HOXA cluster, is one such example,


discussed below. The nascent lncRNA transcript may bind to and deliver epigenetic modifiers to target genes while still attached to the elongating RNAP II or is attached to the chromosome by a specific DNA binding protein, the YY1 protein in the case of Xist. This is generally termed “tethering” and is often used to explain cis regulation by lncRNAs (reviewed in Guttman and Rinn 2012; Magistri et al. 2012). Tethering has also been proposed to act in plants. The COLDAIR lncRNA of Arabidopsis thaliana is initiated from an intron of the FLC pc gene which it silences by targeting repressive chromatin marks to the locus, so controlling flowering time (Heo and Sung 2011). The second mode of cis regulation involves the actual process of transcription. This is often referred to as transcriptional interference, where transcription of one gene represses the transcription of another in cis. Transcriptional interference can be realized by several means. Nucleosome repositioning which can negatively affect accessibility of the promoter, interference or blocking of enhancers or the promoter region, or interference with RNAP II elongation are all thought to contribute. It can also be accomplished through altering histone modifications or DNA methylation at promoters. In the case of imprinting (Mondal and Kanduri 2013), where genes are expressed in a parent of origin dependent fashion, regulation is often associated with lncRNAs from the imprinted gene regions. The Airn lncRNA silences expression of the neighboring paternal Igf2r, Slc22a2, and Slc22a3 genes. Its transcription was shown to be continuously required for Igf2r silencing, and silencing efficiency decreases when the Igf2r promoter is strongly activated. The evidence supports a model where RNAP II transcription from the interfering promoter causes a decrease in transcription initiation from the targeted promoter. 
The resulting DNA methylation appears to be part of reinforcing rather than initiating the silenced state (Latos et al. 2012). The reverse condition, to create a permissive chromatin environment, can also be generated from transcription of lncRNAs by either making the chromatin more open or blocking access of repressor complexes (reviewed in Kornienko et al. 2013; see Drosophila PREs in Homeotics below).

Developmental functions of lncRNAs

ncRNAs in general have emerged as regulators of almost every facet of biology, and the idea that they


regulate different aspects of embryogenesis and development is well accepted. Below, the roles of lncRNAs and how they impact developmental processes are discussed. For developmental topics such as X chromosome inactivation and imprinting, which are covered in more depth by other authors, we refer the reader to the relevant articles in this issue.

Regulation of RNA polymerase II

The conventional view of gene activation is that the rate limiting step is the interaction of RNAP II with the promoters of genes. Studies in several systems, including human embryonic stem cells (Guenther et al. 2007) and the early Drosophila embryo (Zeitlinger et al. 2007), are challenging this view. Once thought to occur only at the heat shock genes, RNAP II pausing is now believed to be a necessary component for timely assembly of the RNAP II complex and precise regulation of expression. RNAP II is able to associate with genomic DNA at the transcription start site on an extended basis. This is achieved through a regulated cascade of progressive phosphorylation of serine residues within a heptad repeat in the C-terminal domain (CTD) of the largest RNAP II subunit (Fig. 2). This prolonged pausing has been suggested to keep genes poised for immediate and coordinated activation once they are stimulated, for example by a differentiation or stress signal. The rapidly inducible heat shock gene promoters, which were the first to be described as having paused RNAP II (Gilmour and Lis 1986), would thus represent one such response. From the perspective of ncRNAs, pausing and release utilize two abundant nuclear RNAs, U1 and 7SK, which function as cofactors of RNAP II (Fig. 2). Drosophila U1 snRNA is 164 nt long and 7SK is 444 nt long (331 nt in vertebrates), qualifying them as lncRNAs. The two nuclear RNAs are highly abundant in the chromatin fraction (Mondal et al. 2010). 
U1 is better known for its role in mRNA splicing; however, it has been shown to enhance CTD Ser5 phosphorylation by the TFIIH cyclin dependent kinase 7 (CDK7) in vitro (Fong and Zhou 2001). Phosphorylation of the CTD of RNAP II at serines 5 and 7 (Ser5 and Ser7) during transcription initiation is performed by TFIIH (Akhtar et al. 2009) a large multisubunit complex that contains, in addition to CDK7 kinase, the U1 snRNA (Kwek et al. 2001). Launching paused RNAP II into elongation requires the positive transcription elongation factor b (P-TEFb), a CDK9/cyclin T1 containing complex, which



Fig. 2 Key molecules in the stalling and release of RNAP II. General and specific transcription factors (TF) which activate target genes promote the TFIIH, CDK7, U1 snRNA complex to phosphorylate RNAP II at Ser5 in the CTD, and poise it for transcription. Bent arrow depicts the promoter. The polymerase is held in check (paused) 20–60 base pairs downstream of the promoter by DRB sensitivity-inducing factor (DSIF) and negative elongation factor (NELF). Release requires

P-TEFb phosphorylation of Ser2 on the CTD as well as DSIF which converts DSIF to promote elongation; phosphorylated NELF is removed. P-TEFb is inactive when bound by the Hexim/7SK RNA complex until it is released by BRD4. The two noncoding RNAs are shown as green representing activating (U1 snRNA) and red inhibitory (7SK RNA). mRNA blue line, phosphorylation pink stars. General transcription factors (GTFs)
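The scheme in Fig. 2 can be read as a small state machine, sketched below as a toy model. The class and function names are hypothetical, and the model is deliberately reduced: DSIF and NELF are represented only by the "paused" state, and every phosphorylation event is treated as all-or-nothing.

```python
from dataclasses import dataclass, field


@dataclass
class Polymerase:
    """Toy RNAP II: a state label plus the CTD marks it carries."""
    state: str = "recruited"
    ctd_marks: set = field(default_factory=set)


@dataclass
class PTEFb:
    """Toy P-TEFb: inactive while held by the HEXIM/7SK RNA complex."""
    bound_by_hexim_7sk: bool = True

    def release_by_brd4(self) -> None:
        # BRD4 frees P-TEFb from the inhibitory 7SK complex.
        self.bound_by_hexim_7sk = False

    @property
    def active(self) -> bool:
        return not self.bound_by_hexim_7sk


def initiate(pol: Polymerase) -> None:
    """TFIIH/CDK7 with U1 snRNA adds Ser5P; the polymerase then pauses."""
    if pol.state != "recruited":
        raise ValueError("polymerase must be newly recruited")
    pol.ctd_marks.add("Ser5P")
    pol.state = "paused"          # held downstream of the promoter


def release(pol: Polymerase, ptefb: PTEFb) -> bool:
    """Active P-TEFb adds Ser2P and launches elongation.

    Returns True if the polymerase was released, False otherwise."""
    if pol.state != "paused" or not ptefb.active:
        return False
    pol.ctd_marks.add("Ser2P")
    pol.state = "elongating"
    return True
```

In this sketch a paused polymerase stays paused for as long as P-TEFb is sequestered; calling `release_by_brd4()` first is what allows `release()` to succeed, mirroring the activating (U1) and inhibitory (7SK) roles assigned to the two RNAs in the figure.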

phosphorylates RNAP II on Ser2 in the CTD heptad repeats (Fig. 2). P-TEFb is negatively regulated by the bilaterian RNA, 7SK (Diribarne and Bensaude 2009). It is bound by HEXIM1 protein in a mutually exclusive and antagonistic mode in competition with vertebrate bromodomain protein 4 (BRD4). On displacement of PTEFb from the HEXIM1/7SK complex by BRD4 (Krueger et al. 2010), P-TEFb is able to phosphorylate RNAP II and promote elongation. The first evidence that release of paused RNAP II is a critical mechanism of gene regulation in Drosophila development came from the analysis of three segmentation genes, slp1, engrailed, and wingless (Wang et al. 2007). Inactive genes containing paused RNAP II are significantly overrepresented by developmental control genes. Many axis patterning and homeotic genes, tissue determinants, and components of cell signaling pathways contain RNAP II in the early embryo; some of which are only activated later in development (Zeitlinger et al. 2007; Saunders et al. 2013). During heat shock, the majority of RNAP II transcription is coordinately downregulated across the genome. A large part of this outcome is likely from RNAP III-derived lncRNAs, the mouse B2 RNA, and human Alu RNAs (178 and ~350 nt long), which have been

found in vivo to specifically occupy the promoters of repressed genes along with RNAP II and general transcription factors (Mariner et al. 2008). In vitro, both the Alu and B2 lncRNAs have been shown to bind RNAP II tightly and prevent it from establishing contacts with the promoter, both upstream and downstream of the TATA box. This results in transcriptional repression on the DNA template, preventing RNAP II from properly engaging the DNA in the first stage of closed complex formation. TFIIH-mediated phosphorylation of RNAP II is also repressed (Yakovchuk et al. 2011). The lncRNAs are presumably targeted to promoters as a consequence of RNAP II recruitment, but the complete mechanism is not yet elucidated. Intriguingly, removing the lncRNAs by RNase treatment in vitro restored the contacts with the promoter, permitting the RNAP II complexes to become transcriptionally active (Yakovchuk et al. 2009), suggesting how recovery of transcription after heat shock might occur.

Altering protein translation and mRNA stability

Repeated sequences such as Alu elements that are common to lncRNAs and mRNAs can create an interaction site, where lncRNAs with complementary


sequences bind to the mRNA and affect the half-life as well as translation of the mRNA. Staufen 1 protein (STAU1) binds to double stranded RNA. lncRNAs that contain Alu elements with half STAU1-binding sites can bind to Alu elements in the 3′ UTRs of target mRNAs and, through imperfect complementarity, generate an RNA duplex which completes a STAU1 binding site. STAU1 recognizes and binds to the resulting double stranded RNA elements and initiates mRNA decay (Gong and Maquat 2011). In exactly the opposite role, STAU1 protein can also enhance translation. During human epidermal differentiation, the 3.7 kb lncRNA terminal differentiation-induced ncRNA (TINCR) is induced. TINCR localizes to the cytoplasm where it interacts with STAU1 and is required for the normal induction of proteins which are key for epidermal differentiation. TINCR interacts with a range of differentiation mRNAs, and mRNAs that carry the 25 nucleotide TINCR motif, which appears to be required for TINCR binding, have their stability promoted (Kretz et al. 2013). Loss of proteins required for STAU1-mediated mRNA decay did not produce the epidermal differentiation defect seen on loss of STAU1 or the TINCR lncRNA, suggesting a different pathway is involved. This is consistent with the poor enrichment of the Alu repeats in TINCR targets. The exact means by which STAU1 produces these opposite outcomes is not known.

Cellular structural scaffolds

mRNA localization coupled with translational control is a highly conserved and widespread mechanism for restricting protein expression. During development, a class of mRNAs is localized within the egg to ensure that, on the appropriate developmental cue, a high level of protein is locally synthesized, and presumably also to prevent its synthesis where it is not required. The underlying assumption is that the only function of the mRNA is to make protein. 
Studies on Drosophila and Xenopus eggs have shown, however, that besides making a protein some mRNAs also have independent structural functions. The VegT RNA has been known for many years to encode a protein necessary for establishing the primary germ layers in Xenopus (Zhang et al. 1998). It has since been shown that VegT RNA also fulfills a separate structural role in the cytokeratin network of primordial germ cells. The RNA is able to bind unpolymerized cytokeratin and promotes its polymerization. Its removal collapses


the cytokeratin filaments and disrupts germinal granule formation (Kloc et al. 2007). In Drosophila, oskar RNA was first characterized as a determinant responsible for formation of the posterior pole plasm in the egg, and thus, formation of the abdomen and germline of the future fly (Markussen et al. 1995). It has since been found that, independent of the protein, oskar mRNA, more specifically its 3′ UTR, has a function that is essential for the completion of oogenesis (Jenny et al. 2006). Recently, Pathak et al. (2013) described transcripts of several hundred nucleotides from the AAGAG repeats of the pericentromeric regions of Drosophila which appear to be critical components of the nuclear matrix. The nuclear matrix is a non-chromatin structure made up of mostly protein and RNA molecules, although the identity of many of these is not known. While both strands of the repeats are transcribed and associate with the nuclear matrix, it is predominantly the polypurine AAGAG strand that is detected. Knockdown of the RNA resulted in nuclear defects, dispersing the nucleolus and heterochromatin, and in lethality. These data demonstrate the structural contribution of RNA to nuclear architecture and an additional utility of the repetitive part of the genome in higher eukaryotes.

The stem cell fate

In a study examining lncRNAs in mouse embryonic stem cells, the individual knockdown of dozens of lncRNAs was found to have major consequences on gene expression patterns (Guttman et al. 2011). The knockdowns caused either an exit from the pluripotent state or resulted in upregulation of lineage commitment programs comparable to the knockdown of well-known protein regulators of embryonic stem cells. 
Of note, knockdown of the lncRNAs led to largely comparable numbers of activated and repressed genes; many of the lncRNAs were found to primarily affect gene expression in trans, and ~30% of the 74 lncRNAs tested were found to physically interact with at least one of the 12 major chromatin modifying proteins examined.

Patterning of the body axis by the homeotics

The highly conserved polycomb group (PcG) and trithorax group (TrxG) proteins are essential for the maintenance of the identities of both stem cells and differentiated cells. The two groups of proteins work antagonistically; the TrxG primarily maintains the



active state, while the PcG maintains the silenced transcriptional state through influencing the chromatin modifications of their target genes. The oldest and arguably best characterized targets of the PcG and TrxG are the homeotic genes of the Drosophila Bithorax complex (Fig. 3). In bilateria, the Hox genes regulate anterior to posterior patterning and their misexpression can cause homeotic transformations, where the complete identity of a segment is replaced by that of another segment. Several years before the discovery of lncRNAs, it was noted that genes which determine segmental identity, in this case bithoraxoid (bxd), “curiously…do not possess any significant coding potential” (Lipshitz et al. 1987). PcG response elements (PREs) are composite DNA elements made up of different motifs that are recognized by various DNA-binding proteins. Forced transcription through intergenic PREs in the Drosophila Bithorax complex also causes homeotic transformations. These phenotypes resemble abnormalities that are caused by homeotic gene misexpression and correlate with a loss of PRE-mediated silencing (Bender and Fitzgerald 2002; Hogga and Karch 2002; Rank et al. 2002). Many PRE/TREs are transcribed into lncRNAs, and the spatial and temporal expression of many PRE/TRE lncRNAs correlates with the domains of expression of the target, suggesting they are under the same regulation as their cognate genes. For many of the lncRNAs in the Drosophila homeotic gene clusters, it has been suggested that transcription through the PRE is accompanied by derepression of its associated gene rather than recruitment of protein complexes by the RNA. Active transcription through the cis-acting element is thought to lead to loss of PcG protein binding, which results in an open chromatin conformation (Schmitt et al. 2005). By contrast, transcription of the bxd-PRE, which regulates the Ultrabithorax (Ubx) protein coding unit, was demonstrated to function by transcriptional interference between Ubx and the bxd lncRNAs (Petruk et al. 2006). A 92-kb long infraabdominal-8 (iab-8) transcription unit that is alternatively spliced and polyadenylated to generate multiple lncRNAs has recently been described to repress expression of the abdominal-A (abd-A) homeotic gene in the posterior CNS. The repression only occurs in cis and has been suggested to occur through transcriptional interference at the abd-A promoter, which lies just downstream of the iab-8 lncRNA poly(A) addition site (Gummalla et al. 2012). Both positively and negatively acting lncRNAs are utilized in regulating the Drosophila Bithorax complex. Several of the PcG and TrxG proteins bind to RNA, and recent evidence suggests that these RNA interactions are essential for targeting both groups of proteins to specific sites, modulating their effects on gene expression.

Fig. 3 Genomic organization of Drosophila homeotic genes and mammalian Hox genes. The four human Hox complexes which are on separate chromosomes and a hypothetical ancestral homeotic complex are displayed showing their possible phylogenic relationships. Each gene is represented by a colored box. Note colinear expression position on chromosome which is in inverted order relative to body plan. The expression domains of genes are schematized in a fly and in the CNS and prevertebrae of a human fetus. Each color is meant to show the anterior most expression domain of a given subfamily. Homeotic genes: lab labial, pb proboscipedia, Dfd Deformed, Scr Sex combs reduced, Antp Antennapedia, Ubx Ultrabithorax, abd-A abdominal-A, Abd-B Abdominal-B [Adapted by permission from Macmillan Publishers Ltd.: Pediatric Research 42, 421–429, copyright 1997 (license no. 3221390444371) and Oxford University Press (license no. 3221470153210)]
This is best exemplified by the analysis of two noncoding RNAs from the vertebrate Hox clusters, HOTAIR and HOTTIP. As in Drosophila, expression of the HOX genes is activated sequentially relative to their chromosomal positions, which also faithfully reflects their position of activity along the anterior–posterior and/or proximal–distal axes. In mice and humans, the 39 Hox genes which encode homeodomain transcription factors are clustered on four different chromosomal loci (A–D; Fig. 3). HOX antisense intergenic RNA (HOTAIR) is a 2158 nucleotide spliced and polyadenylated lncRNA that is transcribed antisense to the HOXC genes. siRNA-mediated knockdown of HOTAIR had little effect on


transcription of the HOXC locus on chromosome 12, but led to dramatic transcriptional activation of the HOXD locus on chromosome 2, affecting genes over 40 kb, including HOXD8, HOXD9, HOXD10, and HOXD11 (Rinn et al. 2007). The authors also showed that the HOTAIR lncRNA binds PRC2 and is required for robust H3K27 trimethylation and transcriptional silencing of the HOXD locus. Chromatin isolation by RNA purification analyses revealed that HOTAIR occupancy occurs independently of the H3K27 methylase of PRC2, EZH2, suggesting that the RNA can guide PRC2 recruitment and specify formation and spread of a repressive environment (Chu et al. 2011). The HOXA transcript at the distal tip (HOTTIP) noncoding RNA is 3764 nucleotides long and is expressed from the 5′ end of the HoxA locus. It drives histone H3 lysine 4 trimethylation and transcription of the HoxA distal genes through recruiting the WDR5/MLL histone modifier complex to the promoters of flanking genes (WD repeat containing protein 5/mixed lineage leukemia protein, the vertebrate ortholog of Drosophila Trx; Wang et al. 2011). Endogenous HOTTIP is brought to its target genes by chromosomal looping and, affirming its cis-acting nature, ectopic HOTTIP only activates transcription when it is artificially tethered to a reporter gene (Wang et al. 2011). The MLL H3K4 methylation complex is also recruited to the Hox locus by the noncoding RNA Mistral, a 798 nucleotide unspliced polyadenylated lncRNA, located between Hoxa6 and Hoxa7. Mistral directly interacts with MLL1, leading to changes at the chromatin level that activate Hoxa6 and Hoxa7 (Bertani et al. 2011).

Dosage compensation

To correct for the gene dose difference when the sex chromosomes are vastly different, organisms which have this disparity frequently utilize a dosage compensation system. Flies use the strategy of hyper-transcribing the single male X chromosome rather than silencing one of the two female X chromosomes, as seen in most mammals. 
Besides five key proteins, Drosophila dosage compensation depends on two X-linked lncRNAs, roX1 and roX2 (RNA on the X 1 and 2), which are, respectively, ~3700 and ~570 nucleotides long (splice variants give alternative transcripts). As it is male specific, the dosage compensation complex (DCC) is also referred to as the male specific lethal (MSL) complex, which describes the phenotype of loss of function mutations in some of the dedicated DCC proteins [MSL1, 2 and 3, maleless


(MLE) and males absent on the first (MOF)] as well as roX1, roX2 double mutants. The lethality is from reduced expression of X-linked genes. Limiting the DCC to males is accomplished by the master regulator of sex determination, Sex-lethal, which is female specific and inhibits the translation of msl2 mRNA (Kelley et al. 1997). In the absence of MSL2 protein, the DCC fails to assemble, as is the case when the roX RNAs or MSL1 are absent. The DCC coats the entire male X chromosome and elevates expression of most of the actively transcribed genes by approximately twofold. This binding also results in an enrichment of acetylated histone H4 lysine 16, a mark associated with ongoing transcription and the product of MOF (Hilfiker et al. 1997; Akhtar and Becker 2000). The RING finger protein MSL2 is an ubiquitin E3 ligase which ubiquitinates histone H2B lysine 34. This mark is thought to stimulate histone H3 K4 and K79 methylation through trans-tail cross talk and facilitate transcription (Wu et al. 2011). Curiously, the entire process also results in localizing the male X chromosome to the nuclear periphery. Even though it is inactive, the mammalian X chromosome also localizes near the nuclear envelope as the Barr body. The redundantly acting roX1 and roX2 lncRNAs (Meller and Rattner 2002) are required for the assembly and activity of the DCC and, unlike Xist in mammals, are able to function both in cis (their normal source sites) and in trans (when encoded on transgenes inserted on the autosomes) to target the DCC to the X chromosome. In addition to generating the roX RNAs, the roX loci function in cis as entry sites for the DCC (Meller and Rattner 2002; Stuckenholz et al. 2003). The Drosophila X chromosome contain hundreds of DCC binding sites with varying affinities which are thought to nucleate entry and facilitate spreading over the entire chromosome (Alekseyenko et al. 2008; Straub et al. 2008). 
Spreading is known to be dependent on the MLE helicase activity (Morra et al. 2011), but other than at the roX loci where the complex assembles on the newly synthesized transcripts, how the DCC entry sites are recognized and the mechanism of spreading remain to be determined.

Closing perspectives

It is clear that lncRNAs can have quite potent roles in regulating gene expression and development. Their arenas of action are diverse, and in some cases, it is the process of transcription, and not the lncRNA itself, that is relevant. These multiple and varied functions provide a diversity of regulatory potential but also present a challenge in identifying their functions and connecting them with known networks. Moreover, while the numbers of identified lncRNAs are on the rise, their levels of expression are generally low; HOTTIP, for example, was quantified at 0.3 copies per cell (Wang et al. 2011).

A common starting point to determine the function of a lncRNA has been RNA interference-mediated knockdown. This approach has generally been successful but has also exposed drawbacks. Depletion studies in cell lines have shown differences when compared to genetic studies in the organism. The most notable are mice carrying genetically disrupted alleles of NEAT1 (Nakagawa et al. 2011), MALAT1 (Eissmann et al. 2012; Nakagawa et al. 2012; Zhang et al. 2012), and HOTAIR (Schorderet and Duboule 2011), which failed to reproduce the phenotypes suggested by cell lines. NEAT1 is involved in nuclear retention of mRNAs and is also required for the formation of interchromatin paraspeckles, while MALAT1 sequesters serine/arginine-rich splicing factors to nuclear speckles, subnuclear bodies that were thought to be essential. Redundancy or compensation in vivo, or off-target effects in the knockdowns, may account for the discrepancy with the cell line observations, highlighting the need for analysis in the organism, and for caution.

The ability to base pair with other RNAs as well as DNA potentially allows lncRNAs to act as highly specific sensors. This facet is readily apparent in how they might regulate mRNA translation or stability. For lncRNAs that act as chromatin modifiers, however, the mechanism by which trans-targeting occurs remains enigmatic.
Even with the best characterized lncRNAs, it is unclear whether the RNA itself selects its target sites and acts as a guide for interacting protein complexes or whether it is the DNA binding specificity of the interacting proteins that selects the target sites. It is also possible that the local chromatin conformation has a role in target selection. The formation of RNA:DNA triplexes is an attractive solution to this mystery; however, triplexes have yet to be demonstrated in vivo (Buske et al. 2012).

Perhaps it is the utility of flexibility and length which gives lncRNAs a functional advantage. They have the potential to encode multiple docking sites for effector proteins, frequently as stem loop structures, which are used, for example, by PcG proteins (Maenner et al. 2010) and by MLE/MSL2 to bind the roX RNAs (Ilik et al. 2013; Maenner et al. 2013), and such structures can be altered as needed by helicases. Length in the lncRNA may give both reach and flexibility, which may explain the retention of RNAs in large catalytic complexes such as the ribosome and spliceosome. Indeed, sequence alignments show that a high number of correlated positions appear to be maintained between lncRNAs, which supports the hypothesis that lncRNAs are under selective pressure to maintain a functional RNA structure (Derrien et al. 2012). On this note, it is worth highlighting that a comparison between mammalian and zebrafish lncRNAs showed short stretches of conserved sequence, with more positional than sequence conservation. Despite this lack of strong sequence conservation, the lncRNAs from the different vertebrate species were functionally interchangeable (Ulitsky et al. 2011).

In genome wide association studies, almost half of trait-associated single nucleotide polymorphisms have been identified in intergenic sequences (Hindorff et al. 2009), and there is increasing connection of lncRNAs with cancers, neurodegeneration, and other disease conditions (reviewed in Tang et al. 2013). Genome regulation has a new player that deserves our attention.

Acknowledgments I would like to thank the Biomedical Sciences Department, College of Medicine at Florida State University for the financial support and the members of my lab for comments on the manuscript.

Conflict of interest The author declares no conflict of interest.

References

Akhtar A, Becker PB (2000) Activation of transcription through histone H4 acetylation by MOF, an acetyltransferase essential for dosage compensation in Drosophila. Mol Cell 5:367–375
Akhtar MS et al (2009) TFIIH kinase places bivalent marks on the carboxy-terminal domain of RNA polymerase II. Mol Cell 34:387–393
Alekseyenko AA et al (2008) A sequence motif within chromatin entry sites directs MSL establishment on the Drosophila X chromosome. Cell 134:599–609
Banfai B et al (2012) Long noncoding RNAs are rarely translated in two human cell lines. Genome Res 22:1646–1657
Bender W, Fitzgerald D (2002) Transcription activates repressed domains in the Drosophila bithorax complex. Development 129:4923–4930
Bertani S et al (2011) The noncoding RNA Mistral activates Hoxa6 and Hoxa7 expression and stem cell differentiation by recruiting MLL1 to chromatin. Mol Cell 43:1040–1046
Buske FA et al (2012) Triplexator: detecting nucleic acid triple helices in genomic and transcriptomic data. Genome Res 22:1372–138
Cabili MN et al (2011) Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev 25:1915–1927
Carninci P, Yasuda J, Hayashizaki Y (2008) Multifaceted mammalian transcriptome. Curr Opin Cell Biol 20:274–280
Chu C et al (2011) Genomic maps of long noncoding RNA occupancy reveal principles of RNA-chromatin interactions. Mol Cell 44:667–678
Clark MB et al (2011) The reality of pervasive transcription. PLoS Biol 9:e1000625
Darrow EM, Chadwick BP (2013) Boosting transcription by transcription: enhancer associated transcripts. Chromosome Res. doi:10.1007/s10577-013-9384-6
Derrien T et al (2012) The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Res 22:1775–1789
Dieci G et al (2007) The expanding RNA polymerase III transcriptome. Trends Genet 23:614–622
Dinger ME et al (2008) Differentiating protein-coding and noncoding RNA: challenges and ambiguities. PLoS Comput Biol 4:e1000176
Dinger ME, Gascoigne DK, Mattick JS (2011) The evolution of RNAs with multiple functions. Biochimie 93:2013–2018
Diribarne G, Bensaude O (2009) 7SK RNA, a non-coding RNA regulating P-TEFb, a general transcription factor. RNA Biol 6:122–128
Djebali S et al (2012) Landscape of transcription in human cells. Nature 489:101–108
Eissmann M et al (2012) Loss of the abundant nuclear noncoding RNA MALAT1 is compatible with life and development. RNA Biol 9:1076–1087
Fong YW, Zhou Q (2001) Stimulatory effect of splicing factors on transcriptional elongation. Nature 414:929–933
Froberg JE, Yang L, Lee JT (2013) Guided by RNAs: X-inactivation as a model for lncRNA function. J Mol Biol 425:3698–3706
Gilmour DS, Lis JT (1986) RNA polymerase II interacts with the promoter region of the noninduced hsp70 gene in Drosophila melanogaster cells. Mol Cell Biol 6:3984–3989
Gong C, Maquat LE (2011) lncRNAs transactivate STAU1-mediated mRNA decay by duplexing with 3′ UTRs via Alu elements. Nature 470:284–288
Guenther MG et al (2007) A chromatin landmark and transcription initiation at most promoters in human cells. Cell 130:77–88
Gummalla M et al (2012) abd-A regulation by the iab-8 noncoding RNA. PLoS Genet 8:e1002720
Guttman M et al (2009) Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature 458:223–227
Guttman M et al (2011) lincRNAs act in the circuitry controlling pluripotency and differentiation. Nature 477:295–306
Guttman M, Rinn JL (2012) Modular regulatory principles of large non-coding RNAs. Nature 482:339–346
Haerty W, Ponting CP (2013) Mutations within lncRNAs are effectively selected against in fruitfly but not human. Genome Biol 14:R49
Hangauer MJ, Vaughn IW, McManus MT (2013) Pervasive transcription of the human genome produces thousands of previously unidentified long intergenic noncoding RNAs. PLoS Genet 9:e1003569
Heo JB, Sung S (2011) Vernalization-mediated epigenetic silencing by a long intronic noncoding RNA. Science 331:76–79
Hilfiker A et al (1997) mof, a putative acetyl transferase gene related to the Tip60 and MOZ human genes and to the SAS genes of yeast, is required for dosage compensation in Drosophila. EMBO J 16:2054–2060
Hindorff LA et al (2009) Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci U S A 106:9362–9367
Hogga I, Karch F (2002) Transcription through the iab-7 cis-regulatory domain of the bithorax complex interferes with maintenance of Polycomb-mediated silencing. Development 129:4915–4922
Ilik et al (2013) Tandem stem-loops in roX RNAs act together to mediate X chromosome dosage compensation in Drosophila. Mol Cell 51:156–173
Ingolia NT, Lareau LF, Weissman JS (2011) Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes. Cell 147:789–802
Jenny A et al (2006) A translation-independent role of oskar RNA in early Drosophila oogenesis. Development 133:2827–2833
Kapranov P et al (2005) Examples of the complex architecture of the human transcriptome revealed by RACE and high-density tiling arrays. Genome Res 15:987–997
Kapranov P et al (2007) RNA maps reveal new RNA classes and a possible function for pervasive transcription. Science 316:1484–1488
Katayama S et al (2005) Antisense transcription in the mammalian transcriptome. Science 309:1564–1566
Kelley RL et al (1997) Sex lethal controls dosage compensation in Drosophila by a non-splicing mechanism. Nature 387:195–199
Khalil AM et al (2009) Many human large intergenic noncoding RNAs associate with chromatin modifying complexes and affect gene expression. Proc Natl Acad Sci U S A 106:11667–11672
Kloc M, Bilinski S, Dougherty MT (2007) Organization of cytokeratin cytoskeleton and germ plasm in the vegetal cortex of Xenopus laevis oocytes depends on coding and non-coding RNAs: three dimensional and ultrastructural analysis. Exp Cell Res 313:1639–1651
Kornienko AE et al (2013) Gene regulation by the act of long non-coding RNA transcription. BMC Biol 11:59
Kretz M et al (2013) Control of somatic tissue differentiation by the long non-coding RNA TINCR. Nature 493:231–235
Krueger BJ et al (2010) The mechanism of release of P-TEFb and HEXIM1 from the 7SK snRNP by viral and cellular activators includes a conformational change in 7SK. PLoS ONE 5:e12335
Kwek KY et al (2001) U1 snRNA associates with TFIIH and regulates transcriptional initiation. Nat Struct Biol 9:800–805
Latos PA et al (2012) Airn transcriptional overlap, but not its lncRNA products, induces imprinted Igf2r silencing. Science 338:1469–1472
Lin MF, Jungreis I, Kellis M (2011) PhyloCSF: a comparative genomics method to distinguish protein coding and noncoding regions. Bioinformatics 27:i275–i282
Lipshitz HD, Peattie DA, Hogness DS (1987) Novel transcripts from the ultrabithorax domain of the bithorax complex. Genes Dev 1:307–322
Magistri M et al (2012) Regulation of chromatin structure by long noncoding RNAs: focus on natural antisense transcripts. Trends Genet 28:389–396
Maenner S et al (2010) 2-D structure of the A region of Xist RNA and its implication for PRC2 association. PLoS Biol 8:e1000276
Maenner S et al (2013) ATP-dependent roX RNA remodeling by the helicase maleless enables specific association of MSL proteins. Mol Cell 51:174–184
Mariner PD et al (2008) Human Alu RNA is a modular transacting repressor of mRNA transcription during heat shock. Mol Cell 29:499–509
Markussen FH et al (1995) Translational control of oskar generates short OSK, the isoform that induces pole plasm assembly. Development 121:3723–3732
Marques AC, Ponting CP (2009) Catalogues of mammalian long noncoding RNAs: modest conservation and incompleteness. Genome Biol 10:R124
Meller VH, Rattner BP (2002) The roX genes encode redundant male-specific lethal transcripts required for targeting of the MSL complex. EMBO J 21:1084–1091
modENCODE Consortium (2010) Identification of functional elements and regulatory circuits by Drosophila modENCODE. Science 330:1787–1797
Mondal T et al (2010) Characterization of the RNA content of chromatin. Genome Res 20:899–907
Mondal T, Kanduri C (2013) Maintenance of epigenetic information: a noncoding RNA perspective. Chromosome Res. doi:10.1007/s10577-013-9385-5
Morra R et al (2011) Role of the ATPase/helicase maleless (MLE) in the assembly, targeting, spreading and function of the male-specific lethal (MSL) complex of Drosophila. Epigenetics Chromatin 4:6
Nakagawa S et al (2011) Paraspeckles are subpopulation-specific nuclear bodies that are not essential in mice. J Cell Biol 193:31–39
Nakagawa S et al (2012) Malat1 is not an essential component of nuclear speckles in mice. RNA 18:1487–1499
Ørom UA et al (2010) Long noncoding RNAs with enhancer-like function in human cells. Cell 143:46–58
Pathak RU et al (2013) AAGAG repeat RNA is an essential component of nuclear matrix in Drosophila. RNA Biol 10:564–571
Petruk S et al (2006) Transcription of bxd noncoding RNAs promoted by trithorax represses Ubx in cis by transcriptional interference. Cell 127:1209–1221
Ponjavic J, Ponting CP, Lunter G (2007) Functionality or transcriptional noise? Evidence for selection within long noncoding RNAs. Genome Res 17:556–565
Ponting CP, Oliver PL, Reik W (2009) Evolution and functions of long noncoding RNAs. Cell 136:629–641
Ponting CP, Belgard TG (2010) Transcribed dark matter: meaning or myth? Hum Mol Genet 19:R162–R168
Rank G, Prestel M, Paro R (2002) Transcription through intergenic chromosomal memory elements of the Drosophila bithorax complex correlates with an epigenetic switch. Mol Cell Biol 22:8026–8034
Rinn JL et al (2003) The transcriptional activity of human Chromosome 22. Genes Dev 17:529–540
Rinn JL et al (2007) Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs. Cell 129:1311–1323
Ravasi T et al (2006) Experimental validation of the regulated expression of large numbers of non-coding RNAs from the mouse genome. Genome Res 16:11–19
Saunders A et al (2013) Extensive polymerase pausing during Drosophila axis patterning enables high-level and pliable transcription. Genes Dev 27:1146–1158
Schmitt S, Prestel M, Paro R (2005) Intergenic transcription through a polycomb group response element counteracts silencing. Genes Dev 19:697–708
Schorderet P, Duboule D (2011) Structural and functional differences in the long non-coding RNA hotair in mouse and human. PLoS Genet 7:e1002071
Stuckenholz C, Meller VH, Kuroda MI (2003) Functional redundancy within roX1, a noncoding RNA involved in dosage compensation in Drosophila melanogaster. Genetics 164:1003–1014
Straub T et al (2008) The chromosomal high-affinity binding sites for the Drosophila dosage compensation complex. PLoS Genet 4:e1000302
Tang et al (2013) Long noncoding RNAs-related diseases, cancers, and drugs. Scientific World Journal 2013:943539
Ulitsky et al (2011) Conserved function of lincRNAs in vertebrate embryonic development despite rapid sequence evolution. Cell 147:1537–1550
Wang X et al (2007) Transcription elongation controls cell fate specification in the Drosophila embryo. Genes Dev 21:1031–1036
Wang KC et al (2011) A long noncoding RNA maintains active chromatin to coordinate homeotic gene expression. Nature 472:120–124
Wilusz JE, Freier SM, Spector DL (2008) 3′ end processing of a long nuclear retained noncoding RNA yields a tRNA-like cytoplasmic RNA. Cell 135:919–932
Wu L et al (2011) The RING finger protein MSL2 in the MOF complex is an E3 ubiquitin ligase for H2B K34 and is involved in crosstalk with H3 K4 and K79 methylation. Mol Cell 43:132–144
Yakovchuk P, Goodrich JA, Kugel JF (2009) B2 RNA and Alu RNA repress transcription by disrupting contacts between RNA polymerase II and promoter DNA within assembled complexes. Proc Natl Acad Sci U S A 106:5569–5574
Yakovchuk P, Goodrich JA, Kugel JF (2011) B2 RNA represses TFIIH phosphorylation of RNA polymerase II. Transcription 2:45–49
Yin QF et al (2012) Long noncoding RNAs with snoRNA ends. Mol Cell 48:219–230
Yoo EJ, Cooke NE, Liebhaber SA (2012) An RNA-independent linkage of noncoding transcription to long-range enhancer function. Mol Cell Biol 32:2020–2029
Young RS et al (2012) Identification and properties of 1119 lincRNA loci in the Drosophila melanogaster genome. Genome Biol Evol 4:427–442
Zeitlinger J et al (2007) RNA polymerase stalling at developmental control genes in the Drosophila melanogaster embryo. Nat Genet 39:1512–1516
Zhang J et al (1998) The role of maternal VegT in establishing the primary germ layers in Xenopus embryos. Cell 94:515–524
Zhang B et al (2012) The lncRNA Malat1 is dispensable for mouse development but its transcription plays a cis-regulatory role in the adult. Cell Rep 2:111–123
