PERSPECTIVES OPINION

Pervasive transcription: illuminating the dark matter of bacterial transcriptomes

Joseph T. Wade and David C. Grainger

Abstract | The conventional view of transcription posits that mRNAs are generated from the coding DNA strand and are delineated by gene boundaries; however, recent reports have mapped transcription start sites to unexpected locations in bacterial genomes, including the non-coding strand. The resultant RNAs were previously dismissed as artefacts, but models that describe such events as ‘pervasive transcription’ are now gaining support. In this Opinion article, we discuss our current understanding of pervasive transcription, its genetic origin and its regulation. On the basis of existing observations, we propose that RNAs that result from pervasive transcription are more than ‘transcriptional noise’ and have important functions in gene regulation and genome evolution.

The identification of genes as discrete packages of genetic information has influenced all subsequent models of transcription and its control1. As such, besides ribosomal RNAs and tRNAs, the vast majority of described RNAs are mRNAs, which contain an ORF, a 5ʹ UTR and a 3ʹ UTR. Importantly, these mRNAs rarely overlap with one another1. Over the past 10 years, advances in nucleic acid sequencing have facilitated the analysis of transcriptomes in unprecedented depth and in an unbiased manner (BOX 1). The results have been controversial and exciting in equal measure; across all domains of life, an abundance of non-canonical transcripts has been detected in the transcriptome1–5, which are easily distinguished as they are non-coding, are not demarked by gene boundaries and are frequently antisense2–5. As the occurrence of these unusual transcripts is widespread, this form of transcription has been described as ‘pervasive’ (REF. 2). Thus, transcription is not restricted by the position of annotated protein-coding genes but can initiate in almost any genomic context. Unlike mRNAs, pervasive transcripts rarely have an assigned function and often occur within genes, which results in a non-coding RNA (ncRNA) that can be either sense or antisense with respect to the overlapping gene2–5. As antisense RNAs (asRNAs) have perfect complementarity to sense transcripts, they have the potential to interfere with gene expression; however, whether pervasive transcription is simply ‘biological noise’ or whether it actually has distinct functional roles remains an open question6–9. Given this uncertainty, the term transcriptional ‘dark matter’ has been used to describe the resultant transcripts8,9.

The discovery of pervasive transcription as a widespread phenomenon in diverse organisms has been facilitated by technological advances (BOX 1). In particular, approaches that enable transcriptomes to be analysed in an unbiased manner, independently of genome annotation, which include high-density microarrays and next-generation sequencing, have been essential and provided the earliest reports of pervasive transcription in animals3, yeast4 and bacteria5. However, despite their autonomy, such reports of transcription beyond the constraints of genome annotation were controversial and were often dismissed as experimental artefacts or biological noise6–9;

NATURE REVIEWS | MICROBIOLOGY

for example, it has been suggested that contamination of RNA sequencing (RNA-seq) experiments with genomic DNA might generate a background of sequencing reads that could be mistaken for RNA transcripts6. Alternatively, it has also been suggested that poor probe specificity in DNA microarray experiments may have resulted in artefactual transcript signals9. However, more than 10 years after the first reports of pervasive transcription, the phenomenon has been identified in bacteria as diverse as Escherichia coli, Helicobacter pylori, Synechocystis sp. PCC6803 and Staphylococcus aureus10–14. Importantly, pervasive transcription has been confirmed by several independent studies in some of these organisms; for example, in E. coli, pervasive transcription has been detected using DNA microarray-based transcriptomics5, promoter–reporter fusion assays15, RNA-seq10, genome-scale mapping of sigma factor (σ factor) binding16–19, in vitro transcription assays19 and genomic SELEX (systematic evolution of ligands by exponential enrichment)20. Moreover, levels of pervasive transcription are sensitive to global regulatory systems, which argues against a purely artefactual origin14,19. Thus, the debate should now move from discussing whether such transcripts exist to determining why they occur, how they are regulated and what their function, if any, might be. In this respect, single-cell RNA-seq approaches are likely to provide valuable information (BOX 1); current RNA-seq data provide only a time- and population-averaged profile of transcription, whereas single-cell approaches have the potential to show how frequently non-canonical transcripts are produced and whether they are present in the same cell as overlapping mRNAs. Current evidence suggests that bacteria use several complementary mechanisms to suppress pervasive transcription14,19.
Although this indicates that the phenomenon negatively affects cell fitness, possible functions have started to emerge in eukaryotes, in which pervasive transcription was recognized much earlier21; for example, it has been suggested that pervasive transcripts might mediate long-range chromatin interactions and have a role in transcriptional interference21. In this Opinion article, we

VOLUME 12 | SEPTEMBER 2014 | 647

© 2014 Macmillan Publishers Limited. All rights reserved

Box 1 | Pervasive transcription revealed by genome-scale approaches

The emergence of genomics has facilitated in-depth analyses of bacterial transcriptomes, and several methods have led to the identification of pervasive transcription and/or promise further insights in the coming years.

High-density microarrays
High-density DNA microarrays enable unbiased probing of bacterial transcriptomes. Briefly, cDNAs that are labelled with fluorescent dyes hybridize to complementary sequences that are immobilized on the array. In tiling arrays (that is, arrays in which all genomic positions are covered by at least one DNA probe), all parts of the chromosome can be studied16. Thus, the phenomenon of widespread antisense transcription in bacteria was first observed using hybridization of labelled cDNA to high-density microarrays5, as were many specific examples of antisense RNAs (asRNAs)16. Although this method has led to the identification of many novel RNAs, it has limited resolution and sensitivity.

Strand-specific RNA-seq
Next-generation sequencing methods have mostly replaced microarrays as the primary technique for interrogating genomes. This is primarily because sequencing provides comparable information to microarrays but with greater resolution and sensitivity. There are various methods for constructing strand-specific RNA-seq libraries, each of which involves preparing cDNA from ribosomal RNA-depleted cellular RNA extracts. These cDNA libraries can be sequenced using any of a range of next-generation sequencing platforms, which has led to the identification of many asRNAs10,11. Furthermore, the application of RNA-seq to mutants that lack RNases has led to the discovery of an even wider range of antisense transcripts14,42. However, analysis of whole transcriptomes is not sufficiently sensitive to detect all pervasive transcription, and it is also challenging to identify intragenic transcripts that are in the same orientation as overlapping genes.
These issues can be circumvented by the use of transcription start site (TSS) mapping.

Mapping TSSs
Two variations of RNA-seq enable the identification of TSSs, which are the sequences that define where transcription initiates. Both methods rely on discriminating between primary (that is, unprocessed) transcripts and transcripts that have been processed (that is, cleaved). The first method, known as differential RNA-seq (dRNA-seq), uses an exonuclease that specifically degrades processed RNA transcripts, which have monophosphorylated 5ʹ ends11. The RNA-seq profiles of exonuclease-treated and untreated libraries are compared, which facilitates the identification of primary transcripts that have triphosphorylated 5ʹ ends, as processed transcripts are depleted by the exonuclease. Thus, the positions of TSSs, which correspond to the triphosphorylated 5ʹ ends, are easily mapped. The second method also distinguishes between monophosphorylated and triphosphorylated 5ʹ ends. Again, RNA-seq is applied to two libraries, one of which is treated with a phosphorylase that converts triphosphorylated 5ʹ ends to monophosphorylated 5ʹ ends19,25. TSS mapping is highly sensitive, has single-nucleotide resolution and can easily identify transcripts that initiate within genes in either orientation. Thus, TSS mapping has proven to be the most powerful tool for providing insights into pervasive transcription in bacteria.

ChIP–chip and ChIP–seq of sigma factors
The three methods described above rely on the detection of RNA. An alternative approach is to map the binding of initiating RNA polymerase (RNAP) across the chromosome using chromatin immunoprecipitation followed by microarray (ChIP–chip) or ChIP followed by sequencing (ChIP–seq). These techniques involve the enrichment of RNAP–DNA complexes from cell lysates by immunoprecipitation, and the isolated DNA is then characterized by hybridization to a DNA microarray (using ChIP–chip) or by next-generation sequencing (using ChIP–seq).
The isolation of initiating RNAP can be achieved by artificially trapping RNAP at promoters using the antibiotic rifampicin56,57 or by using antibodies that are specific for sigma factors (σ factors), which are the accessory proteins that only associate with RNAP during promoter binding. Such approaches have identified many intragenic promoters that are targeted by the housekeeping σ factor (σ70) of Escherichia coli16,19,56,57 as well as the alternative σ factor σ32 (REF. 17). One drawback of this approach for the identification of intragenic promoters is that the exact promoter sequence is needed to determine the direction of transcription. In addition, the resolution of these methods is lower than for RNA-seq methods, and especially TSS mapping, although ChIP–seq resolution is typically
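
As an illustration of the dRNA-seq comparison described under 'Mapping TSSs' in Box 1, the following minimal Python sketch calls candidate TSSs as positions at which 5ʹ end counts are enriched in the exonuclease-treated library relative to the untreated library, and then labels each call relative to annotated genes. The function names, the thresholds (min_reads, min_enrichment), the pseudocount and the example coordinates are all hypothetical and are not taken from any of the cited studies. The sketch also assumes that counts have already been normalized for library depth and that gene coordinates describe ORFs; published pipelines additionally use replicates and statistical models.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# A position is (coordinate, strand); strand is "+" or "-".
Position = Tuple[int, str]


@dataclass(frozen=True)
class Gene:
    """Hypothetical minimal gene annotation (ORF coordinates, inclusive)."""
    name: str
    start: int
    end: int
    strand: str


def call_tss(
    treated: Dict[Position, int],
    untreated: Dict[Position, int],
    min_reads: int = 10,          # assumed threshold, for illustration only
    min_enrichment: float = 2.0,  # assumed threshold, for illustration only
    pseudocount: float = 1.0,     # avoids division by zero
) -> List[Position]:
    """Return positions whose 5' end counts are enriched after exonuclease
    treatment, as expected for triphosphorylated (primary) 5' ends."""
    calls = []
    for pos, t in treated.items():
        u = untreated.get(pos, 0)
        ratio = (t + pseudocount) / (u + pseudocount)
        if t >= min_reads and ratio >= min_enrichment:
            calls.append(pos)
    return sorted(calls)


def classify_tss(pos: Position, genes: List[Gene]) -> str:
    """Label a TSS relative to annotated genes."""
    coord, strand = pos
    for gene in genes:
        if gene.start <= coord <= gene.end:
            # Inside a gene: same strand is sense, opposite strand is antisense.
            return "intragenic sense" if strand == gene.strand else "antisense"
    return "intergenic"


# Toy data: 5' end counts per position in each library (invented values).
genes = [Gene("geneA", 100, 500, "+")]
treated = {(50, "+"): 40, (300, "-"): 30, (400, "+"): 12, (450, "+"): 5}
untreated = {(50, "+"): 10, (300, "-"): 5, (400, "+"): 20, (450, "+"): 1}

tss_calls = call_tss(treated, untreated)
labels = {pos: classify_tss(pos, genes) for pos in tss_calls}
print(labels)
```

With these toy counts, the position at 400 on the plus strand is rejected because its signal is depleted rather than enriched by the treatment, as would be expected for a processed 5ʹ end, and the position at 450 is rejected for low coverage. The single-nucleotide, strand-aware output is what allows intragenic starts in either orientation to be told apart.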
