ARTICLE IN PRESS

G Model PSL 8856 1–12

Plant Science xxx (2013) xxx–xxx


Plant Science journal homepage: www.elsevier.com/locate/plantsci


Review


Regulation of plant translation by upstream open reading frames


Albrecht G. von Arnim a,b,∗, Qidong Jia b, Justin N. Vaughn a,1

a Department of Biochemistry, Cellular and Molecular Biology, The University of Tennessee, Knoxville, TN 37996-0840, United States

b Graduate School of Genome Science and Technology, The University of Tennessee, Knoxville, TN 37996-0840, United States

Article info

Article history: Received 17 July 2013; Received in revised form 8 September 2013; Accepted 10 September 2013; Available online xxx

Keywords: Protein synthesis; Translation initiation factor; Upstream open reading frame; Sensor; Development; Arabidopsis thaliana

Abstract

We review the evidence that upstream open reading frames (uORFs) function as RNA sequence elements for post-transcriptional control of gene expression, specifically translation. uORFs are highly abundant in the genomes of angiosperms. Their negative effect on translation is often attenuated by ribosomal translation reinitiation, a process whose molecular biochemistry is still being investigated. Certain uORFs render translation responsive to small molecules, thus offering a path for metabolic control of gene expression in evolution and synthetic biology. In some cases, uORFs form modular logic gates in signal transduction. uORFs thus provide eukaryotes with a functionality analogous to, or comparable to, riboswitches and attenuators in prokaryotes. uORFs exist in many genes regulating development and point toward translational control of development. While many uORFs appear to be poorly conserved, and the number of genes with conserved-peptide uORFs is modest, many mRNAs have a conserved pattern of uORFs. Evolutionarily, the gain and loss of uORFs may be a widespread mechanism that diversifies gene expression patterns. Last but not least, this review includes a dedicated uORF database for Arabidopsis. © 2013 Published by Elsevier Ireland Ltd.

Contents

1. Introduction
   1.1. Definitions and early cases
   1.2. Types of uORFs and their distribution
   1.3. How does the ribosome get past uORFs: Leaky scanning, shunting, and reinitiation
2. How does the ribosome engage with uORFs?
   2.1. Elongation on CPuORFs with inhibitory peptides
   2.2. Termination and reinitiation
3. uORFs as regulators of metabolism
   3.1. Regulation of polyamine metabolism by uORFs and polyamine
   3.2. Regulation of carbohydrate metabolism by uORFs and sucrose
4. uORFs mediate developmental gene regulation
   4.1. uORFs and translation reinitiation modulate the auxin response
   4.2. The development of leaf dorsoventral polarity is sensitive to defects in translation
5. Conclusions and hypotheses to guide future work
Acknowledgments
Appendix A. Supplementary data
References

Abbreviations: AGI#, Arabidopsis gene identifier; CaMV, cauliflower mosaic virus; eIF, eukaryotic translation initiation factor; mORF, major open reading frame; NMD, nonsense mediated decay; RPL, ribosomal protein of the large subunit; RPS, ribosomal protein of the small subunit; uORF, upstream open reading frame; CPuORF, conserved peptide uORF; UTR, untranslated region.

∗ Corresponding author at: Department of Biochemistry, Cellular and Molecular Biology, The University of Tennessee, Knoxville, TN 37996-0840, United States. Tel.: +1 865 974 6206.

E-mail addresses: [email protected] (A.G. von Arnim), [email protected] (Q. Jia), [email protected] (J.N. Vaughn).

1 Current address: Department of Genetics, University of Georgia, Athens, GA 30602-7223, United States.

0168-9452/$ – see front matter © 2013 Published by Elsevier Ireland Ltd. http://dx.doi.org/10.1016/j.plantsci.2013.09.006

Please cite this article in press as: A.G. von Arnim, et al., Regulation of plant translation by upstream open reading frames, Plant Sci. (2013), http://dx.doi.org/10.1016/j.plantsci.2013.09.006


1. Introduction


1.1. Definitions and early cases


Upstream open reading frames (uORFs) are protein coding regions in mRNAs that lie upstream of the main protein coding region, i.e. in the 5′ untranslated region of the mRNA (Fig. 1). Counterintuitively, the so-called 5′ untranslated regions (5′ UTRs) of mRNAs are often partially translated. According to Kozak’s scanning model of translation initiation, the ribosome scans the mRNA from the 5′ cap and engages at the first AUG triplet that it encounters. If the first AUG is the start codon of a uORF, the uORF typically reduces the efficiency of translation of the main coding region of the mRNA (major open reading frame or mORF) [1,2]. However, uORFs usually do not eliminate translation altogether. This was shown in planta with the maize transcription factors R/Lc and Opaque-2 [3,4].

One of the earliest and most unconventional uORF-based translational control systems was discovered in the pararetrovirus cauliflower mosaic virus (CaMV) [5,6]. Briefly, the long CaMV 5′ leader contains six uORFs (Supplemental Fig. 1) and a long hairpin-loop structure downstream from the first uORF. A ribosome that has translated the first short uORF is competent to scan past the hairpin without unfolding it, an event called shunting. The shunting mechanism ensures that a specific region of the 5′ UTR remains free of ribosomes. This region contains an RNA encapsidation signal that binds to the CaMV coat protein [6,7]. It is notable that these viruses do not use an internal ribosome entry mechanism to keep their 5′ UTR free of ribosomes. Internal ribosome entry sites are RNA sequence elements, commonly used by metazoan viruses, that direct ribosomes to specific sites adjacent to their translation start codons and circumvent cap-dependent translation initiation.

Until recently, shunting was thought to be a peculiarity of pararetroviruses such as CaMV and the related rice tungro bacilliform virus. However, uORF-stimulated shunting was recently discovered in a picorna-like RNA virus (rice tungro spherical virus). Because the two rice viruses coexist in the same host, it seems likely that the RNA virus acquired the shunting mechanism by cohabitation with rice tungro bacilliform virus [8].
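The scanning rule stated at the start of this section can be illustrated with a short Python sketch. It is a toy reading of the definition, not a model of ribosome behavior: the function name `first_aug_orf`, the 0-based coordinates, and the example sequence are assumptions introduced here, and leaky scanning, shunting, and reinitiation are deliberately ignored.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def first_aug_orf(mrna, main_start):
    """Describe the ORF that begins at the first AUG read from the 5' end.

    mrna: uppercase RNA string (A, C, G, U).
    main_start: 0-based index of the AUG of the major ORF (mORF).
    Returns None if the sequence contains no AUG.
    """
    start = mrna.find("AUG")
    if start == -1:
        return None
    codons = 0   # sense codons, AUG included, stop codon excluded
    stop = None  # index of the in-frame stop codon, if one exists
    for pos in range(start, len(mrna) - 2, 3):
        if mrna[pos:pos + 3] in STOP_CODONS:
            stop = pos
            break
        codons += 1
    return {
        "start": start,
        "stop": stop,
        "codons": codons,
        # True when the first AUG lies upstream of the mORF start codon
        "upstream_of_main": start < main_start,
    }

# Hypothetical toy mRNA: a two-codon uORF followed by a short mORF at index 16.
example = first_aug_orf("GGAUGCCUUAAGGACCAUGGCUUAA", main_start=16)
```

In this toy sequence the first AUG belongs to a short ORF that terminates before the mORF start codon, which is the situation described in the text as a uORF.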


1.2. Types of uORFs and their distribution


The prevalence of uORFs is clearly biased with respect to gene function. Highly expressed mRNAs, such as those of many housekeeping genes, tend to have short 5′ UTRs that are devoid of uORFs. Poorly expressed mRNAs, such as those for transcription factors and kinases, often have longer 5′ UTRs that are rich in uORFs [9]. This feature is pan-eukaryotic and was first observed in 1987 [10]. These results suggest that the presence or absence of uORFs is most likely adaptive. However, few uORFs have been examined directly for their functional significance at the whole-plant level. Only in a handful of cases has mutation of the uORF revealed significant growth defects [11–13]. In addition, very few uORFs were discovered by classical forward genetic analysis, that is, because a mutation in the uORF altered the phenotype of the plant [12,14].

Some uORFs overlap the major open reading frame of the mRNA (major ORF). A recent study in yeast concluded that, among all the possible mutations in a gene’s 5′ upstream region, mutations that cause uORFs to overlap with the major ORF have the most dramatic inhibitory effect on gene expression [15].

uORFs are also classified along evolutionary lines. For a fairly small fraction of uORFs, the peptide sequence is noticeably conserved in evolution (conserved peptide uORFs, or CPuORFs). In these cases, the peptide sequence is key for translational repression. Several surveys of CPuORFs have been published [16–19]. CPuORFs have rightfully been assigned their own gene identifiers (AGI numbers) in Arabidopsis. In some cases, and in keeping with similar CPuORF peptides in fungi [20], the conserved peptide is hypothesized to stall the ribosome as a nascent peptide while located in the ribosome exit tunnel, thus blocking the progression of upstream ribosomes or suppressing reinitiation [11,21]. In other cases, the uORF peptide may exert its function after it has been released from the ribosome. Only two such cases are known. In one case, the peptide binds to the mRNA and destabilizes it [22]. In the other, the synthesized uORF peptide can inhibit translation when added to an in vitro translation system [23].

CPuORFs are currently sorted into more than 30 homology groups that are spread over more than 79 Arabidopsis genes [17,19,24], while up to 150 cereal genes are now estimated to have CPuORFs [18]. Most CPuORFs are conserved between monocots and dicots [16–18,25]. Lineage-specific gain or loss of CPuORFs is uncommon [19], but it does occur. Arabidopsis often has uORF features different from those of other dicots [25]. For example, the highly conserved CPuORF in the mRNA for ribosomal protein S6 kinase has lost its AUG in the Brassicaceae lineage and has been replaced by a different uORF in a different frame.

About 35% of Arabidopsis genes give rise to a uORF-containing mRNA, and about half of these have multiple uORFs [9]. Other plant species have similar fractions of transcripts with uORFs (Fig. 2A). The Arabidopsis genome encodes more than 20,000 uORFs, almost as many as major ORFs (Supplemental Data File S1). Because the AUG triplet is only slightly underrepresented in Arabidopsis 5′ UTRs [9], the number of uORFs is only slightly lower than predicted by chance alone. However, longer uORFs are more underrepresented than shorter ones (Fig. 2B).

The CPuORFs represent only a small fraction of all uORFs. However, the uORFs that do not qualify as CPuORFs (nonCPuORFs) commonly influence the level of gene expression of the major ORF (Table 1). Therefore, a large number of nonCPuORFs are functional. What fraction of the nonCPuORFs, then, was or still is subject to selection? What fraction of them has adaptive significance? There is evidence that uORFs other than the known CPuORFs are subject to selection. First, the presence of a nonCPuORF is sometimes conserved, even if its amino acid sequence is not [25]. Similar to the situation in mouse and human [26], the AUG triplet is the most frequently conserved triplet when 5′ UTRs from two plant families are aligned (Fig. 2C), consistent with the notion that many uAUGs are under stabilizing selection. Furthermore, certain gene networks that possess CPuORFs also include other genes with nonCPuORFs, for example the polyamine gene network (Fig. 3) and the auxin response transcription factors [27]. Finally, it should be recognized that the classification of a uORF as a CPuORF is constrained by our statistical power. All of these nonrandom patterns suggest that many nonCPuORFs are biologically significant. Such observations notwithstanding, it is also evident that many uORFs are not conserved. It was speculated that uORFs might counterbalance evolutionary changes in transcriptional control [4]; this interesting idea of coevolution between transcriptional and posttranscriptional regulation has not been scrutinized.

1.3. How does the ribosome get past uORFs: Leaky scanning, shunting, and reinitiation

In vitro, uORFs suppress translation in a length-dependent manner [2,28]. One of the more detailed in vivo surveys was performed in human cells [26] with more than 25 uORFs, whose inhibition of protein expression ranged from slight to 100%. Although this study did not detect a correlation between uORF length and protein expression, evidence from plants (Fig. 2D) shows a moderate correlation between inhibition of gene expression and uORF length. uORFs of more than 16 codons can be expected to inhibit translation. In contrast, uORFs shorter than 16 codons often inhibit expression by less than threefold and sometimes not at all, although there are exceptions.


[Figure 1 labels: 5′ cap; 5′ UTR; uAUG; upstream ORFs (uORFs); overlap-uORF; mAUG; major ORF (mORF); stop codons; 3′ UTR; poly(A) tail (AAAAA); alternative transcription initiation site; alternative splice sites; alternative translation initiation site.]

Fig. 1. Schematic of an mRNA with uORFs. Not all possibilities illustrated in this figure are likely to apply simultaneously to a single mRNA. uORFs (red) can overlap with each other. In this review, overlap-uORFs are the subset of uORFs that overlap the major ORF (mORF, green). uORFs can be affected by alternative transcription initiation sites or alternative splicing (blue arrows) [100,101]. Therefore, not all mRNA molecules transcribed from a given gene will necessarily contain all the annotated uORFs. An alternative translation initiation site may extend the coding sequence of the mORF (light green). Such N-terminal extensions of major ORFs do not qualify as uORFs. The sequence between a uORF stop and the next start codon is referred to as the intercistronic spacer (not marked). In the literature, uORFs are occasionally referred to as short ORFs, upstream ORFs, and micro ORFs.
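The distinctions drawn in the Fig. 1 legend (uORF, overlap-uORF, and N-terminal extension of the mORF) can be made concrete with a minimal sketch. The function name `classify_upstream_augs`, the label strings, and the toy sequence are hypothetical and serve only to illustrate the definitions; alternative transcription start sites and splicing are not considered.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def classify_upstream_augs(mrna, main_start):
    """Classify every AUG that lies entirely upstream of the mORF start codon.

    mrna: uppercase RNA string; main_start: 0-based index of the mAUG.
    Returns a list of (aug_index, stop_index, label) tuples.
    """
    results = []
    for start in range(main_start - 2):
        if mrna[start:start + 3] != "AUG":
            continue
        # First in-frame stop codon downstream of this AUG, if any.
        stop = next((p for p in range(start + 3, len(mrna) - 2, 3)
                     if mrna[p:p + 3] in STOP_CODONS), None)
        if stop is None:
            label = "no in-frame stop"
        elif stop + 3 <= main_start:
            label = "uORF"                  # terminates before the mAUG
        elif (main_start - start) % 3 == 0:
            label = "N-terminal extension"  # reads in frame into the mORF
        else:
            label = "overlap-uORF"          # out of frame, ends inside the mORF
        results.append((start, stop, label))
    return results

# Hypothetical toy mRNA with the mAUG at index 21.
toy = "AUGAAAUAACCAUGCAUGGCCAUGGCUGAUUAA"
for entry in classify_upstream_augs(toy, main_start=21):
    print(entry)
```

The in-frame test follows the legend's statement that N-terminal extensions of the major ORF do not qualify as uORFs; an upstream AUG that is out of frame and whose stop codon lies beyond the mAUG is treated here as an overlap-uORF.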

How does the ribosome overcome the inhibition by the uORF? First, the ribosome may ignore the uAUG by leaky scanning. The likelihood of leaky scanning is difficult to predict with precision. The extent of leaky scanning is estimated by fusing a reporter gene a few codons downstream of the uAUG. These experiments show that most AUGs are detected at least some of the time [29,30]. A weak sequence context such as uugAUGa or uuuAUGu may allow between 3% and 15% initiation as compared to a strong context, such as aaaAUGg [2,29,31]. However, depending on the broader sequence context, even weak AUGs such as uugAUGc can

Fig. 2. Characteristics of uORFs in the 5′ UTRs of plants. (A) The fraction of known transcripts that contain at least one uAUG in the 5′ UTR is given for several genera of angiosperms. (B) The number of genes with a given uORF length in Arabidopsis. The line marked ‘Real’ indicates the authentic data. The two other lines are simulations of uORF lengths obtained from shuffling each 5′ UTR 1000 times. Shuffling was performed with the 5′ UTR split into mononucleotides or dinucleotides. Arabidopsis uORFs are slightly biased toward shorter lengths compared to the simulation. The inset shows the full distribution. (C) Comparison of orthologous pairs of genes. Alignable regions of 5′ UTRs were scored for the fraction at which each triplet is identical in both species (see [26] for a similar analysis). The datapoint for the AUG triplet is marked. Boxes and whiskers indicate the distribution of conservation frequencies for all other 63 triplets. The median number of aligned triplets scored for each taxon was 3281 (Brassicaceae), 14,136 (Solanaceae), and 3780 (mammals). The whiskers represent the minimum and maximum values in the distribution. Boxes represent 25% below the median (thick horizontal line) and 25% above the median. uAUGs are conserved in a higher percentage of alignment columns than any other triplet. Whole-transcriptome pairwise comparisons: Brassicaceae: Arabidopsis thaliana and Brassica napus; Solanaceae: tomato and potato; Mammals: human and mouse. (D) The relationship between uORF length and repression of gene expression. Each datapoint represents the fold repression of expression from the major ORF when a uORF-containing 5′ UTR is compared with a uORF-less mutant version. R² denotes the correlation coefficient. Original data are in Supplemental Table 1.
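The shuffling comparison described for Fig. 2B can be sketched as follows. This is an illustrative approximation and not the authors' pipeline: it implements only the mononucleotide shuffle (the dinucleotide-preserving variant is omitted), it counts a uORF only when its stop codon also lies within the 5′ UTR, and the function names, the seed, and the toy UTR are assumptions made for this example.

```python
import random
from collections import Counter

STOP_CODONS = {"UAA", "UAG", "UGA"}

def uorf_lengths(utr):
    """Codon lengths (AUG included, stop excluded) of ORFs contained in utr."""
    lengths = []
    for start in range(len(utr) - 2):
        if utr[start:start + 3] != "AUG":
            continue
        for pos in range(start + 3, len(utr) - 2, 3):
            if utr[pos:pos + 3] in STOP_CODONS:
                lengths.append((pos - start) // 3)
                break
    return lengths

def shuffled_length_counts(utr, n_shuffles=1000, seed=0):
    """Pool uORF lengths over mononucleotide shuffles of a single 5' UTR."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    bases = list(utr)
    counts = Counter()
    for _ in range(n_shuffles):
        rng.shuffle(bases)     # preserves base composition, scrambles order
        counts.update(uorf_lengths("".join(bases)))
    return counts

# Hypothetical toy 5' UTR; real analyses would loop over annotated 5' UTRs.
utr = "GGAUGCCUUAAGGACCAUGAAACCCUGAGG"
observed = Counter(uorf_lengths(utr))
simulated = shuffled_length_counts(utr)
```

Comparing `observed` with `simulated` (after dividing the latter by the number of shuffles) gives, for one sequence, the kind of real-versus-shuffled length distribution that the legend describes across all Arabidopsis 5′ UTRs.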


Table 1
Experimental case studies of uORFs in plants.

Gene | AGI # or species | Functional class | Number of uORFs; CPuORF class (a) | uORF affects (b) | uORF lengths (codons) | Other characteristics | References

A. Transcription factors
AtbZip11 | At4g34590 | S-type bZip | 4 uORFs; CPuORF HG1 | Translation | 18, 42, 5, 19 | 42 amino acid (aa) CPuORF peptide detected in vitro; sucrose exacerbates repression | [21,27,31,51]
Opaque-2 | Maize | bZip | 3 uORFs | Translation | 3, 21, 20 | uORFs in cluster repress translation redundantly | [3]
myb7 | Rice | Myb | 1 uORF | Translation | 40 | uORF repression depends on downstream spacer sequence; peptide detected in vitro | [103]
ATR1 | At5g60890 | Myb | 3 uORFs | mRNA level | 33, 3, 4 | 33 amino acid uORF reduces mRNA level; uORF mutant allele from suppressor screen (atr1-d) | [14]
R/Lc | Maize | Basic helix loop helix | 1 uORF | Translation | 38 | Peptide involved in repression; poor reinitiation | [4,38]
SAC51 | At5g64350 | Basic helix loop helix | 5 uORFs; CPuORF HG15 | mRNA level (NMD) | 20, 16, 48, 53, 6 | CPuORF truncation from 53 to 3 amino acids rescues expression and thermospermine deficiency; uORF mutant allele from suppressor screen (sac51-d) | [12,81,87]
HsfB1/HSF4/TBF1 | At4g36990 | Heat shock factor | 2 uORFs; CPuORF HG18 | Translation | 15, 36 | CPuORF has 36 amino acids | [104,105]
ARF3/ETTIN | At2g33860 | Auxin response factor | 2 uORFs | mRNA level (minor); Translation | 92, 5 | Large uORF1 peptide detected in vitro; ARF3 translation stimulated by RPL24 and eIF3h | [27,69]
ARF5/MONOPTEROS | At1g19850 | Auxin response factor | 6 uORFs | Translation | 3–23 | ARF5 translation stimulated by RPL24 and eIF3h | [27,69]
ATH1 | At4g32980 | BELL-type homeodomain | 4 uORFs | Translation | 9, 12, 13, 1 | uORF cluster with 7 uAUGs | [106]
ABI3 | At3g24650 | | 3 uORFs | | 26, 11, 12 | Cluster of 3 uORFs inhibits ABI3 expression | [107]
MtHAP2-1 | Medicago | C/EBP | 3 uORFs | mRNA level | 62, 50, 34 | uORF cluster in alternatively spliced intron; uORF1 peptide binds and represses its own mRNA | [22]

B. Polyamine network
SAMDC/AdoMetDC | At3g02470 | S-adenosylmethionine decarboxylase | 2 uORFs; CPuORF HG3 | Translation | 3–4, 48–54 | Dual overlapping uORFs; 4 paralogous genes; polyamine suppresses uORF1 translation and triggers uORF2 translation and SAMDC repression | [11,49]
Arginine decarboxylase | Dianthus (carnation) | | 1 uORF | Translation | 7 | Synthesized uORF peptide inhibits in vitro translation | [23]
ODC | Tomato | Ornithine decarboxylase | 1 uORF | Translation | 5 | Two to three fold repression of ODC in vitro, independent of uORF peptide sequence | [111]

C. Other genes
CGS1 | At3g01120 | Cystathionine gamma-synthase | mainORF | Translation elongation | does not apply | S-adenosylmethionine blocks translation elongation | [52–54]
pma1, pma3 | Nicotiana plumbaginifolia | H+-ATPases | 1 uORF | Translation | 9 or 5 | uORFs mildly repress translation; uORF counteracts activation of expression by the 5′ leader; uORFs also in tomato and Arabidopsis homologs | [108–110]
XIPOTL1 | At3g18000 | Phosphoethanolamine N-methyltransferase | CPuORF HG13 | Translation | 25 | Translation repressed by phosphocholine; peptide is arginine-serine rich |
AtMHX1 | At2g47600 | Vacuolar magnesium/zinc – proton antiporter | 1 uORF | Translation, mRNA level (NMD) | 13 | uORF recognized despite poor AUG context; secondary structure in 5′ UTR; peptide-independent; disallows efficient reinitiation | [32,33,80]
CaMV | CaMV | | 6 uORFs | Shunting; Translation | | |
