GENE-40035; No. of pages: 15; 4C: Gene xxx (2014) xxx–xxx


Gene journal homepage: www.elsevier.com/locate/gene

Identification of conserved and novel microRNAs in Catharanthus roseus by deep sequencing and computational prediction of their potential targets

Pravin Prakash, Dolly Ghosliya, Vikrant Gupta ⁎

Biotechnology Division, CSIR-Central Institute of Medicinal and Aromatic Plants, P.O. CIMAP, Lucknow 226015, India

Article info

Article history: Received 27 August 2014; Received in revised form 8 October 2014; Accepted 25 October 2014; Available online xxxx

Keywords: Catharanthus roseus; Deep sequencing; MicroRNA; Real-time PCR

Abstract

MicroRNAs are small endogenous non-coding RNAs of ~19–24 nucleotides that perform regulatory roles in many plant processes. To identify miRNAs involved in the regulatory networks that control diverse biological processes, including secondary metabolism, in Catharanthus roseus, an important medicinal plant, we employed deep sequencing of small RNA from leaf tissue. A total of 88 potential miRNAs, comprising 81 conserved miRNAs belonging to 35 families and seven novel miRNAs, were identified. Precursors for 16 conserved and seven novel cro-miRNAs were identified, and their stem-loop hairpin structures were predicted. Selected cro-miRNAs were analyzed by stem-loop qRT-PCR, and differential expression patterns were observed in different vegetative tissues of C. roseus. Targets were predicted for conserved and novel cro-miRNAs and were found to be involved in diverse biological roles, including secondary metabolism. Our study enriches the available resources and information on miRNAs and their potential targets for a better understanding of miRNA-mediated gene regulation in plants. © 2014 Published by Elsevier B.V.

Abbreviations: A, adenosine; AMFE, adjusted minimal folding free energy; BLAST, Basic Local Alignment Search Tool; bp, base pair; C, cytidine; cDNA, DNA complementary to RNA; cro-miRNA, Catharanthus roseus micro ribonucleic acid; DCL1, Dicer-like protein 1; DTT, dithiothreitol; dNTP, deoxyribonucleoside triphosphate; EST, expressed sequence tags; G, guanosine; GO, gene ontology; GSS, genomic survey sequences; miRNA, micro ribonucleic acid; μl, microliter; MFE, minimal folding free energy; MFEI, minimal folding free energy index; ng, nanogram; ncRNA, non-coding ribonucleic acid; NCBI, National Center for Biotechnology Information; nt, nucleotide(s); pre-miRNA, precursor micro ribonucleic acid; qRT-PCR, quantitative real time polymerase chain reaction; RISC, RNA-induced silencing complex; siRNA, small interfering ribonucleic acid; T, thymidine; TIA, terpenoid indole alkaloid; TSA, transcriptome shotgun assembly; U, uridine.

⁎ Corresponding author at: Biotechnology Division, CSIR-Central Institute of Medicinal and Aromatic Plants, P.O. CIMAP, Near Kukrail Picnic Spot, Lucknow 226015, India. E-mail addresses: [email protected], [email protected] (V. Gupta).

1. Introduction

MicroRNAs (miRNAs) and small interfering RNAs (siRNAs) are regulatory nucleic acid molecules that have been widely studied during the past two decades. MiRNAs are small non-coding RNA (ncRNA) molecules of ~19–23 nucleotides in length and play important regulatory roles in most biological processes (Bartel, 2004). Plant miRNAs are transcribed from intergenic regions as primary miRNAs (pri-miRNAs) and cleaved within the nucleus by the ribonuclease Dicer-like 1 (DCL1) to form a precursor miRNA (pre-miRNA) hairpin, which is further cleaved to form a miRNA:miRNA* duplex (Kurihara and Watanabe, 2004). The characteristic feature of plant miRNA processing is 2′-O-methylation of the miRNA:miRNA* duplex by Hua-Enhancer1 (HEN1), a nuclear RNA methyltransferase (Yu et al., 2005). The miRNA:miRNA* duplex is transported to the cytoplasm by HASTY, a plant ortholog of exportin 5 (Bollman et al., 2003), and subsequently incorporated into RNA-induced silencing complexes (RISCs), where argonaute family proteins direct the mature miRNA to interact with its target gene for regulation (Baumberger and Baulcombe, 2005). The miRNA* strand is usually destined for degradation, but it may also act as a functional guide strand and perform regulatory roles (Guo and Lu, 2010; Hsieh et al., 2009; Li et al., 2010). Within RISC, the miRNAs pair with complete or near-perfect complementarity to their target mRNA(s), which leads to cleavage of the target and ultimately silences the gene concerned. Besides endonucleolytic cleavage, plant miRNAs also repress target genes through translational inhibition (Brodersen et al., 2008).

The first miRNA-encoding gene reported was lin-4 from Caenorhabditis elegans, which is involved in the regulation of genes during early stages of larval development (Lee et al., 1993); since then, several miRNAs have been identified in metazoans and found to be involved in diverse regulatory functions. In plants, miRNAs play crucial roles in regulating many developmental events such as flower and root morphogenesis (Aukerman and Sakai, 2003; Wang et al., 2005), anther development (Millar and Gubler, 2005), and leaf and vascular development (Kim et al., 2005; Palatnik et al., 2003; Song et al., 2012). Plant miRNAs have also been implicated in responses to environmental stresses such as cold (Zhou et al., 2008), drought (Zhao et al., 2007; Zhou et al., 2010), salinity (Sunkar et al., 2008), pathogen infection (Navarro et al., 2006), UV-B radiation (Zhou et al., 2007) and mechanical stress (Lu et al., 2005).

Cloning of small RNAs has been an indispensable method for the discovery of conserved as well as novel microRNAs (Sunkar and Zhu, 2004; Sunkar et al., 2005). Apart from cloning, computational methods

http://dx.doi.org/10.1016/j.gene.2014.10.046 0378-1119/© 2014 Published by Elsevier B.V.

Please cite this article as: Prakash, P., et al., Identification of conserved and novel microRNAs in Catharanthus roseus by deep sequencing and computational prediction of their potential targets..., Gene (2014), http://dx.doi.org/10.1016/j.gene.2014.10.046


P. Prakash et al. / Gene xxx (2014) xxx–xxx

also provide a useful means for mining conserved microRNAs in different organisms, and they rely on the sequences available in public databases. Expressed sequence tags (ESTs) and genomic survey sequences (GSSs) have been analyzed for the identification and prediction of miRNAs in barley (Colaiacovo et al., 2010), Brachypodium (Unver and Budak, 2009), sorghum (Katiyar et al., 2012), soybean (Zhang et al., 2008) and tomato (Yin et al., 2008), among others. Next-generation high-throughput sequencing technologies are robust tools for the identification of poorly expressed, species-specific, non-conserved and novel miRNAs. Deep sequencing of RNAs in combination with transcriptome data is proving helpful in identifying conserved and novel miRNAs in plants lacking whole genome sequence information. Several miRNAs have been reported from plants such as Arabidopsis (Fahlgren et al., 2007), rice (Sunkar et al., 2008; Sunkar and Jagadeeswaran, 2008), Vitis vinifera (Pantaleo et al., 2010), Populus trichocarpa (Puzey et al., 2012), Vitis amurensis (C. Wang et al., 2012), Hevea brasiliensis (Gébelin et al., 2012), Prunus persica (Luo et al., 2013), Brassica oleracea (Lukasik et al., 2013), Solanum tuberosum (Lakhotia et al., 2014) and Triticum aestivum (Su et al., 2014) by using high-throughput sequencing and subsequent computational analysis.

Catharanthus roseus belongs to the Apocynaceae family and synthesizes over 130 secondary metabolites of economic importance, including a variety of terpenoid indole alkaloids and phenolic compounds produced via the Terpenoid Indole Alkaloid (TIA) and phenylpropanoid pathways, respectively (El-Sayed and Verpoorte, 2007; Mustafa and Verpoorte, 2007). Important alkaloids produced by C. roseus include the anti-neoplastic vinblastine and vincristine, the anti-hypertensives reserpine and ajmalicine, and the anti-arrhythmic ajmaline. The C. roseus leaf contains specialized cells and is considered a major site for the biosynthesis of high-value secondary metabolites (Murata and De Luca, 2005; St-Pierre et al., 1999). MicroRNA-like molecules have recently been identified and reported from a few secondary metabolite-rich plants of medicinal and pharmaceutical importance, such as Taxus chinensis (Qiu et al., 2009), Papaver somniferum (Unver et al., 2010) and Artemisia annua (Pérez-Quintero et al., 2012). Due to the lack of a genome sequence and the availability of only a limited number

[Fig. 1. (a) Bar chart: percentage (%) of reads versus read length (16–30 nt) for redundant and unique reads. (b) Pie chart of non-coding RNA categories: rRNA 60.49%, tRNA 25.71%, snoRNA 6.07%, snRNA 1.75%, IRES 0.012%, CRISPR-DR 0.0014%.]

Fig. 1. Analyses of small RNA reads generated by deep sequencing of C. roseus leaf small RNA library. (a) Size distribution of small RNA sequences identified in the library. Majority of small RNA reads ranged between 21 and 24 nt in length among which 24, 23 and 21 nt long were the most abundant. (b) Percentage of major categories of non-coding RNAs other than miRNAs identified in the library.


Table 1
C. roseus mature miRNA sequences identified by Illumina-based high-throughput sequencing and subsequent analysis.

| miRNA family | miRNA | C. roseus mature miRNA reads (5′–3′) | Read length (nt) | No. of reads | Strand | Mismatch (nt) | Homologous miRNA | Plant species |
|---|---|---|---|---|---|---|---|---|
| 156 | Cro-miR156c | UUGACAGAAGAGAGAGAGCAC | 21 | 01 | + | 0 | ahy-miR156c | Arachis hypogaea |
| 156 | Cro-miR156c | UUGACAGAAGAAAGAGAGCAC | 21 | 05 | + | 0 | smo-miR156c | Selaginella moellendorffii |
| 156 | Cro-miR156f-3p | GCUCACUCUCUAUCUGUCACC | 21 | 10 | + | 0 | aly-miR156f-3p | Arabidopsis lyrata |
| 156 | Cro-miR156g-5p | UUGACAGAAGAUAGAGGGCAC | 21 | 08 | + | 0 | mtr-miR156g-5p | Medicago truncatula |
| 156 | Cro-miR156h | UGACAGAAGAGAGUGAGCAC | 20 | 10 | + | 0 | sbi-miR156h | Sorghum bicolor |
| 156 | Cro-miR156l | UUGACAGAAGAUGGAGAGCAC | 21 | 04 | + | 0 | ptc-miR156l | Populus trichocarpa |
| 157 | Cro-miR157a | UUGACAGAAGAUAGAGAGCAC | 21 | 3462 | + | 0 | ath-miR157a | Arabidopsis thaliana |
| 157 | Cro-miR157c-3p | GCUCUCUAUACUUCUGUCAUC | 21 | 02 | + | 01 | aly-miR157c-3p | Arabidopsis lyrata |
| 157 | Cro-miR157d-3p | GCUCUCUAUGCUUCUGUCAUC | 21 | 1153 | + | 0 | aly-miR157d-3p | Arabidopsis lyrata |
| 159 | Cro-miR159 | UUUGGACUGAAGGGAGCUCUA | 21 | 15 | + | 0 | aqc-miR159 | Aquilegia caerulea |
| 159 | Cro-miR159a | UUUGGAUUGAAGGGAGCUCUA | 21 | 58,665 | + | 0 | ath-miR159a | Arabidopsis thaliana |
| 159 | Cro-miR159c | UUUGGAUUGAAGGGAGCUCCA | 21 | 438 | + | 01 | ath-miR159c | Arabidopsis thaliana |
| 159 | Cro-miR159h-3p | UUUGGAGUGAAGGGAGCUCUA | 21 | 01 | + | 01 | gma-miR159h-3p | Glycine max |
| 160 | Cro-miR160a* | UGCCUGGCUCCCUGUAUGCCA | 21 | 430 | + | 0 | ath-miR160a | Arabidopsis thaliana |
| 160 | Cro-miR160a-3p | GCGUAUGAGGAGCCAUGCAUA | 21 | 17 | + | 0 | bra-miR160a-3p | Brassica rapa |
| 160 | Cro-miR160g | UGCCUGGCUCCCUGGAUGCCC | 21 | 01 | + | 01 | ptc-miR160g | Populus trichocarpa |
| 162 | Cro-miR162a | UCGAUAAACCUCUGCAUCCAG | 21 | 3769 | + | 0 | ath-miR162a | Arabidopsis thaliana |
| 162 | Cro-miR162a | UCGAUAAACCUGUGCAUCCAG | 21 | 02 | + | 0 | bna-miR162a | Brassica napus |
| 162 | Cro-miR162-5p | GGAGGCAGCGGUUCAUCGAUC | 21 | 35 | + | 0 | csi-miR162-5p | Citrus sinensis |
| 164 | Cro-miR164a* | UGGAGAAGCAGGGCACGUGCA | 21 | 800 | + | 0 | ath-miR164a | Arabidopsis thaliana |
| 164 | Cro-miR164c | UGGAGAAGCAGGGUACGUGCA | 21 | 01 | + | 0 | osa-miR164c | Oryza sativa |
| 164 | Cro-miR164f | UGGAGAAGCAGGGCACAUGCA | 21 | 01 | + | 01 | ptc-miR164f | Populus trichocarpa |
| 164 | Cro-miR164g-3p | CACGUGCUCCCCUUCUCCAAC | 21 | 16 | + | 01 | zma-miR164g-3p | Zea mays |
| 165 | Cro-miR165a | UCGGACCAGGCUUCAUCCCCC | 21 | 58 | + | 0 | ath-miR165a | Arabidopsis thaliana |
| 166 | Cro-miR166d | UCGGACCAGGCUUCAUUCCCC | 21 | 105,933 | + | 0 | vvi-miR166d | Vitis vinifera |
| 166 | Cro-miR166c-5p | GGAAUGUUGUCUGGCUCGAGG | 21 | 103 | + | 0 | zma-miR166c-5p | Zea mays |
| 166 | Cro-miR166e | UCGGACCAGGCUUCAUUCUCC | 21 | 77 | + | 01 | vvi-miR166e | Vitis vinifera |
| 166 | Cro-miR166f | UCUCGGACCAGGCUUCAUUCC | 21 | 10,372 | + | 0 | bdi-miR166f | Brachypodium distachyon |
| 167 | Cro-miR167a | UGAAGCUGCCAGCAUGAUCUA | 21 | 3661 | + | 0 | ath-miR167a | Arabidopsis thaliana |
| 167 | Cro-miR167b-3p | GGUCAUGCUCUGACAGCCUCACU | 23 | 150 | + | 0 | aly-miR167b-3p | Arabidopsis lyrata |
| 167 | Cro-miR167f-5p | UGAAGCUGCCAGCAUGAUCUU | 21 | 43 | + | 0 | ptc-miR167f-5p | Populus trichocarpa |
| 167 | Cro-miR167h-5p | UGAAGCUGCCAACAUGAUCUA | 21 | 06 | + | 01 | ptc-miR167h-5p | Populus trichocarpa |
| 168 | Cro-miR168a | UCGCUUGGUGCAGGUCGGGAA | 21 | 2811 | + | 0 | ath-miR168a | Arabidopsis thaliana |
| 168 | Cro-miR168a-3p* | CCCGCCUUGCAUCAACUGAAU | 21 | 398 | + | 0 | aly-miR168a-3p | Arabidopsis lyrata |
| 168 | Cro-miR168b | UCGCUUGGUGCAGGUCGAGAAC | 22 | 01 | + | 0 | bna-miR168b | Brassica napus |
| 169 | Cro-miR169a | CAGCCAAGGAUGACUUGCCGA | 21 | 01 | + | 0 | ath-miR169a | Arabidopsis thaliana |
| 169 | Cro-miR169h* | UAGCCAAGGAUGACUUGCCU | 20 | 01 | + | 0 | ath-miR169h | Arabidopsis thaliana |
| 169 | Cro-miR169m | UAGCCAAGGAUGACUUGCC | 19 | 01 | + | 0 | osa-miR169m | Oryza sativa |
| 170 | Cro-miR170 | UGAUUGAGCCGUGUCAAUAUC | 21 | 02 | + | 0 | ath-miR170 | Arabidopsis thaliana |
| 170 | Cro-miR170-5p | UAUUGGCCUGGUUCACUCAGA | 21 | 1301 | + | 0 | aly-miR170-5p | Arabidopsis lyrata |
| 171 | Cro-miR171a | UGAUUGAGCCGCGCCAAUAUC | 21 | 02 | + | 0 | ath-miR171a | Arabidopsis thaliana |
| 171 | Cro-miR171a | UGAUUGAGUCGUGCCAAUAUC | 21 | 06 | + | 0 | mtr-miR171a | Medicago truncatula |
| 171 | Cro-miR171b | UUGAGCCGUGCCAAUAUCAC | 20 | 108 | + | 0 | zma-miR171b | Zea mays |
| 171 | Cro-miR171b | UGAUUGAGCCGUGCCAAUAUC | 21 | 1511 | + | 0 | osa-miR171b | Oryza sativa |
| 171 | Cro-miR171c | UAUUGGUGCGGUUCAAUGAGA | 21 | 03 | + | 0 | sly-miR171c | Solanum lycopersicum |
| 171 | Cro-miR171c-5p | AGAUAUUGGUGCGGUUCAAUG | 21 | 04 | + | 01 | aly-miR171c-5p | Arabidopsis lyrata |
| 171 | Cro-miR171f | UUGAGCCGCGCCAAUAUCACU | 21 | 05 | + | 0 | vvi-miR171f | Vitis vinifera |
| 172 | Cro-miR172a* | AGAAUCUUGAUGAUGCUGCAU | 21 | 1146 | + | 0 | ath-miR172a | Arabidopsis thaliana |
| 172 | Cro-miR172a-5p | GUGGCAUCAUCAAGAUUCAUA | 21 | 19 | + | 01 | aly-miR172a-5p | Arabidopsis lyrata |
| 172 | Cro-miR172c | GGAAUCUUGAUGAUGCUGCAG | 21 | 11 | + | 0 | vvi-miR172c | Vitis vinifera |
| 172 | Cro-miR172c-5p | GUAGCAUCAUCAAGAUUCACA | 21 | 07 | + | 0 | mtr-miR172c-5p | Medicago truncatula |
| 319 | Cro-miR319a | UUGGACUGAAGGGAGCUCCC | 20 | 1408 | + | 0 | ctr-miR319a | Citrus trifoliata |
| 319 | Cro-miR319h | UUGGACUGAAGGGAGCUCCU | 20 | 03 | + | 0 | ptc-miR319h | Populus trichocarpa |
| 319 | Cro-miR319i | UUGGGCUGAAGGGAGCUCCC | 20 | 07 | + | 0 | ptc-miR319i | Populus trichocarpa |
| 390 | Cro-miR390a | AAGCUCAGGAGGGAUAGCGCC | 21 | 40 | + | 0 | ath-miR390a | Arabidopsis thaliana |
| 391 | Cro-miR391 | UACGCAGGAGAGAUGACGCCGC | 22 | 01 | + | 0 | mdm-miR391 | Malus domestica |
| 393 | Cro-miR393a | UCCAAAGGGAUCGCAUUGAUU | 21 | 94 | + | 01 | zma-miR393a | Zea mays |
| 394 | Cro-miR394 | UUGGCAUUCUGUCCACCUCC | 20 | 7738 | + | 0 | ath-miR394a | Arabidopsis thaliana |
| 395 | Cro-miR395a | CUGAAGUGUUUGGGGGAACUC | 21 | 01 | + | 0 | ath-miR395a | Arabidopsis thaliana |
| 396 | Cro-miR396a* | UUCCACAGCUUUCUUGAACUG | 21 | 20,187 | + | 0 | ath-miR396a | Arabidopsis thaliana |
| 396 | Cro-miR396b | UUCCACAGCUUUCUUGAACUUU | 22 | 169 | + | 0 | ath-miR396b | Arabidopsis thaliana |
| 396 | Cro-miR396a-3p* | GUUCAAUAAAGCUGUGGGAUG | 21 | 63 | + | 01 | aly-miR396a-3p | Arabidopsis lyrata |
| 396 | Cro-miR396a-3p | UUCAAGAAAGCUGUGGGAAA | 20 | 187 | + | 0 | cca-miR396a-3p | Cynara cardunculus |
| 396 | Cro-miR396f | UUCCACGGCUUUCUUGAACUG | 21 | 14 | + | 0 | ptc-miR396f | Populus trichocarpa |
| 396 | Cro-miR396j | UUUCCACAGCUAUCUUGAACU | 21 | 01 | − | 0 | gma-miR396j | Glycine max |
| 397 | Cro-miR397a | CAUUGAGUGCAGCGUUGAUGA | 21 | 33 | + | 01 | ath-miR397a | Arabidopsis thaliana |
| 398 | Cro-miR398a | UGUGUUCUCAGGUCACCCCUU | 21 | 14 | + | 0 | ath-miR398a | Arabidopsis thaliana |
| 398 | Cro-miR398b | UGUGUUCUCAGGUCGCCCCUG | 21 | 479 | + | 0 | osa-miR398b | Oryza sativa |
| 399 | Cro-miR399a | UGCCAAAGGAGAAUUGCCCUG | 21 | 07 | + | 0 | osa-miR399a | Oryza sativa |
| 399 | Cro-miR399b | UGCCAAAGGAGAGUUGCCCUG | 21 | 22 | + | 0 | ath-miR399b | Arabidopsis thaliana |
| 408 | Cro-miR408 | UGCACUGCCUCUUCCCUGGC | 20 | 11 | + | 0 | gma-miR408 | Glycine max |
| 477 | Cro-miR477a* | ACUCUCCCUCAAGGGCUUCUGG | 22 | 02 | + | 0 | nta-miR477a | Nicotiana tabacum |
| 530 | Cro-miR530a | UGCAUUUGCACCUGCACCUCC | 21 | 01 | + | 01 | tcc-miR530a | Theobroma cacao |

(continued)


Table 1 (continued)

| miRNA family | miRNA | C. roseus mature miRNA reads (5′–3′) | Read length (nt) | No. of reads | Strand | Mismatch (nt) | Homologous miRNA | Plant species |
|---|---|---|---|---|---|---|---|---|
| 828 | Cro-miR828a | UCUUGCUCAAAUGAGUAUUCCA | 22 | 11 | + | 0 | vvi-miR828a | Vitis vinifera |
| 858 | Cro-miR858 | CUCGUUGUCUGUUCGACCUUG | 21 | 35 | + | 0 | ppe-miR858 | Prunus persica |
| 1511 | Cro-miR1511 | AACCAGGCUCUGAUACCAUGA | 21 | 03 | + | 01 | gma-miR1511 | Glycine max |
| 2111 | Cro-miR2111a | UAAUCUGCAUCCUGAGGUUUC | 21 | 02 | + | 01 | ath-miR2111a | Arabidopsis thaliana |
| 2199 | Cro-miR2199 | UGAUACACUAGCACGGAUCAC | 21 | 01 | + | 0 | mtr-miR2199 | Medicago truncatula |
| 5139 | Cro-miR5139 | AAACCUGGCUCUGAUACCA | 19 | 07 | + | 0 | rgl-miR5139 | Rehmannia glutinosa |
| 5368 | Cro-miR5368 | GGGACAGUCUCAGGUAGACA | 20 | 151 | + | 01 | gma-miR5368 | Glycine max |
| 6173 | Cro-miR6173 | UAGCCGUAAACGAUGGAUACU | 20 | 113 | + | 01 | hbr-miR6173 | Hevea brasiliensis |

Less-conserved miRNA families are marked in bold. The cro-miRNAs for which miRNA* sequences were identified are represented with (*) symbol. Mismatches are italicized in mature cro-miRNA sequences.

of ESTs in C. roseus, it is difficult to mine miRNAs and further explore the miRNA-mediated regulation of gene expression in this secondary metabolite-rich plant. In the present study, we used Illumina-based high-throughput sequencing and analysis of leaf small RNAs to identify conserved and novel miRNAs in C. roseus, studied their expression, and predicted their putative targets. It would be interesting to explore the possible involvement of miRNAs in the regulation of secondary metabolism in C. roseus, which could be further utilized in genetic manipulations for enhanced production of commercially important metabolites.

2. Materials and methods

2.1. Plant material and deep sequencing of small RNA

Mature seeds of C. roseus (cv. Nirmal) were obtained from the National Gene Bank of Medicinal and Aromatic Plants, CSIR-CIMAP, Lucknow, India. Seeds were surface sterilized in 0.01% HgCl2 for 2–3 min, rinsed thrice with sterile water and germinated in pots containing a mixture of soil and vermiculite (3:1) at 28 ± 1 °C under a 16 h/8 h light/dark cycle. Leaves of 2-month-old seedlings were collected, frozen in liquid nitrogen and stored at −80 °C until use. Total RNA was extracted using Trizol reagent (Invitrogen, USA) as per the manufacturer's instructions and quantified on a NanoDrop ND-1000 (NanoDrop Technologies, USA).

The small RNA library was constructed using the NEXTflex™ Small RNA Sequencing Kit-5132-02 (Bioo Scientific Corporation, USA) according to the library protocol outlined in the manufacturer's instruction manual, with 1 μg of total RNA as the starting material. Briefly, 3′ adaptors were ligated to the specific 3′OH group of small RNA, followed by 5′ adaptor ligation. The ligated products were reverse transcribed by priming with reverse transcriptase primers, enriched by PCR (15 cycles) and size selected in the range of 140–200 bp on a 15% urea polyacrylamide gel, followed by overnight gel elution and precipitation. The small RNA library was quantified using NanoDrop and validated for quality by running an aliquot on a high-sensitivity bioanalyzer chip using an Agilent 2100 Bioanalyzer (Agilent Technologies, USA). The prepared library was sequenced on the Illumina HiSeq 1000 by the SBS method to read 50 bases single end at Genotypic Technologies Pvt. Ltd, Bangalore, India.

2.2. Bioinformatics analysis of C. roseus small RNA sequences

Upon completion of the sequencing run, the raw data were extracted using Illumina pipeline software to obtain FASTQ files, and a quality check was done with SeqQCv2.2 software (http://genotypic.co.in/SeqQC.html). Low-quality reads (Phred score below 30) and reads shorter than 16 nucleotides were discarded. Clean reads were obtained after filtering out low-quality reads, trimming, and removing adapter sequences as well as polyA tail-containing sequences. Subsequently, the clean reads were fed into the Deep Sequencing Small RNA Analysis Pipeline (DSAP) software and aligned with the Rfam database to predict non-coding RNAs such as rRNA-, tRNA-, snRNA- and snoRNA-derived sequences. DSAP was used with default parameters, and clean reads were aligned with non-coding RNAs from all organisms (including plants) available in the Rfam database at a BLAST cut-off e-value of 0.001. Reads meeting this cut-off were considered non-coding RNAs.

The remaining small RNA sequences were aligned against miRNA sequences from all organisms available in miRBase v19.0 (http://www.mirbase.org/) to identify known (conserved) miRNAs in C. roseus by BLASTN search at a cut-off e-value of 10. Top hits with 0–1 nucleotide mismatch against miRBase miRNAs were considered the corresponding miRNA homologs in C. roseus. Sequences with no more than a single base mismatch with a known miRNA were grouped into that miRNA's family. Their precursors were searched in the ‘nr’ nucleotide collection, expressed sequence tag (EST) and transcriptome shotgun assembly (TSA) datasets specific to C. roseus available at NCBI (http://www.ncbi.nlm.nih.gov) by BLASTN with default parameters for short input sequences and a cut-off e-value of 10. High-throughput transcriptome sequences recently made available at http://nipgr.res.in/mjain.html?page=catharanthus were also analyzed to screen for precursors. Significant hits with no mismatch to their corresponding mature miRNAs were checked for the characteristic hairpin-like secondary structures formed by primary and precursor miRNAs using the miRNA identification pipeline of the C-mii tool (Numnark et al., 2012). Briefly, the putative candidate sequences were scanned again by BLASTN (e-value cut-off: 10) using the C-mii tool.
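The 0–1 mismatch rule for assigning reads to known miRNAs can be illustrated with a minimal Python sketch. It is not the DSAP/BLASTN workflow used in the study: it performs a plain ungapped, equal-length comparison, and the function names (`count_mismatches`, `assign_homolog`, `family_of`) are hypothetical. The two reference sequences are assumed to be identical to the Table 1 reads that are listed with zero mismatches against ath-miR165a and vvi-miR166d.

```python
import re

def count_mismatches(read, reference):
    """Number of differing positions, or None when the lengths differ."""
    if len(read) != len(reference):
        return None
    return sum(a != b for a, b in zip(read, reference))

def assign_homolog(read, known_mirnas, max_mismatch=1):
    """Return (name, mismatches) of the closest known miRNA within the limit, else None."""
    best = None
    for name, seq in known_mirnas.items():
        d = count_mismatches(read, seq)
        if d is not None and d <= max_mismatch and (best is None or d < best[1]):
            best = (name, d)
    return best

def family_of(mirna_name):
    """Extract the family number from a miRBase-style name, e.g. 'vvi-miR166d' -> '166'."""
    match = re.search(r"miR(\d+)", mirna_name)
    return match.group(1) if match else None

# Toy reference set (assumption: equal to the zero-mismatch reads in Table 1).
KNOWN = {
    "ath-miR165a": "UCGGACCAGGCUUCAUCCCCC",
    "vvi-miR166d": "UCGGACCAGGCUUCAUUCCCC",
}

hit = assign_homolog("UCGGACCAGGCUUCAUUCCCC", KNOWN)
print(hit, family_of(hit[0]))
```

With such a small reference set a read is simply assigned to its nearest entry; against the full miRBase collection the same read could match a different family member equally well or better.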
Subsequently, protein-coding sequences were removed by BLASTX (e-value ≤ 1e−5) against the protein databases UniProtKB/Swiss-Prot (release 2010_12) and UniProtKB/TrEMBL (release 2011_01), and other non-coding RNAs were eliminated by BLASTN (e-value ≤ 1e−5) against the non-coding RNA database Rfam10, both within the C-mii tool, to identify C. roseus miRNA precursors. The filtered sequences were used as input to UNAFold for folding of primary and precursor miRNAs. For pri-miRNA folding, the default UNAFold parameters were used (maximum base-pair distance, 3000; maximum bulge/interior loop size, 30; single-thread run at 37 °C). The precursor miRNA folding steps of the C-mii tool were then performed to extract the pre-miRNA sequences from the pri-miRNAs using the same parameters that were applied for pri-miRNA folding. The default auto mode of the C-mii tool was selected to decide the cleavage positions of primary and precursor miRNAs. The criteria for selection of conserved cro-miRNA precursors were: i) folding of the miRNA precursor sequence into an appropriate secondary stem-loop structure; ii) one arm of the precursor stem contains the mature miRNA whereas the opposite arm harbors the miRNA* sequence; iii) no loop or break and fewer than six nucleotide mismatches between the mature miRNA and its opposite miRNA* sequence; iv) higher negative MFEI values of precursors than of other types of RNAs; and v) the miRNA:miRNA* sequences form a duplex with a two-nucleotide 3′ overhang in the hairpin secondary structure

Fig. 2. Predicted stem-loop hairpin structures of C. roseus miRNA precursors. The mature miRNAs in the precursors are shown in red color. Nucleotide positions in the predicted pre-miRNAs are numbered from the 5′ end. The nucleotide positions of precursors in pri-miRNAs are shown in brackets in the conserved miRNA structures. (a–p) Conserved miRNAs. (q–w) Novel miRNAs. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)


[Fig. 2 panels: (a) cro-miR157a; (b) cro-miR160a; (c) cro-miR162a; (d) cro-miR164a; (e) cro-miR166d; (f) cro-miR167a; (g) cro-miR168a; (h) cro-miR169a; (i) cro-miR169h; (j) cro-miR171b; (k) cro-miR172a.]


[Fig. 2 (continued). Panels: (l) cro-miR319a; (m) cro-miR396a; (n) cro-miR398a; (o) cro-miR408; (p) cro-miR828a.]

(Meyers et al., 2008; M. Wang et al., 2012; Zhang et al., 2006).

After assigning known miRNAs to their respective families, the remaining small RNA sequences were checked for novel miRNA candidates. Because the genome sequence of C. roseus is not available in the public domain, the reference genome sequence of a close asterid, Solanum lycopersicum, was used to identify novel C. roseus miRNAs. Initially, the small RNA reads were aligned with the S. lycopersicum genome using bowtie with strict parameters (no mismatches allowed; -v 0). Mireap (http://sourceforge.net/projects/mireap/) was then used to predict novel miRNAs from this map file (BED) together with the S. lycopersicum genome fasta file (ftp://ftp.sgn.cornell.edu/genomes/Solanum_lycopersicum/annotation/ITAG2.3_release/). The parameters used for Mireap prediction included: i) minimal and maximal miRNA reference sequence lengths of 20 nt and 24 nt, respectively; ii) maximal copy number of miRNAs on the reference (20); iii) maximal free energy allowed for a miRNA precursor (−18 kcal/mol); iv) maximal space between the miRNA and miRNA* (35); v) minimal mature base pairs of the miRNA and miRNA* (14); vi) maximal bulge of the miRNA and miRNA* (4); vii) maximal asymmetry of the miRNA/miRNA* duplex (5); and viii) flank sequence length of the miRNA precursor (100).

Reads of precursor sequences suggested by Mireap were analyzed by BLASTX (e-value cut-off: 1e−20) against the UniProtKB/Swiss-Prot (swissprot) dataset to check for any potential protein-coding sequence. They were further submitted to the online tool ‘psRobot’ (http://omicslab.genetics.ac.cn/psRobot) and aligned with S. lycopersicum genomic loci in order to identify sequences homologous to C. roseus-specific small RNA reads whose flanking regions fold into secondary stem-loop structures with suitable MFEI values. psRobot was run with strict parameters, setting the minimal number of mismatches in the small RNA region to one, the maximal number of mismatches in the small RNA region to seven, the maximal precursor length to 100 nucleotides, and the large-loop small RNA option to “F”. BLASTX was performed again for the precursors suggested by psRobot against UniProtKB/Swiss-Prot (swissprot) at a less stringent cut-off threshold of 1e−5 as a final check for similarity to proteins. The MFEI value was calculated according to the equation MFEI = (MFE / length of the RNA sequence) × 100 / %GC content (Zhang et al., 2006) and considered when predicting novel miRNAs.

To study nucleotide composition and dominance, position-specific base analysis was done for all the identified conserved and novel C. roseus miRNAs. A text file containing the 81 conserved and 7 novel mature cro-miRNA sequences in fasta format was prepared and imported into BioEdit (Hall, 1999). The positional frequency summary option from the alignment menu of BioEdit was executed and the output file was generated. The average overall percentage of A, U, G and C was then calculated. The individual percentages of each base at the different nucleotide positions of the mature miRNA sequences, as well as their overall percentages, were plotted using SigmaPlot 10.0 (Systat Software Inc., USA).

2.3. Expression analysis of miRNAs by stem-loop qRT-PCR

To validate the selected conserved and novel cro-miRNAs, stem-loop qRT-PCR was carried out as described previously (Varkonyi-Gasic et al., 2007) with slight modifications. Small RNA from leaf, stem and root of C. roseus was extracted using the mirPremier microRNA isolation kit (Sigma-Aldrich, USA) following the manufacturer's instructions, and quantified on a NanoDrop ND-1000 spectrophotometer. Small RNA (100 ng), 0.5 μl 10 mM dNTP mix, 1.0 μl 5 μM stem-loop RT primers


[Fig. 2 (continued). Predicted hairpin structures of the novel miRNA precursors: (q) cro-miRnovel01; (r) cro-miRnovel02; (s) cro-miRnovel03; (t) cro-miRnovel04; (u) cro-miRnovel05; (v) cro-miRnovel06; (w) cro-miRnovel07.]

and nuclease-free water to a final volume of 10 μl were mixed and heated at 65 °C for 5 min, followed by incubation on ice for 2 min. Subsequently, 2.0 μl RT buffer (10×), 4.0 μl MgCl2 (25 mM), 2.0 μl DTT (0.1 M), 0.1 μl RNaseOUT (40 U/μl) and 0.25 μl SuperScript III RT (200 U/μl) were added to the reaction mixture and the final volume was adjusted to 20 μl. The RT reaction was carried out in an S1000 Thermal Cycler (Bio-Rad, USA) for 30 min at 16 °C, 30 min at 42 °C and 5 min at 85 °C, and then held at 4 °C. RT-PCR was done using 10 ng of cDNA, 5.0 pmol each of miRNA-specific forward and universal reverse primer, 0.5 μl dNTP mix (10 mM), 2.5 μl Advantage 2 PCR Buffer (10×), 0.5 μl Advantage® 2 Polymerase Mix (50×) and MQ water to a final volume of 25 μl. The reactions were incubated in a thermal cycler at 95 °C for 1 min, followed by 40 cycles of 95 °C for 15 s, 60 °C for 30 s and 72 °C for 30 s, and the product obtained was resolved on a 2% agarose gel for visualization.

The reaction for stem-loop qRT-PCR was set up using cDNA (10 ng), 5.0 pmol each of miRNA-specific forward and universal reverse primer, 10 μl 2× SYBR® Premix ExTaq™ (Tli RNaseH Plus), 0.4 μl of 50× ROX reference dye II (Takara Bio Inc., Japan) and MQ water to a final reaction volume of 20 μl. Quantitative PCR was carried out in a 7900HT Fast Real-Time PCR system (Applied Biosystems, USA) for 30 s at 95 °C, followed by 40 cycles of 15 s at 95 °C and 30 s at 60 °C. The amplification cycles were followed by a dissociation curve analysis with a ramp from 62 to 95 °C in 0.4 °C increments to confirm the specificity of the primers. The primer efficiencies were calculated by


using a standard curve generated from a five-point, 5-fold dilution series of a mixed cDNA sample run in triplicate. The PCR efficiencies of the reference gene and all the tested miRNAs were similar and close to 2.0. The threshold cycle (Ct) values were recorded and the ΔCt method was used to calculate the relative expression of miRNAs. The Ct values of each reaction were normalized to the Ct value obtained for the 5.8S rRNA reference gene (accession: HQ130657.2). Each reaction was conducted for three biological samples with three technical replicates. The primers used for stem-loop qRT-PCR are given in Supplementary Table 1.
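The two calculations described above, primer efficiency from a standard curve and relative expression by the ΔCt method, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' analysis: the efficiency is expressed as an amplification factor 10^(−1/slope) of the Ct versus log10(template amount) regression (2.0 corresponds to perfect doubling), the relative expression is taken as factor^(−ΔCt) with ΔCt = Ct(miRNA) − Ct(5.8S rRNA), and all function names and Ct values are hypothetical.

```python
import math
from statistics import mean

def amplification_factor(log10_amounts, ct_values):
    """Least-squares slope of Ct vs log10(amount), converted to 10**(-1/slope)."""
    mx, my = mean(log10_amounts), mean(ct_values)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log10_amounts, ct_values))
    sxx = sum((x - mx) ** 2 for x in log10_amounts)
    slope = sxy / sxx
    return 10 ** (-1.0 / slope)

def relative_expression(ct_mirna, ct_reference, factor=2.0):
    """Relative expression by the delta-Ct method, using mean Ct of the replicates."""
    delta_ct = mean(ct_mirna) - mean(ct_reference)
    return factor ** (-delta_ct)

# Five-point, 5-fold dilution series; Ct values are invented (one mean Ct per point).
log10_amounts = [math.log10(5.0 ** -i) for i in range(5)]
hypothetical_ct = [18.1, 20.4, 22.8, 25.1, 27.4]
print(round(amplification_factor(log10_amounts, hypothetical_ct), 2))

# Three technical replicates each for a miRNA and the reference gene (invented values).
print(relative_expression([24.0, 24.2, 23.8], [20.0, 20.1, 19.9]))
```

Using a single amplification factor for both the miRNA and the reference gene is only reasonable when their efficiencies are similar, which is the condition reported in the text.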

2.4. Identification of C. roseus miRNA targets

Targets of the predicted C. roseus miRNAs were identified in different plant species using the web-based tool psRNATarget (http://plantgrn.noble.org/psRNATarget) (Dai and Zhao, 2011) with default parameters: length for complementarity scoring (hsp size), 20 nucleotides; maximum unpaired energy (UPE) for evaluating target-site accessibility, 25; flanking length around the target site for accessibility analysis, 17 bp upstream and 13 bp downstream; range of central mismatch leading to translational inhibition, nucleotide positions 9–11; and maximum expectation cutoff, 2.0. Targets were also searched in the C. roseus EST and TSA datasets of NCBI by BLASTN (e-value cut-off: 10) using the identified mature cro-miRNAs as queries. Apart from the NCBI data, targets were also filtered from the C. roseus high-throughput transcriptome dataset available at http://nipgr.res.in/mjain.html?page=catharanthus using the user-submitted small RNAs/user-submitted transcript option of the psRNATarget tool with the default parameters mentioned above. Hits with only 0–2 nt mismatches were further scanned and functionally annotated with the C-mii tool. First, the targets were scanned at default parameters by BLASTN (e-value cut-off: 10), keeping the binding score ≤2, followed by prediction of secondary structures and MFEs (cut-off value: ≤−20 kcal/mol) for miRNA:target duplex hybridization at 37 °C using UNAFold. In silico target prediction followed the stringent criteria described previously (Yin et al., 2008) with slight modifications. Briefly, the conditions used for target prediction were (i) a maximum of two nucleotide mismatches between the miRNA and its target and (ii) no mismatch between the miRNA and its target transcript at the cleavage site, located at nucleotide positions 10 and 11. A miRNA:target pair without any mismatch was scored as 0, while a G:U wobble and a non-G:U mismatch were assigned 0.5 and 1.0 points, respectively. Targets were also functionally annotated by BLASTX (e-value 1e−10) against the UniProtKB/Swiss-Prot and UniProtKB/TrEMBL databases for plants. Gene ontology (GO) annotation for known and novel cro-miRNA targets was done using the inbuilt target annotation module of the C-mii software and the Blast2GO v2.7.1 suite, respectively.

3. Results

3.1. Sequencing and analysis of small RNAs in C. roseus

To identify miRNAs in the medicinally important plant C. roseus (cv. Nirmal), a small RNA library from leaf tissue was constructed and sequenced on the Illumina high-throughput platform. A total of 75.52 million reads were generated with a mean read length of 50 bases, of which 75.22 million (99.6%) reads were of high quality. High-throughput sequencing yielded a total of 3851.4 Mb of data, which included 3817.50 Mb (99.12%) of high quality bases. The raw reads of the C. roseus leaf library were submitted to the SRA database of NCBI under the accession number SRX528793 and BioSample ID SAMN02732194. After removal of the 3′ TruSeq adapter, a total of 48,279,056 reads with a subset of 4,768,340 non-redundant distinct reads were obtained. Poor quality sequences and sequences smaller than 16 nt or larger than 30 nt were discarded, leaving a total of 17,197,286 clean reads with 3,734,131 non-redundant reads. The size distribution analysis of redundant small RNA sequences showed that C. roseus small RNAs were dominated by 24 nt sequences (18.50%), followed by 21 nt sequences (9.43%), whereas analysis of the size distribution of unique sRNA sequences revealed that 24 nt sequences (41.47%) were most dominant, followed by 23 nt (14.69%), 21 nt (8.54%) and 22 nt (6.22%) sequences (Fig. 1a). Among the total 48,279,056 reads, around 25,906,982 sequences belonged to 892 different categories of non-coding RNAs (Supplementary Table 2); the fractions of the most abundant ncRNAs, including rRNA (60.49%), tRNA (25.71%), snRNA (1.75%) and snoRNA (6.07%), are depicted in Fig. 1b.

3.2. Identification of conserved miRNAs in C. roseus

Bioinformatics-based methods such as homology- and structure-similarity-based searches using the available mature miRNA sequences have made the prediction of miRNAs possible in various organisms, including plants with complex genomes. To identify C. roseus miRNAs, clean reads >16 nt and <25 nt in length were subjected to a homology search against eudicot miRNAs in the miRBase v19.0 database. A total of 266 eudicot miRNAs belonging to different plant species aligned to our small RNA reads, corresponding to 107 distinct C. roseus miRNA sequences (Supplementary Table 3). Many miRNAs such as miR156, miR160, miR162 and miR172 appear to be universally expressed in diverse angiosperms, and a few, such as miR156 and miR319, have also been reported in bryophytes, gymnosperms and lycopods (Axtell and Bowman, 2008). However, a vast number of miRNAs have been identified in only a few or even a single species, as exemplified by miR415, miR416, miR417 and miR418, which have been reported only in Arabidopsis and rice (Jones-Rhoades et al., 2006; Wang et al., 2004). Likewise, miR1885 and miR5718 were reported only in Brassica rapa (Yu et al., 2012). Because a few categories of miRNAs are widespread whereas others are limited to certain plant species, miRNAs are classified as ‘conserved’ or ‘less-conserved’. The 107 unique C. roseus small RNA sequences were further filtered, and only those sequences that matched known plant miRNAs with 0–1 mismatches were considered conserved or less-conserved miRNAs. Finally, 81 sequences were identified as conserved or less-conserved cro-miRNAs belonging to 35 different families on the basis of their occurrence in other plant species (Table 1). Within the cro-miRNA pool, significant differences were observed in the number of members of each conserved miRNA family. The miR171 family was the largest with seven members, followed by miR156 and miR396 with six members each. Of the remaining 32 families, 14 comprised 2–5 members, while 18 miRNA families were represented by only a single member.

Considerable variation in the number of reads was detected among miRNA families, reflecting their differential abundance. Of the 35 families, a member of the miR166 family, cro-miR166d, was the most abundant with the highest number of reads (105,933), followed by cro-miR159a and cro-miR396a, which were represented by 58,665 and 20,187 reads, respectively. Members of many miRNA families were represented by very few reads or even a single read, indicating their low abundance. Different members of the same family also varied in their number of reads and hence their abundance, which suggests functional divergence within the family. Moreover, the existence of a dominant member in a family suggests its involvement in regulatory role(s) at the particular developmental stage of the tissue used for small RNA sequencing, while low-abundance members of the same family might function at other tissue-specific developmental stages or under other physiological conditions. For example, the frequency of miR159 family members varied from a single read for cro-miR159h-3p to 58,665 reads for cro-miR159a. Comparing the abundance of different members of a miRNA family could provide valuable information about their regulatory activities under particular growth conditions and at particular developmental stages. Most of the cro-miRNAs identified in our study have also been reported across various plant species, suggesting conservation of their roles.
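The 0–1 mismatch filter used to call conserved miRNAs can be sketched in Python as follows. This is a simplified illustration rather than the pipeline used in the study: it compares equal-length sequences position by position, whereas a real homology search against miRBase would also handle length differences. The function names and the toy sequences are hypothetical.

```python
def count_mismatches(read, reference):
    """Return the number of mismatched positions, or None if lengths differ."""
    if len(read) != len(reference):
        return None
    return sum(a != b for a, b in zip(read, reference))


def find_conserved(reads, known_mirnas, max_mismatches=1):
    """Map each read to the first known miRNA it matches within the limit."""
    hits = {}
    for read in reads:
        for name, reference in known_mirnas.items():
            distance = count_mismatches(read, reference)
            if distance is not None and distance <= max_mismatches:
                hits[read] = (name, distance)
                break
    return hits


# Toy data: hypothetical sequences, not taken from miRBase or the study.
known = {"toy-miR-A": "UGACAGAAGAGAGUGAGCAC"}
reads = [
    "UGACAGAAGAGAGUGAGCAC",  # identical to the reference
    "UGACAGAAGAGAGUGAGCAU",  # one mismatch at the last position
    "UGACAGAAGAGAGUGAGGAU",  # two mismatches, rejected
]
hits = find_conserved(reads, known)
```

Reads that pass the filter would then be grouped by family name to obtain member counts of the kind reported in Table 1.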


Table 2a Characteristics of predicted precursors of C. roseus miRNAs.

| C. roseus miRNA | miRNA family | MFE value (−kcal/mol) | Precursor length (nt) | Strand | G + C content (%) | MFEI value (−kcal/mol) | Accession/contig |
|---|---|---|---|---|---|---|---|
| Cro-miR157a | 157 | 42.7 | 106 | + | 42.45 | 0.94 | GACD01079399.1 |
| Cro-miR160a | 160 | 38.3 | 81 | + | 53.08 | 0.89 | Cr_TC46798 |
| Cro-miR162a | 162 | 30.1 | 92 | − | 42.39 | 0.77 | GACD01063067.1 |
| Cro-miR164a | 164 | 34.6 | 100 | − | 48.0 | 0.72 | GACD01081084.1 |
| Cro-miR166d | 166 | 34.6 | 110 | − | 40.0 | 0.78 | GACD01084270.1 |
| Cro-miR167a | 167 | 29.2 | 98 | + | 39.79 | 0.74 | GACD01057081.1 |
| Cro-miR168a | 168 | 46.2 | 135 | − | 49.62 | 0.68 | GACD01073538.1 |
| Cro-miR169a | 169 | 44.0 | 129 | + | 48.06 | 0.70 | FD424574.1 |
| Cro-miR169h | 169 | 33.3 | 96 | + | 47.91 | 0.72 | Cr_TC43569 |
| Cro-miR171b | 171 | 36.6 | 80 | + | 42.5 | 1.07 | Cr_TC42522 |
| Cro-miR172a | 172 | 41.1 | 100 | − | 30.0 | 1.37 | GACD01064541.1 |
| Cro-miR319a | 319 | 67.1 | 170 | + | 46.47 | 0.84 | GACD01009384.1 |
| Cro-miR396a | 396 | 48.3 | 112 | − | 47.32 | 0.91 | GACD01067162.1 |
| Cro-miR398a | 398 | 38.6 | 104 | + | 38.46 | 0.96 | GACD01003091.1 |
| Cro-miR408 | 408 | 32.7 | 102 | + | 48.03 | 0.66 | GACD01074202.1 |
| Cro-miR828a | 828 | 36.1 | 96 | − | 40.62 | 0.92 | GACD01042832.1 |

Next, we attempted to identify the precursors (pre-miRNAs) of conserved cro-miRNAs and to generate their stem-loop hairpin structures. Because the C. roseus genome sequence is not available, we used the EST and TSA datasets of NCBI and recently released transcriptome data (http://nipgr.res.in/mjain.html?page=catharanthus) to search for miRNA precursors. Transcriptome sequences have been successfully employed to identify miRNAs and their precursors in plant species (Li et al., 2013; Wan et al., 2012). Among the 81 conserved cro-miRNAs analyzed, precursors could be identified for 16 miRNAs, and their stem-loop hairpin structures were generated using the C-mii tool. Of these 16 cro-miRNAs, 9 (cro-miR157a, cro-miR160a, cro-miR164a, cro-miR167a, cro-miR168a, cro-miR169a, cro-miR169h, cro-miR396a and cro-miR828a) were located on the 5′ arm and 7 (cro-miR162a, cro-miR166d, cro-miR171b, cro-miR172a, cro-miR319a, cro-miR398a and cro-miR408) on the 3′ arm of their precursors (Fig. 2a–p). The size of the pre-miRNA sequences ranged between 80 and 170 nucleotides, and the minimal folding free energy (MFE) ranged from −29.2 to −67.1 kcal/mol. The minimal folding free energy index (MFEI) values of the miRNA precursors ranged from −0.66 to −1.37 kcal/mol with an average of −0.85 kcal/mol, and the average G + C content of the sixteen cro-miRNA precursors was 44.04% (Table 2a).
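The relation between MFE, precursor length, G + C content and MFEI can be made explicit with a short Python sketch. It assumes the commonly used definitions AMFE = MFE / length × 100 and MFEI = AMFE / (G + C)%, which reproduce the MFEI column of Table 2a to within rounding; the function names are illustrative only.

```python
def amfe(mfe_kcal_per_mol, length_nt):
    """Adjusted MFE: folding free energy per 100 nucleotides."""
    return mfe_kcal_per_mol / length_nt * 100.0


def mfei(mfe_kcal_per_mol, length_nt, gc_percent):
    """Minimal folding free energy index: AMFE divided by G + C content (%)."""
    return amfe(mfe_kcal_per_mol, length_nt) / gc_percent


# Rows taken from Table 2a: (MFE in kcal/mol, precursor length, G + C %).
precursors = {
    "Cro-miR157a": (-42.7, 106, 42.45),
    "Cro-miR160a": (-38.3, 81, 53.08),
    "Cro-miR172a": (-41.1, 100, 30.0),
}
for name, (mfe, length, gc) in precursors.items():
    print(f"{name}: MFEI = {mfei(mfe, length, gc):.3f}")
```

A precursor with a strongly negative MFE relative to its length and G + C content therefore receives a more negative MFEI, which is the property used to separate miRNA hairpins from other ncRNAs.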

3.3. C. roseus-specific novel miRNAs

To predict novel C. roseus-specific miRNAs, the reads that did not align to miRBase were analyzed with the Mireap software. Mireap identifies novel miRNAs on the basis of alignment, secondary structure, free energy and location on the precursor arm. Since the genome sequence of C. roseus is not available in public databases, the C. roseus transcriptome shotgun assemblies (TSA), expressed sequence tags (EST) and nucleotide collection (nr) datasets of NCBI, as well as the transcriptome available at http://nipgr.res.in/mjain.html?page=catharanthus, were searched to identify novel C. roseus miRNAs and their precursors. No candidate satisfying the empirical parameters used to predict miRNA precursors (Meyers et al., 2008; M. Wang et al., 2012) and having a considerable MFEI value according to Zhang et al. (2006) could be identified. Therefore, the reference genome sequence of a close asterid, S. lycopersicum, was used for the prediction of novel cro-miRNAs and their precursor-like sequences. Stem-loop hairpins were retained only when (a) the mature miRNA-like reads mapped to the arm region of the precursor and (b) the free energy of the secondary structure calculated by RNAfold was lower than −18 kcal/mol. A total of 29 putative novel miRNA-like candidates of 18–26 nt, with no match to any previously known plant miRNA, were identified by Mireap (Supplementary Table 4). We further refined the Mireap predictions on the basis of the minimal folding free energy index (MFEI), which is a major criterion for distinguishing miRNAs from other types of ncRNAs (Zhang et al., 2006). All 29 candidates suggested by Mireap were further analyzed using the online tool psRobot (Wu et al., 2012), and 7 non-protein-coding precursor candidates that aligned with S. lycopersicum genomic loci and formed a characteristic hairpin structure with considerable MFEI values were obtained and termed C. roseus-specific novel miRNAs (Fig. 2q–w). The lengths of the mature novel cro-miRNAs and of their precursors ranged between 20 and 22 nt and between 70 and 82 nt, respectively. The MFEI values for novel cro-miRNAs ranged from −0.67 to −0.98 kcal/mol with an average of −0.79 kcal/mol (Table 2b).

3.4. Position-specific nucleotide dominance of C. roseus miRNAs

Position-specific base analysis was carried out for the 81 known and 7 novel mature cro-miRNAs identified. Uracil was the dominant base (65.9%) at the first nucleotide position from the 5′ end of the mature miRNA sequences. The 19th and 22nd nucleotide positions were dominated by cytosine, with an abundance of 48.86% and 50%, respectively. Guanine (40.91%) and adenine (37.5%) were the preferred bases at positions 8 and 10, respectively. Overall, the nucleotide composition of C. roseus miRNAs was 22.42% adenine, 24.93% cytosine, 23.63% guanine and 29.0% uracil (Fig. 3).
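A position-specific base analysis of this kind can be sketched in Python as shown below. The sketch uses only the seven novel mature sequences listed in Table 2b as input, so its percentages differ from the values reported above for all 88 cro-miRNAs; the function name is illustrative.

```python
from collections import Counter

# Mature novel cro-miRNA sequences as listed in Table 2b.
NOVEL_MIRNAS = [
    "GUUUUGAGCAAUAGAUGUCUUU",
    "CUUCAAACCAAAAGGCAUUACU",
    "GAAGGGGUAGAGAAAAUGCC",
    "CAACCUCAACCUCGGACCUUC",
    "AUCCUGGAUAUGUAACUGCUC",
    "CUCCUUAGCGGAUUUCGACUUC",
    "UGAACAAUGAAAAUAGAUGGCC",
]


def position_base_percentages(sequences):
    """Return one dict per position (5' to 3') with the percentage of A, C, G, U.

    Sequences shorter than a given position are ignored for that position.
    """
    longest = max(len(seq) for seq in sequences)
    table = []
    for i in range(longest):
        bases = [seq[i] for seq in sequences if len(seq) > i]
        counts = Counter(bases)
        table.append({b: 100.0 * counts[b] / len(bases) for b in "ACGU"})
    return table


percentages = position_base_percentages(NOVEL_MIRNAS)
for position, row in enumerate(percentages, start=1):
    dominant = max(row, key=row.get)
    print(position, dominant, round(row[dominant], 1))
```

Because shorter sequences drop out of the later positions, the percentages at positions 21 and 22 are computed over fewer sequences than those at position 1.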

Table 2b C. roseus novel miRNAs and their characteristics.

| Chromosome | Novel miRNA | Sequence | miRNA length (nt) | Read count | Precursor position | Strand | Precursor length | %GC | MFEI (−kcal/mol) |
|---|---|---|---|---|---|---|---|---|---|
| SL2.40ch02 | Cro-miRnovel01 | GUUUUGAGCAAUAGAUGUCUUU | 22 | 5 | 4,976,933–4,977,014 | + | 82 | 32.9 | 0.94 |
| SL2.40ch02 | Cro-miRnovel02 | CUUCAAACCAAAAGGCAUUACU | 22 | 4 | 9,779,166–9,779,237 | + | 72 | 38.9 | 0.77 |
| SL2.40ch03 | Cro-miRnovel03 | GAAGGGGUAGAGAAAAUGCC | 20 | 101 | 9,589,682–9,589,751 | + | 70 | 34.3 | 0.67 |
| SL2.40ch05 | Cro-miRnovel04 | CAACCUCAACCUCGGACCUUC | 21 | 17 | 3,809,820–3,809,890 | + | 71 | 42.3 | 0.98 |
| SL2.40ch09 | Cro-miRnovel05 | AUCCUGGAUAUGUAACUGCUC | 21 | 6 | 1,170,935–1,171,015 | + | 81 | 35.8 | 0.71 |
| SL2.40ch11 | Cro-miRnovel06 | CUCCUUAGCGGAUUUCGACUUC | 22 | 3 | 15,203,670–15,203,741 | + | 72 | 50.0 | 0.67 |
| SL2.40ch11 | Cro-miRnovel07 | UGAACAAUGAAAAUAGAUGGCC | 22 | 6 | 35,991,286–35,991,367 | + | 82 | 41.5 | 0.84 |

MFE: minimal folding free energy; MFEI: minimal folding free energy index.


Fig. 3. Position-specific nucleotide dominance in C. roseus mature miRNAs. Uracil dominated the first nucleotide position towards the 5′ end of cro-miRNAs, and was found to be the most abundant base in terms of overall percentage followed by cytosine, guanine and adenine in the identified cro-miRNAs. [Plot: base percentage (%) of A, C, G and U against position of nucleotides (1–23 and overall).]

3.5. Expression analysis of C. roseus miRNAs

To determine the expression of conserved and novel cro-miRNAs in different vegetative tissues (leaf, stem and root) of C. roseus, stem-loop RT-PCR and quantitative real-time PCR (stem-loop qRT-PCR) were carried out. Seven conserved miRNAs were randomly chosen, while all seven novel cro-miRNAs were selected for the expression analysis. Stem-loop RT-PCR for all the tested cro-miRNAs yielded a band of the expected size, indicating their expression in stem, leaf and root tissues of C. roseus (Fig. 4a). For relative quantification of cro-miRNA expression by stem-loop qRT-PCR, 5.8S rRNA was used as the endogenous reference. The relative expression of conserved and novel cro-miRNAs in different vegetative tissues is shown in Fig. 4b. Among the conserved miRNAs, the expression of cro-miR157a, cro-miR167a, cro-miR169a, cro-miR397a and cro-miR398a was maximal in leaf, while cro-miR172a and cro-miR319a showed abundant expression in the stem. Among the novel C. roseus-specific miRNAs, cro-miRnovel01 and cro-miRnovel05 exhibited the highest expression in leaf tissue, moderate expression in the stem and the lowest expression in the root. Overall, differential relative expression was observed for all the analyzed cro-miRNAs in the examined vegetative tissues.

Fig. 4. Expression analysis of miRNAs in different vegetative tissues (stem, leaf and root) of C. roseus. Expression of randomly selected conserved cro-miRNAs (miR157a, miR167a, miR169a, miR172a, miR319a, miR397a, and miR398a) and all the 7 novel cro-miRNAs was analyzed by (a) stem-loop RT-PCR and (b) stem-loop qRT-PCR (quantitative real-time PCR). Negative control reactions, in which no template cDNA was used, were also included. For quantitative stem-loop real-time PCR, 5.8S rRNA was used as the endogenous control for normalization and relative quantification. The analysis was done for three biological samples with technical replicates. Error bars represent standard deviation from the mean. [Panels: (a) gels for (i) stem, (ii) leaf and (iii) root with 75–500 bp size markers; (b) relative expression level in stem, leaf and root.]

3.6. Prediction of targets of conserved and novel C. roseus-specific miRNAs

MicroRNAs are important regulators of gene expression at the post-transcriptional level, binding to target mRNAs and subsequently promoting translational repression or mRNA degradation. Because of this complementarity, the targets of plant miRNAs can be identified using tools such as psRNATarget, Target-align and TAPIR. To assign functional roles in diverse biological processes to the 81 conserved and 7 novel cro-miRNAs, their target genes were predicted with the online web server psRNATarget. Several plant genes were identified as putative targets of the C. roseus miRNAs, many of which belonged to multiple gene families with diverse biological functions. A majority of the cro-miRNAs were found to target more than one type of transcript. Transcription factors such as squamosa promoter-binding protein (target of miR156/miR157), GAMyb-like1 and R2R3-MYB (regulated by miR159/miR828/miR858), auxin response factors (targeted by miR160), NAM, CUP-SHAPED COTYLEDON1 and NAC1 (regulated by miR164), HD-ZIP protein (controlled by miR165/miR166), CCAAT-binding transcription factor (target of miR169) and AP2-like transcription factor (regulated by miR172) were major targets of cro-miRNAs. Besides transcription factors, other targets included F-box protein (miR394), ATP-sulfurylase (miR395), laccase (miR397) and superoxide dismutase (miR398), which are related to metabolism, developmental processes, cell cycle regulation or signal transduction. Targets of a few C. roseus-specific novel miRNAs could also be predicted, including alkaline alpha-galactosidase, seed imbibition protein and sialyltransferase-like protein (Supplementary Table 5). Putative targets for all 88 cro-miRNAs were also searched in the ESTs and high-throughput transcriptome sequence data available for C. roseus. Of these, targets could be predicted for only 12 cro-miRNAs, comprising 11 conserved miRNAs (cro-miR156c, cro-miR159a, cro-miR160a, cro-miR170, cro-miR172a, cro-miR394, cro-miR395a, cro-miR397a, cro-miR398b, cro-miR530a, cro-miR828a) and 1 novel C. roseus-specific miRNA (cro-miRnovel01), the reason being the stringent parameters used to reduce false predictions. The C. roseus targets and their base pair alignments with the respective cro-miRNAs are given in Tables 3a and 3b. The predicted targets in C. roseus included transcription factors (squamosa promoter-binding-like protein, GAMyb, ARF, scarecrow-like protein, AP2-domain factor, BEL1-like homeodomain protein, anthocyanin regulatory C1 protein) and others such as F-box protein, sulfate transporter, laccase, copper-binding protein and NAD(P)H-quinone oxidoreductase. Most targets belonged to plant transcription factor categories that have also been predicted as targets of conserved miRNAs in many other plants, which underlines a conserved role of miRNAs in plant developmental processes. For functional annotation, gene ontology (GO) analysis was done for the predicted C. roseus targets, which indicated their involvement in regulating diverse physiological processes (Supplementary Table 6).

Table 3a List of targets predicted for conserved and novel miRNAs in C. roseus.

| C. roseus miRNA | Predicted target | Target accession/contig | Binding position | Strand | Score | MFE (kcal/mol) |
|---|---|---|---|---|---|---|
| Cro-miR156c | Squamosa promoter binding-like protein | GACD01025143.1 | 1770–1790 | + | 2.0 | −29.8 |
| Cro-miR156c | Squamosa promoter binding-like protein | Cr_TC10851 | 2609–2629 | + | 0.0 | −35.5 |
| Cro-miR156c | Squamosa promoter binding-like protein | Cr_TC03339 | 1170–1190 | + | 0.0 | −35.5 |
| Cro-miR156c | Squamosa promoter binding-like protein | Cr_TC11023 | 1401–1421 | + | 0.5 | −34.8 |
| Cro-miR159a | GAMYB transcription factor | GACD01043092.1 | 1092–1112 | − | 1.5 | −31.6 |
| Cro-miR160a | Auxin responsive factor | GACD01048223.1 | 1582–1602 | + | 1.0 | −41.8 |
| Cro-miR160a | Auxin responsive factor | Cr_TC10614 | 1616–1636 | + | 1.0 | −41.9 |
| Cro-miR170 | Scarecrow-like protein | GACD01047734.1 | 1474–1494 | + | 1.0 | −31.3 |
| Cro-miR172a | Ethylene responsive transcription factor | Cr_TC06547 | 2404–2424 | + | 1.5 | −31.1 |
| Cro-miR172a | Floral homeotic protein APETALA2 | Cr_TC33737 | 1429–1449 | + | 1.5 | −31.1 |
| Cro-miR394 | F-box protein | Cr_TC14940 | 1289–1308 | + | 1.0 | −30.8 |
| Cro-miR395a | Low affinity sulfate transporter | GACD01043992.1 | 133–153 | + | 1.5 | −31.5 |
| Cro-miR395a | Low affinity sulfate transporter | Cr_TC32885 | 245–265 | + | 1.5 | −31.5 |
| Cro-miR397a | Laccase | GACD01039352.1 | 445–465 | + | 1.5 | −28.6 |
| Cro-miR397a | Laccase | Cr_TC16720 | 553–573 | + | 1.5 | −29.1 |
| Cro-miR398b | Blue copper-binding protein | FD416130.1 | 71–91 | + | 1.5 | −35.7 |
| Cro-miR530a | BEL1-like homeodomain protein | Cr_TC48843 | 1813–1833 | + | 2.0 | −33.4 |
| Cro-miR828a | Anthocyanin regulatory C1 protein | Cr_TC11314 | 484–505 | + | 1.0 | −29.2 |
| Cro-miRnovel01 | NAD(P)H-quinone oxidoreductase | GACD01001435.1 | 1065–1086 | + | 0.0 | −38.6 |

4. Discussion

Non-coding RNAs (miRNAs and siRNAs) have emerged as important regulators of gene expression since their discovery in C. elegans (Lee et al., 1993). Over the years, it has been well demonstrated that miRNAs play critical roles in regulating various biological processes in plants at the post-transcriptional level by degrading target mRNAs or repressing their translation (Bartel, 2004; Chen, 2009). Successful cloning, the ability to sequence small RNAs and the development of small RNA prediction tools led to the discovery of miRNAs and the deciphering of their roles in plants and animals (Lee and Ambros, 2001; Sunkar and Zhu, 2004). Plants have a complex genetic network to regulate development and secondary metabolism; therefore, identification of small RNAs could provide deep insight into the underlying regulatory mechanisms. Recently developed high-throughput next-generation sequencing technologies have facilitated the identification of conserved and species-specific miRNAs on a large scale.
Here, we report the identification of conserved and novel miRNAs from leaves of C. roseus, which synthesize and accumulate several secondary compounds, including some with anticancer properties. Owing to the pharmaceutical importance of secondary metabolites, attempts have been made to predict and analyze miRNAs in several medicinal plants with complex secondary metabolic pathways (Pérez-Quintero et al., 2012; Qiu et al., 2009; Unver et al., 2010). In our study, we carried out deep sequencing of small RNAs from C. roseus leaf tissue and identified 81 conserved miRNAs belonging to 35 families and 7 C. roseus-specific miRNAs. The small RNA population of C. roseus was dominated by 24 nt long sequences, similar to the length distribution observed for several other plant species like Pinus contorta

and Oryza sativa (Morin et al., 2008), Medicago truncatula (Szittya et al., 2008) and Arachis hypogaea (Zhao et al., 2010). A number of conserved cro-miRNAs showed high sequencing frequency, while the majority of novel cro-miRNAs were represented by a lower number of reads in the dataset, consistent with previous reports (Sunkar et al., 2008; Zhao et al., 2010). The number of reads by which each miRNA is represented in the library is indicative of its relative abundance. Similar to previous reports (Paul et al., 2014; Szittya et al., 2008; Zhao et al., 2010), certain C. roseus miRNAs belonging to families such as miR166 and miR157 were abundant, and members of the miR159, miR396 and miR394 families were also detected in very large numbers, indicating their abundant expression and roles in the tissue. miR169 was found to be the most frequently sequenced miRNA in model plants like rice and wheat (Sunkar et al., 2008; Yao et al., 2007), contrary to our findings in C. roseus, where it exhibited low abundance. Uracil constituted nearly 65.9% of the bases at the 5′ end of mature cro-miRNAs, while cytosine was frequent at the 19th nucleotide position, consistent with the mature miRNA features reported in other plants (Zhang et al., 2008). In addition, cytosine was also the most frequent base at the 22nd nucleotide position. Mature miRNAs with uridine at the 5′ position are preferentially recruited by Argonaute 1 (AGO1), miRNAs with a 5′ terminal adenosine are preferred by AGO2 and AGO4, whereas AGO5 mostly recruits miRNAs that start with a 5′ cytosine (Mi et al., 2008). Uracil dominance at the 5′ terminus of mature cro-miRNAs indicates their preferential recruitment by AGO1 for incorporation into the RNA-induced silencing complex (RISC), leading to either target mRNA cleavage or repression of translation.
Dicer-like 1 (DCL1) preferentially recognizes duplex pre-miRNAs with a 5′ phosphate and a 3′ overhang and cleaves both strands of the pre-miRNA with its intrinsic RNase III activity, leaving an imperfect miRNA:miRNA* duplex with a 5′ phosphate and an overhang of approximately 2 nt at the 3′ end (Bartel, 2004). We observed the characteristic DCL1-cleaved 2 nt 3′ overhang in all the predicted stem-loop hairpin structures of conserved cro-miRNA precursors. The complementary miRNA strand (miRNA*) that pairs with the miRNA in the DCL1 product is usually considered to be eliminated through a degradation pathway after the mature miRNA is formed, but it can also be detected by deep sequencing. Generally, the mature miRNA of the miRNA:miRNA* duplex is incorporated into RISC owing to uracil at its first position, while the miRNA* is usually degraded because it lacks uracil as the first base (Baumberger and Baulcombe, 2005). Recent studies support a role for plant miRNA* sequences in important cellular functions, since their expression is enhanced during certain stages or physiological conditions (Hsieh et al., 2009; Li et al., 2010). Deep sequencing of C. roseus small RNAs


led to the identification of miRNA* sequences for eight miRNA families (miR160, miR164, miR168, miR169, miR172, miR396, miR396 and miR477) suggesting their active biological role in leaf tissue. In order to observe expression of identified cro-miRNAs, stem-loop RT-PCR and qRT-PCR were performed for 7 novel and 7 randomly selected conserved cro-miRNAs. All of them expressed not only in leaf but also in stem and root, while, majority of them showed relatively high level in

leaf tissue. Moderate to lower expression seen in stem and root reflects that, besides leaf, they might play regulatory roles in these tissues as well. To gain an insight into the functional relevance of the identified cromiRNAs, their targets were predicted in different plants including C. roseus. Several putative targets of cro-miRNAs predicted in different plant species include several transcription factors (controlling growth

Table 3b C. roseus miRNAs and their complementary binding sites within their target mRNAs.

C. roseus miRNAs and their predicted target IDs

Base pair alignment of cro–miRNAs with their respective target

cro–miR156c Target GACD01025143.1 (Squamosa promoter binding–like protein) cro–miR156c Target Cr_TC10851 (Squamosa promoter binding–like protein) cro–miR156c TargetCr_TC03339 (Squamosa promoter binding–like protein) cro–miR156c TargetCr_TC11023 (Squamosa promoter binding–like protein) cro–miR159a Target GACD01043092.1 (GAMYB transcription factor) cro–miR160a Target GACD01048223.1 (Auxin responsive factor) cro–miR160a Target Cr_TC10614 (Auxin responsive factor) cro–miR170 Target GACD01047734.1 (Scarecrow–like protein) cro–miR172a Target Cr_TC06547 (Ethylene responsive transcription factor) cro–miR172a Target Cr_TC33737 (Floral homeotic protein APETALA2) cro–miR394 Target Cr_TC14940 (F–box protein) cro–miR395a Target GACD01043992.1 (Low affinity sulfate transporter) cro–miR395a TargetCr_TC32885 (Low affinity sulfate transporter) cro–miR397a Target GACD01039352.1 (Laccase)

Please cite this article as: Prakash, P., et al., Identification of conserved and novel microRNAs in Catharanthus roseus by deep sequencing and computational prediction of their potential targets..., Gene (2014), http://dx.doi.org/10.1016/j.gene.2014.10.046

P. Prakash et al. / Gene xxx (2014) xxx–xxx

13

Table 3b (continued)

cro-miR397a Target Cr_TC16720 (Laccase) cro-miR398b Target FD416130.1 (Blue copper-binding protein) cro-miR530a Target Cr_TC48843 (BEL1-like homeodomain protein) cro-miR828a Target Cr_TC11314 (Anthocyanin regulatory C1 protein) cro-miRnovel01 Target GACD01001435.1 (NAD(P)H-quinone oxidoreductase) Watson–Crick and G:U wobble pairings are indicated by vertical lines (|) and colons (:), respectively, while mismatches are represented by (*) symbol.

and development) and genes for various other physiological processes, such as ATP sulfurylase (related to sulfur assimilation and reduction), F-box protein and ubiquitin-conjugating enzyme (involved in protein degradation), and superoxide dismutases (SODs), which counteract oxidative stress. In addition to targets predicted in other plants, target genes for 12 cro-miRNAs could also be predicted in C. roseus itself; these were associated with transcriptional regulation, metabolic processes, signal transduction and secondary metabolism. Among conserved miRNAs, cro-miR156c was predicted to target squamosa promoter binding protein-like (SPL) family transcription factors. A subset of SBP-domain-containing SPL proteins is regulated by miR156/miR157 family members in Arabidopsis, such as SPL3, which regulates flowering (Gandikota et al., 2007), and SPL9, a negative regulator of anthocyanin biosynthesis (Gou et al., 2011). A GAMYB family transcription factor was one of the predicted targets of cro-miR159c in our study. In Arabidopsis, miR159a and miR159b negatively regulate the GAMYB-like genes MYB33 and MYB65, which otherwise inhibit plant growth (Alonso-Peral et al., 2010). GAMYB factors have also been shown to be involved in gibberellin (GA) signaling, and miR159-mediated regulation of these factors affects anther development and flowering time in Arabidopsis (Achard et al., 2004). Previous studies showed that MYBs are targeted by different miRNAs, such as miR159, miR828 and miR858. The miR159-targeted MYBs are involved in embryogenesis and in anther and pollen development; miR828-targeted MYBs were found to control primary and secondary metabolism-related processes leading to anthocyanin biosynthesis; and miR858 targets MYBs involved in diverse biological processes related to cell wall formation, lignification, anthocyanin biosynthesis and stress responses (Xia et al., 2012). Auxin response factors (ARFs), which regulate plant growth and development by mediating auxin signaling, were predicted as targets of cro-miR160a. In A. thaliana, miR160 targets ARF17, which acts as a regulator of early auxin response genes (Mallory et al., 2005). ARF10 is negatively regulated by miR160 and plays an important role in seed germination and post-embryonic development (Liu et al., 2007). SCARECROW-like proteins regulate radial patterning in roots as well as gibberellin- and light-mediated signaling in plants. They were predicted as targets of miR170 (Rhoades et al., 2002) and were also predicted as targets of cro-miR170a in our study. The floral homeotic gene APETALA2 plays a crucial role in flower development (Yant et al., 2010) and is predicted to be targeted by miR172a of C. roseus. miR530a appears to regulate the expression of a BEL1-like homeodomain protein, a class of proteins that regulates morphogenesis in Arabidopsis (Kumar et al., 2007). The biosynthesis of red and purple anthocyanin pigments is controlled by several regulatory genes; one such gene is C1, which encodes a transcription factor (Paz-Ares et al., 1987). Our results indicated that the C1 regulatory gene could be targeted by cro-miR828a. In C. roseus, miR395 was predicted to target a low affinity sulfate transporter protein. Sulfate assimilation in plants has been reported to be regulated by miR395, which targets three isoforms of ATP sulfurylase and a low affinity sulfate transporter, causing an increase in translocation of sulfate from roots to shoots during sulfate starvation (Kawashima et al., 2011). Laccase genes are predicted to be related to secondary metabolism and seem to be targeted by cro-miR397a in C. roseus. In Populus trichocarpa, overexpression of ptr-miR397a led to downregulation of 17 laccases, and reduced lignin content was observed in transgenic lines (Lu et al., 2013). One of the novel miRNAs identified in this report, cro-miRnovel01, was predicted to target NAD(P)H-quinone oxidoreductase, which carries out the reduction of quinones to quinols in plants (Sparla et al., 1996). Thus, miRNAs may play crucial roles in essential biological functions in C. roseus as well, and some miRNA candidates may regulate the secondary metabolism that produces pharmaceutically important compounds.

In summary, we report conserved, non-conserved and novel miRNAs from the medicinally important plant C. roseus for the first time by employing high-throughput sequencing. Prediction of known and new cro-miRNA targets and their functional annotation indicated their involvement in diverse processes such as transcriptional regulation, primary and secondary metabolic processes, development, and signal transduction.
The expression of these small RNAs in vegetative tissues provides evidence for their regulatory activities. Besides providing first-hand information regarding cro-miRNAs and their expected targets, our study advances the understanding of miRNA-mediated gene regulation and contributes to the miRNA sequence resources of medicinal plants. Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.gene.2014.10.046.

Conflict of interest

The authors declare that there is no conflict of interest.

Please cite this article as: Prakash, P., et al., Identification of conserved and novel microRNAs in Catharanthus roseus by deep sequencing and computational prediction of their potential targets..., Gene (2014), http://dx.doi.org/10.1016/j.gene.2014.10.046


Acknowledgments

This research work was financially supported by the Council of Scientific and Industrial Research, Govt. of India, under the MLP03 (XI-FYP) project of CSIR-CIMAP, Lucknow. Fellowships from the Department of Biotechnology (DBT) and the University Grants Commission, Govt. of India, are gratefully acknowledged. The authors are thankful to the Director, CSIR-CIMAP, for encouragement and for providing infrastructural facilities.

References Achard, P., Herr, A., Baulcombe, D.C., Harberd, N.P., 2004. Modulation of floral development by a gibberellin-regulated microRNA. Development 131, 3357–3365. Alonso-Peral, M.M., Li, J., Li, Y., Allen, R.S., Schnippenkoetter, W., Ohms, S., White, R.G., Millar, A.A., 2010. The microRNA159-regulated GAMYB-like genes inhibit growth and promote programmed cell death in Arabidopsis. Plant Physiol. 154, 757–771. Aukerman, M.J., Sakai, H., 2003. Regulation of flowering time and floral organ identity by a microRNA and its APETALA2-like target genes. Plant Cell 15, 2730–2741. Axtell, M.J., Bowman, J.L., 2008. Evolution of plant microRNAs and their targets. Trends Plant Sci. 13, 343–349. Bartel, D.P., 2004. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116, 281–297. Baumberger, N., Baulcombe, D.C., 2005. Arabidopsis ARGONAUTE1 is an RNA Slicer that selectively recruits microRNAs and short interfering RNAs. Proc. Natl. Acad. Sci. U. S. A. 102, 11928–11933. Bollman, K.M., Aukerman, M.J., Park, M.Y., Hunter, C., Berardini, T.Z., Poethig, R.S., 2003. HASTY, the Arabidopsis ortholog of exportin 5/MSN5, regulates phase change and morphogenesis. Development 130, 1493–1504. Brodersen, P., Sakvarelidze-Achard, L., Bruun-Rasmussen, M., Dunoyer, P., Yamamoto, Y.Y., Sieburth, L., Voinnet, O., 2008. Widespread translational inhibition by plant miRNAs and siRNAs. Science 320, 1185–1190. Chen, X., 2009. Small RNAs and their roles in plant development. Annu. Rev. Cell Dev. Biol. 25, 21–44. Colaiacovo, M., Subacchi, A., Bagnaresi, P., Lamontanara, A., Cattivelli, L., Faccioli, P., 2010. A computational-based update on microRNAs and their targets in barley (Hordeum vulgare L.). BMC Genomics 11, 595. Dai, X., Zhao, P.X., 2011. psRNATarget: a plant small RNA target analysis server. Nucleic Acids Res. 39, W155–W159. El-Sayed, M., Verpoorte, R., 2007. Catharanthus terpenoid indole alkaloids: biosynthesis and regulation. Phytochem. Rev. 6, 277–305. 
Fahlgren, N., Howell, M.D., Kasschau, K.D., Chapman, E.J., Sullivan, C.M., Cumbie, J.S., Givan, S.A., Law, T.F., Grant, S.R., Dangl, J.L., Carrington, J.C., 2007. High-throughput sequencing of Arabidopsis microRNAs: evidence for frequent birth and death of MIRNA genes. PLoS One 2, e219. Gandikota, M., Birkenbihl, R.P., Höhmann, S., Cardon, G.H., Saedler, H., Huijser, P., 2007. The miRNA156/157 recognition element in the 3′ UTR of the Arabidopsis SBP box gene SPL3 prevents early flowering by translational inhibition in seedlings. Plant J. 49, 683–693. Gébelin, V., Argout, X., Engchuan, W., Pitollat, B., Duan, C., Montoro, P., Leclercq, J., 2012. Identification of novel microRNAs in Hevea brasiliensis and computational prediction of their targets. BMC Plant Biol. 12, 18. Gou, J.Y., Felippes, F.F., Liu, C.J., Weigel, D., Wang, J.W., 2011. Negative regulation of anthocyanin biosynthesis in Arabidopsis by a miR156-targeted SPL transcription factor. Plant Cell 23, 1512–1522. Guo, L., Lu, Z., 2010. The fate of miRNA* strand through evolutionary analysis: implication for degradation as merely carrier strand or potential regulatory molecule? PLoS One 5, e11387. Hall, T.A., 1999. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucl. Acids. Symp. Ser. 41, 95–98. Hsieh, L.C., Lin, S.I., Shih, A.C., Chen, J.W., Lin, W.Y., Tseng, C.Y., Li, W.H., Chiou, T.J., 2009. Uncovering small RNA‐mediated responses to phosphate deficiency in Arabidopsis by deep sequencing. Plant Physiol. 151, 2120–2132. Jones-Rhoades, M.W., Bartel, D.P., Bartel, B., 2006. MicroRNAs and their regulatory roles in plants. Annu. Rev. Plant Biol. 57, 19–53. Katiyar, A., Smita, S., Chinnusamy, V., Pandey, D.M., Bansal, K., 2012. Identification of miRNAs in sorghum by using bioinformatics approach. Plant Signal. Behav. 7, 246–259. 
Kawashima, C.G., Matthewman, C.A., Huang, S., Lee, B.R., Yoshimoto, N., Koprivova, A., Rubio-Somoza, I., Todesco, M., Rathjen, T., Saito, K., Takahashi, H., Dalmay, T., Kopriva, S., 2011. Interplay of SLIM1 and miR395 in the regulation of sulfate assimilation in Arabidopsis. Plant J. 66, 863–876. Kim, J., Jung, J.H., Reyes, J.L., Kim, Y.S., Kim, S.Y., Chung, K.S., Kim, J.A., Lee, M., Lee, Y., Kim, V.N., Chua, N.H., Park, C.M., 2005. MicroRNA-directed cleavage of ATHB15 mRNA regulates vascular development in Arabidopsis inflorescence stems. Plant J. 42, 84–94. Kumar, R., Kushalappa, K., Godt, D., Pidkowich, M.S., Pastorelli, S., Hepworth, S.R., Haughn, G.W., 2007. The Arabidopsis BEL1-LIKE HOMEODOMAIN proteins SAW1 and SAW2 act redundantly to regulate KNOX expression spatially in leaf margins. Plant Cell 19, 2719–2735. Kurihara, Y., Watanabe, Y., 2004. Arabidopsis microRNA biogenesis through Dicer-like 1 protein functions. Proc. Natl. Acad. Sci. U. S. A. 101, 12753–12758. Lakhotia, N., Joshi, G., Bhardwaj, A.R., Katiyar-Agarwal, S., Agarwal, M., Jagannath, A., Goel, S., Kumar, A., 2014. Identification and characterization of miRNAome in root, stem, leaf and tuber developmental stages of potato (Solanum tuberosum L.) by high-throughput sequencing. BMC Plant Biol. 14, 6.

Lee, R.C., Ambros, V., 2001. An extensive class of small RNAs in Caenorhabditis elegans. Science 294, 862–864. Lee, R.C., Feinbaum, R.L., Ambros, V., 1993. The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell 75, 843–854. Li, H., Deng, Y., Wu, T., Subramanian, S., Yu, O., 2010. Mis‐expression of miR482, miR1512, and miR1515 increases soybean nodulation. Plant Physiol. 153, 1759–1770. Li, C., Zhu, Y., Guo, X., Sun, C., Luo, H., Song, J., Li, Y., Wang, L., Qian, J., Chen, S., 2013. Transcriptome analysis reveals ginsenosides biosynthetic genes, microRNAs and simple sequence repeats in Panax ginseng C. A. Meyer. BMC Genomics 14, 245. Liu, P.P., Montgomery, T.A., Fahlgren, N., Kasschau, K.D., Nonogaki, H., Carrington, J.C., 2007. Repression of auxin response factor10 by microRNA160 is critical for seed germination and post-germination stages. Plant J. 52, 133–146. Lu, S.F., Sun, Y.H., Shi, R., Clark, C., Li, L.G., Chiang, V.L., 2005. Novel and mechanical stress-responsive microRNAs in Populus trichocarpa that are absent from Arabidopsis. Plant Cell 17, 2186–2203. Lu, S., Li, Q., Wei, H., Chang, M.J., Tunlaya-Anukit, S., Kim, H., Liu, J., Song, J., Sun, Y.H., Yuan, L., Yeh, T.F., Peszlen, I., Ralph, J., Sederoff, R.R., Chiang, V.L., 2013. Ptr-miR397a is a negative regulator of laccase genes affecting lignin content in Populus trichocarpa. Proc. Natl. Acad. Sci. U. S. A. 110, 10848–10853. Lukasik, A., Pietrykowska, H., Paczek, L., Szweykowska-Kulinska, Z., Zielenkiewicz, P., 2013. High-throughput sequencing identification of novel and conserved miRNAs in the Brassica oleracea leaves. BMC Genomics 14, 801. Luo, X., Gao, Z., Shi, T., Cheng, Z., Zhang, Z., Ni, Z., 2013. Identification of miRNAs and their target genes in peach (Prunus persica L.) using high-throughput sequencing and degradome analysis. PLoS One 8, e79090. Mallory, A.C., Bartel, D.P., Bartel, B., 2005. 
MicroRNA-directed regulation of Arabidopsis auxin response factor17 is essential for proper development and modulates expression of early auxin response genes. Plant Cell 17, 1360–1375. Meyers, B.C., Axtell, M.J., Bartel, B., Bartel, D.P., Baulcombe, D., Bowman, J.L., Cao, X., Carrington, J.C., Chen, X., Green, P.J., Griffiths-Jones, S., Jacobsen, S.E., Mallory, A.C., Martienssen, R.A., Poethig, R.S., Qi, Y., Vaucheret, H., Voinnet, O., Watanabe, Y., Weigel, D., Zhu, J.K., 2008. Criteria for annotation of plant MicroRNAs. Plant Cell 20, 3186–3190. Mi, S., Cai, T., Hu, Y., Chen, Y., Hodges, E., Ni, F., Wu, L., Li, S., Zhou, H., Long, C., Chen, S., Hannon, G.J., Qi, Y., 2008. Sorting of small RNAs into Arabidopsis argonaute complexes is directed by the 5′ terminal nucleotide. Cell 133, 116–127. Millar, A., Gubler, F., 2005. The Arabidopsis GAMYB-like genes, MYB33 and MYB65, are microRNA-regulated genes that redundantly facilitate anther development. Plant Cell 17, 705–721. Morin, R.D., Aksay, G., Dolgosheina, E., Ebhardt, H.A., Magrini, V., Mardis, E.R., Sahinalp, S.C., Unrau, P.J., 2008. Comparative analysis of the small RNA transcriptomes of Pinus contorta and Oryza sativa. Genome Res. 18, 571–584. Murata, J., De Luca, V., 2005. Localization of tabersonine 16-hydroxylase and 16-OH tabersonine-16-O-methyltransferase to leaf epidermal cells defines them as a major site of precursor biosynthesis in the vindoline pathway in Catharanthus roseus. Plant J. 44, 581–594. Mustafa, N.R., Verpoorte, R., 2007. Phenolic compounds in Catharanthus roseus. Phytochem. Rev. 6, 243–258. Navarro, L., Dunoyer, P., Jay, F., Arnold, B., Dharmasiri, N., Estelle, M., Voinnet, O., Jones, J.D., 2006. A plant miRNA contributes to antibacterial resistance by repressing auxin signaling. Science 312, 436–439. Numnark, S., Mhuantong, W., Ingsriswang, S., Wichadakul, D., 2012. C-mii: a tool for plant miRNA and target identification. BMC Genomics 13, S16. 
Palatnik, J.F., Allen, E., Wu, X.L., Schommer, C., Schwab, R., Carrington, J.C., Weigel, D., 2003. Control of leaf morphogenesis by microRNAs. Nature 425, 257–263. Pantaleo, V., Szittya, G., Moxon, S., Miozzi, L., Moulton, V., Dalmay, T., Burgyan, J., 2010. Identification of grapevine microRNAs and their targets using high-throughput sequencing and degradome analysis. Plant J. 62, 960–976. Paul, S., Kundu, A., Pal, A., 2014. Identification and expression profiling of Vigna mungo microRNAs from leaf small RNA transcriptome by deep sequencing. J. Integr. Plant Biol. 56, 15–23. Paz-Ares, J., Ghosal, D., Wienand, U., Peterson, P.A., Saedler, H., 1987. The regulatory c1 locus of Zea mays encodes a protein with homology to myb proto-oncogene products and with structural similarities to transcriptional activators. EMBO J. 6, 3553–3558. Pérez-Quintero, A.L., Sablok, G., Tatarinova, T.V., Conesa, A., Kuo, J., López, C., 2012. Mining of miRNAs and potential targets from gene oriented clusters of transcripts sequences of the anti-malarial plant, Artemisia annua. Biotechnol. Lett. 34, 737–745. Puzey, J.R., Karger, A., Axtell, M., Kramer, E.M., 2012. Deep annotation of Populus trichocarpa microRNAs from diverse tissue sets. PLoS One 7, e33034. Qiu, D., Pan, X., Wilson, I., Li, F., Liu, M., Teng, W., Zhang, B., 2009. High throughput sequencing technology reveals that the taxoid elicitor methyl jasmonate regulates microRNA expression in Chinese yew (Taxus chinensis). Gene 436, 37–44. Rhoades, M., Reinhart, B.J., Lim, L.P., Burge, C.B., Bartel, B., Bartel, D.P., 2002. Prediction of plant microRNA targets. Cell 110, 513–520. Song, J.B., Huang, S.Q., Dalmay, T., Yang, Z.M., 2012. Regulation of leaf morphology by microRNA394 and its target LEAF CURLING RESPONSIVENESS. Plant Cell Physiol. 53, 1283–1294. Sparla, F., Tedeschi, G., Trost, P., 1996. NAD(P)H:(quinone-acceptor) oxidoreductase of tobacco leaves is a flavin mononucleotide-containing flavoenzyme. Plant Physiol. 112, 249–258. 
St-Pierre, B., Vazquez-Flota, F.A., De Luca, V., 1999. Multicellular compartmentation of Catharanthus roseus alkaloid biosynthesis predicts intercellular translocation of a pathway intermediate. Plant Cell 11, 887–900. Su, C., Yang, X., Gao, S., Tang, Y., Zhao, C., Li, L., 2014. Identification and characterization of a subset of microRNAs in wheat (Triticum aestivum L.). Genomics 103, 298–307. Sunkar, R., Jagadeeswaran, G., 2008. In silico identification of conserved microRNAs in large number of diverse plant species. BMC Plant Biol. 8, 37.

Sunkar, R., Zhu, J., 2004. Novel and stress-regulated microRNAs and other small RNAs from Arabidopsis. Plant Cell 16, 2001–2019. Sunkar, R., Girke, T., Jain, P.K., Zhu, J., 2005. Cloning and characterization of microRNAs from Rice. Plant Cell 17, 1397–1411. Sunkar, R., Zhou, X., Zheng, Y., Zhang, W., Zhu, J.K., 2008. Identification of novel and candidate miRNAs in rice by high throughput sequencing. BMC Plant Biol. 8, 25. Szittya, G., Moxon, S., Santos, D.M., Jing, R., Fevereiro, M.P., Moulton, V., Dalmay, T., 2008. High‐throughput sequencing of Medicago truncatula short RNAs identifies eight new miRNA families. BMC Genomics 9, 593. Unver, T., Budak, H., 2009. Conserved microRNAs and their targets in model grass species Brachypodium distachyon. Planta 230, 659–669. Unver, T., Parmaksiz, I., Dündar, E., 2010. Identification of conserved micro-RNAs and their target transcripts in opium poppy (Papaver somniferum L). Plant Cell Rep. 29, 757–769. Varkonyi-Gasic, E., Wu, R., Wood, M., Walton, E.F., Hellens, R.P., 2007. Protocol: a highly sensitive RT-PCR method for detection and quantification of microRNAs. Plant Methods 3, 12. Wan, L.C., Zhang, H., Lu, S., Zhang, L., Qiu, Z., Zhao, Y., Zeng, Q.Y., Lin, J., 2012. Transcriptome-wide identification and characterization of miRNAs from Pinus densata. BMC Genomics 13, 132. Wang, X.J., Reyes, J.L., Chua, N.H., Gaasterland, T., 2004. Prediction and identification of Arabidopsis thaliana microRNAs and their mRNA targets. Genome Biol. 5, R65. Wang, J.W., Wang, L.J., Mao, Y.B., Cai, W.J., Xue, H.W., Chen, X.Y., 2005. Control of root cap formation by microRNA-targeted auxin response factors in Arabidopsis. Plant Cell 17, 2204–2216. Wang, C., Han, J., Liu, C., Kibet, K.N., Kayesh, E., Shangguan, L., Li, X., Fang, J., 2012a. Identification of microRNAs from Amur grape (Vitis amurensis Rupr.) by deep sequencing and analysis of microRNA variations with bioinformatics. BMC Genomics 13, 122. 
Wang, M., Wang, Q., Wang, B., 2012b. Identification and characterization of microRNAs in asiatic cotton (Gossypium arboreum L.). PLoS One 7, e33696. Wu, H.J., Ma, Y.K., Chen, T., Wang, M., Wang, X.J., 2012. PsRobot: a web-based plant small RNA meta-analysis toolbox. Nucleic Acids Res. 40, W22–W28.

Xia, R., Zhu, H., An, Y.Q., Beers, E.P., Liu, Z., 2012. Apple miRNAs and tasiRNAs with novel regulatory networks. Genome Biol. 13, R47. Yant, L., Mathieu, J., Dinh, T.T., Ott, F., Lanz, C., Wollmann, H., Chen, X., Schmid, M., 2010. Orchestration of the floral transition and floral development in Arabidopsis by the bifunctional transcription factor APETALA2. Plant Cell 22, 2156–2170. Yao, Y., Guo, G., Ni, Z., Sunkar, R., Du, J., Zhu, J.K., Sun, Q., 2007. Cloning and characterization of microRNAs from wheat (Triticum aestivum L.). Genome Biol. 8, R96. Yin, Z., Li, C., Han, X., Shen, F., 2008. Identification of conserved microRNAs and their target genes in tomato (Lycopersicon esculentum). Gene 414, 60–66. Yu, B., Yang, Z., Li, J., Minakhina, S., Yang, M., Padgett, R.W., 2005. Methylation as a crucial step in plant microRNA biogenesis. Science 307, 932–935. Yu, X., Wang, H., Lu, Y., de Ruiter, M., Cariaso, M., Prins, M., van Tunen, A., He, Y., 2012. Identification of conserved and novel microRNAs that are responsive to heat stress in Brassica rapa. J. Exp. Bot. 63, 1025–1038. Zhang, B.H., Pan, X.P., Cox, S.B., Cobb, G.P., Anderson, T.A., 2006. Evidence that miRNAs are different from other RNAs. Cell. Mol. Life Sci. 63, 246–254. Zhang, B., Pan, X., Stellwag, E.J., 2008. Identification of soybean microRNAs and their targets. Planta 229, 161–182. Zhao, B., Liang, R., Ge, L., Li, W., Xiao, H., Lin, H., Ruan, K., Jin, Y., 2007. Identification of drought-induced microRNAs in rice. Biochem. Biophys. Res. Commun. 354, 585–590. Zhao, C.Z., Xia, H., Frazier, T.P., Yao, Y.Y., Bi, Y.P., Li, A.Q., Li, M.J., Li, C.S., Zhang, B.H., Wang, X.J., 2010. Deep sequencing identifies novel and conserved microRNAs in peanuts (Arachis hypogaea L.). BMC Plant Biol. 10, 3. Zhou, X., Wang, G., Zhang, W., 2007. UV-B responsive microRNA genes in Arabidopsis thaliana. Mol. Syst. Biol. 3, 103. Zhou, X., Wang, G., Sutoh, K., Zhu, J.K., Zhang, W., 2008. 
Identification of cold-inducible microRNAs in plants by transcriptome analysis. Biochim. Biophys. Acta 1779, 780–788. Zhou, L., Liu, Y., Liu, Z., Kong, D., Duan, M., Luo, L., 2010. Genome-wide identification and analysis of drought-responsive microRNAs in Oryza sativa. J. Exp. Bot. 61, 4157–4168.
