Research Article

Received: 22 May 2013
Revised: 7 October 2013
Accepted: 15 October 2013
Published online in Wiley Online Library: 19 November 2013

DOI 10.1002/psc.2584

Mining antimicrobial peptides from small open reading frames in Ciona intestinalis

Yongzhong Lu,* Yu Zhuang and Jie Liu

Although small open reading frames (sORFs) can encode various kinds of bioactive peptides, they are poorly annotated in many genomic datasets. The present study was conducted to evaluate the potential of sORFs to encode antimicrobial peptides (AMPs) in the basal chordate model Ciona intestinalis. About 4.8 Mb of genomic sequence was first retrieved for sORF mining with the program sORFfinder; the sORFs were then translated into amino acid sequences for AMP prediction via the CAMP server; and ten putative AMPs were finally selected for expression and antimicrobial activity validation. In total, over 180 peptides deduced from the sORFs were predicted to be AMPs. Among the ten tested peptides, six had significant expressed sequence tag matches, providing strong evidence for gene expression, and five were active against the bacterial strains. These results indicate that many sORFs in the C. intestinalis genome contain AMP information. This work can serve as an important initial step in investigating the role of sORFs in the innate defense of C. intestinalis. Copyright © 2013 European Peptide Society and John Wiley & Sons, Ltd.

Additional supporting information may be found in the online version of this article at the publisher’s web site.

Keywords: antimicrobial activity; bioinformatics; gene annotation; genomic sequence; secondary structure

Introduction

The increasing resistance of bacteria and fungi to currently available antibiotics demands efforts to develop new antibiotics with new modes of action [1]. Although hundreds of natural antimicrobial products have been discovered, they have not been as successful as anticipated, so most pharmaceutical companies have redirected their screening efforts from natural product libraries to other strategies, including synthesis of low molecular weight compounds and in silico analysis of genomic data [2–4]. With progress in bioinformatics, the enormous volume of genomic data has become a promising source of putative AMPs [5–8]. AMPs are small, gene-encoded antibiotics, generally between 12 and 100 amino acids in length. Yet, most bioinformatics approaches are designed to recognize larger genes rather than short sequences [5–9].

Small open reading frames, which are widely distributed in eukaryotic genomic sequences, have been shown to have wide biological functionality [10–12]. However, relatively few sORFs in genomes have been annotated because they are missed by most gene finders [10–12]. To our knowledge, few AMPs directly encoded by sORFs have been reported in eukaryotic organisms. To investigate the potential of sORFs to encode AMPs in eukaryotic organisms, we selected the basal chordate model Ciona intestinalis, which is valuable for the study of innate immunity [13,14].

* Correspondence to: Yongzhong Lu, Biology Department, Qingdao University of Science and Technology, No. 53 Zhengzhou Road, Qingdao 266042, China. E-mail: [email protected]

Biology Department, Qingdao University of Science and Technology, Qingdao 266042, China

Abbreviations: sORF, small open reading frame; AMP, antimicrobial peptide; SVM, support vector machines; RF, random forests; EST, expressed sequence tag; CAMP, collection of anti-microbial peptides; CFU, colony forming units.

J. Pept. Sci. 2014; 20: 25–29

Materials and Methods

Strains

Seven microbial strains were used in this study for antimicrobial activity analysis: three Gram-positive bacteria (Staphylococcus aureus, Streptococcus mutans, and Micrococcus tetragenus), three Gram-negative bacteria (Proteus vulgaris Hauser, Escherichia coli, and Pseudomonas aeruginosa (Schroeter) Migula), and one fungal strain (Candida albicans). All were purchased from the Guangdong Culture Collection Centre of Microbiology (China) and maintained on nutrient agar slants at 4 °C.

Bioinformatics Analysis

The genomic sequence of Chromosome 2 of C. intestinalis was obtained from GenBank. sORFs were identified using the sORFfinder server, with a prior probability of 50% and C. intestinalis as the model [15,16]. All sORFs were then translated into amino acid sequences with the software Primer5.0 and saved in FASTA format for further analysis. Potential AMPs were predicted via the CAMP server with both the SVM and RF algorithms [17,18].

Peptide Synthesis

According to their confidences, ten putative AMPs were selected and synthesized on an ABI 433A peptide synthesizer (Applied Biosystems, USA) using Fmoc chemistry and standard side-chain protection. The cysteines in peptides P-01 and P-05 were protected by acid-labile S-trityl moieties. The synthesized peptides were cleaved from the resin and purified by RP-HPLC using a Kromasil 100-5C18 column (4.6 mm × 100 mm, 5 μm; AkzoNobel Pulp and Performance Chemicals, Sweden) at a flow rate of 1 ml/min. The molecular masses of the purified peptides were determined with a Q-trap mass spectrometer (Applied Biosystems). The peptides were lyophilized and stored at −20 °C. The sequences of the ten peptides and their confidences calculated by the SVM and RF algorithms are presented in Table 1.

Antimicrobial Activity Analysis

Antimicrobial activity was detected by the standard disc diffusion method [19]. To prepare the inocula, one colony of each strain from its respective selective agar medium was inoculated into 5 ml of nutrient broth using aseptic technique and incubated for 8–12 h at 37 °C; the culture of each strain was then diluted to a final concentration of 1 × 10^7 CFU/ml, which was equivalent to a McFarland 0.5 turbidity standard. The prepared inocula were spread evenly on the surface of the agar media. Mueller–Hinton blood agar was used for S. mutans Clarke; nutrient agar was used for P. vulgaris Hauser, E. coli, P. aeruginosa, S. aureus, and M. tetragenus; and wort agar was used for C. albicans. After the liquid was absorbed, 10 μl of peptide solution (100 μg/ml) was dropped onto the surface of each plate with a micropipette. The inoculated plates were incubated at 37 °C for 24 h. Zones of inhibition were measured on all plates to evaluate the antimicrobial effects of the peptides. The assay was carried out in triplicate.

In Silico Analysis on the Expression of sORFs in C. intestinalis

On the basis of the rationale that a functional gene should be transcribed, ESTs and other sources of transcript information are the most reliable evidence for gene expression [20]. The sORFs of the ten synthetic peptides were searched against the EST database of C. intestinalis using BLASTn, and only those with an expectation value less than 1e−10 were considered positive matches.

Results

AMP Prediction

To check the feasibility of our method for finding AMPs, only a partial genome sequence (Chromosome 2) of C. intestinalis was retrieved for analysis. In total, 5606 sORFs were found within this sequence, and more than 180 putative AMPs with confidences above 50% were found among their translations. The results calculated by SVM and RF were similar, except that the number of putative AMPs in each confidence interval differed (Figure 1).

Figure 1. Putative AMPs predicted from Chromosome 2 of Ciona intestinalis. A total of 5606 sORFs were identified from Chromosome 2 by sORFfinder; the percentages and numbers indicate different confidence intervals and the number of AMPs in each interval. (a) The prediction result by SVM; (b) the prediction result by RF.

Antimicrobial Activity of the Peptides

The antimicrobial activity of the ten peptides is presented in Table 2. At a concentration of 100 μg/ml, five peptides were found to be active against the bacterial strains.

Table 1. Sequences of the synthetic peptides

sORF     Peptide   Confidence (SVM)   Confidence (RF)   Secondary structure   Net charge   Hydrophobicity (%)
sORF01   P-01      0.634              0.586             α-helix               +4           31
sORF02   P-02      0.914              0.894             α-helix               +3           47
sORF03   P-03      0.989              0.968             Extended              +3           55
sORF04   P-04      0.977              0.976             α-helix               +5           57
sORF05   P-05      0.912              0.94              β-sheet               +3           50
sORF06   P-06      0.993              0.974             β-sheet               +5           42
sORF07   P-07      0.589              0.63              β-sheet               +1           50
sORF08   P-08      0.55               0.588             α-helix               +2           36
sORF09   P-09      0.939              0.69              α-helix               +4           35
sORF10   P-10      0.963              0.75              α-helix               +5           28

Secondary structure was predicted by GOR V [30]. Net charge and hydrophobicity were calculated by APD2 [31]. Amphipathicity was predicted by Helical wheel predictor [32]. Y indicates the structure of a peptide is amphiphilic, and N indicates not.
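The net charge and hydrophobicity columns of Table 1 were computed with APD2 [31]. As a rough illustration of what such descriptors measure, the sketch below counts charged and hydrophobic residues in a peptide string. It is not the APD2 implementation: the function names, the residue conventions (Lys/Arg counted as +1, Asp/Glu as −1, and A, C, F, I, L, M, V, W treated as hydrophobic), and the example sequence are all assumptions made here for illustration.

```python
# Illustrative sketch only; not the APD2 implementation.
# Assumed conventions: K/R count as +1, D/E as -1; histidine and termini are ignored.
POSITIVE = set("KR")
NEGATIVE = set("DE")
# Assumed hydrophobic residue set; other tools may define it differently.
HYDROPHOBIC = set("ACFILMVW")


def net_charge(peptide: str) -> int:
    """Return (#Lys + #Arg) - (#Asp + #Glu) for a one-letter peptide string."""
    seq = peptide.upper()
    return sum(aa in POSITIVE for aa in seq) - sum(aa in NEGATIVE for aa in seq)


def hydrophobic_percent(peptide: str) -> float:
    """Return the percentage of residues that belong to HYDROPHOBIC."""
    seq = peptide.upper()
    if not seq:
        raise ValueError("empty peptide")
    return 100.0 * sum(aa in HYDROPHOBIC for aa in seq) / len(seq)


# Hypothetical sequence, not one of the peptides in Table 1.
example = "GLKKLLRKAW"
print(net_charge(example), hydrophobic_percent(example))
```

Simple counts like these ignore pH-dependent charges and terminal groups, so values from a dedicated server may differ.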


Table 2. Antimicrobial activity of the synthetic peptides

Peptides: P-01 P-02 P-03 P-04 P-05 P-06 P-07 P-08 P-09 P-10

Strains:

Escherichia coli

Staphylococcus aureus

++ ++ +++ +

Pseudomonas aeruginosa

Candida albicans

Proteus vulgaris

Streptococcus mutans

Micrococcus tetragenus

++ ++





++ +




+ +


‘−’ indicates no inhibition zone formation; ‘+’ indicates that the diameter of the inhibition zone is between 0 and 5 mm; ‘++’ indicates that the diameter of the inhibition zone is between 6 and 10 mm; and ‘+++’ indicates that the diameter of the inhibition zone is above 10 mm.

P-02, P-03, and P-05 each inhibited two kinds of bacteria; P-04 inhibited all the tested bacteria, showing broad-spectrum antibacterial activity; and P-10 inhibited most of the bacteria. Tables 1 and 2 show that the confidences of these active peptides were all over 90%. Although the confidences of P-06 and P-09 were also over 90%, they did not exhibit any antimicrobial activity. P-01, P-07, and P-08 did not show any activity either, which was not unexpected because their confidences were relatively low. Among the seven tested strains, only the fungus C. albicans was insensitive to all the peptides.
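The relation between predicted confidence and observed activity described above can be tabulated in a few lines. The sketch below uses the two confidence values reported per peptide in Table 1 and the activity outcomes stated in the text. Taking the higher of the two values and using a 0.9 threshold are assumptions made here so that the grouping follows the "over 90%" wording; the function name is hypothetical.

```python
# Confidence pairs are the two values reported per peptide in Table 1;
# activity flags follow the text (P-02, P-03, P-04, P-05 and P-10 were active).
PEPTIDES = {
    "P-01": ((0.634, 0.586), False),
    "P-02": ((0.914, 0.894), True),
    "P-03": ((0.989, 0.968), True),
    "P-04": ((0.977, 0.976), True),
    "P-05": ((0.912, 0.940), True),
    "P-06": ((0.993, 0.974), False),
    "P-07": ((0.589, 0.630), False),
    "P-08": ((0.550, 0.588), False),
    "P-09": ((0.939, 0.690), False),
    "P-10": ((0.963, 0.750), True),
}


def tabulate(threshold=0.9):
    """Count [active, total] peptides at or above / below the threshold.

    Assumption: a peptide is 'high' if the larger of its two confidences
    reaches the threshold.
    """
    counts = {"high": [0, 0], "low": [0, 0]}
    for conf_pair, active in PEPTIDES.values():
        group = "high" if max(conf_pair) >= threshold else "low"
        counts[group][0] += int(active)
        counts[group][1] += 1
    return counts


summary = tabulate()
for group, (active, total) in summary.items():
    print(f"{group}-confidence: {active} of {total} active")
```

With these values, five of the seven high-confidence candidates and none of the three low-confidence candidates were active. With only ten peptides this is a descriptive summary, not a statistical estimate of prediction accuracy.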

Bioinformatics Evidence for Expression of sORFs in C. intestinalis

Among the ten sORFs, six had significant matches when searched with BLAST against the EST database of C. intestinalis (Table 3, Figure 2). The matches for sORF06 and sORF07 were annotated as uncharacterized oxidoreductase YrbE-like and uncharacterized protein LOC100183059, respectively; the matches for other sORFs, such as sORF04 and sORF10, have not yet been annotated, but their corresponding peptides were shown to be active in this work.
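The filtering step behind these matches (keeping only BLASTn hits below the expectation-value cutoff given in the Methods) can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes hits are available in the 12-column tabular format produced by BLAST+ with `-outfmt 6`, and the query names, subject names and numbers in the example rows are made up.

```python
# Sketch: keep BLAST hits whose e-value is below a cutoff.
# Assumes 12-column tabular output (BLAST+ "-outfmt 6"):
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    identity: float
    length: int
    evalue: float


def parse_hits(lines: Iterable[str]) -> List[Hit]:
    """Parse tab-separated BLAST lines, skipping blanks and comment lines."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        hits.append(Hit(f[0], f[1], float(f[2]), int(f[3]), float(f[10])))
    return hits


def significant(hits: List[Hit], cutoff: float = 1e-10) -> List[Hit]:
    """Return hits with an e-value strictly below the cutoff."""
    return [h for h in hits if h.evalue < cutoff]


def matched_queries(hits: List[Hit], cutoff: float = 1e-10) -> List[str]:
    """Return the sorted names of queries with at least one significant hit."""
    return sorted({h.query for h in significant(hits, cutoff)})


# Hypothetical rows for illustration only; not data from this study.
EXAMPLE_ROWS = [
    ("query_A", "est_1", "100.00", "56", "0", "0", "1", "56", "201", "256", "2e-24", "104"),
    ("query_A", "est_2", "98.21", "56", "1", "0", "1", "56", "310", "365", "9e-23", "99.0"),
    ("query_B", "est_3", "90.00", "20", "2", "0", "5", "24", "40", "59", "0.003", "28.2"),
]
example_hits = parse_hits("\t".join(row) for row in EXAMPLE_ROWS)
print(matched_queries(example_hits))
```

In this made-up example only `query_A` passes the cutoff; `query_B` has a short, weak alignment and is treated as unmatched.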

Table 3. BLAST analysis of ten sORFs against EST database of Ciona intestinalis

sORF     Matched record         Length (nt)   Identities       Description
sORF01   dbj|BW124447.1         690           53/54 (98%)      Unannotated
         dbj|BW110763.1         684           53/54 (98%)      Unannotated
sORF02   —                      —             —                —
sORF03   —                      —             —                —
sORF04   dbj|BW443801.1         772           63/63 (100%)     Unannotated
sORF05   —                      —             —                —
sORF06   ref|XM_002131599.1*    1308          56/56 (100%)     Uncharacterized oxidoreductase YrbE-like
         gb|FF993226.1          754           56/56 (100%)     Unannotated
         gb|FF862287.1          794           56/56 (100%)     Unannotated
         dbj|BW347735.1         618           56/56 (100%)     Unannotated
         dbj|BW346291.1         659           56/56 (100%)     Unannotated
         dbj|BW332064.1         592           56/56 (100%)     Unannotated
sORF07   ref|XM_002120683.1*    9028          42/42 (100%)     Uncharacterized protein LOC100183059
sORF08   —                      —             —                —
sORF09   gb|FF722310.1          435           48/48 (100%)     Unannotated
         gb|FF722309.1          777           48/48 (100%)     Unannotated
         dbj|BW446976.1         688           44/48 (92%)      Unannotated
sORF10   gb|FK113597.1          589           39/41 (95%)      Unannotated
         dbj|BW159503.1         755           39/41 (95%)      Unannotated



* Marked sequences were from Reference Sequence database; other matched sequences were from GenBank.


Figure 2. Alignment of sORFs versus the EST transcripts of Ciona intestinalis. BLAST search against the EST database of C. intestinalis was carried out with an expectation value cutoff of 10; six sORFs had significant matches. GenBank accession numbers are given to indicate the subject sequence for each sORF.



Discussion

AMPs have conventionally been found through complex biological isolation and extensive antibacterial activity testing. Genomics, transcriptomics, and proteomics platforms now provide more efficient strategies to identify distinct classes of AMPs [12]. Among these, searching genome and transcript sequence datasets has proved simple and effective [21]. For example, 317 defensin-like sequences were found in the genome of Arabidopsis using hidden Markov models [7], and BAGEL could successfully mine bacteriocins encoded by sORFs in bacterial genomes [22]. Yet, the existing bioinformatics methods have not succeeded in mining AMPs from eukaryotic sORFs. A total of 593,586 sORFs were once reported in Drosophila [10], and according to this work at least 100,000 putative sORFs might exist in the whole genome of C. intestinalis. Finding AMPs among so many sORFs in a eukaryotic genome requires new strategies. Our method made full use of two recent achievements in bioinformatics: the sORF-finding program sORFfinder and the AMP prediction server CAMP. sORFfinder can identify sORFs with high coding potential [15,16], and CAMP can mine AMPs on the basis of sequence features instead of sequence similarity [17,18]. The new AMPs found in this work support our strategy. Before this study, only one AMP gene family had been discovered in the genome of C. intestinalis by a bioinformatics method [23].

Validation of the putative AMPs depended on their correct synthesis. In this work, both P-01 and P-05 contained a single cysteine residue, which is usually highly reactive and can form an intermolecular disulfide bond. To avoid interference from possible peptide dimers in the activity analysis, this residue was protected during peptide synthesis, and mass spectrometric analysis confirmed that monomeric peptides were obtained.
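The first stages of the strategy summarized above (enumerating short ORFs and translating them into peptides) can be illustrated with a deliberately naive sketch. It is not sORFfinder, which scores coding potential, and it is not the Primer5.0 translation used in this work. It only scans the forward strand for ATG-to-stop ORFs with the standard genetic code, and the length bounds (12 to 100 residues, borrowed from the typical AMP length range mentioned in the Introduction), the function names and the test sequence are assumptions made here.

```python
# Naive forward-strand ORF scan with the standard genetic code.
# Illustrative only: no coding-potential scoring, no reverse strand.
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
STOPS = {"TAA", "TAG", "TGA"}


def translate(dna):
    """Translate codon by codon, stopping at the first stop codon."""
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3].upper(), "X")  # X = unknown codon
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


def find_sorfs(dna, min_aa=12, max_aa=100):
    """Return (start, end, peptide) for ATG..stop ORFs in the three forward frames.

    Only the first ATG of each ORF is reported; min_aa and max_aa are
    hypothetical bounds on the peptide length.
    """
    dna = dna.upper()
    hits = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] != "ATG":
                i += 3
                continue
            j = i
            while j + 3 <= len(dna) and dna[j:j + 3] not in STOPS:
                j += 3
            if j + 3 > len(dna):
                break  # no in-frame stop codon before the sequence ends
            peptide = translate(dna[i:j])
            if min_aa <= len(peptide) <= max_aa:
                hits.append((i, j + 3, peptide))
            i = j + 3  # continue scanning after the stop codon
    return hits


# Made-up test sequence: one ORF encoding Met followed by 12 Lys.
toy = "CC" + "ATG" + "AAA" * 12 + "TAA" + "GG"
print(find_sorfs(toy))
```

A scan like this returns every ORF in the length window, which is why a coding-potential filter and a downstream AMP predictor are needed to narrow the candidates.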
Although the amino acid sequences of the newly identified AMPs were highly heterogeneous, they shared some characteristics with AMPs from other organisms, such as positively charged residues and a substantial proportion of hydrophobic residues (~30% or more) [24]. Nevertheless, not all of the candidates proved to be real AMPs. The conformations and structures of AMPs also play important roles in their antimicrobial activity [25].

AMPs are usually classified into four families on the basis of their secondary structures: α-helical peptides, β-sheet peptides, extended peptides, and loop peptides [26]. Among the ten peptides, six were predicted to form α-helical structures, which are thought to be the most common and effective forms, but not all of them had antimicrobial activity. Structure analysis showed that the antibacterial ones were amphipathic, with a hydrophilic domain on one side and a hydrophobic domain on the other (Table 1). The remaining four were predicted to be β-sheet or extended peptides; similarly, only two showed antibacterial activity. These AMPs might kill microorganisms in different ways. The α-helical ones might cause leakage of essential cell contents by forming pores in the membranes, whereas the extended or β-sheet ones might disrupt the lipid organization of the membranes. Despite this diversity of mechanisms, all these peptides must first bind to cell surfaces by electrostatic interaction.

That not all the predicted AMPs were validated indicates that the prediction tool CAMP could be improved by including more properties of AMPs. As reported, amphipathicity is a key feature of most AMPs [25]; if CAMP took it into account, its prediction accuracy would likely improve.

Are these putative sORFs real genes, or did they just arise by chance? ESTs and other sources of transcript data are usually regarded as evidence for gene expression [20]. BLASTn searches against the EST database of C. intestinalis yielded significant matches for six sORFs. Although ESTs are not sufficient to predict whether a gene is translated into a functional protein, their detection constitutes strong evidence of gene expression [20]. As for the unmatched sORFs, their transcripts might simply be absent from the database, or they might be random, non-coding sORFs. Although sORFfinder can identify sORFs with high coding potential, it still has a false positive rate of 4% [15].
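The suggestion above that amphipathicity could be used as an additional predictive property raises the question of how to put a number on it. One common choice is the helical hydrophobic moment, sketched below. This is not the helical wheel predictor used for Table 1 [32] and not a feature of CAMP: the Eisenberg consensus hydrophobicity scale, the 100° rotation per residue of an ideal α-helix, the function name, and the test sequences are assumptions introduced here for illustration.

```python
# Sketch: mean helical hydrophobic moment of a peptide.
# Assumptions: Eisenberg consensus scale, 100 degrees per residue (ideal alpha-helix).
import math

EISENBERG = {
    "A": 0.62, "R": -2.53, "N": -0.78, "D": -0.90, "C": 0.29,
    "Q": -0.85, "E": -0.74, "G": 0.48, "H": -0.40, "I": 1.38,
    "L": 1.06, "K": -1.50, "M": 0.64, "F": 1.19, "P": 0.12,
    "S": -0.18, "T": -0.05, "W": 0.81, "Y": 0.26, "V": 1.08,
}


def hydrophobic_moment(peptide, angle_deg=100.0):
    """Return the per-residue hydrophobic moment for a one-letter sequence.

    Each residue contributes a vector of length H_i pointing at angle
    i * angle_deg around the helix axis; the moment is the length of the
    vector sum divided by the number of residues.
    """
    seq = peptide.upper()
    if not seq:
        raise ValueError("empty peptide")
    delta = math.radians(angle_deg)
    sin_sum = sum(EISENBERG[aa] * math.sin(i * delta) for i, aa in enumerate(seq))
    cos_sum = sum(EISENBERG[aa] * math.cos(i * delta) for i, aa in enumerate(seq))
    return math.hypot(sin_sum, cos_sum) / len(seq)


# Hypothetical sequences with the same composition (6 Leu, 8 Lys):
amphipathic = "LKKLLKK" * 2      # Leu clustered on one helical face
segregated = "LLLLLLKKKKKKKK"    # Leu grouped along the sequence instead
print(hydrophobic_moment(amphipathic) > hydrophobic_moment(segregated))
```

The two test sequences have identical composition, so the difference in moment reflects only how the residues are arranged around the helix, which is the property a composition-based descriptor such as percentage hydrophobicity cannot capture.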
All the matched EST records appear to be larger transcripts. According to the available data, most AMPs are synthesized as inactive prepeptides containing a signal peptide. During transport of the prepeptide into the extracellular space, the signal peptide may be removed by the enzymes that perform posttranslational modifications [27]. The matched ESTs of AMPs P-04 and P-10 were about 700 nt long, which is similar to those of many known AMPs [28,29].


In conclusion, we provide a distinctive strategy for finding AMPs by exploring the sORFs distributed in genomic sequences. It can not only identify new AMPs with high confidence but also provide a new avenue for sORF annotation. It should also be pointed out that, for the near future, the false positive problem remains difficult to avoid, and modified or cyclized AMPs cannot be identified by this approach. Nevertheless, it can still serve as an initial step toward elucidating the role of sORFs in innate defense.

Acknowledgements

This work was supported by the National Key Technology R&D Program of China (No. 2011BAD14B04), Shandong Provincial Science Foundation, China (No. ZR2009EM012), Shandong High School Science and Technology Planning Project (No. J10LC53), the Open Project Program of State Key Laboratory of Freshwater Ecology and Biotechnology (No. 2011FB18), and the Open Project Program of State Key Laboratory of Marine Bioactive Substances (No. MBSMAT-2010-05).

References

1 Li JF, Zhang J, Xu XZ, Han YY, Cui XW, Chen YQ, Zhang SQ. The antibacterial peptide ABP-CM4: the current state of its production and applications. Amino Acids. 2012; 42(6): 2393–2402.
2 Gwynn MN, Portnoy A, Rittenhouse SF, Payne DJ. Challenges of antibacterial discovery revisited. Ann. N. Y. Acad. Sci. 2010; 1213: 15–19.
3 Mills SD. When will the genomics investment pay off for antibacterial discovery? Biochem. Pharmacol. 2006; 71(7): 1096–1102.
4 Brötz-Oesterhelt H, Sass P. Postgenomic strategies in antibacterial drug discovery. Future Microbiol. 2010; 5(10): 1553–1579.
5 Gruber CW, Muttenthaler M. Discovery of defense- and neuropeptides in social ants by genome-mining. PLoS One. 2012; 7(3): e32559.
6 Belarmino LC, Benko-Iseppon AM. Databank based mining on the track of antimicrobial weapons in plant genomes. Curr. Protein Pept. Sci. 2010; 11(3): 195–198.
7 Silverstein KA, Graham MA, Paape TD, VandenBosch KA. Genome organization of more than 300 defensin-like genes in Arabidopsis. Plant Physiol. 2005; 138(2): 600–610.
8 Giacomelli L, Nanni V, Lenzi L, Zhuang J, Dalla Serra M, Banfield MJ, Town CD, Silverstein KA, Baraldi E, Moser C. Identification and characterization of the defensin-like gene family of grapevine. Mol. Plant Microbe Interact. 2012; 25(8): 1118–1131.
9 Xiao Y, Hughes AL, Ando J, Matsuda Y, Cheng JF, Skinner-Noble D, Zhang G. A genome-wide screen identifies a single beta-defensin gene cluster in the chicken: implications for the origin and evolution of mammalian defensins. BMC Genomics. 2004; 5(1): 56.
10 Ladoukakis E, Pereira V, Magny EG, Eyre-Walker A, Couso JP. Hundreds of putatively functional small open reading frames in Drosophila. Genome Biol. 2011; 12(11): R118.
11 Cheng H, Chan WS, Li Z, Wang D, Liu S, Zhou Y. Small open reading frames: current prediction techniques and future prospect. Curr. Protein Pept. Sci. 2011; 12(6): 503–507.
12 Yang X, Tschaplinski TJ, Hurst GB, Jawdy S, Abraham PE, Lankford PK, Adams RM, Shah MB, Hettich RL, Lindquist E, Kalluri UC, Gunter LE, Pennacchio C, Tuskan GA. Discovery and annotation of small proteins using genomics, proteomics, and computational approaches. Genome Res. 2011; 21(4): 634–641.
13 Satou Y, Shin-i T, Kohara Y, Satoh N, Chiba S. A genomic overview of short genetic variations in a basal chordate, Ciona intestinalis. BMC Genomics. 2012; 13: 208.
14 Dishaw LJ, Flores-Torres JA, Mueller MG, Karrer CR, Skapura DP, Melillo D, Zucchetti I, De Santis R, Pinto MR, Litman GW. A Basal chordate model for studies of gut microbial immune interactions. Front Immunol. 2012; 3: 96.
15 Hanada K, Akiyama K, Sakurai T, Toyoda T, Shinozaki K, Shiu SH. sORF finder: a program package to identify small open reading frames with high coding potential. Bioinformatics. 2010; 26(3): 399–400.
16 URL: [last accessed September 2013]
17 Thomas S, Karnik S, Barai RS, Jayaraman VK, Idicula-Thomas S. CAMP: a useful resource for research on antimicrobial peptides. Nucleic Acids Res. 2010; 38(Database issue): D774–D780.
18 URL: [last accessed September 2013]
19 Zaidan MR, Noor Rain A, Badrul AR, Adlin A, Norazah A, Zakiah I. In vitro screening of five local medicinal plants for antibacterial activity using disc diffusion method. Trop. Biomed. 2005; 22(2): 165–170.
20 Guillén G, Díaz-Camino C, Loyola-Torres CA, Aparicio-Fabre R, Hernández-López A, Díaz-Sánchez M, Sanchez F. Detailed analysis of putative genes encoding small proteins in legume genomes. Front Plant Sci. 2013; 4: 208.
21 Pestana-Calsa MC, Calsa T Jr. Systems and Computational Biology – Molecular and Cellular Experimental Systems (Ed: Yang NS), InTech, New York, 2011.
22 de Jong A, van Heel AJ, Kok J, Kuipers OP. BAGEL2: mining for bacteriocins in genomic data. Nucleic Acids Res. 2010; 38(Web Server issue): W647–W651.
23 Fedders H, Leippe M. A reverse search for antimicrobial peptides in Ciona intestinalis: identification of a gene family expressed in hemocytes and evaluation of activity. Dev. Comp. Immunol. 2008; 32(3): 286–298.
24 Hoskin DW, Ramamoorthy A. Studies on anticancer activities of antimicrobial peptides. Biochim. Biophys. Acta. 2008; 1778(2): 357–375.
25 Bulet P, Stöcklin R. Insect antimicrobial peptides: structures, properties and gene regulation. Protein Pept. Lett. 2005; 12(1): 3–11.
26 Nguyen LT, Haney EF, Vogel HJ. The expanding scope of antimicrobial peptide structures and their modes of action. Trends Biotechnol. 2011; 29: 464–472.
27 López-Meza JE, Ochoa-Zarzosa A, Aguilar JA, Loeza-Lara PD. Biomedical Engineering, Trends, Research and Technologies (Ed: Komorowska M and Olsztynska-Janus S), InTech, New York, 2011.
28 Xu D, Wei J, Cui H, Gong J, Yan Y, Lai R, Qin Q. Differential profiles of gene expression in grouper Epinephelus coioides, infected with Singapore grouper iridovirus, revealed by suppression subtractive hybridization and DNA microarray. J. Fish Biol. 2010; 77(2): 341–360.
29 Dreyer C, Hoffmann M, Lanz C, Willing EM, Riester M, Warthmann N, Sprecher A, Tripathi N, Henz SR, Weigel D. ESTs and EST-linked polymorphisms for genetic mapping and phylogenetic reconstruction in the guppy, Poecilia reticulata. BMC Genomics. 2007; 8: 269.
30 URL: [last accessed September 2013]
31 URL: [last accessed September 2013]
32 URL: [last accessed September 2013]

Supporting Information

Additional supporting information may be found in the online version of this article at the publisher’s web site.

