MEEGID 2274

No. of Pages 8, Model 5G

17 March 2015 Infection, Genetics and Evolution xxx (2015) xxx–xxx 1

Contents lists available at ScienceDirect

Infection, Genetics and Evolution journal homepage: www.elsevier.com/locate/meegid 6 7

5

De novo sequencing, assembly and analysis of salivary gland transcriptome of Haemaphysalis flava and identification of sialoprotein genes

8

Xing-Li Xu a,b, Tian-Yin Cheng a,⇑, Hu Yang b, Fen Yan a, Ya Yang a

3 4

9 10 11 12 1 2 4 6 15 16 17 18 19 20 21 22 23 24 25

a b

College of Veterinary Medicine, Hunan Agricultural University, Changsha 410128, PR China College of Life Sciences and Resource Environment, Jiangxi Yichun University, Jiangxi, Yichun 336000, PR China

a r t i c l e

i n f o

Article history: Received 25 November 2014 Received in revised form 4 March 2015 Accepted 9 March 2015 Available online xxxx Keywords: Haemaphysalis flava Transcriptomes Salivary glands High-throughput annotation

a b s t r a c t Saliva plays an important role in feeding and pathogen transmission, identification and analysis of tick salivary gland (SG) proteins is considered as a hot spot in anti-tick researching area. Herein, we present the first description of SG transcriptome of Haemaphysalis flava using next-generation sequencing (NGS). A total of over 143 million high-quality reads were assembled into 54,357 unigenes, of which 20,145 (37.06%) had significant similarities to proteins in the Swiss-Prot database. 13,513 annotated sequences were associated with GO terms. Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis showed that 14,280 unigenes were assigned to 279 KEGG pathways in total. Reads per kb per million reads (RPKM) analysis showed that there were 3035 down-regulated unigenes and 2260 up-regulated unigenes in the engorged ticks (ET) compared with the semi-engorged one (SET). Several important genes are associated with blood feeding and ingestion as secreted salivary proteins, concluding cysteine, longipain, 4D8, calreticulin, metalloproteases, serine protease inhibitor, enolase, heat shock protein and AV422 in SG, were identified. The qRT-PCR results confirmed that patterns of these genes (except for the longipain gene) expression were consistent with RNA-seq results. This de novo assembly of SG transcriptome of H. flava not only provides more chance for screening and cloning functional genes, but also forms a solid basis for further insight into the changes of salivary proteins during blood-feeding. Ó 2015 Published by Elsevier B.V.

27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44

45 46

1. Introduction

47

Haemaphysalis flava (Acari: Ixodidae), is a species from the Ixodidae family found in mixed forest zones and wilderness in many countries. It not only serves as a vector of pathogenic microorganisms (e.g., Francisella tularensis) that may cause significant vector-borne diseases (e.g., tularemia; Ozawa et al., 1982) in animals and humans, but also causes lesions, dermatitis, blood loss, weight loss and even death by tick bites (Zhang et al., 2010). All these posed human and animal health threat and agricultural burdens. The saliva of ticks, which is important in transmission of tickborne pathogens, is a complex mixture consisting of powerful pharmacological active proteins that are injected into their host during blood feeding period (Chmelar et al., 2011; Francischetti

48 49 50 51 52 53 54 55 56 57 58 59

⇑ Corresponding author at: Laboratory of Molecular Physiology, College of Veterinary Medicine, Hunan Agricultural University, Changsha, Hunan Province 410128, PR China. Tel.: +86 731 84635233; fax: +86 731 84673618. E-mail address: [email protected] (T.-Y. Cheng).

et al., 2009). Secreted proteins, which are the most important class of salivary proteins, are of vital importance for tick survival, helping it combat host’s hemostasis (Chmelar et al., 2012; Champagne, 2006; Maritz-Olivier et al., 2007; Francischetti, 2010), immunity (Kazimírová and Štibrániová, 2013) and inflammatory responses (Chmelar et al., 2011) in order to maintain blood flow. Identification and analysis of tick secreted proteins is considered as critical in the anti-tick research area (Titus et al., 2006; Nuttall et al., 2006), due to their important role in feeding and pathogen transmission. To date, in order to identify protein families and discover novel target antigen, salivary gland (SG) transcriptomes from several ticks species have been described (Chmelar et al., 2008; Aljamali et al., 2009a; Francischetti et al., 2011). Among the various transcriptome studies, next generation sequencing (NGS) based on RNA-Seq technique could provide massive sequences with enormous depth for both model and non-model organisms. It is relatively inexpensive, and facilitates the discovery of hundreds of novel genes and the analysis of gene expression in many organisms. It has been successfully applied to identify the bacterial (Vayssier-Taussat et al., 2013;

http://dx.doi.org/10.1016/j.meegid.2015.03.010 1567-1348/Ó 2015 Published by Elsevier B.V.

Please cite this article in press as: Xu, X.-L., et al. De novo sequencing, assembly and analysis of salivary gland transcriptome of Haemaphysalis flava and identification of sialoprotein genes. Infect. Genet. Evol. (2015), http://dx.doi.org/10.1016/j.meegid.2015.03.010

60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79

MEEGID 2274

No. of Pages 8, Model 5G

17 March 2015 2 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105

X.-L. Xu et al. / Infection, Genetics and Evolution xxx (2015) xxx–xxx

Williams-Newkirk et al., 2012; Hue et al., 2013; Kaewkong et al., 2014; Budachetri et al., 2014; Carpi et al., 2011) and parasitic communities (Bonnet et al., 2014) associated with ticks. Until now, transcriptomic analyses of SG of only a few ticks (Karim et al., 2011; Schwarz et al., 2013, 2014) have been performed using RNA-Seq technique, but H. flava SG transcriptome has not been explored. In the present study, we analyzed the sialo transcriptome of adult female H. flava by NGS. There is evidence that the content of tick secreted proteins undergoes a series changes as blood-feeding progresses (McSwain et al., 1982; Binnington, 1978). In order to identify the changes of secreted proteins and to uncover the tick’s biological responses during the feeding period, two cDNA samples, the salivary gland pairs (SGs) from adult female H. flava at two different feeding stages (engorged ticks and semi-engorged ticks), were sequenced using Illumina/Solexa platform. Approximately eight gigabytes of high quality sequencing data were generated. A total of over 143 million high-quality reads of SG transcriptome was assembled into 54,357 transcripts. For functional annotation of the assembled gene transcripts (contigs), we searched against the NCBI nonredundant (NR) and UniProt database using BlastX. In addition, we compared gene expression in SG from engorged ticks with gene expression in corresponding tissue from the semi-engorged one. This strategy can help more fully understand the tick’s biological responses and the salivary transcriptome dynamics of hematophagy in ticks.

106

2. Materials and methods

107

2.1. Ticks, dissections of salivary gland and total RNA extraction

108

127

Adult H. flava were collected from Xinyang city of Henan province (32°130 N, 114°080 E), China. Ticks were fed on hedgehogs, acquired from a tick-free area. In this study, we distinguished between engorged ticks (ET) and semi-engorged ticks (SET). ET were detached spontaneously in the late phase of tick feeding (mean length of 10 mm). SET were manually detached from hedgehog’s skin in the middle phase of tick feeding (mean length of 5 mm). A total of 100 SGs of all collected ticks (50 SET, 50 ET) were dissected and stored in liquid nitrogen until RNA extraction. Total RNA samples were extracted from tick SG using Trizol Reagent (Invitrogen, Carlsbad, CA, USA) according to the manufacturer’s instructions. Total RNA concentration of two samples were checked using the NanoDrop spectrophotometer and Agilent Bioanalyzer 2100 (Agilent technologies, Santa Clara, CA, USA). A total of 8.4 lg total RNA from ET, and 10 lg total RNA from SET were purified with RNeasy micro kit (QIAGEN, Germany), respectively. The poly-A containing mRNA molecules were purified using poly-T oligo-attached magnetic beads. Following purification, the mRNA is fragmented into small pieces using divalent cations under 94 °C for 8 min.

128

2.2. Construction of cDNA library for sequencing

129

First and second strand cDNA were synthesized from purified mRNA. After end repair (Resuspension Buffer, End Repair Mix, 3‘A’ add process, Illumina, San Diego, CA, USA), of each cDNA library was connected with adapter sequences (Truseq universal adapter: 50 -AAT GAT ACG GCG ACC ACC GAG ATC TAC ACT CTT TCC CTA CAC GAC GCT CTT CCG ATC T-30 ; Truseq indexed adapter: 50 -GAT CGG AAG AGC ACA CGT CTG AAC TCC AGT CAC NNN NNN ATC TCG TAT GCC GTC TTC TGC TTG-30 ). Finally, the libraries were enriched using 8 cycles of PCR. The following amplification program was used: 98 °C for 30 s, 8 cycles at 98 °C for 10 s, 60 °C for 30 s, 72 °C for 30 s followed by a final elongation at 72 °C for 5 min.

109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126

130 131 132 133 134 135 136 137 138 139

Concentration and quality of the libraries were measured in QubitÒ 2.0 Fluorometer (Invitrogen, Q32866) and Agilent 2100 (Agilent Technologies Inc., Santa Clara, CA, USA), respectively. Sequencing cluster generation and NGS were carried out utilizing HiSeq™ 2500 (Illumina, SY-401-1001) by Bohao Biotechnology Corporation (Shanghai, China) and analyzed by data collection software (Illumina, San Diego, CA, USA).

140

2.3. De novo assemblies

147

Raw sequence data preprocess and assembly were performed by using CLC Genomics Workbench (CLC bio, Aarhus, Denmark). All raw reads were trimmed according to their quality score. The sequences with low quality, adapter sequences, ambiguous bases and containing less than 20 nucleotides were removed from further analysis before contig assembly. After clean up, de novo sequence assembly was performed on all remaining reads with a contig scaffolding algorithm with a minimum contig length of 200 bp or longer, and a word size of 45 characters. Contigs resulting from assemblies were combined into a single file.

148

2.4. Functional annotation and classification

158

The assembled contigs were annotated by searching against the NCBI NR and UniProt database using BlastX alignment with E-value < 1e5. The best match results were used to determine the closest known taxonomy of the unigenes according to their identity percentage. The highest scoring blast hit species were identified. Based on annotation results, Gene Ontology (GO) functional classification was performed by WEGO (http://wego. genomics.org.cn/cgi-bin/wego/index.pl), contigs were divided into biological processes, cellular components and molecular functions according to GO terms with the Blast2GO software (http://www. blast2go.de). In this study, the presented data is showed with a GO analysis at level two. The contigs sequences were then aligned with the Eukaryotic Orthologous Groups of Proteins (KOG) database (http://www. ncbi.nlm.nih.gov/COG/) to classify and predict functions. In addition, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways analysis were performed to the unigene sequences on the KEGG Automatic Annotation Server (KAAS) online.

159

2.5. Quality assessment with polymerase chain reaction (PCR)

177

To assess the quality of assembled contigs, sequences from cDNA were amplified by using primers, which were designed using Primer Premier 5.0 (Premier, Canada) and synthesized by Shanghai Sangon Biological Engineering Technology and Service Co., Ltd. in China. Reaction mixtures of 50 ll contained 2.5 ll of cDNA, 25 ll of 2 Taq mix, 1.5 ll of each primer, PCR amplifications were conducted according to the manufacturer’s instructions. Presence of PCR products were confirmed by electrophoresis on 1.5% agarose gels stained with ethidium-bromide in 1 TAE buffer using a 100 bp DNA mass ladder. After sequencing, the PCR products were compared against the NCBI database using Blastn.

178

2.6. Comparison of unigene in the two H. flava SG Illumina libraries

189

Comparisons of unigene express analysis were carried out between two Illumina libraries. Based on the number of reads of each unigene from two SG libraries, RPKM method (reads per kb per million reads) was used to measure the different expression levels of each gene. In this study, DEGseq, an R package for identifying differentially expressed genes from RNA-seq data (Wang et al., 2010) was applied to determine the significance of difference in gene expression between semi-engorged SG libraries

190

Please cite this article in press as: Xu, X.-L., et al. De novo sequencing, assembly and analysis of salivary gland transcriptome of Haemaphysalis flava and identification of sialoprotein genes. Infect. Genet. Evol. (2015), http://dx.doi.org/10.1016/j.meegid.2015.03.010

141 142 143 144 145 146

149 150 151 152 153 154 155 156 157

160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176

179 180 181 182 183 184 185 186 187 188

191 192 193 194 195 196 197

MEEGID 2274

No. of Pages 8, Model 5G

17 March 2015 3

X.-L. Xu et al. / Infection, Genetics and Evolution xxx (2015) xxx–xxx 198 199 200 201 202

203 204 205 206 207 208 209 210 211 212 213 214 215

and engorged SG libraries. If the cut-off of the False Discovery Rate (FDR) was set at 5% and fold change was more than two, the unigenes were considered as having differential expression. When P value was less than 0.05, GO functional enrichment and a KEGG pathway enrichment analyses were performed. 2.7. Differentially expressed sialoprotein gene validation by real-time quantitative RT-PCR (qRT-PCR) Nine genes encoding secreted proteins that mediate hematophagy and blood ingestion were analyzed for validation of RNAseq results by using qRT-PCR. These genes included cysteine, longipain, 4D8, calreticulin, metalloproteases, serine protease inhibitor, enolase, heat shock protein and AV422. The total RNA was extracted from SG of both ET and SET groups using Trizol RNA extraction method (Takara, Japan) and the quality was evaluated by gel electrophoresis. First strand cDNA synthesis was done using Prime Script™ RT reagent Kit with gDNA Eraser (Takara, Japan). The primer sequences (Table S1) for qRT-PCR were designed utilizing the Roche Universal Probe Library Assay Design Center

228

(https://www.roche-applied-science.com) and synthesized by Sangon Biotechnology Corporation (Shanghai, China). qRT-PCR was performed with an Applied Biosystems 7300 Real time PCR system (Applied Biosystems, USA) in a final volume of 20 ll PCR reaction mixture. The amplification program was as follows: 95 °C for 30 s, followed by 40 cycles of 95 °C for 5 s, 60 °C of the annealing temperature for 31 s. A melting curve was constructed for each primer pair to validate the presence of the specific PCR products. The actin gene was used as the reference gene for normalization to make a comparison of the expression levels of mRNA between the samples. The relative fold change in transcription by using qRT-PCR was evaluated and compared with RNA-seq analysis.

229

2.8. Putative molecular markers and Open Reading Frame (ORF)

216 217 218 219 220 221 222 223 224 225 226 227

230 231 232 233 234 235 236 237

MISA (Microsatellite) 1.0 tool (http://pgrc.ipk-gatersleben.de/ misa/) was used to detect simple sequence repeat (SSR) markers which show high levels of polymorphism for all unigenes. Open Reading Frame (ORF) sequences were predicted among contigs by using the EMBOSS-getorf tool (http://emboss.bioinformatics. nl/cgi-bin/emboss/getorf). The parameters were set as follows: the longest ORF length of 1,000,000 bp and the minimum ORF length of 300 bp or longer.

238

3. Results and discussion

239

3.1. Illumina sequencing and de novo transcriptome assembly

240

A total of 85,429,336 reads (85.4 Mb) and 77,483,512 reads (77.5 Mb) of 100 bp paired end were generated and the Q20 (the sequencing quality values that corresponds to 1% chance of error) were 97.33% and 96.42%, respectively. After the removal of adapters and trimming of low-quality and ambiguous reads, 73,692,388 and 69,400,726 high-quality and clean reads were obtained and accounting for 86.26% and 89.57% of total raw reads, respectively. De novo assembly was carried out by using CLC Genomics Workbench v6.0.4 (CLC Bio, Denmark), the reads were assembled into 70,542 contigs, an N50 (the length for which the collection of all contigs of that length or longer contains at least half of the sum of the lengths of all contigs) of 946 bp, with an average length of 725 nucleotides and longest sequence of 14,363 bp. Then, the contigs were further assembled into 55,760 scaffolds with an N50 of 1238 bp, an average length of 924 bp. After clustering using the software CAP3 (Huang and Madan,

241 242 243 244 245 246 247 248 249 250 251 252 253 254 255

Table 1 Length distribution of the assembled final unigenes of Illumina sequences of SGs. Unigene length (bp)

Unigene counts (number)

Unigene length (bp)

Unigene counts (number)

200–300 400–500 600–700 800–900 1000–1100 1200–1300 1400–1500 1600–1700 1800–1900 >2000

3 11,018 4274 2463 1440 1070 805 638 523 5078

300–400 500–600 700–800 900–1000 1100–1200 1300–1400 1500–1600 1700–1800 1900–2000 Total

11,651 6587 3189 1736 1262 925 688 579 428 54,357

Table 2 Statistical summary of the SGs transcriptome for assembling. Statistics

Counts (number)

Total length (bp)

N50 (bp)

Average length (bp)

Longest length (bp)

GC (%)

Contig Scaffold Unigene

70,542 55,760 54,357

51,150,302 51,542,524 51,293,691

946 1238 1287

725 924 943

14,363 16,560 16,560

49.40 49.40 49.40

1999), we obtained 54,357 unigenes with an N50 of 1287 bp, a mean length of 859 bp and varying in size from 200 bp to 16,560 bp. The unigenes size distribution is shown in Table 1. The results of the assembly are listed in Table 2, and a detailed description of the sequencing is provided in Supplemental Table S2.

256

3.2. Function annotation of SG transcripts

262

All 54,357 unigenes from Illumina assemblies were functionally annotated by searching the NCBI NR and Swiss-Prot protein database in GenBank with E-value < 1e5. In total, 17,813 unigenes (accounting for 32.77% of all the assembled unigenes) were aligned to NR database, and 36,544 sequences did not have BlastX hits. Of all unigenes, 37.06% (n = 20,145) had significant similarities to proteins in the Swiss-Prot database and the remaining 34,212 unigenes were unknown (Table 3), indicating that many new genes and non-coding RNA sequences were obtained. These unigenes provided a valuable resource for further identification of genes related to sialoprotein of H. flava. Of these unigenes, 47% (n = 9640) had a similarity of between 60% to 100% and 3520 unigenes, which showed very strong homology, had an E-value of less than 1.0E200. Based on UniProtKB databases, most unigenes were matched with sequences from three top-hit species: Rhipicephalus pulchellus, Ixodes scapularis, Amblyomma maculatum. A lot of hits (9461 unigenes) matched with Rhipicephalus more than with other species of Haemaphysalis, this could be because there is less data on Haemaphysalis to load into the NCBI database.

263

3.3. Classification of assembled contigs

282

Using Blast2GO analysis, a total of 13,513 of 20,145 annotated sequences (Swiss-Prot database) were assigned to biological process (BP, 28,506 sequences), cellular components (CC, 11,598

283

Table 3 Results of blasting against the NCBI NR and Swiss-Prot protein database. Database

Total unigene

No. sequence with hits

No. unknown sequence

Percentage of annotation (%)

(NR) database Swiss-Prot

54,357 54,357

17,813 20,145

36,544 34,212

32.77 37.06

Please cite this article in press as: Xu, X.-L., et al. De novo sequencing, assembly and analysis of salivary gland transcriptome of Haemaphysalis flava and identification of sialoprotein genes. Infect. Genet. Evol. (2015), http://dx.doi.org/10.1016/j.meegid.2015.03.010

257 258 259 260 261

264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281

284 285

MEEGID 2274

No. of Pages 8, Model 5G

17 March 2015 4 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301

X.-L. Xu et al. / Infection, Genetics and Evolution xxx (2015) xxx–xxx

sequences) and molecular functions (MF, 19,422 sequences) (Table 4). For biological process, the majority of unigenes were involved in ‘‘metabolic’’ and ‘‘cellular process’’, indicating genes involved in basic processes for the survival of H. flava. A high percentage of genes were classified as ‘‘cell’’ and ‘‘cell part’’ under the cellular components category. In addition, four unigenes were assigned for virion particle, which could be explained by the presence of the virus in the saliva of tick. Among molecular functions, ‘‘catalytic activity’’ (44.36%) were found to be the most abundant class, indicating an important metabolic activity in SG of H. flava. KOG classification of all the unigenes indicated that 9739 unigenes were functionally classified into 25 categories (Fig. 1). The largest proportion of unigenes focused on ‘‘signal transport mechanisms’’ (1500 unigenes, accounting for 13.72%), the second-largest number of unigenes was assigned to the cluster of ‘‘general function prediction only’’ (1351, 12.36%), followed by

‘‘posttranslational modification, protein turnover and chaperones’’ (1117, 10.21%), and the smallest groups was qualified under the class of ‘‘cell motility’’ (13, 0.11%).

302

3.4. Comparative analysis between the two SG libraries

305

To identify the differentially expressed genes (DEGs) in ET and SET, we used RPKM algorithm to compare the differences in gene expression. Based on RNA-Seq (quantification) analysis, in total 5295 unigenes were differentially expressed between two samples, among which the number of unigenes with up-regulated expression was 2260, while the number was 3035 in ET compared with SET (the full lists of DEGs can be seen in Table S3). The majority of DEGs (1814, 34.25%) showed 2- to 5-fold change and the second-largest number of DEGs (1454, 27.45%) showed over 50-fold change in expression level between two samples (Fig. 2).

306

Table 4 Gene Ontology analysis of SGs transcriptome data. GO namespace

GO ID

Term

Unigene counts

Level

Biological process Biological process Biological process Biological process Biological process Biological process Biological process Biological process Biological process Biological process Biological process Biological process Biological process Biological process Biological process Biological process Biological process Biological process Biological process Biological process Biological process Biological process Biological process Biological process Cellular components Cellular components Cellular components Cellular components Cellular components Cellular components Cellular components Cellular components Cellular components Cellular components Cellular components Cellular components Cellular components Cellular components Cellular components Cellular components Cellular components Molecular functions Molecular functions Molecular functions Molecular functions Molecular functions Molecular functions Molecular functions Molecular functions Molecular functions Molecular functions Molecular functions Molecular functions Molecular functions Molecular functions

GO:0008152 GO:0040007 GO:0032502 GO:0050896 GO:0022414 GO:0001906 GO:0048519 GO:0032501 GO:0002376 GO:0065009 GO:0065007 GO:0050789 GO:0051704 GO:0000003 GO:0009987 GO:0051179 GO:0044699 GO:0044092 GO:0022610 GO:0040011 GO:0048518 GO:0044093 GO:0023052 GO:0051234 GO:0043226 GO:0044423 GO:0005623 GO:0031974 GO:0044464 GO:0030054 GO:0045202 GO:0044456 GO:0019012 GO:0031012 GO:0044422 GO:0032991 GO:0044421 GO:0044420 GO:0044425 GO:0005576 GO:0016020 GO:0000988 GO:0004872 GO:0001071 GO:0005488 GO:0005215 GO:0016247 GO:0016530 GO:0031386 GO:0060089 GO:0003824 GO:0030234 GO:0005198 GO:0009055 GO:0016209

Metabolic process Growth Developmental process Response to stimulus Reproductive process Cell killing Negative regulation of biological process Multicellular organismal process Immune system process Regulation of molecular function Biological regulation Regulation of biological process Multi-organism process Reproduction Cellular process Localization Single-organism process Negative regulation of molecular function Biological adhesion Locomotion Positive regulation of biological process Positive regulation of molecular function Signaling Establishment of localization Organelle Virion part Cell Membrane-enclosed lumen Cell part Cell junction Synapse Synapse part Virion Extracellular matrix Organelle part Macromolecular complex Extracellular region part Extracellular matrix part Membrane part Extracellular region Membrane Protein binding transcription factor activity Receptor activity Nucleic acid binding transcription factor activity Binding Transporter activity Channel regulator activity Metallochaperone activity Protein tag Molecular transducer activity Catalytic activity Enzyme regulator activity Structural molecule activity Electron carrier activity Antioxidant activity

9574 13 80 1009 15 1 79 97 18 527 1621 1599 88 15 7830 1108 2448 295 91 7 46 91 761 1093 1368 4 2748 116 2748 30 25 24 4 60 551 776 82 26 1142 299 1595 44 236 203 8545 628 7 2 1 264 8616 477 185 167 47

2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2

Please cite this article in press as: Xu, X.-L., et al. De novo sequencing, assembly and analysis of salivary gland transcriptome of Haemaphysalis flava and identification of sialoprotein genes. Infect. Genet. Evol. (2015), http://dx.doi.org/10.1016/j.meegid.2015.03.010

303 304

307 308 309 310 311 312 313 314 315

MEEGID 2274

No. of Pages 8, Model 5G

17 March 2015 5

X.-L. Xu et al. / Infection, Genetics and Evolution xxx (2015) xxx–xxx

Fig. 1. KOG functional classification of transcriptome. A total of 9739 unigenes were functionally classified into 25 categories.

Fig. 2. The fold change distribution of up- and down-regulated DEGs. Green bars refers to down-regulated DEGs in ET compared with SET and red bars refers to up-regulated DEGs in ET compared with SET. The X axis shows fold change of DEGs and the Y axis number of DEGs. DEG: differentially expressed genes. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Table 5 Blast results of confirmation experiment. No.

Gene name

Primer sequence (50 –30 )

Size (bp)

Top hit descriptions

Similarity (%)

A

Enolase

F: CTCACCGAGAAAGGCTTATT R: GAGGCTTCCAATCTGGTTC

981

Ornithodoros moubata enolase GU594041.1

75

B

Longipain

F: CTCAATACCACTTGGACGG R: CGACACCGAGTAGGACTTCT

597

Haemaphysalis longicornis AB255051.1

92

C

Actin

F: ACGAGACCACCTACAACTCC R: AGGGAACACTTGATTTGACC

572

Rhipicephalus appendiculatus actin AY254899.1

96

D

Serine protease Inhibitor

F: AGGACCTGAACGCTGAGAA R: GCGAGCACTAAACACATCG

656

Rhipicephalus microplus serine protease inhibitor KC990106.1

80

E

Calreticulin

F: CATCTTCAACTACAAGGGCA R: CCTCGTCGTCTTTCTTCTCTT

737

Haemaphysalis qinghaiensis calreticulin AY962875.1

98

316

3.5. Functional enrichment analysis of DEGs

317

In our study, through KEGG, 14,280 unigenes (among them, 2180 contig sequences predicted to encode enzymes with enzyme commission numbers) were assigned to 279 KEGG pathways in total. The first three major pathways were ‘‘metabolic pathways’’

318 319 320

(1191 unigenes, 8.34% of sequences), ‘‘biosynthesis of secondary metabolites’’ (328 unigenes, 2.29% of sequences) and ‘‘RNA transport’’ (303 unigenes, 2.21% of sequences), respectively. To better understand the function of DEGs, GO functional and KEGG pathway enrichment analysis was performed when P value was less than 0.05. The results indicated that the up- and down-regulated genes

Please cite this article in press as: Xu, X.-L., et al. De novo sequencing, assembly and analysis of salivary gland transcriptome of Haemaphysalis flava and identification of sialoprotein genes. Infect. Genet. Evol. (2015), http://dx.doi.org/10.1016/j.meegid.2015.03.010

321 322 323 324 325 326

MEEGID 2274

No. of Pages 8, Model 5G

17 March 2015 6

X.-L. Xu et al. / Infection, Genetics and Evolution xxx (2015) xxx–xxx

Fig. 3. qRT-PCR validation of the differentially expressed genes analyzed by RNA-seq. qRT-PCR was perform for nine genes that are identified as differential expressed genes between SET and ET. The expression level of each gene was normalized to the level in SET. The Y axis shows the relative mRNA expression levels. SET: semi-engorged ticks, ET: engorged ticks. ⁄P < 0.05, ⁄⁄P < 0.01.

329

were significantly enriched in 712 GO categories (shown in Table S4). A total of 240 pathways were found to be enriched with DEGs (Table S5).

330

3.6. SSR analysis and ORF prediction

331

A total of 13,678 SSRs (or microsatellites) were detected by utilizing the MISA (Microsatellite) software. Of these, 1189 SSRs (8.69%) were hybrid nucleotide repeats, 8594 SSRs (62.83%) were mononucleotide repeats, 1756 SSRs (12.83%) were dinucleotide repeats, and 1928 SSRs (14.09%) were trinucleotide repeats. In addition, 7182 ORFs were predicted. NGS data facilitate the discovery of the SSR molecular marks and ORF. However, primers need to be designed for validation of the predicted SSR and ORF results.

327 328

332 333 334 335 336 337 338

339 340 341 342 343 344 345 346

transcripts encoded for secreted proteins (other proteins concluding housekeeping proteins, transposable elements, pathogens and unknown proteins). The secreted proteins of some ticks have been investigated (Mans et al., 2008; Aljamali et al., 2009b; Ribeiro et al., 2011, 2012). Like other blood feeders, ticks produce various secreted proteins to participate in the digestion of the host blood meal or to modulate host defenses in order to maintain their blood feeding capacity. Among these modulatory molecules, cysteine (Yamaji et al., 2013), longipain (Tsuji et al., 2008), 4D8 (de la Fuente et al., 2006), calreticulin (Kaewhom et al., 2008), metalloproteases (Ali et al., 2014), serine protease inhibitor (Tirloni et al., 2014), enolase (Díaz-Martín et al., 2013), heat shock proteins (Tian et al., 2011) and AV422 (Mulenga et al., 2013) are vital for tick survival and feeding. These genes of interest were analyzed by PCR and qRT-PCR to confirm the RNA-seq results.

347

3.7. Genes encoding secreted proteins that mediate hematophagy and blood ingestion

3.8. Experimental confirmation by PCR

362

NGS technology with bioinformatics as a high throughput sequencing approach (Wang et al., 2009; Van Verk et al., 2013; Metzker, 2009), has replaced Sanger DNA sequencing techniques. This de novo assembly of the SG transcriptome of H. flava, provide us with valuable raw material for identification and analysis of tick SG proteins. We concentrated on the most significant sequenced

The quality of assembled cDNA contigs was evaluated for five target genes (enolase, longipain, actin, serine protease inhibitor and calreticulin) by PCR amplification using primers designed from Illumina sequences. Fragments of expected sizes were obtained (Fig. S1). Resulting sequences confirmed the unigene data and were submitted to the BLAST database (Table 5).

363

Please cite this article in press as: Xu, X.-L., et al. De novo sequencing, assembly and analysis of salivary gland transcriptome of Haemaphysalis flava and identification of sialoprotein genes. Infect. Genet. Evol. (2015), http://dx.doi.org/10.1016/j.meegid.2015.03.010

348 349 350 351 352 353 354 355 356 357 358 359 360 361

364 365 366 367 368

MEEGID 2274

No. of Pages 8, Model 5G

17 March 2015 X.-L. Xu et al. / Infection, Genetics and Evolution xxx (2015) xxx–xxx 369 370

3.9. Quantitative expression and comparative analysis for genes associated with hematophagy and blood ingestion

407

To experimentally confirm the reliability of RNA-seq analysis, expression levels of candidate genes were validated in each sample by using qRT-PCR. These included nine unigenes, which were five up-regulated unigenes (contig 3524, 4D8; contig 35430, cysteine protease; contig 10961, serine protease inhibitor; contig 2959, AV422; contig 4730, longipain), four down-regulated unigenes (contig 5711, calreticulin; contig 29153, metalloprotease; contig 2263, enolase; contig 949, heat shock protein) in the ET compared with SET. Fold change in expression of nine unigenes was evaluated and compared with RNA-seq analysis. As shown in Fig. 3, the qRT-PCR results confirmed that all but the longipain gene (contig 4730) showed a similar trend in transcriptional changes with the results from the RNA-Seq, but the fold change in expression for most unigenes was a magnitude higher in the RNA-seq analysis. The discrepancy between the two methods for longipain gene might be related to a small sample size represented in our data. The results were similar to one other report (Yates et al., 2014), where three out of four genes matched the RNA-seq results when analyzed by using qRT-PCR. Comparison of transcriptomes between ET and SET contributes to the better understanding of the change of secreted proteins genes that mediate hematophagy and blood ingestion in different blood-feeding phase. Previous studies revealed that cysteine (Yamaji et al., 2013), 4D8 (de la Fuente et al., 2006) might contribute to tick blood ingestion. Calreticulin (Kaewhom et al., 2008), metalloproteases (Ali et al., 2014), and enolase (DíazMartín et al., 2013) have been considered essential for tick blood feeding. In our study, in ET, cysteine (contig 35,430), 4D8 (contig 3524) and AV422 (contig 2959) had a higher expression as opposed to SET, the contigs that annotated for metalloprotease (contig 29,153), calreticulin (contig 5711), enolase (contig 2263) were significantly expressed in SET (Fig. 3). These results showed that salivary proteins of ET may play important role in the digestion of the host blood meal, and most salivary proteins may be involved in modulate host defenses in order to maintain blood flow in SET. There were quantitative and qualitative differences significantly in saliva constituent of H. flava at different feeding stages.

408

4. Conclusion

409

The transcriptomic approach provides opportunities to identify pharmacologically active proteins and gives further insights into the tick-host interactions and tick SG physiology. This is the first characterization of the de novo sialotranscriptome of adult female H. flava performed by NGS technology and bioinformatics. A total of 70,542 contigs, 55,760 scaffolds and 54,357 unigenes were obtained. All unigenes from Illumina assemblies were functionally annotated by searching the NCBI NR and Swiss-Prot protein database in GenBank. Comparison in expression level between two samples identified 5295 differentially expressed unigenes. Several candidate genes involved in mediating hematophagy and blood ingestion, were identified further with qRT-PCR. Genes (metalloprotease, enolase, calreticulin) involved in modulating host defenses in order to maintain blood flow were over expressed in SET, whereas genes (4D8, cysteine) participated in the digestion of the host blood meal were over expressed in ET. Taken together, these results help identify important genes that mediate hematophagy and uncover tick’s biological responses well during the feeding period, as well as provide a valuable molecular basis for the analysis of the mechanisms associated with the tick-host relationship.

371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406

410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429

7

Author contributions

430

Conceived and designed the experiments: T.C., X.X. Performed the experiments: X.X., F.Y., Y.Y. Analyzed the data: X.X. Contributed reagents/materials/analysis tools: X.X., F.Y., Y.Y. Wrote the paper: X.X., T.C. All authors have read and approved the final manuscript.

431

Acknowledgements

436

The authors are grateful to the Shanghai Bohao Biotechnology Corporation for technical assistance in data processing and bioinformatics analysis. This study was financially supported by grant from the National Natural Science Foundation of China (No. 31372431). We also thank Zhi-Hui Liao and Hao Tang for their help in sampling.

437

Appendix A. Supplementary data

443

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.meegid.2015.03. 010.

444

References

447

Ali, A., Tirloni, L., Isezaki, M., Seixas, A., Konnai, S., Ohashi, K., da Silva Vaz Junior, I., Termignoni, C., 2014. Reprolysin metalloproteases from Ixodes persulcatus, Rhipicephalus sanguineus and Rhipicephalus microplus ticks. Exp. Appl. Acarol., 1– 20 Aljamali, M.N., Hern, L., Kupfer, D., Downard, S., So, S., Roe, B.A., Sauer, J.R., Essenberg, R.C., 2009a. Transcriptome analysis of the salivary glands of the female tick Amblyomma americanum (Acari: Ixodidae). Insect Mol. Biol. 18 (2), 129–154. Aljamali, M.N., Ramakrishnan, V.G., Weng, H., Tucker, J.S., Sauer, J.R., Essenberg, R.C., 2009b. Microarray analysis of gene expression changes in feeding female and male lone star ticks, Amblyomma americanum (L). Arch. Insect Biochem. Physiol. 71 (4), 236–253. Binnington, K.C., 1978. Sequential changes in salivary gland structure during attachment and feeding of the cattle tick Boophilus microplus. Int. J. Parasitol 8 (2), 97–115. Bonnet, S., Michelet, L., Moutailler, S., Cheval, J., Hébert, C., Vayssier-Taussat, M., Eloit, M., 2014. Identification of parasitic communities within European ticks using next-generation sequencing. PLoS Negl. Trop. Dis. 8 (3), e2753. Budachetri, K., Browning, R.E., Adamson, S.W., Dowd, S.E., Chao, C.C., Ching, W.M., Karim, S., 2014. An insight into the microbiome of the Amblyomma maculatum (Acari: Ixodidae). J. Med. Entomol. 51 (1), 119–129. Carpi, G., Cagnacci, F., Wittekindt, N.E., Zhao, F., Qi, J., Tomsho, L.P., Drautz, D.I., Rizzoli, A., Schuster, S.C., 2011. Metagenomic profile of the bacterial communities associated with Ixodes ricinus ticks. PLoS One 6 (10), e25604. Champagne, D.E., 2006. Antihemostatic molecules from saliva of blood-feeding arthropods. Pathophysiol. Haemost. Thromb. 34 (4–5), 221–227. Chmelar, J., Anderson, J.M., Mu, J., Jochim, R.C., Valenzuela, J.G., Kopecky´, J., 2008. Insight into the sialome of the castor bean tick Ixodes ricinus. BMC Genomics 9 (1), 233. Chmelar, J., Calvo, E., Pedra, J.H., Francischetti, I., Kotsyfakis, M., 2012. Tick salivary secretion as a source of antihemostatics. J. Proteomics 75 (13), 3842–3854. Chmelar, J., Oliveira, C.J., Rezacova, P., Francischetti, I.M., Kovarova, Z., Pejler, G., Kopacek, P., Ribeiro, J.M., Mares, M., Kopecky, J., Kotsyfakis, M., 2011. A tick salivary protein targets cathepsin G and chymase and inhibits host inflammation and platelet aggregation. Blood 117 (2), 736–744. de la Fuente, J., Almazán, C., Blas-Machado, U., Naranjo, V., Mangold, A.J., Blouin, E.F., Gortazar, C., Kocan, K.M., 2006. The tick protective antigen, 4D8, is a conserved protein involved in modulation of tick blood ingestion and reproduction. Vaccine 24 (19), 4082–4095. Díaz-Martín, V., Manzano-Román, R., Oleaga, A., Encinas-Grandes, A., PérezSánchez, R., 2013. Cloning and characterization of a plasminogen-binding enolase from the saliva of the argasid tick Ornithodoros moubata. Vet. Parasitol. 191 (3), 301–314. Francischetti, I., 2010. Platelet aggregation inhibitors from hematophagous animals. Toxicon 56 (7), 1130–1144. Francischetti, I.M., Sá-Nunes, A., Mans, B.J., Santos, I.M., Ribeiro, J.M., 2009. The role of saliva in tick feeding. Front. Biosci. 14, 2051. Francischetti, I., Anderson, J.M., Manoukis, N., Pham, V.M., Ribeiro, J., 2011. An insight into the sialotranscriptome and proteome of the coarse bontlegged tick Hyalomma marginatum rufipes. J. Proteomics 74 (12), 2892–2908.

448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497

Please cite this article in press as: Xu, X.-L., et al. De novo sequencing, assembly and analysis of salivary gland transcriptome of Haemaphysalis flava and identification of sialoprotein genes. Infect. Genet. Evol. (2015), http://dx.doi.org/10.1016/j.meegid.2015.03.010

432 433 434 435

438 439 440 441 442

445 446

MEEGID 2274

No. of Pages 8, Model 5G

17 March 2015 8 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544

X.-L. Xu et al. / Infection, Genetics and Evolution xxx (2015) xxx–xxx

Huang, X., Madan, A., 1999. CAP3: a DNA sequence assembly program. Genome Res. 9 (9), 868–877. Hue, F., Langeroudi, A.G., Barbour, A.G., 2013. Chromosome sequence of Borrelia miyamotoi, an uncultivable tick-borne agent of human infection. Genome Announc. 1 (5), e00713-13. Kaewhom, P., Stich, R.W., Needham, G.R., Jittapalapong, S., 2008. Molecular analysis of calreticulin expressed in salivary glands of Rhipicephalus (Boophilus) microplus indigenous to Thailand. Ann. N.Y. Acad. Sci. 1149 (1), 53–57. Kaewkong, W., Intapan, P.M., Sanpool, O., Janwan, P., Thanchomnang, T., Kongklieng, A., Tantrawatpan, C., Boonmars, T., Lulitanond, V., Taweethavonsawat, P., Chungpivat, S., Maleewong, W., 2014. High throughput pyrosequencing technology for molecular differential detection of Babesia vogeli, Hepatozoon canis, Ehrlichia canis and Anaplasma platys in canine blood samples. Ticks Tick Borne Dis. 5 (4), 381–385. Karim, S., Singh, P., Ribeiro, J.M., 2011. A deep insight into the sialotranscriptome of the gulf coast tick Amblyomma maculatum. PLoS One 6 (12), e28525. Kazimírová, M., Štibrániová, I., 2013. Tick salivary compounds: their role in modulation of host defences and pathogen transmission. Front. Cell. Infect. Microbiol. 3. Mans, B.J., Andersen, J.F., Schwan, T.G., Ribeiro, J., 2008. Characterization of antihemostatic factors in the argasid, Argas monolakensis: implications for the evolution of blood-feeding in the soft tick family. Insect Biochem. Mol. Biol. 38 (1), 22–41. Maritz-Olivier, C., Stutzer, C., Jongejan, F., Neitz, A.W., Gaspar, A.R., 2007. Tick antihemostatics: targets for future vaccines and therapeutics. Trends Parasitol. 23 (9), 397–407. McSwain, J.L., Essenberg, R.C., Sauer, J.R., 1982. Protein changes in the salivary glands of the female lone star tick, Amblyomma americanum, during feeding. J. Parasitol., 100–106 Metzker, M.L., 2009. Sequencing technologies – the next generation. Nat. Rev. Genet. 11 (1), 31–46. Mulenga, A., Kim, T.K., Ibelli, A.M.G., 2013. Deorphanization and target validation of cross-tick species conserved novel Amblyomma americanum tick saliva protein. Int. J. Parasitol. 43 (6), 439–451. Nuttall, P.A., Trimnell, A.R., Kazimirova, M., Labuda, M., 2006. Exposed and concealed antigens as vaccine targets for controlling ticks and tick-borne diseases. Parasite Immunol. 28 (4), 155–163. Ozawa, A., Yamaguchi, N., Hayakawa, K., Matsuo, I., Niizuma, K., Ohkido, M., 1982. A case of tick bite (Haemaphysalis flava) – consideration of tularemia infection through tick bite. Nihon Hifuka Gakkai Zasshi 92 (13), 1415–1421. Ribeiro, J.M.C., Labruna, M.B., Mans, B.J., Maruyama, S.R., Francischetti, I., Barizon, G.C., de Miranda Santos, I.K., 2012. The sialotranscriptome of Antricola delacruzi female ticks is compatible with non-hematophagous behavior and an alternative source of food. Insect Biochem. Mol. Biol. 42 (5), 332–342. Ribeiro, J.M., Anderson, J.M., Manoukis, N.C., Meng, Z., Francischetti, I.M., 2011. A further insight into the sialome of the tropical bont tick Amblyomma variegatum. BMC Genomics 12 (1), 136.

Schwarz, A., Tenzer, S., Hackenberg, M., Erhart, J., Gerhold-Ay, A., Mazur, J., Kuharev, J., Ribeiro, J.M., Kotsyfakis, M., 2014. A systems level analysis reveals transcriptomic and proteomic complexity in Ixodes ricinus midgut and salivary glands during early attachment and feeding. Mol. Cell. Proteomics 13 (10), 2725–2735. Schwarz, A., von Reumont, B.M., Erhart, J., Chagas, A.C., Ribeiro, J.M., Kotsyfakis, M., 2013. De novo Ixodes ricinus salivary gland transcriptome analysis using two next-generation sequencing methodologies. FASEB J. 27 (12), 4745–4756. Tian, Z., Liu, G., Zhang, L., Yin, H., Wang, H., Xie, J., Zhang, P., Luo, J., 2011. Identification of the heat shock protein 70 (HLHsp70) in Haemaphysalis longicornis. Vet. Parasitol. 181 (2), 282–290. Tirloni, L., Seixas, A., Mulenga, A., Vazlda Jr., S., Termignoni, C., 2014. A family of serine protease inhibitors (serpins) in the cattle tick Rhipicephalus (Boophilus) microplus. Exp. Parasitol. 137, 25–34. Titus, R.G., Bishop, J.V., Mejia, J.S., 2006. The immunomodulatory factors of arthropod saliva and the potential for these factors to serve as vaccine targets to prevent pathogen transmission. Parasite Immunol. 28 (4), 131–141. Tsuji, N., Miyoshi, T., Battsetseg, B., Matsuo, T., Xuan, X., Fujisaki, K., 2008. A cysteine protease is critical for Babesia spp. transmission in Haemaphysalis ticks. PLoS Pathog. 4 (5), e1000062. Van Verk, M.C., Hickman, R., Pieterse, C.M., Van Wees, S., 2013. RNA-Seq: revelation of the messengers. Trends Plant Sci. 18 (4), 175–179. Vayssier-Taussat, M., Moutailler, S., Michelet, L., Devillers, E., Bonnet, S., Cheval, J., Hébert, C., Eloit, M., 2013. Next generation sequencing uncovers unexpected bacterial pathogens in ticks in Western Europe. PLoS One 8 (11), e81439. Wang, L., Feng, Z., Wang, X., Wang, X., Zhang, X., 2010. DEGseq: an R package for identifying differentially expressed genes from RNA-seq data. Bioinformatics 26 (1), 136–138. Wang, Z., Gerstein, M., Snyder, M., 2009. RNA-Seq: a revolutionary tool for transcriptomics. Nat. Rev. Genet. 10 (1), 57–63. Williams-Newkirk, A.J., Rowe, L.A., Mixson-Hayden, T.R., Dasch, G.A., 2012. Presence, genetic variability, and potential significance of ‘‘Candidatus Midichloria mitochondrii’’ in the lone star tick Amblyomma americanum. Exp. Appl. Acarol. 58 (3), 291–300. Yamaji, K., Miyoshi, T., Hatta, T., Matsubayashi, M., Alim, M.A., Kushibiki, S., Fujisaki, K., Tsuji, N., 2013. HlCPL-A, a cathepsin L-like cysteine protease from the ixodid tick Haemaphysalis longicornis, modulated midgut proteolytic enzymes and their inhibitors during blood meal digestion. Infect. Genet. Evol. 16, 206–211. Yates, S.A., Swain, M.T., Hegarty, M.J., Chernukin, I., Lowe, M., Allison, G.G., Ruttink, T., Abberton, M.T., Jenkins, G., Skøt, L., 2014. De novo assembly of red clover transcriptome based on RNA-Seq data provides insight into drought response, gene discovery and marker identification. BMC Genomics 15 (1), 453. Zhang, H., Wang, X.H., Fan, W.F., Yuan, M., 2010. Review of parasitosis of gaint panda. Gansu. Anim. Vet. Sci. 212, 40–43.

Please cite this article in press as: Xu, X.-L., et al. De novo sequencing, assembly and analysis of salivary gland transcriptome of Haemaphysalis flava and identification of sialoprotein genes. Infect. Genet. Evol. (2015), http://dx.doi.org/10.1016/j.meegid.2015.03.010

545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589

De novo sequencing, assembly and analysis of salivary gland transcriptome of Haemaphysalis flava and identification of sialoprotein genes.

Saliva plays an important role in feeding and pathogen transmission, identification and analysis of tick salivary gland (SG) proteins is considered as...
1MB Sizes 8 Downloads 13 Views