J Appl Genetics DOI 10.1007/s13353-015-0275-8

ANIMAL GENETICS • ORIGINAL PAPER

Genome-wide identification of allele-specific expression in response to Streptococcus suis 2 infection in two differentially susceptible pig breeds Huayu Wu & Uma Gaur & Supamit Mekchay & Xianwen Peng & Lianghua Li & Hua Sun & Zhongxu Song & Binke Dong & Mingbo Li & Klaus Wimmers & Siriluck Ponsuksili & Kui Li & Shuqi Mei & Guisheng Liu

Received: 11 March 2014 / Revised: 15 January 2015 / Accepted: 10 February 2015 # Institute of Plant Genetics, Polish Academy of Sciences, Poznan 2015

Abstract Although allele expression imbalance has been recognized in many species, and strongly linked to diseases, no whole transcriptome allele imbalance has been detected in pigs during pathogen infections. The pathogen Streptococcus suis 2 (SS2) causes serious zoonotic disease. Different pig breeds show differential susceptibility/resistance to pathogen infection, but the biological insight is little known. Here we analyzed allele-specific expression (ASE) using the spleen transcriptome of four pigs belonging to two phenotypically different breeds after SS2 infection. The comparative analysis of allele specific SNPs between control and infected animals revealed 882 and 1096 statistically significant differentially expressed allele SNPs (criteria: ratio≧2 or ≦0.5) in Landrace and Enshi black pig, respectively. Twenty nine allelically imbalanced SNPs were further verified by Sanger sequencing, and later six SNPs were quantified by pyrosequencing assay. The pyrosequencing results are in agreement with the RNAseq results, except two SNPs. Looking at the role of ASE in

predisposition to diseases, the discovery of causative variants by ASE analysis might help the pig industry in long term to design breeding programs for improving SS2 resistance. Keywords Allele specific expression . Disease resistance . Pig . SNP . Streptococcus suis

Background Although a gene contains two alleles in a diploid genome, under the effect of regulatory factors, one allele is expressed preferentially over the other, which is termed allele-specific expression (ASE). ASE may accumulate with genetic divergence and possibly with adaptation to different environments and are responsive to dynamic developmental processes (Von

Communicated by: Maciej Szydlowski Huayu Wu and Uma Gaur contributed equally to this work. H. Wu : U. Gaur : X. Peng : L. Li : H. Sun : Z. Song : B. Dong : M. Li : S. Mei : G. Liu Institute of Animal Science and Veterinary Medicine, Hubei Academy of Agricultural Sciences, Yaoyuan No. 1, Nanhu, Hongshan District Wuhan 430064, China H. Wu : U. Gaur : X. Peng : L. Li : H. Sun : B. Dong : S. Mei : G. Liu (*) Hubei Key Laboratory of Animal Embryo Engineering and Molecular Breeding, Yaoyuan No. 1, Nanhu, Hongshan District Wuhan 430064, China e-mail: [email protected] S. Mekchay Department of Animal Science, Faculty of Agriculture Chiang Mai University, Chiang Mai 50200, Thailand

K. Wimmers Leibniz Institute for Farm Animals Biology (FBN), Research Unit ‘Molecular Biology’, Wilhelm-Stahl-Allee 2, Dummerstorf 18196, Germany

S. Ponsuksili Leibniz Institute for Farm Animals Biology (FBN), Research Group ‘Functional Genome Analysis’, Wilhelm-Stahl-Allee 2, Dummerstorf 18196, Germany

K. Li Beijing Institute of Animal Science and Veterinary Medicine, Chinese Academy of Agricultural Sciences, Malianwa, Haidian District Beijing 100193, China

J Appl Genetics Table 1 Mapping of trimmed RNA-seq reads to the pig genome

Sample

Sequencing direction

Clean reads

Total mapped reads

Uniquely mapped reads

Multiple mapped reads

LC*

End 1 End 2 End 1 End 2 End 1 End 2 End 1 End 2

37342142 36542439 36675619 35911390 36569367 35084779 38774996 37990564

28713927 27632166 26580528 21200095 27938773 26197728 29582853 28560488

24724329 23832555 22961773 18351852 24005137 22633950 25688340 24867196

3989598 3799611 3618755 2848243 3933636 3563778 3894513 3693292

LI EC Notes*: LC- Landrace control; LI- Landrace infected; EC- Enshi black control, EI- Enshi black infected

EI

Korff et al. 2009). ASE is invoked and modulated by both genetic and epigenetic changes, affecting the underlying DNA sequence or chromatin of each allele respectively (Gaur et al. 2013). The genome wide expression quantitative trait loci (eQTL) analysis is used as an initial step to define the functions, but ASE has gained much importance only in the recent past (Almlöf et al. 2012). The comparison of ASE analysis with traditional e-QTL mapping in human primary monocytes supported the fact that ASE is more sensitive for detecting cis-regulatory SNPs (cis-rSNPs) than standard eQTL mapping approach. In a recent study by Frésard et al. (2014), the parental-allele-specific differential expression is scanned in the chicken genome. Their results confirmed the absence of imprinting in chicken embryos at embryonic day 4.5. A substantial number of studies have recently been carried out to identify associations between SNPs and complex disease. ASE analysis has been used to understand the mechanism of differential Marek virus resistance in poultry (Perumbakkam et al. 2013; MacEachern et al. (2011a, b). These results suggest that ASE screens are a simple and powerful approach to identify genetic elements and specific alleles for genetic resistance to complex traits, especially those that involve two-state situations in pathogen challenge experiments where parental alleles can be tracked by SNPs or other polymorphisms (Meydan et al. 2011).

Table 2

Streptococcus suis (S. suis) is a major zoonotic pathogen in many countries causing meningitis, septicaemia, endocarditis, arthritis, and septic shock with high mortality. There are 35 serotypes of S. suis, of which serotype 2 (SS2) is the most prevalent (Gottschalk et al. 2010). There have been many SS2 outbreaks in humans worldwide (China, Thailand, United Kingdom, Netherlands, Australia, and United states (Watkins et al. 2001; Van de Beek et al. 2008; Tramontana et al. 2008; Smith et al. 2008; Chen et al. 2007). Previous studies of the SS2 pathogenesis mainly focused on the virulence factors. Whole transcriptome studies have been performed in pigs with response to the SS2 infection, and a number of differentially expressed transcripts are identified which are involved in the immune response including genes related to the inflammatory response, innate immune response, cell adhesion, antigen processing and presentation, cytokines and angiogenesis (Li et al. 2010 and Liu et al. 2011 by the microarray; Gaur et al. 2014 by RNA-seq). In this study, we have examined ASE in two different pig breeds showing phenotypically differential susceptibility to SS2 infection, by RNA-seq, to uncover genes with cis-acting regulatory elements that respond to SS2 infection. Furthermore, highly significant ASE SNPs have been verified using Sanger sequencing and later on expression is quantified for selected SNPs using pyrosequencing assay.

Mapped reads on expressed genes Table 3

Sample

LC* LI EC EI

Total genes

22017 22017 22017 22017

Notes*: the same as Table 1

Distribution and proportion of reads mapped in genome

Expressed genes (mapped reads number>0)

Expressed genes (mapped reads number>10)

Sample CDS

13809 13665 13859 13832

11578 (83.84 %) 11373 (83.23 %) 11610 (83.77 %) 11511 (83.22 %)

LC* LI EC EI

28.72 24.57 30.10 26.97

Intergenic Intron

% % % %

18.80 18.52 19.30 17.76

% % % %

Notes*: the same as Table 1

44.22 48.32 42.50 46.81

5'UTR 3' UTR Non-coding exon % % % %

0.99 0.88 1.10 0.86

% % % %

4.17 4.35 4.20 4.31

% % % %

3.10 3.36 2.80 3.29

% % % %

J Appl Genetics

High throughput sequencing Approximately an average of 72 million reads/sample were generated of which more than 70 % were mapped to the porcine reference genome (from www. ensembl.org) (Table 1) using Tophat (Trapnell et al. 2009), with ∼86 % uniquely mapped reads & ∼14 % multiple mapped reads. Multiple genomic localizations are always associated with repeat sequence in genome, so only reads mapping with high confidence to a unique location in our reference sequence set were included in this study. BLASTing all mapped reads against the reference sequence database identified more than 13,000 expressing genes (Table 2), which is ∼60 % of the genes represented in reference sequence database, reads were mainly located in intron region, followed by CDS and intergenic region (Table 3). The high proportion of reads mapping in the intronic region reflect the incomplete or inaccurate annotation of swine genome. Statistical analysis of SNPs There is non-significant difference in SNP numbers between infected and control animals in each pig breed (Tables 4 and 5). However, the total SNPs count is almost double in Enshi black than Landrace pig breed, this is mostly because of homozygous SNPs (Table 4), in Table 5, this difference is shown in the CDS, non-coding exon, 5'UTR, intron and intergenic region. The two breeds for proportion of SNP distributions have nonsignificant difference in CDS, 3'UTR, 5'UTR, intron, intergenic region, however, the SNPs in non-coding exons have a bigger differences, 0.43∼0.48 % vs. 0.28∼0.29 % (Table 5). ASE SNPs distribution analysis The distribution analysis in each pig chromosome revealed that the SNPs are mainly concentrated in the terminal regions (Fig. 1a-d). From the mapped reads over 1,300,000 polymorphisms (p value ≧0.8, copy number >2, interval between each SNP sites must be longer than 5 bp) were initially identified in all four animals transcriptomes (Table 4). The strict filtering of the raw SNP data reduced this number to 78,834 allele specific SNPs (copy numbers >25, interval between each SNP sites must be longer than 45 bp) (Table 4). The comparative analysis of these SNPs between control and infected animals revealed 882 and 1096 statistically significant differentially expressed allele SNPs

Table 4 Statistical analysis of SNPs

Notes*: the same as Table 1

SNP distribution across genomic regions

Table 5

Results

Sample CDS

Non-coding 3' UTR 5'UTR Intron Exon

LC* LI EC EI

1092 1043 1300 1217

19684 17982 30565 29071

4457 4069 6919 6783

1258 1141 1901 1829

Intergenic

127095 102097 109281 83111 228691 177839 224227 172859

Notes*: the same as Table 1

(ratio≧2 or ≦ 0.5) in Landrace and Enshi black pig, respectively. Gene set enrichment The distribution analysis of genes with significant differentially expressed allele SNPs between control and infected animals in each breed revealed that allele SNPs are mainly distributed on chromosome 7 in Landrace pig, whereas the allele SNPs are distributed more uniformly on all chromosomes in Enshi black pig. This suggests a clear cut difference in the corresponding molecular mechanisms in two pig breeds after SS2 infection. The function clustering of genes with significant differentially expressed SNPs in Landrace pig resulted in one cluster mainly enriched in antigen processing and immune functioning, while in Enshi black function clustering resulted in three clusters mainly enriched in cytokine signal transduction (Tables 6 and 7). Validation and quantification of ASE SNPs Twenty-nine SNPs, mainly located in immune-related genes, were randomly selected and further validated using Sanger sequencing to rule out the possibility of false SNP calling. Partial Sanger sequencing results for the selected six SNPs in two pig breeds are shown in Fig. 2a-f. In some of the chromatograms, we could clearly see two peaks for an SNP, and that was in concordance with our RNA sequencing results. Six SNPs already validated by Sanger sequencing were chosen for further quantification using pyrosequencing. Three SNPs are already available in porcine database. The other three are novel SNPs, selected based on immune functioning predicted by GO analysis (Table 8). For detailed information about the ASE SNPs please refer to materials and methods section. The locations of these six SNPs were manually tracked in the porcine genome assembly Sscrofa10.2. Although the pyrosequencing results of two allele SNPs which are transcripts ENSSS

Sample

Total SNPs

Homozygous SNPs

Heterozygous SNPs

Total number of ASE SNPs

LC* LI EC EI

255583 216627 447215 436086

143377 (56.10 %) 117862 (54.41 %) 302228 (67.58 %) 291604 (66.87 %)

112206 (43.90 %) 98765 (45.59 %) 144987 (32.42 %) 144482 (33.13 %)

18581 17744 21726 20783

J Appl Genetics

CT00000008144 and ENSSSCT00000007515 were not in complete agreement with the RNA-seq results, yet the other four allele SNPs reflected the same state of allelic imbalance as shown by RNA-seq results.

Discussion ASE is a powerful technique that measures the expression of each allele via SNP within an RNA sample. The allelic imbalance (AI) displayed by a gene is sufficient to identify a cis-

regulatory element (Stamatoyannopoulos 2004). The key advantage of this approach is identification of an SNP exhibiting ASE which is in tight linkage disequilibrium with the causative polymorphism, thus essentially identifying a high confidence candidate gene (genetic factor) with expression variation (Perumbakkam et al. 2013). The ASE of certain variants has been reported to make the organism more prone to disease risk (Hundhausen et al. 2012; Blomquist et al. 2013; Park et al. 2014; Ronninger et al. 2012; Fairfax et al. 2012). In humans the different transcript expression derived from each parental copy of CCAAT/enhancer binding protein gamma (CEBPG) is associated with predisposition to

a Landrace control

b Landrace infected

c Enshi black control

d Enshi black infected

Fig. 1 a: Landrace control. b: Landrace infected. c: Enshi black control. d: Enshi black infected. a–d Distribution analysis of SNPs in each pig breed in chromosome 1. SNPs are distributed mainly in the telomere regions

J Appl Genetics

lung cancer (Blomquist et al. 2013). So far the differential transcript expression in relation to some complex disease has not been carried out in pigs at whole transcriptome scale. The present work studied ASE in two phenotypically different pig breeds showing different clinical symptoms during SS2 infection. Piglets from commercial pig Landrace and indigenous pig Enshi black were infected with SS2 and the spleens were harvested at 56 hrs after challenge. High throughput transcriptome sequencing of the spleen was followed by further SNP analysis which revealed little difference in SNP numbers between control and infected animal in each pig breed (Table 4). Whereas the total number of SNPs in Enshi black is about 1.7∼2 times than Landrace. The chromosome wise distribution analysis of SNPs revealed that the telomeres are the hot spots for point mutations (Fig. 1a-d). This could possibly point toward the small RNA based regulation of gene expression, since centromeric or telomeric regions are known to be a source of small RNAs (Djupedal et al. 2009). Higher SNPs distribution in these hot Table 6

spots indicates the possibility of some kind of small RNA based regulation during infection. A total of 765 common differentially expressed SNPs are found in two infected pig breeds. These could be the candidate SNPs responsible for differential immune response in both pig breeds. Further validation needs to be done in order to find out the complex relationship between ASE and the infection response. We have selected 29 allele SNPs to validate using Sanger sequencing, and the results were in agreement with RNA-seq data (Fig. 2a-f). A total of six allelically imbalanced SNPs: three dbSNPs and three novel SNPs, all in immune related genes were further quantified using pyrosequencing assay. In the recent past, pyrosequencing has emerged as a reliable method for quantifying the expression from each allele (Yang et al. 2013; Wang et al. 2007), because this method can detect small differences in relative gene expression from each chromosome with high accuracy. The first allelically imbalanced SNP (transcript i.d. ENSSS CT00000001612) (G/A) is located in gene MHC class II histocompatibility antigen SLA-DRB1, which is an

GO term analysis of ASE-SNPs (ratio ≧0.5) in Landrace control versus Landrace infected

Annotation cluster 1

Enrichment score: 1.5187009826713618

Term GO:0042613- MHC class II protein complex GO:0002504- antigen processing and presentation of peptide or polysaccharide antigen via MHC class II MHCII ssc04672: intestinal immune network for IgA production PIRSF001991: class II histocompatibility antigen ssc04612: antigen processing and presentation

Count 3 3

P value 0.001697 0.001771

Fold enrichment 4.01 41.10577

3 4 3 4

0.002465 0.002565 0.002672 0.008822

35.82 13.13651 33 8.514403

GO:0042611- MHC protein complex ssc05310: asthma GO:0006955- immune response GO:0019882- antigen processing and presentation IPR003006: immunoglobulin/major histocompatibility complex, conserved site ssc05332: graft-versus-host disease ssc05322: systemic lupus erythromatosus IPR003597: immunological C1-set ssc05330: allograft rejection ssc04940: Type I diabetes mellitus

3 3 5 3 3

0.009134 0.012177 0.012735 0.01889 0.022943

17.43478 16.42063 4.684418 12.64793 12.00568

3 4 3 3 3

0.024153 0.026441 0.029213 0.030556 0.030556

11.49444 5.676269 10.565 10.14216 10.14216

SM00407: IGc1 ssc05320: autoimmune thyroid disease ssc05416: viral myocarditis GO:0005886- plasma membrane ssc04514: cell adhesion molecules (CAMs) IPR007110: immunoglobulin like GO:0044459- plasma membrane part IPR013783: immunoglobulin-like fold

3 3 3 5 3 3 3 3

0.037255 0.043186 0.04911 0.062278 0.095117 0.152322 0.171041 0.191412

8.92 8.410569 7.837121 2.68407 5.388021 4.126953 3.517544 3.569257

J Appl Genetics Table 7

GO term analysis of ASE-SNPs (ratio ≧0.5) in Enshi black control versus Enshi black infected

Annotation cluster 1

Enrichment score: 2.2892326338328175

Term ssc04062: Chemokine signaling pathway ssc04540: Gap junction ssc04916: Melanogenesis Annotation cluster 2 ssc04062: Chemokine signaling pathway ssc04912: GnRH signaling pathway ssc04020: Calcium signaling pathway Annotation cluster 3 GO:0032553- ribonucleotide binding GO:0032555- purine ribonucleotide binding GO:0017076- purine nucleotide binding GO:0000166- nucleotide binding GO:0005524- ATP binding GO:0032559- adenyl ribonucleotide binding GO:0001883- purine nucleoside binding GO:0030554- adenyl nucleotide binding

Count 7 3 3 Enrichment score: 2.08959974392798 7 3 3 Enrichment score: 0.7515485373305347 4 4 4 4 3 3 3 3

P value 5.79E-05 0.048388 0.048388

Fold enrichment 8.831098 7.957692 7.957692

5.79E-05 0.065071 0.142885

8.831098 6.746739 4.25137

0.105154 0.105154 0.126486 0.189109 0.203325 0.209836 0.247304 0.247304

3.017308 3.017308 2.789333 2.333086 3.202041 3.138 2.818563 2.818563

GO:0001882- nucleoside binding

3

0.249532

2.801786

immunological gene-dense region of high diversity in mammalian species and involved in antigen processing pathways (Mantegazza et al. 2013). This is a novel SNP identified in swine. Genetic variability in swine leukocyte antigen class II (SLA-II) has been reported to be associated with immune responses to some bacterial infections in pigs (Shinkai et al. 2012). MHC polymorphism plays an important role in immune functioning by providing the advantage to the MHC heterozygous individuals, due to their ability to recognize a wider variety of antigens derived from multiple pathogens (Froeschke and Sommer 2012). In ASE quantification analysis, pyrosequencing clearly demonstrated AI, similar to the RNA-seq results. The ASE was preferentially skewed toward reference allele G in Landrace infected sample, where as the skew was toward mutant allele A in Landrace control (Table 9). As indicated by RNA-seq results, the same SNP was found to be monoalleically expressed in both Enshi black control and infected sample, with allelic bias toward mutant allele A. Interestingly, the SLA-DOB1 gene was downregulated for Landrace pig in our DEG study (Gaur et al. 2014) using the same data. The allele specific expression of the mutant allele A in Landrace control, Enshi black control and infected piglets indicated the possible involvement of this allele in immunity to SS2. This hypothesis needs further experimental verification. The second validated ASE SNP was located in BAT2 gene (HLA-B associated transcript 2; ENSSSCT00000001540), which is reported to be significantly associated with the immunological traits in Landrace piglets and had potential application value for marker-assisted selection in pig breeding with

disease resistance (Wang et al. 2012). This is again a novel SNP identified in swine. Interestingly, pyrosequencing demonstrated a clear cut monoallelic expression from the mutant allele A in all four samples, i.e., Landrace control and infected, Enshi black control and infected, whereas in the Sanger sequencing result for Landrace infected sample, two peaks for this SNP (G/A) were seen in the chromatogram. In the transcriptome sequencing result for this SNP, the AI ratio was 2.16 (Table 8), which is near to non-significant level of AI ratios, i.e., 2. This points toward possible bias for the reference allele A in the sequencing. The third SNP (ENSSSCT00000002561 (rs45434083)), which is allelically imbalanced in Landrace control sample, is located in ACTN1 gene, and is a dbSNP. ACTN1 seems to regulate the activity of a variety of receptors and serves as a scaffold to connect the cytoskeleton with diverse signaling pathways (Oikonomou et al. 2011). Pyrosequencing replicated allelic ratios obtained by RNA-seq results in all four samples. Landrace control AI ratio for this SNP (A/G), obtained by RNA-seq was 0.455 (Table 8) which is near to the nonsignificant ASE threshold of

Genome-wide identification of allele-specific expression in response to Streptococcus suis 2 infection in two differentially susceptible pig breeds.

Although allele expression imbalance has been recognized in many species, and strongly linked to diseases, no whole transcriptome allele imbalance has...
1MB Sizes 0 Downloads 6 Views