Acta Biotheor DOI 10.1007/s10441-015-9260-1 REGULAR ARTICLE

In Silico Functional and Structural Characterization of H1N1 Influenza A Viruses Hemagglutinin, 2010–2013, Shiraz, Iran

Afagh Moattari¹ • Behzad Dehghani¹ • Nastaran Khodadad¹ • Forogh Tavakoli¹

Received: 8 September 2014 / Accepted: 6 May 2015 © Springer Science+Business Media Dordrecht 2015

Abstract Hemagglutinin (HA) is a major virulence factor of influenza viruses and plays an important role in viral pathogenesis. Analyses of amino acid changes, epitope regions, and glycosylation and phosphorylation sites have contributed greatly to the development of new generations of vaccines. The hemagglutinin genes of 10 selected isolates, 8 from 2010 and 2 from 2013, were sequenced and analyzed with several bioinformatic tools, and the results were compared with those of 3 vaccine isolates. The study detected several amino acid changes related to altered epitope sites, modification sites, and physicochemical properties. The results also showed some conserved modification sites in the HA structure. This study is the first analytical research on isolates obtained from Shiraz, Iran, and the results can be used to better understand the genetic diversity and antigenic variation of Iranian and Asian pathogenic H1N1 strains.

Keywords Influenza • Hemagglutinin • In silico • Glycosylation • Phosphorylation

1 Introduction

In recent years, influenza A viruses have caused pandemic influenza; the type A (H1N1) virus alone infected millions of people and caused 18,000 deaths globally before 30 May 2010 (Girard et al.

Electronic supplementary material The online version of this article (doi:10.1007/s10441-015-9260-1) contains supplementary material, which is available to authorized users.

✉ Afagh Moattari [email protected]

1 Influenza Research Center, Department of Bacteriology and Virology, Shiraz University of Medical Sciences, 71348-45794 Shiraz, Iran

123

A. Moattari et al.

2010; Taubenberger and Morens 2010). Among the many subtypes of influenza A virus, H1N1, H2N2, and H3N2 have adapted efficiently to transmit between and infect humans (Bouvier and Lowen 2010; Schrauwen and Fouchier 2014). For many years, H1N1 has accounted for most influenza epidemics. Unlike seasonal influenza, it has caused severe respiratory illness with high mortality rates (20–50 million deaths worldwide in 1918) (Ma et al. 2011; Taubenberger and Morens 2006). The emergence and transmission of type A (H1N1) pdm09 in 2009 resulted in a new pandemic, as declared by the World Health Organization (WHO, 6 August 2010).

The viral RNA encodes eleven proteins (HA, NA, NP, M1, M2, NS1, NEP, PA, PB1, PB1-F2, and PB2), of which hemagglutinin (HA) and neuraminidase (NA) are the two surface glycoproteins that interact with cellular receptors and play an important role in cellular attachment (Kapoor and Dhama 2014; Mishin et al. 2005). Mishin et al. (2005) reported the role of HA in binding to cellular receptors and the functional balance between HA and NA in influenza virus infection. HA is synthesized in the endoplasmic reticulum as a precursor (HA0) that is post-translationally cleaved into two subunits, HA1 and HA2 (Boulay et al. 1988). The cellular receptors of influenza A virus contain terminal neuraminic acid (NeuAc) moieties (Mishin et al. 2005). Pathogenicity, infection, and spread of the virus depend on HA0 cleavage. The HA1 subunit carries the NeuAc-binding site, and the HA2 subunit is responsible for fusion of the viral and cellular membranes (Mishin et al. 2005). Structurally, HA is a trimeric glycoprotein comprising a globular head and a stem region. The globular region includes the receptor-binding domain and the major antigenic sites, and the stem contains the fusion peptide and supports the globular domain (Das et al. 2010; Wang et al. 2009).

HA modifications include glycosylation and phosphorylation. Co-translational or post-translational glycosylation of HA is essential for folding and transport (Anwar et al. 2006; Das et al. 2010; Wang et al. 2009). Frequent mutations in HA are related to variation in antigenic epitopes, which affects antibody recognition, allows escape from immune responses, and has an impact on vaccination (Han and Marasco 2011). Eighteen HA subtypes have been recognized, and high-resolution crystal structures have been determined for some of them (H1, H2, H3, H5, H7, H9, H14) (Sun et al. 2010; Tong et al. 2012, 2013). Several studies have focused on the relationship between the functional and structural properties of HA subtypes and have identified structures related to specific functions (Isin et al. 2002; Sriwilaijaroen and Suzuki 2012; Sun et al. 2010). These studies provided useful data for identifying the corresponding structural and functional modules in HA. Comparing the similarities and differences between HA modules could help define the properties of other HA molecules. Bioinformatic analysis of HA is a useful method for identifying amino acid changes, modification sites, and B cell and T cell epitopes (Das et al. 2010; Sun et al. 2010). The study of amino acid variations related to epitopes could lead to a new generation of vaccines against influenza. This study attempted to determine the major changes in the HA protein of influenza viruses isolated in the


Virology Department of Shiraz University Medical School in 2010 and 2013, compared with those of vaccine strains introduced by the World Health Organization (WHO).

2 Materials and Methods

2.1 Sampling

The present study comprised 772 patients selected from pandemic influenza A (H1N1) infections in Shiraz, southern Iran, between May 2010 and February 2013. The specimens collected from the patients were placed in viral transport medium, transported under refrigeration to the virology laboratory of Shiraz University of Medical Sciences (SUMS), and stored at −70 °C until tested. The study was approved by the ethics committee of SUMS.

2.2 RNA Extraction and Real Time Reverse Transcription (rRT)-PCR

RNA extraction was carried out using the Roche High Pure Viral RNA Extraction Kit (Roche, Mannheim, Germany) according to the manufacturer’s instructions. Extracted RNAs were kept at −80 °C until further processing. rRT-PCR was carried out using the SuperScript III Platinum One-Step Quantitative RT-PCR kit (Invitrogen). Real-time runs were performed on the Corbett 6000 Rotor-Gene system. Each reaction comprised 4 µl of the extracted RNA combined with 16 µl of the master mix, which included 2× reaction mix, SuperScript III RT/Platinum Taq Mix, 5.4 µl RNase/DNase-free water, and 0.4 µl of each primer and probe. Each RNA isolate was tested with separate primer/probe sets for the detection of universal swine influenza A (swFLUA), swine H1, and RNase P. According to the CDC real-time RT-PCR protocol, the cycling conditions included a 30 min RT step at 50 °C, followed by enzyme inactivation at 95 °C for 2 min. The PCR step included 45 cycles of 95 °C for 15 s, 55 °C for 30 s, and 72 °C for 30 s. Data collection and analysis of the real-time PCR assay were accomplished using the Rotor-Gene data analysis software, version 6.0A. Isolates positive for H1N1pdm09 were grown in MDCK cells.

2.3 Virus Isolation

The swabs were vortexed in 5 ml of DMEM for a few minutes to dislodge and suspend adherent viruses. Confluent monolayers of Madin–Darby canine kidney (MDCK) cells were inoculated with 200 µl of the viral suspensions proven positive by real-time PCR. The monolayers were maintained in serum-free Dulbecco’s Modified Eagle’s Medium (Sigma) supplemented with 2 mg/ml trypsin (Gibco BRL, Life Technologies), 100 µg/ml streptomycin, and 100 units/ml penicillin G. The cultures were incubated at 34 °C and examined daily for cytopathic effect, which was confirmed by the ability of infected cultures to agglutinate guinea pig erythrocytes no later than 7 days post-infection.
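As a quick arithmetic check on the rRT-PCR set-up described in Sect. 2.2, the following Python sketch adds up the reaction volume and the programmed hold times. The constant and function names are illustrative assumptions, and ramp times between temperatures are not included, so the actual run time on the instrument would be longer.

```python
# Cycling programme as stated in Sect. 2.2 (all times in seconds).
RT_STEP_S = 30 * 60        # reverse transcription at 50 °C
INACTIVATION_S = 2 * 60    # enzyme inactivation at 95 °C
CYCLES = 45
CYCLE_STEPS_S = {"95C": 15, "55C": 30, "72C": 30}

# Reaction composition as stated in Sect. 2.2 (microlitres).
RNA_UL = 4.0
MASTER_MIX_UL = 16.0


def programmed_hold_time_s():
    """Sum of all programmed holds; ramping between temperatures is ignored."""
    return RT_STEP_S + INACTIVATION_S + CYCLES * sum(CYCLE_STEPS_S.values())


def reaction_volume_ul():
    """Total reaction volume: extracted RNA plus master mix."""
    return RNA_UL + MASTER_MIX_UL


if __name__ == "__main__":
    print("reaction volume (µl):", reaction_volume_ul())
    print("programmed hold time (min):", programmed_hold_time_s() / 60)
```

Keeping the steps in a small mapping makes it easy to see which hold dominates the run (here the 45 amplification cycles).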


2.4 Sequencing

The PCR products of 8 HA genes isolated in 2010 and 2 HA genes isolated in 2013 were purified with a commercial gel extraction kit (Qiagen GmbH, Hilden, Germany) and subsequently sequenced. The nucleotide sequences obtained in this study were submitted to GenBank under the accession numbers listed below.

2.5 Selection of HA for Analysis

For bioinformatic analysis, 10 sequences were submitted (full length: 1701 bp, 567 amino acids): GenBank:HQ419004.1 (A/Shiraz/1/2010(H1N1)), GenBank:HQ419005.1 (A/Shiraz/2/2010(H1N1)), GenBank:HQ419006.1 (A/Shiraz/3/2010(H1N1)), GenBank:HQ419007.1 (A/Shiraz/4/2010(H1N1)), GenBank:HQ419008.1 (A/Shiraz/5/2010(H1N1)), GenBank:HQ419009.1 (A/Shiraz/6/2010(H1N1)), GenBank:HQ419010.1 (A/Shiraz/7/2010(H1N1)), GenBank:HQ419011.1 (A/Shiraz/8/2010(H1N1)), GenBank:KJ781217.1 (A/Shiraz/38/2013(H1N1)), and GenBank:KJ781218.1 (A/Shiraz/43/2013(H1N1)). Three vaccine isolates, GenBank:FJ981613 (A/California/07/2009(H1N1)), GenBank:CY058519 (California/07/2009 × NYMC X-157), and GenBank:CY030232 (A/Brisbane/59/2007(H1N1)), were obtained from http://www.ncbi.nlm.nih.gov. For easier reading, abbreviations are used instead of the names of the isolates: Shiraz 1–Shiraz 8, Shiraz 38, Shiraz 43, Calif, Calif X-157, and Brisbane (Table 1).

Table 1 Abbreviations used instead of isolate names

GenBank | Abbreviation
HQ419004.1 (A/Shiraz/1/2010(H1N1)) | Shiraz 1
HQ419005.1 (A/Shiraz/2/2010(H1N1)) | Shiraz 2
HQ419006.1 (A/Shiraz/3/2010(H1N1)) | Shiraz 3
HQ419007.1 (A/Shiraz/4/2010(H1N1)) | Shiraz 4
HQ419008.1 (A/Shiraz/5/2010(H1N1)) | Shiraz 5
HQ419009.1 (A/Shiraz/6/2010(H1N1)) | Shiraz 6
HQ419010.1 (A/Shiraz/7/2010(H1N1)) | Shiraz 7
HQ419011.1 (A/Shiraz/8/2010(H1N1)) | Shiraz 8
KJ781217.1 (A/Shiraz/38/2013(H1N1)) | Shiraz 38
KJ781218.1 (A/Shiraz/43/2013(H1N1)) | Shiraz 43
FJ981613 (A/California/07/2009(H1N1)) | Calif
CY058519 (California/07/2009 × NYMC X-157) | Calif X-157
CY030232 (A/Brisbane/59/2007(H1N1)) | Brisbane

2.6 Amino Acid Changes and Phylogenetic Trees

For analysis of the mutations in all 13 HA sequences, translation and editing were carried out with CLC Sequence Viewer, version Beta (QIAGEN). The alignment of the translated peptides of all sequences was generated using CLUSTAL X software, version 1.81. Phylogenetic trees were constructed by the neighbor-joining (NJ) and maximum-likelihood (ML) methods with 100 bootstrap replicates to confirm the reliability of the trees.

2.7 Primary Sequence Analysis

Theoretical isoelectric point (pI), molecular weight, total number of positive and negative residues, extinction coefficient, instability index, aliphatic index, and grand average of hydropathy (GRAVY) were evaluated using ExPASy ProtParam (http://expasy.org/tools/protparam.html) (Gasteiger et al. 2005). ProtScale (http://us.expasy.org/tools/protscale.html) was used to calculate the number of codons, bulkiness, polarity, refractivity, recognition factors, hydrophobicity, transmembrane tendency, percent buried residues, percent accessible residues, average area buried, average flexibility, relative mutability, and the number of amino acids (Gasteiger et al. 2005).

2.8 Immuno-Informatic Analysis

B cell epitope positions were determined at www.immuneepitope.org (http://tools.immuneepitope.org/tools/bcell/iedb_input). The server uses the following methods: the Chou and Fasman method (Chou and Fasman 2006) for beta-turns; the Karplus and Schulz method (Karplus and Schulz 1985) for predicting flexibility; the Emini method (Emini et al. 1985) for predicting surface accessibility; and the Parker method (Parker et al. 1986) for hydrophilicity evaluation. Linear B cell epitopes were also predicted with BepiPred (Larsen et al. 2006) (http://www.cbs.dtu.dk/services/BepiPred/). BcePred at http://www.imtech.res.in/raghava/bcepred was run on the sequences to detect polarity-based B cell epitopes in addition to the properties used by the previous server (Saha and Raghava 2004). ABCpred at http://www.imtech.res.in/raghava/abcpred/ was used to predict B cell epitopes (Saha and Raghava 2006b). The probability of antigenicity was estimated with VaxiJen at http://www.ddg-pharmfac.net/vaxijen/VaxiJen/VaxiJen.html (Doytchinova and Flower 2007). The default threshold of the software was 0.4.
AlgPred (Saha and Raghava 2006a) at http://www.imtech.res.in/raghava/algpred/submission.html was also used to predict IgE epitopes.

2.9 Functional Characterization

DISPHOS (http://www.dabi.temple.edu/disphos/pred.html) (Iakoucheva et al. 2004) and NetPhos (http://www.cbs.dtu.dk/services/NetPhos/) (Blom et al. 1999) were used to predict serine, threonine, and tyrosine phosphorylation sites in eukaryotic proteins. NetPhosK (http://www.cbs.dtu.dk/services/NetPhosK/) (Blom et al. 2004) was used to determine kinase-specific phosphorylation sites in eukaryotic proteins. N-glycosylation sites were predicted using NetNGlyc (http://www.cbs.dtu.dk/services/NetNGlyc/) (Gupta and Brunak 2002) and GlycoEP (http://www.imtech.res.in/raghava/glycoep/submit.html) (Chauhan et al. 2013).
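N-glycosylation predictors such as NetNGlyc and GlycoEP start from the N-X-(S/T) sequon, where X is any residue except proline. The Python sketch below only scans for that motif; it is not how either server scores sites (both use trained models), and the function name and toy peptides are hypothetical illustrations rather than HA sequences from this study.

```python
import re

# Asn followed by any residue except Pro, then Ser or Thr.
# The lookahead keeps the match one residue wide so overlapping sequons are all found.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(protein):
    """Return 1-based positions of Asn residues that start an N-X-(S/T) sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


if __name__ == "__main__":
    intact = "AANTSAA"   # toy peptide: N3-T4-S5 forms a sequon
    mutated = "AANPSAA"  # same peptide with T4P: X is now proline, sequon lost
    print(find_sequons(intact))
    print(find_sequons(mutated))
```

The second toy peptide mirrors the kind of change discussed later for position 305, where a substitution to proline at the X position removes a predicted site without touching the asparagine itself.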


2.10 Secondary Structure Prediction

SOPMA at http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_sopma.html (Geourjon and Deleage 1995) was used to predict the secondary structure of all sequences. The results were confirmed with the Phyre server at http://www.sbg.bio.ic.ac.uk/phyre (Kelley and Sternberg 2009); ALPHAPRED, BetaTpred2 (Kaur and Raghava 2003b), and GAMMAPred (Kaur and Raghava 2003a) at http://www.imtech.res.in; and RONN at http://www.strubi.ox.ac.uk/RONN (Yang et al. 2005).

2.11 Tertiary Structure Prediction and Validation

All 3D structures were built with I-TASSER (Roy et al. 2010) at http://zhanglab.ccmb.med.umich.edu/I-TASSER, the Phyre2 server (Kelley and Sternberg 2009) at http://www.sbg.bio.ic.ac.uk/~phyre2/html, and the (PS)2 server (Chen et al. 2006) at http://ps2v2.life.nctu.edu.tw. QMEAN (Benkert et al. 2008) at http://swissmodel.expasy.org/qmean/cgi/index.cgi was employed to evaluate the stereochemistry and quality of the models. The Ramachandran plots were generated with RAMPAGE at http://mordred.bioc.cam.ac.uk/~rapper/rampage.php.

3 Results

3.1 Phylogenetic Results

The phylogenetic tree for the 13 isolates constructed by the NJ method is shown in Fig. 1. The tree shows two main clades. The upper clade was divided into two clusters. In the first cluster, Calif and Calif X-157 were closer to each other than to Brisbane, and in the second cluster, Shiraz 38 and Shiraz 43 were very close, with a bootstrap score of 94. The lower clade was divided into two clusters, the first containing Shiraz 4 and Shiraz 5 and the second containing the other isolates. The ML method also showed two main clades. The upper clade was divided into two clusters: in the first, Shiraz 1–8, Calif X-157, and Calif were close, and in the second, Shiraz 38 and Shiraz 43 were very closely related. The lower clade included the Brisbane isolate.

3.2 Amino Acid Changes

Comparison of the patient and vaccine isolates showed changes in several amino acids. Relative to Calif and Calif X-157, comparable changes were observed at positions 100, 104, and 220, and similar changes were found in the sequences of all isolates (Table 2).

3.3 ProtParam and Protscale Properties

The protein sequences were evaluated using ProtParam. The ProtParam physicochemical properties, molecular weight, aliphatic index, and grand averages of


Fig. 1 Phylogenetic trees for the 13 sequences constructed by the NJ and ML methods (bootstrap 100)

hydropathicity were similar in the patient isolates and vaccine sequences. The pI results fell into 3 groups. The first group comprised the vaccine isolates and Shiraz 6 (pI 6.74–7.19), the second comprised Shiraz 7 (pI 7.51), and the third comprised Shiraz 1–Shiraz 5, Shiraz 8, Shiraz 38, and Shiraz 43 (pI 7.81–8.22). There was no significant difference between the instability indexes of the isolates, all of which were predicted to be stable proteins. The ProtScale results for several properties of the patient and vaccine isolates showed no significant difference and revealed a high degree of similarity in many of the isolates’ features.
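Two of the ProtParam quantities compared above have simple closed forms: GRAVY is the mean Kyte-Doolittle hydropathy over all residues, and the aliphatic index is X(Ala) + 2.9 X(Val) + 3.9 [X(Ile) + X(Leu)], with X in mole percent. The Python sketch below shows both calculations; the function names and the toy peptide are assumptions for illustration, and the sketch does not reproduce any value reported in this study.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in protein) / len(protein)


def aliphatic_index(protein):
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    n = len(protein)

    def mole_percent(aa):
        return 100.0 * protein.count(aa) / n

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


if __name__ == "__main__":
    toy = "AVIL"  # hypothetical four-residue peptide, not an HA fragment
    print(gravy(toy), aliphatic_index(toy))
```

Both functions expect an ungapped sequence of the 20 standard one-letter codes; ambiguous symbols such as X would raise a KeyError in `gravy`.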


Unique amino acid changes

Popular amino acid changes

G242R, E252Q, E260Q

A261P

V251I

?

I338V

?

S102T

D291N

?

?

N104K

G105W

N104E

?

S220T

D103H, T262P, T316K, P321S

?

?

?

?

?

?

P100S

Shiraz 3

Shiraz 2

Shiraz 1

Y175F, P176S, S181T, T183L, A278V, G378K

?

I477M

I341V, G478K

G157A

?

?

?

?

?

?

Shiraz 5

E391K

?

?

?

?

?

?

Shiraz 4

S138N, R222K

?

?

?

?

?

Shiraz 6

D239G

?

?

?

?

?

Shiraz 7

?

?

?

?

?

?

?

Shiraz 8

Table 2 Comparison of amino acid residues changes among isolates and vaccine isolates California and California X-157

? ?

S202T E516K

? ? ?

V226L N461D T305P

Q205H, S285P, K300E, V428L, N472K, A549E

?

S468N

Y10F, I22L, D114N, K300Q, P314L, D489N

?

C320Y

?

? ?

H155R

?

?

?

?

?

Shiraz 43

?

?

?

?

Shiraz 38


3.4 B Cell Epitopes Analysis

The positions of B cell epitopes predicted by immuneepitope at the 80 % identity level are shown in Table 3. Three regions (141–145, 173–175, and 501) were common to all isolates except Brisbane; the 104–108 region was found in Shiraz 38, Calif, and Calif X-157. Linear B cell epitope analysis by BepiPred demonstrated 3 distinct conserved regions (28–32, 138–142, 499–507) and some regions common to subsets of the isolates. These include the 100–107 region in Shiraz 38, Shiraz 43, Brisbane, Calif, and Calif X-157 and the 289 region in Shiraz 4, Shiraz 7, Shiraz 38, Shiraz 43, Calif, and Calif X-157. BcePred identified several similar B cell epitope regions in all isolates, including 114–120, 129–133, 462–468, 370–372, and 506–518. Four regions were shared by all isolates except Brisbane. Shiraz 43 showed two new regions, 156–160 and 506–518. ABCpred predicted 16-mer peptide sequences as B cell epitopes for the 13 protein sequences (Table 4). Three conserved 16-mer regions (starting at positions 279, 357, and 351) were identified; the region starting at 300 was common to all sequences except Brisbane. A VaxiJen cutoff value of 0.4 was applied for the identification of T cell epitopes; the results showed no significant difference among the sequences, and all of them were predicted to be probable antigens. Based on the AlgPred prediction, none of the protein sequences was an allergen.

3.5 Functional Analysis

The serine, threonine, and tyrosine phosphorylation sites predicted by DISPHOS and NetPhos and the kinase-specific phosphorylation sites predicted by NetPhosK are shown in Table 5. The DISPHOS output showed that Shiraz 1–Shiraz 8 shared several sites (123, 127, 209, 215, 287, 501, 507); these were also found in Calif and

Table 3 Results of B cell epitopes by “immuneepitope”

Isolates

B cell epitopes positions

Shiraz 1

141–145

173–175

501

Shiraz 2

?

?

?

Shiraz 3

?

?

?

Shiraz 4

?

Shiraz 5

?

?

?

Shiraz 6

?

?

?

Shiraz 7

?

?

?

Shiraz 8

?

?

?

?

Shiraz 38

?

?

?

Shiraz 43

?

?

?

104–108 102–103

Brisbane

108–110

?

107, 156, 500

Calif

?

?

?

?

100–102

Calif X-157

?

?

?

?

?

Table 4 Results of 16-mer peptide sequences as B cell epitopes by “ABCpred”

Isolates

Start position of 16-mer peptide sequences predicted as B cell epitopes

Shiraz 1

279

338

300

250

357

94

449

351

383

Shiraz 2

?

?

?

?

?

?

?

?

?

Shiraz 3

?

?

?

?

?

?

?

?

Shiraz 4

?

?

?

?

?

?

?

?

Shiraz 5

?

?

?

?

?

?

?

Shiraz 6

?

?

?

?

?

?

=

?

?

Shiraz 7

?

?

?

?

?

?

?

?

?

Shiraz 8

?

?

?

?

?

?

Shiraz 38

?

?

?

?

?

Shiraz 43

?

?

?

?

?

Brisbane

?

Calif

?

?

?

?

356

93

?

?

?

Calif X-157

?

?

?

?

?

93

?

?

?

356

350

497, 337

Calif X-157. The phosphorylation sites of Shiraz 38 (123, 126, 127, 209, 215, 287, 501, 507) were also found in Calif and Calif X-157, but a new site (101) was found in Shiraz 43. Brisbane showed a very different phosphorylation pattern. The numbers of phosphorylation sites varied among the isolates. There were 14 serine sites in Shiraz 1, Shiraz 2, Shiraz 5, Shiraz 8, Calif, and Calif X-157. On the other hand, 17 and 16 serine sites were found in Shiraz 3, Shiraz 4 as well as 13 in Shiraz 38 and Shiraz 43, respectively. In addition, 8 threonine phosphorylation sites were detected in Shiraz 1, Shiraz 3, Shiraz 38, Calif, and Calif X-157; 9 were found in Shiraz 4 and Shiraz 8, 7 in Shiraz 43, and 4 in Brisbane. Similar kinase phosphorylation sites (124, 220, 224, 326, 393, 500, and 524) were identified in all isolates except Brisbane. Comparison of the patient and vaccine isolates indicated the lack of site 221 in Shiraz 1, Shiraz 5, Shiraz 7, Shiraz 8, and Shiraz 38 and of site 294 in Shiraz 1–Shiraz 7 and Shiraz 8. Our analysis also indicated the addition of site 321 in Shiraz 3, site 176 in Shiraz 4, and site 201 in Shiraz 43. Brisbane had only 4 sites (124, 220, 392, and 499).

The results of glycosylation site prediction for all protein sequences by NetNGlyc and GlycoEP are displayed in Table 6. NetNGlyc showed 4 conserved glycosylation sites (28, 40, 304, and 557) in all isolates except Brisbane, whose glycosylation sites were located at 28, 40, 71, 142, 176, 303, and 556. GlycoEP predicted similar glycosylation sites (27, 28, 293, and 498) in all isolates except Brisbane. Comparison of all sequences with Calif and Calif X-157 showed the loss of sites 71 and 176 in Shiraz 1–Shiraz 8, Shiraz 38, and Shiraz 43, the loss of site 304 in Shiraz 38 and Shiraz 43, and the addition of sites 40 and 557 in Shiraz 8. Brisbane shared some sites with Calif and Calif X-157 (27, 28, 71, 176, and 498) and had one different site (497).
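The loss and gain of predicted sites relative to a vaccine isolate is a set comparison, which can be sketched in Python as follows. The site lists are the GlycoEP predictions reported in Table 6 for three of the isolates; the dictionary layout and the function name are illustrative assumptions, not part of the original analysis.

```python
# GlycoEP-predicted N-glycosylation sites, as listed in Table 6.
GLYCOEP_SITES = {
    "Calif": {27, 28, 71, 176, 293, 304, 498},
    "Shiraz 8": {27, 28, 40, 293, 304, 498, 557},
    "Shiraz 38": {27, 28, 293, 498},
}


def compare_sites(reference, isolate):
    """Return (sites lost, sites gained) in the isolate relative to the reference."""
    lost = sorted(reference - isolate)
    gained = sorted(isolate - reference)
    return lost, gained


if __name__ == "__main__":
    ref = GLYCOEP_SITES["Calif"]
    for name in ("Shiraz 8", "Shiraz 38"):
        lost, gained = compare_sites(ref, GLYCOEP_SITES[name])
        print(name, "lost:", lost, "gained:", gained)
```

The same comparison applies unchanged to the phosphorylation and kinase site lists in Table 5.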

Table 5 Positions of phosphorylation sites, numbers of phosphorylation sites, and kinase phosphorylation sites

Isolates | Position of phosphorylation sites | Numbers of phosphorylation sites | Kinase phosphorylation sites
Shiraz 1 | 123, 127, 209, 215, 287, 501, 507 | Ser: 14, Thr: 8, Tyr: 11 | 124, 220, 224, 326, 393, 500, 524
Shiraz 2 | 123, 127, 209, 215, 287, 501, 507 | Ser: 14, Thr: 8, Tyr: 11 | 124, 220, 224, 326, 393, 500, 524
Shiraz 3 | 123, 127, 209, 215, 287, 501, 507 | Ser: 13, Thr: 8, Tyr: 11 | 124, 220, 224, 321, 326, 393, 500, 524
Shiraz 4 | 123, 127, 209, 215, 287, 501, 507 | Ser: 13, Thr: 9, Tyr: 11 | 124, 176, 220, 224, 294, 326, 393, 500, 524
Shiraz 5 | 123, 127, 209, 215, 287, 501, 507 | Ser: 14, Thr: 9, Tyr: 11 | 124, 220, 224, 294, 326, 393, 500, 524
Shiraz 6 | 123, 127, 209, 215, 287, 501, 507 | Ser: 14, Thr: 9, Tyr: 11 | 124, 220, 221, 224, 294, 326, 393, 500, 524
Shiraz 7 | 123, 127, 209, 215, 287, 501, 507 | Ser: 14, Thr: 9, Tyr: 11 | 124, 220, 224, 294, 326, 393, 500, 524
Shiraz 8 | 123, 127, 209, 215, 287, 501, 507 | Ser: 14, Thr: 9, Tyr: 11 | 124, 220, 224, 326, 393, 500, 524
Shiraz 38 | 123, 126, 127, 209, 215, 287, 501, 507 | Ser: 17, Thr: 8, Tyr: 10 | 124, 220, 224, 294, 326, 393, 500, 524
Shiraz 43 | 101, 106, 123, 126, 127, 209, 215, 287, 501, 507 | Ser: 16, Thr: 7, Tyr: 11 | 124, 201, 220, 224, 294, 326, 393, 500, 524
Brisbane | 115, 123, 126, 127, 208, 214, 227, 500, 506 | Ser: 17, Thr: 4, Tyr: 11 | 124, 220, 392, 499
Calif | 95, 99, 106, 123, 126, 127, 209, 215, 220, 287, 501, 507 | Ser: 14, Thr: 8, Tyr: 11 | 124, 220, 221, 224, 294, 326, 393, 500, 524
Calif X-157 | 95, 99, 106, 123, 126, 127, 209, 215, 287, 501, 507 | Ser: 14, Thr: 8, Tyr: 11 | 124, 220, 221, 224, 294, 326, 393, 500, 524

Table 6 Glycosylation sites of all 13 isolates predicted by “NetNGlyc” and “GlycoEP”

Isolates | NetNGlyc | GlycoEP
Shiraz 1 | 28, 40, 304, 557 | 27, 28, 293, 304, 498
Shiraz 2 | 28, 40, 304, 557 | 27, 28, 293, 304, 498
Shiraz 3 | 28, 40, 304, 557 | 27, 28, 293, 304, 498
Shiraz 4 | 28, 40, 304, 557 | 27, 28, 293, 304, 498
Shiraz 5 | 28, 40, 304, 557 | 27, 28, 293, 304, 498
Shiraz 6 | 28, 40, 304, 557 | 27, 28, 293, 304, 498
Shiraz 7 | 28, 40, 304, 557 | 27, 28, 293, 304, 498
Shiraz 8 | 28, 40, 304, 557 | 27, 28, 40, 293, 304, 498, 557
Shiraz 38 | 28, 40, 304, 557 | 27, 28, 293, 498
Shiraz 43 | 28, 40, 304, 557 | 27, 28, 293, 498
Brisbane | 28, 40, 71, 142, 176, 303, 556 | 27, 28, 71, 176, 303, 497
Calif | 28, 40, 304, 557 | 27, 28, 71, 176, 293, 304, 498
Calif X-157 | 28, 40, 304, 557 | 27, 28, 71, 176, 293, 304, 498

Fig. 2 Secondary structures of all sequences predicted by “SOPMA” and validated. Blue helix, red strand, purple coil, and green beta turn. (Color figure online)


3.6 Secondary Structure Prediction

The percentages of secondary structure constituents generated by SOPMA and the other tools, together with a schematic display of the proteins’ secondary structure, are depicted in Fig. 2.

3.7 Tertiary Structures Prediction

All 3D structures were predicted with I-TASSER, the Phyre2 server, and (PS)2, and the predicted structures were validated using QMEAN and RAMPAGE. RAMPAGE assigns the residues of a 3D structure to 3 regions: favored, allowed, and outlier. The analysis showed that the 3D structures predicted by the Phyre2 server were more reliable. The mean percentages of residues in the favored and allowed regions were 94.24 and 4.26 % for the Phyre2 server, 89.07 and 7.11 % for I-TASSER, and 92.13 and 5.1 % for (PS)2, indicating that the Phyre2 server is the more credible tool for predicting the tertiary structure of hemagglutinin. The QMEAN results included two main scores, the QMEAN score and the Z-score, which show the quality and reliability of tertiary structures. The mean QMEAN score and Z-score were 0.624 and −1.7 for the Phyre2 server, 0.49 and −3.05 for I-TASSER, and 0.48 and −3.1 for (PS)2. These results confirm the better quality and reliability of the structures predicted by the Phyre2 server. The results of the QMEAN and RAMPAGE analyses are shown in Table 7, and the predicted 3D structure for each sequence is displayed in Fig. 3. The positions of phosphorylation and glycosylation sites of the 2010, 2013, and vaccine isolates are shown in Figs. 4 and 5, respectively.
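The per-tool means quoted in Sect. 3.7 can be recomputed from the 13 per-model values in Table 7. The Python sketch below does this for the QMEAN scores and Z-scores of the Phyre2 and I-TASSER models; the values are transcribed from the table columns (a mean does not depend on row order), while the dictionary layout and function name are assumptions made for illustration.

```python
from statistics import fmean

# QMEAN scores transcribed from Table 7, one value per model (13 models per tool).
QMEAN_SCORE = {
    "Phyre2": [0.63, 0.634, 0.602, 0.631, 0.623, 0.625, 0.625,
               0.634, 0.616, 0.617, 0.622, 0.631, 0.63],
    "I-TASSER": [0.499, 0.512, 0.49, 0.489, 0.511, 0.464, 0.47,
                 0.5, 0.484, 0.482, 0.501, 0.484, 0.479],
}

# QMEAN Z-scores transcribed from Table 7 (all values are negative).
Z_SCORE = {
    "Phyre2": [-1.62, -1.58, -1.97, -1.62, -1.72, -1.68, -1.69,
               -1.58, -1.8, -1.79, -1.73, -1.62, -1.63],
    "I-TASSER": [-2.79, -3.19, -2.84, -3.13, -3.16, -2.89, -3.39,
                 -2.83, -3.23, -3.16, -2.89, -3.09, -3.1],
}


def column_means(columns):
    """Mean of each tool's column."""
    return {tool: fmean(values) for tool, values in columns.items()}


if __name__ == "__main__":
    print("QMEAN:", column_means(QMEAN_SCORE))
    print("Z-score:", column_means(Z_SCORE))
```

A QMEAN score closer to 1 and a Z-score closer to 0 both indicate a better model, which is the basis for preferring the Phyre2 models here.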

4 Discussion

Bioinformatic tools are useful for the analysis and prediction of biological phenomena. Several bioinformatic tools have been developed in recent years, but validation tests are necessary for all of them. In this research, each tool was validated before it was used in the analysis. The current study is a comparative analysis of viral sequences derived from patients between 2010 and 2013 in the virology department of Shiraz University of Medical Sciences and those of 3 vaccine isolates used as controls. The results showed some amino acid changes in the 13 HA sequences that were related to the alignment tree. The study of amino acids also revealed similar changes in Shiraz 1 and Shiraz 8 at positions 102 and 105. Shiraz 38 and Shiraz 43 showed changes in 9 amino acid residues: 391, 155, 202, 516, 320, 468, 226, 461, and 305. The changes in amino acids could be related to diversity in the modification sites, epitopes, function, and structure of HA. The comparison between amino acid changes and properties of HA provided useful data supporting the functional and structural prediction of HA in isolates derived from the patients (Das et al. 2010; Strengell et al. 2011; Sun et al. 2010, 2013).
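Amino acid changes such as P100S or S220T are read off a pairwise alignment position by position. A minimal Python sketch of that step is given below, assuming two already aligned sequences of equal length; the function name, the `offset` parameter, and the toy fragments are hypothetical, and gap columns are simply skipped rather than reported as insertions or deletions.

```python
def substitutions(reference, isolate, offset=1):
    """List substitutions between two aligned sequences as e.g. 'P100S'.

    `offset` is the residue number of the first aligned column.
    Columns containing a gap ('-') in either sequence are skipped.
    """
    if len(reference) != len(isolate):
        raise ValueError("sequences must be aligned to the same length")
    changes = []
    for position, (ref_aa, iso_aa) in enumerate(zip(reference, isolate), start=offset):
        if ref_aa != iso_aa and "-" not in (ref_aa, iso_aa):
            changes.append(f"{ref_aa}{position}{iso_aa}")
    return changes


if __name__ == "__main__":
    # Toy three-residue fragments starting at position 99 (not real HA sequence).
    print(substitutions("KPS", "KSS", offset=99))
```

Running the same comparison of each patient isolate against a vaccine sequence yields per-isolate lists that can then be split into changes shared by several isolates and changes unique to one.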


94.4, 4.2

94.2, 4.4

89.3, 6.6

Shiraz 7

94.2, 4.4

93.8, 4.4

94.4, 4.2

89.3, 6.3

88.2, 7.3

89, 7.2

89.1, 6.2

Shiraz 43

Brisbane

Calif

Calif X-157

94.4, 4.2

94.4, 4.0

94.2, 4.2

88.1, 7.8

89.9, 7.4

Shiraz 8

Shiraz 38

94.4, 4.2

88.7, 7.8

89.2, 7.5

Shiraz 5

94.2, 4.4

94.0, 4.4

94.2, 4.2

94.4, 4.2

Phyre2 (%)

Shiraz 6

89.5, 7.4

89.5, 6.6

Shiraz 3

Shiraz 4

89.5, 6.6

88.7, 7.8

Shiraz 1

I-TASSER (%)

Rammpage analysis

Shiraz 2

Isolates

91.5, 5.5

91.5, 5.5

93.1, 4.6

91.1, 6.0

91.1, 6.0

91.1, 6.0

93.1, 4.1

93.1, 4.1

91.8, 5.5

91.8, 5.5

93.1, 3.9

92.2, 5.1

93.3, 4.6

(PS) (%)

2

0.499

0.512

0.49

0.489

0.511

0.464

0.47

0.5

0.484

0.482

0.501

0.484

0.479

I-TASSER

QMEAN score

0.63

0.634

0.602

0.631

0.623

0.625

0.625

0.634

0.616

0.617

0.622

0.631

0.63

Phyre2

0.454

0.447

0.562

0.498

0.481

0.503

0.462

0.478

0.459

0.509

0.482

0.476

0.475

(PS)

2

-2.79

-3.19

-2.84

-3.13

-3.16

-2.89

-3.39

-2.83

-3.23

-3.16

-2.89

-3.09

-3.1

I-TASSER

Z-score

-1.62

-1.58

-1.97

-1.62

-1.72

-1.68

-1.69

-1.58

-1.8

-1.79

-1.73

-1.62

-1.63

Phyre2

-3.43

-3.5

-2.47

-2.92

-3.12

-2.87

-3.34

-3.15

-3.36

-2.8

-3.1

-3.17

-3.19

(PS)2

Table 7 Validation of the proteins’ 3D structures: RAMPAGE analysis (% of residues in favored region, % of residues in allowed region), QMEAN score (global score of the whole model reflecting the predicted model reliability, ranging from 0 to 1), and Z-score (a measure of the absolute quality of a model)


Fig. 3 3D structure of proteins, 1 H1, 2 H2, 3 H3, 4 H4, 5 H5, 6 H6, 7 H7, 8 H8, 9 H38, 10 H43, 11 Brisbane, 12 California, and 13 California X-157

Fig. 4 Position of phosphorylation sites. a 2010 isolates, b 2013 isolates, c vaccine isolates


Fig. 5 Position of glycosylation sites. a 2010 isolates, b 2013 isolates, c vaccine isolates

The primary analysis of the HA sequences did not show any relationship between amino acid changes and protein properties. Such data will be beneficial to future work such as cloning, expression, and purification of HA. Comparison of the B cell epitope regions predicted by immuneepitope with the amino acid changes showed that 141–145 and 501 were conserved regions; the lack of 173–175 in Shiraz 4 was related to the tyrosine-to-phenylalanine change at position 175. Likewise, the change at position 104 was related to the lack of the 104–108 epitope region in the patient isolates except Shiraz 38, which had no change at position 104; the proline-to-serine change at position 100 was related to the lack of the 100–102 region in all patient isolates. The 108–110 and 102–103 regions in the Shiraz 43 isolate were not amenable to logical interpretation. BepiPred showed 6 conserved regions in all patient isolates: 28–32, 138–148, 200–204, 238–239, 371–377, and 499–507. The tyrosine-to-phenylalanine change at position 175 and the proline-to-serine change at position 176 were related to the lack of the 174–176 epitope region. The lack of the 100–107 region in Shiraz 1 and Shiraz 8 was related to the changes of asparagine to lysine at 104, serine to threonine at 102, and glycine to tryptophan at 105, of which the glycine-to-tryptophan change was the most important. Shiraz 1, Shiraz 3, and Shiraz 8 did not contain the 287–292 region because aspartic acid changed to asparagine at 291. BcePred detected many conserved B cell epitope regions, but the lack of 413–429 and 320–324 in Shiraz 1 and Shiraz 2 was not related to amino acid changes. In the ABCpred results, 279, 300, 357, and 351 are the start positions of conserved 16-mer regions in all patient isolates. The lack of the 338–354 region in the Shiraz 5 isolate was related to the isoleucine-to-valine change at 341. The glutamic acid-to-lysine change at 391 was responsible for the lack of the 383–399 region in Shiraz 4, Shiraz 5, Shiraz 38, and Shiraz 43.

Phosphorylation is an important HA post-translational modification, and viral protein phosphorylation plays important roles in the


influenza virus life cycle (Hutchinson et al. 2012; Wang et al. 2013). DISPHOS and NetPhos are prevalent and helpful tools based on serine, threonine and tyrosine phosphorylation sites in proteins. Study of properties on proteins Complexity, hydrophobicity, and charge seem to exist in multiple regions showed protein regions in and around the phosphorylation sites were an important prerequisite for phosphorylation. Two dimensional analyses of conserved phosphorylation sites (123, 127, 209, 215, 287, 501, 507) showed that 123, 209, 215 and 507 were on the helix and 127, 287 and 501 were on coil structure. The number of phosphorylation sites did not show a significant difference between 2010, 2013 and vaccine isolates, but there was a limited increase in serine sites. Predictions of kinase specific eukaryotic protein phosphoylation sites by NetPhosK 1.0 Server’’ with 0.7 threshold revealed all phosphoylation sites with the highest score corresponding to the Protein kinase C (PKC) phosphorylation sites. Some related studies have shown the important roles played by PKC in infection and release from human cells (Root et al. 2000; Sieczkarski et al. 2003). The analysis of data did not show any major changes in PKC phosphorylation sites except for a new phosphorylation site in Shiraz 43 compared to vaccine isolates. Threonine change to serine amino acid was related to lack of 321 and 176 PKC phosphorylation sites in Shiraz 3 and Shiraz 4, respectively. HA is considered as a surface glycoprotein of influenza virus and glycosylation has been shown to have important roles in many functions of HA molecules (Das et al. 2010; Mir-Shekari et al. 1997; Sun et al. 2013). Oligosaccharides can attach to the asparagine (Asn) side chain in N-X-(S/T) Sequon, where X represents any residue other than proline in glycosylation modification cotranslationally or posttranslationally. Many types of glycans have been found on HA molecules, including high mannose, complex type, and hybrid type. 
Regardless of glycan type, the structure and composition of glycans depend on the accessibility of glycosylation sequons to host cell saccharide-modifying enzymes. Many previous studies have identified important functions of glycosylation, including: (a) protein folding, which is necessary for transport to the cell surface, (b) avoidance of accumulation in the Golgi complex (Roberts et al. 1993), (c) receptor binding, (d) escape from the immune system by interfering with antibody recognition, (e) modulation of HA cleavage by glycans near the proteolytic activation site of HA, and (f) changes in receptor binding properties (Klenk et al. 2001). Studies on the progressive increase in glycosylation sites since 1918 have shown that glycosylation takes place specifically on the HA globular head region (Sun et al. 2013). In the current study, glycosylation analysis showed two similar sites at positions 28 and 304, as indicated by the NetNGlyc and GlycoEP software. Studies conducted from 2007 to 2013 showed that positions 27, 28 and 40 are conserved sites. Interestingly, comparison between the 2009, 2010, and 2013 isolates detected a decrease in the number of glycosylation sites without any new site.
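
The N-X-(S/T) sequon rule described above can be illustrated with a short motif scan. The Python sketch below is only an illustration under stated assumptions: the function name find_sequons and the toy peptides are hypothetical and are not taken from the Shiraz isolates, and the scan does not reproduce NetNGlyc or GlycoEP, which apply trained predictors rather than motif matching alone. The presence of a sequon is a necessary condition for N-linked glycosylation, not proof that the site is occupied.

```python
import re
from typing import List

# N, then any residue except proline, then S or T.
# The lookahead lets overlapping sequons (e.g. "NNST") all be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(seq: str) -> List[int]:
    """Return 1-based positions of Asn residues that start an N-X-(S/T) sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]


# Hypothetical toy peptides (illustrative only, not real HA sequence).
reference = "AKNTSVT"  # N3-T4-S5: sequon present
variant = "AKNPSVT"    # T4P: X is now proline, so the sequon is lost

if __name__ == "__main__":
    print(find_sequons(reference))
    print(find_sequons(variant))
```

In this toy case a threonine-to-proline change at the X position removes the candidate site even though the asparagine itself is unchanged, which is the same kind of loss a motif-based check would flag for a proline introduced immediately after a glycosylated asparagine.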

Positions 27, 28, 40, 293, 304, and 498 were the major glycosylation sites; they were located on coil secondary structure, and all except 498 (stalk) were found in the globular domain of the viral protein. The N-linked glycosylation site at position 304 was absent in Shiraz 38 and Shiraz 43 because threonine was replaced by proline at position 305, and the Asn-X-Ser/Thr sequon requires that X be any amino acid except proline. This change was detected by the GlycoEP software, whereas the NetNGlyc results did not show any change, which suggests that GlycoEP performed better than NetNGlyc in this case. Secondary and tertiary structure analysis did not show any significant differences among patient and vaccine isolates; the analysis also showed that the bulk of HA consisted of coils, helices, strands and turns. Taken together, the results showed widespread changes in the 2013 isolates compared with the vaccine and 2010 isolates. Such changes in the properties of HA reflect diversity in the HA protein that could alter the virulence and infection mechanism of influenza virus and thereby reduce vaccine efficacy. Similar studies, at different time periods, are necessary to track the diversity and changes of the HA protein as an important and multifunctional protein in influenza virus virulence. Few studies have focused on the relationship between experimental results and in silico analysis for HA proteins. Therefore, the results of this study are useful for better understanding of the HA modification sites, epitope sites, and structural features that are important in delineating the mechanism of hemagglutinin action. Screening of hemagglutinin diversity is very important for a better understanding of H1N1 antigenic variation and antigenic drift and for the evaluation of influenza vaccine efficacy.

Acknowledgments The authors would like to acknowledge Shiraz University of Medical Sciences for financial support.

References
Anwar T, Lal SK, Khan AU (2006) In silico analysis of genes nucleoprotein, neuraminidase and hemagglutinin: a comparative study on different strains of influenza A (Bird Flu) virus sub-type H5N1. In Silico Biol 6:161–168
Benkert P, Tosatto SC, Schomburg D (2008) QMEAN: a comprehensive scoring function for model quality assessment. Proteins Struct Funct Bioinform 71:261–277
Blom N, Gammeltoft S, Brunak S (1999) Sequence and structure-based prediction of eukaryotic protein phosphorylation sites. J Mol Biol 294:1351–1362
Blom N, Sicheritz-Pontén T, Gupta R, Gammeltoft S, Brunak S (2004) Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence. Proteomics 4:1633–1649
Boulay F, Doms RW, Webster RG, Helenius A (1988) Posttranslational oligomerization and cooperative acid activation of mixed influenza hemagglutinin trimers. J Cell Biol 106:629–639
Bouvier NM, Lowen AC (2010) Animal models for influenza virus pathogenesis and transmission. Viruses 2:1530–1563
Chauhan JS, Rao A, Raghava GP (2013) In silico platform for prediction of N-, O- and C-glycosites in eukaryotic protein sequences. PLoS ONE 8:e67008
Chen C-C, Hwang J-K, Yang J-M (2006) (PS)2: protein structure prediction server. Nucleic Acids Res 34:W152–W157
Chou PY, Fasman GD (2006) Prediction of the secondary structure of proteins from their amino acid sequence. In: Advances in enzymology and related areas of molecular biology. Wiley, New York, pp 45–148. doi:10.1002/9780470122921.ch2
Das SR, Puigbò P, Hensley SE, Hurt DE, Bennink JR, Yewdell JW (2010) Glycosylation focuses sequence variation in the influenza A virus H1 hemagglutinin globular domain. PLoS Pathog 6:e1001211
Doytchinova IA, Flower DR (2007) VaxiJen: a server for prediction of protective antigens, tumour antigens and subunit vaccines. BMC Bioinform 8:4
Emini EA, Hughes JV, Perlow D, Boger J (1985) Induction of hepatitis A virus-neutralizing antibody by a virus-specific synthetic peptide. J Virol 55:836–839
Gasteiger E, Hoogland C, Gattiker A, Wilkins MR, Appel RD, Bairoch A (2005) Protein identification and analysis tools on the ExPASy server. In: Walker JM (ed) The proteomics protocols handbook. Springer, Berlin, pp 571–607
Geourjon C, Deleage G (1995) SOPMA: significant improvements in protein secondary structure prediction by consensus prediction from multiple alignments. Comput Appl Biosci CABIOS 11:681–684
Girard MP, Tam JS, Assossou OM, Kieny MP (2010) The 2009 A (H1N1) influenza virus pandemic: a review. Vaccine 28:4895–4902
Gupta R, Brunak S (2002) Prediction of glycosylation across the human proteome and the correlation to protein function. Pac Symp Biocomput 7:310–322
Han T, Marasco WA (2011) Structural basis of influenza virus neutralization. Ann N Y Acad Sci 1217:178–190
Hutchinson EC et al (2012) Mapping the phosphoproteome of influenza A and B viruses by mass spectrometry. PLoS Pathog 8:e1002993
Iakoucheva LM, Radivojac P, Brown CJ, O’Connor TR, Sikes JG, Obradovic Z, Dunker AK (2004) The importance of intrinsic disorder for protein phosphorylation. Nucleic Acids Res 32:1037–1049
Isin B, Doruker P, Bahar I (2002) Functional motions of influenza virus hemagglutinin: a structure-based analytical approach. Biophys J 82:569–581
Kapoor S, Dhama K (eds) (2014) Properties of influenza viruses. In: Insight into influenza viruses of animals and humans. Springer, Berlin, pp 7–13
Karplus P, Schulz G (1985) Prediction of chain flexibility in proteins. Naturwissenschaften 72:212–213
Kaur H, Raghava G (2003a) A neural-network based method for prediction of γ-turns in proteins from multiple sequence alignment. Protein Sci 12:923–929
Kaur H, Raghava GPS (2003b) Prediction of β-turns in proteins from multiple alignment using neural network. Protein Sci 12:627–634
Kelley LA, Sternberg MJ (2009) Protein structure prediction on the Web: a case study using the Phyre server. Nat Protoc 4:363–371
Klenk H-D, Wagner R, Heuer D, Wolff T (2001) Importance of hemagglutinin glycosylation for the biological functions of influenza virus. Virus Res 82:73–75
Larsen JE, Lund O, Nielsen M (2006) Improved method for predicting linear B-cell epitopes. Immunome Res 2:2
Ma W et al (2011) 2009 pandemic H1N1 influenza virus causes disease and upregulation of genes related to inflammatory and immune responses, cell death, and lipid metabolism in pigs. J Virol 85:11626–11637
Mir-Shekari SY, Ashford DA, Harvey DJ, Dwek RA, Schulze IT (1997) The glycosylation of the influenza A virus hemagglutinin by mammalian cells. A site-specific study. J Biol Chem 272:4027–4036
Mishin VP, Novikov D, Hayden FG, Gubareva LV (2005) Effect of hemagglutinin glycosylation on influenza virus susceptibility to neuraminidase inhibitors. J Virol 79:12416–12424
Parker J, Guo D, Hodges R (1986) New hydrophilicity scale derived from high-performance liquid chromatography peptide retention data: correlation of predicted surface residues with antigenicity and X-ray-derived accessible sites. Biochemistry 25:5425–5432
Roberts PC, Garten W, Klenk H-D (1993) Role of conserved glycosylation sites in maturation and transport of influenza A virus hemagglutinin. J Virol 67:3048–3060
Root CN, Wills EG, McNair LL, Whittaker GR (2000) Entry of influenza viruses into cells is inhibited by a highly specific protein kinase C inhibitor. J Gen Virol 81:2697–2705
Roy A, Kucukural A, Zhang Y (2010) I-TASSER: a unified platform for automated protein structure and function prediction. Nat Protoc 5:725–738
Saha S, Raghava GPS (2004) BcePred: prediction of continuous B-cell epitopes in antigenic sequences using physico-chemical properties. In: Nicosia G, Cutello V, Bentley P, Timmis J (eds) Artificial immune systems, vol 3239. Lecture Notes in Computer Science. Springer, Berlin, pp 197–204. doi:10.1007/978-3-540-30220-9_16
Saha S, Raghava G (2006a) AlgPred: prediction of allergenic proteins and mapping of IgE epitopes. Nucleic Acids Res 34:W202–W209
Saha S, Raghava G (2006b) Prediction of continuous B-cell epitopes in an antigen using recurrent neural network. Proteins Struct Funct Bioinform 65:40–48
Schrauwen EJ, Fouchier RA (2014) Host adaptation and transmission of influenza A viruses in mammals. Emerg Microbes Infect 3:e9
Sieczkarski SB, Brown HA, Whittaker GR (2003) Role of protein kinase C βII in influenza virus entry via late endosomes. J Virol 77:460–469
Sriwilaijaroen N, Suzuki Y (2012) Molecular basis of the structure and function of H1 hemagglutinin of influenza virus. Proc Jpn Acad Ser B Phys Biol Sci 88:226
Strengell M, Ikonen N, Ziegler T, Julkunen I (2011) Minor changes in the hemagglutinin of influenza A (H1N1) 2009 virus alter its antigenic properties. PLoS ONE 6:e25848
Sun Y et al (2010) In silico characterization of the functional and structural modules of the hemagglutinin protein from the swine-origin influenza virus A (H1N1)-2009. Sci China Life Sci 53:633–642
Sun X et al (2013) N-linked glycosylation of the hemagglutinin protein influences virulence and antigenicity of the 1918 pandemic and seasonal H1N1 influenza A viruses. J Virol 87:8756–8766
Taubenberger JK, Morens DM (2006) 1918 Influenza: the mother of all pandemics. Rev Biomed 17:69–79
Taubenberger JK, Morens DM (2010) Influenza: the once and future pandemic. Public Health Rep 125:16
Tong S, Li Y, Rivailler P, Conrardy C, Castillo DA, Chen LM, Recuenco S, Ellison JA, Davis CT, York IA et al (2012) A distinct lineage of influenza A virus from bats. Proc Natl Acad Sci USA 109:4269–4274
Tong S, Zhu X, Li Y, Shi M, Zhang J, Bourgeois M, Yang H, Chen X, Recuenco S, Gomez J et al (2013) New world bats harbor diverse influenza A viruses. PLoS Pathog 9:e1003657
Wang C-C et al (2009) Glycans on influenza hemagglutinin affect receptor binding and immune response. Proc Natl Acad Sci 106:18137–18142
Wang S, Zhao Z, Bi Y, Sun L, Liu X, Liu W (2013) Tyrosine 132 phosphorylation of influenza A virus M1 protein is crucial for virus replication by controlling the nuclear import of M1. J Virol 87:6182–6191
World Health Organization (2010) Pandemic (H1N1) 2009 - update 112. World Health Organization
Yang ZR, Thomson R, McNeil P, Esnouf RM (2005) RONN: the bio-basis function neural network technique applied to the detection of natively disordered regions in proteins. Bioinformatics 21:3369–3376