Designing pH induced fold switch in proteins Anupaul Baruah and Parbati Biswas Citation: The Journal of Chemical Physics 142, 185102 (2015); doi: 10.1063/1.4920938 View online: http://dx.doi.org/10.1063/1.4920938 View Table of Contents: http://scitation.aip.org/content/aip/journal/jcp/142/18?ver=pdfcov Published by the AIP Publishing Articles you may be interested in Myths and verities in protein folding theories: From Frank and Evans iceberg-conjecture to explanation of the hydrophobic effect J. Chem. Phys. 139, 165105 (2013); 10.1063/1.4827086 Single-molecule spectroscopy of the unexpected collapse of an unfolded protein at low pH J. Chem. Phys. 139, 121930 (2013); 10.1063/1.4820490 Capturing molten globule state of α-lactalbumin through constant pH molecular dynamics simulations J. Chem. Phys. 138, 095101 (2013); 10.1063/1.4793470 Role of electrostatic interaction on surfactant induced protein unfolding AIP Conf. Proc. 1512, 160 (2013); 10.1063/1.4790960 Subtle pH differences trigger single residue motions for moderating conformations of calmodulin J. Chem. Phys. 135, 155102 (2011); 10.1063/1.3651807

This article is copyrighted as indicated in the article. Reuse of AIP content is subject to the terms at: http://scitation.aip.org/termsconditions. Downloaded to IP: 137.149.200.5 On: Tue, 11 Aug 2015 22:47:53

THE JOURNAL OF CHEMICAL PHYSICS 142, 185102 (2015)

Designing pH induced fold switch in proteins Anupaul Baruah and Parbati Biswasa) Department of Chemistry, University of Delhi, Delhi-110007, India

(Received 7 January 2015; accepted 29 April 2015; published online 13 May 2015) This work investigates the computational design of a pH induced protein fold switch based on a self-consistent mean-field approach by identifying the ensemble averaged characteristics of sequences that encode a fold switch. The primary challenge to balance the alternative sets of interactions present in both target structures is overcome by simultaneously optimizing two foldability criteria corresponding to two target structures. The change in pH is modeled by altering the residual charge on the amino acids. The energy landscape of the fold switch protein is found to be double funneled. The fold switch sequences stabilize the interactions of the sites with similar relative surface accessibility in both target structures. Fold switch sequences have low sequence complexity and hence lower sequence entropy. The pH induced fold switch is mediated by attractive electrostatic interactions rather than hydrophobic-hydrophobic contacts. This study may provide valuable insights to the design of fold switch proteins. C 2015 AIP Publishing LLC. [http://dx.doi.org/10.1063/1.4920938]

I. INTRODUCTION

The basic principle of protein design relies on finding an optimal sequence that is consistent with either a preassigned target conformation or an ensemble of related conformations with desired functional properties.1–3 However, conventional protein design protocols do not account for the large scale structural transitions from one stable conformation to another, marked by a predominant switch/flip in the secondary structure, repacking the central core with altered surface accessibility of specific amino acid residues. Such proteins act as conformational switches4–7 and enhance their functional diversity through molecular recognition. Through conformational transitions, these proteins can modulate their interactions with other proteins or ligands to perform important biological functions such as cellular signaling and targeted drug delivery.8–10 This ambiguous fold propensity of proteins may have profound implications in understanding protein evolution, mutation related diseases, and functional annotation of protein sequences of unknown structure. Thus, successful design of such proteins may help to achieve novel functionalities used for biological imaging, biosensors,11 and therapeutic agents12 that requires large scale on demand conformational rearrangement. Designing metamorphic proteins with multiple folds poses a significant challenge since the design algorithm must simultaneously optimize the individual target structures and ensure that there is a substantial overlap among them in the sequence space.13 Achieving a relative balance between the different interactions present in the individual target structures may be a daunting problem. Nevertheless, in silico sequence design has been used for novel ligand binding sites,14–17 catalytic sites for a de novo protein, and small molecule dependent switches.18 The size and complexity of natural protein switches impose formidable restraints as templates for protein a)[email protected]

0021-9606/2015/142(18)/185102/11/$30.00

design because of the lack of control over the multiple folded states.13 Thus, the computational design of a regulatable protein conformational switch is a viable option worth exploring. Multistate protein design optimizes sequences, governed by the relative energetic contributions of the multiple target states, which may be stabilized by binding to ligands, small molecules, allosteric modifiers, and other proteins.14,18 Chemokine lymphotactin is a natural fold switch, which populates two distinct folds under physiological conditions.5 A different fold switching phenomenon is exhibited by two proteins, which belong to the Cro family of bacteriophage transcription factors6 with 40% sequence identity and different folds. Typically, the search for alternate conformational states either involves same sequences with multiple backbone templates such as specific native folds or simulating same sequences to obtain different functions like comparing the relative sequence specificity for different ligands. An example of an engineered protein fold switch is known as Sw2, which acquires two distinctly different folds on binding with transition metals.14 The first state represents a 2Cys-2His Zn finger stabilized by Zn ions, while the second state is a trimeric coiled coil protein in absence of Zn. The ligand binding specificity of the antibiotic target dihydrofolate reductase was altered by mutations, which imparted antibiotic resistance while retaining the catalytic function, thereby assisting the drug design process.19 The recently designed light-switchable protein cadherin is a functional fold switch which aims to modulate its Ca2+ binding affinity required for the multimerization of this protein.20 In this article, we propose a self-consistent mean field theory based model to design a pH induced protein fold switch with a 20 letter amino acid alphabet. Two different target structures are selected on the basis of their relative designability. The ensemble average property of fold switch sequences is identified by simultaneously optimizing two foldability criteria for the two respective target structures. The funneled energy landscape of these proteins is characterized

142, 185102-1

© 2015 AIP Publishing LLC

This article is copyrighted as indicated in the article. Reuse of AIP content is subject to the terms at: http://scitation.aip.org/termsconditions. Downloaded to IP: 137.149.200.5 On: Tue, 11 Aug 2015 22:47:53

185102-2

A. Baruah and P. Biswas

J. Chem. Phys. 142, 185102 (2015)

by two distinct energy minima, which is in sharp contrast to the energy landscape of proteins with a unique native structure. Fold switch sequences balance local interactions by optimizing these interactions for sites with similar relative surface accessibility for both target structures and identify minimally destabilizing amino acid residues for a large difference in the relative surface accessibility. Thus, these sequences stabilize both target structures simultaneously, mediating the fold switch. Attractive electrostatic interactions between the amino acid residues play a pivotal role in the pH induced conformational switch.

II. THEORY

Probabilistic protein design methods based on selfconsistent mean field theory for a single designable target conformation are well documented.21,22 Such methods are extensively used to study folding and misfolding of proteins due to point and pairwise residue mutations.3,23–25 In this paper, a self-consistent field theoretical model is presented to design a protein conformational switch as a function of pH. Sequence design in proteins requires a well-defined foldability criterion that quantifies the sequence-structure compatibility in terms of an optimized energy function. Negative design is a prerequisite for optimizing multiple predetermined conformational states of the protein fold switch. This implies simultaneous stabilization of the target conformations and the destabilization of the corresponding unfolded state ensemble. A commonly used foldability criterion is the stability gap, ∆, which denotes the difference in energy between the folded state and the average energy of the ensemble of unfolded conformations. The energy gap, ∆, may be derived as a first order term of a cumulant expansion approximating the free energy change of folding.22 Therefore, ∆ measures the stability of any sequence in a given conformation. In this manuscript, the stability gap, ∆, is scaled with respect to thermal energy unit, k BT, where k BT = 1. Thus, all energy values in the manuscript are dimensionless quantities. This method aims to design a sequence that folds to a predetermined target conformation at a particular pH, while it selects another target structure at a different value of pH. Thus, the designed sequence requires to be compatible with two different target structures corresponding to two different values of pH. Thus, two different ∆ values may be used to quantify the sequence compatibility with the two respective target structures. The stability gap at two different values of pH, ∆pH7 and ∆pH12, may be defined as ∆pH7 = ( E¯ f )pH7 − ⟨( E¯u )pH7⟩, ∆pH12 = ( E¯ f )pH12 − ⟨( E¯u )pH12⟩,

(1) (2)

where E¯ f is the sequence averaged energy of a given sequence in the folded (target) conformation and ⟨ E¯u ⟩ denotes the average energy of the sequence in the unfolded conformational ¯ represents the average over ensemble. The bar over E ( E) ¯ the sequence ensemble, while the angular brackets (⟨ E⟩) represents the average over the conformational ensemble. The energy of any conformation, c, may be expressed as the sum

of hydrophobic effect and electrostatic and steric interactions, E¯c = E¯c,hydro + E¯c,elec + E¯c,steric,

(3)

where E¯c,hydro, E¯c,elec, and E¯c,steric represent the hydrophobic, electrostatic, and steric energy contributions, respectively. For the folded/target conformation, Eq. (3) denotes E¯ f . This theory scans the entire sequence space with the complete set of 20 letter amino acids to estimate the number of sequences that adopts a specified target conformation. Thus, it identifies a set of foldable sequences in terms of the siteidentity monomer probabilities of finding α j type of residue at the jth site, ω j (α j ). The hydrophobic interaction energy function, E¯c,hydro, may be, thus, expressed as     E¯c,hydro = −ϵ hydro cn(c, j) ω j (α j ) H α j × Havg , (4) αj

j

where cn(c, j), H(α j ), and Havg are the coordination number of the jth site in the cth conformation, normalized hydrophobicity26,27 of the residue type α j , and the mean hydrophobicity per residue of the designed sequence, respectively. ϵ hydro is the energy contribution of a single hydrophobic interaction which is assumed to be of unit magnitude in any appropriate energy scale. The choice of this potential is motivated by the hydrophobic effect, which accounts for the relative stabilization of the hydrophobic-hydrophobic (HH) contacts as compared to the hydrophobic-polar (HP) and polar-polar (PP) contacts.28,29 Similarly, the energetic contribution from the electrostatic interactions, E¯c,elec, may be expressed as     E¯c,elec = ϵ elec cn(c, j) ω j (α j ) Q α j × Qavg , (5) αj

j

where Q(α j ) and Qavg denote the residual charge of the residue type α j and the mean net charge per residue of the designed sequence, respectively. ϵ elec is the energy of a single electrostatic interaction which is assumed to be unity in any measurable energy scale. The steric interaction term, E¯c,steric, in a protein sequence may be defined by   E¯c,steric = ϵ steric ωj αj j

αj

   × − (maxcn−mincn) − smw α j − rcn (c, j) , (6) where smw(α j ) is the molecular weight of the residue type, α j , scaled with respect to the range of the coordination number, while rcn(c, j) is the inverse coordination number of the jth site in the cth conformation, ) ( mw(α j ) − minmw (maxcn − mincn)+mincn, smw(α j ) = maxmw − minmw (7) rcn(c, j) = (maxcn − cn(c, j))+mincn, where mw(α j ) is the molecular weight of the residue type, α j , while maxmw and minmw are the maximum and the minimum molecular weights of the amino acid residues. maxcn and mincn are the maximum and the minimum coordination numbers in the conformational ensemble. E¯c,steric accounts for the steric interactions by assigning a preference of low

This article is copyrighted as indicated in the article. Reuse of AIP content is subject to the terms at: http://scitation.aip.org/termsconditions. Downloaded to IP: 137.149.200.5 On: Tue, 11 Aug 2015 22:47:53

185102-3

A. Baruah and P. Biswas

J. Chem. Phys. 142, 185102 (2015)

molecular weight residues in the highly coordinated sites and high molecular weight residues in low coordinated sites. ϵ steric is the energy contribution due to steric interaction which is assumed to be of unit magnitude. Mean hydrophobicity (Havg) and mean net charge (Qavg) per residue of a given sequence may be calculated as (8)

1  Qavg = ωi (α i )Q(α i ), n i α

(9)

i

where n is the number of amino acid residues in a sequence. The residual charge, Q(α i ), of each amino acid varies with the pH (see Table I for the residual charges of the charged amino acid residues at pH = 7 and pH = 12) of the medium and hence the mean net charge of the same sequence is different in different pH.30 Two values of the mean net charge, (Qavg)pH7 and (Qavg)pH12, corresponding to two values of pH, pH = 7 and pH = 12, are calculated for specific residues in both protein sequences. The sequence information entropy is maximized to yield the most probable set of the site-identity monomer probabilities subject to specific constraints on the sequence identities and energies. The sequence entropy may be given as  S=− ωi (α i ) ln (ωi (α i )) . (10) αi

The constraints used to specify the local/global features of the sequence and structure are defined by (i) the normalization of the site-identity monomer probabilities, m 

ωi (α i ) = 1,

(11)

α i =1

(ii) (iii) (iv) (v)

where m represents the number of different amino acid residues. In this work, all naturally occurring amino acid residues are considered, i.e., m = 20; the foldability criterion at pH = 7, ∆pH7, as given in Eq. (1); the foldability criterion at pH = 12, ∆pH12, as given in Eq. (2); the mean net charge per residue of the sequence at pH = 7, (Qavg)pH7; the mean net charge per residue of the sequence at pH = 12, (Qavg)pH12;

TABLE I. Residual charges at pH = 7 and 12. Residue TYR HIS GLU ASP LYS ARG

− (λ fold f fold)pH12 − (λ charge f charge)pH7 − (λ charge f charge)pH12 − λ hydro f hydro,

i

i

Using the Lagrange undetermined multiplier method, a variational functional of the entropy subject to specified constraints may be expressed as V = S − λ norm f norm − (λ fold f fold)pH7

1  ωi (α i )H(α i ), n i α

Havg =

(vi) the mean hydrophobicity of the sequence, Havg, as given in Eq. (8).

Charge at pH = 7

Charge at pH = 12

−0.001 0.240 −0.997 −0.999 0.999 0.999

−0.992 0.0 −1.0 −1.0 0.022 0.760

(12)

where f norm represents the normalization constraint as expressed in Eq. (11) and λ norm is the corresponding Lagrange multiplier. ( f fold)pH7 and ( f fold)pH12 are constraints of the foldability criteria, ∆, at pH = 7 and pH = 12 as given in Eqs. (1) and (2), respectively. (λ fold)pH7 and (λ fold)pH12 are the corresponding Lagrange multipliers. ( f charge)pH7 and ( f charge)pH12 represent the constraints for the mean net charge of the designed sequence at pH = 7 and pH = 12, respectively, as given in Eq. (9). The Lagrange multipliers for these two constraints are (λ charge)pH7 and (λ charge)pH12. f hydro and λ hydro are the constraints for the average hydrophobicity of the designed sequence and the corresponding Lagrange multiplier, respectively. This functional is maximized with respect to the siteidentity monomer probabilities to obtain a set of transcendental equations, which is then solved numerically to obtain the respective values of the site-identity monomer probabilities. The general forms of the set of transcendental equations are provided in the supplementary material.36 The crystal structure of the 35-mer fast folding real protein with PDB ID 1WY3 (X-ray resolution = 0.95 Å, Rvalue = 0.145) is selected from RCSB PDB.31 The reasons for the selection of 1WY3 protein are (i) availability of a high resolution X-ray crystal structure, (ii) the protein is autonomously fast folding ensuring high stability of the native fold with minimal topological frustration, and (iii) the small size of the protein makes it computationally easily tractable. Metropolis Monte Carlo (MC) simulation of the coarsegrained Cα chain backbone of 1WY3 with a 6-12 LennardJones potential is used to scan the conformational space of the protein. The Ci α-Ci+1α and the Ci α-Ci+2α pseudobond lengths are restricted to 3.8 ± 0.15 Å32 and 6.0 ± 1.5 Å,33 respectively. Conformations are generated by rotational and translational moves of the Cα atoms. A schematic diagram showing the moves of the Monte Carlo simulation is presented in Fig. S18 of the supplementary material.36 The thermal energy k BT is assumed to be equal to unity. A set of 151 271 conformations is generated. A set of 108 sequences is randomly sampled using the 20 letter amino acid alphabet, and the designability of each individual conformation is determined. For each sampled sequence, the energy is calculated for each of the 151 271 conformations. Thus the global energy minimum conformation is identified for each sequence. The number of sequences for which a specific conformation represents global energy minimum indicates the designability of that conformation. Most designable target structure represents the most stable structure for the highest number of sampled sequences.34 Two most designable non-degenerate structures (t1 and t2) are identified as the target conformations (see

This article is copyrighted as indicated in the article. Reuse of AIP content is subject to the terms at: http://scitation.aip.org/termsconditions. Downloaded to IP: 137.149.200.5 On: Tue, 11 Aug 2015 22:47:53

185102-4

A. Baruah and P. Biswas

J. Chem. Phys. 142, 185102 (2015)

FIG. 1. The Cα chain backbone structure of target structures (a) t1 and (b) t2.

FIG. 2. The Cα chain backbone structure of target structures (a) 2KDL and (b) 2KDM.

Fig. 1) at two different pH values, t1 at pH = 7 and t2 at pH = 12. Since only 108 sequences out of possible 2035 ∼ 1045 are randomly sampled, therefore to verify the unbiased nature of the sampling, another 9 sets, each comprising of 108 sequences, are randomly sampled. For each of these sets, the designability of all conformations is calculated. The same structures t1 and t2 are found to be the most designable structures for each of these 9 sets, confirming that the set of 108 randomly sampled sequences may be sufficient to measure the designability of the generated conformations. This design procedure describes a method to identify the ensemble average property of the sequences that switch fold between two different target structures at two different pH. For a comparative analysis with respect to the designed fold switch sequences, individual sequence design methods are also performed for the two specific target structures at the respective pH values. Thus three different types of sequence ensemble are designed: (a) Sequences which are simultaneously optimized for t1 at pH = 7 and t2 at pH = 12 as discussed in this section. These represent the fold switch sequences. (b) Sequences which are optimized only for t1 at pH = 7. The most probable composition of all sequences compatible to the target structure, t1, at pH = 7, is obtained by the same design method, but with constraints (i), (ii), (iv), and (vi). (c) Sequences which are optimized only for t2 at pH = 12. The site-specific residue probabilities for all sequences designed for the target structure, t2, are obtained from the constraints (i), (iii), (v), and (vi) only. The last two design methods (b) and (c) are not aimed for designing a fold switch. To benchmark this method, it is applied to a real fold switch protein. The single point mutation induced fold switch between the 56-mer G A (PDB id 2KDL) and G B (PDB id 2KDM) protein conformations is well studied experimentally.35 Hence, this 2KDL-2KDM fold switch is selected for the verification of the theoretical method. The reasons to select this fold switch protein are (i) the 2KDL to 2KDM fold switch primarily involves a dominant change in the secondary structure from an α-helix to a β-sheet, (ii) the experimentally determined structures of both G A and G B are available (see Fig. 2), (iii) the optimal size of the protein makes calculations computationally tractable, and (iv) since a point mutation leads to a conformational switch

of 2KDL-2KDM, the conformations of 2KDL and 2KDM have a high degree of overlap in the sequence space that is consistent with their respective native structures. In this article, the native conformation of 2KDL is considered as the target structure at pH = 7, while the native conformation of 2KDM corresponds to the target structure at pH = 12. The conformational ensemble is generated by performing two different Monte Carlo simulations, one with the native Cα chain backbone structure of 2KDL as the initial template while the other with the 2KDM structure as the initial template. The conformations generated from both Monte Carlo simulations are combined to form a single conformational ensemble. The conformational ensemble comprises of a set of 69 684 conformations. III. RESULTS AND DISCUSSIONS

Three designed sequences with similar foldability (∆pH7 = −40.0, ∆pH12 = −40.0, (Qavg)pH7 = +0.19, (Qavg)pH12 = −0.19, and Havg = 0.31) are selected. Two are specifically designed for the target structures t1 and t2 at pH = 7 and pH = 12, respectively, while the third one is designed for the fold switch. The sequence designed for t1 at pH = 7 exhibits energy minima in t1, while the sequence designed for t2 at pH = 12 records a global energy minima in t2. The designed fold switch sequence chooses the minimum energy target conformation t1 at pH = 7, while it flips to the conformation t2, which corresponds to the energy minimum at pH = 12. The respective energy and stability values of each of the sequences in both the target structures t1 and t2 are shown in Table II. Similarly, for the 2KDL-2KDM fold switch, three sequences (∆pH7 = −40.0, ∆pH12 = −40.0, (Qavg)pH7 = +0.13, (Qavg)pH12 = −0.12, and Havg = 0.59) are selected for analysis. The energy and stability values of all three selected sequences are given in Table III for both target conformations of 2KDL and 2KDM at pH = 7 and 12, respectively. This method identifies the ensemble of foldable sequences only in terms of the site identity monomer probabilities. Thus a designed sequence may be represented only in terms of the average hydrophobicity and average charge per site. The average hydrophobicity and the average charge per site of each of the six selected designed sequences are plotted against the corresponding residue sites in Figs. S1-S12 of the supplementary material.36 In Figs. S19

This article is copyrighted as indicated in the article. Reuse of AIP content is subject to the terms at: http://scitation.aip.org/termsconditions. Downloaded to IP: 137.149.200.5 On: Tue, 11 Aug 2015 22:47:53

185102-5

A. Baruah and P. Biswas

J. Chem. Phys. 142, 185102 (2015)

TABLE II. Energies and stabilities of the target structures for the designed sequences.

Sequence Designed sequence compatible with t1 Designed sequence compatible with t2 Designed sequence for fold switch at pH = 7 Designed sequence for fold switch at pH = 12

Energy in t1

Stability, ∆, of t1

Energy in t2

Stability, ∆, of t2

−248.9 −214.9 −230.4 −226.6

−40.0 −7.8 −40.0 −37.4

−230.7 −247.0 −227.5 −229.4

−21.7 −40.0 −37.1 −40.0

TABLE III. Energies and stabilities of the target structures (2KDL and 2KDM) for the designed sequences.

Sequence Designed sequence compatible with 2KDL Designed sequence compatible with 2KDM Designed sequence for fold switch at pH = 7 Designed sequence for fold switch at pH = 12

Energy in 2KDL

Stability, ∆, of 2KDL

Energy in 2KDM

Stability, ∆, of 2KDM

−519.4 −451.6 −516.5 −516.0

−40.0 27.3 −40.0 −38.5

−462.6 −518.9 −514.6 −517.5

16.8 −40.0 −38.1 −40.0

and S20 of the supplementary material,36 the site-identity monomer probabilities, ωi (α i ), for each of the six selected designed sequences are plotted as three-dimensional contour against the corresponding site (x-axis) and the residue (yaxis). The contour plots suggest that the probabilities are not uniformly distributed, i.e., at almost every site, some residues are clearly preferred over the other residues. In fact, at some sites, a specific residue may have probability up to ∼0.8, while a random unbiased probability for a residue would be 0.05. This confirms the uniqueness of the designed sequences. A Metropolis Monte Carlo simulation is performed with the generated conformational ensemble of 2KDL-2KDM for the designed fold switch sequence to confirm whether it exhibits a shift in the equilibrium population of 2KDL and 2KDM conformations with the variation in pH. Thousand independent trial runs of the Monte Carlo simulation each of 107 steps is performed at each pH ranging from pH = 2 to 14. The thermal energy k BT is assumed to be equal to unity. Either 2KDL or 2KDM conformation is selected randomly as the initial conformation for these 1000 Monte Carlo trials. In Fig. 3, the frequency of occurrence of 2KDL (black line) and the 2KDM (red dashed line) conformations is denoted by the number of MC steps, which is plotted against the corresponding pH. Conformations with root-mean-square deviation (RMSD) ≤ 0.7 Å with respect to 2KDL/2KDM native structure are assumed to be equivalent to the native structure of 2KDL/2KDM, respectively. The error bars are also plotted for these 1000 trial runs. The plot shows that for pH ≤ 9, the 2KDL conformation is more populated while for pH ≥ 10, the 2KDM conformation is more populated. This plot confirms the shift in the equilibrium population from 2KDL to 2KDM as the pH is varied from low to high. However, it is important to mention here that in this manuscript the dynamics or the kinetics of the fold switch is not considered. It is assumed that the thermodynamic hypothesis of Anfinsen which states that a sequence always folds to the global energy minimum is followed for simple folds like 2KDL and 2KDM. For complex folds like knotted proteins, the kinetics or the

FIG. 3. The population of 2KDL and 2KDM is plotted against the pH for the fold switch sequence with ∆pH7 = −40.0, ∆pH12 = −40.0, (Q avg)pH7 = +0.13, (Q avg)pH12 = −0.12, and Havg = 0.59.

dynamics may play an important role and needs to be taken into account. In Fig. 4, the RMSDs with respect to t1 and t2 for all conformations are plotted in x- and y-axes, respectively, while the energies of all corresponding conformations are plotted in the z-axis. Fig. 4(a) depicts the energy landscape for the sequence designed for t1 at pH = 7. From the color code of the figure, it is clear that the target structure t1 resides at the global minima corresponding to RMSD = 0.0 in the xaxis and RMSD = 4.8 in the y-axis. Similarly in Fig. 4(b), the global energy minima are located at values of RMSD = 4.8 and RMSD = 0.0 in the x- and y-axes, respectively, corresponding to the target structure t2. In Figs. 4(c) and 4(d), two minima are observed corresponding to two target structures t1 and t2. This suggests that the energy landscape of fold switch proteins is significantly different from that of the globular proteins with single native structure. These proteins exhibit multiple minima in their energy landscape depending on the number of

This article is copyrighted as indicated in the article. Reuse of AIP content is subject to the terms at: http://scitation.aip.org/termsconditions. Downloaded to IP: 137.149.200.5 On: Tue, 11 Aug 2015 22:47:53

185102-6

A. Baruah and P. Biswas

J. Chem. Phys. 142, 185102 (2015)

FIG. 4. The energy landscape contour of (a) sequence designed for single target structure (t1) at pH = 7, (b) sequence designed for single target structure (t2) at pH = 12, (c) sequence designed for fold switch at pH = 7, and (d) sequence designed for fold switch at pH = 12. The RMSD values are in Å.

different native conformations and environmental factors like pH and temperature. A two-dimensional energy landscape plot is provided in Figs. S13-S16 of the supplementary material.36 Corresponding energy landscape contours are also plotted for the selected designed sequences of 2KDL-2KDM system in Fig. 5. Fig. 5(a) shows a single global minima at RMSD with respect to 2KDM = 12.25 and RMSD with respect to 2KDL = 0.0 implying that the sequence adopts the most stable conformation in 2KDL. Similarly, Fig. 5(b) indicates that the conformation of 2KDM is located at the global minima for the designed sequence at pH = 12. However, the sequence designed for the fold switch exhibits double funnel-like energy landscape at both pH = 7 and 12 as shown in Figs. 5(c) and 5(d), respectively. At pH = 7, the global minima is at 2KDL while at pH = 12, the global minima correspond to 2KDM conformation. Although not much is known about the general energy landscape of the fold switch sequences, earlier experimental works on mutation induced G A-G B fold switch have shown that the mutations of the fold switch sequence marginally destabilize one conformation while stabilizing the alternate fold.35,39 This may suggest that as the fold switch sequence is approached, the energy minimum corresponding to one conformation becomes less prominent, while another new minimum may appear leading to a double funneled energy landscape. Therefore, we propose that the general energy

landscape of fold switch sequences may be represented by a double funnel. The two-dimensional energy landscape plot of this system is shown in Fig. S17 of the supplementary material.36 The sequences designed for fold switch compromise between the alternate sets of interactions, which stabilizes the respective target structures. Thus, the fold switch sequences may show an overall destabilization as compared to the sequences specifically designed for a single target structure. This destabilization is reflected in the different energy scales of Figs. 4(a)–4(d), which is replotted in Figs. S13-S16 of the supplementary material,36 respectively. However, the corresponding destabilization in the 2KDL-2KDM system is marginal and may be seen in the two-dimensional energy landscape (see Fig. S17 of the supplementary material36). To compare the relative energetic destabilization of a sequence specifically designed for t1 with respect to t2, the relative destabilization energy, (Drel)i, t1→ t2, at each site may be calculated as (Drel)i, t1→ t2 = ϵ i, sn, t1 (t2) − ϵ i, sn, t2 (t2) ,

(13)

where ϵ i, sn, t1 (t2) represents the energy of the sequence designed for t1 in t2 at ith site, and ϵ i, sn, t2 (t2) represents the energy of the sequence designed for t2 in t2 at ith site. (Drel)i, t1→ t2 represents the destabilization when a sequence

This article is copyrighted as indicated in the article. Reuse of AIP content is subject to the terms at: http://scitation.aip.org/termsconditions. Downloaded to IP: 137.149.200.5 On: Tue, 11 Aug 2015 22:47:53

185102-7

A. Baruah and P. Biswas

J. Chem. Phys. 142, 185102 (2015)

FIG. 5. The energy landscape contour of (a) sequence designed for single target structure (2KDL) at pH = 7, (b) sequence designed for single target structure (2KDM) at pH = 12, (c) sequence designed for fold switch at pH = 7, and (d) sequence designed for fold switch at pH = 12. The RMSD values are in Å.

designed for t1 is put in t2 compared to a sequence which is designed to be compatible to t2. Positive values of (Drel)i, t1→ t2 suggests that ϵ i, sn, t1 (t2) is higher than ϵ i, sn, t2 (t2). Therefore, a positive value of (Drel)i, t1→ t2 implies a destabilization of the structure t2 due to the designed sequence which is compatible to t1, while a negative value of (Drel)i, t1→ t2 signifies stabilization. Similarly, the relative destabilization, (Drel)i, t2→ t1, for the designed sequences at t2 with respect to t1 may be calculated as (Drel)i, t2→ t1 = ϵ i, sn, t2 (t1) − ϵ i, sn, t1 (t1) .

implying that the sequence specifically designed for a given target structure t1 is highly destabilized in the other target structure t2 at particular sites. Similarly, the values of (Drel)i, t2→ t1 are high at eight sites (4, 11, 13, 14, 21, 28, 29, and

(14)

The average destabilization energy, (Davg)i, f s , of the fold switch sequence at each site of the two different target structures as compared to the individual sequences which specifically folds to a single target structure at respective values of pH may be given by  1 (Davg)i, f s = ϵ i, f s (t1) − ϵ i, sn, t1 (t1) 2  + ϵ i, f s (t2) − ϵ i, sn, t2 (t2) . (15) Fig. 6 depicts the values of (Drel)i, t1→ t2, (Drel)i, t2→ t1, (Davg)i, f s plotted against the corresponding sites. This plot shows that the sequences designed for a single target structure at a given pH are highly destabilized in the other target structure for a different pH. The values of (Drel)i, t1→ t2 at six sites (6, 16, 17, 19, 20, and 26) exhibit high destabilization,

FIG. 6. The black bars represent the destabilization in energy due to the sequence designed for t1 when placed in t2 at every site. Similarly, red bars indicate the relative destabilization in energy due to the sequence designed for t2 when placed in t1. The blue bars show the average destabilization due to the fold switch sequences in t1 and t2.

This article is copyrighted as indicated in the article. Reuse of AIP content is subject to the terms at: http://scitation.aip.org/termsconditions. Downloaded to IP: 137.149.200.5 On: Tue, 11 Aug 2015 22:47:53

185102-8

A. Baruah and P. Biswas

30), in t1, indicating that the sequence which folds specifically to t2 is not optimized for t1 because of unfavorable interactions at selected sites. However, the sequences designed for the fold switch show stabilization at some sites (i.e., negative values of destabilization). Notably, the fold switch sequences show minimal destabilization at each site. This suggests that the designed fold switch sequences are either stabilizing or minimally destabilizing in both target structures relative to the sequences designed for the single target structures, respectively. The corresponding results for the 2KDL-2KDM system are plotted in Fig. 7. For clarity, the results for the first 28 sites are depicted in Fig. 7(a) and the results for next 28 sites are plotted in Fig. 7(b). This plot confirms the results of Fig. 6, since the plot clearly indicates that for the 2KDL-2KDM system, the fold switch sequence is either stabilizing or minimally destabilizing at each residue site. Designing a fold switch sequence is challenging due to the precise balance of the interactions present in both target structures. Thus, it is important to explore how these fold switch sequences strike a balance between various interactions

J. Chem. Phys. 142, 185102 (2015)

FIG. 8. Average destabilization of the two target structures (t1 and t2) due to compromise in the fold switch sequence is plotted against the difference in the relative surface accessibility (∆rsa) of each site in both target structures.

and find optimized target structures at different pH. The relative surface accessibility of each site in both target structures is measured using DSSP.37 The difference in the relative surface accessibility (∆rsa) of each site between the two target structures is also calculated. The residue sites in the protein sequence are binned (bin size = 0.07) according to ∆rsa values. The average energetic destabilization for each bin is calculated for the fold switch sequences. The average destabilization in each bin is plotted against the value of ∆rsa in Fig. 8. The plot implies that the average destabilization of the fold switch sequence increases with the increase in the values of ∆rsa. For the sites with low ∆rsa values, the fold switch sequences show an enhanced stability compared to the non-fold switch sequences designed for the respective single target structures. This may be due to the fact that sites with similar relative surface accessibility prefer amino acid residues with similar charge, hydrophobicity, and packing constraints, which stabilizes

FIG. 7. The relative destabilization is plotted for (a) 1st to 28th site and (b) 29th to 56th site. The black bars represents the destabilization in energy due to the sequence designed for 2KDL when placed in 2KDM at every site. Similarly, red bars indicate the relative destabilization in energy due to the sequence designed for 2KDM when placed in 2KDL. The blue bars show the average destabilization due to the fold switch sequences in 2KDL and 2KDM.

FIG. 9. Average destabilization of the two target structures (2KDL and 2KDM) due to compromise in the fold switch sequence is plotted against the difference in the relative surface accessibility (∆rsa) of each site in both target structures.

This article is copyrighted as indicated in the article. Reuse of AIP content is subject to the terms at: http://scitation.aip.org/termsconditions. Downloaded to IP: 137.149.200.5 On: Tue, 11 Aug 2015 22:47:53

185102-9

A. Baruah and P. Biswas

the interactions in both target structures. The fold switch sequences enhance specific residue preferences at these sites, thus imparting stability to both target structures and facilitating the conformational switch. At sites with large ∆rsa values, the residue preferences of two different target structures may be completely different to find a compromise between the two and identify residues that stabilize both target conformations. Thus, fold switch sequences compromise local stabilizing interactions at specific sites and selectively prefer those amino acid residues that minimize the destabilization such that both target conformations are minimally frustrated at different pH. However, the relation between average destabilization and ∆rsa for relatively larger and structurally complex proteins may not be linear as depicted in Fig. 9. For 2KDL-2KDM, the average destabilization increases linearly with ∆rsa, reaches maxima, and decreases subsequently. Therefore, for large and structurally complex proteins, the linear relation between average destabilization and ∆rsa may not be valid. The sequence entropy is a measure of the diversity of the sequence ensemble which may be calculated using the Eq. (10) given by information theory. Fig. 10 portrays the probability distribution of the sequence entropy of the fold switch sequence ensemble and the non-fold switch sequence ensemble. The plot clearly shows that the fold switch sequences populate lower sequence entropy region as compared to non-fold switch sequences. Thus, the need to balance the local interactions present in the respective target structures restricts the choice of residues for the fold switch sequence ensemble thereby decreasing the sequence entropy. The fold switch sequence ensemble prefers specific amino acids and is less diverse as compared to the non-fold switch sequences. A similar trend of sequence entropy may be observed for the 2KDL-2KDM fold switch (plotted in Fig. 11), confirming these findings. The role of different types of two-body contacts in pH induced protein fold switch may be important to understand the fold switch phenomenon. Two non-bonded residues with spatial distance ≤ 7.5 Å are assumed to be in contact.38 In Fig. 12(a), the probability distribution of HH contacts for fold switch and non-fold switch sequences is plotted. The HH contact distribution for both types of sequences coincides with each other depicting a similar trend, which implies

J. Chem. Phys. 142, 185102 (2015)

FIG. 10. Probability distribution of the sequence entropy of the fold switch (black line) and non-fold switch sequences (red line).

FIG. 11. Probability distribution of the sequence entropy of the fold switch (black line) and non-fold switch sequences (red line) for 2KDL-2KDM system.

FIG. 12. Probability distributions of (a) HH contacts in the fold switch and non-fold switch sequences, (b) PP contacts in the fold switch and non-fold switch sequences, and (c) electrostatic attractive and repulsive contacts in the fold switch and non-fold switch sequences are plotted.

This article is copyrighted as indicated in the article. Reuse of AIP content is subject to the terms at: http://scitation.aip.org/termsconditions. Downloaded to IP: 137.149.200.5 On: Tue, 11 Aug 2015 22:47:53

185102-10

A. Baruah and P. Biswas

J. Chem. Phys. 142, 185102 (2015)

FIG. 13. Probability distributions for 2KDL-2KDM system of (a) HH contacts in the fold switch and non-fold switch sequences, (b) PP contacts in the fold switch and non-fold switch sequences, and (c) electrostatic attractive and repulsive contacts in the fold switch and non-fold switch sequences are plotted.

that HH contact does not play any important role in the pH induced fold switch. This is an important observation as HH contacts play a pivotal role to stabilize the native structures of globular proteins. Similarly, in Fig. 12(b), the probability distribution of PP contacts clearly indicates the negligible contribution of PP contacts in the pH fold switch. Fig. 12(c) depicts the distribution of attractive and repulsive contacts of the electrostatic interactions in the fold switch and non-fold switch sequence ensembles. The electrostatic contacts in fold switch and non-fold switch sequences exhibit a completely different pattern. The average number of attractive electrostatic contacts in fold switch sequences is much higher as compared to the non-fold switch sequences. Similarly, the repulsive electrostatic contacts are diminished in number in fold switch sequences as compared to their non-fold switch counterparts. This observation indicates that the relative contribution of the electrostatic interactions primarily governs pH induced conformational switching in proteins. Fig. 13 for 2KDL-2KDM system also supports this conclusion.

not be valid for larger and structurally complex proteins. The low sequence entropy of the fold switch sequences indicates that these sequences have restricted choice of the amino acid residues to optimize the interactions between the different target structures at different pH. The pH mediated conformational switch is primarily governed by the attractive electrostatic interactions, while the hydrophobic-hydrophobic contacts do not play any significant role. An insight into the physics of the conformational switch may pave the way to design sequences that undergo large scale conformational restructuring in response to the change in the environment. ACKNOWLEDGMENTS

A. Baruah acknowledges DST, India, for providing financial assistance in the form of SRF (Project No. SB/S1/PC023/ 2013). The authors acknowledge DST, India (Project No. SB/ S1/PC023/2013) and University of Delhi (University Research Grant) for financial assistance. 1B.

IV. CONCLUSIONS

This paper presents a self-consistent field approach for the computational design of a pH induced protein fold switch by scanning the entire sequence space with a complete set of 20 letter amino acids. The pH of the medium is tuned by varying the residual charge on the amino acid residues of the protein sequence. The stability gap is chosen as suitable foldability criteria corresponding to the two different target structures of the fold switch at two values of pH. Results suggest that the energy landscape of the fold switch sequences exhibits two global minima representing the two target structures, respectively, which is markedly different from that of the sequences specifically designed for a unique native structure. Fold switch sequences find a compromise between different interactions present in both target structures by optimizing the interactions of the sites with comparable relative surface accessibility in both target structures. However, for sites with large difference in surface accessibility, the local stabilizing interactions are compromised for minimally destabilizing residues. This linear correlation between change in relative surface accessibility and optimization of interactions may

I. Dahiyat and S. L. Mayo, Science 278, 82 (1997). B. Harbury, J. J. Plecs, B. Tidor, T. Alber, and P. S. Kim, Science 282, 1462 (1998). 3A. Baruah, A. Bhattacherjee, and P. Biswas, Soft Matter 8, 4432 (2012). 4A. G. Murzin, Science 320, 1725 (2008). 5R. L. Tuinstra, F. C. Peterson, S. Kutlesa, E. S. Elgin, M. A. Kron, and B. F. Volkman, Proc. Natl. Acad. Sci. U. S. A. 105, 5057 (2008). 6C. G. Roessler, B. M. Hall, W. J. Anderson, W. M. Ingram, S. A. Roberts, W. R. Montfort, and M. H. Cordes, Proc. Natl. Acad. Sci. U. S. A. 105, 2343 (2008). 7P. N. Bryan and J. Orban, Curr. Opin. Struct. Biol. 23, 314 (2013). 8O. Dagliyan, D. Shirvanyants, A. V. Karginov, F. Ding, L. Fee, S. N. Chandrasekaran, C. M. Freisinger, G. A. Smolen, A. Huttenlocher, K. M. Hahn, and N. V. Dokholyan, Proc. Natl. Acad. Sci. U. S. A. 110, 6800 (2013). 9J. E. Dueber, B. J. Yeh, K. Chak, and W. A. Lim, Science 301, 1904 (2003). 10M. V. Golynskiy, M. S. Koay, J. L. Vinkenborg, and M. Merkx, ChemBioChem 12, 353 (2011). 11A. Vallée-Bélisle and K. W. Plaxco, Curr. Opin. Struct. Biol. 20, 518 (2010). 12B. M. Mills and L. T. Chong, Biophys. J. 100, 756 (2011). 13X. I. Ambroggio and B. Kuhlman, Curr. Opin. Struct. Biol. 16, 525 (2006). 14X. I. Ambroggio and B. Kuhlman, J. Am. Chem. Soc. 128, 1154 (2006). 15C. E. Elling, T. M. Frimurer, L.-O. Gerlach, R. Jorgensen, B. Holst, and T. W. Schwartz, J. Biol. Chem. 281, 17337 (2006). 16J. S. Marvin and H. W. Hellinga, Proc. Natl. Acad. Sci. U. S. A. 98, 4955 (2001). 17M. Dwyer, L. Looger, and H. Hellinga, Proc. Natl. Acad. Sci. U. S. A. 100, 11255 (2003). 2P.

This article is copyrighted as indicated in the article. Reuse of AIP content is subject to the terms at: http://scitation.aip.org/termsconditions. Downloaded to IP: 137.149.200.5 On: Tue, 11 Aug 2015 22:47:53

185102-11 18Q.

A. Baruah and P. Biswas

Lin, C. F. Barbas, and P. G. Schultz, J. Am. Chem. Soc. 125, 612 (2003). 19K. M. Frey, I. Georgiev, B. R. Donald, and A. C. Anderson, Proc. Natl. Acad. Sci. U. S. A. 107, 13707 (2010). 20R. S. Ritterson, K. M. Kuchenbecker, M. Michalik, and T. Kortemme, J. Am. Chem. Soc. 135, 12516 (2013). 21J. Zou and J. G. Saven, J. Chem. Phys. 118, 3843 (2003). 22P. Biswas, J. Zou, and J. G. Saven, J. Chem. Phys. 123, 154908 (2005). 23A. Bhattacherjee and P. Biswas, J. Phys. Chem. B 115, 113 (2010). 24A. Baruah and P. Biswas, RSC Adv. 4, 8031 (2014). 25A. Baruah and P. Biswas, Phys. Chem. Chem. Phys. 16, 13964 (2014). 26J. Kyte and R. F. Doolittle, J. Mol. Biol. 157, 105 (1982). 27V. N. Uversky, J. R. Gillespie, and A. L. Fink, Proteins: Struct., Funct., Bioinf. 41, 415 (2000). 28K. A. Dill, Biochemistry 29, 7133 (1990). 29W. Kauzmann, Adv. Protein Chem. 14, 1 (1959). 30D. S. Moore, Biochem. Educ. 13, 10 (1985).

J. Chem. Phys. 142, 185102 (2015) 31H.

M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. Bhat, H. Weissig, I. N. Shindyalov, and P. E. Bourne, Nucleic Acids Res. 28, 235 (2000). 32A. Liwo, R. Wawak, H. Scheraga, M. Pincus, and S. Rackovsky, Protein Sci. 2, 1697 (1993). 33F. Fogolari, G. Esposito, P. Viglino, and S. Cattarinussi, Biophys. J. 70, 1183 (1996). 34H. Li, R. Helling, C. Tang, and N. Wingreen, Science 273, 666 (1996). 35P. A. Alexander, Y. He, Y. Chen, J. Orban, and P. N. Bryan, Proc. Natl. Acad. Sci. U. S. A. 106, 21149 (2009). 36See supplementary material at http://dx.doi.org/10.1063/1.4920938 for average hydrophobicity and average charge per site for selected sequences and two-dimensional energy landscape. 37W. Kabsch and C. Sander, Biopolymers 22, 2577 (1983). 38L. A. Mirny, A. V. Finkelstein, and E. I. Shakhnovich, Proc. Natl. Acad. Sci. U. S. A. 97, 9978 (2000). 39Y. He, Y. Chen, P. A. Alexander, P. N. Bryan, and J. Orban, Structure 20, 283 (2012).

This article is copyrighted as indicated in the article. Reuse of AIP content is subject to the terms at: http://scitation.aip.org/termsconditions. Downloaded to IP: 137.149.200.5 On: Tue, 11 Aug 2015 22:47:53

Designing pH induced fold switch in proteins.

This work investigates the computational design of a pH induced protein fold switch based on a self-consistent mean-field approach by identifying the ...
7MB Sizes 0 Downloads 7 Views