Assessing Protein Conformational Sampling and Structural Stability via De Novo Design and Molecular Dynamics Simulations Keila C. Cunha1‡, Victor H. Rusu2‡, Isabelle F. T. Viana3, Ernesto T. A. Marques3,4, Rafael Dhalia4 and Roberto D. Lins1,4,* 1

Department of Fundamental Chemistry, Federal University of Pernambuco, Recife, PE, 50740-

560, Brazil; 2Department of Physical Chemistry, ETH-Hönggerberg, Zurich, CH-8093, Switzerland, 3Center for Vaccine Research, University of Pittsburgh, 9052 BST3, Pittsburgh, PA, 15261, USA; 4Aggeu Magalhães Research Center, Oswaldo Cruz Foundation, Recife, PE, 50670-420, Brazil.

KEYWORDS Protein stability, structure prediction, protein engineering

ABSTRACT Molecular dynamics and de novo techniques, associated to quality parameter sets, have excelled at determining the structure of small proteins with high accuracy. To achieve a

This article has been accepted for publication and undergone full peer review but has not been through the copyediting, typesetting, pagination and proofreading process which may lead to differences between this version and the Version of Record. Please cite this article as an ‘Accepted Article’, doi: 10.1002/bip.22626 This article is protected by copyright. All rights reserved.

Biopolymers

Page 2 of 30

detailed description of protein conformations, these methods must critically assess the thermodynamic features of the molecular ensembles. Here, a comparison of the conformational ensemble generated by molecular dynamics and de novo techniques were carried out for six Top7-based proteins carrying gp41 HIV-1 epitopes. The native Top7, a highly stable computationally designed protein, was used as benchmark. Structural stability, flexibility and secondary structure content were assessed. The consistency of the latter was compared to experimental circular dichroism spectra for all proteins. While both methods are capable to identify the stable from unstable chimeric proteins, the sampled conformational space and flexibility differ significantly in both methods. Molecular dynamics simulations seem to better describe secondary structure content and identify regions responsible for conformational instability. The de novo method, as implemented in Rosetta - a prime tool for protein design, overestimates secondary structure content. On the other hand, its empirical energy function is capable to predict the threshold for protein stability.

INTRODUCTION The accurate prediction of protein structures from their amino acid sequences is a challenge aggravated by the large number of variables and possibilities in their conformational space. Understanding how protein folds or what are their native state and conformational features is a central issue in computational biology and chemistry.1,2 Experimental studies about protein structure present well-known intrinsic limitations and difficulties that have been often aided by the use of computational methodologies, such as molecular dynamics (MD) and Monte Carlo (MC). These classical modeling techniques, reasoned by statistical thermodynamics, depend on the quality of the parameters entered in a force field and enough conformational sampling to

John Wiley & Sons, Inc. This article is protected by copyright. All rights reserved.

2

Page 3 of 30

Biopolymers

allow an accurate description of various properties, especially structural and energetic ones.1,3 To be amenable to analytical solution, they required certain computational complexity. Simulation time varies widely, from some nanoseconds to milliseconds, depending on the target phenomenon to be studied. However, the use of these methodologies and the validity of the thermodynamical hypothesis may be hindered as viable choices by system of large dimensions and availability of resources.4,5 An alternative to predict protein structure from its amino acid sequence is de novo design techniques, where experimental-based conformational libraries of peptide fragments are used in conjunction with MC and tailored force field and energy functions.6 Well-known limitations are the use of implicit solvent treatment when important water molecules should be present, the need of improvements in sampling strategies, the treatment of electrostatic contributions and the dependence of structural motifs being present at the Protein DataBank.1,6,7 Therefore, unbiased sampling methods using explicit solvent models are generally considered more robust methodologies. On the other hand, the computational costs of the latter curb new predictions in a high throughput fashion, while de novo design offers an alternate path for a more efficient screening of new targets. The latter has been used for structure prediction and protein engineering to design and tune protein interactions with reliable results for short sequences.1,8-11 It has been widely applied to design better catalysts and proteins of immunological interest such as epitopes and antibodies.11-16 This is achieved by developing or modifying a protein structure to obtain a desired function or to improve its stability.17-20 In a recent study, we have used molecular modeling techniques to design six chimerical proteins based on the epitope grafting strategy.13 The amino acid sequences of three putative conformation-specific epitopes from the ectodomain of the HIV-1 gp41 protein were identified (namely here as R2, R3 and R8) and used to replace specific segments of a highly stable scaffold

John Wiley & Sons, Inc. This article is protected by copyright. All rights reserved.

3

Biopolymers

Page 4 of 30

called Top7 (PDB: 1QYS). In addition to its remarkable stability, ∆∆G unfolding ca. 10-13 kcal mol-1,20,21 the Top7 protein displays structural similarity to the identified epitopes. Top7 is a computationally designed protein by the Rosetta software that had its structure later confirmed by X-ray crystallography and NMR spectroscopy. It is composed by a β-sheet core and two αhelices [which were called helix A (from residue 24 to 41) and helix B (from residue 55 to 72)].20 Previous structural data had shown that the selected HIV-1 epitopes assume α-helical conformation on the native gp41 protein.22,23 Thus these sequences were used to partially or fully replace one of the helical regions on Top7, leading to the generation of six chimeric proteins: R2HA, R2HB, R3HA, R3HB, R8HA and R8HB. Molecular dynamics simulations were used to assess the structural and dynamical stability of the computationally designed proteins, which were further expressed in prokaryotic system and immunologically tested against a cohort of HIV-1 positive and negative patient sera through cytometric bead assay. Three out of six proteins displayed good immunogenic performance in differentiating HIV-1 positive from negative patient sera (R2HA, R3HA and R3HB), while the others were unable to discriminate them (R2HB, R8HA and R8HB). Together, these analyses showed that retention of the original secondary structure elements and overall packing of the scaffold and epitope as close as to their native conformation was directly correlated with the ability to be recognized by their respective antibodies.13 Additionally to structural stability, the charge distribution pattern on the grafted epitope surface was also shown to play a role.24 Despite the high confidence displayed by the molecular dynamics simulations, the structural stability assessment of such small proteins through this approach required from 103 to 104 core hours per chimeric protein, which is typically an additional 1 to 2 orders of magnitude per sampled system when compared to de novo methods. Therefore, the computational demands of molecular dynamics simulations and

John Wiley & Sons, Inc. This article is protected by copyright. All rights reserved.

4

Page 5 of 30

Biopolymers

subsequent analyses prevent the evaluation of a large number of molecules to identify putative proteins to be experimentally synthesized and tested. Even though the two methods use contrasting sampling and scoring strategies, a recent study has presented a novel MD-de novo protocol to speed up conformational sampling by improving the de novo refinement step.25 However, the authors did not evaluate the effect of sampling by both techniques on the generated conformational ensembles. Even though a detailed force field comparison is out of the scope of this work, it is well know that different flavors may offer contrasting properties.26 Subtle aspects related to sampling of the protein conformational space are far from being resolved. To minimize such issue, we compare the predictive accuracy of de novo design techniques with molecular dynamics simulations using four different force fields (GROMOS 53A6,27 GROMOS 54A7,28 AMBER 99sb29 and CHARMM2730) to evaluate the stability of the six previously designed, and well-characterized, HIV-1 Top7-based chimeric proteins. This comparison exposes intrinsic features associated to these techniques used in computational protein design for defining and validating the designed protein structure. In addition to protein stability analyses, secondary structure content predicted by both techniques is also compared to experimental circular dichroism spectra of each chimera.

RESULTS AND DISCUSSION Protein Fluctuations The root mean square deviation (RMSD) to a reference structure (when available) as a function of energy is commonly used to measure protein structure prediction success in de novo design. A higher correlation between lowest energy and lowest RMSD indicates higher confidence in the prediction. In the case of molecular dynamics simulations, sampling of the conformational space

John Wiley & Sons, Inc. This article is protected by copyright. All rights reserved.

5

Biopolymers

Page 6 of 30

tends to be more restricted around the equilibrium geometry, which is often provided by the user. However, it may not always be the case for computationally designed proteins. Therefore, if the initial structure if farther away from the equilibrium geometry, a significant displacement in RMSD is observed while the force field drives the system towards a minimum well. Because immunogenicity of these epitopes has been correlated to the ability of the mutants to maintain the scaffold folding, we have calculated the RMSD for all trajectories as a function of energy (Figure 1). Given the rather experimental low resolution (2.5 Å) for such a small protein, a unique reference structure for all systems was arbitrarily defined as the equilibrated backbone of the native Top7. The molecular dynamics simulations for native Top7 showed that the protein backbone displays RMSD values between 0.1 and 0.2 nm in all force fields. This corresponds to a twist between the helices, as pictorially shown in Figure 1A. The GROMOS 54A7 was the only force field to present two discreet distributions (Figure 1A). Average RMSD for R2HA, RH2B, R3HA and R3HB were below 0.25 nm in the GROMOS 54A7, while larger RMSD and energy values are shown for R2HA by AMBER 99sb and CHARMM27. R8HA and R8HB presented significantly larger displacements in all cases, suggesting that these chimeric proteins are not able to keep the overall folding of the native structure. This finding is consistent with the results from our previous study using the GROMOS 53A6 force field.13 A similar trend is displayed by the de novo method (Figure 1B). Native Top7 and R2 and R3 variants showed a much better correlation between low energy and low RMSD than their R8 counterparts (Figure 1B). In addition, R2 and R3 systems displayed a similar Top7 funnel profile showing two main conformational clusters with the most populated having RMSD values below 0.5 nm. All systems sampled low energy configurations of RMSD values ca. 0.17 nm (same RMSD value sampled by molecular dynamics), indicating that a near-native Top7 conformation

John Wiley & Sons, Inc. This article is protected by copyright. All rights reserved.

6

Page 7 of 30

Biopolymers

has been visited. However, a different pattern is observed for R8HA and R8HB. A lower correlation between low energy and low RMSD is observed, compared to Top7 and its R2 and R3 variants. In both R8 systems, most configurations have RMSD values over 0.75 nm, suggesting a relative lower stability for these systems to adopt a Top7-like configuration (as further discussed). This result is in qualitative agreement with the larger RMSD values from molecular dynamics simulations for both R8 systems.

FIGURE 1

The root mean square fluctuation (RMSF) was calculated for all systems to compare how the generated ensembles by each method describe protein internal dynamics (Figure 2). Qualitatively, both methods show that, independent of the force field used: replacement of R8 in helix A causes a significant increase in flexibility in the replaced region, and the replaced sequence in the other chimeras are not the largest contributing regions for protein flexibility (Figure 2). However, the underlying differences are more pronounced. The overall larger displacement amplitude exhibited by Rosetta might be expected due to a larger sampling of the conformational space. A comparison of both methods reveals that the calculated relative contribution for internal protein dynamics is shown to be dramatically different, even for native Top7 (Figure 2). A comparison of the calculated RMSF with the available experimental thermal factors for 1QYS is prevented by the corresponding high thermal factors; (more than 30% of all thermal factors display values between 80 and to 200). Nevertheless, molecular dynamics simulation has been capable to describe often well protein internal flexibility, when compared to experimental crystallographic thermal factors and nuclear magnetic resonance NOE restraints.3135

R8HA and R8HB display significant higher amplitude in the Rosetta ensemble, suggesting

John Wiley & Sons, Inc. This article is protected by copyright. All rights reserved.

7

Biopolymers

Page 8 of 30

once again a lower stability/increased flexibility of these chimeras (Figure 2B). Differences in RMSF from the different force fields (GROMOS 54A7,28 AMBER 99sb29 and CHARMM2730) used in these studies are less pronounced (Figure 2).

FIGURE 2

Secondary Structure Content Analyses It is well known that different force fields can lead to a different secondary structure content. We have previously used the GROMOS 53A6 parameter set, which is now compared to the GROMOS 54A7, AMBER 99sb and CHARMM27 force fields. Alternatively, de novo design in Rosetta uses pre-computed fragment libraries to assemble protein structures. Analyses of the secondary structure content were performed to evaluate the impact of such approach on the description of this property over the generated ensemble. Force field versions and de novo results are compared to experimental circular dichroism spectroscopy. The average number of residues in α-helix and β-sheet configurations was calculated over the last 50 ns of molecular dynamics simulation (10,000 structures) and the entire de novo generated ensemble (20,000 structures). Except for CHARMM27, the other three force force field versions indicate a significant loss of secondary structure for both R8 systems (Table 2a). The AMBER 99sb showed the greatest loss in helical content and the GROMOS 54A7 was the only force field that showed a significant decrease (ca. 7%) in β-sheet content for R8HB (Table 2b). These results are consistent with the increased RMSD values and flexibility displayed for R8 systems in the molecular dynamics simulations (Figure 1). On the other hand, the similarly observed higher RMSD values and increased flexibility of the R8 systems do not correlate with secondary

John Wiley & Sons, Inc. This article is protected by copyright. All rights reserved.

8

Page 9 of 30

Biopolymers

structure content in Rosetta. All values are within 3% of the predicted content for native Top7. This finding suggests that the pre-computed fragments tend to retain secondary structure elements even after the final refinement step. Therefore, measure of the secondary structure variation in de novo method is likely not a good standard to evaluate protein structural stability.

TABLE 2A and 2B

The calculated secondary structure content by molecular dynamics simulations with the four force fields and de novo are compared to circular dichroism spectra for all proteins (Figure 3) to offer insight into the different predictions. The Far-UV circular dichroism spectra of the computationally devised proteins were registered from 200 to 260 nm, allowing only a qualitative comparison to the theoretically calculated data. Under the tested conditions, the Top7 displays a circular dichroism spectrum characteristic of α/β proteins, as previously described.20 For these proteins, the two negative bands characteristic of α-helix content and the negative band correspondent to β-sheet content are comparable to the native scaffold in magnitude, corroborating with the MD data. Different levels of helical content deviation were observed for the R8 variants, and evident signal reduction on the region correspondent to helical content in the spectrum was observed for R8HA. This data corroborates with the computational predictions, suggesting that the R8 sequence cannot be successfully accommodated in the Top7 helices without resulting in significant distortion of the backbone and consequent unfolding of the scaffold and grafted epitope. Even though the β-sheet content is overall maintained in the computationally devised proteins, deviations were observed for R2HB, R8HA and R8HB proteins.

John Wiley & Sons, Inc. This article is protected by copyright. All rights reserved.

9

Biopolymers

Page 10 of 30

The systems R2HA, R3HA and R3HB did not present significant changes at 208, 222 and 216218 nm. R2HB and R8HB show a slight loss of helical and sheet content, while R8HA displays a dramatic loss of both secondary structure elements. Out of the four force field versions and the de novo method, only the GROMOS 54A7 reproduced the trend. The AMBER 99sb shows a pronounced loss of helical content for R3HA and has predicted an unchanged helical content for R2HB. CHARMM27 predicts a similar secondary structure content for all systems, including the R8 chimeras. The GROMOS 53A6 goes against the experimental result and shows an increase in sheet content and no significant change in helical content for R2HB.

FIGURE 3

Because the GROMOS54A7 best reproduced the experimental CD spectra trend, the remaining comparative analyses will be presented using the data from this force field. To further investigate the variation in secondary content from the molecular dynamics simulations, a time dependence of this property for the simulation is shown in Figure 4. All secondary elements of the native Top7 are well maintained over the entire 100-ns trajectory. Similar profile is observed for R2HA, R3HA and R3HB. Interestingly, the replacement of helix B with the R2 sequence causes instability of the first two β-strands (Figure 4), even though a loss of only 4% in β-sheet content is shown from the average numbers in Table 2a, compared to native Top7 using the same force field. Such increased flexibility suggests that R2HB stability may be compromised since the β-sheet core of Top7 has been identified as responsible for its stability.14,15 In fact, this finding

John Wiley & Sons, Inc. This article is protected by copyright. All rights reserved.

10

Page 11 of 30

Biopolymers

corroborates with the observed inability of this chimeric protein to discriminate between HIV-1 positive from negative patient sera.13

FIGURE 4

The Ramachandran plot for the GROMOS 54A7 force field and de novo ensembles has also been calculated to assess the relative preferences of the two methods (Figure 5). Molecular dynamics and de novo show similar preferences and distribution. A more uniform pattern is observed in the de novo plots for the different chimeras, as expected from the predicted secondary structure content. Despite the overall larger RMSD and RMSF values from the de novo predictions, a more restricted conformational space of secondary content is sampled, compared to molecular dynamics simulations. The latter allows for an increased flexibility of the residues around the equilibrium geometry. As also shown by the average secondary content over the ensemble, secondary elements tend to be preserved, as stated above, likely a consequence of using pre-computed fragment libraries.

FIGURE 5

While intrinsic dynamics and conformational sampling may differ significantly in de novo and molecular dynamics ensembles, the main criterion used in Rosetta for the successful prediction of protein structure and stability resides in its energy function. We have used an equilibrated structure of Top7 by molecular dynamics simulations with a final RMSD ca. 0.17 nm from its crystallographic structure as reference. Out of the 20,000 predictions, the lowest energy structure in each prediction was selected and their backbone RMSD calculated against the reference

John Wiley & Sons, Inc. This article is protected by copyright. All rights reserved.

11

Biopolymers

Page 12 of 30

(Top7) backbone. A number of Top7 mutants have been generated and their stability measured. From these studies, Top7 unfolding free energy has been estimated to be around 10-13 kcal.mol1 20,21

.

In addition, for design calculations, the predicted ∆∆G in Rosetta energy units was shown

to have a correlation coefficient of ca. 70% with experimental ∆∆G in kcal mol-1 and associated standard deviation of ca. 1.8 kcal mol-1, compared with a dataset of over 1,200 mutation values.36 Taking into account these findings, it is safe to assume that chimeras with a ∆∆G over ca. +8 Rosetta energy units, compared to native Top7, should not be thermodynamically viable or should not assume the same folding. The lowest energy of Top7 showed a RMSD value of 0.175 nm from the reference structure, a similar value observed from the molecular dynamics simulation. The lowest energy structures of all chimeric proteins also had similar RMSD values and higher energies, compared to native Top7. The ∆∆G for R2HA, R3HA and R3HB, all successful immunoreactive epitopes, were less than 6 Rosetta energy units. On the other hand, R2HB, R8HA and R8HB, unsuccessful immunoreactive proteins, displayed a ∆∆G over 9 Rosetta energy units (Table 3). This result illustrates another case where the energy function in Rosetta was shown to predict protein stability remarkably well.

TABLE 3

CONCLUSIONS We have compared de novo design, as implemented in the Rosetta 3.5 software, and molecular dynamics simulations to evaluate the structural stability of previously engineered Top7-based chimerical HIV-1 antigens. The success of molecular dynamics in predicting conformational

John Wiley & Sons, Inc. This article is protected by copyright. All rights reserved.

12

Page 13 of 30

Biopolymers

stability is strongly dependent on the force field used. The GROMOS 54A7 parameter set showed a better agreement with the experimental trend observed by CD spectroscopy. Using this force field, both methods were able to differentiate between the stable and unstable proteins, even though sampling of the conformational phase space is largely dependent on the method used and significantly contrasting. In the Rosetta method, overall secondary structure content is nearly unaltered as a function of the seven different sequences, even when both molecular dynamics simulations and CD spectroscopy show a loss of helical content. Such shortcoming is likely a result of using pre-computed fragment libraries, which is amended by its energy function that penalizes the overall structure when unfavorably assembled. The predicted relative energy values in Rosetta for each chimera offered an excellent measure of a mutant being stable or otherwise when compared to the experimentally known unfolding free energy of the native scaffold. In addition, the ensemble distribution for the variants with respect to the native scaffold structure also provides insights into protein stability. A narrower sampling as a function of the RMSD for the stable structures is consistent with the idea of the minimal frustration, where undesired residue interactions would increase surface ruggedness in the protein-folding funnel. Moreover, due to its computational efficiency when compared to molecular dynamics simulations, these results suggest that for small proteins, 20,000 cycles of the ab initio method in Rosetta can be used as a first triage step for an efficient in silico protein engineering workflow. Prediction of intrinsic disordered and highly flexible regions should undergo further scrutiny.

METHODS Molecular Dynamics Simulations

John Wiley & Sons, Inc. This article is protected by copyright. All rights reserved.

13

Biopolymers

Page 14 of 30

Initial coordinates for the native Top7 protein were obtained from the Protein DataBank (PDB ID: 1QYS).20 Using the pmut_scan_parallel program within Rosetta v3.5,37-39 gp-41 HIV-1 epitope sequences replaced residues in the two helices of Top7 as described by Viana et al.,13 and their relative stability assessed. The command line parsed to pmut_scan_parallel program was: mutants_list mutants_file -ex1:level 4 -ex2 -ex3 -ex4 -use_input_sc -ignore_unrecognized_res no_his_his_pairE -mute basic core -s top7_file.pdb -output_mutant_structures -extrachi_cutoff 1 -multi_cool_annealer 10 -DDG_cutoff 1000. The proteins were placed in the center of a cubic box of dimensions 7.0 x 7.0 x 7.0 nm and solvated using SPC water model40 and counter ions added to ensure a neutral system net charge. The systems were initially energy optimized using 10,000 steps of the steepest descent algorithm. All atomistic simulations were performed for 100 ns using the GROMOS 54A7,28 GROMOS 53A6,27 AMBER 99sb29 and CHARMM2730 force fields and periodic boundary conditions in the x, y and z directions using the NpT ensemble. The LINCS method41 was used to constrain bonds involving hydrogen atoms. Integration was carried out by the leapfrog algorithm42 using a 2-fs integration time step. The GROMOS force field short-range electrostatics and van der Waals interactions were computed within the cutoff radii of 1.4 nm. Long-range electrostatic interactions were treated using the reaction field method43 with ε = 61 outside of the 1.4 nm cutoff sphere. The CHARMM27 long-range electrostatics was computed using force-shifting and the Particle Mesh Ewald (PME) method with a cutoff of 12 Å. A switch function was applied to the van der Waals interactions from 10 Å to 12 Å. For the AMBER 99sb force field an 8-Å cutoff was applied to the non-bonded interactions and longrange electrostatic interactions were treated with the PME method. The temperature was kept at 298 K using separate thermostats for solute and solvent via the velocity rescale scheme44 with relaxation time of 1.0 ps. Pressure was maintained at 1 bar by isotropic coordinate scaling using

John Wiley & Sons, Inc. This article is protected by copyright. All rights reserved.

14

Page 15 of 30

Biopolymers

the Berendsen barostat45 with a coupling constant of 1.0 ps and compressibility (κT) of 4.51 x 105

bar-1. The simulations were performed using the GROMACS 4.6.5 simulation package.46

System dimensions for each protein are listed in Table 4.

TABLE 4

De novo Calculations 20,000 3D protein structures models were predicted, out of a total of 200,000 sampled, using the AbnitioRelax program from the Rosetta package version 3.5.37-39 A pre-equilibrated geometry of the Top7 protein was used as native structure to compute the RMSD. The fragments files were computed

using

the

Robetta

webserver

(http://robetta.bakerlab.org).47

For

increased

conformational sampling the cycles were increased by a factor of 10, χ1 and χ2 rotamer angles sampling were increased by ± 1 standard deviation and the fast relax algorithm was used. The gyration radii contribution to the total score was reweighted by a factor of 0.5. The command line parsed to the AbnitioRelax program was: AbinitioRelax -in:file:fasta t.fasta -in:file:native top7.pdb -in:file:frag3 frag3 -in:file:frag9 frag9 -database rosetta_database -nstruct 1 -out:pdb abinitio::increase_cycles

10

-relax::fast

-abinitio:relax

-ex1

-ex2

-no_filters

-

abinitio::rg_reweight 0.5 -output_all. Circular Dichroism Spectroscopy The chimeric proteins were produced in prokaryotic system and purified by affinity chromatography as previously described.13 The proteins were diluted in 300 mM NaCl, 50 mM Tris-HCl, pH 8.0 buffer (Buffer A) to the concentration of 100 µg mL-1 and dialyzed in Slide-ALyzer Dialysis Cassettes, 10K MWCO (Thermo Scientific). The dialysis procedure was

John Wiley & Sons, Inc. This article is protected by copyright. All rights reserved.

15

Biopolymers

Page 16 of 30

performed during 12 hours at 4°C against 5 buffers containing decreasing concentrations of NaCl and Tris-HCl as follows: Buffer B – 225 mM NaCl, 50 mM Tris-HCl, pH 8.0; Buffer C – 150 mM NaCl, 40 mM Tris-HCl, pH 8.0; Buffer D – 75 mM NaCl, 30 mM Tris-HCl, pH 8.0; Buffer E – 45 mM NaCl, 25 mM Tris-HCl, pH 8.0; Buffer F – 30 mM NaCl, 20 mM Tris-HCl, pH 8.0. The dialyzed samples were concentrated using Vivaspin 6, 10K MWCO (Sartorius Stedim Biotech), followed by quantification by spectrophotometry under denaturing conditions (in 6 M guanidine hydrochloride) using the Olis 2520 Clarity Spectrometer and absorbance correction based on the extinction coefficient of each protein.48 Circular dichroism data were collected on an Olis DSM17 Spectrometer using 10 µM of each protein. Far-UV circular dichroism wavelength scans were recorded in a 0.1 cm path length and carried out from 260 to 200 nm with 5 seconds averaging times, 1 nm step size, and 2 nm bandwidth at 25°C. Spectra were corrected for a buffer blank and baseline molar ellipticity at 260 nm. Scan data were smoothed by the Stavistsky-Golay method.49

AUTHOR INFORMATION Corresponding Author *Corresponding author: [email protected] Author Contributions KCC performed all the de novo calculations, analyses and contributed to the writing of the manuscript. VHR carried out all molecular dynamics calculations and analyses and contributed to the writing of the manuscript. IFTV cloned, expressed and purified all proteins, collected the circular dichroism spectra and helped with the writing of the manuscript. ETAM and RD oversaw and advised on the molecular biology and biophysical experiments. RDL devised the

John Wiley & Sons, Inc. This article is protected by copyright. All rights reserved.

16

Page 17 of 30

Biopolymers

concept, oversaw the computer simulations, helped analyzing the results and contributed to the writing and final revision of the manuscript. All authors have given approval to the final version of the manuscript. ‡These authors contributed equally. Notes The authors declare no competing financial interest. ACKNOWLEDGMENT This work was supported by FACEPE, CAPES (BioComputacional 051/2013), CNPq, INCTINAMI and nBioNet. Partial computational resources were provided by the Swiss National Supercomputing Centre (CSCS) and Argonne Leadership Computing Facility, an U.S. Department of Energy scientific user facility located at Argonne National Laboratory.

REFERENCES 1. Kim, D. E.; Blum, B.; Bradley, P.; Baker, D. J Mol Biol 2009, 393, 249-260. 2. Best, R. B.; Hummer, G.; Eaton, W. A. Proceedings of the National Academy of Sciences of the United States of America 2013, 110, 17874-17879. 3. van Gunsteren, W.; Bakowies, D.; Burgi, R.; Chandrasekhar, I.; Christen, M.; Daura, X.; Gee, P.; Glattli, A.; Hansson, T.; Oostenbrink, C.; Peter, C.; Pitera, J.; Schuler, L.; Soares, T.; Yu, H. B. Chimia 2001, 55, 856-860. 4. Dill, K. A.; Ozkan, S. B.; Shell, M. S.; Weikl, T. R. Annual Review of Biophysics 2008, 37, 289-316. 5. Dror, R. O.; Dirks, R. M.; Grossman, J. P.; Xu, H. F.; Shaw, D. E. Annual Review of Biophysics, Vol 41 2012, 41, 429-452. 6. Kaufmann, K. W.; Lemmon, G. H.; DeLuca, S. L.; Sheehan, J. H.; Meiler, J. Biochemistry 2010, 49, 2987-2998. 7. Baker, D.; Sali, A. Science 2001, 294, 93-96. 8. Tyka, M. D.; Keedy, D. A.; André, I.; DiMaio, F.; Song, Y.; Richardson, D. C.; Richardson, J. S.; Baker, D. J Mol Biol 2011, 405, 607-618. 9. Huang, P. S.; Love, J. J.; Mayo, S. L. Protein Sci 2007, 16, 2770-2774. 10. Koellhoffer, J. F.; Higgins, C. D.; Lai, J. R. FEBS Lett 2014, 588, 298-307. 11. Tagliamonte, M.; Marasco, D.; Ruggiero, A.; De Stradis, A.; Tornesello, M.; Totrov, M.; Buonaguro, F. M.; Buonaguro, L. AIDS Res Hum Retroviruses 2013, 29, A138-A138.

John Wiley & Sons, Inc. This article is protected by copyright. All rights reserved.

17

Biopolymers

Page 18 of 30

12. Wang, C. K.; Gruber, C. W.; Cemazar, M.; Siatskas, C.; Tagore, P.; Payne, N.; Sun, G. Z.; Wang, S. H.; Bernard, C. C.; Craik, D. J. ACS Chem Biol 2014, 9, 156-163. 13. Viana, I. F. T.; Soares, T. A.; Lima, L. F. O.; Marques, E. T. A.; Krieger, M. A.; Dhalia, R.; Lins, R. D. Rsc Advances 2013, 3, 11790-11800. 14. Boschek, C. B.; Apiyo, D. O.; Soares, T. A.; Engelmann, H. E.; Pefaur, N. B.; Straatsma, T. P.; Baird, C. L. Protein Engineering, Design & Selection 2009, 22, 325-332. 15. Soares, T. A.; Boschek, C. B.; Apiyo, D.; Baird, C.; Straatsma, T. P. J Mol Graph Model 2010, 28, 755-765. 16. Rothlisberger, D.; Khersonsky, O.; Wollacott, A. M.; Jiang, L.; DeChancie, J.; Betker, J.; Gallaher, J. L.; Althoff, E. A.; Zanghellini, A.; Dym, O.; Albeck, S.; Houk, K. N.; Tawfik, D. S.; Baker, D. Nature 2008, 453, 190-195. 17. Bolon, D. N.; Voigt, C. A.; Mayo, S. L. Curr Opin Chem Biol 2002, 6, 125-129. 18. Dahiyat, B. I.; Mayo, S. L. Science 1997, 278, 82-87. 19. Dahiyat, B. I.; Sarisky, C. A.; Mayo, S. L. J Mol Biol 1997, 273, 789-796. 20. Kuhlman, B.; Dantas, G.; Ireton, G. C.; Varani, G.; Stoddard, B. L.; Baker, D. Science 2003, 302, 1364-1368. 21. Indu, S.; Kochat, V.; Thakurela, S.; Ramakrishnan, C.; Varadarajan, R. Proteins 2011, 79, 244-260. 22. Weissenhorn, W.; Dessen, A.; Harrison, S. C.; Skehel, J. J.; Wiley, D. C. Nature 1997, 387, 426-430. 23. Shi, W.; Bohon, J.; Han, D. P.; Habte, H.; Qin, Y.; Cho, M. W.; Chance, M. R. J Biol Chem 2010, 285, 24290-24298. 24. Viana, I. T.; Dhalia, R.; Krieger, M.; Marques, E. A.; Lins, R. In Advances in Bioinformatics and Computational Biology; Setubal, J.; Almeida, N., Eds.; Springer International Publishing, 2013, p 94-103. 25. Lindert, S.; Meiler, J.; McCammon, J. A. J Chem Theory Comput 2013, 9, 3843-3847. 26. Piana, S.; Klepeis, J. L.; Shaw, D. E. Curr Opin Struct Biol 2014, 24, 98-105. 27. Oostenbrink, C.; Villa, A.; Mark, A. E.; van Gunsteren, W. F. J Comput Chem 2004, 25, 1656-1676. 28. Schmid, N.; Eichenberger, A. P.; Choutko, A.; Riniker, S.; Winger, M.; Mark, A. E.; van Gunsteren, W. F. Eur Biophys J 2011, 40, 843-856. 29. Hornak, V.; Abel, R.; Okur, A.; Strockbine, B.; Roitberg, A.; Simmerling, C. ProteinsStructure Function and Bioinformatics 2006, 65, 712-725. 30. MacKerell, A. D.; Banavali, N.; Foloppe, N. Biopolymers 2001, 56, 257-265. 31. Lins, R. D.; Straatsma, T. P.; Briggs, J. M. Biopolymers 2000, 53, 308-315. 32. Lins, R. D.; Briggs, J. M.; Straatsma, T. P.; Carlson, H. A.; Greenwald, J.; Choe, S.; McCammon, J. A. Biophys J 1999, 76, 2999-3011. 33. Oostenbrink, C.; Soares, T. A.; van der Vegt, N. F.; van Gunsteren, W. F. Eur Biophys J 2005, 34, 273-284. 34. Soares, T. A.; Hunenberger, P. H.; Kastenholz, M. A.; Krautler, V.; Lenz, T.; Lins, R. D.; Oostenbrink, C.; van Gunsteren, W. F. J Comput Chem 2005, 26, 725-737. 35. Soares, T. A.; Daura, X.; Oostenbrink, C.; Smith, L. J.; van Gunsteren, W. F. Jounal of Biomolecular NMR 2004, 30, 407-422. 36. Kellogg, E. H.; Leaver-Fay, A.; Baker, D. Proteins 2011, 79, 830-838. 37. Simons, K. T.; Kooperberg, C.; Huang, E.; Baker, D. J Mol Biol 1997, 268, 209-225.

John Wiley & Sons, Inc. This article is protected by copyright. All rights reserved.

18

Page 19 of 30

Biopolymers

38. Simons, K. T.; Ruczinski, I.; Kooperberg, C.; Fox, B. A.; Bystroff, C.; Baker, D. Proteins: Struct Funct Bioinf 1999, 34, 82-95. 39. Bonneau, R.; Strauss, C. E. M.; Rohl, C. A.; Chivian, D.; Bradley, P.; Malmström, L.; Robertson, T.; Baker, D. J Mol Biol 2002, 322, 65-78. 40. Berendsen, H. J. C.; Postma, J. P. M.; van Gunsteren, W. F.; Hermans, J. Pullman, B, Ed; Reidel Publishing Company: Dordrecht 1981, 331-342. 41. Hess, B.; Bekker, H.; Berendsen, H. J. C.; Fraaije, J. G. E. M. J Comput Chem 1997, 18, 1463-1472. 42. Hockney, R. W. In Methods in Computational Physics; Alder, B.; Fernbach, S.; Rotenberg, M., Eds.; Academic Press: New York/London, 1970. 43. Tironi, I. G.; Sperb, R.; Smith, P. E.; van Gunsteren, W. F. The Journal of Chemical Physics 1995, 102, 5451-5459. 44. Bussi, G.; Donadio, D.; Parrinello, M. The Journal of chemical physics 2007, 126, 014101-014107. 45. Berendsen, H. J. C.; Postma, J. P. M.; van Gunsteren, W. F.; Dinola, A.; Haak, J. R. The Journal of chemical physics 1984, 81, 3684-3690. 46. Hess, B.; Kutzner, C.; van der Spoel, D.; Lindahl, E. Journal of Chemical Theory and Computation 2008, 4, 435-447. 47. Kim, D. E.; Chivian, D.; Baker, D. Nucleic Acids Res 2004, 32, W526-W531. 48. Gill, S. C.; von Hippel, P. H. Anal Biochem 1989, 182, 319-326. 49. Savitzky, A.; Golay, M. J. E. Anal Chem 1964, 36, 1627-1639.

FIGURE CAPTIONS Figure 1. Root mean square deviation (RMSD) as a function of energy for the native Top7 and its six helical-substituted chimeric proteins by 100 ns molecular dynamics simulations (A, C and D: GROMOS 54A7, AMBER 99sb and CHARMM27, respectively) and (B) by 20,000 cycles of de novo design (module AbInitioRelax in Rosetta). The molecular dynamics simulation of native Top7 showed two main conformations due to a twist motion of one α-helix with respect to the other. Representative structures of each cluster are shown in cartoon model, along with a corresponding arrow to their RMSD cluster. The twist between the α-helices is highlighted by the positions of each helical axis, showed by orange lines. (α-helices are shown in blue, β-sheet in red and unstructured regions in white).

John Wiley & Sons, Inc. This article is protected by copyright. All rights reserved.

19

Biopolymers

Page 20 of 30

Figure 2. Root mean square fluctuations for the native Top7 and its six helical-substituted chimeric proteins by 100 ns molecular dynamics simulations (A, C and D: GROMOS 54A7, AMBER 99sb and CHARMM27, respectively) and (B) by 20,000 cycles of de novo design (B) (module AbInitioRelax in Rosetta). Regions replaced by gp-41 HIV-1 epitope sequences are highlighted in red.

Figure 3. Far-UV circular dichroism spectra of the native Top7 and the six helical-substituted chimeric proteins, collected at 25 °C. The spectra of the Top7-based proteins (Top7-HIV-1 – brown lines) were superimposed to the spectrum of native Top7 (black lines), which secondary structure content was used as reference. Dashed lines mark 208 nm and 222 nm, points in the spectra corresponding to helical content, while the region from 216 to 218 nm corresponding to β-sheet content is highlighted by a pink band. MRE stands for mean residue ellipicity.

Figure 4. Secondary structure map as a function of time for the Top7 scaffold and its six helicalsubstituted chimeras sampled by molecular dynamics simulations using the GROMOS 54A7 force field. The Top7 regions that were replaced by the putative epitope sequences are shown within black squares. Arrows highlight regions of increased flexibility and/or loss of secondary structure content, compared to the native scaffold.

Figure 5. Ramachandran plots for the Top7 native scaffold and its six helical-substituted chimeric proteins from 100 ns molecular dynamics simulations using the GROMOS 54A7 force field (A) and 20,000 cycles of de novo design (B) (module AbInitioRelax in Rosetta).

John Wiley & Sons, Inc. This article is protected by copyright. All rights reserved.

20

Biopolymers

Page 22 of 30

Table 1. Sequence of native Top7 and its helical variants containing putative g41 HIV-1 epitopes.

System Sequence Top 7 DIQVQVNIDDNGKNFDYTYTVTTESELQKVLNELMDYIKKQGAKRVRI SITARTKKEAEKFAAILIKVFAELGYNDINVTFDGDTVTVEGQL R2HA DIQVQVNIDDNGKNFDYTYTVTTESELQKVLNGIVQQQNNLGAKRVRI SITARTKKEAEKFAAILIKVFAELGYNDINVTFDGDTVTVEGQL R2HB DIQVQVNIDDNGKNFDYTYTVTTESELQKVLNELMDYIKKQGAKRVRI SITARTKKEAEGIVQQQNNLFAELGYNDINVTFDGDTVTVEGQL R3HA DIQVQVNIDDNGKNFDYTYTVTTESEAIEAQQHLMDYIKKQGAKRVRI SITARTKKEAEKFAAILIKVFAELGYNDINVTFDGDTVTVEGQL R3HB DIQVQVNIDDNGKNFDYTYTVTTESELQKVLNELMDYIKKQGAKRVRI SITARTKKEAEKFAIEAQQHLAELGYNDINVTFDGDTVTVEGQL R8HA DIQVQVNIDDNGKNFDYTYTVTTRLIEESQNQQEKNEQELLGAKRVRI SITARTKKEAEKFAAILIKVFAELGYNDINVTFDGDTVTVEGQL R8HB DIQVQVNIDDNGKNFDYTYTVTTESELQKVLNELMDYIKKQGAKRVRI SITARTRLIEESQNQQEKNEQELLGYNDINVTFDGDTVTVEGQL Replaced Top7 residues with gp41 HIV-1 putative antigens are shown in bold.

Table 2. Details of the simulated systems by molecular dynamics

System

Number molecules

of

water Number and type of Total number ions particles

Top7

10914

3 Na+

33689

R2HA

10873

3 Na+

33637

R2HB

10872

5 Na+

33648

R3HA

10875

4 Na+

33654

R3HB

10867

5 Na+

33635

R8HA

10862

5 Na+

33621

R8HB

10863

7 Na+

33634

John Wiley & Sons, Inc. This article is protected by copyright. All rights reserved.

of

Page 23 of 30

Biopolymers

Table 3a. Comparison of the average α-helix content obtained by de novo and molecular dynamics simulations. α-helix (%) G54A7a

G53A6b AMBER 99sba

CHARMM27 a

de novoc

Top7

38

38

34

39

38

R2HA

38

39

34

39

37

R2HB

35

37

33

36

38

R3HA

38

37

26

35

38

R3HB

37

38

33

37

37

R8HA

29

20

23

35

35

R8HB

33

29

21

34

37

a

over the last 50-ns time interval; bover 50-ns simulation using the GROMOS 53A6 parameter set, data taken from44; cusing 20,000 cycles of the program AbinitioRelax within Rosetta 3.5. Variations larger than 5% of the native Top7 are highlighted in bold. (All values have been rounded to the closest integer).

Table 3b. Comparison of the average β-sheet content obtained by de novo and molecular dynamics simulations. β-sheet (%) G54A7a

G53A6b AMBER 99sb a CHARMM27 a

de novoc

Top7

38

37

40

37

37

R2HA

38

40

39

34

35

R2HB

34

42

36

37

35

R3HA

39

38

30

36

35

John Wiley & Sons, Inc. This article is protected by copyright. All rights reserved.

Biopolymers

Page 24 of 30

R3HB

37

40

38

33

35

R8HA

35

38

35

35

34

R8HB

31

38

39

40

35

a

over the last 50-ns simulation using the GROMOS 54A7 parameter set; bover 50-ns simulation using the GROMOS 53A6 parameter set, data taken from 44; cusing 20,000 cycles of the program AbinitioRelax within Rosetta 3.5. (All values have been rounded to the closest integer).

Table 4. Calculated change in free energy (in Rosetta units) for each sequence with respect to Top7 and the associated lowest backbone RMSD to the equilibrated Top7 structure, according to the Rosetta prediction algorithm. System

∆∆G (Rosetta Units)

Top7 Backbone RMSD (nm)

Top7

0

0.175

R2HA

5.701

0.176

R2HB

9.454

0.144

R3HA

5.837

0.160

R3HB

4.266

0.181

R8HA

18.439

0.162

R8HB

19.989

0.175

John Wiley & Sons, Inc. This article is protected by copyright. All rights reserved.

Page 25 of 30

Biopolymers

Root mean square deviation (RMSD) as a function of energy for the native Top7 and its six helicalsubstituted chimeric proteins by 100 ns molecular dynamics simulations (A, C and D: GROMOS 54A7, AMBER 99sb and CHARMM27, respectively) and (B) by 20,000 cycles of de novo design (module AbInitioRelax in Rosetta). The molecular dynamics simulation of native Top7 showed two main conformations due to a twist motion of one α-helix with respect to the other. Representative structures of each cluster are shown in cartoon model, along with a corresponding arrow to their RMSD cluster. The twist between the α-helices is highlighted by the positions of each helical axis, showed by orange lines. (α-helices are shown in blue, β-sheet in red and unstructured regions in white). 361x270mm (300 x 300 DPI)

John Wiley & Sons, Inc. This article is protected by copyright. All rights reserved.

Biopolymers

Root mean square fluctuations for the native Top7 and its six helical-substituted chimeric proteins by 100 ns molecular dynamics simulations (A, C and D: GROMOS 54A7, AMBER 99sb and CHARMM27, respectively) and (B) by 20,000 cycles of de novo design (B) (module AbInitioRelax in Rosetta). Regions replaced by gp41 HIV-1 epitope sequences are highlighted in red. 297x209mm (300 x 300 DPI)

John Wiley & Sons, Inc. This article is protected by copyright. All rights reserved.

Page 26 of 30

Page 27 of 30

Biopolymers

Far-UV circular dichroism spectra of the native Top7 and the six helical-substituted chimeric proteins, collected at 25 °C. The spectra of the Top7-based proteins (Top7-HIV-1 – brown lines) were superimposed to the spectrum of native Top7 (black lines), which secondary structure content was used as reference. Dashed lines mark 208 nm and 222 nm, points in the spectra corresponding to helical content, while the region from 216 to 218 nm corresponding to β-sheet content is highlighted by a pink band. MRE stands for mean residue ellipicity. 233x122mm (300 x 300 DPI)

John Wiley & Sons, Inc. This article is protected by copyright. All rights reserved.

Biopolymers

Secondary structure map as a function of time for the Top7 scaffold and its six helical-substituted chimeras sampled by molecular dynamics simulations using the GROMOS 54A7 force field. The Top7 regions that were replaced by the putative epitope sequences are shown within black squares. Arrows highlight regions of increased flexibility and/or loss of secondary structure content, compared to the native scaffold. 143x215mm (300 x 300 DPI)

John Wiley & Sons, Inc. This article is protected by copyright. All rights reserved.

Page 28 of 30

Page 29 of 30

Biopolymers

Ramachandran plots for the Top7 native scaffold and its six helical-substituted chimeric proteins from 100 ns molecular dynamics simulations using the GROMOS 54A7 force field (A) and 20,000 cycles of de novo design (B) (module AbInitioRelax in Rosetta). 361x270mm (300 x 300 DPI)

John Wiley & Sons, Inc. This article is protected by copyright. All rights reserved.

Biopolymers

40x35mm (300 x 300 DPI)

John Wiley & Sons, Inc. This article is protected by copyright. All rights reserved.

Page 30 of 30

Assessing protein conformational sampling and structural stability via de novo design and molecular dynamics simulations.

Molecular dynamics and de novo techniques, associated to quality parameter sets, have excelled at determining the structure of small proteins with hig...
2MB Sizes 0 Downloads 9 Views