Comparing a simple theoretical model for protein folding with all-atom molecular dynamics simulations Eric R. Henry1, Robert B. Best1, and William A. Eaton1 Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892-0520 Contributed by William A. Eaton, September 11, 2013 (sent for review June 25, 2013)

Advances in computing have enabled microsecond all-atom molecular dynamics trajectories of protein folding that can be used to compare with and test critical assumptions of theoretical models. We show that recent simulations by the Shaw group (10, 11, 14, 15) are consistent with a key assumption of an Ising-like theoretical model that native structure grows in only a few regions of the amino acid sequence as folding progresses. The distribution of mechanisms predicted by simulating the master equation of this native-centric model for the benchmark villin subdomain, with only two adjustable thermodynamic parameters and one temperature-dependent kinetic parameter, is remarkably similar to the distribution in the molecular dynamics trajectories.

|

stochastic kinetics funneled energy landscape statistical mechanics

| Ising-like model |

T

heoretical models are essential for understanding the complex self-assembly reaction of protein folding, despite their reliance on apparently strong simplifying assumptions (1–5). In fact, the validated assumptions in a successful model point to the important features in protein folding. However, these assumptions are generally difficult to either justify or falsify by experiment. Fortunately, we can now examine them directly, thanks to recent advances in computing that allow us to study folding kinetics and mechanisms with all-atom molecular dynamics (MD) simulations (6–13). In particular, through the work of the Shaw group (10, 11, 14, 15), many folding and unfolding transitions have been observed in long equilibrium trajectories. Here, and in the accompanying paper by Best et al. (16), we compare the key assumptions of a simple Ising-like model with the results from the all-atom MD simulations of the Shaw group (10, 11, 14, 15). We also compare the distribution of folding mechanisms predicted by the model and the MD simulations. Ideally, to capture the full complexity of the process, a theoretical model for protein folding should enumerate all relevant conformations and specify the transition rates between them. To be useful, the model should also quantify the observables in equilibrium and kinetics experiments, and, like any theoretical model, be testable by new experiments. Among the very large number of proposed models (see bibliography in SI Text), a simple Ising-like model based on the contact map of the folded structure (17, 18), with just two adjustable thermodynamic parameters and a third parameter to set the time scale for kinetics, is of particular interest because it is the only one thus far that enumerates conformations and quantitatively explains so many different kinds of experimental results (19–22). The two key assumptions of the Ising-like model are that amino acid interactions absent in the native structure play an insignificant role in determining the folding mechanism (23–25) and that native structure grows in only a few regions of the amino acid sequence as folding progresses. In the accompanying paper by Best et al. (16), the MD simulations for nine proteins with clear two-state kinetics simulated by the Shaw group (10, 11, 14, 15) are found to be consistent with the first assumption, which is also key to many other theoretical models (26–42) and numerous simulations of simplified representations of proteins (reviewed in refs. 1, 43, 44). In this work, we test the second key assumption of

17880–17885 | PNAS | October 29, 2013 | vol. 110 | no. 44

this model for the 35-residue benchmark protein, the villin headpiece subdomain, as well as for nine proteins simulated by MD (10, 11, 15), which also quantifies the extent to which the emergence of residual structure in the unfolded state initiates the folding transition. To compare in detail the mechanisms predicted by the model and the MD simulations, we go beyond the previous landscape description of the kinetics (21) and simulate the master equation of the model, which allows us to make a direct comparison of the transition path distributions predicted for the villin subdomain by the MD and model trajectories. The transition path is the small fraction of an equilibrium trajectory for a single molecule when folding or unfolding actually happens (Fig. 1), so it is by far the most important segment for establishing mechanism. In a reaction-coordinate picture, the transition path corresponds to the free-energy barrier-crossing process (Fig. 1). To observe a transition path is to watch the complete evolution from the unfolded structure to the folded structure, whereas a transition state structure is just one structure along the folding path, albeit the most important for determining rates. Observing transition paths presents an important and interesting challenge for single-molecule experiments, and will provide the most demanding test of mechanisms predicted by both theoretical models and simulations (45–47). Results and Discussion Brief Summary of Ising-Like Theoretical Model. This model is based

on the α-carbon contact map of the folded structure (17, 18, 21) (Fig. 2B). It is Ising-like in that each residue can adopt two configurations, native (n) and coil (c), as in the model of Zwanzig et al. (48). It differs from the standard Ising model in two important respects: Interactions between nearest neighbors are not Significance Although protein folding has been studied extensively by both theory and simulations, there have been very few attempts to test theoretical models by comparison with properties actually measured in equilibrium and kinetic experiments or with folding mechanisms observed in fully atomistic simulations. Here we compare the folding mechanisms of an Ising-like model that quantitatively explains a wide range of experimental data with fully atomistic simulations from the Shaw group. Not only are the simplifying assumptions of the theoretical model consistent with the simulations, but the folding mechanisms are remarkably similar. One of the key assumptions, that the 3D structure of the folded protein determines folding mechanisms, is strongly supported by the analysis in the accompanying paper by Best et al. (16).

Author contributions: E.R.H., R.B.B., and W.A.E. designed research, performed research, analyzed data, and wrote the paper. The authors declare no conflict of interest. Freely available online through the PNAS open access option. 1

To whom correspondence may be addressed. E-mail: [email protected], robertbe@helix. nih.gov, or [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. 1073/pnas.1317105110/-/DCSupplemental.

www.pnas.org/cgi/doi/10.1073/pnas.1317105110

transition from the coil to the native configuration (Δsconf), and the free energy increase of connecting two native segments by a disordered loop (Δgi,loop) (calculation of the Δgi,loop is discussed in SI Text). Unlike the simulations, in which the energy of interaction of atoms is dependent on the type of atom, all residue– residue contact energies in this theoretical model are assumed to be the same. The Δsconf is also assumed to be the same for every residue except for the single proline at position 21, for which Δsconf = 0, because it remains in the same conformation (trans) in the unfolded and folded states in the MD simulations of Piana et al. (11). The master equation for this model is dp/dt = Rp, where the elements of the column vector p are the time-dependent populations of the 92,822 configurations (“conformations”) of the model and the elements of the sparse matrix R are the rate coefficients connecting configurations.

Folded

Rg TS

QU G(Q)

QF

U

Q

F G*

Fig. 1. Transition path on a 2D free energy surface for a two-state protein with the radius of gyration, Rg, and the fraction of native contacts, Q, as order parameters and the corresponding 1D free energy profile, G(Q), with Q as the reaction coordinate (adapted from ref. 47). The free energy surface for a two-state protein has only two deep minima, so that only two populations of molecules are observable, folded and unfolded, in both equilibrium and kinetic ensemble experiments. The vast majority of the time is spent exploring the configurations of these free energy minima with numerous unsuccessful attempts at crossing the free energy barrier between the folded and unfolded states. The continuous curve is the rare successful trajectory that leaves the unfolded state, crosses the position QU on the reaction coordinate, and reaches the other side of the free energy barrier at QF, without recrossing QU. The yellow portion of this trajectory is on a transition path. The white dashed curves are examples of unsuccessful trajectories.

considered because they are assumed to be the same in the folded and unfolded states, whereas interactions between residues separated by more than two residues in the sequence are allowed. Interactions are only allowed, however, if the residues are in contact in the folded structure, so the key ingredient of the model is the residue–residue contact map taken from the experimental structure. Explicit consideration of only native interactions (27) corresponds to the “perfectly funneled” energy landscape of Wolynes and colleagues (1, 23). The second simplifying assumption of the model is that structure grows in an individual molecule in no more than two stretches of contiguous native residues (“native segments”); however, it can be generalized to more. Configurations such as ...cccccnnnnccc... or ...ccccnnccccnnncc... are allowed, but the configuration ...ccnncccnnccnncc... is not allowed. Interactions are allowed between residues within a segment or between residues in two segments connected by a disordered loop. The two-native-segment assumption [also called the double-sequence approximation (17, 18)] markedly reduces the number of configurations to be enumerated for the 35-residue villin subdomain from 235∼1010 to ∼105. The sum of relative probabilities (Boltzmann weights) for each of the 92,822 configurations enumerated in the model is given by the partition function:   Gi exp − kB T i=1   92;822 X qi «contact − ni TΔsconf − Δgi;loop ; exp − = kB T i=1

Z=

92;822 X

[1]

where T is the absolute temperature, kB is the Boltzmann constant, and Gi is the free energy of the ith configuration determined by the number of native contacts (qi), the temperature-independent energy of a contact («contact), the number of native residues (ni), the conformational entropy loss per residue associated with the Henry et al.

Transition Paths for Villin Subdomain from MD Simulations. For the villin subdomain, transition paths for folding were taken as those segments of the Q vs. time trajectory (Fig. 3 A–C) for which Q becomes 0.20 or greater and reaches 0.89 without ever becoming less than 0.20. This procedure identified 25 transition paths in the 398-μs trajectory of Piana et al. (11). Fig. S2 shows the individual transition paths. The total number of transition paths is relatively insensitive to the boundaries used, with lower bounds in the range of 0.07–0.29 and upper bounds within 0.88–0.97 giving the same number of transition paths. MD Simulations Test the Two-Native Segment Assumption. In addition to the native-only assumption, a second major simplifying assumption of the Ising-like model is that folding proceeds along the transition path by developing structure in no more than a few (two in the version considered here) regions of the amino acid sequence. The idea of forming structure in a limited number of regions originates from helix–coil theory, where Schellman proposed it is highly unlikely that nucleation of helices of the length found in proteins would occur more than once in an individual molecule (49). An important question, then, is the following: How many segments of native structure form during the transition paths of the MD simulations? To test in a direct manner the assumption that structure grows during the transition path in no more than two native segments of the sequence, we defined a native segment as a stretch of contiguous native residues longer than a chosen minimum. Each residue was classified as being native or nonnative depending on whether it is within the same region of Ramachandran space as in the native structure (details are provided in SI Text). The number of such native segments was counted at each time point along a transition path for which the time scale was rescaled from 0 to 1 to account for the wide variation in transition path times (see below). Fig. 4 shows histograms of the frequency with which different numbers of native segments occur on transition paths. For each of three criteria for the number of contiguous native residues that constitutes a native segment (greater than three, four, or five residues in length) (Fig. 4), only a small fraction of conformations with more than two native segments is populated on the transition paths. This result supports one of the two major simplifying assumptions of the theoretical model for the villin subdomain. Does this conclusion hold more generally? To answer this, the same analysis was carried out on the MD trajectories for the 12 peptides and proteins simulated by Lindorff-Larsen et al. (10). For 3 of the proteins, which lack a clear folding barrier [e.g., the “one-state” protein BBL of Muñoz and colleagues (50)], transition paths cannot be sensibly defined, so 9 proteins were considered further (Figs. S1 and S2). Transition paths were identified using Q as before, with boundaries as indicated in Fig. S1, with the full set of corresponding histograms shown in Fig. S3. In Fig. 5B, we summarize the results by showing the cumulative fraction of PNAS | October 29, 2013 | vol. 110 | no. 44 | 17881

CHEMISTRY

Transition path

BIOPHYSICS AND COMPUTATIONAL BIOLOGY

Unfolded

X-ray structure

contact map

B

3

H27

1

residue number

A

W23

30 20

2 10

1 1

D tryptophan fluorescence

E log (relaxation rate (s-1))

excess Cp (kcal mol-1 K-1)

calorimetry

quantum yield

2

C 1.5 1.0 0.5

G(Q) (kcal/mol)

F

0

.04

.02 300

350 T (K)

G

2

Q 0.5

0

0.5 Q

350

1

10 20 30 residue number

laser T-jump rates

6

5 300

350 T (K)

H

1

0

1

7

T (K)

4

0

300

400

3

1

Q 0.5

0

50 time ( s)

100

0 68

78 time ( s)

Fig. 2. Ising-like theoretical model for the villin subdomain. Points are experimental data, and the continuous curves are calculated from the model. (A) Schematic of X-ray structure of villin subdomain [Protein Data Bank, www.pdb.org (PDB ID code 1YRF)] shows the tryptophan and the protonated histidine responsible for quenching its fluorescence used to monitor folding (22). (B) Residue–residue contact map of the folded structure. Two residues are in contact if the distance α-carbons is less than 0.8 nm. (C) Excess heat capacity, Cp, vs. temperature (data from ref. 60), calculated from the partition function as  2between  d ln Z RT ddT . The two adjustable thermodynamic parameters are «contact = 0.62 kcal/mol and Δsconf = −3.74 entropy units, both independent of Cpex = dT temperature. (D) Quantum yield of tryptophan fluorescence vs. temperature (data from ref. 21), with a single coefficient accounting for the fractional decrease in the fluorescent quantum yield upon folding and its temperature dependence. (E) Determination of the kinetic parameter, α, that sets the time scale of the rate matrix, R. We assume an Arrhenius temperature dependence and fit the temperature-dependent rates against laser T-jump experiments (data points from ref. 21), where the fluorescence relaxation rates were calculated by solving the 92,822 differential equations of the model. (F) Free energy vs. fraction of native contacts, Q, as a reaction coordinate at 354K. Red dashed lines are the boundaries used in extracting transition paths from stochastic kinetic trajectories. (G) One-hundred-microsecond segment of a 15-s trajectory from simulation of the master equation of the Ising-like theoretical model, showing boundaries of transition paths (α = 3.5 × 108 s−1). The pink highlighted region is shown in H. (H) Expanded scale with segments corresponding to transition paths highlighted in yellow. The shallow local minimum at Q ∼ 0.4 is responsible for pauses in the transition paths.

configurations on transition paths that can be accounted for by two or fewer native segments (defining a native segment as more than four contiguous native residues). The analyses for all seven molecules with fewer than 50 residues are consistent with the assumption of structure forming in no more than two native segments during the transition paths. The 56-residue virtual variant of protein G (NuG2; not yet investigated by experiment), the 73-residue α3D (A3D), and the 80-residue λ repressor (lambda) require more than two native segments to describe their transition paths. For these larger proteins, extending the current theoretical model to three native segments with connecting loops would be a substantial improvement. Comparing Transition Paths from Theoretical Model and MD Simulations.

Having shown in this work and the accompanying paper by Best et al. (16) that the MD simulations are consistent with the two major assumptions of the Ising-like model, it is interesting to ask how similar is the distribution of folding mechanisms observed in the transition paths extracted from the MD trajectories to that observed in the stochastic trajectories of the theoretical model. An important property of a transition path is its duration, because its average value can now be measured in single-molecule 17882 | www.pnas.org/cgi/doi/10.1073/pnas.1317105110

fluorescence experiments (45). Fig. 5 A and B shows the transition path time distributions from the MD simulations and the simulation of the master equation of the theoretical model. The average value for the 25 transition paths from the MD simulations is 0.66 μs, whereas the average value for the 2 × 106 transition paths calculated from the theoretical model is 0.34 μs. The viscosity of the TIP3P water model used is approximately threefold less than real water (51), so 0.66 μs is less than what would be obtained with a more accurate water model. In the two helical systems that have been studied without the use of chemical denaturants to counteract the stabilizing effect of the viscogens, the folding time varies approximately as the square root of the solvent viscosity (20, 52). Consequently, the transition path time predicted by the MD simulations with a more accurate water model is expected to be ∼31/2 longer (∼1.1 μs), which is still in reasonably good agreement with the Ising-like model. In the MD simulations, there is one transition path that is trapped in a misfolded state by a long-lived β-hairpin, resulting in a much longer duration of 3.4 μs. Despite this slowdown, the order of folding events observed in this transition path is indistinguishable from Henry et al.

1

Q 0.5

0 0

400

1

Q 0.5

0

56

60 62 time ( s)

64

G(Q) (kcal/mol)

C

58

Comparing MD Simulations and Theoretical Model with Experiments. 0

0.5 Q

1

Fig. 3. Transition paths in MD trajectories for the villin subdomain. (A) Entire Q vs. time trajectory of Piana et al. (11). The pink highlighted region is shown in B. (B) Expanded scale with segments corresponding to transition paths highlighted in yellow. (C) Free energy vs. fraction of native contacts, Q. Horizontal and vertical red-dashed lines define the boundaries of transition paths.

The most interesting, and also the most demanding, test of both the MD and model simulations would be the comparison with experimentally determined transition path distributions. Information about these distributions can only be obtained from single-molecule experiments. Such experiments are, however, currently in their infancy; thus far, only average transition path

that in the other transition paths not affected by this nonnative structure (TP4, Fig. S2). This is consistent with the expectation of a “funneled” energy landscape, where nonnative interactions affect only the dynamics of folding by slowing diffusion along the reaction coordinate but do not affect the mechanism. The simplest view of the mechanism for this protein is the order of helix formation in each of the transition paths (15). We used the criterion introduced by Gin et al. (53) that the order of helix formation is the order in which helices form and remain until the protein is folded (15). A helix was considered to be formed when at least n − 2 contiguous residues in the helix are in their native configuration, where n is the total number of residues in the native helix. The common feature of the mechanism distribution in the transition paths from the theoretical model and MD simulations is that both predict multiple mechanisms (Fig. 5 C and D). Moreover, in the vast majority of both the 25 MD and ∼106 model transition paths, helices 1 and 2 form before helix 3. Once the uncertainty of the MD results due to the finite sampling is taken into account (error bars, Fig. 5C), the overall similarity between the distribution of transition paths obtained from the MD trajectories and model simulations is remarkable. We have also considered how the observed set of mechanisms from the MD compares with sets of transition paths of the same size (25) drawn from the stochastic kinetics simulations. Describing each set by a vector of relative populations (in the 6D space of mechanisms), the closeness of a given set to the average population vector was defined by its Euclidean distance. By this measure, the MD dataset lies closer to the mean than ∼20% of the stochastic datasets, and therefore falls comfortably within the region sampled by the stochastic simulations. A significant difference between the MD and theoretical model trajectories is that native structure forms much earlier in MD transition paths, as indicated by Fig. 5 E and F (Fig. S2). This results mainly from the failure of the theoretical model to Henry et al.

Fraction time

1

villin

0.5

0

1

2

3

4

native segments

1FME 2F4K 2JOF A3D CLN025 GTT NTL9 NuG2 lambda

Fig. 4. Number of native segments in the MD trajectories. (Upper) Fraction of time during transition paths in which villin contains one, two, or three native segments, where each segment contains more than three, four, or five contiguous residues in the native configuration. Whether each residue was native was determined by its backbone ϕ,ψ angles (see SI Text for details). (Lower) Cumulative fraction of configurations on transition paths for 10 peptides and proteins simulated by Lindorff-Larsen et al. (10) with up to n native segments of more than four contiguous residues shown for each protein, for n = 2 (blue) and n = 3 (red). Table S1 provides names of proteins.

PNAS | October 29, 2013 | vol. 110 | no. 44 | 17883

CHEMISTRY

B

200 time ( s)

treat the unfolded state properly, with a minimum at Q = 0 instead of a more realistic higher value. By contrast, in the MD simulations, ∼40% of the helical residues are in their native backbone conformation already in the unfolded state, although this may be too large due to deficiencies in the force field (54). This percentage increases to ∼64%, on average, at the beginning of the transition paths, reflecting, in part, the choice of Q as a reaction coordinate to define transition paths. Another difference is that in the MD simulations, more structure forms in the region of the proline in position 21. This can be seen in plots of native residue probability vs. the Q reaction coordinate in individual trajectories (Fig. S2) and in the average overall transition paths in Fig. 5 E and F. Nevertheless, the good agreement between the MD and model simulations is a surprise, considering that a multidimensional free energy surface for the ultrafast-folding villin subdomain must be relatively flat so that small changes in force fields should result in very different transition path distributions, as was shown by Piana et al. (15) for a mutant of the villin subdomain. The force field used in the MD simulations of the WT villin subdomain is currently considered one of the most accurate, because it can fold both α- and β-proteins with the same set of parameters (13, 54–56).

BIOPHYSICS AND COMPUTATIONAL BIOLOGY

A

Molecular Dynamics Simulations

A

B

2

P(tTP)

0

C

3

-1

[ s ]

0

0 0

D

0.4

fraction

fraction

1 2 transition path time ( s)

0.5

0.4 0.3 0.2 0.1

0.3 0.2 0.1 0.0

123 132 213 231 312 321

order of helix formation

E

1

1 2 3 4 5 transition path time ( s)

0.5

0.0

2

P(tTP)

1

-1

[ s ]

Theoretical Model

F

helix

30

3

20

2

10

1

-200 ns 0

0.5

1

fraction of transition path

0.9 0.7 0.5 0.3 0.1

123 132 213 231 312 321

order of helix formation

helix

30

3

20 10 -200 ns 0

0.5

0.9 0.7

2

0.5

1

0.1

0.3

1

fraction of transition path

Fig. 5. Comparison of transition paths from theoretical model and MD simulations for the villin subdomain. (A) Normalized distribution of transition path times from the 25 transitions of the MD simulations. (B) Normalized distribution of transition path times for 2 × 106 transitions in the master equation simulations. (C) Distribution of transition paths for the MD simulations in terms of the order of helix formation. Independent error bars for each mechanism were computed as ½pi ð1 − pi Þ=N1=2 , by assuming the probability of each mechanism, pi, to be its frequency in the N = 25 trials. (D) Same as in C for the master equation simulations, for which the number is so large that the statistical error is not resolved in the picture. (E) Average probability of residue being in native conformation vs. rescaled time on a transition path for all 25 transition paths of the MD simulations. The preceding 200 ns before the beginning of the transition path are also shown. The first and last residues of each of the three helices are indicated by the horizontal dashed white lines. (F) Same as E for transition paths of the master equation simulations.

times for two proteins have been determined from single-molecule fluorescence experiments: ∼2 μs for an all-beta protein (45) and 12 μs for an all-alpha protein (57). A simple theory by Attila Szabo for diffusive dynamics on a 1D surface predicts that the average transition path time is insensitive to the height of the free energy barrier separating the folded and unfolded states and depends only linearly on the curvature and the landscape diffusion coefficient at the barrier top (45, 58, 59). The insensitivity to the barrier height is suggested in the single-molecule fluorescence experiments (45) and, more recently, also for nucleic acid folding in optical tweezer experiments (46, 47). Thus, it is reassuring that the average transition path times of 1.1 μs and 0.34 μs predicted by the MD, corrected for water viscosity, and theoretical model simulations, respectively, differ by less than an order of magnitude from each other, and from those measured for similar-sized proteins. In the absence of additional information from single-molecule experiments, it is important to point out that both the MD simulations and the Ising-like theoretical model have been successful in quantitatively explaining ensemble equilibrium and kinetic experiments for the villin subdomain (20–22) (Fig. S4). In previous studies, the kinetics were calculated from the theoretical model as diffusion on a 1D free energy surface with either the number of native residues or the fraction of native contacts as a reaction coordinate. Together with the two thermodynamic 17884 | www.pnas.org/cgi/doi/10.1073/pnas.1317105110

parameters, «contact and Δsconf, only one additional adjustable parameter that describes the physics of the observable (and its temperature dependence) was needed to reproduce the data for each kind of experiment, except for the CD (Fig. S4B). Overall, the agreement is quite good, particularly for the excess heat capacity (Fig. 2C and Fig. S4A), which is most important because it is a measure of the accuracy of the model thermodynamic parameters and, with one exception (41), has not yet been calculated by any other theoretical model or simulation for any protein with even qualitative agreement. Another major success of the model is its ability to explain three highly unusual properties of the villin subdomain: the lack of a dependence of the relaxation rate on denaturant concentration (19) (Fig. S4G), the increase in internal viscosity with increasing temperature (20) (Fig. S4H), and the extremely low Φ values that increase with temperature (21) (Fig. S4I). All three are readily explained as arising from a change in the position of the major free energy barrier from close to the denatured state at low denaturant concentrations and low temperatures to close to the native state at high denaturant concentrations and high temperatures (Fig. S4D). The MD simulations also obtain impressive agreement with experimental results (table S1 of ref. 11), albeit for a much smaller set of observables than the theoretical model. The most notable exception is in the enthalpy change and heat capacity difference between the folded and unfolded states, which are much smaller than experimentally determined (60) and represent a generic problem for current force fields (13). Because almost all the temperature dependence in relaxation rates is contained in the unfolding rates, this deficiency in the force field is most evident in these rates (e.g., ref. 22). We should also mention that a deficiency in the theoretical model is its unrealistic treatment of the completely denatured state, in which there are no native contacts with Q as a reaction coordinate, and its inability to calculate experimental properties of the denatured state, such as the radius of gyration (61). Concluding Remarks. By analyzing the MD simulation trajectories in a unique way and by comparing them with the trajectories of a simple theoretical model, calculated by simulating the underlying master equation, we have been able to connect theory, experiment, and simulations of protein folding more closely. The similarity of the distribution of folding mechanisms predicted by the theoretical model, which is capable of quantitatively explaining a wide variety of experimental results, and of the distribution predicted by the MD simulations supports both the model and the simulations. We have also gained additional insight into how proteins fold. First, as postulated by the model and observed in the MD simulations, native structure develops in just a small number of regions of the polypeptide sequence. In the case of the villin subdomain, the native segments on transition paths correspond to helices already transiently populated in the denatured state (Fig. 5E and Fig. S2), highlighting the importance of residual elements of native structure in the denatured state as initiation points for folding. Note, however, that transient formation of these structural elements is not predictive of folding events, as discussed in the accompanying paper (16). As also discussed in detail in the accompanying paper (16), a key assumption that is critical for our theoretical model is that nonnative contacts affect the rates but not the mechanism. In the model, they affect rates by determining the value of the adjustable kinetic parameter, α, that sets the time scale for the elementary steps of the c→n and n→c transitions. The issue of the role of nonnative contacts is important because it has physical significance beyond the protein folding problem, as well as biological significance. The physical significance is that for molecules of known structure, it permits the construction of simple theoretical models, such as the Ising-like model discussed here, and makes it possible to simulate the kinetics and dynamics of systems that are Henry et al.

1. Oliveberg M, Wolynes PG (2005) The experimental survey of protein-folding energy landscapes. Q Rev Biophys 38(3):245–288. 2. Shakhnovich E (2006) Protein folding thermodynamics and dynamics: Where physics, chemistry, and biology meet. Chem Rev 106(5):1559–1588. 3. Dill KA, MacCallum JL (2012) The protein-folding problem, 50 years on. Science 338(6110):1042–1046. 4. Wolynes PG, Eaton WA, Fersht AR (2012) Chemical physics of protein folding. Proc Natl Acad Sci USA 109(44):17770–17771. 5. Thirumalai D, Liu ZX, O’Brien EP, Reddy G (2013) Protein folding: From theory to practice. Curr Opin Struct Biol 23(1):22–29. 6. Snow CD, Nguyen H, Pande VS, Gruebele M (2002) Absolute comparison of simulated and experimental protein-folding dynamics. Nature 420(6911):102–106. 7. Noé F, Schütte C, Vanden-Eijnden E, Reich L, Weikl TR (2009) Constructing the equilibrium ensemble of folding pathways from short off-equilibrium simulations. Proc Natl Acad Sci USA 106(45):19011–19016. 8. Bowman GR, Pande VS (2010) Protein folded states are kinetic hubs. Proc Natl Acad Sci USA 107(24):10890–10895. 9. Bowman GR, Voelz VA, Pande VS (2011) Taming the complexity of protein folding. Curr Opin Struct Biol 21(1):4–11. 10. Lindorff-Larsen K, Piana S, Dror RO, Shaw DE (2011) How fast-folding proteins fold. Science 334(6055):517–520. 11. Piana S, Lindorff-Larsen K, Shaw DE (2012) Protein folding kinetics and thermodynamics from atomistic simulation. Proc Natl Acad Sci USA 109(44):17845–17850. 12. Freddolino PL, Schulten K (2009) Common structural transitions in explicit-solvent simulations of villin headpiece folding. Biophys J 97(8):2338–2347. 13. Best RB (2012) Atomistic molecular simulations of protein folding. Curr Opin Struct Biol 22(1):52–61. 14. Shaw DE, et al. (2010) Atomic-level characterization of the structural dynamics of proteins. Science 330(6002):341–346. 15. Piana S, Lindorff-Larsen K, Shaw DE (2011) How robust are protein folding simulations with respect to force field parameterization? Biophys J 100(9):L47–L49. 16. Best RB, Hummer G, Eaton WA (2013) Native contacts determine protein folding mechanisms in atomistic simulations. Proc Natl Acad Sci USA 110:17874–17879. 17. Muñoz V, Eaton WA (1999) A simple model for calculating the kinetics of protein folding from three-dimensional structures. Proc Natl Acad Sci USA 96(20): 11311–11316. 18. Henry ER, Eaton WA (2004) Combinatorial modeling of protein folding kinetics: Free energy profiles and rates. Chem Phys 307(2-3):163–185. 19. Cellmer T, Henry ER, Kubelka J, Hofrichter J, Eaton WA (2007) Relaxation rate for an ultrafast folding protein is independent of chemical denaturant concentration. J Am Chem Soc 129(47):14564–14565. 20. Cellmer T, Henry ER, Hofrichter J, Eaton WA (2008) Measuring internal friction of an ultrafast-folding protein. Proc Natl Acad Sci USA 105(47):18320–18325. 21. Kubelka J, Henry ER, Cellmer T, Hofrichter J, Eaton WA (2008) Chemical, physical, and theoretical kinetics of an ultrafast folding protein. Proc Natl Acad Sci USA 105(48): 18655–18662. 22. Cellmer T, Buscaglia M, Henry ER, Hofrichter J, Eaton WA (2011) Making connections between ultrafast protein folding kinetics and molecular dynamics simulations. Proc Natl Acad Sci USA 108(15):6103–6108. 23. Bryngelson JD, Onuchic JN, Socci ND, Wolynes PG (1995) Funnels, pathways, and the energy landscape of protein-folding—A synthesis. Proteins 21(3):167–195. 24. Onuchic JN, Luthey-Schulten Z, Wolynes PG (1997) Theory of protein folding: The energy landscape perspective. Annu Rev Phys Chem 48:545–600. 25. Wolynes PG (2005) Recent successes of the energy landscape theory of protein folding and function. Q Rev Biophys 38(4):405–410. 26. Wako H, Saito N (1978) Statistical mechanical theory of protein conformation 2. folding pathway for protein. J Phys Soc Jpn 44(6):1939–1945.  N (1983) Theoretical studies of protein folding. Annu Rev Biophys Bioeng 12: 27. Go 183–210. 28. Bryngelson JD, Wolynes PG (1987) Spin glasses and the statistical mechanics of protein folding. Proc Natl Acad Sci USA 84(21):7524–7528. 29. Bryngelson JD, Wolynes PG (1989) Intermediates and barrier crossing in a random energy model (with applications to protein folding). J Phys Chem 93(19):6902–6915. 30. Takada S, Wolynes PG (1997) Microscopic theory of critical folding nuclei and reconfiguration activation barriers in folding proteins. J Chem Phys 107(22):9585–9598. 31. Alm E, Baker D (1999) Prediction of protein-folding mechanisms from free-energy landscapes derived from native structures. Proc Natl Acad Sci USA 96(20):11305– 11310. 32. Galzitskaya OV, Finkelstein AV (1999) A theoretical search for folding/unfolding nuclei in three-dimensional protein structures. Proc Natl Acad Sci USA 96(20):11299– 11304.

33. Portman JJ, Takada S, Wolynes PG (2001) Microscopic theory of protein folding rates. I. Fine structure of the free energy profile and folding routes from a variational approach. J Chem Phys 114(11):5069–5081. 34. Portman JJ, Takada S, Wolynes PG (2001) Microscopic theory of protein folding rates. II. Local reaction coordinates and chain dynamics. J Chem Phys 114(11):5082–5096. 35. Bruscolini P, Pelizzola A (2002) Exact solution of the Muñoz-Eaton model for protein folding. Phys Rev Lett 88(25 Pt 1):258101. 36. Abe H, Wako H (2006) Analyses of simulations of three-dimensional lattice proteins in comparison with a simplified statistical mechanical model of protein folding. Phys Rev E Stat Nonlin Soft Matter Phys 74(1 Pt 1):011913. 37. Qi X, Portman JJ (2007) Excluded volume, local structural cooperativity, and the polymer physics of protein folding rates. Proc Natl Acad Sci USA 104(26):10841–10846. 38. Nelson ED, Grishin NV (2008) Folding domain B of protein A on a dynamically partitioned free energy landscape. Proc Natl Acad Sci USA 105(5):1489–1493. 39. Ghosh K, Dill KA (2009) Theory for protein folding cooperativity: helix bundles. J Am Chem Soc 131(6):2306–2312. 40. Pande VS (2010) Simple theory of protein folding kinetics. Phys Rev Lett 105(19): 198101. 41. Bruscolini P, Naganathan AN (2011) Quantitative prediction of protein folding behaviors from a simple statistical model. J Am Chem Soc 133(14):5372–5379. 42. Lane TJ, Pande VS (2012) A simple model predicts experimental folding rates and a hub-like topology. J Phys Chem B 116(23):6764–6774.  models for protein 43. Hills RD, Jr., Brooks CL, 3rd (2009) Insights from coarse-grained go folding and dynamics. Int J Mol Sci 10(3):889–905. 44. Thirumalai D, O’Brien EP, Morrison G, Hyeon C (2010) Theoretical perspectives on protein folding. Annu Rev Biophys 39(39):159–183. 45. Chung HS, McHale K, Louis JM, Eaton WA (2012) Single-molecule fluorescence experiments determine protein folding transition path times. Science 335(6071): 981–984. 46. Neupane K, et al. (2012) Transition path times for nucleic acid folding determined from energy-landscape analysis of single-molecule trajectories. Phys Rev Lett 109(6): 068102. 47. Hummer G, Eaton WA (2012) Viewpoint: Transition path times for DNA and RNA folding from force spectroscopy. Physics 5:87–89. 48. Zwanzig R, Szabo A, Bagchi B (1992) Levinthal’s paradox. Proc Natl Acad Sci USA 89(1):20–22. 49. Schellman JA (1959) The factors affecting the stability of hydrogen-bonded polypeptide structures in solution. J Phys Chem 62(12):1485–1494. 50. Li P, Oliva FY, Naganathan AN, Muñoz V (2009) Dynamics of one-state downhill protein folding. Proc Natl Acad Sci USA 106(1):103–108. 51. Yeh IC, Hummer G (2004) System-size dependence of diffusion coefficients and viscosities from molecular dynamics simulations with periodic boundary conditions. J Phys Chem B 108(40):15873–15879. 52. Jas GS, Eaton WA, Hofrichter J (2001) Effect of viscosity on the kinetics of alpha-helix and beta-hairpin formation. J Phys Chem B 105(1):261–272. 53. Gin BC, Garrahan JP, Geissler PL (2009) The limited role of nonnative contacts in the folding pathways of a lattice protein. J Mol Biol 392(5):1303–1314. 54. Best RB, Hummer G (2009) Optimized molecular dynamics force fields applied to the helix-coil transition of polypeptides. J Phys Chem B 113(26):9004–9015. 55. Best RB, Mittal J (2010) Balance between alpha and beta structures in ab initio protein folding. J Phys Chem B 114(26):8790–8798. 56. Mittal J, Best RB (2010) Tackling force-field bias in protein folding simulations: Folding of Villin HP35 and Pin WW domains in explicit water. Biophys J 99(3):L26–L28. 57. Chung HS, Eaton WA (2013) Single-molecule fluorescence probes dynamics of barrier crossing. Nature (in press). 58. Hummer G (2004) From transition paths to transition states and rate coefficients. J Chem Phys 120(2):516–523. 59. Chung HS, Louis JM, Eaton WA (2009) Experimental determination of upper bound for transition path times in protein folding from single-molecule photon-by-photon trajectories. Proc Natl Acad Sci USA 106(29):11837–11844. 60. Godoy-Ruiz R, et al. (2008) Estimating free-energy barrier heights for an ultrafast folding protein from calorimetric and kinetic data. J Phys Chem B 112(19):5938–5949. 61. O’Brien EP, Ziv G, Haran G, Brooks BR, Thirumalai D (2008) Effects of denaturants and osmolytes on proteins are accurately predicted by the molecular transfer model. Proc Natl Acad Sci USA 105(36):13403–13408. 62. Koga N, Takada S (2006) Folding-based molecular simulations reveal mechanisms of the rotary motor F1-ATPase. Proc Natl Acad Sci USA 103(14):5367–5372. 63. Sirur A, Best RB (2013) Effects of interactions with the GroEL cavity on protein folding rates. Biophys J 104(5):1098–1106. 64. Chiti F, Dobson CM (2006) Protein misfolding, functional amyloid, and human disease. Annu Rev Biochem 75:333–366.

Henry et al.

PNAS | October 29, 2013 | vol. 110 | no. 44 | 17885

CHEMISTRY

ACKNOWLEDGMENTS. We thank D. E. Shaw Research for making this work possible by giving us access to their MD trajectories, Gerhard Hummer and Attila Szabo for many insightful discussions, and Kresten Lindorff-Larsen and Stefano Piana for their comments on the manuscript. This work was supported by the Intramural Research Program of the National Institutes of Diabetes and Digestive and Kidney Diseases of the National Institutes of Health.

BIOPHYSICS AND COMPUTATIONAL BIOLOGY

too large to include all interresidue interactions (e.g., refs. 62, 63). The biological significance is that the dominance of native-only interactions increases the probability of avoiding long-lived misfolded traps. Such traps are believed to promote the protein aggregation that is responsible for multiple human diseases (64).

Comparing a simple theoretical model for protein folding with all-atom molecular dynamics simulations.

Advances in computing have enabled microsecond all-atom molecular dynamics trajectories of protein folding that can be used to compare with and test c...
921KB Sizes 0 Downloads 0 Views