Article pubs.acs.org/JPCB

How Accurately Do Current Force Fields Predict Experimental Peptide Conformations? An Adiabatic Free Energy Dynamics Study Alexandar T. Tzanov,†,∥ Michel A. Cuendet,†,‡,∥ and Mark E. Tuckerman*,†,§ †

Department of Chemistry, New York University, New York, New York 10003, United States Swiss Institute of Bioinformatics, UNIL Sorge, 1015 Lausanne, Switzerland § Courant Institute of Mathematical Sciences, New York University, New York, New York 10003, United States ‡

S Supporting Information *

ABSTRACT: The quality of classical biomolecular simulations is inevitably limited by two problems: the accuracy of the force field used and the comprehensiveness of configuration space sampling. In this work we tackle the sampling problem by carrying out driven adiabatic free energy dynamics to obtain converged free energy surfaces of dipeptides in the gas phase and in solution using selected dihedral angles as collective variables. To calculate populations of conformational macrostates observed in experiment, we introduce a fuzzy clustering algorithm in collective-variable space, which delineates macrostates without prior definition of arbitrary boundaries. With this approach, we calculate the conformational preferences of small peptides with six biomolecular force fields chosen from among the most recent and widely used. We assess the accuracy of each force field against recently published Raman or IR−UV spectroscopy measurements of conformer populations for the dipeptides in solution or in the gas phase.

1. INTRODUCTION Folding to a specific three-dimensional structure is essential for a protein to function properly. Because metastable conformational states represent kinetic pitfalls that reflect competing and mutually exclusive interactions, it is crucial to characterize not only the native state but also all non-native states. Experimental X-ray or nuclear magnetic resonance (NMR) structures are the usual starting point for studies of relations between the native structure of a protein and its function. The NMR and/or X-ray structures, however, do not provide direct information about the dynamics of the protein, its folding process, or its unfolded state. Experimental characterization of metastable states relies on numerous advanced techniques: NMR,1 infrared/ultraviolet (IR−UV) hole-filling and infrared-induced population transfer spectroscopies,2−9 attenuated total reflection (ATR) spectroscopy,10 and Raman spectroscopy,10 all of which can probe different stages during the folding process. The experimental picture is, however, often incomplete, such as when the folding times are within the same time scale as experiment. In many cases, computer simulation is a necessary complement to experiment to obtain a more detailed representation of conformational ensembles or of folding processes. Given the complexity of the systems and the large amount of sampling required to explore the associated conformational spaces, classical molecular dynamics (MD) simulation is the method of choice. For the most part, MD force fields (FFs) have been validated through their ability to find the folded states of proteins as a global minimum of the free energy surface. The accuracy of MD simulations to describe unfolded states has yet to be validated against experiment. © 2014 American Chemical Society

The present study represents a step in this direction. We use recent enhanced sampling techniques to extensively explore the conformational space of two small peptides: N-acetylalanineN′-methylamide (NA-A-MA or alanine dipeptide) and Nacetyltryptophan-N′-methylamide (NA-T-MA) (Figure 1). Though several computational studies have focused on comparing the free energy surfaces (FES) of dipeptides for different FFs,11−13 comparison to experiment requires one additional step. We need to introduce the notion of conformational macrostates and calculate the corresponding macrostate populations to compare to experimental results. Here, we propose a fuzzy clustering technique to delineate conformational macrostates based on the MD data and calculate populations in a robust way for the specific conformations observed in a given experiment. Comparison of six different FFs (CHARMM27, CHARMM36, AMBER99SB, FUJI, OPLS-AA/L, and GROMOS 54a7, as discussed below) reveals a large degree of variability among conformational preferences predicted by the different models. We compare these results to conformational populations obtained by infrared and Raman spectroscopy for NA-A-MA and NA-T-MA in solution and by IR−UV fluorescence spectroscopy after supersonic expansion for NA-T-MA in the gas phase. This enables a critical assessment of the ability of the six FFs to reproduce experimentally observed conformational ensembles of Special Issue: William C. Swope Festschrift Received: January 7, 2014 Revised: March 9, 2014 Published: March 13, 2014 6539

dx.doi.org/10.1021/jp500193w | J. Phys. Chem. B 2014, 118, 6539−6552

The Journal of Physical Chemistry B

Article

Figure 1. Collective variables used in dAFED simulations: (a) NA-A-MA; (b) NA-T-MA.

small peptides. In contrast to previous studies,11−13 our goal here is not to exhaustively explore the conformational states of all twenty amino acids. Rather, we focus on two representative systems for which experimental data are available and describe, in detail, how we classify the conformational states so that comparison to experiment can be made in the most unbiased way. Having a reliable model for these small systems and a robust methodology for analyzing the conformational states are necessary prerequisites for any meaningful description of unfolded states in larger proteins. 1.1. Conformational States of Peptides. 1.1.1. Definition of Macrostates. Conformations of amino acids are often represented in the Ramachandran map,14 i.e., the plane spanned by the two backbone dihedral angles ϕ and ψ defined by the atom quadruplets (C−N−Cα−C) and (N−Cα−C−N), respectively. On the Ramachandran map, all non-proline and non-glycine amino acids preferentially occupy similar regions schematically depicted in Figure 2a. These regions define archetypical macrostates that are related to the secondary structure of peptide chains. Most of the conformations fall into two major basins: the helical α basin with αR (right-hand helix structure) and α′, and the extended β basin with the C5 and polyproline II (PII) conformations. Other higher energy macrostates related to loops and turns are αL (left-hand helix), C7ax (γ turn), and C7eq.15 Figure 2a shows how these preferred conformations are mapped to the (ϕ, ψ) plane, and Table 1 gives archetypical coordinates for the centers of these regions. More recent extensive analysis of high resolution crystal structures suggested that residues may also favor conformations with (ϕ, ψ) outside “classical” regions.15−20 In amino acids with an indole ring side chain (NA-T-MA, melatonin, etc.), the conformations can be described in further detail. The conformation of the backbone with respect to the indole ring plane is denoted with four letters that have the following meaning. The first two letters indicate the conformation of the backbone by its position in the Ramachandran map, as shown in Figure 2a. The last two letters indicate the position of the N-terminus and C-terminus relative to the indole ring: A means anti, P denotes gauche on the pyrrole side, and Φ denotes gauche on the phenyl side. Using this notation, C7eq(AP) represents a C7 equatorial peptide backbone with the N-terminus anti to indole ring and the C-terminus gauche to

Figure 2. (a) Schematic representation of archetypical conformational regions for non-glycine and non-proline residues in the space defined by the Ramachandran angles ϕ and ψ (adapted from Feig et al.25). (b) Free energy surface of NA-A-MA in solution with the OPLS-AA/L force field from a dAFED simulation using ϕ and ψ as CVs (see text for details). (c) Fuzzy cluster representation of the conformational macrostates from the same simulation. Cluster centers are indicated by white dots and the centers that are restrained to the archetypes in (a) are noted as white diamonds. The samples are colored if their membership coefficient uik for a cluster is greater than 0.5. For clarity, only a subset of the sample is represented. For the three macrostates of interest, C5, PII, and αR, the relative populations are indicated in white font.

the indole ring on the pyrrole side of the indole. Examples of these structures for NA-T-MA are shown in Figure 3. 1.1.2. Experimental Approaches. Experimental approaches often utilize dipeptides, i.e., N-acetyl-N′-methylamides, of single 6540

dx.doi.org/10.1021/jp500193w | J. Phys. Chem. B 2014, 118, 6539−6552

The Journal of Physical Chemistry B

Article

cooling conditions. Error bars are reported to be of the order of 0.02 for all three numbers. In a recent landmark study, Grdadolnik et al.23 used the amide III region of the peptide IR and Raman spectra to infer populations for 19 dipeptides in solution. The analysis in this study is restricted to the three major conformers C5, PII, and αR. The relative intensity of the three bands are fitted in both IR and Raman spectra,24 and the bands are assigned to the three backbone conformers using NMR J-couplings. For NA-AMA, the populations reported for C5, PII, and αR were 0.29, 0.61, and 0.10, respectively. For NA-T-MA, the corresponding populations are 0.44, 0.54, and 0.02, with an uncertainty of 0.02 for all (Table 1). 1.2. Molecular Simulation as a Tool for Structure Prediction. Besides the sampling problem, which we address in the following sections, the central question in MD simulations is the suitability of a given FF to study a particular molecular system. In biomolecular systems, establishing agreement between simulated and experimental results or between simulations with different FFs is often highly dependent on the complexity of a system. In the case of protein folding, for example, it is difficult to disentangle effects of the local ϕ/ψ preference of each amino acid and the effects of long-range and cooperative phenomena. This is the key question for FF transferability to large systems, as it is well-known that amino acids show different conformational propensities when alone, in short peptide chains or in proteins.25,26 Nevertheless, if we wish to obtain a correct answer for the right reason, we must ensure that current FFs are reliable models for the shortest peptide chains, i.e., the dipeptide molecules. Most standard classical FFs for biomolecular simulations come from four families: CHARMM,27−29 AMBER,30,31 OPLS,32−35 and GROMOS36−38 (see refs 39 and 40 for more complete reviews on FFs). These FFs are nonpolarizable (they use fixed partial charges on atomic centers). All FFs of these families utilize very similar potential energy functions including so-called bonded terms for bond-stretching, anglebending, and torsions of rotatable bonds, as well as nonbonded terms in the form of Coulomb and Lennard-Jones potentials. Small differences include the specific functional form for torsion potentials and the treatment of 1−4 interactions. Parameter sets are typically obtained by fitting to quantum mechanical (QM) calculations for small molecules. These initial parameters are refined by following a procedure that varies among FFs but usually involves obtaining averages from long MD simulations, which are fed back to adjust parameters to reproduce experimental macroscopic properties, such as the solvation free energy. Another common approach involves fitting FF energies to a set of QM calculations done on incrementally larger molecular systems. The three main reasons

Table 1. Canonical Conformations of Non-proline and Nonglycine Amino Acids in Solution and Their Position in the Ramachandran Planea conformation

ϕ

ψ

C5 PII

−150 −65

160 150

αR α′ αL C7ax C7eq

−60 −130 50 50 −75

−50 25 50 −130 75

secondary structure β-sheet polyproline II helix right-handed helix

exp exp NA-A-MA NA-T-MA 0.29 0.61

0.44 0.54

0.10

0.02

left-handed helix γ turn

The (ϕ, ψ) values indicated for the C5, PII, and αR conformations are used as archetypes for our clustering method. Experimental values are from ref 23.

a

amino acids to gather insights about biomolecular conformation. It was found that three amino acids, tryptophan (Trp), phenylalanine (Phe), and tyrosine, (Tyr), all of which have an ultraviolet chromophore (indole and benzene rings), can exhibit conformation-specific infrared spectra when fluorescence dip infrared spectroscopy is combined with UV−UV spectroscopy.2−4,10,21,22 In particular, NA-T-MA engendered much attention as an object of experimental investigation despite the fact that tryptophan itself is not in abundance in biological systems. Its bulky side chain gives rise to a complex potential energy surface, the detailed characterization of which provides insights that can be carried over to larger systems. In the IR−UV hole-filling method,2−5,7,22 single modes in gas phase NA-T-MA molecules at 423 K are selectively excited using IR radiation before the molecules are cooled by supersonic jet expansion and probed by UV spectroscopy. The population transfer spectroscopy (PTS) technique analyzes the changes in populations of conformers as a function of infrared wavelength to generate populations directly from spectra.22 In refs 2−4 and 22, the three NA-T-MA macrostates apprearing in the spectra are labeled with with A, B, and C. Using a combination of empirical arguments and molecular modeling, these spectroscopic macrostates were assigned to conformers described by the four-letter codes C5(AΦ), C5(AP), and C7eq(ΦP), respectively. These four-letter codes can, in turn, be mapped to regions in the four-dimensional space spanned by ϕ, ψ, and the two dihedral angles χ1 and χ2. The centers of these regions, which we call archetypes, are reported in Table 2. The populations of conformers A, B, and C were measured to be 0.23, 0.40, and 0.37, respectively, in refs 3 and 22. A subsequent experiment by the same authors4 yielded slightly different values of 0.20, 0.50, and 0.30, respectively. The latter values are assumed to be of higher accuracy because of better expansion

Figure 3. Structures of NA-T-MA in the gas phase corresponding to states A, B, and C as assigned in ref 2 (Table 2). 6541

dx.doi.org/10.1021/jp500193w | J. Phys. Chem. B 2014, 118, 6539−6552

The Journal of Physical Chemistry B

Article

not have a minimum in the α region, in the subsequent version, denoted AMBER96,44 the parameters for ϕ and ψ were adjusted to reproduce α-helical energies for the alanine tetrapeptide. It was reported, however, that AMBER96 simulations overestimate the β-strand propensity.42,56 The problem was addressed in AMBER99 where the backbone parameters were refined by including 11 structures of the alanine tetrapeptide along with the alanine dipeptide. The AMBER99 FF was later shown to overstabilize the α-helical conformations.57,58 More recently, Sorin and Pande modified the AMBER99 set by replacing parameters for ϕ with those used in AMBER94 and named the resulting FF AMBER99ϕ.50,59 Simerling and coworkers45 reparametrized the backbone (ϕ, ψ) dihedral terms of AMBER99 by fitting the energies of multiple conformations of glycine and alanine tetrapeptides from high level ab initio quantum mechanical calculations to obtain the AMBER99SB FF. A further improvement of AMBER99SB called FUJI was reported by Fujitani et al., based on new backbone dihedral parameters derived by fitting torsional energy profiles obtained from very high level ab initio molecular orbital calculations for hydrogen-blocked and methyl-blocked glycine and alanine dipeptides.60 Note that, at the time the present study was being completed, the latest version of the AMBER package was released with the new AMBER12SB FF that contains corrected backbone and the side-chain torsion potentials.61 AMBER12SB is not, however, included in this work. The OPLS FF was introduced in 1995. In the first version of the FF, the torsional parameters were derived by fitting to rotational energy profiles obtained from ab initio molecular orbital calculations at the RHF/6-31G* level for more than 50 organic molecules and ions.62 The nonbonded parameters were developed in conjunction with Monte Carlo simulations to obtain thermodynamic and structural properties for 34 pure organic liquids including alkanes, alkenes, alcohols, ethers, acetals, thiols, sulfides, disulfides, aldehydes, ketones, and amides.62 The reparametrization of OPLS involved refitting the key Fourier torsional coefficients.63 A set of energies for all amino acids (more than 2000) was generated on the basis of geometry optimization and single-point LMP2/cc-pVTZ(-f) calculations.63 Then ϕ and ψ torsion angle parameters were refit to these data. In one case, the van der Waals and charge parameters were also modified on the basis of new liquid-state simulation data. The resulting FF, denoted OPLS-AA/L (L stands for LMP2),63 is the one employed in the simulations to be reported herein. The first GROMOS FF36 was parametrized to reproduce spectroscopic or crystallographic structural data and was followed by a second-generation FF called 53A6 based on hydration properties of amino acid analogues.37 However, 53A6 was shown to destabilize α-helices. This led to a reparametrization of the torsional angle terms based on a large set of crystal structures, which gave rise to the current GROMOS 54A7 version.38 1.2.2. Earlier Computational Studies of Amino Acid Conformational Propensities. Dipeptides, and in particular the alanine dipeptide, are common benchmark systems to assess the accuracy and convergence properties of enhanced sampling molecular dynamics methods.64−66 In addition, FESs of dipeptides have been used in a number of studies to compare different force fields. For example, Hu et al.67 computed distributions of the alanine and glycine dipeptides in the (ϕ, ψ) plane with five force fields and compared them to similar distributions from QM/MM calculations. Liu et al. produced

for discrepancies among FFs, both from the same family and across families, are (i) differences in QM methodology for the initial parameter derivation, (ii) differences in training sets used, and (iii) differences in parameter tuning strategies. The quality or accuracy of a FF is usually assessed from its ability to stabilize experimentally observed protein structures or to reproduce other indirect experimental observables reflecting conformational equilibria of peptides or proteins. A difficulty faced by all classical biomolecular FFs has been an improper balance between predicted α and β structures. Well-known examples of structural preferences across FFs are the bias of AMBER9430 and AMBER0341 toward α-helical structures42 and inclination of the AMBER96,43,44 AMBER99SB,45 and CHARMM22 46 toward β structures in explicit solvent.20,29,42,45,47,48 We note that these tendencies can differ in implicit solvent where, for example, AMBER99SB was shown to favor α-helical structures.48 Very subtle modifications to commonly used molecular mechanical potentials, such as changes to the scaling factor for 1−4 electrostatic interactions, can significantly alter the behavior of those FFs with respect to the stabilizing/destabilizing of protein substructure.49,50 Because of this, the primary questions for all classical biolomolecular FFs are (i) what is the limit of accuracy of a given functional form? and (ii) how can the parameters be adjusted to achieve optimal reliability? In the following section, we provide background information about the FFs considered in this study. 1.2.1. Overview of the Force Fields Considered in This Study. The first member of the CHARMM FF family was named EF2 and was designed to be used in a vacuum. This FF included both united-atom and all-atom models. The nonbonded terms were derived from dimer interaction energies, and EF2 also included an explicit term for hydrogen bonds. The next generation, CHARMM22,46 differed significantly in that it included the explicit TIP3P water model51 and the explicit representation of hydrogen bonds from EF2 was removed. Partial charges were based on a reproduction of QM minimum interaction energies and on distances between small molecule dimers.52 The van der Waals or Lenard-Jones parameters were transferred from the proteins or based on crystal heats of sublimation.52 A more recent version of the CHARMM force field used in this study, CHARMM27, includes many improvements for non-protein systems.27,52,53 For proteins, the only difference with CHARMM22 was the addition of an extra term in the potential energy function to account for a joint ϕ- and ψ-angular contributions to the energy, which is the dihedral cross term or CMAP correction,27 based on high-level QM calculations of the glycine, alanine, and proline dipeptides. The most recent development in the CHARMM family of FFs is CHARMM36, which offers an improved representation of the potential energy surface of proteins due to two key modifications.54 First, the CMAP term was optimized to reproduce NMR scalar coupling data on peptides in solution. Second, the side-chain dihedral potentials were optimized against quantum mechanical energies from dipeptides and NMR data from unfolded proteins. CHARMM36 also includes various previous revisions, including the improvements for the tryptophan side chain.55 In the first member of the AMBER FFs, introduced in 1994, dihedral parameters were developed by fitting to relative QM energies of rotamers of glycine and alanine.30 The partial charges were obtained by fitting the electrostatic potential to calculations at the Hartree−Fock 6-31G* level of theory. Because the gas phase energy surfaces of alanine and glycine do 6542

dx.doi.org/10.1021/jp500193w | J. Phys. Chem. B 2014, 118, 6539−6552

The Journal of Physical Chemistry B

Article

with dim(Ω) = n. The probability of observing a conformation q(r) = s in the canonical ensemble is given by

NA-A-MA FESs with AMBER94, AMBER03, OPLSAA, OPLSAA/L, and CHARMM27 and compared them to FESs from DFT calculations (where the FES is approximated using contributions of local vibrational modes, and an implicit solvation model was used). After an initial study11 on NA-A-MA, Vymetal et al. recently published an extensive survey12 of all amino acids comparing FESs from CHARMM27, AMBER99SB, AMBER03, and OPLS-AA/L FFs using the metadynamics method.68 Some studies went one step further and compared simulation results with conformer populations observed in the database of all known protein structures. For example, Feig25 generated FESs for all amino acids using the CHARMM27 FF and compared them to statistics of known structures. More recently, Jiang et al.69 used replica exchange to calculate FESs for all amino acids with the OPLS-AA/L and AMBER03 FFs and compared backbone and side-chain conformational propensities to a database of known structures (excluding helix and hairpin regions). The same authors subsequently proposed a knowledge-based correction to the OPLS-AA/to better reproduce the observed propensities.26 Finally, a few authors attempted to bridge FES calculations with data from experiments, such as NMR or IR spectroscopy, which are able to resolve conformational populations of small peptides. As a very significant example, Best et al.20,70 computed FESs for several short peptides using six different force fields and compared the conformation-dependent NMR J-couplings to experiment. Moradi et al.71 compared the polyproline II propensities of short peptides calculated with AMBER99SB in implicit solvent to circular dichroism experiments. Recently, Cruz et al.72 calculated FESs for all amino acids in solution with OPLS-AA/L and compared populations of the C5, PII, and αR conformers with the experimental results from Grdadolnik et al.23 mentioned above. Note that most computational studies mentioned above that infer conformer populations from the FES12,25,67,72 rely on hard rectangular macrostate boundaries in the (ϕ, ψ) plane. However, the dividing line between neighboring states separated by a low, wide free energy barrier is very arbitrary. For example, the separation between the α′ and αR states is set at ϕ = −120° in ref 12, whereas it is defined as ψ = −15° in ref 25 for all FFs. This led other authors, for example, the authors of ref 71, to apply clustering techniques to infer macrostate boundaries from the data. Here, we develop a fuzzy clustering method that defines macrostates based on the simulation data, as well as available experimental observables, and assigns shared memberships to conformations in overlapping regions. We further show how relative macrostate populations can be calculated using such partial memberships from biased ensembles resulting from enhanced sampling MD simulations. We employ this approach to study the conformational preferences of two representative dipeptide systems across all of the aforementioned FF families.

P(s) =

n

1 Z

∫

Ndf

dr e−βU (r) ∏ δ(qα(r) − sα) α=1

(1) −βU(r)

where U(r) is the potential energy function and Z = ∫ e dr is the partition function. Here, δ(·) is the Dirac delta function and β = (kBT)−1, where T is the system temperature, and kB is Boltzmann’s constant. The FES over the conformational space Ω is then

F(s) = −

1 ln P(s) β

(2)

2.1.1. Conformational Macrostates as Regions of the CV Space. We further define a conformational macrostate C as an ensemble of conformations occupying a contiguous region Ω(C) ⊂ Ω, which typically maps a significant basin of the FES. If the CVs are a good representation of the physical system, a macrostate should be an ensemble of rapidly interconverting conformations. Different macrostates are separated by barriers on the FES that, again under the assumption of a good choice of CVs, are related to the kinetics of macrostate transitions. The probability of observing (or the population of) macrostate C is P(C) =

∫Ω(C) ds e−βF(s)

(3)

Note that in eq 3, no normalization constant is necessary because ∫ Ωds e−βF(s) = 1, due to the normalization in eq 1. The free energy of macrostate C at temperature T is defined as F (C ) = −

1 ln P(C) β

(4)

According to standard thermodynamics, F(C) can be decomposed as F(C) = E(C) − TS(C), where E(C) is the enthalpy and S(C) is the entropy of macrostate C. The overall balance between enthalpic and entropic contributions determines the stability of a macrostate at the given T. Thus, it is possible for a given macrostate to be entropically stabilized despite an unfavorable enthalpic component. In terms of the FES, this means that a shallow but broad basin can represent a more stable macrostate than a deep but very localized minimum. In particular, the value of F(s) at the FES minimum is not the relevant quantity to determine the stability of a given macrostate, and eq 4 must be used instead. 2.1.2. Driven Adiabadic Free Energy Dynamics Enhanced Sampling Method. The PES of biomolecular systems is typically rough, with many local minima and barriers, which implies a wide distribution of time scales for the motions in these systems. Over the years, numerous methods have been developed to enhance sampling as well as to obtain dynamic information on various molecular systems.73,74 In particular, temperature accelerated molecular dynamics (TAMD)75 and diven-adiabatic free energy dynamics (dAFED)65 are two similar methods that stem from an earlier approach, adiabatic free energy dynamics (AFED).76−78 The AFED enhanced sampling method imposes an adiabatic separation between a set of collective variables (CVs) and the remaining degrees of freedom of a system, after a coordinate transformation that makes the CVs explicit dynamical variables. In TAMD and dAFED, however, the adiabatic separation is applied to a set of extended variables tightly coupled to the CVs, rather than the CVs themselves. Explicit coordinate transformations are thus no longer required, rendering TAMD

2. METHODS 2.1. Collective Variables and Macrostate Free Energies. Consider a molecular system with Ndf degrees of freedom described by a vector of coordinates r in Cartesian space. We assume that rare events can be characterized by a set of n collective variables (CVs), q(r) ≡ (q1(r), q2(r), ..., qn(r)), with n ≪ Ndf. We define a conformation as a particular value of the CVs, s ≡ (s1, s2, ..., sn). We denote by Ω the space spanned by s, 6543

dx.doi.org/10.1021/jp500193w | J. Phys. Chem. B 2014, 118, 6539−6552

The Journal of Physical Chemistry B

Article

2.2.1. Macrostate Populations in the Adiabatic Ensemble. In the context of dAFED, the samples s1, ..., sN are obtained at temperature Ts. We recently showed how to calculate the ensemble average of any observable at temperature T from a TAMD or dAFED simulation.80 If we are interested in populations, the relevant observable is the indicator function of macrostate C, 1Ω(C)(s), where 1Ω(C) = 1 if s ∈ Ω(C) and 0 otherwise. This observable needs to be reweighted by a factor e−[β−βs]F(s), with F(s) the dAFED FES calculated using eq 6. Thus, we can estimate the probability of observing macrostate C as

and dAFED much easier to implement with almost no loss of efficiency. The full description of the TAMD and dAFED approach is given elsewhere,65,75 and in the following we summarize only the main aspects of the dAFED approach. If we replace the product of δ functions in eq 1 with the limit of a product of Gaussian functions, we bring about terms in the exponential that can be viewed as harmonic potentials with harmonic constants k1, ..., kn that keep the collective variables q1(r), ..., qn(r) close the extended variables s1, ..., sn. These extended variables are assigned momenta ps1, ..., psn and fictitious masses ms1, ..., msn and are evolved as independent dynamical variables. If high values are assigned to these masses, the motion of the s variables is slow with respect to the time scales of the physical system, and the s variables become adiabatically decoupled from the physical variables. Under these conditions, s evolves on the potential of mean force, eq 2, generated by the averaged interactions with the physical system at inverse temperature β, according to the effective adiabatic Hamiltonian for s,65 n

Hadb(ps,s) =

∑ α=1

ps

2msα

+ Fβ(s) (5)

If the adiabatic decoupling is effective, we can couple the extended system to a heat bath at temperature Ts > T while keeping the amount of heat transferred to the physical system limited. In this way, the extended variables can slowly but effectively drive the system across free energy barriers of height up to a few kBTs, which results in enhanced sampling along the directions of the CVs. It can be shown65,75 that under the high-temperature and adiabatic conditions, the probability distribution Padb(s) sampled from the adiabatic dynamics leads to a good approximation of the exact free energy energy profile at temperature T, Fβ(s) = −kBTs ln Padb(s)

N

∑ 1Ω(C)(si)e−[β − β ]F(s ) s

i=1

i

(7)

Many previous studies of small peptides have used predefined regions of the Ramachandran surface to define macrostates. Each of these regions was delimited by maximum and minimum values for ϕ and ψ and had a rectangular shape.12,25,67,72 Using arbitrary regions of Ω to define conformational macrostates can be problematic when different molecules are studied. For example, a fixed region Ω(Ck) might not correctly encompass the free energy basin corresponding to macrostate Ck for all molecules. In addition, results can be sensitive to the precise location of the arbitrary boundary between two nearby macrostates. This calls for data-driven approaches to identify macrostates, such as a grouping of conformations in the (ϕ, ψ) plane using a clustering algorithm, as was done in ref 71. In the present study, we developed a fuzzy clustering approach to define states in the two- or four-dimensional CV space directly from a biased dAFED ensemble. The number M of clusters and the locations of the cluster centers c1, ..., cM are determined by the clustering algorithm (see below). For each sample point si, the algorithm provides a fractional membership weight uik ∈ [0, 1] for cluster Ck, with ∑M k=1uik = 1. This defines an N × M membership matrix Ũ . In this context, the cluster or macrostate probability eq 7 becomes

2

α

1 Nexp

P(C) =

(6)

This approximation holds under the assumptions that the adiabatic separation is effective and that the coupling of s with q(r) is tight. These requirements can, in principle, be approached by choosing appropriate values for masses ms1, ..., msn and coupling constants k1, ..., kn, which are arbitrary parameters of the simulation. We note that under the same assumptions, the FES can also be reconstructed from the mean force exerted by the physical system on s, a key element in the recent unified free energy dynamics (UFED) method79 that combined dAFED with metadynamics.68 The technical details of the simulations carried out in this study can be found in Appendix A. 2.2. Clustering Approach To Map Macrostates. If we have a set of N canonically distributed samples in CV space, e.g., collected during an MD or enhanced sampling simulation, we can estimate the probability of a conformational macrostate C at temperature T by counting the number of samples falling within Ω(C) and dividing by N. In comparison with experiment, a different normalization might be needed if some parts of the conformational space Ω do not contribute to the experimental observable. This happens, for example, if conformer populations are inferred from measurements using a spectroscopic technique in which some conformations do not show any signature. These invisible conformations are de facto considered to be out of Ω. The appropriate normalization constant is Nexp ≤ N, which is the number of samples falling into all Mexp conformational macrostates C1, ..., CMexp observed in the experiment.

P(Ck) =

1 u Nexp

N

∑ uik e−[β − β ]F(s ) s

i=1

i

(8)

The normalization constant Nuexp is the weighted sum of all membership coefficients related to the experimentally observed macrostates and is given by Mexp N u Nexp =

∑ ∑ uik e−[β − β ]F(s ) s

k=1 i=1

i

(9)

For the Mexp archetypical macrostates, the populations P(Ck), normalized as in eq 8, can be directly compared to experiment. 2.2.2. Modified Fuzzy Gustafson−Kessel Clustering Algorithm. To be used as predictive tools, clustering algorithms should be able to identify the underlying clustering structure of a data set. Most of the popular distance-based clustering algorithms such as K-means or fuzzy C-means81 (FCM) have several drawbacks which prevent direct use on MD simulation data. First, the FCM and K-means algorithms work well only for spherical-shaped clusters because their objective function is based on the Euclidian distance between samples and cluster centers. There are many algorithms to solve this problem, such as the Gustafson−Kessel algorithm82 (see below). However, shape-aware distance-based algorithms are still not appropriate for MD data because they tend to assign improper membership 6544

dx.doi.org/10.1021/jp500193w | J. Phys. Chem. B 2014, 118, 6539−6552

The Journal of Physical Chemistry B

Article

FGK method with archetypes C5, PII, and αR, as given in Table 1 for NA-A-MA in solution with the OPLS-AA/L FF. The populations in this case are 17%, 62%, and 21%, respectively. To verify the robustness of our clustering approach with respect to the particular definition of the archetypes, we repeated the clustering analysis 300 times using archetypes randomly perturbed within half of the cluster radius from the original archetypes. Figure 4 shows a close-up in the upper left corner of the Ramachandran plane and a few representative perturbed

to members of small clusters in the vicinity of large clusters and therefore distinguish poorly between clusters that have the same size but different densities. Second, proper classification requires the number of clusters to be known in advance or to be estimated directly from data set before the actual clustering begins. Approaches aiming to solve this problem, which involve simple, explicit testing of several cluster models that are compared to each other using a predefined quality measure, include the well-known Xie−Beni index,83 Arlot’s v-fold penalization algorithm,84 the gap clustering method,85 the resampling algorithm,86 and many others. Third, clustering algorithms are usually employed as inference methods to determine the number M, the centers c1, ..., cM, and the shapes of clusters from the data. In the present context, however, we need to determine populations for the particular set of Mexp macrostates assigned by experiment. These macrostates are centered around special positions a1, ..., aMexp in the CV space, which we call archetypes. For example, for dipeptides in solution (see ref 23), the archetypes are the centers of the C5, PII, and αR regions in the (ϕ, ψ) plane (defined following ref 29) and are given in Table 1 and are shown in Figure 2a. For NA-T-MA in the gas phase, the experimental macrostates of interest are A, B, and C and the archetypes are the centers of the corresponding regions in the 4-dimensional space (ϕ, ψ, χ1, χ2) as indicated in Table 2. Table 2. Archetype Conformations of NA-T-MA in the Gas Phase and Their Ramachandran and χ1, χ2 Backbone Dihedral Anglesa macrostate

conformation

ϕ

ψ

χ1

χ2

exp population

A B C

C5(AΦ) C5(AP) C7eq(ΦP)

−65 −150 −60

160 160 60

−160 −160 80

80 −80 80

0.20 0.50 0.30

a

Figure 4. Robustness of the clustering algorithm with respect to the location of the C5, PII, and αR archetypes shown on a zoomed portion of the Ramachandran map of NA-A-MA in water with the CHARMM36 FF. The archetypes (black stars) are initially perturbed from their original position by a random number. The clustering is then applied, and crisp centers (white circles) are obtained at locations very close to their original ones.

Experimental populations are indicated according to ref 4.

archetypes for the C5, PII, and αR states. The crisp centers obtained after clustering with perturbed archetypes and the original archetypes are almost identical, with maximum deviations within 5°. The corresponding populations all fall within 2% of the populations obtained with the original archetypes (Figure 4 and Table S2, Supporting Information). This demonstrates that our calculated populations are robust with respect to the choice of the archetypes.

To obtain population estimates comparable to experiment, our analysis method needs to assign a cluster to each of the archetypes, even if the standard clustering algorithm would not spontaneously place a cluster in this region. Therefore, we must introduce the archetypes as a priori information into the clustering scheme and enforce the presence of cluster centers v1, ..., vMexp in the vicinity of the archetypes. The exact position, shape, and total number of clusters is still freely determined by the algorithm. With this modification, the clustering algorithm is no longer strictly an inference method but is rather a probe to interrogate the data on populations at the predefined locations of the archetypes. To address the three points raised above, we use the modified fuzzy Gustafson−Kessel (FGK) algorithm,82 proposed in ref 87, which takes into account both the distance to cluster centers and the local density of samples.87 To enforce the presence of a cluster center in the vicinity of each archetype, an extra term is added to the penalty function. Finally, the adapted FGK is coupled to an iterative optimization scheme that finds the optimal number of clusters for each particular data set. The details of our clustering algorithm are described in Appendix B. The macrostate populations obtained from our method should not depend on local variations of the archetype positions, the choice of which involves a large degree of arbitrariness. Figure 2c shows an example of clustering using the modified

3. RESULTS AND DISCUSSION For NA-A-MA and NA-T-MA with the six FFs, CHARMM27 (C27), CHARMM36 (C36), AMBER99SB (A99SB), FUJI, OPLS-AA/L, and GROMOS 54a7 (G54a7), we ran dAFED simulations as described above. The resulting FESs, similar to those of Figure 2b, are shown in Figures S1−S3 of the Supporting Information. For NA-T-MA, we show projections of the four-dimensional FESs in the (ϕ, ψ) plane and in the (χ1, χ2) planes. Overall, we find good qualitative agreement with published FESs of dipeptides in solution12,13,64,66,69,72 and in the gas phase.64 We also verified that the FESs obtained with either Gromacs 4.5.588 using the modified PLUMED plugin89 or PINY_MD90 for the same systems exhibit negligible differences (data not shown). The Supporting Information also provides tables with detailed information on free energy minima for each system and each FF. In the present work, however, the focus is not on the FESs themselves, but on macrostate populations that can be 6545

dx.doi.org/10.1021/jp500193w | J. Phys. Chem. B 2014, 118, 6539−6552

The Journal of Physical Chemistry B

Article

compared to experiment. We applied the modified FGK clustering algorithm described in Appendix B to automatically identify conformational macrostates from simulation data, as shown, for example, in Figure 2b. In the Supporting Information, Figures S1−S3 similarly show the shape of the fuzzy clusters in the (ϕ, ψ) plane for all systems in this study. On the basis of the fuzzy clusters, we calculated populations for the experimentally observed macrostates C5, PII, and αR for peptides in solution, or A, B, and C for NA-T-MA in the gas phase using eq 8. The resulting populations are represented in Figures 5 and 6, and the corresponding numerical values can be

Figure 6. NA-T-MA in gas phase: (a) calculated populations for macrostates A, B, and C vs experimental data from ref 2; (b) prediction score for each forcefield calculated with eq 10.

adding up to 1. Therefore, the population space is a triangular section of a plane in 3, with each corner representing 100% of one macrostate and 0% of the other two. The largest possible distance in this triangle is dmax = (1002 + 1002)1/2 = 141.42. Noting that Pk = P(Ck) (k = 1, 2, 3) are the populations calculated with eq 8, we define the prediction score as Spred =

dmax −

3

∑k = 1 (Pk − Pkexp)2 dmax

(10)

Pexp are k pred

where the corresponding experimental values. The value of S lies in the interval [0, 1], with Spred = 1 being the best possible outcome. For assessment over more than one test system, we take the average value of Spred. Figures 5c and 6b show the prediction scores of all FFs for peptides in solution and in the gas phase, respectively. 3.1. Dipeptides in Solution. We found our dAFED FESs to be in good agreement with the metadynamics FESs published by Vymetal et al.11,12 for NA-A-MA in solution with AMBER99SB, OPLS-AA/L, and CHARMM27. For CHARMM27, our dAFED free energy minima correspond well to those tabulated in ref 64 using other methods.a Our FES with OPLS-AA/L is also in good agreement with that reported by Cruz et al.72 For all systems in solution, all FFs give FESs that are qualitatively similar. Notable differences are that the FUJI and GROMOS 54a7 FESs do not exhibit a local minimum in the α′ region and show a less pronounced αL basin. Instead of the αL state, GROMOS 54a7 favors the neighboring C7eq state, which is almost absent in all other FFs. For NA-A-MA in solution, the experiments of Grdadolnik et al.23 found the largest population of backbone conformations to be in the PII region with minor contributions of the C5 macrostate and only less than 10% population in the αR conformation. As shown in Figure 5a, our simulations confirm this trend only for the CHARMM36, FUJI, and GROMOS 45a7

Figure 5. (a) Populations of macrostates C5, PII, and αR for NA-A-MA in solution; (b) populations of macrostates C5, PII, and αR for NA-TMA in solution; (c) average prediction score for each force field calculated with eq 10. Experimental data are from ref 23.

found in Tables S2, S9, and S16 of the Supporting Information. In the two following sections, we will discuss the results in detail for each system. To give an aggregated assessment of FF accuracy, we define a prediction score based on the distance between calculated macrostate populations and experimental values. In all cases, we have three normalized populations that are positive numbers 6546

dx.doi.org/10.1021/jp500193w | J. Phys. Chem. B 2014, 118, 6539−6552

The Journal of Physical Chemistry B

Article

FFs. CHARMM27 strongly overestimates the αR population, denoting an exaggerated helical propensity for alanine, a tendency that has been noted in earlier studies.20,91 AMBER99SB overestimates the population of the C5 state, a problem that is partially addressed by the FUJI corrections to the AMBER99SB dihedral parameters. The summary of calculated populations of NA-T-MA in solution and the comparison to experiment is shown in Figure 5b. Experimentally, the effect of the tryptophan side chain on the backbone conformations in NA-T-MA translates into a stabilization of the C5 macrostate at the expense of both PII and αR. AMBER99SB largely overemphasizes this trend, with a dominating C5 conformer. On the other hand, CHARMM27 overstabilizes the αR macrostate, as in the case of NA-A-MA. CHARMM36 partially corrects this defect and restores the right ordering of macrostates, although αR is still over-represented. The GROMOS 54a7 populations are only marginally influenced by the bulky side chain, compared to NA-A-MA. Among all FFs, FUJI and OPLS-AA/L most accurately reproduce the experimental populations of NA-T-MA conformers. Overall, we note that all FFs tend to overestimate the αR conformation with respect to the extremely low population observed experimentally. We emphasize again that the populations shown in Figure 5 do not depend on prior definitions of arbitray boundaries between macrostates, which can artificially skew the populations. For example, Vymetal et al.12 placed the separation between C5 and PII at ϕ = 120. In the case of the CHARMM27 FF, for example, this will clearly artificially favor PII over C5. Using our fuzzy clustering approach without prior knowledge of macrostate boundaries, we indeed report differently ranked populations: αR > C5 > PII, versus αR > PII > C5 in ref 12 for NA-A-MA. The prediction scores calculated according to eq 10 for all systems in solution are shown in Figure 5c. Overall, we see that the CHARMM27 FF is the most challenged by the experimental results. The recent reparametrization efforts in CHARMM36 and FUJI clearly bring significant improvements over their respective predecessors, CHARMM27 and AMBER99SB. We note that the good prediction score of GROMOS 54a7 is mainly due to its excellent performance for NA-A-MA, whereas its lack of sensitivity to the nature of the side chain in NA-T-MA can be seen as a major drawback. The not-so-recent OPLS-AA/L force field shows a very good score, but the most accurate FF is FUJI, although it seems systematically to underestimate the PII population somewhat. 3.2. NA-T-MA in the Gas Phase. Projections of the FESs from our simulations are shown in Figure S3 in the Supporting Information. The free energy minima in the four-dimensional CV space were assigned to four-letter conformers and to A, B,

or C macrostates according to Table 2. Detailed information about free energy minima on the four-dimensional FES are given in Tables S10−S15 in the Supporting Information. We note that the FES minima for the A, B, or C macrostates consistently appear in the order B < A < C for all FFs, as shown in Table 3 as well. One interesting observation is that, consistently among all force fields, the C7eq conformer with lowest free energy is C7eq(AP) and not C7eq(ΦP). The latter was assigned by Dian et al.2 to the C state because of spectroscopic considerations and because C7eq(ΦP) was found to correspond to a lower potential energy minimum. The same authors, however, found a high free energy for the C7eq(ΦP) in a normal mode calculation based on DFT within the B3LYP approximation but dismissed that value, invoking the lack of accuracy of the normal mode approach. The fact that our enhanced-sampling calculations unambiguously confirm C7eq(AP) to be lower in free energy than C7eq(ΦP) reopens the case for the assignment of the C state. In the following, however, we stick to the assignment of states A, B, and C proposed by Dian et al.2,3 to preserve consistency of the definitions. The populations of the A, B, or C macrostates given by the clustering method and weighted summation using eq 7 are shown in Table S16 (Supporting Information) and illustrated in Figure 6, together with deviations from the experimental values.3 For all FFs, populations rank in the order B > C > A, which seems to contradict the ordering of the corresponding free energy minima mentioned above. For example, the CHARMM36 free energy minima are at 0.0, 2.18, and 1.66 kcal/mol for B, C, and A, respectively. However, the resolution of this apparent paradox lies in the sizes of the clusters. Indeed, the C macrostate is consistently more voluminous than the A macrostate, as shown in Table 3. In the example of CHARMM36, the radius of the C cluster is 65% larger that that of the A cluster, which amounts to a more than 7 times larger volume for a four-dimensional spherical cluster. This makes C entropically more stable than A, which inverts the ranking on the basis of the free energy minima alone. This is an illustration of the fact that the value of the free energy at the FES minimum is not a suitable indicator to rank macrostates. Figure 6b shows the aggregated prediction score of each FF for NA-T-MA in the gas phase, calculated according to eq 10. Overall, the behavior of all FFs is much more consistent in the gas phase than in solution, all FFs predicting qualitatively the same ranking of states A, B, and C. For NA-T-MA in the gas phase, the least accurate FFs are the ones of the former generation, CHARMM27 and AMBER-99SB. OPLS-AA/L gives the most faithful reproduction of experimental values, followed by the FUJI, which shows that the reparametrization60

Table 3. Characteristic Cluster Radii As Defined by Eq B.4 for NA-T-MA in a Gas Phase vs Calculated FE Minimaa CHARMM27

a

CHARMM36 B

AMBER99SB

A

B

C

A

FE min [kcal/mol] cluster radius [deg]

2.10 2.33

0 2.26 FUJI

2.18 4.23

1.66 4.90

A

B

C

A

B

C

A

B

C

FE min [kcal/mol] cluster radius [deg]

2.0 10.09

0 9.53

2.30 12.76

1.17 4.69

0 3.15

1.57 8.10

1.56 7.66

0 8.57

1.78 9.25

0 4.19 OPLS-AA/L

C

A

B

C

2.18 8.12

1.12 2.32

0 3.84 GROMOS 54a7

2.44 4.26

Macrostate C, although higher on the FES, is systematically larger than macrostate A, resulting in populations ranking B > C > A (Figure 6). 6547

dx.doi.org/10.1021/jp500193w | J. Phys. Chem. B 2014, 118, 6539−6552

The Journal of Physical Chemistry B

Article

the CHARMM27,27,53 AMBER99SB,45 and OPLS-AA/L63 FFs were carried out using the PINY_MD program.90 Additional simulations with CHARMM27, CHARMM36,54 FUJI,60 and GROMOS 54a738 were carried out with the GROMACS 4.5.5 software88 and the PLUMED 1.3 free energy plugin.89 The FF input files for CHARMM36 and FUJI were obtained personally from the main developers of the FFs. The input files for GROMOS 54a7 were downloaded from the Automatic Topology Builder website of A. E. Mark et al. (http:// compbio.biosci.uq.edu.au/atb/index.py?tab=forceField_tab). Due to the different implementations, there might be slight differences compared to these FFs in their respective original software package. both in the gas phase and in solvent. The collective variables for this study are the two backbone dihedral angles ϕ and ψ and, for NA-T-MA, two side-chain dihedral angles, χ1 and χ2, as indicated in Figure 1. Following the dAFED scheme, high masses and temperature were assigned to the extended variables. In all simulations we chose uniform masses, mϕ = mψ = mχ1 = mχ2, which we call ms. For NA-A-MA, we chose Ts = 600 K and ms = 400 amu·deg2·Å−2. For the other peptides, we chose Ts = 800 K and ms = 600 amu·deg2·Å−2 The harmonic coupling constant was 5.4 × 105 (kcal/mol)/deg2 for all simulations. These choices for the dAFED parameters were shown in to yield accurate FESs in our previous work on these systems.65 Peptides in the gas phase were equilibrated for 500 ps at T = 300 K for NA-A-MA and T = 423 K for NA-T-MA, which corresponds to the temperature at which the IR−UV spectroscopy measurements were conducted. For the solution phase simulations in PINY_MD, each peptide was solvated in a periodic cubic box of 860 TIP3P51 water molecules. The systems were equilibrated at constant volume (NVT) for 200 ps, then at constant pressure (NPT) for 50 ps, and again at NVT for 500 ps. The SHAKE93 and RATTLE94 algorithms were used to constrain the geometry of water molecules to a relative tolerance of 10−6. The electrostatics were treated with the particle mesh Ewald summation technique,95 and the shortrange forces were switched off at 12 Å. In PINY_MD, the NVT ensemble was generated by coupling a generalized Gaussian moment thermostat (GGMT)96 to each degree of freedom in the system (massive thermostating). The equations of motion were integrated using the multiple time scale r-RESPA algorithm97,98 to exploit the various time scales. Three rRESPA time steps were used: all intramolecular forces (covalent and ionic) were integrated using five steps per full MD step, and the dAFED harmonic coupling term was calculated using five steps per intramolecular force step. Production runs were up to 100 ns, with an outermost r-RESPA time step of 1 fs. Atomic coordinates were saved every 100 steps. Simulations in Gromacs used similar specifications, except for some variations due to the differences in implementation. In particular, no massive thermostating is available in Gromacs. Instead, molecules were assigned to 204 groups, which were thermostated independently with Nosé−Hoover chains.99 In addition, no RESPA scheme was used, but all bond lengths were constrained and a time step of 2 fs was used. The dAFED extended variables were coupled to a generalized Gaussian moment thermostat.96 The FESs were obtained by calculating a normalized probability distribution function Padb(s) from multidimensional nonuniform histograms with bin sizes ranging from 9° to 2.3°. The two-dimensional surfaces were smoothed using a modified

of the AMBER99SB dihedral angles brings significant improvement also in the gas phase.

4. CONCLUSIONS MD and enhanced sampling simulations of protein systems rely on the ability of the FF to faithfully model the conformational propensities of amino acids, both in the folded and in the disordered states. Recent experiments have been able to report the conformational preferences of dipeptides in solution or in the gas phase that can be used as direct benchmarks to evaluate the performance of FFs. We performed enhanced sampling MD simulations of the NA-A-MA and NA-T-MA peptides in solution and of NA-T-MA in the gas phase with six recent and widely used FFs, CHARMM27, CHARMM36, AMBER99SB, FUJI, OPLS-AA/L, and GROMOS 54a7. We employed a fuzzy clustering approach to map macrostates from the simulation data. The algorithm removes the requirement to define arbitrary macrostate boundaries and provides an optimal state partitionning given the simulation data. For the macrostates corresponding to experimental observables we derived populations that we used to assess FF accuracy. From the methodological point of view, we showed that using the free energy values at the FES minima are not appropriate to rank conformational macrostates. Instead, the free energy of the entire macrostate should be evaluated, as we have seen with the ordering of the A, B, and C states in the gas phase. In addition, we showed that defining arbitrary hard state boundaries to calculate populations can lead to ranking inversions in the case of close macrostates separated by a low free energy barrier. We note that the present approach relies on macrostate assignments provided in the experimental studies2,3,23 and can thus be seen as somewhat indirect. In future research, it would be interesting to bypass the macrostate assignment step altogether and compare the simulated ensemble of conformations directly to the experimental spectra. This would, however, require a model that can reliably predict Raman or IR−UV spectra from a given molecular conformation. Alternatively, a more direct comparison could be made on the basis of the J-couplings, which are simple functions of the torsion angles. For dipeptides in solution, FFs show a great degree of variability in the predicted conformational populations, in particular in NA-T-MA where the bulky tryptophan side chain has a notable influence on backbone conformation. On the other hand, in the gas phase all FFs consistently reproduce the correct macrostate ranking and obtain a larger prediction score, which is surprising given the fact that these FFs were originally parametrized in solution. Overall, taking into account solution and gas phase results, we find that within the CHARMM and AMBER families the recent CHARMM36 and FUJI reparametrizations bring significant improvements over previous versions. The FUJI and OPLS-AA/L FFs show the best performance on the systems studied here. These two cases are interesting because they show that relying on very high-level QM calculations for parametrizations is advantageous. They also show that there is still room for improvement within the standard FF potential energy function without necessarily introducing additional complexity through terms such as the CMAP correction. APPENDIX A. COMPUTATIONAL DETAILS Model dipeptides, NA-A-MA and NA-T-MA, were built and geometrically optimized with SPARTAN’08.92 Simulations with 6548

dx.doi.org/10.1021/jp500193w | J. Phys. Chem. B 2014, 118, 6539−6552

The Journal of Physical Chemistry B

Article

local polynomial regression filter. We first attempted to use the classical Nadaraya−Watson kernel,100,101 but it produced significant shrinkage of the smoothed surface due to nonuniform data mesh. That is why an algorithm that works like a low bypass filter and does not produce shrinkage has been developed and implemented. To estimate statistical uncertainties in our FES and macrostate populations, we used the nonparametric bootstrap bias corrected and accelerated (BCA) method,102,103 with 2000 bootstrap samples per trajectory. Tables S17−S19 in the Supporting Information show the standard error (SE) and BCA 95% confidence intervals (CIs) for all macrostate free energies and populations reported here. For these populations, the CIs are in most cases within ±2 percentage points of the reported values in most cases.

The second modification in the algorithm is to add a penalty function that maintains crisp centers close to the position of the Mexp experimentally observed macrostates (archetypes). We define the square distance between crisp center vk and archetype ak as Dk 2 = ||vk − ak||2

The modified objective function takes the form N

J ̃(X ,U ,V ,A) =

Skl = (B.1)

(B.2)

XB =

(B.3)

Here X = (s1, ..., sM) is a set of samples, and V is the list of crisp centers (cluster prototypes). In the modified GFK algorithm, the crisp centers are replaced by crisp volumes as suggested in ref 87, such that the samples that are very close to a cluster center automatically get membership 1. The volume of a cluster is defined as the determinant of the covariance matrix given by eq B.1. The effective cluster radius corresponding to this (nonspherical) volume is

Rk =

|Pk |1/ n

CVk =

Δkl = (B.5)

N

M

∑ ∑ uik̃ m dik̃ i=1 k=1

1 N

N

M

∑i = 1 ∑k = 1 uikmdik 2 mink , l(dkl 2)

(B.12)

∑i uik̃ dik 2

k = 1, ..., M

∑i uik̃

(B.13)

dkl 2 CVk × CVl

(B.14)

Then the modified Xie−Beni quality criterion becomes

and the objective function for the modified FGK becomes J ̃(X ,U ,V ) =

(B.10)

Then the dissimilarity between two clusters k and l is

We define the modified Mahalanobis distance 2

N

For clusters with crisp volumes of various sizes generated by the modified FGK algorithm of eq B.1, the denominator should depend not only on cluster centers but also on variability of a whole cluster CV. To incorporate variability, we use the average distance from the center as a metric for cluster variability:105,106

(B.4)

dik̃ = max(0, dik 2 − R k 2)

N

min(∑i = 1 uik̃ , ∑i = 1 uil̃ )

In our approach, we optimize the number of crisp centers by an iterative procedure that starts with different initial numbers of clusters. We then assess the final result of the modified FGK algorithm above using a modified version of the Xie−Beni index.83 The original Xie−Beni index is defined as

M

i=1 k=1

∑i = 1 min(uik̃ , uil̃ )

Given a similarity threshold parameter t, if Skl < t, we merge the clusters by summing the membership coefficients uik̃ ′ = uik̃ + uil̃ (B.11)

If instead of Pk one choses the identity matrix, the distance becomes the Euclidean distance. In the original FGK algorithm, the objective function is

∑ ∑ uikmdik 2

(B.9)

N

where the uik are elements of the membership matrix U and m is the “fuzzification” factor. The associated (squared) Mahalanobis distance in the n-dimensional feature space is

N

(B.8)

k=1

During the course of the modified FGK clustering algorithm above, we can merge clusters that overlap. As an overlap criterion, we calculate the similarity of Skl values between any pair of clusters by pairwise comparison of their membership values,

N

J(X ,U ,V ) =

k

B.2. Optimal Number of Clusters

∑i = 1 uikm(si − vk)(si − vk)T

dik 2 ≡ d 2(si,vk) = |Pk |1/ n (si − vk)T Pk −1(si − vk)

Mexp

+ κ ∑ Dk 2

⎡ M ⎛ 2 ⎞1/(m − 1)⎤−1 ⎢ ⎥ d̃ uik̃ = ⎢∑ ⎜ ik2 ⎟ ⎥ ⎜ d̃ ⎟ ⎢⎣ j = 1 ⎝ ij ⎠ ⎥⎦

Our clustering algorithm is based on the original fuzzy Gustafson−Kessel (FGK) algorithm.82,104 Our modifications seek to address the problems of (1) properly distinguishing between clusters with similar sizes but different densities, (2) robust discrimination of a small cluster corresponding to a lowpopulated experimental macrostate when it lies in close proximity to a larger one, and (3) maximum density in a vicinity of a cluster center. The original FGK algorithm is based on a fuzzy covariance matrix of each cluster Ck centered around crisp center vk, N

2

Here κ is the parameter that determines the importance of the archetypes in the penalty function, with respect to the datadependent forces that drive the optimization of the cluster centers vk. Note that the clusters linked to archetypes can never disappear in the cluster number optimization procedure described below. At each step the elements of the weighted membership matrix Ũ are calculated by the formula

B.1. Modified Fuzzy Gustafson−Kessel Clustering Algorithm

∑k = 1 uikm

M

∑ ∑ uik̃ m dik̃ i

APPENDIX B. CLUSTERING ALGORITHM

Pk =

(B.7)

2

͠ = XB (B.6) 6549

1 N

N

M

∑i = 1 ∑k = 1 uik̃ mdik 2 M

∑k = 1 Δkl

(B.15)

dx.doi.org/10.1021/jp500193w | J. Phys. Chem. B 2014, 118, 6539−6552

The Journal of Physical Chemistry B

Article

The simplest approach to find the optimal number of clusters is just to test systematically the quality of clusterings with increasing/decreasing number of cluster centers. Instead, we use the following search algorithm. We start with the Mexp archetypes at their initial position, and we add additional randomly generated cluster centers to reach a population of M clusters, where M > 4. Then we generate four extra cluster populations M1, ..., M4 with the formula. M1 = 2M

M2 = M + 1

M3 = M − 1

Amide and N-acetyl Tryptophan Methyl Amide. Chem. Phys. 2002, 117, 10688−10702. (3) Dian, B. C.; Florio, G. M.; Clarkson, J. R.; Longarte, A.; Zwier, T. S. Infrared-Induced Conformational Isomerization and Vibrational Relaxation Dynamics in Melatonin and 5-methoxy-N-acetyl Tryptophan Methyl Amide. J. Chem. Phys. 2004, 120, 9033−9046. (4) Dian, B. C.; Longarte, A.; Winter, P. R.; Zwier, T. The Dynamics of Conformational Isomerization in Flexible Biomolecules. I. Holefilling Spectroscopy of N-acetyl Tryptophan Methyl Amide and Nacetyl Tryptophan Amide. J. Chem. Phys. 2004, 120, 133−147. (5) Evans, D. A.; Wales, D. J.; Dian, B. C.; Zwier, T. S. The dynamics of Conformational Isomerization in Flexible Biomolecules. II. Simulating Isomerizations in a Supersonic Free Jet With Master Equation Dynamics. J. Chem. Phys. 2003, 120, 148−157. (6) Dian, B. C.; Clarkson, J. R.; Zwier, T. S. Direct Measurement of Energy Thresholds to Conformational Isomerization in Tryptamine. Science 2004, 303, 1169−1173. (7) Florio, G. M.; Zwier, T. S. Solvation of a Flexible Biomolecule in the Gas Phase: The Ultraviolet and Infrared Spectroscopy of Melatonin-Water Clusters. J. Phys. Chem. A 2003, 107, 974−983. (8) Woutersen, S.; Hamm, P. Structure Determination of Trialanine in Water Using Polarization Sensitive Two-Dimensional Vibrational Spectroscopy. J. Phys. Chem. B 2000, 104, 11316−11320. (9) Kim, Y. S.; Wang, J.; Hochstrasser, R. M. Two-Dimensional Infrared Spectroscopy of the Alanine Dipeptide in Aqueous Solution. J. Phys. Chem. B 2005, 109, 7511−7521. (10) Carney, J. R.; Zwier, T. S. Conformational Flexibility in Small Biomolecules: Tryptamine and 3-indole-propionic Acid. Chem. Phys. Lett. 2001, 341, 77−85. (11) Vymětal, J.; Vondrásě k, J. Metadynamics As a Tool for Mapping the Conformational and Free-Energy Space of Peptides − The Alanine Dipeptide Case Study. J. Phys. Chem. B 2010, 114, 5632−5642. (12) Vymětal, J.; Vondrásě k, J. Critical Assessment of Current Force Fields. Short Peptide Test Case. J. Chem. Theory Comput. 2012, 9, 441−451. (13) Liu, Z.; Ensing, B.; Moore, P. B. Quantitative Assessment of Force Fields on Both Low-Energy Conformational Basins and Transition-State Regions of the (ϕ-ψ) Space. J. Chem. Theory Comput. 2011, 7, 402−419. (14) Ramachandran, G.; Ramakrishnan, C.; Sasisekharan, V. Stereochemistry of Polypeptide Chain Configurations. J. Mol. Biol. 1963, 7, 95−99. (15) Kleywegt, G.; Jones, A. Φ/Ψ-hology: Ramachandran Revised. Structure 1996, 4, 1395−1400. (16) Chakrabarti, P.; Pal, D. The Interrelationships of Side-Chain and Main-Chain Conformations in Proteins. Prog. Biophys. Mol. Biol. 2001, 76, 1−102. (17) Hovmöller, S.; Zhou, T.; Ohlson, T. Conformations of amino acids in proteins. Acta Crystallogr., Sect. D: Biol. Crystallogr. 2002, 58, 768−776. (18) Matthes, D.; De Groot, B. L. Secondary Structure Propensities in Peptide Folding Simulations: a Systematic Comparison of Molecular Mechanics Interaction Schemes. Biophys. J. 2009, 97, 599−608. (19) Yoda, T.; Sugita, Y.; Okamoto, Y. Secondary-structure Preferences of Force Fields for Proteins Evaluated by GeneralizedEnsemble Simulations. Chem. Phys. 2004, 307, 269−283. (20) Best, R. B.; Buchete, N.-V.; Hummer, G. Are Current Molecular Dynamics Force Fields Too Helical? Biophys. J. 2008, 95, L07−L09. (21) Shemesh, D.; Sobolewski, A.; Domcke, W. Role of exicited-state hydrogen detachment and hydrogen-transfer processes for the excitedstate deactivation of an aromatic dipeptide: N-Acetyl tryptophan methyl amide. Phys. Chem. Chem. Phys. 2010, 12, 4899−4905. (22) Dian, B.; Longarte, A.; Zwier, T. Conformational Dynamics in a Dipeptide After Single-Mode Vibrational Excitation. Science 2002, 296, 2369−2373. (23) Grdadolnik, J.; Vlasta, M.-G.; Baldwin, R.; Avbelj, F. Population of Three Major Backbone Conformations in 19 Amino Acid Dipeptides. Proc. Natl. Acad. Sci. 2011, 108, 1794−1798.

M4 = M /2 (B.16)

We optimize each population using the modified GFK algorithm described in eq B.1, pick the cluster population with ͠ , and substitute M with largest value of the modified index XB this number. The procedure repeats until the number of crisp centers does not change.



ASSOCIATED CONTENT

S Supporting Information *

Tables are given that include quantitative information about the position and height of PES minima for each of the systems considered in this study: numerical values for the macrostate populations obtained from the clustering approach and eq 8 are represented in Figures 5 and 6; the statistical uncertainty is related to both PES minima and macrostate populations for each system. Figures showing PESs and clusters obtained for all systems and force fields are represented in the middle and lower panels of Figure 2. This material is available free of charge via the Internet at http://pubs.acs.org.



AUTHOR INFORMATION

Corresponding Author

*M. E. Tuckerman: e-mail, [email protected]. Author Contributions ∥

These authors contributed equally to this work.

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS We are grateful to Hideaki Fujitani for providing the Gromacs input files for the FUJI force field. We also thank Robert Best for sending us the Gromacs input files for the CHARMM36 force field. Some of the computations were performed at the Vital-IT (http://www.vital-it.ch) Center for high-performance computing of the SIB Swiss Institute of Bioinformatics. M.A.C. acknowledges support from the Swiss National Sience Foundation grant PA00P2_129092 for part of this work. M.E.T. acknowledges support from the US National Science Foundation grants CHE-1012545 and CHE-1301314.

■ ■

ADDITIONAL NOTE Note that Figure 7 (middle) of ref 64 is in fact the FES for Charmm22, not for CHARMM27 as reported. a

REFERENCES

(1) Mehta, M. A.; Fry, E. A.; Eddy, M. T.; Dedeo, M. T.; Anagnost, A. E.; Long, J. R. Structure of the Alanine Dipeptide in Condensed Phases Determined by 13C NMR. J. Phys. Chem. B 2004, 108, 2777− 2780. (2) Dian, B. C.; Asier, L.; Mercier, S.; Evans, D. A.; Wales, D. J.; Zwier, T. The Infrared and Ultraviolet Spectra of Single Conformations of Methyl-Capped Dipeptides: N-acetyl Tryptophan 6550

dx.doi.org/10.1021/jp500193w | J. Phys. Chem. B 2014, 118, 6539−6552

The Journal of Physical Chemistry B

Article

(24) Grdadolnik, J.; Grdadolnik, S. G.; Avbelj, F. Determination of Conformational Preferences of Dipeptides Using Vibrational Spectroscopy. J. Phys. Chem. B 2008, 112, 2712−2718 PMID: 18260662.. (25) Feig, M. Is Alanine Dipeptide a Good Model for Representing the Torsional Preferences of Protein Backbones? J. Chem. Theory Comput. 2008, 4, 1555−1564. (26) Jiang, F.; Han, W.; Wu, Y.-D. The Intrinsic Conformational Features of Amino Acids From a Protein Coil Library and Their Applications in Force Field Development. Phys. Chem. Chem. Phys. 2013, 15, 3413. (27) MacKerell, A. D.; Feig, M.; Brooks, C. L. Improved Treatment of the Protein Backbone in Empirical Force Fields. J. Am. Chem. Soc. 2004, 126, 698−699. (28) Klauda, J. B.; Brooks, B. R.; MacKerell, A. D.; Venable, R. M.; Pastor, R. W. An ab Initio Study on the Torsional Surface of Alkanes and its Effect on Molecular Simulations of Alkanes and a DPPC Bilayer. J. Phys. Chem. B 2005, 109, 5300−5311. (29) Feig, M.; MacKerell, A. D.; Brooks, C. L. Force Field Influence on the Observation of π-helical Protein Structures in Molecular Dynamics Simulations. J. Phys. Chem. B 2003, 107, 2831−2836. (30) Cornell, W. D.; Cieplak, P.; Bayly, C. I.; Gould, I. R.; Merz, K. M.; Ferguson, D. M.; Spellmeyer, D. C.; Fox, T.; Caldwell, J. W.; Kollman, P. A. A Second Generation Force Field for the Simulation of Proteins, Nucleic Acids, and Organic Molecules. J. Am. Chem. Soc. 1995, 117, 5179−5197. (31) Case, D. A.; Cheatham, T. E.; Darden, T.; Gohlke, H.; Luo, R.; Merz, K. M.; Onufriev, A.; Simmerling, C.; Wang, B.; Woods, R. J. The Amber Biomolecular Simulation Programs. J. Comput. Chem. 2005, 26, 1668−1688. (32) Damm, W.; Halgren, T.; Murphy, R.; Smondyrev, A.; Friesner, R.; Jorgensen, W. 224th ACS National Meeting, 2002. (33) Kony, D.; Damm, W.; Stoll, S.; Van Gunsteren, W. F. An Improved OPLS−AA Force Field for Carbohydrates. J. Comput. Chem. 2002, 23, 1416−1429. (34) Jorgensen, W. L.; Schyman, P. Treatment of Halogen Bonding in the OPLS-AA Force Field: Application to Potent Anti-HIV Agents. J. Chem. Theor. Comput. 2012, 8, 3895−3901. (35) Dahlgren, M. K.; Schyman, P.; Tirado-Rives, J.; Jorgensen, W. L. Characterization of Biaryl Torsional Energetics and its Treatment in OPLS All-Atom Force Fields. J. Chem. Inf. Model. 2013, 53, 1191− 1199. (36) van Gunsteren, W. F.; Billeter, S. R.; Eising, A. A.; Hünenberger, P. H.; Krüger, P.; Mark, A. E.; Scott, W. R. P.; Tironi, I. G. Biomolecular Simulation: The GROMOS96 Manual and User Guide; Vdf Hochschulverlag AG an der ETH Zürich: Zürich, Switzerland, 1996. (37) Oostenbrink, C.; Villa, A.; Mark, A. E.; Van Gunsteren, W. F. A Biomolecular Force Field Based on the Free Enthalpy of Hydration and Solvation: The GROMOS Force-Field Parameter Sets 53A5 and 53A6. J. Comput. Chem. 2004, 25, 1656−1676. (38) Schmid, N.; Eichenberger, A. P.; Choutko, A.; Riniker, S.; Winger, M.; Mark, A. E.; van Gunsteren, W. F. Definition and Testing of the GROMOS Force-Field Versions 54A7 and 54B7. Eur. Biophys. J. 2011, 40, 843−856. (39) Ponder, J. W.; Case, D. A. In Protein Simulations; Daggett, V., Ed.; Advances in Protein Chemistry; Academic Press: New York, 2003; Vol. 66, pp 27−85. (40) MacKerell, A. D. Empirical Force Fields for Biological Macromolecules: Overview and Issues. J. Comput. Chem. 2004, 25, 1584. (41) Duan, Y.; et al. A point-charge Force Field for Molecular Mechanics Simulations of Proteins Based on Condensed-Phase Quantum Mechanical Calculations. J. Comput. Chem. 2003, 24, 1999−2012. (42) Ono, S.; Nakajima, N.; Higo, J.; Nakamura, H. Peptide FreeEnergy Profile is Strongly Dependent on the Force Field: Comparison of C96 and AMBER95. J. Comput. Chem. 2000, 21, 748−762. (43) Kolman, P.; Dixion, R.; Cornell, R.; Fox, T.; Chipot, C.; Pohorille, A. In Computer Simulation of Biological Systems; van Gunsteren, W., Ed.; Kluwer/Escom: Amsterdam, 1997.

(44) Kollman, P. A. Advances in Continuing Challenges in Achieving Realistic and Predictive Simulations of the Peptides in Organic and Biological Molecules. Acc. Chem. Res. 1996, 29, 461−469. (45) Hornak, V.; Abel, R.; Okur, A.; Strockbine, B.; Roitberg, A.; Simmerling, C. Comparison of Multiple Amber Force Fields and Development of Improved Protein Backbone Parameters. Proteins: Struct., Funct., Bioinf. 2006, 65, 712−725. (46) MacKerell, A.; et al. All-atom empirical potential for molecular modeling and dynamics studies of proteins. J. Phys. Chem. B 1998, 102, 3586−3616. (47) Feller, S. E.; MacKerell, A. D. An Improved Empirical Potential Energy Function for Molecular Simulations of Phospholipids. J. Phys. Chem. B 2000, 104, 7510−7515. (48) Zhang, Y.; Sagui, C. The gp41659−671 HIV-1 Antibody Epitope: A Structurally Challenging Small Peptide. J. Phys. Chem. B 2014, 118, 69−80. (49) Penev, E.; Ireta, J.; Shea, J.-E. Energetics of Infinite Homopolypeptide Chains: A New Look at Commonly Used Force Fields. J. Phys. Chem. B 2008, 112, 6872−6877. (50) Sorin, E. J.; Pande, V. S. Empirical Force-Field Assessment: The Interplay Between Backbone Torsions and Noncovalent Term Scaling. J. Comput. Chem. 2005, 26, 682−690. (51) Jorgensen, W. L.; Chandrasekhar, J.; Madura, J. D.; Impey, R. W.; Klein, M. L. Comparison of Simple Potential Functions for Simulating Liquid Water. J. Chem. Phys. 1983, 79, 926−935. (52) MacKerell, A. D.; Banavali, N.; Foloppe, N. Development and Current Status of the CHARMM Force Field for Nucleic Acids. Biopolymers 2000, 56, 257−265. (53) MacKerell, A. D.; Feig, M.; Brooks, C. L. Extending the Treatment of Backbone Energetics in Protein Force Fields: Limitations of Gas-Phase Quantum Mechanics in Reproducing Protein Conformational Distributions in Molecular Dynamics Simulations. J. Comput. Chem. 2004, 25, 1400−1415. (54) Best, R. B.; Zhu, X.; Shim, J.; Lopes, P. E.; Mittal, J.; Feig, M.; MacKerell, A. D., Jr Optimization of the Additive CHARMM All-Atom Protein Force Field Targeting Improved Sampling of the Backbone ϕ, ψ and Side-Chain χ1 and χ2 Dihedral Angles. J. Chem. Theory Comput. 2012, 8, 3257−3273. (55) Macias, A. T.; MacKerell, A. D. CH/π Interactions Involving Aromatic Amino Acids: Refinement of the CHARMM Tryptophan Force Field. J. Comput. Chem. 2005, 26, 1452−1463. (56) Higo, J.; Ito, N.; Kuroda, M.; Ono, S.; Nakajima, N.; Nakamura, H. Energy Landscape of a Peptide Consisting of α-helix, 310-helix, βturn, β-hairpin, and Other Disordered Conformations. Protein Sci. 2001, 10, 1160−1171. (57) García, A. E.; Sanbonmatsu, K. Y. α-Helical Stabilization by Side Chain Shielding of Backbone Hydrogen Bonds. Proc. Natl. Acad. Sci. U. S. A. 2002, 99, 2782−2787. (58) Okur, A.; Strockbine, B.; Hornak, V.; Simmerling, C. Using PC clusters to Evaluate the Transferability of Molecular Mechanics Force Fields for Proteins. J. Comput. Chem. 2003, 24, 21−31. (59) Sorin, E. J.; Pande, V. S. Exploring the Helix-Coil Transition via All-Atom Equilibrium Ensemble Simulations. Biophys. J. 2005, 88, 2472−2493. (60) Fujitani, H.; Matsuura, A.; Sakai, S.; Sato, H.; Tanida, Y. Highlevel ab Initio Calculations to Improve Protein Backbone Dihedral Parameters. J. Chem. Theory Comput. 2009, 5, 1155−1165. (61) Case, D. A.; et al. AMBER 12; University of California: San Fransisco, 2012. (62) Jorgensen, W. L.; Maxwell, D. S.; Tirado-Rives, J. Development and Testing of the OPLS All-Atom Force Field on Conformational Energetics and Properties of Organic Liquids. J. Am. Chem. Soc. 1996, 118, 11225−11236. (63) Kaminski, G. A.; Friesner, R. A.; Tirado-Rives, J.; Jorgensen, W. L. Evaluation and Reparametrization of the OPLS-AA Force Field for Proteins via Comparison With Accurate Quantum Chemical Calculations on Peptides. J. Phys. Chem. B 2001, 105, 6474−6487. 6551

dx.doi.org/10.1021/jp500193w | J. Phys. Chem. B 2014, 118, 6539−6552

The Journal of Physical Chemistry B

Article

(64) Ensing, B.; De Vivo, M.; Liu, Z.; Moore, P.; Klein, M. L. Metadynamics as a Tool for Exploring Free Energy Landscapes of Chemical Reactions. Acc. Chem. Res. 2006, 39, 73−81. (65) Abrams, J. B.; Tuckerman, M. E. Efficient and Direct Generation of Multidimensional Free Energy Surfaces via Adiabatic Dynamics without Coordinate Transformations. J. Phys. Chem. B 2008, 112, 15742−15757. (66) Faller, C. E.; Reilly, K. A.; Hills, R. D., Jr.; Guvench, O. Peptide Backbone Sampling Convergence with the Adaptive Biasing Force Algorithm. J. Phys. Chem. B 2013, 117, 518−526. (67) Hu, H.; Elstner, M.; Hermans, J. Comparison of a QM/MM Force Field and Molecular Mechanics Force Fields in Simulations of Alanine and Glycine “Dipeptides” (Ace-Ala-Nme and Ace-Gly-Nme) in Water in Relation to the Problem of Modeling the Unfolded Peptide Backbone in Solution. Proteins: Struct., Funct., Bioinf. 2003, 50, 451−463. (68) Laio, A.; Parrinello, M. Escaping free energy minima. Proc. Natl. Acad. Sci. 2002, 99, 12562. (69) Jiang, F.; Han, W.; Wu, Y.-D. Influence of Side Chain Conformations on Local Conformational Features of Amino Acids and Implication for Force Field Development. J. Phys. Chem. B 2010, 114, 5840−5850. (70) Best, R. B.; Hummer, G. Optimized Molecular Dynamics Force Fields Applied to the Helix-Coil Transition of Polypeptides. J. Phys. Chem. B 2009, 113, 9004−9015. (71) Moradi, M.; Babin, V.; Sagui, C.; Roland, C. PPII Propensity of Multiple-Guest Amino Acids in a Proline-Rich Environment. J. Phys. Chem. B 2011, 115, 8645−8656. (72) Cruz, V. L.; Ramos, J.; Martinez-Salazar, J. Assessment of the Intrinsic Conformational Preferences of Dipeptide Amino Acids in Aqueous Solution by Combined Umbrella Sampling/MBAR Statistics. A Comparison with Experimental Results. J. Phys. Chem. B 2012, 116, 469−475. (73) Chipot, C., Pohorille, A., Eds. Free Energy Calculations, Theory and Applications in Chemistry and Biology; Springer Series in Chemical Physics; Springer: Berlin, 2007; Vol. 86. (74) Christ, C.; Mark, A.; Van Gunsteren, W. Basic Ingredients of Free Energy Calculations: A Review. J. Comput. Chem. 2010, 31, 1569−1582. (75) Maragliano, L.; Vanden-Eijnden, E. A Temperature Accelerated Method for Sampling Free Energy and Determining Reaction Pathways in Rare Events Simulations. Chem. Phys. Lett. 2006, 426, 168−175. (76) Rosso, L.; Tuckerman, M. E. An Adiabatic Molecular Dynamics Method for the Calculation of Free Energy Profiles. Mol. Simul. 2002, 28, 91−112. (77) Rosso, L.; Mináry, P.; Zhu, Z.; Tuckerman, M. E. On the Use of the Adiabatic Molecular Dynamics Technique in the Calculation of Free Energy Profiles. J. Chem. Phys. 2002, 116, 4389−4402. (78) Rosso, L.; Abrams, J. B.; Tuckerman, M. E. Mapping the Backbone Dihedral Free-Energy Surfaces in Small Peptides in Solution Using Adiabatic Free-Energy Dynamics. J. Phys. Chem. B 2005, 109, 4162−4167. (79) Chen, M.; Cuendet, M. A.; Tuckerman, M. E. Heating and flooding: A unified approach for rapid generation of free energy surfaces. J. Chem. Phys. 2012, 137, 024102. (80) Cuendet, M. A.; Tuckerman, M. E. Alchemical Free Energy Differences in Flexible Molecules from Thermodynamic Integration or Free Energy Perturbation Combined with Driven Adiabatic Dynamics. J. Chem. Theory Comput. 2012, 8, 3504−3512. (81) Corsini, R.; Lazzerini, B.; Marcelloni, F. A new fuzzy Relational Clustering Algorithm Based on the Fuzzy C-means Algorithm. Int. J. Soft Comput. 2005, 439−447. (82) Gustafson, D. E.; Kessel, W. C. Fuzzy Clustering with a Fuzzy Covariance Matrix. Proceedings of the 1978 IEEE Conference on Decision and Control including the 17th Symposium on Adaptive Processes; IEEE Control Systems Society: Piscataway, NJ, 1978.

(83) Xie, X. L.; Beni, G. A Validity Measure for Fuzzy Clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence 1991, 13, 841−847. (84) Arlot, S. V-fold Cross-Validation Improved: V-fold Penalization. arXiv preprint arXiv:0802.0566, 2008. (85) Tibshirani, R.; Walther, G.; Hastie, T. Estimating the Number of Clusters in a Data Set Via the Gap Statistic. Journal of The Royal Statistical Society Series B-Statistical Methodology 2001, 63, 411−423. (86) Dudoit, S.; Fridlyand, J. A Prediction-Based Resampling Method for Estimating the Number of Clusters in a Dataset. Genome Biol. 2002, 3, 1−21. (87) Kaymak, U.; Setnes, M. Fuzzy Clustering with Volume Prototypes and Adaptive Cluster Merging. IEEE Transactions on Fuzzy Systems 2002, 10, 705−712. (88) Pronk, S.; et al. GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit. Bioinformatics 2013, 29, 845−854. (89) Bonomi, M.; et al. PLUMED: A Portable Plugin for Free-energy Calculations with Molecular Dynamics. Comput. Phys. Commun. 2009, 180, 1961−1972. (90) Tuckerman, M. E.; Yarne, D.; Samuelson, S. O.; Hughes, A. L.; Martyna, G. J. Exploiting Multiple Levels of Parallelism in Molecular Dynamics Based Calculations via Modern Techniques and Software Paradigms on Distributed Memory Computers. Comput. Phys. Commun. 2000, 128, 333−376. (91) Best, R. B. Atomistic Molecular Simulations of Protein Folding. Curr. Opin. Struct. Biol. 2012, 22, 52−61. (92) SPARTAN’08. Wavefunction, Inc.: Irvine, CA. (93) Ryckaert, J.-P.; Ciccotti, G.; Berendsen, H. J. Numerical Integration of the Cartesian Equations of Motion of a System with Constraints: Molecular Dynamics of n-alkanes. J. Comput. Phys. 1977, 23, 327−341. (94) Andersen, H. C. RATTLE: A “Velocity” Version of the SHAKE Algorithm for Molecular Dynamics Calculations. J. Comput. Phys. 1983, 52, 24−34. (95) Ewald, P. P. Die Berechnung Optischer und Elektrostatischer Gitterpotentiale. Ann. Phys. (Berlin, Ger.) 1921, 369, 253−287. (96) Liu, Y.; Tuckerman, M. E. Generalized Gaussian Moment Thermostatting: A New Continuous Dynamical Approach to the Canonical Ensemble. J. Chem. Phys. 2000, 112, 1685−1700. (97) Tuckerman, M.; Berne, B. J.; Martyna, G. J. Reversible Multiple Time Scale Molecular Dynamics. J. Chem. Phys. 1992, 97, 1990−2001. (98) Martyna, G. J.; Tuckerman, M. E.; Tobias, D. J.; Klein, M. L. Explicit Reversible Integrators for Extended Systems Dynamics. Mol. Phys. 1996, 87, 1117−1157. (99) Martyna, G. J.; Klein, M. L.; Tuckerman, M. Nosé−Hoover chains: The canonical Ensemble via Continuous Dynamics. J. Chem. Phys. 1992, 97, 2635−2643. (100) Orfanidis, S. J. Introduction to signal processing; Prentice-Hall, Inc.: Upper Saddle River, NJ, USA, 1995. (101) Hastie, T.; Tibshirani, R.; Friedman, J.; Franklin, J. The Elements of Statistical Learning: Data mining, Inference and Prediction. Mathematical Intelligencer 2005, 27, 83−85. (102) Efron, B. Nonparamtric Estimates of Standard Error; The Jackknife, the Bootstrap and Other Methods. Biometrika 1981, 63, 589−599. (103) DiCiccio, T. J.; Romano, J. P. A Review of Bootstrap Confidence Intervals. Journal of The Royal Statistical Society Series B 1988, 50, 338−354. (104) Krishnapuram, R.; Kim, J. A Note on the Gustafson-Kessel and Adaptive Fuzzy Clustering Algorithms. IEEE Transactions on Fuzzy Systems 1999, 7, 453−461. (105) Beringer, J.; Hullermeier, E. Online Clustering of Parallel Data Streams. Data Knowledge Engineering 2006, 58, 180−204. (106) Nasraoui, O.; Rojas, C.; Cardona, C. Data Mining Meets Evolutionary Computation: A New Framework for Dynamic and Scalable Evolutionary Data Mining based on Non-Stationary Function Optimization. J. Comput. Sci. 2005, 56, 63.

6552

dx.doi.org/10.1021/jp500193w | J. Phys. Chem. B 2014, 118, 6539−6552

How accurately do current force fields predict experimental peptide conformations? An adiabatic free energy dynamics study.

The quality of classical biomolecular simulations is inevitably limited by two problems: the accuracy of the force field used and the comprehensivenes...
2MB Sizes 1 Downloads 4 Views