Combining path-breaking with bidirectional nonequilibrium simulations to improve efficiency in free energy calculations Edoardo Giovannelli, Cristina Gellini, Giangaetano Pietraperzia, Gianni Cardini, and Riccardo Chelli Citation: The Journal of Chemical Physics 140, 064104 (2014); doi: 10.1063/1.4863999 View online: http://dx.doi.org/10.1063/1.4863999 View Table of Contents: http://scitation.aip.org/content/aip/journal/jcp/140/6?ver=pdfcov Published by the AIP Publishing Articles you may be interested in Path-breaking schemes for nonequilibrium free energy calculations J. Chem. Phys. 138, 214109 (2013); 10.1063/1.4808037 Density-dependent analysis of nonequilibrium paths improves free energy estimates II. A Feynman–Kac formalism J. Chem. Phys. 134, 034117 (2011); 10.1063/1.3541152 Density-dependent analysis of nonequilibrium paths improves free energy estimates J. Chem. Phys. 130, 204102 (2009); 10.1063/1.3139189 Rosenbluth-sampled nonequilibrium work method for calculation of free energies in molecular simulation J. Chem. Phys. 122, 204104 (2005); 10.1063/1.1906209 Efficient FreeEnergy Calculations by the Simulation of Nonequilibrium Processes Comput. Sci. Eng. 2, 88 (2000); 10.1109/5992.841802

This article is copyrighted as indicated in the article. Reuse of AIP content is subject to the terms at: http://scitation.aip.org/termsconditions. Downloaded to IP: 130.209.6.50 On: Sun, 21 Dec 2014 00:09:34

THE JOURNAL OF CHEMICAL PHYSICS 140, 064104 (2014)

Combining path-breaking with bidirectional nonequilibrium simulations to improve efficiency in free energy calculations Edoardo Giovannelli, Cristina Gellini, Giangaetano Pietraperzia, Gianni Cardini, and Riccardo Chellia) Dipartimento di Chimica, Università di Firenze, Via della Lastruccia 3, I-50019 Sesto Fiorentino, Italy and European Laboratory for Non-linear Spectroscopy (LENS), Via Nello Carrara 1, I-50019 Sesto Fiorentino, Italy

(Received 22 November 2013; accepted 17 January 2014; published online 11 February 2014) An important limitation of unidirectional nonequilibrium simulations is the amount of realizations of the process necessary to reach suitable convergence of free energy estimates via Jarzynski’s relationship [C. Jarzynski, Phys. Rev. Lett. 78, 2690 (1997)]. To this regard, an improvement of the method has been achieved by means of path-breaking schemes [R. Chelli et al., J. Chem. Phys. 138, 214109 (2013)] based on stopping highly dissipative trajectories before their normal end, under the founded assumption that such trajectories contribute marginally to the work exponential averages. Here, we combine the path-breaking scheme, called probability threshold scheme, to bidirectional nonequilibrium methods for free energy calculations [G. E. Crooks, Phys. Rev. E 61, 2361 (2000); R. Chelli and P. Procacci, Phys. Chem. Chem. Phys. 11, 1152 (2009)]. The method is illustrated and tested on a benchmark system, i.e., the helix-coil transition of deca-alanine. By using path-breaking in our test system, the computer time needed to carry out a series of nonequilibrium trajectories can be reduced up to a factor 4, with marginal loss of accuracy in free energy estimates. © 2014 AIP Publishing LLC. [http://dx.doi.org/10.1063/1.4863999] I. INTRODUCTION

In the framework of computer simulation methods to compute free energy differences, an interesting scenario has been disclosed by two nonequilibrium work relations: the Jarzynski equality1 and the Crooks fluctuation theorem.2, 3 These theorems relate the free energy difference between two thermodynamic states to the external work performed in an ensemble of realizations switching the system between such states. The switching procedure is accomplished by a control parameter correlated to some collective coordinate of the system (e.g., an interatomic distance, a torsional angle, and the morphing coordinate in alchemical transformations). Computer simulations based on these techniques are typically known as steered simulations.4–7 In such calculations, it is of basic importance to improve the efficiency of path sampling, which globally increases by lowering the work dissipated during the realizations. To this aim, several approaches based on rational path sampling have been developed.8–22 The basic difference between methods which exploit Jarzynski equality and Crooks fluctuation theorem is that, while the former involve nonequilibrium simulations in only one direction of the process, the latter make use of both directions of the process. For such a reason, these techniques are often referred to as unidirectional and bidirectional nonequilibrium methods. In the last decade, several strategies based on bidirectional methods have been proposed to compute free energy differences3, 23–26 and the potential of mean force (PMF) as a function of collective variables of the system.27–31 a) Electronic mail: [email protected]

0021-9606/2014/140(6)/064104/8/$30.00

Recently, we have introduced a method, called pathbreaking, to speed up free energy calculations via nonequilibrium unidirectional Monte Carlo and molecular dynamics simulations.32 Path-breaking is based on checking the dissipated work periodically during the nonequilibrium realizations of the process and to stop the highly dissipative trajectories before their normal end. The contribution of these broken paths to the work exponential average is neglected under the founded assumption that such an average is dominated by paths with low dissipated work.33 As the method relies on estimates of the dissipated work at various checkpoints along the path, we need to determine somehow the free energy at the same points during the realizations of the process, a task which is accomplished with a self-consistent procedure. We remark that, once the simulation setup to realize pulling trajectories is given, which basically consists in setting the total number of trajectories and the pulling velocity, path-breaking is not aimed at improving path sampling with respect to the conventional approach, but rather at eliminating “as soon as possible” the most dissipative trajectories, which would contribute marginally to the Jarzynski equality. Two path-breaking schemes, called fixed threshold scheme and probability threshold scheme, were designed,32 whose difference lies in the criterion to establish whether a nonequilibrium simulated trajectory must or must not be stopped at a given check-point during the pulling path. In the fixed threshold scheme, an upper threshold limit for the dissipated work at the check-points is established and trajectories exceeding this limit are stopped. In the probability threshold scheme, an acceptance probability of overcoming the check-point is instead evaluated on the basis of the dissipated work. We have shown

140, 064104-1

© 2014 AIP Publishing LLC

This article is copyrighted as indicated in the article. Reuse of AIP content is subject to the terms at: http://scitation.aip.org/termsconditions. Downloaded to IP: 130.209.6.50 On: Sun, 21 Dec 2014 00:09:34

064104-2

Giovannelli et al.

that these schemes are comparable in accuracy, if we set simulation conditions that provide similar computational efficiencies. However, as probability threshold scheme does not require introduction of any arbitrary threshold for the dissipated work, it would be preferred in those cases (practically all) in which the amount of dissipation is unknown. The method was successfully applied to the calculation of the PMF of decaalanine with respect to the end-to-end collective coordinate and the PMF of two methane molecules in water solution with respect to their mutual distance.32 In this article we extend the path-breaking machinery to bidirectional techniques. In particular, we focus on the Bennett method,34 generalized to nonequilibrium simulations by Crooks,3 and on a PMF estimator based on work exponential averages.30 Moreover, owing to the considerable advantages of the probability threshold scheme with respect to the fixed threshold scheme, the present treatment is limited to the former scheme. Following the guidelines of this study, formulation and implementation of a bidirectional version of the fixed threshold scheme appears straightforward. For consistency with Ref. 32, tests have been performed on the calculation of the free energy difference between folded and unfolded forms of deca-alanine (using the Bennett method3 ), as well as on the estimate of the PMF with respect to the end-to-end distance of the peptide (using the bidirectional PMF estimator of Ref. 30). The outline of the article is as follows. In Sec. II A, we describe the path-breaking algorithm for unidirectional simulations developed in Ref. 32. In Sec. II B, we introduce a theoretical justification of path-breaking in the context of bidirectional nonequilibrium simulations. In Sec. III, we present the calculation of the free energy difference between elongated and helix forms of deca-alanine, along with the PMF related to the end-to-end distance. Concluding remarks are given in Sec. IV.

II. THEORETICAL BACKGROUND A. Path-breaking in unidirectional nonequilibrium simulations

Path-breaking schemes for unidirectional nonequilibrium simulations have been developed and tested elsewhere,32 but, to make the article self-consistent, we summarize here the basic points of the algorithm. However, we refer to Ref. 32 for a deeper illustration of theoretical and other practical aspects of the method. Unidirectional nonequilibrium simulations, also known as steered molecular dynamics or steered Monte Carlo simulations,4–7 can be combined with the Jarzynski equality1 to estimate the PMF along a collective coordinate. Nonequilibrium trajectories are realized by associating a collective coordinate ζ (q)35 with a control parameter λ through an external potential with typical harmonic form: V (q; λ) = k[ζ (q) − λ]2 . During pulling realizations, the control parameter λ varies from an initial value λ1 to a final value λ2 and the work performed on the system is stored for subsequent analysis. Two path-breaking schemes, called fixed threshold scheme and probability threshold scheme, were designed,32

J. Chem. Phys. 140, 064104 (2014)

whose basic difference lies in the criterion to establish whether a nonequilibrium simulated trajectory must or must not be stopped at a given check-point during the pulling path. We thoroughly discussed on the convenience of using the probability threshold scheme instead of the fixed threshold scheme, especially due to the lower number of arbitrary parameters introduced in the former approach. In this study, we therefore limit ourselves to report on results of the probability threshold scheme. According to such a scheme, a nonequilibrium trajectory is broken at a check-point located at λ = , on the basis of the work dissipated as the collective coordinate is driven from λ1 to . The dissipated work at the  check-point can be determined if we know the free energy difference F = F () − F (λ1 ),36 which can be computed by adopting a self-consistent procedure. The algorithm can be summarized as follows. (i) Choose the collective coordinate ζ (q) associated with the control parameter λ. (ii) Sample the initial microstates from an equilibrium simulation by keeping fixed the control parameter to the initial value λ1 . Steps (i) and (ii) initialize the procedure without introducing any substantial modification to the standard approach. Next steps (iii) to (vii) describe how the pulling protocol is implemented to account for path-breaking. (iii) Choose the number and positions of the check-points along the pathway, say 1 , 2 , . . . , M . (iv) Generate a guess for the free energies F1 , F2 , . . . , FM , by exploiting the first few unbroken trajectories into Jarzynski equality: exp(−βFn ) Ng exp(−βWi (n )), where β −1 = kB T is the = Ng−1 i=1 inverse temperature, the average is over the first Ng trajectories, and Wi (n ) corresponds to the work performed on the system to drive the collective coordinate from the initial value λ1 to n during the ith trajectory. In principle, even a single trajectory can be used to get a free energy guess, i.e., Ng = 1. (v) For a generic trajectory i, when a check-point n is reached during the pulling path, the quantity p(Wi (n )) is calculated using the current value of Fn :   p(Wi (n )) = min 1, e−β[Wi (n )−Fn ] . (1) (Note that Wi (n ) − Fn corresponds to the dissipated work.) Then, a number r is randomly picked in the interval [0, 1]. If p(Wi (n )) ≥ r then the trajectory i is continued toward the next check-point n + 1 or the final step λ2 . Otherwise, it is stopped and the trajectory i + 1 is started. In both cases, the acceptance probability p(Wi (n )) is recorded to update the value of Fn , as described in step (vi). (vi) The free energies at the check-points, i.e., F1 , F2 , . . . , FM , are updated with established frequency by exploiting all the available work values into the following equations:   Nn−1  −1 −1 −1 −βWi (n ) Fn−1,i e Fn = −β ln N , (2) i=1

This article is copyrighted as indicated in the article. Reuse of AIP content is subject to the terms at: http://scitation.aip.org/termsconditions. Downloaded to IP: 130.209.6.50 On: Sun, 21 Dec 2014 00:09:34

064104-3

Giovannelli et al.

⎡ −1 Fn−1,i =⎣

n−1

J. Chem. Phys. 140, 064104 (2014)

⎤−1 p(Wi (j ))⎦

j =1

=

n−1

  max 1, eβ[Wi (j )−Fj ] ,

(3)

j =1

where N is the number of (broken plus unbroken) trajectories realized till the current stage of the pulling calculations and Nn − 1 is the number of trajectories that overcome the n − 1 check-point and hence reach the n check-point. We point out that for n = 1, the statements −1 −1 ≡ F0,i = 1 and N0 = N hold because the first Fn−1,i check-point is surely reached by all trajectories. Note that in Eq. (2), as well as in the following Eqs. (4), (6), (9), and (10), the trajectories have an ordered labeling, namely highest indexes are assigned to trajectories which are broken later in the path. (i) The procedure continues by adopting the new free energies, F1 , F2 , . . . , FM , to check trajectory breaking at step (v). At the end of the nonequilibrium simulations, the PMF at λ, i.e., Fλ , is computed exploiting the quantities p(Wi (n )) and Wi (λ) for each i = 1, 2, . . . , N. For convenience of presentation, we report the related equation in Sec. II B (Eq. (4), or equivalently, Eq. (6)). B. Path-breaking in bidirectional nonequilibrium simulations

While in unidirectional nonequilibrium techniques (Sec. II A) only one of the two directions of the process is employed, bidirectional methodologies3, 27, 30 make use of two independent sets of trajectories to calculate free energy differences and PMFs. The first set is produced evolving the control parameter λ from a value λ1 to a value λ2 under established temporal protocol, whereas the second is produced varying λ from λ2 to λ1 with inverted protocol. In both cases, starting system configurations are sampled at equilibrium, holding λ fixed to the respective initial value (λ1 for the former and λ2 for the latter set of trajectories). The trajectories of the two sets are arbitrarily defined as forward and backward, with the only aim of distinguishing the direction of the process, under the assumption that λ1 < λ2 . We point out that in the following treatment pulling simulations are supposed to be performed in the regime of stiff spring approximation,4 so that PMF with respect to the control parameter λ can be considered to match the PMF with respect to the physical collective coordinate. Although extension of path-breaking to steered molecular dynamics simulations with soft driving potentials24, 37 is straightforward, we will not address it, as it has been shown that the best performances are usually obtained by using stiff driving potentials.38 In the present article, we extend the probability threshold scheme of path-breaking technique to two bidirectional nonequilibrium methods, specifically the Bennett method for the calculation of free energy differences3, 34, 39 and the approach to PMF of Ref. 30. Path-breaking adaptation of the

Minh-Adib PMF estimator27 is given in the supplementary material,40 but the related PMFs are not reported here because they almost match those obtained by means of the PMF estimator of Ref. 30. In order to combine path-breaking with bidirectional methods, we must employ the unidirectional probability threshold scheme32 to carry on a set of Nf forward trajectories and a set of Nb backward trajectories. Before illustrating the path-breaking relations for bidirectional methods, we need to introduce some notation to take into account the two directions of the process. The work performed on the system in the segment λ1 → λ of the ith forward trajectory is denoted as Wf,i (λ), while the work performed on the system in the segment λ2 → λ of the jth backward trajectory is denoted as Wb,j (λ). The nth check-point in forward direction and the mth check-point in backward direction are indicated with f, n and b, m , respectively.41 Without loss of generality, for the forward direction we consider the generic λ value in the range f, n < λ ≤ f, n + 1 . According to the probability threshold scheme,32 the expression of the PMF based on forward trajectories is (see also Eqs. (2) and (3)) ⎛ ⎞ Nf,n  −1 −βWf,i (λ) ⎠ Ff (λ) = −β −1 ln ⎝Nf−1 Fn,i e , (4) i=1

where Nf, n is the number of forward trajectories that overcome the f, n check-point thus reaching λ, and ⎡ ⎤−1 n

−1 = ⎣ p(Wf,i (f,j ))⎦ , (5) Fn,i j =1

p(Wf,i (f,j )) being computed through Eq. (1). Analogously, assuming that b, m + 1 ≤ λ < b, m , the expression of the PMF based on backward trajectories is (see also Eqs. (2) and (3)) ⎛ ⎞ Nb,m  −1 −βWb,i (λ) ⎠ Bm,i e , (6) Fb (λ) = −β −1 ln ⎝Nb−1 i=1

where Nb, m is the number of backward trajectories that overcome the b, m check-point thus reaching λ, and ⎡ ⎤−1 m

−1 Bm,i = ⎣ p(Wb,i (b,j ))⎦ . (7) j =1

As usual, p(Wb,i (b,j )) comes from Eq. (1). We remark that Ff (λ) and Fb (λ) are PMF estimates based on unidirectional nonequilibrium simulations such that Ff (λ1 ) = 0 and Fb (λ2 ) = 0. As shown in Ref. 30, the two independent free energy profiles obtained from unidirectional forward and backward simulations, together with a suitable estimate of the free energy difference F = F(λ2 ) − F(λ1 ), can be combined to recover a more accurate PMF estimate as e−βF (λ) = e−βFf (λ) + e−βF e−βFb (λ) ,

(8)

where Ff (λ) and Fb (λ) are the PMFs estimated by using the Jarzynski equality in forward and backward directions, respectively. Thus, employing the path-breaking estimates of

This article is copyrighted as indicated in the article. Reuse of AIP content is subject to the terms at: http://scitation.aip.org/termsconditions. Downloaded to IP: 130.209.6.50 On: Sun, 21 Dec 2014 00:09:34

064104-4

Giovannelli et al.

J. Chem. Phys. 140, 064104 (2014)

Ff (λ) and Fb (λ) into Eq. (8), we can write  Nf,n

e−βF (λ) = Nf−1

−1 −βWf,i (λ) Fn,i e

i=1

+ e−βF Nb−1

Nb,m 

−1 −βWb,j (λ) Bm,j e .

(9)

j =1

A suitable estimate of F can be obtained from the Bennett method,30 including only the contributions of trajectories which complete their path (only complete trajectories can actually be employed in the Bennett method39 ). However, as these trajectories will reach the end point with a biased probability due to the check-points along the path (step (v) in Sec. II A), we need to reweight the contributions of the single trajectories by the inverse of the probability of overcoming the −1 for a generic ith forward trajectory last check-point, i.e., FM f ,i −1 and BMb ,j for a generic jth backward trajectory (note that Mf and Mb correspond to the numbers of check-points in forward and backward directions). The modified Bennett equation can therefore be written as Nf,Mf    Nf β (Wf,i (λ2 )−F ) −1 −1 1+ e FMf ,i Nb i=1   Nb β (Wb,j (λ1 )+F ) −1 −1 1+ − e BMb ,j = 0, Nf j =1 Nb,Mb

(10)

where Nf,Mf and Nb,Mb are the numbers of complete forward and backward trajectories. F can be evaluated from Eq. (10) by using an iterative procedure. We notice that the trajectories broken before their natural end do not contribute to the sums of Eq. (10). On the other side, neglecting these trajectories is not expected to affect the sums significantly, since breaking arises, on average, from large dissipation, which leads to large exponential functions and ultimately to negligible values of the terms in square brackets. We finally outline that the standard version of the Bennett equation3, 39 corresponds to −1 = 1, and Eq. (10) in which Nf,Mf = Nf , Nb,Mb = Nb , FM f ,i −1 BMb ,j = 1. III. APPLICATION OF PATH-BREAKING TO HELIX-COIL TRANSITION OF DECA-ALANINE A. Simulation details

As stated in Sec. II, classifying one of the two pulling directions as forward or backward is arbitrary. In this context, trajectories that produce deca-alanine unfolding are classified as forward, while refolding is obtained by means of backward trajectories.5 We point out that forward trajectories were also used in Ref. 32 to test path-breaking (PB) schemes in unidirectional nonequilibrium simulations. In order to compare standard and PB procedures under sampling protocols that produce identical nonequilibrium paths, PB has been applied a posteriori on full-length trajectories performed in the conventional way. Therefore, in the next paragraph, we discuss technical details of standard pulling simulations, while details on the PB application are given in the last part of the current subsection.

Constant-volume constant-temperature steered molecular dynamics simulations of one deca-alanine molecule have been carried out in vacuum by enforcing a temperature of 300 K through a Nosé-Hoover thermostat,42 without application of periodic boundary conditions. The force field is taken from Ref. 43, treating the electrostatic forces with the conventional Coulomb’s law. The N atom of the N-terminus residue has been constrained to a fixed position, while the N atom of the C-terminus residue has been restrained to move along a fixed direction by an external harmonic potential. Hence, the collective coordinate (end-to-end distance) corresponds to the distance between the N atoms of the two terminal amide groups. Forward pulling is accomplished by a linear time variation of the end-to-end distance from 1.55 to 3.15 nm, while the reverse protocol is employed for the backward direction. Both types of trajectories have been realized by setting the harmonic force constant to 8 × 104 kcal mol−1 nm−2 , a value large enough to assume the stiff spring approximation being valid.4 The starting configurations of pulling trajectories were randomly picked from two equilibrium molecular dynamics simulations using a time-independent harmonic potential on the end-to-end distance, with its equilibrium value fixed to 1.55 and 3.15 nm for the forward and backward trajectories, respectively. The same force constant of pulling simulations has been adopted. All the free energy differences and PMFs reported here have been obtained exploiting 104 forward and 104 backward trajectories, both directions being characterized by the same pulling velocity v. Three representative pulling velocities5 have been considered, namely v = 32, 53.3, 160 m s−1 .44 In applying PB to forward and backward trajectories, the same number Mf = Mb = M of check-points has been enforced, namely M = 1, 4, 9. Even if the check-point positions can be displaced arbitrarily along the pulling coordinate, here we choose an arrangement which divides the end-to-end coordinate into equal segments. The free energy guess at the check-points is calculated using the first 5 full-length trajectories, while the free energies at the check-points are updated every 5 trajectories. Making use of work functions along forward and backward trajectories, we have obtained several estimates of the free energy difference between folded and unfolded deca-alanine (by using the standard and PB-based Bennett method3 ) and of the PMF as a function of the end-to-end distance (by using the standard and PB-based PMF estimator of Ref. 30). In the former case, estimates obtained with standard and PB methods are generically indicated with Fst and Fpb , respectively, while the notations Fst (λ) and Fpb (λ) are used for the PMF. To evaluate the free energy uncertainty, a block average procedure45 has been applied. In particular, for each direction of the process, we have built 20 disjointed blocks of 500 trajectories each. Then, Fst , Fpb , Fst (λ), and Fpb (λ) have been recovered as averages of the estimates obtained from all possible pairs of blocks, one in forward and the other in backward direction, for a total of 20 × 20 = 400 block pairs. The error has been evaluated as twice the standard deviation. This approach has been repeated for all the simulation setups, which may differ each other for pulling velocity and/or check-point number.

This article is copyrighted as indicated in the article. Reuse of AIP content is subject to the terms at: http://scitation.aip.org/termsconditions. Downloaded to IP: 130.209.6.50 On: Sun, 21 Dec 2014 00:09:34

Giovannelli et al.

064104-5

J. Chem. Phys. 140, 064104 (2014)

FIG. 1. Fst (λ) and Fpb (λ) as functions of the deca-alanine end-to-end distance (black and red lines, respectively). From left to right panels, data from simulations performed at v = 160, 53.3, and 32 m s−1 are shown. For PB simulations, PMFs obtained with a different number of check-points M are reported (see panels). The exact PMF, from Ref. 5, is reported for comparison (blue lines). Error bars are calculated as explained in the text. For the sake of clarity, PMFs are translated along the y axis.

B. Results

The profiles Fpb (λ), calculated with v = 160, 53.3, 32 m s−1 and M = 1, 4, 9, are compared to the corresponding standard quantities, Fst (λ), in Fig. 1. According to Ref. 37, a proper constant has been added to each PMF profile to minimize its root mean square deviation (RMSD) from the exact PMF,5 defined as (see Sec. II of the supplementary material40 for technical details on the calculation of the exact PMF)   L   [FA (λi ) − FB (λi )]2 , (11) RMSD = L−1 i=1

where FA (λi ) and FB (λi ) represent two generic estimates of the PMF at the λi point and the sum is over the L = 81 points in which the collective coordinate is divided (corresponding to a PMF resolution of 0.02 nm). We remark that the accuracy of PB should be evaluated in terms of its capability in reproducing the free energies computed using the standard method, rather than the exact free energies. In fact, PB is essentially an approximation to the standard pulling protocol and hence it will suffer from all shortcomings of the standard method, starting from the drastic effects of pulling velocity on computational accuracy. Globally, standard and PB PMFs are in good agreement, especially considering the large variation of the PMF along the end-to-end coordinate. Significant differences are only observable for v = 160 m s−1 and M = 9. These outcomes are confirmed from the RMSDs (Eq. (11)) between Fpb (λ) and Fst (λ) reported in Table I. The worse situation, observed for the simulation setup v = 160 m s−1 and M = 9, gives RMSD = 3.6 kJ mol−1 , whereas, for other setups, RMSDs fall below 3 kJ mol−1 . TABLE I. RMSDs between Fpb (λ) and Fst (λ) (in units of kJ mol−1 ) for various simulation setups, characterized by different pulling velocity v (in units of m s−1 ) and number of check-points M. Note that, for a given pulling velocity v, a unique PMF is obtained from standard nonequilibrium simulations. v

M

RMSD

v

M

RMSD

v

M

RMSD

160 160 160

1 4 9

1.0 1.7 3.6

53.3 53.3 53.3

1 4 9

0.6 2.4 2.7

32 32 32

1 4 9

0.03 0.5 1.9

In any case, average deviations are of the order of chemical accuracy. In front of a substantial conservation of the computational accuracy of PB, here we must consider that the computer time efficiency is improved due to possible truncation of pulling simulations at the check-points. A quantitative analysis of computer time efficiency will be given below. Worsening of the PB accuracy with increasing v and M can be noted in Fig. 1 and in the related RMSDs of Table I, but it is better appreciated in Table II, where we report Fst and Fpb obtained by means of various simulation setups. This feature is consistent with the data of unidirectional pulling simulations.32 As found in the above PMF analysis, the worse result is got for v = 160 m s−1 and M = 9, which corresponds to a relative deviation, |(Fpb − Fst )/Fst |, of about 9.7%. In all other cases, relative deviations are well below 7%, revealing a satisfactory trend of PB performances. The worsening of PB accuracy with growing M is expected considering that Fpb is computed by exploiting only full-length trajectories and that accuracy becomes worse with decreasing the number of employed trajectories. An increase of M makes the probability of breaking a trajectory larger and hence the number of complete trajectories smaller. The percentage of complete trajectories employed in the calculation of Fpb is reported in Table II: the aforementioned trend of Fpb versus M is evident. The effect of the pulling velocity on the PB accuracy can instead be explained with the large average dissipation resulting from faster pulling, which ultimately leads to greater probability of breaking a trajectory. However, evaluation of the performances of PB with respect to the standard approach must result from a compromise between loss of accuracy in reproducing the standard free energies and gain in computer time efficiency. Computational efficiency can be quantified with the ratio Tst /Tpb between the computer times Tst and Tpb needed to perform standard and PB pulling simulations, respectively. In our case, as the numbers of trajectories are the same in

TABLE II. Fst and Fpb (Eq. (10)) correspond to estimates of the free energy difference between folded and unfolded states of deca-alanine computed by using standard and PB approaches, respectively (in units of kJ mol−1 ). Uncertainties, calculated as described in the text, are given in parenthesis. Tst /Tpb (Eq. (13)) is the efficiency ratio. nt is the percentage of complete forward and backward trajectories in PB simulations (we notice that the numbers of forward and backward trajectories, not reported here, are comparable). Results for simulation setups differing in pulling velocity v (in units of m s−1 ) and in the number M of check-points are reported. The exact value of F5 is 91.7 kJ mol−1 . v

M

Tst /Tpb

nt

Fpb

160 160 160 53.3 53.3 53.3 32 32 32

1 4 9 1 4 9 1 4 9

1.3 2.9 5.1 1.3 2.4 3.7 1.3 2.2 3.2

9.5 2.5 1.5 15.5 3.5 2.5 17.0 3.5 2.5

84.4 (0.2) 86.0 (0.3) 90.2 (0.3) 87.0 (0.4) 83.1 (0.5) 82.5 (0.5) 91.7 (0.2) 90.6 (0.3) 87.4 (0.3)

Fst

82.2 (0.2)

88.3 (0.3)

91.7 (0.2)

This article is copyrighted as indicated in the article. Reuse of AIP content is subject to the terms at: http://scitation.aip.org/termsconditions. Downloaded to IP: 130.209.6.50 On: Sun, 21 Dec 2014 00:09:34

064104-6

Giovannelli et al.

J. Chem. Phys. 140, 064104 (2014)

forward and backward directions, i.e., Nf = Nb = N, the time spent in standard pulling simulations is Tst = 2Nt, t being the time necessary to complete a single forward or backward trajectory. In PB simulations, considering that Mf = Mb = M, the time Tpb is given by  M M   Nt Tpb = f,k + b,l , (12) M + 1 k=0 l=0 where  f, k and  b, l correspond to the fractions of forward and backward trajectories that overcome the f, k and b, l checkpoints, respectively. Note that  f, 0 and  b, 0 refer to the first segments of coordinate, going from the starting point to the first check-point (this implies that  f, 0 =  b, 0 = 1). Thus, the efficiency ratio is given by the relation Tst 2(M + 1) = M . M Tpb k=0 f,k + l=0 b,l

(13)

Just as in unidirectional pulling simulations,32 the efficiency ratio lies in the interval [1, M + 1]. The minimum efficiency, Tst /Tpb = 1, occurs as no trajectory is broken and hence PB converges to the standard procedure ( f, k =  b, l = 1, ∀ k, l). The maximum efficiency, Tst /Tpb = M + 1, virtually occurs when all trajectories are broken at the first check point ( f, 0 =  b, 0 = 1 and  f, k =  b, l = 0 for k > 1 and l > 1). Note, however, that the latter case is not suitable for free energy calculations, because at least one forward and one backward trajectory must reach its natural end. This implies that the maximum efficiency ratio for suitable calculations must be (M + 1)/(1 + M/N) [in our calculations M/N is of the order of 0.01, so that (M + 1)/(1 + M/N)  M + 1]. The efficiency ratio is reported in Table II. It is significant that the computational efficiency, which ranges from 1.3 to 5.1, becomes worse as accuracy increases and vice versa. In any case, we may safely state that improving the computational efficiency up to a factor 3 or 4 does not lead to relevant deviations between Fpb and Fst . The gain in computational efficiency obtainable with PB gives us the opportunity of decreasing the pulling velocity by keeping unchanged the overall simulation time with respect to the standard method. This is possible because in PB simulations the growth of computational time arising from lowering the pulling velocity can be balanced by making the number of check-points larger. In conditions of having similar computational efficiency, i.e., Tst /Tpb  1, which can be realized using a proper combination of v and M in PB simulations, it is therefore possible to compare directly the accuracies of Fst (λ) and Fpb (λ) in reproducing the exact PMF. Different pulling velocities for standard and PB simulations yield different simulation times per trajectory that we can denote as tst and tpb , respectively. The expression of Tst /Tpb (Eq. (13)) is thus modified as follows:46 Tst 2(M + 1) tst = . (14)  M Tpb tpb M k=0 f,k + l=0 b,l Obviously, as Tpb depends on both v and M, it is possible to find different combinations of these parameters such that standard and PB computational efficiencies are comparable, i.e., Tst /Tpb  1. The combinations that enforce the condition

TABLE III. RMSDst and RMSDpb are the root mean square deviations between the exact PMF5 and the PMFs computed by using standard and PB methods, respectively (in units of kJ mol−1 ). For standard and PB methods, the reported combinations of v (in units of m s−1 ) and M yield comparable efficiency, as quantified by the ratio Tst /Tpb . For example, standard simulations performed at v = 80 m s−1 have computational efficiency comparable to PB simulations performed at (i) v = 32 m s−1 and M = 5 and (ii) v = 53.3 m s−1 and M = 2 (for these simulation setups Tst /Tpb = 1.01 and 1.12, respectively). Note that, for a given pulling velocity v, a unique PMF is obtained from standard nonequilibrium simulations. Standard simulations v 160 160 80 80 53.3 53.3

Path-breaking simulations

RMSDst

v

M

RMSDpb

Tst /Tpb

6.66 6.66 5.28 5.28 2.31 2.31

53.3 32 53.3 32 32 16

7 26 2 5 3 13

4.94 2.36 3.36 1.64 0.93 0.16

1.08 1.04 1.12 1.01 1.16 1.01

Tst /Tpb  1 for standard simulations performed with v = 160, 80, and 53.3 m s−1 are reported in Table III. In Fig. 2, Fst (λ) calculated with v = 53.3 m s−1 is compared to Fpb (λ) profiles having comparable computational efficiency (last two rows in Table III). The better performances of PB are evident even from a visual inspection of the curves. This is confirmed quantitatively by the RMSDs between the PMF estimates and the exact PMF5 (see Table III). Whereas PB profiles deviate from the exact PMF by less than 1 kJ mol−1 , RMSD related to Fst (λ) exceeds 2 kJ mol−1 . Similar results have been obtained for the other combinations of v and M parameters reported in Table III. Ultimately, it can be stated that, in bidirectional as well as unidirectional32 nonequilibrium simulations, computational time gained by using PB can be exploited to decrease pulling velocity, eventually improving the accuracy of free energy estimates.

FIG. 2. Fst (λ) (black line) and Fpb (λ) (red and blue lines) as functions of the deca-alanine end-to-end distance. The PMFs refer to standard and PB simulations having comparable computational efficiency (see last two lines in Table III). The simulation setups of standard and PB simulations (pulling velocity v and check-point number M) are reported in the legend. The exact PMF, from Ref. 5, is reported for comparison (dashed line). Error bars are calculated as explained in the text. RMSDs of Fst (λ) and Fpb (λ) from the exact PMF are also given (in units of kJ mol−1 ).

This article is copyrighted as indicated in the article. Reuse of AIP content is subject to the terms at: http://scitation.aip.org/termsconditions. Downloaded to IP: 130.209.6.50 On: Sun, 21 Dec 2014 00:09:34

064104-7

Giovannelli et al.

IV. CONCLUDING REMARKS AND PERSPECTIVES

A major shortcoming of free energy calculations via nonequilibrium molecular dynamics simulations4 is the relatively large number of realizations of the process (simulations) necessary to reach a suitable convergence of free energy or potential of mean force estimates. Such an aspect can have impact on the feasibility of a calculation as the time needed for a single realization is large owing to the complexity of the system. This problem is particularly relevant in unidirectional nonequilibrium methods,1 because of the difficulty to sample pulling trajectories with low dissipated work.33 Pathbreaking techniques,32 based on breaking strongly dissipative trajectories before their natural end, give us the opportunity of a considerable computer-time saving, since only important nonequilibrium trajectories, i.e., the less dissipative, are retained. In this article, the path-breaking variant called probability threshold scheme, applied to unidirectional steered molecular dynamics simulations in Ref. 32, is extended to the case of bidirectional nonequilibrium simulations. We specifically focus on two relations to compute free energy: the Bennett method3 employed to compute the free energy difference between two states and the potential of mean force estimator based on work exponential averages proposed in Ref. 30. Combining path-breaking with bidirectional methodologies does not require execution of specific additional protocols in performing nonequilibrium pulling simulations. These can be realized by applying the unidirectional path-breaking procedure independently to both forward and backward sets of trajectories. Reweighting relations for free energy evaluation are applied a posteriori to the data stored during the nonequilibrium simulations. As in the unidirectional case,32 bidirectional adaptation of path-breaking has been tested on the estimate of the free energy profile of deca-alanine as a function of the end-to-end distance of the peptide, which may range from a α-helix to a fully elongated conformation. Results are compared with free energy estimates obtained from nonequilibrium standard methods in the same operating conditions. The enhancement of computer-time efficiency with respect to the standard approach has been shown to range from a factor 1.5 to 5, without significant changes in the accuracy of the results. An important advantage of path-breaking, also in this bidirectional version, is the almost complete compatibility with most of path-sampling techniques developed in the framework of nonequilibrium simulation schemes. An example of such a compatibility has been shown in Ref. 32 through the methane pulling process, which has been realized combining path-breaking with the configurational freezing approach.20, 21 We point out that, in the framework of methodologies aimed at improving nonequilibrium path-sampling, other effective strategies have been devised to eliminate highly dissipative trajectories. In particular, we mention the variant of the method annealed importance sampling47, 48 supplied by trajectory-resampling.49 Although annealed importance sampling with trajectory-resampling is inherently a unidirectional technique in which thermal changes are realized,

J. Chem. Phys. 140, 064104 (2014)

extension to mechanical changes and bidirectional nonequilibrium schemes appears straightforward. In perspective, path-breaking could be exploited in replica exchange50–52 or serial generalized-ensemble simulations53, 54 with trial exchanges realized by means of nonequilibrium work simulations.55, 56 To generate a trial exchange using nonequilibrium simulations, the molecular system undergoes a simulation governed by a time-dependent Hamiltonian that changes in a nonequilibrium fashion from the current Hamiltonian into the target Hamiltonian over a prescribed period of time. The probability of accepting the transition to the new Hamiltonian depends on the work dissipated during the nonequilibrium trial exchange. The time needed for these trial exchanges is the most critical aspect of the method because, in the case of rejection of the trial move, it would be lost. This approach is similar to that employed in nonequilibrium candidate Monte Carlo,57, 58 a technique to perform equilibrium simulations which exploits finite-time rather than instantaneous switching moves to drive the dynamics of important degrees of freedom. In this respect, unidirectional and bidirectional path-breaking schemes59 could be used to truncate nonequilibrium trial exchanges when the dissipated work exceeds a given threshold, with evident gain of computer time. Possibility of application of path-breaking is envisaged in both replica exchange simulations50–52 and serial generalized-ensemble simulations with self-consistent determination of weights.60–62 Future studies will be devoted to this subject. ACKNOWLEDGMENTS

The authors are grateful to Pierluigi Cresci for technical support. This work was supported by European Union Contract No. RII3-CT-2003-506350 and by the Italian Ministero dell’Istruzione, dell’Università e della Ricerca (No. PRIN 2010-2011). 1 C.

Jarzynski, Phys. Rev. Lett. 78, 2690 (1997). E. Crooks, J. Stat. Phys. 90, 1481 (1998). 3 G. E. Crooks, Phys. Rev. E 61, 2361 (2000). 4 S. Park and K. Schulten, J. Chem. Phys. 120, 5946 (2004). 5 P. Procacci, S. Marsili, A. Barducci, G. F. Signorini, and R. Chelli, J. Chem. Phys. 125, 164101 (2006). 6 C. Chatelain, J. Stat. Mech.: Theor. Exp. P04011 (2007). 7 S. Mitternacht, S. Luccioli, A. Torcini, A. Imparato, and A. Irbäck, Biophys. J. 96, 429 (2009). 8 F. M. Ytreberg and D. M. Zuckerman, J. Chem. Phys. 120, 10876 (2004). 9 S. X. Sun, J. Chem. Phys. 118, 5769 (2003). 10 P. L. Geissler and C. Dellago, J. Phys. Chem. B 108, 6667 (2004). 11 D. Wu and D. A. Kofke, J. Chem. Phys. 122, 204104 (2005). 12 S. Vaikuntanathan and C. Jarzynski, Phys. Rev. Lett. 100, 190601 (2008). 13 T. Schmiedl and U. Seifert, Phys. Rev. Lett. 98, 108301 (2007). 14 W. Lechner, H. Oberhofer, C. Dellago, and P. L. Geissler, J. Chem. Phys. 124, 044113 (2006). 15 C. Jarzynski, Phys. Rev. E 65, 046122 (2002). 16 T. Z. Mordasini and J. A. McCammon, J. Phys. Chem. B 104, 360 (2000). 17 R. Bitetti-Putzer, W. Yang, and M. Karplus, Chem. Phys. Lett. 377, 633 (2003). 18 C. Oostenbrink and W. F. van Gunsteren, J. Comput. Chem. 24, 1730 (2003). 19 P. Nicolini and R. Chelli, Phys. Rev. E 80, 041124 (2009). 20 P. Nicolini, D. Frezzato, and R. Chelli, J. Chem. Theory Comput. 7, 582 (2011). 21 R. Chelli, J. Chem. Theory Comput. 8, 4040 (2012). 2 G.

This article is copyrighted as indicated in the article. Reuse of AIP content is subject to the terms at: http://scitation.aip.org/termsconditions. Downloaded to IP: 130.209.6.50 On: Sun, 21 Dec 2014 00:09:34

064104-8 22 P.

Giovannelli et al.

Nicolini, D. Frezzato, C. Gellini, M. Bizzarri, and R. Chelli, J. Comput. Chem. 34, 1561 (2013). 23 M. R. Shirts and V. S. Pande, J. Chem. Phys. 122, 144107 (2005). 24 G. Hummer and A. Szabo, Proc. Natl. Acad. Sci. U.S.A. 98, 3658 (2001). 25 P. Maragakis, M. Spichty, and M. Karplus, Phys. Rev. Lett. 96, 100602 (2006). 26 R. Chelli, J. Chem. Phys. 130, 054102 (2009). 27 D. D. L. Minh and A. B. Adib, Phys. Rev. Lett. 100, 180602 (2008). 28 R. Chelli, S. Marsili, and P. Procacci, Phys. Rev. E 77, 031104 (2008). 29 E. H. Feng and G. E. Crooks, Phys. Rev. E 79, 012104 (2009). 30 R. Chelli and P. Procacci, Phys. Chem. Chem. Phys. 11, 1152 (2009). 31 T. N. Do, P. Carloni, G. Varani, and G. Bussi, J. Chem. Theory Comput. 9, 1720 (2013). 32 R. Chelli, C. Gellini, G. Pietraperzia, E. Giovannelli, and G. Cardini, J. Chem. Phys. 138, 214109 (2013). 33 J. Gore, F. Ritort, and C. Bustamante, Proc. Natl. Acad. Sci. U.S.A. 100, 12564 (2003). 34 C. H. Bennett, J. Comput. Phys. 22, 245 (1976). 35 Here, q denotes the whole set or a subset of the atomic coordinates of the system involved in the definition of the collective coordinate, which can be an interatomic distance, a dihedral angle formed by covalently bonded atoms, the coordination number of an atom, etc. 36 Dissipated work is defined as W () − F .  37 P. Nicolini, P. Procacci, and R. Chelli, J. Phys. Chem. B 114, 9546 (2010). 38 F. M. Ytreberg, J. Chem. Phys. 130, 164906 (2009). 39 M. R. Shirts, E. Bair, G. Hooker, and V. S. Pande, Phys. Rev. Lett. 91, 140601 (2003). 40 See supplementary material at http://dx.doi.org/10.1063/1.4863999 for the implementation of path-breaking in the Minh-Adib potential of mean force estimator (Sec. I) and for technical details on the calculation of the exact potential of mean force of the deca-alanine system (Sec. II). 41 Note that the numbers of check-points along the collective coordinate λ in forward and backward directions may not be identical. 42 W. G. Hoover, Phys. Rev. A 31, 1695 (1985).

J. Chem. Phys. 140, 064104 (2014) 43 A.

D. MacKerell, Jr., D. Bashford, M. Bellot, R. L. Dunbrack, Jr., J. D. Evanseck, M. J. Field, S. Fisher, J. Gao, H. Guo, S. Ha, D. JosephMcCarthy, L. Kuchnir, K. Kuczera, F. T. K. Lau, C. Mattos, S. Michnick, T. Ngo, D. T. Nguyen, B. Prodhom, W. E. Reiher III, B. Roux, M. Schlenkrich, J. C. Smith, R. Stote, J. Straub, M. Watanabe, J. WiórkiewiczKuczera, D. Yin, and M. Karplus, J. Phys. Chem. B 102, 3586 (1998). 44 These pulling velocities correspond to driven trajectories lasting 50, 30, and 10 ps, respectively. 45 D. M. Zuckerman and T. B. Woolf, Chem. Phys. Lett. 351, 445 (2002). 46 Equation (14) is obtained by setting T = 2nt , while T is from Eq. (12) st st pb with t = tpb . 47 R. M. Neal, Stat. Comput. 11, 125 (2001). 48 E. Lyman and D. M. Zuckerman, J. Chem. Phys. 127, 065101 (2007). 49 E. Lyman and D. M. Zuckerman, J. Chem. Phys. 130, 081102 (2009). 50 U. H. E. Hansmann, Chem. Phys. Lett. 281, 140 (1997). 51 Y. Sugita and Y. Okamoto, Chem. Phys. Lett. 314, 141 (1999). 52 Y. Okamoto, J. Mol. Graphics Modell. 22, 425 (2004). 53 E. Marinari and G. Parisi, Europhys. Lett. 19, 451 (1992). 54 A. P. Lyubartsev, A. A. Martsinovski, S. V. Shevkunov, and P. N. Vorontsov-Velyaminov, J. Chem. Phys. 96, 1776 (1992). 55 A. J. Ballard and C. Jarzynski, Proc. Natl. Acad. Sci. U.S.A. 106, 12224 (2009). 56 R. M. Dirks, H. Xu, and D. E. Shaw, J. Chem. Theory Comput. 8, 162 (2012). 57 J. P. Nilmeier, G. E. Crooks, D. D. L. Minh, and J. D. Chodera, Proc. Natl. Acad. Sci. U.S.A. 108, E1009 (2011). 58 J. P. Nilmeier, G. E. Crooks, D. D. L. Minh, and J. D. Chodera, Proc. Natl. Acad. Sci. U.S.A. 109, 9665 (2012). 59 In the case of nonequilibrium candidate Monte Carlo, only unidirectional path-breaking can be used. 60 R. Chelli, J. Chem. Theory Comput. 6, 1935 (2010). 61 R. Chelli and G. F. Signorini, J. Chem. Theory Comput. 8, 830 (2012). 62 R. Chelli and G. F. Signorini, J. Chem. Theory Comput. 8, 2552 (2012).

This article is copyrighted as indicated in the article. Reuse of AIP content is subject to the terms at: http://scitation.aip.org/termsconditions. Downloaded to IP: 130.209.6.50 On: Sun, 21 Dec 2014 00:09:34

Combining path-breaking with bidirectional nonequilibrium simulations to improve efficiency in free energy calculations.

An important limitation of unidirectional nonequilibrium simulations is the amount of realizations of the process necessary to reach suitable converge...
665KB Sizes 0 Downloads 0 Views