Binding energy calculations for hevein-carbohydrate interactions using expanded ensemble molecular dynamics simulations.

J Comput Aided Mol Des DOI 10.1007/s10822-014-9792-5

Binding energy calculations for hevein–carbohydrate interactions using expanded ensemble molecular dynamics simulations Chaitanya A. K. Koppisetty • Martin Frank • Alexander P. Lyubartsev • Per-Georg Nyholm

Received: 26 May 2014 / Accepted: 2 September 2014 Ó Springer International Publishing Switzerland 2014

Abstract Accurate estimation of protein–carbohydrate binding energies using computational methods is a challenging task. Here we report the use of expanded ensemble molecular dynamics (EEMD) simulation with double decoupling for estimation of binding energies of hevein, a plant lectin with its monosaccharide and disaccharide ligands GlcNAc and (GlcNAc)2, respectively. In addition to the binding energies, enthalpy and entropy components of the binding energy are also calculated. The estimated binding energies for the hevein–carbohydrate interactions are within the range of ±0.5 kcal of the previously reported experimental binding data. For comparison, binding energies were also estimated using thermodynamic integration, molecular dynamics end point calculations (MM/GBSA) and the expanded ensemble methodology is seen to be more accurate. To our knowledge, the method of EEMD simulations has not been previously reported for estimating biomolecular binding energies. Keywords Binding energies Expanded ensemble molecular dynamics Carbohydrates Docking

C. A. K. Koppisetty M. Frank P.-G. Nyholm Biognos AB, Gothenburg, Sweden C. A. K. Koppisetty Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, Sweden A. P. Lyubartsev Arrhenius Laboratory, Division of Physical Chemistry, Stockholm University, 10691 Stockholm, Sweden P.-G. Nyholm (&) Department of Medical Biochemistry, Institute of Biomedicine, University of Gothenburg, Gothenburg, Sweden e-mail: [email protected]; [email protected]

Introduction The structure and energetics of protein–carbohydrate complexes are of great interest due to their importance in many biological phenomena. Protein–carbohydrate interactions have been implicated in manifestation of many diseases due to e.g. microbial invasions or cancer. Many plants produce defense proteins in response to pathogenic attacks that have the ability to bind to chitin, a b(1–4) N-acetyl glucosamine (GlcNAc) polysaccharide [1–3]. Chitin is present in the cell wall of fungi and exoskeleton of insects. A majority of the defense proteins have a common structural motif called hevein domain consisting of 30–43 residues that are rich in glycines and cysteines. Thermodynamic data of chitin binding to hevein has been obtained using NMR titration and isothermal titration calorimetry (ITC) experiments [4]. Calculation of binding energies for protein–carbohydrate interactions has potential applications in understanding the underlying phenomena and for developing novel inhibitors of molecular interaction pathways. Here we report the binding energy calculations for hevein with GlcNAc, a monosaccharide ligand and a disaccharide ligand, (GlcNAc)2. Due to their weak interactions accurate calculation of the binding free energies for protein– carbohydrate interactions is difficult [5]. Developments of specific empirical scoring functions were relatively successful in predicting the binding energies for protein– carbohydrate interactions [6–8]. However, the empirical methods suffer the disadvantage of limited availability of experimental data and bias to the characteristics of the training dataset. This is due to limited availability of synthesized carbohydrates, their inherent flexibility and often difficulties are even reported for designing experiments involving carbohydrates [9]. Hence, there is still

123

J Comput Aided Mol Des

demand for development of reliable and accurate methods for estimating the binding affinities of protein–carbohydrate interactions. Calculation of binding energies for protein–ligand interactions using molecular dynamics (MD) simulations have been attempted using end point energy difference methods such as MD MM-PB/GBSA and thermodynamic integration (TI) methods [10–13]. It is known that the accuracy of MD based methods depends on sufficient sampling for convergence, as well as on the force field used. In the end point energy difference methodology, the binding energy is obtained by calculating individual components of binding energy, which are the enthalpy and entropy. The components of binding energy are estimated through the energy difference between the bound and unbound states of the protein–ligand complexes. By contrast, the TI methods directly estimate the binding energy by integrating the differential changes in the binding energy with respect to the relative presence or absence of the ligand in the system. In this study, we introduce the method of expanded ensemble molecular dynamics (EEMD) simulations for calculating binding energies. The EEMD method was successfully used earlier for accurate calculation of solvation energies for organic molecules [14–16]. The EEMD simulations described here have some similarities with the TI methodology, however there are some differences. Both methods share the principle that the ligand is coupled or decoupled during MD sampling. The intermediate states of the ligand are also known as subensembles. The potential of the ligand is coupled with the rest of the environment using a coupling parameter k. A value of k = 1 represents an ensemble of the system where the ligand is fully coupled to the system whereas k = 0 represents an ensemble with fully decoupled ligand. Thus the ensembles with k values between 0 and 1 represent the states where the ligand is partially present. In the TI methods, MD simulations are performed with k values starting from either 0 or 1 and gradual transformation to the completely present ligand representation or the completely absent ligand representation. The convergence of the sampling is verified upon consistent results in both the directions of k transformation. The transformation of the k values occurs as soon as the maximum numbers of MD simulation steps are reached. The EEMD method described herein is different from the TI methodology since it involves a Monte Carlo (MC) guided transition between subensembles. Using the EEMD method it is also possible to obtain the thermodynamic components of binding energy which has not been attempted using the TI methodology. Further details of MD end point energy difference and TI methods can be found elsewhere [11, 12, 17, 18].

123

Theoretical background: EEMD methodology The EEMD methodology involves calculating the free energy changes caused by insertion or deletion of a molecule in a defined molecular system. Let us consider a system with N particles and an additional particle, (N ? 1) is gradually inserted in the system. The aim is to estimate the free energy changes in the system caused by gradual insertion of the (N ? 1)th particle. The insertion of the new particle is performed through a series of subensembles where the value of the coupling parameter k varies from 0 to 1. The configurational partition function of the system at a given subensemble m can be expressed as Z NY þ1 Zm ¼ dqi exp b HN ðqi Þ þ km hðNþ1Þ ðqi Þ ð1Þ V

i¼1

where b = 1/kT, HN is the Hamiltonian of the system with N particles, h(n?1) is the energy of interaction of the new particle with the rest of the system. qi are the three dimensional coordinates of the particles that are used as space variables for determining the Hamiltonian for the system in subensemble m. k is the coupling parameter with values varying from 0 to 1 depending on the subensemble m. Thus, the partition function for the fully expanded ensemble system can be expressed as Z¼

M X

Zm expðgm Þ

m¼0

¼

M Z N þ1 X Y m¼0

V

dqi exp b HN ðqi Þ þ km hðNþ1Þ ðqi Þ þ gm

i¼1

ð2Þ where gm is a parameter called balancing factor for a given subensemble m. The transitions between the expanded configurations of subensembles are performed in a MC fashion simultaneously simulating the displacement of particles together with variation of the coupling parameters. Thus, the MC procedure produces a probability distribution over the subensembles of the system given by PM. The probability ratio of the extreme subensembles, m = 0 where the (N ? 1)th particle is absent and m = M where the (N ? 1)th particle is completely present can be estimated by pM ZM ¼ expðgM g0 Þ p0 Z0 ¼ expfb½FðMÞ Fð0Þ þ gM g0 g

ð3Þ

F(M) is the free energy of the system with (N ? 1) particles and F(0) is the sum of free energies of the system

J Comput Aided Mol Des Fig. 1 Double decoupling scheme for estimating binding energy using EEMD simulations. a Decoupling of the ligand from the binding site of the protein. The ligand is held in the binding site of the protein by applying weak positional restraints to the ligand atoms and a few atoms in the binding site. b Decoupling of the ligand from the bulk solvent

with N particles and the free energy of the (N ? 1)th molecule in the ideal gas phase. Thus the difference between F(M) and F(0) is the energy DE, required to transfer the (N ? 1)th particle from the ideal gas phase to the solution at a given density. The DE can be estimated by pM DE ¼ ln ð4Þ þ gM g0 p0 The balancing factors (g) need to be chosen in a way that probability distribution over subensembles is uniform or at least that the probabilities are reliably estimated during the simulation. In this work, balancing factors were optimized using the Wang-Landau algorithm [19], details of which in EEMD calculations are described in the literature [20]. In the current context of protein–ligand interactions, the protein and the solvent (water) is the original system in which the ligand is gradually inserted or deleted. The aim is to estimate the binding energy of the ligand to the protein. The binding energy of the protein–ligand interaction is estimated through a two step decoupling of the ligand from the protein-solvent system (Fig. 1). In both the stages, the expanded ensemble MD method was employed for calculating the energy of decoupling the ligand from the systems. In the first stage (Fig. 1 step A), the energy for transferring the ligand from gas phase to the active site of the protein in presence of solvent is estimated as DE1. In the second step (Fig. 1 step B) of the methodology, the energy for transferring the ligand from the gas phase to the solvent in absence of protein is estimated as DE2. Thus the difference between DE1 and DE2 is the binding energy of the ligand to the protein in the solvent environment. The scheme for calculating the binding energies in two stages is

referred to as double decoupling method and was reported earlier for estimation of free energy required to remove a localized water molecule from the active site of proteins using molecular dynamic simulations [21]. DGbinding ¼ DE1 DE2 The EEMD simulations face a complication that, during the sampling in subensembles, there is a risk that the ligand might drift away from the active site of the protein. This occurs when the partial representation of the ligand no longer contains the key interactions that are necessary for its binding to the active site residues. Once the ligand drifts away from the active site of the protein, the energy estimations might no longer represent the binding energy of the ligand. In order to avoid this complication, the ligand needs to be restrained in the active site of the protein, which was done in this work by applying harmonic constraints.

Materials and methods The MDynaMix program (version 5.2) [16] was used for performing the EEMD simulations that are reported in this study. The atomic coordinates of hevein used for the simulations were downloaded from the protein databank (PDB 1HEV). The EEMD simulations were performed using the AMBER-ff94 forcefield parameters for the protein atoms while the glycam06 forcefield parameters were used for the carbohydrate atoms. The TIP3P water model was used to describe the solvent during the simulations. The ‘‘makemol’’ utility was used for preparing the final input files necessary for the EEMD simulations using

123


MDYNAMIX. For the hevein–GlcNAc and hevein-(GlcNAc)2 systems, the initial configuration consisting of one protein molecule, one carbohydrate molecule and water molecules was generated. Rectangular periodic boundary conditions with a box density of 1 g/cm3 were used while Nose thermostat was used for regulating the temperature at 298 K with a relaxation time of 30 fs. The pressure is regulated using a Nose–Hoover barostat at 1 bar and a relaxation time of 1,000 fs. A long time step of 2 fs was used with a double time step size of 10 following the double time step algorithm. The electrostatic interactions ˚ was were treated with the Ewald method. A cut-off of 11 A used for Lennard-Jones and real-space part of the electro˚ was used for Lennard-Jones static forces. A cut-off of 5 A forces computed at each short-time step. The list of neighbours was updated after every 10 steps. Both the hevein GlcNAc and (GlcNAc)2 systems were simulated for 26 subensembles for calculating the free energies. A MC walk between the subensembles was performed with transitions between the subensembles every 100 molecular dynamic steps. The system is considered to be converged once the balancing factors are optimized which usually depends on the size of the system and the number of atoms that are considered for sampling in the expanded ensembles. Once the system is converged the EEMD production simulation is performed for at least 2 ns with the fixed balancing factors in order to collect the final free energy values. Optimizing the balancing factors is the computationally expensive part of the simulation. The coordinates of the ligand’s center of mass were constrained to occupy the binding site by using a weak harmonic potential ˚ 2 and allowing (1/2)kDr2 with force constant k = 5 kBT/A centered in the middle of binding site, which does not restrict for the ligand to find optimal coordination in the binding site, but preventing the ligand from leaving the vicinity of the binding site. All the EEMD simulations reported in this study were performed on the supercomputing infrastructure provided by the Swedish National Infrastructure for Computing. The binding energies of hevein with GlcNAc and (GlcNAc)2 carbohydrates were additionally calculated by standard MD simulations followed by molecular mechanics Generalized Born’s surface area (MMGBSA) approach for estimating binding energies of biomolecular systems. The MD simulations were performed with AMBER (version 8.0; University of California, San Francisco) suite of MD programs. The hevein–carbohydrate systems were solvated with the TIP3P solvent representation model of the water molecules. The system was equilibrated gradually from 0 to 300 K using MD for 50 ps. The equilibration was then followed by 30 ns of MD production at 300 K. During the entire equilibration and production runs, the bonds to hydrogen atoms are constrained through the SHAKE

123

procedure. The temperature was maintained constant and regulated by a weak coupling algorithm and the pressure was regulated using isotropic position scaling. A non˚ was used to truncate the non-bonbonded cut off of 10 A ded pairs. Snapshots of the MD trajectories were stored every 1 ps for processing using the MMGBSA procedure for calculating the binding energies. The trajectories of the MD simulations were analyzed and the binding energy calculations were performed with the conformational analysis tools (CAT) [22] software suite. The energies from the MM/GBSA methodology represent mainly the enthalpic component of the binding energy. The entropy component of the binding energy was estimated using the normal mode analysis that uses a rigid-rotor, harmonic oscillator (RRHO) approximation. The normal mode analysis estimates the entropy components of the binding energies arising from the protein and the ligand but not from the solvent. In addition to the MMGBSA procedure, the binding energy calculation was performed for hevein (GlcNAc)2 using TI method [23]. In TI, the free energy change between two states of a system L and L* can be estimated by Eq. 5. ZkL oHðkÞ dk ð5Þ DGL!L ¼ ok k kL

where H(k) denotes the system Hamiltonian from a single trajectory as a function of the coupling parameter k and h i denotes the ensemble averages at a given value of k. The integration is usually performed using numerical integration based on MD simulations at a number of fixed k values. The values of k vary between 0 and 1 representing the starting and the initial and final states of the system. Under the theoretical assumptions of infinite long trajectories and access to the complete phase space, the energy differences between the states can be accurately estimated following the Eq. 5. The TI calculations were originally developed to estimate the relative differences between binding energies of ligand with minute changes in their chemical structure by simulating the transformation of one chemical system to a different one along a non-physical path hence computing the associated free energy changes. In this study, the (GlcNAc)2 was completely annihilated by series of simulations with varying k. The binding energies for the hevein (GlcNAc)2 system was estimated in the similar fashion as with EEMD simulations shown in Fig. 1. The energies for decoupling the ligand from the active site and decoupling the ligand from the bulk solvent were performed using the TI procedure. Each decoupling consists of two stages involving removal of charges and scaling down the van der Waals interactions performed at various values of k. For each value of k, MD simulation production runs were performed for 2 ns and the ensemble


averages were collected. The decoupling of (GlcNAc)2 ligand was performed with 9 values of k varying between 0 and 1. Thus the entire TI procedure for estimating the (GlcNAc)2 binding energies consists of 36 MD production runs each with a 2 ns time length. Thus the TI run has an aggregated time length of 72 ns of MD simulation for estimating the binding energy values for (GlcNAc)2. The final integrations for solving Eq. 5 to estimate the binding energy was performed numerically using the trapezoid rule. Besides the MD simulations, empirical docking methods were also explored for calculating binding energies of hevein with GlcNAc and (GlcNAc)2 ligands. The binding energies were estimated using the AutoDock4.2 [24] and Glide (Schro¨dinger LLC, NY) docking programs. Computational docking generally involves the production of poses of the ligand in the active site of the protein and the best poses are selected by a scoring function that evaluates the binding energy of the ligand in a given pose. The energy calculations with AutoDock and Glide were performed in two ways. Initially, the poses of GlcNAc and (GlcNAc)2 that were used for MD simulations were minimized in place and the energy was evaluated. In the second approach, a routine docking run was performed generating multiple poses and finally the best pose that is suggested by the scoring function was selected. It should be noted that both the docking programs, AutoDock and Glide do not have explicit solvent representations during the docking. The minimization in AutoDock was performed with the local minimization options for a maximum of 1,000 steps. The full dockings were performed using the Lamarckian genetic algorithm (LGA) options in AutoDock. For each LGA run, population sizes of 150 individuals for 27,000 generations were generated. The genetic algorithm (GA) parameters mutation rate pf 0.02, crossover rate of 0.8 and a window size of 10 were chosen. The Solis and Wets local search parameters were used for the local optimization. A maximum of 1,000 iterations were chosen. The termination criteria for the LGA run was either the number of energy evaluations, which is 2,500,000 or the number of generations whichever occurs earlier. The calculations were performed for a maximum of 10 GA runs. In Glide, the calculations were performed with the XP mode of Glide. The local scoring was performed using the ‘‘refine only’’ option.

Results and discussion Binding energy of GlcNAc Initially, the EEMD simulation for the hevein–GlcNAc system was performed without applying the positional constraints on the ligand. The distance between the ligand

and the binding site of the protein were analyzed for subensemble one, which represents the complete presence of the ligand in the binding site of the protein. The distance analysis of the first subensemble showed major fluctuations ˚ (Fig. 2). After approximately 50 in the range of 20–45 A transitions to the first subensemble from the other subensembles, it is evident that the ligand drifted away from the protein binding site. The displacement of the ligand away from the binding site is obviously due to the reduced interactions between the protein and the ligand in the intermediate subensembles. The energies collected while the ligand is away from the binding site of the hevein do not represent the binding energy of GlcNAc to the active site of hevein. Thus the unrestrained EEMD simulations for estimating the protein–ligand binding energies are unreliable unless the ligand maintains its position in the binding site of the protein throughout the simulation. The EEMD simulation for the hevein–GlcNAc system was later performed with positional restraints applied on the atoms of the ligand in the binding site of the protein. The restrained EEMD simulation for decoupling GlcNAc from hevein binding site converged after 50 ns of simulation time and took approximately 25,000 core CPU hours. It was observed that during the optimization of balancing factors, the first subensemble that represents complete presence of the GlcNAc molecule was visited 1,570 times from the other 26 subensembles that represent either partial or null presence of the GlcNAc molecule. The distance analysis of the first subensemble during the optimization run shows that the restraints succeeded in preserving the

Fig. 2 Unrestrained EEMD simulation of hevein–GlcNAc complex. The snapshots of subensemble one were extracted from the trajectory and distance between centroid of the carbohydrate and centroid of three atoms in the binding site was measured. The dark line parallel to the x-axis represents the mean distance between the carbohydrate and the protein binding site. The trend of distance shows that the ligand moved out of the binding site after the first 50 transitions to the first subensemble

123


Fig. 3 Restrained EEMD simulation of hevein with GlcNAc. The subensemble-1 snapshots are extracted from the trajectory and distance between centroid of the carbohydrate and centroid of three atoms in the binding site is measured. The dark line parallel to the x-axis represents the mean distance between the carbohydrate and the protein binding site. The positional restraints on the ligand and atoms in the protein binding site are effective in holding the ligand in binding site throughout all the subensemble transitions

coordinates of ligand within the binding site of the protein throughout the EEMD simulation (Fig. 3). The distance between the GlcNAc carbohydrate and amino acid residues in the binding site of the protein had maximum displace˚ from the starting position (Fig. 3). The ment of 6.5 A

Fig. 4 Subensemble transition matrix for hevein with GlcNAc simulation with positional restraints on the ligand and several atoms in the protein binding site. The color code has a maximum cut-off of 120 transitions

123

fluctuation of the distance is in agreement with the harmonic potential that allows a typical applied positional ˚ from the initial position. displacement of 5 A In addition to optimizing the balancing factors, the minimum requirement for reliable energy estimates is that the system should walk through extreme subensembles (ligand fully present and fully absent) several times during the production run. The transitions between the extreme subensembles 1 and 26 have occurred 14 times during the production EEMD simulation. Thus the energies are collected for 26 cycles of forward and backward transitions of the extreme subensembles. More transitions between the inner consecutive subensembles were observed when compared to the transitions between the initial and final subensembles (Fig. 4). The snapshots from the first and the last subensembles were analyzed and visually confirmed that the space of the ligand is fully solvated with the water molecules in the subensemble 26 (Fig. 5). The energies for transferring the GlcNAc ligand from hevein binding site (DE1) and the ligand GlcNAc from the bulk solvent (DE2) to ideal gas phases were estimated to be -21.97 and -20.54 kcal mol-1 respectively. The difference in the energies is the binding energy for GlcNAc monosaccharide to hevein estimated to be around -1.43 kcal mol-1, which is in good agreement with experimental data where the binding energy is estimated to be around -2.0 kcal mol-1 [4].


Fig. 5 Z-clipped images of the extreme subensembles of the hevein– GlcNAc EEMD simulation. The protein and GlcNAc surfaces are shown in pink and green wire representations respectively. The water molecules are shown in stick representation. a Snapshot of the

subensemble 26 where the GlcNAc ligand is completely absent. It is observed that water molecules occupy the ligand region when in the subensemble 26. b A snapshot of subensemble 1 where the GlcNAc ligand is fully present

Binding energy of (GlcNAc)2 Since the restrained EEMD simulation of the hevein GlcNAc complex was successful in preserving the ligand in the binding site of the protein over the subensembles, the (GlcNAc)2 simulations were performed with positional restrains on the ligand. The balancing factors for the EEMD simulation with hevein and (GlcNAc)2 were optimized after 50 ns of simulation time and took approximately 27,000 core cpu hours. During the simulation it was observed that the subensemble one in which the (GlcNAc)2 ligand is fully represented was visited at least 2,000 times from the other subensembles when optimizing the balancing factors. Only minor fluctuations in the distance between the centroid of the ligand and centroid representing the binding site residues were observed (Fig. 6). Thus the stability of the ligand in the binding site throughout the expanded ensembles was confirmed. The analysis of the first subensemble for the hevein (GlcNAc)2 simulation shows a very minor variation of the distances about the mean position despite the generous ˚ disrestraint on the ligand that allows a maximum of 5 A placement of the ligand. The fluctuations of the (GlcNAc)2 ligand are lower when compared to the GlcNAc ligand in the binding site of the protein. This could be because of the higher binding affinity of the (GlcNAc)2 when compared with the GlcNAc ligand. The energies for transferring the (GlcNAc)2 ligand from hevein binding site (DE1) and the ligand (GlcNAc)2 from the bulk solvent (DE2) to ideal gas phases were estimated to be about -31.92 and -28.52 kcal mol-1 respectively. The difference in the energies is the binding energy for (GlcNAc)2 to hevein estimated to be around -3.4 kcal mol-1, which is in good agreement with experimental data where the binding energy is estimated to be around -3.8 kcal mol-1 [4].

Fig. 6 EEMD simulation of hevein (GlcNAc)2 complex with positional restraints on the ligand and atoms in protein binding site. The snapshots of the first subensemble were extracted from the trajectory and distance between the centroid of the carbohydrate and the centroid of three atoms in the binding site was measured. The dark line represents the mean distance for the entire simulation

Comparisons In addition to the hevein carbohydrate binding energy calculations, as a benchmark, the standard energy for decoupling a water molecule from the bulk at 298 K is performed using the EEMD methodology and is estimated to be around 7.0 kcal mol-1. The calculations are performed with similar EEMD parameters that are used for the hevein carbohydrate binding energy calculations. However, the EEMD calculations for the estimation of energy required for decoupling a water molecule from the bulk converged faster than the EEMD simulations for hevein carbohydrate binding energy calculations. This is due to the

123


fact that in the water molecule has fewer atoms when compared with the GlcNAc molecule for performing the EEMD simulations. The estimated energy for decoupling a water molecule from the bulk using EEMD simulations is in good agreement with previously reported experimentally observed free energy of 6.3 kcal mol-1 [25]. In order to compare the differences between the Amber force field parameters ff94 and the frequently used ff99SB, a test EEMD simulation was performed to estimate the solvation free energy of a short peptide (Ser-Phe) in aqueous solution. The ff99SB consists of improved parameters for the protein backbone dihedrals [26]. The simulations with ff94 and ff99SB parameters were performed with 150 water molecules and showed similar solvation free energy values for the Ser-Phe peptide using both the parameter sets. The ff94 parameters set produced a solvation free energy value of -15.0 kcal mol-1 while the ff99SB parameters produced a solvation free energy value of -16.0 kcal mol-1. In this study, the ligand that is being decoupled is the (GlcNAc)1-2 and the protein remains in the system before and after the decoupling is performed. For this reason, the usage of ff99SB parameters in the current EEMD simulations may not significantly affect the final binding energy values. However for EEMD simulations involving coupling/decoupling of protein-peptide systems, the ff99SB parameters might make a difference in the final energy estimations. The binding energy of hevein–GlcNAc interaction estimated by the MMGBSA and normal mode methods resulted in -2.0 kcal mol-1 which agrees well with the experimental binding data. From the MM/GBSA methodology, the hevein–GlcNAc interactions enthalpy component is estimated to be around -16 kcal mol-1 while the entropy component is estimated to be around -14 kcal mol-1. The estimation of the entropy component using the nmode methodology might be incomplete since the entropy estimations reflect the contributions only from the protein and ligand but not the solvent. Depending on the nature of interaction between the protein and ligand, the solvent entropy might affect the binding energy values. The binding energy for the hevein (GlcNAc)2 system using TI procedure is estimated to be -4.7 kcal mol-1. The binding energy calculation using TI is seen to be more accurate than the MM/GBSA procedure for the hevein (GlcNAc)2 system. However, following the TI method, it is difficult to estimate the enthalpy and entropy components of the binding energy. Theoretically it might be possible to estimate the enthalpy using the end point energy difference methodology. However such a procedure needs to be validated through case studies with existing experimental data. Further studies are necessary to compare the advantages and disadvantages of the TI methodology when compared with the EEMD simulations.

123

Table 1 Binding energies of hevein with GlcNAc and (GlcNAc)2 using various computational methods and comparison with experimental values

MMGBSA

Ligand

DG (kcal mol-1)

DH (kcal mol-1)

TDS (kcal mol-1)

GlcNAc

-2.0

-16.0

-14.0

(GlcNAc)2

-5.2

-22.2

-17.0

GlcNAc

-1.8

-25.5

-23.7

(GlcNAc)2

-3.4

-28.5

-25.1

TI

(GlcNAc)2

-4.7

NA

NA

GLIDE-XP score

GlcNAc

-4.1

NA

NA

(GlcNAc)2

-4.8

NA

NA

GlcNAc

-4.3

NA

NA

(GlcNAc)2

-3.8

NA

NA

GlcNAc

-2.0

NA

NA

(GlcNAc)2

-3.8

-6.3

-2.5

EEMD

AutoDock score Experimental

The experimental values are shown in bold

The binding energy for hevein–(GlcNAc)2 system was estimated to be around -4.8 and -3.8 kcal mol-1 from Glide and AutoDock respectively. The docking of the poses of hevein–GlcNAc complexes using docking programs GLIDE and AutoDock indicate binding energy values that are slightly lower when compared with the experimental data. The binding energy of hevein-(GlcNAc)2 interaction was estimated to be -5.2 kcal mol-1 by the MM/GBSA method, which is also an overestimation of the binding energy when compared with the experimental data. The estimation of binding energies for hevein-(GlcNAc)2 interaction by the docking programs, GLIDE and AutoDock showed binding energies of -4.8 and -3.8 kcal mol-1 respectively. Similar to the MMGBSA methodology, the energies predicted by GLIDE for the hevein (GlcNAc)2 system are seen to overestimate the binding energy values. The failure of the MMGBSA and the scoring methods to accurately estimate the binding energies of hevein interactions with (GlcNAc)1–2 might be because of incomplete estimation of the entropy term. The EEMD methodology on the other hand estimated the binding energies of GlcNAc and (GlcNAc)2 with significant accuracy. It is observed that both the MMGBSA and EEMD methodologies estimated the enthalpy term ranging from -28.5 to -16.0 kcal mol-1. The estimated absolute values of enthalpy and entropy terms by both MMGBSA and EEMD methodologies do not match with the experimental values for (GlcNAc)2 binding (Table 1). It is generally observed that the computationally estimated enthalpy and entropy values need not necessarily match with the absolute experimental values, but the combination, which is the binding energy could still be accurately estimated [27, 28]. This suggests that further parameterization


of the force field is necessary for accurate estimation of enthalpy and entropy values.

Conclusions In this study EEMD simulations were used successfully to estimate the binding energies of hevein with its mono- and disaccharide ligands. The binding energy estimations for hevein with GlcNAc and (GlcNAc)2 are in good agreement with the experimental data. The results from the EEMD simulations are in agreement with the fact that the hevein (GlcNAc)1–2 interactions are enthalpy dominated and the increase in the enthalpy component is compensated by a proportional entropy component. It is observed that the optimization of the balancing factors is the crucial and computationally expensive stage of the EEMD simulations and it might be possible to improve the efficiency of this step. The binding energy estimations using the EEMD simulations are more accurate than the estimations using end point energy difference calculations and the estimations from existing docking methods. Further studies are necessary to compare the EEMD methodology with the TI methodology for estimating the binding energies of biological systems. The EEMD simulations have the advantage that the entropic effects from the solvent are considered in estimating the binding energies and the thermodynamic components.

Supplementary material The MDynaMix program for performing the EEMD simulations is freely distributed under terms of GNU public license and can be downloaded from http://www.fos.su.se/ *sasha/md_prog.html/. Acknowledgments The authors thank Prof. Graham J. L. Kemp at the Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, Sweden for valuable suggestions and comments on the manuscript. The computational resources at C3SE (www.c3se.chalmers.se) provided by the Swedish National Infrastructure for Computing (SNIC) are gratefully acknowledged. Financial support from Biognos AB is gratefully acknowledged.

References 1. Boller T, He SY (2009) Science 324(5928):742 2. Zipfel C (2008) Curr Opin Immunol 20(1):10 3. Jimenez-Barbero J, Canada FJ, Asensio JL, Aboitiz N, Vidal P, Canales A, Groves P, Gabius HJ, Siebert HC (2006) Adv Carbohydr Chem Biochem 60:303 4. Asensio JL, Canada FJ, Siebert HC, Laynez J, Poveda A, Nieto PM, Soedjanaamadja UM, Gabius HJ, Jimenez-Barbero J (2000) Chem Biol 7(7):529 5. Bryce RA, Hillier IH, Naismith JH (2001) Biophys J 81(3):1373 6. Laederach A, Reilly PJ (2003) J Comput Chem 24(14):1748 7. Kerzmann A, Neumann D, Kohlbacher O (2006) J Chem Inf Model 46(4):1635 8. Hill AD, Reilly PJ (2008) J Comput Chem 29(7):1131 9. Toone EJ (1994) Curr Opin Struct Biol 4(5):719 10. Shivakumar DM, Graves AP, Boyce S, Case DA, Shoichet BK (2008) Abstracts of Papers of the American Chemical Society 236 11. Shivakumar DM, Case DA (2006) Abstracts of Papers of the American Chemical Society 232: 107 12. Archontis G, Simonson T, Karplus M (2001) J Mol Biol 306(2):307 13. Marelius J, Hansson T, Aqvist J (1998) Int J Quantum Chem 69(1):77 14. Lyubartsev AP, Forrisdahl OK, Laaksonen A (1998) J Chem Phys 108(1):227 15. Lyubartsev AP, Laaksonen A, Vorontsovvelyaminov PN (1994) Mol Phys 82(3):455 16. Aberg KM, Lyubartsev AP, Jacobsson SP, Laaksonen A (2004) J Chem Phys 120(8):3770 17. Fajer M, Hamelberg D, McCammon JA (2008) J Chem Theory Comput 4(10):1565 18. Straatsma TP, Berendsen HJC (1988) J Chem Phys 89(9):5876 19. Wang FG, Landau DP (2001) Phys Rev Lett 86(10):2050 20. Jambeck JPM, Mocci F, Lyubartsev AP, Laaksonen A (2013) J Comput Chem 34(3):187 21. Hamelberg D, McCammon JA (2004) Abstracts of Papers of the American Chemical Society 227: U1007 22. Conformational Analysis Tools \http://www.md-simulations.de/ CAT/[. Accessed 23. Kirkwood JG (1935) J Chem Phys 3(5):300 24. Morris GM, Huey R, Lindstrom W, Sanner MF, Belew RK, Goodsell DS, Olson AJ (2009) J Comput Chem 30(16):2785 25. Bennaim A, Marcus Y (1984) J Chem Phys 81(4):2016 26. Hornak V, Abel R, Okur A, Strockbine B, Roitberg A, Simmerling C (2006) Proteins Struct Funct Bioinform 65(3):712 27. Deng NJ, Zhang P, Cieplak P, Lai LH (2011) J Phys Chem B 115(41):11902 28. Singh N, Warshel A (2010) Proteins Struct Funct Bioinform 78(7):1705

123

Efficient nonequilibrium method for binding free energy calculations in molecular dynamics simulations.

Understanding Miltefosine-Membrane Interactions Using Molecular Dynamics Simulations.

Free energy calculations from adaptive molecular dynamics simulations with adiabatic reweighting.

Molecular dynamics algorithm enforcing energy conservation for microcanonical simulations.

Insight into the structural mechanism for PKBα allosteric inhibition by molecular dynamics simulations and free energy calculations.

Characterizing the Free-Energy Landscape of MDM2 Protein-Ligand Interactions by Steered Molecular Dynamics Simulations.

Assessing the ligand selectivity of sphingosine kinases using molecular dynamics and MM-PBSA binding free energy calculations.

Molecular dynamics simulations of glycoproteins using CHARMM.

Free energy predictions of ligand binding to an α-helix using steered molecular dynamics and umbrella sampling simulations.

Modeling of annexin A2-Membrane interactions by molecular dynamics simulations.

Molecular dynamics simulations of anchored viral peptide interactions.

PBSA Free Energy Calculations.

Predicting RNA Duplex Dimerization Free-Energy Changes upon Mutations Using Molecular Dynamics Simulations.

Insight into the binding modes of Lassa nucleoprotein complexed with ssRNA by molecular dynamic simulations and free energy calculations.

Coarse-grained molecular dynamics simulations of protein-ligand binding.

Analyzing machupo virus-receptor binding by molecular dynamics simulations.

Interactions of Multimodal Ligands with Proteins: Insights into Selectivity Using Molecular Dynamics Simulations.

Revealing inhibition difference between PFI-2 enantiomers against SETD7 by molecular dynamics simulations, binding free energy calculations and unbinding pathway analysis.

Influence of Chirality of Crizotinib on Its MTH1 Protein Inhibitory Activity: Insight from Molecular Dynamics Simulations and Binding Free Energy Calculations.

Polymer segregation under confinement: free energy calculations and segregation dynamics simulations.

Restrained and unrestrained molecular dynamics simulations in the NVT ensemble of alamethicin.

Relative Binding Enthalpies from Molecular Dynamics Simulations Using a Direct Method.

Thermodynamics of protein-peptide interactions in the ribonuclease-S system studied by molecular dynamics and free energy calculations.

Thermal Transport in Fullerene Derivatives Using Molecular Dynamics Simulations.