Coarse-grained molecular dynamics simulations of protein-ligand binding.

FULL PAPER

WWW.C-CHEM.ORG

Coarse-Grained Molecular Dynamics Simulations of Protein–Ligand Binding Tatsuki Negami,[a] Kentaro Shimizu,[a,b] and Tohru Terada*[a,b] Coarse-grained molecular dynamics (CGMD) simulations with the MARTINI force field were performed to reproduce the protein–ligand binding processes. We chose two protein–ligand systems, the levansucrase–sugar (glucose or sucrose), and LinB–1,2-dichloroethane systems, as target systems that differ in terms of the size and shape of the ligand-binding pocket and the physicochemical properties of the pocket and the ligand. Spatial distributions of the Coarse-grained (CG) ligand molecules revealed potential ligand-binding sites on the protein surfaces other than the real ligand-binding sites. The ligands bound most strongly to the real ligand-binding sites. The binding and unbinding rate constants obtained from the CGMD simulation of the levansucrase–sucrose system were

approximately 10 times greater than the experimental values; this is mainly due to faster diffusion of the CG ligand in the CG water model. We could obtain dissociation constants close to the experimental values for both systems. Analysis of the ligand fluxes demonstrated that the CG ligand molecules entered the ligand-binding pockets through specific pathways. The ligands tended to move through grooves on the protein surface. Thus, the CGMD simulations produced reasonable results for the two different systems overall and are useful for C 2014 Wiley studying the protein–ligand binding processes. V Periodicals, Inc.

Introduction

simulation for a larger molecular system can be performed at a lower computational cost using a CG model. To date, the dynamics of substrate binding and product release have been studied with a CG model tailored for a specific protein system.[4] The MARTINI force field[5] is a widely used general-purpose CG model for biomolecular simulations. In the MARTINI model, four non-hydrogen atoms are generally represented by a single CG bead.[5] The force field parameters for lipids,[6] proteins,[7] and carbohydrates[8] have been developed. This model has been used to simulate various biological phenomena, including spontaneous aggregation of phospholipids into a bilayer,[9] gate opening of a mechanosensitive protein channel in a liposome in a condition of hypo-osmotic shock,[10–12] and self-assembly of G-protein coupled receptors in a lipid bilayer.[13,14] In addition, dynamics of a ligand peptide within the ligand-binding pocket of an oligopeptide-binding protein have been studied with this model.[15] In this study, we explored the applicability of a coarsegrained molecular dynamics (CGMD) simulation using the MARTINI force field to the modeling of a ligand binding to a

Proteins exhibit their biological functions by binding with other molecules, such as proteins, nucleic acids, and small molecules. An understanding of the mechanism of binding helps understanding of protein functions at the atomistic level. Because direct observation of such binding using experimental methods is impracticable, theoretical approaches, such as molecular dynamics (MD) simulation, are expected to be useful. Buch et al.[1] and Dror et al.[2] reported success in reproducing the processes of binding of small molecule ligands to proteins using all-atom (AA) MD simulations. These simulations revealed that the ligands entered the ligand-binding pockets on the protein surfaces by passing through specific metastable states. A comparison of the two simulations indicates that the ligand-binding pathways differ between proteins. Therefore, to increase our understanding of the ligand-binding mechanisms, we should next identify the important factors determining the ligand-binding pathway. To this end, it is necessary to repeatedly perform MD simulations for different types of protein– ligand pairs. However, the MD simulations demand huge computational resources, and this requirement impedes their application to various protein–ligand systems. Coarse-grained (CG) models have attracted attention because they can extend the applicability of MD simulation.[3] Because the CG model has a smaller number of interaction centers than does the AA model, the computational cost required for the CG simulation is much less than that required for the AA simulation.[3] In addition, the CG representation of a molecular system makes its potential energy surface smooth.[3] This allows for the use of a larger time step in the MD simulation as well as enables more efficient sampling of the conformational space. Therefore, an effectively longer MD

DOI: 10.1002/jcc.23693

[a] T. Negami, K. Shimizu, T. Terada Department of Biotechnology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-8657, Japan. E-mail: [email protected] [b] K. Shimizu, T. Terada Agricultural Bioinformatics Research Unit, Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-8657, Japan Contract/grant sponsor: JSPS KAKENHI; Contract/grant numbers: 23300109, 24570179 (to K. S. and T. T.); Contract/grant sponsor: Platform for Drug Discovery, Informatics, and Structural Life Science (to K. S. and T. T.) C 2014 Wiley Periodicals, Inc. V

Journal of Computational Chemistry 2014, 35, 1835–1845

1835

FULL PAPER

WWW.C-CHEM.ORG

Table 1. Detailed conditions of each simulation. Protein levansucrase levansucrase LinB

Ligand

Number of ligand molecules

Ligand concentration, clig (mM)

Simulation length

glucose sucrose 1,2-dichloroethane

5 5 10

13.5 13.5 41

1 ls 3 100 4 ls 3 50 1 ls 3 100

protein. We chose two protein–ligand systems, the levansucrase–sugar, and LinB–1,2-dichloroethane systems, which have quite different properties in terms of the shape of the ligandbinding pocket and the hydrophobicity of the ligand. For each system, CGMD simulations were performed, starting from the initial structures in which the ligand molecules were randomly placed around the protein. The thermodynamic and kinetic parameters of ligand binding were estimated from the MD trajectories and compared with the experimental data. In addition, we examined the effects of the properties of the ligand and the ligand-binding pocket on the ligand-binding pathway.

Methods Protein–ligand systems To select different types of protein–ligand systems, we first classified protein–ligand complex structures in the Protein Data Bank (PDB) into groups according to the hydrophobicity of the ligand and the shape of the ligand-binding pocket. From the group with a hydrophilic ligand and a shallow, wide ligand-binding pocket, the levansucrase–sucrose system was selected, because both the ligand-free and ligand-bound structures were determined using X-ray crystallography, the structural change caused by ligand binding is small, and the CG model of the ligand is available. Levansucrase catalyzes the elongation of levan, a homo polysaccharide of D-fructofuranose, transferring the fructosyl group of sucrose to levan.[16] In addition to the elongated levan, glucose is produced by this reaction.[16] Therefore, levansucrase binds sucrose as the substrate and binds glucose as the product. The PDB IDs of the ligand-free and sucrose-bound structures are 1OYG and 1PT2, respectively.[16] The Ca root-mean-square difference (RMSD) ˚ . From the group with a between the two structures is 0.15 A hydrophobic ligand and a deep, narrow ligand-binding pocket, the haloalkane dehalogenase LinB–1,2-dichloroethane system was selected. LinB catalyzes the hydrolysis of haloalkanes, such as 1-chlorobutane to alcohols and inorganic halides, and is inhibited by 1,2-dichloroethane in a competitive manner.[17] The structures of the ligand-free and ligand (1,2-dichloroethane)-bound forms have been determined using X-ray crystallography. The PDB IDs for the ligand-free and ligand-bound structures are 1IZ7 and 1G5F, respectively.[17] The Ca RMSD ˚. between the two structures is 0.18 A

1836

tions, are described by the MARTINI force field, whereas the protein tertiary structure is maintained using the elastic network model.[18] Although the protein structure can fluctuate around its equilibrium structure, the protein cannot greatly change its conformation during the simulation. Therefore, as described earlier, we chose protein–ligand systems in which the protein structure does not change on binding to the ligand. The CG models of the proteins were generated from the ligand-free forms of the crystal ˚ structures. All the backbone beads within a cutoff distance of 9 A 21 ˚ 22 were linked by springs with a force constant of 500 kJ mol A . Because some side-chain beads were quite close to each other in the initial CG models with distances less than rij of the LennardJones potential, these beads were also connected by springs with a force constant of 30 kJ mol21 A˚22 to avoid strong repulsion between the beads. MARTINI force field (ver. 2.1)[7] was used for the protein models and the water model. The CG models of glucose and sucrose were taken from the MARTINI force field for carbohydrates. The ligand of LinB, 1,2-dichloroethane, was represented by one C4-type CG bead, because the partitioning free energy between water and octanol of this molecule (DGpart 5 8.5 kJ mol21)[19] was close to that of this CG bead (DGpart 5 9 kJ mol21).[6] The Lennard-Jones parameter r was not changed from its default value (4.7 A˚), although this value is ˚ ). slightly smaller than the width of the molecule (5.4 A CGMD simulations of each protein–ligand system were performed as follows. The ligand molecules were randomly placed around the protein. Then, the protein and the ligand molecules were immersed in a box of the CG water, after ensuring a mini˚ between any protein bead and the face of mum distance of 10 A the box. The CG water comprised the P4 and BP4 beads mixed at a molar ratio of 9:1. The BP4 beads were added to prevent freezing of the CG water.[6] No salt ions were added to the CG water. The CGMD simulations were performed repeatedly with different initial velocities and initial ligand placements. The numbers of the ligands, ligand concentrations (clig), and the lengths and the numbers of the simulations are listed in Table 1. Nonbonded interactions were calculated according to a previous study.[7] Some bond lengths were constrained in the ELNEDIN model.[18] The LINCS algorithm[20,21] was used to constrain them. The time step of each simulation was 20 fs. The temperature and pressure were maintained at 310 K and 1.0 3 105 Pa, respectively, using the weak coupling method.[22] The coordinates were recorded every 50 ps. All simulations were performed using GROMACS 4.0.7.[23]

CGMD simulations

AA simulations

The ELNEDIN model[18] was used to construct the CG models of the proteins. In this model, the intermolecular interactions, such as the protein–ligand, protein–water, and ligand–water interac-

For a comparison, all-atom molecular dynamics (AAMD) simulations were also performed for the ligand-free forms of the two proteins (PDB IDs: 1OYG and 1IZ7). The initial coordinates


WWW.CHEMISTRYVIEWS.COM

FULL PAPER

WWW.C-CHEM.ORG

of the proteins were taken from the PDB files. Because the coordinates of the N-terminal residues are missing in the PDB files, the N-terminal residues of the PDB files were capped with acetyl groups for both proteins. All the ions and small molecules were removed except for the calcium ion that tightly binds to levansucrase. Solvation shells containing sodium and chloride ions were added to the proteins with Solvate 1.0. The numbers of the ions were determined so that the net charges of the whole systems were zero and the final salt concentrations were approximately 50 mM. Then, water molecules were further added using the LEaP module of AmberTools 12[24] so that the shape of the system became a rectangular parallelepiped. The Amber ff99SBildn force field[25] and the TIP3P model[26] were used for the proteins and water molecules, respectively. After energy minimization, each system was equilibrated in the constant-NPT ensemble at 310 K and 1.0 3 105 Pa for 1 ns. During equilibration, harmonic position restraints were imposed on the Ca atoms of the protein, and the force constant of the restraints was gradually reduced from 41.8 to 0 kJ mol21 A˚22. One production run was performed for each system in the constant-NPT ensemble for 20 ns. System temperature was controlled using the velocity rescaling method,[27] and system pressure was controlled using the weak coupling method.[22] Electrostatic interactions were calculated using the particle mesh Ewald method.[28,29] The lengths of the bonds with hydrogen atoms were constrained with the LINCS algorithm[20,21] to allow the use of a large time step of 2 fs. The coordinates were recorded every ps. The simulations were performed using GROMACS 4.5.5.[23]

Calculation of distribution and flux of ligand For each protein–ligand system, the distribution and flux of the ligand were calculated as follows. First, the coordinates of each snapshot of the trajectories were translated and rotated so that the positions of the backbone beads of the protein best fit those of the corresponding Ca atoms of the crystal structure of the ligand-bound form. Then, the space was divided into a regular grid of cubic cells. The ligand density qi of the ith cell was calculated using

qi 5

T 1 X dðxt 2ci Þ; TVd t51

where T is the number of the snapshots in the trajectories, Vd is the volume of the cell used in the distribution analysis, xt is the center-of-mass coordinate of the ligand of the tth snapshot, and ci is the coordinate of the center of the ith cell. The function d(xt–ci) is 1 when xt is within the ith cell, and otherwise, it is zero. The flux of the ligand ji of the ith cell was calculated using ji 5

T 1 X dðxt 2ci Þvt ; TVf t51

where Vf is the volume of the cell used in the flux analysis and vt is the center-of-mass velocity of the ligand at the tth ˚ in the distrisnapshot. The edge lengths of the cells were 6 A ˚ bution analysis and 2.5 A in the flux analysis.

Estimation of rate constants The binding rate constant, kon, was inferred using a Bayesian approach. Because ligand binding can be viewed as a Poisson process, the probability distribution of the number of the ligandbinding events, nbind, follows the Poisson distribution given as follows

kon clig sfree Pðnbind jkon ; sfree Þ5 nbind !

nbind

exp 2kon clig sfree ;

where sfree is the duration of the ligand-free state and clig is the concentration of the ligand. The probability distribution of kon is calculated using P(nbindjkon, sfree) as the likelihood function Pðkon jnbind ; sfree Þ5

Pðnbind jkon ; sfree ÞPðkon Þ : Pðnbind Þ

When the simulation was performed multiple times, the prior, P(kon), could be estimated from the numbers of ligandbinding events in the previous runs. Therefore, the posterior after N runs is calculated using

P kon j nbind; 1 ; . . . ; nbind; N ; sfree; 1 ; . . . ; sfree; N P nbind; N jkon ; sfree; N P kon j nbind; 1 ; . . . ; nbind; N21 ; sfree; 1 ; . . . ; sfree; N21 5 P nbind; 1 ; . . . ; nbind; N ; sfree; 1 ; . . . ; sfree; N " # N Y 5C P nbind; i jkon ; sfree; i P0 ðkon Þ; i51

P N

nbind; i 11 hkon i5 ; P clig Ni51 sfree; i P N nbind; i 11 i51 2 hkon i2hkon i2 5 P 2 : clig Ni51 sfree; i i51

where nbind,i is the number of ligand-binding events in the ith run, sfree,i is the duration of the ligand-free state in the ith run, and C is the normalization factor. P0(kon) is the prior for the first run, for which we assumed a uniform distribution. In that case, the average and variance of kon are calculated using


1837

FULL PAPER

WWW.C-CHEM.ORG

The distribution function of koff was estimated in a similar manner. The probability distribution of the number of unbinding events, nunbind, is calculated using koff scomplex nunbind P nunbind jkoff ; scomplex 5 exp 2koff scomplex ; nunbind !

where scomplex is the duration of the ligand-bound state. After N runs, the probability distribution of koff is calculated using

P koff j nunbind; 1 ; . . . ; nunbind; N ; scomplex; 1 ; . . . ; scomplex; N P nunbind; N jkoff ; scomplex; N P koff j nunbind; 1 ; . . . ; nunbind; N21 ; scomplex; 1 ; . . . ; scomplex; N21 5 P nunbind; 1 ; . . . ; nunbind; N ; scomplex; 1 ; . . . ; scomplex; N " # N Y P nunbind; k jkoff ; scomplex; k P0 ðkoff Þ: 5C k51

The average and variance of koff were calculated similarly to those described earlier. The timing of binding and unbinding was determined by monitoring the distance of the center-of-mass of the ligand from the “ligand-binding site” (see definition below). When the distance became less than rbound, the ligand was judged to be bound to the protein. Conversely, when the distance became more than runbound, the ligand was judged to be dissociated from the protein.

Results and Discussion Fluctuations of proteins in CGMD simulations In the CGMD simulation, the structure of the protein was maintained using the elastic network model. Therefore, the dynamics of the protein depend on the parameters of the model, cutoff distance, and force constants. To confirm that the parameters used in the elastic network model were appropriate, we performed the CGMD and AAMD simulations for the ligand-free forms of the proteins and compared the dynamics of the proteins between the simulations. The CGMD simulations were performed for 1 ms without ligands. In the AAMD simulations, RMSDs of Ca atoms from those of the crystal structures reached constant levels within 10 ns. Therefore, we continued the simulations for a further 10 ns, and the trajectories of this period were used for the analyses. The average RMSD values calculated over this period in the AAMD simulations were 1.30 6 0.08 and 0.92 6 0.06 A˚ for levansucrase and LinB, respectively, and these values are almost the same as those from the CGMD simulations (1.24 6 0.08 and 1.05 6 0.06 ˚ , respectively; calculated for backbone beads). Similarly, rootA mean-square fluctuations (RMSFs) around the average structures of the CGMD simulations were comparable with those of the AAMD simulations for both proteins (Fig. 1). Looking more closely, there is a discrepancy in the RMSF plot around Pro181 of LinB. Because Pro181 is located near the center of a loop, its backbone bead was connected in the elastic network model only with those residues close to it in the primary sequence. As a result, it showed a large fluctuation in the CGMD simulation. In the AAMD simulation, however, its adja1838


cent residue, Arg180, formed tight salt bridges with Glu186 and Glu276, which made the loop less flexible. These interactions were less stable in the CGMD simulation. We believe it is unlikely that this discrepancy affected the ligand-binding process because these residues are located away from the ligandbinding pocket. Therefore, we concluded that the protein dynamics were properly reproduced in the CGMD simulations and proceeded to the next step of CGMD simulations with ligands.

Protein–ligand-binding CGMD simulations First, protein–ligand-binding CGMD simulations were performed for the levansucrase–glucose system. One-microsecond simulations were repeated 100 times, with random changes in the initial velocities and the initial placement of the ligand molecules between each run. For each snapshot structure of the CGMD simulations, the “ligand-binding site” was defined as the center of mass of the ligand in the ligand-bound crystal structure, calculated after superposing the Ca atoms of the crystal structure on those of the snapshot structure. Because only the sucrose-bound crystal structure is available, the glucose-binding site was calculated as the center of mass of the glucose moiety of sucrose. Then, the distance between the center of mass of each ligand molecule and the ligand-binding site was calculated for each snapshot structure. Figure 2a shows the time evolutions of the distances of those molecules ˚ ) in one CGMD that came near the ligand-binding site (

Analyzing machupo virus-receptor binding by molecular dynamics simulations.

Molecular dynamics simulations in biology.

Floating orbital molecular dynamics simulations.

Homology modeling, molecular docking, and molecular dynamics simulations elucidated α-fetoprotein binding modes.

Microsecond molecular dynamics simulations of lipid mixing.

Molecular dynamics simulations of classical stopping power.

Molecular dynamics simulations of large macromolecular complexes.

Accelerated molecular dynamics simulations of protein folding.

Molecular dynamics simulations of helix denaturation.

Molecular dynamics simulations of glycoproteins using CHARMM.

Molecular dynamics simulations: advances and applications.

AceCloud: Molecular Dynamics Simulations in the Cloud.

New ways to boost molecular dynamics simulations.

Molecular Dynamics Simulations of Ternary Complexes: Comparisons of LEAFY Protein Binding to Different DNA Motifs.

Investigation of ion binding in chlorite dismutases by means of molecular dynamics simulations.

Assessing the potential of atomistic molecular dynamics simulations to probe reversible protein-protein recognition and binding.

Molecular Dynamics Simulations of Insulin: Elucidating the Conformational Changes that Enable Its Binding.

Investigation of arc repressor DNA-binding specificity by comparative molecular dynamics simulations.

Molecular dynamics simulations: insight into molecular phenomena at interfaces.

Molecular dynamics and Monte Carlo simulations for protein-ligand binding and inhibitor design.

Searching for protein binding sites from Molecular Dynamics simulations and paramagnetic fragment-based NMR studies.

Insights into ligand binding to PreQ1 Riboswitch Aptamer from molecular dynamics simulations.

Binding energy calculations for hevein-carbohydrate interactions using expanded ensemble molecular dynamics simulations.

Relative Binding Enthalpies from Molecular Dynamics Simulations Using a Direct Method.