FULL PAPER

WWW.C-CHEM.ORG

Multipolar Electrostatics for Proteins: Atom–Atom Electrostatic Energies in Crambin Yongna Yuan,[a,b]y Matthew J. L. Mills,[a,b]z and Paul L. A. Popelier[a,b]* Accurate electrostatics necessitates the use of multipole moments centered on nuclei or extra point charges centered away from the nuclei. Here, we follow the former alternative and investigate the convergence behavior of atom-atom electrostatic interactions in the pilot protein crambin. Amino acids are cut out from a Protein Data Bank structure of crambin, as single amino acids, di, or tripeptides, and are then capped with a peptide bond at each side. The atoms in the amino acids are defined through Quantum Chemical Topology (QCT) as finite volume electron density fragments. Atom-atom electrostatic energies are computed by means of a multipole expansion with regular spherical harmonics, up to a total interaction rank of L 5 ‘A1 ‘B 1 1 5 10. The minimum internuclear distance in the convergent region of all the 15 possible types

of atom-atom interactions in crambin that were calculated based on single amino acids are close to the values calculated from di and tripeptides. Values obtained at B3LYP/aug-cc-pVTZ and MP2/aug-cc-pVTZ levels are only slightly larger than those calculated at HF/6-31G(d,p) level. This convergence behavior is transferable to the well-known amyloid beta polypeptide Ab1–42. Moreover, for a selected central atom, the influence of its neighbors on its multipole moments is investigated, and how far away this influence can be ignored is also determined. Finally, the convergence behavior of AMBER becomes closer to C 2013 Wiley that of QCT with increasing internuclear distance. V Periodicals, Inc.

Introduction

one to regard molecular fragments as isolated from the rest of the molecule, especially if the rest is far away. Of course, this is not true for delocalized systems (even as small as an imidazole ring in histidine), which cannot be cut into isolated fragments. One should be prepared to find that the transferable molecular fragment groups can be larger than the classic named functional groups of organic chemistry. For amino acids in a small protein such as crambin, many molecular fragments have a chemical meaning that is largely isolated from the full molecular environment. Therefore, if this fragment appears in a small molecule, the (quantum) chemical information it contains must be safely transferable from a small molecule (e.g., a single amino acid) to a large one (e.g., a protein). Although transferability is the cornerstone of any force field this key concept has not been researched as much as it deserves, certainly not by the community that uses force fields or modifies existing ones. At a practical level, transferability is translated into the concept of an atom type. This is essentially a “packet of information,” that is, an atom surrounded by a

Obtaining reliable information on the structure and dynamics of proteins through computation remains a difficult problem because of the challenges of effective sampling and accurate energy evaluation. The current contribution concerns the latter only, and tackles this challenge in the context of force field design. The sheer computational cost of molecular dynamics simulations will justify the use of force fields in the foreseeable future, in spite of ever increasing availability of computer power. In fact, we propose that this power should be utilized to radically redesign force fields with an eye to making them more realistic. Here, we focus on the electrostatic interaction, which is widely recognized as a leading component in the modeling of polar polymers called proteins. Popular force fields, such as assisted model building with energy refinement (AMBER),[1] are still dominated by (atomic partial) charges, which are often simply called point charges. Even recent improvements to this force field such as the f99SB modification[2] continue to rely on point charges, in spite of consistent evidence of their limited accuracy. Unfortunately, this evidence appears rather scattered in the literature (e.g., Refs. [3–21]) but is currently being brought together in a review[22] that demonstrates and highlights the necessity of atomic multipole moments for accurate electrostatic interaction. The central concept behind force fields is that of transferability. In its broadest sense, this concept expresses the “locality of chemistry” through the existence of functional groups. These groups represent molecular fragments that retain their properties when transferred from one molecule to another. In other words, the molecular electron density allows

DOI: 10.1002/jcc.23469

[a] Y. Yuan, M. J. L. Mills, P. L. A. Popelier Manchester Institute of Biotechnology (MIB), 131 Princess Street, Manchester, M1 7DN, United Kingdom [b] Y. Yuan, M. J. L. Mills, P. L. A. Popelier School of Chemistry, University of Manchester, Oxford Road, Manchester, M13 9PL, United Kingdom E-mail: [email protected] † Present address: School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China ‡ Present address: Department of Chemistry (SGM 418), University of Southern California, 3620 McClintock Avenue, Los Angeles, California 90089 C 2013 Wiley Periodicals, Inc. V

Journal of Computational Chemistry 2014, 35, 343–359

343

FULL PAPER

WWW.C-CHEM.ORG

given environment, which has its own energy parameters. With the sole knowledge of a Lewis diagram (which is “back of the envelope”) a protein backbone (i.e., its nuclear skeleton) can be dressed up with the information the atom type introduces. A valuable challenge is to extract atom types from actual molecular electron densities. This enables the computation of an atom type, which was achieved[23,24] for the first time in 2003, using cluster analysis acting on 760 so-called topological atoms sampled from the 20 natural amino acids and smaller derived molecules. That work used an appealing definition of an atom in a molecule, which is the one offered by Quantum Chemical Topology (QCT).[25–28] This method proposes topological atoms, which actually “draw themselves,” only using the central idea of the gradient vector operating on a molecular electron density. These topological atoms have a finite volume, do not overlap and exhaust space (i.e., each point in real 3D space must belong to an atom). The first computed atom types emerged from the properties of the atoms themselves, that is, their atomic charge (monopole moment), dipole to hexadecupole moments (magnitudes thereof ), and atomic kinetic energy. This analysis was already able to show that the guessed and imposed (rather than observed or detected) atom types of AMBER were ill balanced.[29] In other words, the AMBER atom types are sometimes overdifferentiated (e.g., there are too many nitrogen atom types in AMBER) while sometimes under-differentiated (e.g., aliphatic and aromatic alcohol oxygen atoms should be distinguished as different atom types). The inevitable introduction of multipole moments into accurate force field design presents the issue of convergence of the multipolar series expansion which, in principle, however, leads to the exact electrostatic interaction energy between two atoms. The convergence behavior between topological atoms has been extensively studied,[30–33] as well as that[34–37] of the electrostatic potential generated by a topological atom. It is important to realize that the finite volume of a topological atom enables formal (i.e., mathematically exact) convergence rather than pseudoconvergence.[33] In fact, 1,4 interactions, which play a role in torsional potentials, can be calculated through a practically converging topological multipole expansion.[33] Even 1,3 interactions can be successfully expressed like this and occasionally 1,2 interactions[38] as well. All this means that both the exploration of potential energy surfaces[39,40] and molecular dynamics simulations of liquids and solutions[41–44] are perfectly possible with topological multipole moments, in spite of a false assertion published elsewhere.[45] What is more, a novel polarization method is under development, which uses machine learning (kriging) that captures the way topological multipole moments change as a direct function of the positions of the surrounding nuclei.[46–50] This combined work will culminate in a novel topological force field[51] for biomolecular modeling and potential application in drug design.[52] In view of the importance of polarization in next-generation force fields[53] this kriging approach is anticipated to be able to capture both nonadditivity effects[54] in highly charged complexes and the polarizabilities of bonds and atom lone-pairs.[55] 344

Journal of Computational Chemistry 2014, 35, 343–359

The current contribution builds on earlier work[56] that investigated the long range behavior of high-rank topological multipole moments in three medium-sized systems: serine…(H2O)5, tyrosine…(H2O)5, and the tetrapeptide tuftsin. Here, we will ask five precise questions with an eye on understanding the behavior of multipole moments in a real protein, and compare them with typical (atomic) point charges. As an example, one such question asks how high the multipole rank needs to be, on two given atoms a given distance away from each other, in order for their electrostatic energy to converge within a certain energy error. The other questions will be spelled out, and answered one by one in the main text below.

Background and Method The multipole expansion Atomic multipole moments are vital in the accurate representation of atomic electron densities that interact electrostatically. In this work, atomic multipole moments are defined by combining the topological partitioning[25,26] and the spherical tensor formalism,[30] which leads to irreducible multipole moments, resembling the angular wave functions of the hydrogen atom (p, d, f, and g). For example, there are five d orbitals and also five quadrupole moments, as opposed to the Cartesian tensor formalism, which introduces six quadrupole moments. In the latter formalism, the number of redundant components (i.e., moments) further increases with increasing rank. Topological atoms are carved out by gradient paths traced in the electron density q. Most gradient paths originate at infinity, and traverse space in the direction of maximum increase in q, until they reach a maximum. This maximum practically coincides with a nuclear position. The total set of gradient paths that meet at a nucleus form the basin of the atom, that is, the region of real 3D space that it occupies. Interatomic surfaces mark the atomic boundaries inside the molecule. They consist of bundles of gradient paths that do not terminate at any nucleus. Because these surfaces are infinitely thin, topological atoms that are separated by an interatomic surface touch each other; they do not overlap and leave no gaps between them. As integration over atomic volumes is still carried out by a numerical algorithm we cannot really integrate to infinity but only to a practical boundary. We cap each topological atom in a free molecule in the gas phase by a constant electron density envelope. In practice, q(r) 5 1 3 1026 a.u. is the default value at which an atom is regarded to terminate if one wants to include practically all the electronic charge it contains. If this value is set to q(r) 5 1 3 1023 a.u. then one would lose a little bit of electronic charge but avoid potentially complicated features such as long and narrow atomic tails. For visual purposes such an envelope is appropriate. Finally, we note that a topological atom in condensed matter is completely bounded by interatomic surfaces. In other words, then there is no need for rather artificial “outer boundaries” delivered by constant electron density envelopes. WWW.CHEMISTRYVIEWS.COM

FULL PAPER

WWW.C-CHEM.ORG

The exact Coulomb interatomic interaction energy can be obtained via a six-dimensional (6D) integration[31] over the two participating topological atoms as given by eq. (1), ð ð q ðrA Þqtot ðrB Þ E AB 5 drA drB tot rAB XA XB

(1)

where qtot ðrÞ is the total molecular charge density, which includes the nuclear charges. The distance between two infinitesimally small charge elements is denoted rAB. Although this calculation is exact, it has a high computational cost because the evaluation of a 6D integral requires the multiplication of two quadrature grids, essentially. Moreover, the 6D integrals must be recalculated for every possible mutual orientation of two atoms. In other words, no atomic information can be precomputed once and for all, nor can it be separated from the mutual orientation of two atoms. However, the introduction of the multipolar series expansion makes such precomputation possible. Such an expansion factorizes the 1/rAB expression, and thereby enables the disentanglement of the 6D integration into two separate 3D integrations, each yielding atomic multipole moments. The Coulomb or electrostatic interaction energy between two (topological) atoms is then given in terms of atomic multipole moments as shown in eq. (2), ‘max X ‘max X ‘A X E AB 5 ‘A

‘B X

Q‘A mA ðXA ÞT‘A mA ‘B mB Q‘B mB ðXB Þ

(2)

‘B mA 52‘A mB 52‘B

where Q‘A mA ðXA Þ and Q‘B mB ðXB Þ represent the moments of topological atom XA and XB, respectively, whereas T is the interaction tensor for a pair of atomic multipole moments.[30] Equation (2) is designed to be used when each atom is expressed in its own local axis system. Care was taken that the atoms are oriented in the same way as they are in crambin, and formulae to convert atomic multipole moments between a local frame and the global frame facilitate the practical calculation of the moments. An atomic multipole moment is obtained by integrating the electron density multiplied by a spherical harmonic over the 3D volume of the atom. The rank of the multipole moment determines the particular spherical harmonic function used in the integration. A moment of rank ‘ has 2‘ 1 1 individual components from 2‘ to 1‘, as shown in eq. (2). For example, the dipole moment is of rank 1 and has three components or m 5 (21, 0, 11). The (atom-atom) interaction rank, denoted L, measures the number of multipole moments that is used to compute the electrostatic energy of two interaction atoms. This important quantity, which features throughout this article, is defined by eq. (3), L5‘A 1‘B 11

dipole moments (‘A 5 ‘B 5 1), one between a monopole moment (‘A 5 0), and a quadrupole moment (‘B 5 2), and one with the inverse counterpart (i.e., ‘B 5 0 and ‘A 5 2). The convergence of the multipole expansion is monitored by varying L. Pilot system: Crambin For this work we chose crambin [Protein Data Bank (PDB) code: 2EYA] as a pilot system. This hydrophobic protein has been isolated from the seeds of the cabbage Crambe abyssinica.[57] It is the smallest naturally occurring protein. Crambin has also been widely utilized in developing methodology for the determination of protein structure from nuclear magnetic resonance (NMR) data.[58–61] In the PDB file “2EYA,” the crambin structure is obtained by NMR experiment and there are in total 20 molecular geometries, which are all deposited in the 2EYA file. From the 20 possible molecular geometries, the first one was selected. The current work does not crucially depend on the geometry of crambin. The chosen geometry of crambin is depicted in Figure 1, which was created using the visualization package Visual Molecular Dynamics.[75] Crambin contains 46 amino acids, starting with the first amino acid threonine (Thr1) to the last one, which is asparagine (Asn46). To simplify the calculations, all amino acids were taken as neutral. For example, the guanidinium group in Arg is represented by a guanidine group. Figure 1 shows some essential features of the crambin structure we took as a pilot system. The total number of atoms in crambin is 642. The longest internuclear distance between two Ca atoms is 28.9 A˚, which occurs between Pro19 and Asp43. The longest internuclear distance occurring anywhere in crambin (including hydrogens) reaches ˚ , and is an OO interaction (between a peptidic O near 33.3 A Pro19 and a carboxylic O in Asp43). Computational details Crambin is too large for the quantum topological computer program MORPHY01[62] to integrate its atomic multipole

(3)

where ‘A and ‘B are the ranks of the multipole moments for atom A and B, respectively. For example, L 5 3 triggers the calculation of monopole, dipole, and quadrupole moments, and therefore generates 9 (51 1 3 1 5) moments in total, for each atom. This interaction rank means that ‘max 5 2 in eq. (2), such the evaluated interactions consist of one between two atomic

Figure 1. An NMR structure of crambin (PDB code: 2EYA). In total there are 46 amino acids (Thr1 ! Asn46). The longest internuclear distance between two Ca atoms is 28.9 A˚, and occurs between Pro19 and Asp43. The Thr30Gly31 group is marked as one example of the 22 “A1A2” groups calculated (see “Question 2” in the main text).

Journal of Computational Chemistry 2014, 35, 343–359

345

FULL PAPER

WWW.C-CHEM.ORG

moments, as the program has not been designed for systems of such size. Because of this, crambin is cut into 46 single amino acids, each of which is “capped” with a peptide bond at each side: CH3AC(@O)AN(H)A(H)Ca(R)AC(@O)AN(H)ACH3, where R is the residue marking the amino acid (e.g., R@H for glycine). To cap the fragment at both ends, we use the CaAC(@O) group of the amino acid preceding the amino acid in question, and the N(H)-Ca group following it. Hydrogen atoms are added to the unsaturated atoms in these geometries to achieve neutrality, using standard geometrical values. The groups N2acetyl and N2methylamino are attached to the N- and C-terminus, respectively, to mimic the peptide environment in proteins. MORPHY01 is able to compute all multipole moments of a sufficient number of atoms in any capped amino acid. Figure 2 shows the three representative systems that are investigated: a disulfide bond, formed by two cysteine residues; a peptide consisting of two amino acids (tyrosine and alanine); a peptide with three amino acids (proline, glycine, and alanine). These plots were generated by the in-house[63] MORPHY graphical user interface (GUI). To maintain the original amino acid geometry in crambin’s PDB structure, the wave functions of these capped amino acid groups are computed directly by GAUSSIAN03, that is, without energy optimizing their geometries. The levels of theory used in this work are HF/6–31G(d,p), B3LYP/augcc-pVTZ, and MP2/aug-cc-pVTZ. The atomic multipole moments of all atoms in each amino acid are subsequently integrated

using MORPHY01. The radial and angular quadrature used inside the b2sphere is set to (nr, nh, nu) 5 (105, 75, 105), whereas that outside the b2sphere is set to (nr, nh, nu) 5 (70, 50, 70). The grid sizes were guided by much earlier work[64] and more recent tests reported in the PhD thesis of the first author of the current article. The atomic basins are capped by a q(r) 5 1026 a.u. isodensity envelope. The atomic multipole moments were computed up to L 5 10, which means [by virtue of eq. (3)] that the highest atomic multipole rank is ‘ 5 9. Only a few of the atoms in these amino acids were omitted because they failed to be integrated owing to their intricate topologies.[65] Of course, the vast majority “survive” such that the overall trends in this work can be observed. Finally, only successfully integrated atoms are used to investigate the convergence behavior of atom-atom interactions. For the electrostatic energy between two atoms, we define a multipolar series expansion to be convergent at rank L if two conditions are satisfied: jE ðLÞ–E ðL11Þj < 0:1kJ mol 21 and jE ðL11Þ–E ðL12Þj < 0:1kJ mol 21

(4)

where E(K) is the interaction energy obtained at rank K. These are rather severe “convergence criteria,” which have also been used in previous work.[56] In fact, because this criterion does not strictly prove convergence, in the sense of a rigorous mathematical test, one could describe eq. (4) as an indicator

Figure 2. Molecular graphs (dressed with several topological atoms) of a) cysteine-cysteine, showing a disulfide bond; b) tyrosine-alanine; c) prolineglycine-alanine. The bond critical points are shown in purple and the ring critical points in pink. The orange rectangles mark the N2acetyl and N2methylamino groups bonded to the N- and C-terminus, respectively.

346

Journal of Computational Chemistry 2014, 35, 343–359

WWW.CHEMISTRYVIEWS.COM

FULL PAPER

WWW.C-CHEM.ORG

of “plausibility of convergence.” It is useful to mention here the phenomenon of “pseudoconvergence,” as coined in 2003[37] and explored further,[33] which is the effect that the electrostatic energy from a truncated multipole expansion hovers closely around the exact electrostatic energy (6D value). If the criteria in eq. (4) are satisfied then rank L energy is regarded as the “exact” multipolar reference energy. The energies of all the high-rank atom-atom interactions in this work are computed by the in-house computer program NYX, which contains an implementation[33] of a recursive and therefore open-ended formula[66] to obtain the intricate interaction T in eq. (2). It is instructive to explore the convergence behavior of representative atom-atom interactions prior to answering the five key questions of this article. The seven central atoms in glycine (ignoring the two caps) are taken as a work space to make some numerical checks. Figure 3 shows a small labeled diagram of these seven atoms, which are N1, C2 (or Ca), C3 (keto), O4, H5 (bonded to N), H6, and H7 (both bonded to Ca). There are six 1,4 interactions and one 1,5 interaction in this set of test atoms. The high-rank (L 5 10) multipolar expansion energies of these seven interactions were calculated using eq. (2) with atomic multipole moments obtained from the program AIMAII,[67] whereas the exact 6D energies were calculated using the program PROMOLDEN.[68] Table 1 compares these two energies and shows that they match within 0.1 kJ mol21. Figure 3 focuses on the convergence of one interaction in Table 1, namely that of H5 and H6. The profile of the interaction energy versus the interaction L shows that the expansion has already converged at L 5 5. Clearly, L 5 5 suffices to obtain an accurate energy value for the H5–H6 interaction, as an HH interaction is the easiest kind of interaction to converge. For HA or AB interactions (where A and B are both nonhydrogen atoms), the rank L 5 10 is needed to ensure convergence. Figure S1 of the Supporting Information shows how well the exact 6D and multipolar (L 5 10) interaction energies correlate for all seven 1,n (n 5 4 or 5) interactions in glycine. From a computational and practical point of view, the energy error at rank L 5 10 is small enough. In a previous study,[56] it has also

Figure 3. Pure electrostatic energy only between atoms H5 and H6 in glycine (1,4) calculated as a function of the multipolar interaction rank L. The dashed line represents the exact 6D energy of 29.2 kJ mol21. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

Table 1. Comparison of exact and multipolar energy (kJ mol21) for seven representative short range interactions in the central glycine fragment (diagram see Fig. 3). Interactions

1,n

E(6 D)[a]

E(L 5 10)[b]

DE[c]

H5H6 H5H7 H5C3 H6O4 H7O4 N1O4 H5O4

1,4 1,4 1,4 1,4 1,4 1,4 1,5

29.2 23.7 386.4 22.2 266.9 902.2 2242.5

29.2 23.7 386.3 22.1 266.9 902.1 2242.5

0.0 0.0 20.1 20.1 0.0 20.1 0.0

[a] Exact energy (kJ mol21) calculated by PROMOLDEN, without the multipolar expansion of eq. (2). [b] Energy (kJ mol21) calculated by NYX from a multipolar series expansion at interaction rank of L 5 10 and multipole moments obtained from AIMAll. [c] Energy difference (kJ mol21).

been proven that L 5 10 yields an accurate energy value and that the interactions are convergent at this rank. In summary, in this study, an energy evaluated at L 5 10 is taken as the “exact” (reference) energy. This is computationally advantageous compared to the calculation of 6D interaction energies.

Key Questions We are now in a position to address the following five questions, written in italics, and appended with a short introductory commentary: 1. For a given “convergence energy error” DE and a given pair of atoms A and B, how does the interaction rank L at which the energy has converged, change with increasing internuclear distance R? This is actually a fivedimensional function, involving the quantities R, L, DE, and the qualifiers A and B. 2. How reliable is the assumption of using a single (capped) amino acid to reproduce the atom-atom energies in oligopeptides? This can be explored by comparing the energy difference of a given atom-atom interaction energy originating from capped oligopeptides “A1A2” or even “A1A2A3.” 3. What is the influence of the level of theory on the electrostatic energy values? The minimum internuclear distance at which all of the 15 possible element-element interactions are convergent will be compared with the values obtained at HF/6–31G(d,p) level. 4. For a selected central atom, how do that atom’s neighbors influence its multipole moments and at which distance can their influence be ignored? This question is crucial to quantify transferability, the zeroth cornerstone of any force field. 5. What is the convergence behavior of interactions between groups of atoms, in particular, is the energy difference obtained from AMBER and QCT for a total group-group interaction smaller than that for individual atom-atom interactions? The atom-atom interactions between two different peptide groups [AC(@O) ANHA], each consisting of four atoms, will be investigated. Journal of Computational Chemistry 2014, 35, 343–359

347

FULL PAPER

WWW.C-CHEM.ORG

Results and Discussion Before we examine the five key questions, we first provide more detail on atom-atom interactions in crambin and their convergence behavior. In terms of element-element combinations, there is a total of 15 types of atom-atom interactions in crambin: HH, OH, NH, CH, SH, OC, ON, OO, CC, CN, NN, SC, SN, SO, and SS. To obtain a quasicontinuum of internuclear distances for each interaction type, we calculated all possible interaction types within each amino acid, as well as atom-atom interactions between different amino acids. Taking the CH interaction as an example, the internuclear distances (R) associated with 295 CH interactions in crambin have been calculated and then are ranked according to increasing R value. The smallest R value obviously occurs for a 1,2 interaction, where C is covalently bonded to H in the same amino acid. The maximum R value of any atom-atom interaction type is larger than ˚ , with the exception of SS for which the maximum R is 20 A ˚ 18 A. Figure 4 illustrates at which rank L the multipole expansion can be regarded as convergent for a set of CH interactions with increasing internuclear distance R. Here, the energy evaluated at L 5 10 is taken as the reference value for convergence. If L > 10 then the multipole expansion is considered to be divergent. Figure 4 shows two regions: a divergent region, RD, and a convergent region, RC. The RD region occurs at short ˚ , and the multipole internuclear distances of less than 2.7 A expansion with an R value within this RD region is divergent. In other words, a pair of interactions is certainly divergent if the expansion has not yet converged at L 5 10. On the other hand, the convergent region, RC, appears beyond 2.7 A˚ and stays stable at long distance. In the RC range the multipole expansion of all the interactions are convergent at L < 10. In fact, Figure 4 shows that, in the RC region, almost all the atom-atom interactions converge at rank L 5 6 and often as ˚ ). Therefore, a CH interaction low as L 5 1 (already from 6.3 A that has a convergent multipole expansion always converges at rank L  6.

Figure 4. The dependence of the rank (L) on the internuclear distance (R, ˚ ) for CH interactions. The RD region contains divergent interactions, while A RC contains convergent interactions with a rank L always lower than 6. This profile refers to the Coulomb energy only. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

348

Journal of Computational Chemistry 2014, 35, 343–359

Table 2 lists all the 15 types of atom-atom interactions. The total number of each atom-atom interaction and corresponding minimum internuclear distance in the region RC is also listed in Table 2. These 15 interaction types are sorted by the minimum distance of the RC region, in increasing order. The CH entry indeed returns 295 interactions and a convergence ˚ , both mentioned above. It turns out to be threshold of 2.7 A useful to separate hydrogen from nonhydrogen atoms (C, N, O, and S). We then have three kinds of interactions: HH interactions, HA interactions, and AB interactions, where elements A and B stand for nonhydrogen atoms. Table 2 shows that the HH interaction converges the fastest because the minimum internuclear distance (where it becomes convergent) is the smallest of the 15 possible interaction types. In this ranking, the HH interaction is followed by all four possible AH interactions. Note that the OH interaction, where O possesses the highest electronegativity amongst the four heavy atoms, has the smallest divergence region RD (starting below 2.5 A˚). Not surprisingly, AB interactions (where A can be equal to B) have the largest region RD amongst the three kinds of interactions, ˚ . Their minimum values in the RC starting with OO at 2.8 A region are larger than the HH and AH interactions. The largest divergence region occurs for the SS interaction (convergence starts only from 3.7 A˚), based on 20 precise values inspected. Furthermore, five ON, CC, or NN interactions, for which one or two atoms belong to a nitrogen-containing heterocyclic ring or the guanidine functional group [ANAC(@N)ANHA] in arginine, are unable to reach the (admittedly arbitrary) preset energy threshold of 0.1 kJ mol21 at the lower limit of the convergent regions. Instead, they converged at a slightly higher energy threshold (0.13 kJ mol21). Not omitting these five exceptions from Table 2 increases the resulting Rc values considerably. In fact, the new Rc values would then become 3.6,

Table 2. The number (N) of atom-atom interactions of each the 15 possible types occurring in crambin and the minimum internuclear distance ˚ ) of the convergent regions (of the energy multipole expansion). All (RC, A calculations are performed at HF/6–31G(d,p) level using single (capped) amino acids. Type

N

RC

HH OH NH CH SH OO ON[a] OC SN CC[a] CN OS NN[a] SC

266 71 277 295 39 64 94 74 60 216 213 37 67 64

2.1 2.5 2.7 2.5 2.7 2.7 2.8 2.8 2.9 3.2 3.2 3.4 3.4 3.4

[a] One or two interactions have been omitted from the full set because they failed to converge at exactly 0.1 kJ mol21 but did converge at a slightly higher value (e.g., 0.13 kJ mol21).

WWW.CHEMISTRYVIEWS.COM

WWW.C-CHEM.ORG

˚ for ON, CC, and NN, respectively, instead of the 3.8, and 4.7 A ˚. currently quoted 2.9, 3.4, and 3.6 A In crambin there are only two SS interactions, which is insufficient to determine a sensible RC value. The SS distance within ˚ and the smallest SS distance one disulfide bond is 2.0 A ˚ . Thus, there are between two different disulfide bonds is 6.8 A ˚ no distances between 2.0 and 6.8 A in crambin. Hence, the enzyme aromatic amine dehydrogenase (AADH) (PDB code: 2AH1), was used to obtain the desirable semicontinuum of SS ˚ . The enzyme internuclear distances between 2.0 and 6.8 A AADH is an a2b2 heterotetramer in which two tryptophan tryptophyl quinone (TTQ)-containing b moieties are covalently bonded to two a moieties. There are seven disulfide bonds and a methionine residue in each TTQ-containing b molecule. A sufficient number of SS internuclear distances appear in the latter so that calculations can be restricted to this subsystem. A total of 20 SS interactions is obtained with internuclear dis˚ and the minimum R value tances ranging from 2.0 to 17.7 A ˚. of the RC region is finalized as 3.7 A Question 1: For a given “convergence energy error” DE and a given pair of atoms A and B, how does the interaction rank L change with increasing internuclear distance R? The correlation between the properties we are interested in is expressed by the general equation L5f ðA; B; DE; RÞ

(5)

where the dependent variable L represents the interaction rank between multipole moments, A and B are two interacting atoms, R is their internuclear distance, and DE is the energy accuracy threshold. The latter is typically 0.1 kJ mol21, which we set to be the maximum accepted deviation between exact and multipolar interaction energy. In the shape of eq. (5), the latter four quantities are independent variables. Although the function in eq. (5) is complicated, one can obtain insight from it by systematically fixing three of the independent variables, and then studying the relation between the dependent variables and the remaining independent variable. For example, for a given pair of atoms A and B at a certain distance R, we can work out which rank L is needed for the multipole expansion to meet the energy accuracy threshold DE. The relationship between all five variables can also be expressed as R 5 g (A, B, L, DE), which is another form of eq. (5) offering different insight. The convergent behavior of interactions is further analyzed using Figure 5, which focuses on OH interactions only. Figure 5 shows the energy difference between the reference energy calculated at rank L 5 10 and energies computed at L 5 1, 2, 3, 4, and 5. The horizontal line drawn at DE 5 0.1 kJ mol21 marks the preset energy accuracy threshold we work with. For a given rank L one can easily determine the internuclear distance R at which this desired energy accuracy of 0.1 kJ mol21 is reached. For example, for L 5 3, the plot intersects the hori˚ . The plot also indicates that if R zontal energy line at R 5 4.1 A ˚ is less than 4.1 A, then a higher rank (L 5 4 or 5) is needed to ensure that the energy accuracy reaches DE 5 0.1 kJ mol21.

FULL PAPER

Figure 5. The energy difference (DE, kJ mol21) (of the Coulomb component only) between the reference energy calculated at rank L 5 10 and that calculated at ranks L 5 1, 2, 3, 4, and 5, for OH interactions in crambin. Low ranks clearly lead to large energy differences, which, however, decrease with increasing internuclear distance R (A˚). Only converged interactions are ˚ (see Table 2). [Color figure shown, right of the vertical line at RC 5 2.5 A can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

˚ , L 5 3 is sufficient to On the contrary, if R is greater than 4.1 A ˚, again reach this DE threshold. We note that from R 5 6.6 A even L 5 2 is sufficient. However, monopole…monopole inter˚ . When the rank L actions (L 5 1) are sufficient from 28.5 A reaches 5, then almost all the atom-atom interactions are convergent with DE < 0.1 kJ mol21. Table 3 provides a further way of analyzing the information contained in eq. (5). This table allows one to fix an atom-atom interaction, set the rank L and then read off the internuclear distance R at which a multipole expansion converges. Note that here we set the energy accuracy to 0.4 kJ mol21 (or 0.1 kcal mol21) and thereby relax the previously used threshold of 0.1 kJ mol21. For example, for OH interactions, and setting L to 3, convergence is reached when the O-H distance is larger ˚ . This example can also be verified in Figure 5. Had than 3.3 A we taken L 5 4 (or L 5 5) instead of L 5 3, then we would have found that all interactions converged, provided of course that their internuclear distance is larger than Rc 5 2.5 A˚. This is why the distance 2.5 A˚ is marked (in brackets) in the entry for L 5 4 and L 5 5 in Table 3. Again, this fact could have been deduced from the figure, where it should be clear that the red (L 5 5) and blue (L 5 4) curves are completely below 0.4 kJ mol21 as ˚ . Returning to Table 3, all interaction types long as R > 2.5 A except SN and CN, meet the energy accuracy of DE 5 0.4 kJ mol21 at rank L 5 5. This is why these two interaction types have no bracketed entries. In other words, a bracketed internuclear distance means that the corresponding profile of DE versus R (i.e., any curve in Fig. 5 but note that this is just for OH) does not intersect the DE 5 0.4 kJ mol21 bar (horizontal line). The rank L 5 5 is only necessary for interactions SN and CN. In contrast, for the interactions HH, OH, NH, and OO, the rank L 5 3 is sufficient to meet the DE 5 0.4 kJ mol21 criterion. To test the reliability of the values given in Table 3, Ab1–42 (PDB: 1Z0Q) was taken as a test case, another well-studied polypeptide (amyloid beta, 42 amino acids) that is highly Journal of Computational Chemistry 2014, 35, 343–359

349

FULL PAPER

WWW.C-CHEM.ORG

Table 3. Minimum internuclear distance (A˚) of convergent multipolar atom-atom interactions at ranks L 5 3, L 5 4, and L 5 5, within an energy accuracy of 0.4 kJ mol21 (or 0.1 kcal mol21). All calculations were performed at HF/6–31G(d,p) level. The brackets are explained in the main text. Type HH OH NH CH SH OO ON OC SN CC CN OS NN SC

L53

L54

L55

2.2 3.3 3.3 3.4 4.8 4.4 5.6 4.5 5.1 5.2 6.2 6.5 6.2 5.9

(2.1) (2.5) (2.7) 3.0 3.1 (2.8) 3.6 3.7 5.0 3.9 3.6 4.2 4.6 4.7

(2.1) (2.5) (2.7) (2.7) (2.8) (2.8) (2.9) (3.2) 3.3 (3.4) 3.5 (3.4) (3.6) (3.7)

relevant in Alzheimer’s disease. The same computational methodology applied to crambin was used for Ab1–42. Each possible atom-atom interaction type in Ab1–42 is classified into two groups: divergent interactions and convergent interactions. As for crambin, the interactions in the latter group are convergent at rank L 5 10. The minimum internuclear distances of the convergence region (0.4 kJ mol21) are calculated at rank L 5 3 and L 5 4. Because L 5 5 is only needed by two interactions (SN and CN) to satisfy the predetermined energy accuracy, the minimum internuclear distances of the atom-atom interactions in Ab1–42 are not computed at rank L 5 5. Moreover, the minimum internuclear distance of SS interactions in the RC region in Table 3 was not tested as there is not a sufficient number of S-S distances in Ab1–42. If the minimum internuclear distance for Ab1–42 is smaller than the corresponding distance for crambin (see Table 3), then the convergence predictions, drawn from crambin, successfully apply to Ab1–42. Table 4 shows that this is indeed the case for the vast majority of atom-atom interactions: there are only 5 failing interactions out of a total of 428 convergent interactions. These failed interactions occur in the CH and OO subsets. Closer inspection shows that the 3 failures in the CH group are not dramatic. At rank L 5 3, the minimum internuclear dis˚ with an tance of the CH interaction in crambin is R 5 3.4 A 21 absolute energy error less than 0.4 kJ mol . In Ab1–42, there are two CH interactions that produce absolute energy errors slightly larger than 0.4 kJ mol21, that is, only 0.46 and 0.48 kJ mol21. The third unsuccessfully predicted CH interaction occurs at L 5 4. In Table 3, the minimum internuclear distance at this rank is R 5 3.0 A˚ while in Ab1–42, the absolute energy ˚ . A similar error cannot reach 0.4 kJ mol21 until R 5 3.2 A inspection of the OO interactions shows that here the failures are more serious. At rank L 5 3, the absolute energy error for the first failed interaction is 0.5 kJ mol21 at the minimum ˚ . The second failed interaction is internuclear distance of 4.4 A at rank L 5 4 with an absolute energy error of 0.7 kJ mol21. The pairs of oxygen atoms should be separated by 4.9 and 3.0 ˚ , for the first and second interaction, respectively, To stay A 350

Journal of Computational Chemistry 2014, 35, 343–359

Table 4. Predictions made for the peptide Ab1–42 based on the data obtained for crambin, for each interaction type listed in Table 3 (except SS). Type

Ninter[a]

Nconv[b]

NL53[c]

NL54[c]

Nfail[d]

HH OH NH CH SH ON OO OC SN CC CN OS NN SC Total

51 56 59 60 37 23 26 21 11 43 28 18 14 34 481

49 55 49 51 32 22 24 15 11 36 24 18 11 31 428

46 50 44 43 25 16 18 11 9 33 13 13 1 16 338

3 5 5 5 7 6 4 4 2 3 11 5 10 15 85

0 0 0 3 0 0 2 0 0 0 0 0 0 0 5

[a] Ninter is the total number of interactions. [b] Nconv is the number of convergent interactions. [c] NL53 and NL54 are the number of interactions computed at rank L 5 3 and L 5 4, respectively. [d] Nfail is the number of interactions that failed in the prediction (the interactions for which the energy differences between the rank given in Table 3 and rank L 5 10 is greater than 0.4 kJ mol21).

within the 0.4 kJ mol21 energy limit. In other words, the con˚ longer than the values in vergence distances are 0.5 and 0.2 A Table 3, respectively, for the first and the second failed OO interactions. Before we address the second key question we briefly digress commenting on the distribution of internuclear distances in crambin. Because of the mandatory convergence, we only calculated the multipolar electrostatic energies of atomatom interactions in the RC region. Fortunately, most of the atom-atom interactions occur in the RC region. In crambin, there are 642 3 (642 2 1)/2 5 205,761 atom-atom interactions across all 15 possible interaction types. Again, because there are only four sulfur atoms (two disulfide bonds) in crambin, we do not consider the distribution of the SA (where A can be C, H, O, or N) interactions, leaving 201,688 atom-atom interactions (or 98% of grand total quoted above). Table 5 gives the total number of atom-atom interactions for each of the possible atom-atom interaction types that exclude S, with the number of interactions that converge and diverge, and their average internuclear distance. Taking the CC interaction as an example, there are in total 20,272 CC interactions in crambin. Figure 6 shows the distribution of their internuclear distances. It is clear from Figure 6 that the distribution of CC internu˚ (see also averages in Table clear distances peaks around 13.0 A 5). The number of interactions that occur in the RD region (or R < 3.4 A˚), which is 426, accounts for a very small part of the total number of interactions. The percentage of the divergent CC interactions is only 2%, which is a pleasing result. From Figure 7 (which is the CC equivalent of Fig. 5, which focused on OH) it is clear that L 5 5 is sufficient to reach the preset energy threshold of DE 5 0.1 kJ mol21, with very few exceptions. Figure 7 shows the energy difference between the reference energy calculated at rank L 5 10 and energies computed at WWW.CHEMISTRYVIEWS.COM

WWW.C-CHEM.ORG

FULL PAPER

Table 5. Statistical data on all 10 possible atom-atom interaction types that exclude S in crambin. Type HH OH NH CH ON OO OC CC CN NN

N[a]

RD[b]

RC[c]

Average[d]

Percentage[e]

49,391 20,139 17,310 63,553 3517 2015 12,911 20,272 11,097 1483

170 108 228 965 77 8 255 426 330 47

49,221 20,231 17,082 62,588 3440 2007 12,656 19,846 10,767 1436

13.5 13.3 13.0 13.2 12.7 13.2 13.0 13.0 12.7 12.6

0.3 0.5 1.3 1.5 2.2 0.4 2.0 2.1 3.0 3.2

[a] N represents the total number of each atom-atom interaction type in crambin. [b] RD denotes the number of interactions in the RD region. [c] RC is the number of interactions in the RC region. [d] Average means the average internuclear distance for each interaction type. [e] Percentage refers to the number of interactions in the RD region over the total number of interactions N.

L 5 1, 2, 3, 4, and 5. A horizontal line is drawn at DE 5 0.1 kJ mol21 in Figure 7. In summary, at HF/6-31G(d,p) level all 15 possible types of atom-atom interactions converge within 0.1 kJ mol21 from 2.1 ˚ for HH and 3.7 A ˚ for SS. Although this information was gathA ered from crambin it is valid for a completely different polypeptide Ab1–42, with very few exceptions. Question 2: How reliable is the assumption of using a single (capped) amino acid to reproduce the atom-atom energies in oligopeptides? Technologically, atomic multipole moments cannot (yet) be calculated for atoms appearing in the complete crambin molecule. This is why calculations were so far restricted to fragments of crambin, in particular, each of the 46 single amino acids occurring in crambin. To confirm the reliability of working with a single amino acid, oligopeptides with two (“double”) and three (“triple”) amino acids were also cut out from crambin. Again, such larger fragments are capped in the same manner as a single amino acid. These dipeptides (in the strict sense of a two-amino-acid oligopeptide) include ThrThr, GlyThr, ProGln, ThrGly, IleIle, TyrAla, and CysCys. The latter system provided 16 disulfide bonds. Only one tripeptide (again in the strict sense of the word) was investigated: ProGlyAla. In

˚ ) in crambin. Figure 6. The distribution of CC internuclear distances R (A The total number of interactions is 20,272. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

Figure 7. The energy difference (DE, kJ mol21) between the reference energy (of the Coulomb component only) calculated at rank L 5 10 and that calculated at ranks L 5 1, 2, 3, 4, and 5, for CC interactions in crambin. Low ranks clearly lead to large energy differences, which, however, ˚ ). Only converged interdecrease with increasing internuclear distance R (A ˚ (see Table 2). actions are shown, right of the vertical line at RC 5 3.4 A [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

total there are 22 dipeptides, referred to as “A1A2,” and one tripeptide “A1A2A3.” Moreover, 14 disulfide bonds were isolated from the AADH enzyme (see above). MORPHY01 obtained the atomic multipole moments of all atoms in all A1A2 and A1A2A3 systems operating on HF/6-31G(d,p) wave functions. The minimum internuclear distances of all 15 types of atomatom interactions of the RC region are now calculated based on di- and tripeptides. Table 6 lists these values. As before, the interaction types are ranked according to increasing minimum values of the RC region. All 15 atom-atom interaction types of Table 2 can be found back in Table 6. The pleasing news that Table 6 brings is that its minimum internuclear distances on the RC region are equal to the corresponding values in Table 2. The only exceptions are the CH and ON interactions. ˚ Fortunately, the differences are small: 0.1 A˚ for CH and 0.2 A for NO. Moreover, the minimum internuclear distances of the RC region are larger in double and triple amino acid groups than in single acids (for both CH and ON interactions). This means that these two interactions are somewhat more difficult to converge in oligopeptides. However, these differences are small and within an acceptable range. In summary, the minimum internuclear distances of all 15 interactions in region RC can be safely transferred from single acids to di and tripeptides. Moreover, Lorenzo et al. [69] geometry optimized (at B3LYP/ 6-31G(d,p) level) a series of nine “triple amino acids” A1A2A3 where the central amino acid is always glycine (G), whereas its neighboring residues on either side can be glycine (G), alanine (A), or serine (S). In general, the presence of the two neighbors does not much influence the geometry of the central glycine compared to its geometry with regard to that in the triple amino acid GGG. The only exceptions to this observation occur in GGS and SGS, where the side-chain of serine is involved in a hydrogen bond between its hydroxyl hydrogen and the C5O oxygen of the peptide group adjacent to serine’s Ca. This illustrates that the only significant factor is an intramolecular Journal of Computational Chemistry 2014, 35, 343–359

351

FULL PAPER

WWW.C-CHEM.ORG

Table 6. The number (N) of atom-atom interactions of each the 15 possible types occurring in crambin and the minimum internuclear distance (RC, A˚) of the convergent regions (of the energy multipole expansion). All calculations are performed on di and tripeptides (for details on which ones, see main text) at HF/6–31G(d,p) level. Type

N

RC

HH OH NH CH SH OO ON OC SN CC CN OS NN SC SS

44 52 53 54 28 31 37 43 28 46 60 37 30 36 20

2.1 2.5 2.7 2.8 2.8 2.8 3.1 3.2 3.2 3.4 3.4 3.4 3.6 3.7 3.7

hydrogen bond, that is, a hydrogen bond between the sidechain and the backbone, which causes geometrical changes in the central amino acid. To better understand the transferability of interatomic electrostatic energies from a single amino acid to a dipeptide, we investigated the case study of a ThrGly fragment in crambin. Figure 8 shows the geometries of this system of two covalently bonded amino acids: threonine (Thr30) and glycine (Gly31) forming Thr30Gly31. First, threonine and glycine are cut out from crambin, one by one as two single amino acids, and each is capped at the C- and N-terminus in the usual way. Subsequently, these same two amino acids are cut out from crambin, but now together, as the dipeptide Thr30Gly31. This dipeptide is capped as well in the usual manner as well. The molecular geometries of both the single acids Thr30 and Gly31,

Figure 8. The molecular geometries of the single amino acids Thr30 and Gly31, and the dipeptide Thr30Gly31 as cut out from crambin. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

352

Journal of Computational Chemistry 2014, 35, 343–359

and the dipeptide Thr30Gly31, are shown in Figure 8. Investigated atom-atom interactions occur both within a single amino acid and a dipeptide. The Scheme below shows the molecular skeletons of the single amino acids Thr30 and Gly31, and the dipeptide Thr30Gly31. The common fragments between Thr30 and Thr30Gly31 are highlighted in the red dashed rectangle, whereas the fragments in the black dashed rectangle are the common fragments between Gly31 and Thr30Gly31. There are 174 atom-atom interactions (1,n with n  4), of all 15 interaction types, which are common to both fragments in the red rectangle. The number of atom-atom interactions common to both fragments in the black rectangle is 64. The electrostatic energies of the atom-atom interactions are calculated separately, in Thr30, in Gly31, and in Thr30Gly31 Unfortunately, these energy values differ somewhat from the values calculated from the two single amino acids Thr30 and Gly31. For each interaction, the absolute energy deviation between single amino acid and dipeptide is denoted DE 5 |Esingle 2 Edouble|. Figure 9a plots the relative number or proportion of interactions between the two fragments in the red rectangle, within a given energy deviation DE.

Scheme The common fragments between two single amino acids and the corresponding dipeptide. The red dotted rectangle represents the common fragments between Thr30 and Thr30Gly31. The black striped rectangle highlights the common fragments between Gly31 and Thr30Gly31.

In Figure 9a, the proportion of interactions within an absolute energy deviation of less than 1.0 kJ mol21 is 57%. However, for an absolute energy deviation of 4.0 kJ mol21 (or 1 kcal mol21) the corresponding proportion is 89%. In other words, the multipole moment of the single amino acid Thr30 are 89% accurate (compared to Thr30Gly31) if an energy threshold of 4.0 kJ mol21 is allowed. The proportion rises to 100% with DE increasing from 4.0 to 9.8 kJ mol21. The interaction that yields the highest absolute energy deviation of 9.8 kJ mol21 is of the type NN, which is a 1,4 interaction common to Thr30 and Thr30Gly31. The mean absolute energy deviation of these 174 atom-atom interactions is 1.5 kJ mol21. Figure 9b shows the relative number of interactions between the two fragments in the black rectangle (in the scheme above) as a function of the energy deviation DE. In Figure 9b, the proportion of interactions is 42% within an absolute energy deviation of 1.0 kJ mol21. For an absolute energy deviation of 4.0 kJ mol21 the corresponding proportion is 86%, which is almost the same as that for red rectangle. The highest absolute energy deviation is 7.0 kJ mol21, which occurs between Ca and nitrogen in the capped group (on the right-hand of Ca in Gly31). The mean absolute energy deviation of these 64 atomatom interactions is 1.9 kJ mol21. In summary, electrostatic energies between two atoms in a single amino acid differ, on average, by less than 2 kJ mol21 from the energy between the same atoms in a dipeptide WWW.CHEMISTRYVIEWS.COM

FULL PAPER

WWW.C-CHEM.ORG

Table 7. The number (N) of atom-atom interactions of each the 15 possible types occurring in crambin and the minimum internuclear distance ˚ ) of the convergent regions (of the energy multipole expansion). All (RC, A calculations are performed on single amino acids at B3LYP/aug-cc-pVTZ level.

Figure 9. Proportion of interactions (%) within a given energy deviation (kJ mol21) (Coulomb component only), occurring in the two fragments in the Scheme in the main text for a) the red rectangle (i.e., Thr30 and Thr30Gly31) and b) the black rectangle (i.e., Gly31 and Thr30Gly31). [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

containing this amino acid. Almost half of the atom-atom energies differ by less than 1 kJ mol21. Question 3: What is the influence of the level of theory on the electrostatic energy values? To test the reliability of the HF/6-31G(d,p) level in calculating minimum internuclear distances of all 15 atom-atom interaction types in the RC region, higher levels of theory are used. These higher levels of theory include B3LYP/aug-cc-pVTZ and MP2/augcc-pVTZ. In Table 2, all the minimum internuclear distances of ˚ . With this knowlthese 15 interaction types are smaller than 4 A edge we can restrict the search interval of internuclear distances from 4 to 8 A˚. This harmless restriction will save computation time. Consequently, the number of each atom-atom interaction type at B3LYP/aug-cc-pVTZ level is smaller than at HF/6–31G(d,p) level. The final values are presented in Table 7. Table 7 shows that HH, OH, OC, OS, SC, and SS interactions yield the same minimum internuclear distances in the RC region as those listed in Table 2. The minimum internuclear distances for the remaining nine interactions in Table 7 are all ˚ ) than the corresponding values in slightly larger (0.1 to 0.3 A Table 2.

Type

N

RC

HH OH NH CH SH ON OO OC SN OS CC CN NN SN SC SS

46 62 54 44 39 20 30 18 31 11 21 15 11 31 42 9

2.1 2.5 2.8 2.9 3.1 3.1 3.1 3.2 3.4 3.4 3.5 3.5 3.5 3.5 3.7 3.7

The minimum internuclear distances of all 15 possible atomatom interaction types that are calculated based on di- and tripeptides are listed in Table 8. These calculations are performed at B3LYP/aug-cc-pVTZ level. By comparing with Table 7 we can see that two interactions of OO and NN, respectively, yield 0.1 ˚ larger minimum internuclear distances in RC, and 0.2 A whereas the remaining 13 atom-atom interaction types yield the same minimum values. Table 8 was also compared with Table 6 (di-, tripeptides, HF/6-31G(d,p) level). Here, seven of the 15 interactions yield the same minimum values in RC between the two tables. In contrast, the remaining eight inter˚) action types generate higher minimum values (0.1 to 0.4 A compared to those in Table 6, including NH, CH, SH, OO, SN, CC, CN, and NN. The absolute values of the electrostatic interaction energy at HF/6-31G(d,p) level are higher than the corresponding values at B3LYP/aug-cc-pVTZ level, for all atom-atom interactions, for single amino acids, di- and tripeptides. Table 8. The number (N) of atom-atom interactions of each the 15 possible types occurring in crambin and the minimum internuclear distance ˚ ) of the convergent regions (of the energy multipole expansion). All (RC, A calculations are performed on di and tripeptides at B3LYP/aug-cc-pVTZ level. Type

N

RC

HH OH NH CH SH ON OO OC SN OS CC CN NN SC SS

35 47 54 32 28 35 30 36 24 11 21 46 22 32 9

2.1 2.5 2.8 2.9 3.1 3.1 3.2 3.2 3.4 3.4 3.5 3.5 3.7 3.7 3.7

Journal of Computational Chemistry 2014, 35, 343–359

353

FULL PAPER

WWW.C-CHEM.ORG

In summary, the B3LYP/aug-cc-pVTZ level of theory and the much more expensive (at least an order of magnitude) MP2/ aug-cc-pVTZ level yield convergence regions that are shifted to slightly larger interatomic distance (by 0.1 to 0.4 A˚) compared to HF/6–31G(d,p) level. For the single amino acids, the minimum values calculated at MP2/aug-cc-pVTZ level are very close to those computed at B3LYP/aug-cc-pVTZ level. As MP2/aug-cc-pVTZ calculations are extremely expensive computationally (see below), we did not obtain the values for all of the 15 atom-atom interactions but only those shown in Table 9. Comparing Table 9 with Table 2 and with Table 7 we can see that, except for HH and OH interactions, all interactions yield somewhat larger minimum internuclear distances at MP2/aug-cc-pVTZ level than at HF/6– 31G(d,p) level. The minimum internuclear distances in Table 9, however, are equal to or slightly smaller than the values in Table 7 (single amino acids at B3LYP/aug-cc-pVTZ level). Balancing the pros and cons of these three levels of theory suggests that B3LYP is the best choice. Finally, it should be mentioned that the MP2 method takes more than 20 times as long as the B3LYP method for threonine (26 atoms), for the same basis set, a ratio that deteriorates to a factor of 38 for ThrGly. All HF calculations, which use the much smaller basis set of 6–31G(d,p) finish in the order of minutes for all systems, which explains why this level featured in the exploratory work of this article. Question 4: For a selected central atom, how do its neighbors influence its multipole moments and at which distance can their influence be ignored? The determination of how each atom in a protein is influenced by its neighbors and at which distance this influence can be ignored is fundamental in the construction of any force field because it quantifies transferability. We will focus on each of the five most abundant elements appearing in proteins. We take a given atom (i.e., element) to be the center of a socalled horizon sphere, and will first discuss at length the case of a Ca central atom. We incrementally increase the radius r of ˚ , and observe the growing this horizon sphere in steps of 0.1 A number of other atoms appearing inside the sphere. At every step, new atoms are potentially found. If not, the horizon sphere grows by another step. We will study a set of nested structures, each containing the central atom, and observe how its multipole moments change with increasing structure size. For this purpose, we need to construct closed-shell and neu-

Table 9. The 15 types of atom-atom interactions based on single amino acids and the minimum internuclear distance of the energy multipole ˚ ) region at the MP2/aug-cc-pVTZ level. expansion in the convergent (RC, A

354

Type

N

RC

HH OH CH NH ON OC CC

27 12 20 18 8 9 13

2.1 2.5 2.8 2.8 3.1 3.2 3.5

Journal of Computational Chemistry 2014, 35, 343–359

tral molecules that represent these structures, which have been sampled from the PDB data. As new atoms enter the horizon sphere it makes sense to cap them with hydrogens such that their standard valence is satisfied, and they form a closed-shell and neutral molecule. In particular, the valence of a newly encountered atom is actually unknown but completed (saturated) to its highest possible value using only hydrogen atoms. The highest valence of carbon is 4, that of nitrogen is 3, that of oxygen and sulfur is 2, and that of hydrogen is 1. For example, if a terminal carbon atom connects to another atom as a single bond [AC], then three hydrogen atoms will be added to this carbon atom resulting in [ACH3]. As a second example, if a terminal carbon atom is singly bonded to another carbon and doubly bonded to an oxygen [AC(@O)], then one hydrogen will be added to this terminal carbon. Hydrogen causes the smallest perturbation to the system it becomes attached to. Hence, it can be introduced while more or less preserving the electron distribution of the original atoms occurring in the protein. We always construct closed-shell and neutral molecules (i.e., with an even number of electrons and without net charge). A consequence of this procedure is that some molecules may end up with local connectivities that do not occur in the protein. For example, the fragment [ANHACaAC(@O)A], which occurs in the original protein, becomes [ANHACaACH3A], which does not occur in the protein. This may not be desirable but we give priority to a systematic control of the growth of the environment of the central atom being studied. We now discuss the construction of a horizon sphere centered on Ca in Ser6, which serves a case study and which is situated at the beginning of the top a-helix region in Figure 1. ˚ , such that all atoms covalently The initial r value is set to 2 A bonded to Ca (1,2 interactions) are already included in the sphere. The radius of the horizon sphere is then increased in ˚ . The steps of 0.1 A˚ until it reaches the maximum value of 12 A wave functions of all hydrogen-capped structures that occur as the horizon sphere grows, are computed at HF/6–31G(d,p) level. The program AIMAll then obtains the multipole moments of the Ca atom in each structure of the growing sequence of structures. The nearest neighbors of the Ca atom form the initial structures and will be those that have the greatest influence on the multipole moments of this central Ca atom. The largest structure, for which the radius is 12 A˚, is taken as the reference structure and is shown in Figure 10. In this reference structure, there are 294 atoms and four atoms (one of each element or C, H, O and N) are selected as “probing atoms.” The multipole moments of the latter always remain invariant. They are combined with the varying multipole moments of Ca atom in eq. (2) to yield the electrostatic energy of interaction between Ca and a probing atom. The internuclear distance between the central atom and the four ˚ , RCaH 5 4.9 A ˚ , RCaN 5 8.8 A˚ and probing atoms are: RCaC 5 5.9 A ˚ RCaO 5 7.6 A. The multipole moments of the Ca atom in the reference structure are then replaced by this atom’s multipole moments, ˚  r < 12 now calculated in the other structures, for which 2 A ˚ A. The interaction energies of CaC, CaH, CaO, and CaN WWW.CHEMISTRYVIEWS.COM

WWW.C-CHEM.ORG

Figure 10. The system corresponding to the largest horizon sphere (r 5 12 ˚ ) centered at Ca. Four atoms (one of each of the elements C, H, O, and N) A are selected as “probing atoms.” The central atom Ca is in Ser6, which is located at the beginning of the top a-helix in Figure 1. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

interactions in the reference structure are computed using the in-house program NYX. Subsequently, a sequence of interaction energies of CaC, CaH, CaO, and CaN is computed based on the different multipole moments of the Ca atom occurring in the sequence of structures. Figures 11a–11d show the interaction energy of CaC, CaH, CaO, and CaN as a function of the growing radius r of the sequence of horizon spheres. If the new atoms entering the growing sphere do not markedly affect the interaction energies of any of the four interactions between Ca and the probe atom, then the multipole moments of Ca are deemed stable. In our work, the rule for judging this “stabilization” behavior of the Ca atom’s multipole moments operates according to the sum of the interaction energy of CaC, CaH, CaO, and CaN, which is denoted Esum. The multipole moments of the Ca atom stop changing when the total energy Esum remains stable in the last 20% of the total number of horizon spheres (see Fig. 11e). In other words, when the difference between Esum for the reference sphere and Esum for any ˚ , is smaller than sphere with a radius between 10.4 and 12 A 21 21 0.4 kJ mol (or 0.1 kcal mol ). Figures 11a–11d show that the energies of the four possible interactions CaA(C,H,O, or N), change dramatically when the ˚ . This means that the radius r increases from about 2.0 to 2.8 A new atoms entering the structure of the early horizon spheres have a great influence on the multipole moments of Ca in this range of internuclear distances, which is one of 1,3 interactions. Typically, there is then a short flat region between 2.8 ˚ as there is no new atom entering the horizon and 3.1 A sphere and hence the multipole moments of the Ca atom stay ˚ onward, the absolute value of the the same. From r 5 3.1 A interaction energies of CaC, CaH, CaO, and CaN increases, but does so much more slowly than at the outset. This indicates that with the radius increasing, the influence of the new atoms on the multipole moments (of Ca) becomes smaller. The energies of the four interactions stay more or less flat within the ˚ to r 5 12 A˚ in Figure 11a–11d. These four interval r 5 10.5 A

FULL PAPER

interaction energies have thus become stable until they reach ˚ at which point the multipole moments of Ca will r 5 10.5 A not be influenced by new atoms entering the horizon sphere. There are in total 236 atoms in this last horizon sphere. The total interaction energy of CaC, CaH, CaO, and CaN is obtained by summing their individual interaction energies for ˚ to an r value close to 12 A ˚. each structure from r 5 10.5 A These total energies are then compared with the total energy of these four interactions in the reference structure (r 5 12.0 ˚ ). The energy differences are summarized in Table S1 of the A Supporting Information. This table shows that all the total energy discrepancies from the reference energy are less than 0.4 kJ mol21. These total energy differences are also shown in ˚ . In Figure 11e, Figure 11e, where the red line marks r 5 10.5 A 21 there are all not greater than 0.4 kJ mol . This illustrates that the multipole moments of the Ca atom become stable from ˚ and the influence of other atoms can the sphere of r 5 10.5 A be ignored. In addition, we also calculated the distances at which one can ignore the influence of new atoms (entering the horizon sphere) on multipole moments of each of the four elements (H, O, N, and S) other than C. The minimum radius values of the horizon sphere are listed in Table 10. As we predicted, the H atom returns the smallest horizon sphere. Of course, which radius should be chosen depends on which degree of accuracy we expect to reach. For example, in Figure 11e the total ˚ and the total energy discrepancy is 1.0 kJ mol21 at r 5 9.8 A ˚ energy discrepancy at r 5 8.8 A is less than 4.0 kJ mol21 (1 kcal mol21). In summary, the influence of the neighbors for each possible element (Ca, H, O, N, and S) was investigated by computing the minimum radius of the “horizon sphere” at which the influence on the central atom of new atoms, entering the growing sphere, can be ignored. The minimum radius values ˚ for the five investigated elements. Thus, are all about 10 A atoms are not only affected by their nearby neighbors but also by distant ones. Hence, the rather large environment corresponding to this radius of about 10 A˚ should be considered to obtain accurate atomic multipole moments as well as atomatom electrostatic interaction energies. Question 5: What is the convergence behavior of interactions between groups of atoms, in particular, is the energy difference obtained from AMBER and QCT for a total group-group interaction smaller than that for individual atom-atom interactions? Here we make contact with the popular AMBER force field, and use atom-atom interactions between two different peptide groups [AC(@O)ANHA] as a case study. The electrostatic energies of all atom-atom interactions between a pair of peptide bonds are computed using both QCT and AMBER electrostatics. Do AMBER interaction energies start resembling QCT energies at increasing peptide bond interaction distance R? Figure 12 helps to answer this question, the answer to which will be affirmative. Figure 12 shows the energy difference of each atom-atom interaction, DE 5 EQCT 2 EAMBER, as a function of R. The energy difference DE of each atom-atom Journal of Computational Chemistry 2014, 35, 343–359

355

FULL PAPER

WWW.C-CHEM.ORG

Figure 11. The purely electrostatic (Coulomb) interaction energy (kJ mol21) between Ca (at the center of the horizon sphere) and four possible probing ˚ ; c) CaN, ˚ ): a) CaC, internuclear distance R 5 5.9 A˚; b) CaH, R 5 4.9 A atoms (C, H, N, and O, one for each panel) as a function of the sphere’s radius r (A ˚ ; d) CaO, R 5 9.1 A ˚ ; e) the total energy Esum of these four interactions versus the radius r of the horizon spheres. Note the different energy scales R 5 8.8 A between the five panels. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

Table 10. The minimum radius values (A˚) of the “horizon sphere” for all elements Ca, H, O, N, and S at which distance the influence on their multipole moment, of the new atoms entering a growing horizon sphere, can be ignored. Atom H Ca O N S

r

N[a]

DE[b]

7.7 10.5 11.0 11.3 10.8

46 75 84 80 83

0.4 20.3 20.3 0.3 20.4

[a] N is the number of horizon spheres for each element. [b] DE refers to the total energy discrepancy (kJ mol21) from the reference sphere energy.

356

Journal of Computational Chemistry 2014, 35, 343–359

interaction is computed between the QCT multipolar electrostatic energy (L 5 10) and the corresponding AMBER point charge electrostatic energy. The distance R (between a pair of peptide bonds) is the average of all 16 internuclear distances between the four atoms in each peptide bond. The distances between the peptide groups are all greater than 5.0 A˚ to make sure that all the atom-atom interactions are convergent. Twelve pairs of peptide groups are selected in crambin with ˚ . A peptide group is distances changing from 6.0 to 27.0 A composed of the four atom types C, O, N, and H. Thus, there are in total 10 types of atom-atom interactions between two peptide bond groups: CC, NN, CN, CO, NO, OO, NH, CH, OH, and HH. Figure 12 plots the interaction energy differences DE WWW.CHEMISTRYVIEWS.COM

FULL PAPER

WWW.C-CHEM.ORG

Figure 12. The energy differences DE (kJ mol21) of all atom-atom interactions in a pair of peptide bonds between QCT (L 5 10) and AMBER electrostatics, versus the peptide bond distances R (A˚). There are 10 types of atom-atom interactions and one total interaction energy. The solid lines and dashed lines represent the energy differences when DE > 0 or < 0, respectively. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

(in logarithmic scale) versus the peptide bond distance R. The solid lines represent the energy differences when DE > 0, whereas the dashed lines those when DE < 0. From Figure 12 it is clear that the CC interaction, with a DE ˚ , shows the greatest value of over 667 kJ mol21 at R 5 6.1 A energy difference between QCT and AMBER electrostatics. The energy difference decreases visibly with increasing distance R. Finally, DE decreases to 151 kJ mol21 at R 5 27.8 A˚ for this interaction type. The second largest energy difference occurs for the NN interaction. Now the energy difference between ˚ , dropping QCT and AMBER is nearly 571 kJ mol21 at R 5 5.6 A 21 ˚ to 122 kJ mol at R 5 25.7 A. This interaction is followed by CN and CO interactions, both of which reach also high (in terms of magnitude) energy differences of 2564 kJ mol21 and ˚ , respectively. The 2534 kJ mol21 at R 5 6.4 and R 5 5.1 A energy differences of CN and CO interactions decrease (in terms of magnitude) to 2132 and 2108 kJ mol21, respectively, when the corresponding distances rise to 26.7 and 27.8 A˚. They are followed by NO and OO interactions. Unsurprisingly, the HH interactions possess the smallest energy difference DE, ˚ and 7 kJ mol21 at which is less than 30 kJ mol21 at R 5 5.6 A ˚ R 5 25.0 A, respectively. To sum up, the AB interactions (neither A nor B is H) have much greater energy discrepancies than AH and HH interactions. The energy profiles of AMBER and QCT start becoming more similar with increasing R, espe˚. cially at long range when R > 20.0 A From this analysis we see that the atom-atom electrostatic interaction energies calculated by the high-rank topological multipole moments are quite different from the values computed by the AMBER force field, in particular at short and medium range. We discuss a CN interaction as an example, where C is in one peptide group and N in another peptide group, about 10 A˚ away from each other. The AMBER force field assigns to C a point charge value of 10.74, in its own local units, which corresponds to 0.59 e. The N atom is assigned a value of 29.41 (or 20.52 e), and the internuclear ˚ . So, using a formula proposed by AMBER, distance RCN is 9.5 A

the CN interaction energy EAMBER 5 [10.74 3 (29.41)]/ 9.5 5 210.6 kcal mol21 5 244.7 kJ mol21. However, the QCT net charges (essentially monopole moments or QCT electronic population corrected for nuclear charge), for the same atoms C and N are 1.80 and 21.67 e, respectively. The corresponding charge-charge QCT electrostatic interaction energy is then EQCT 5 [1.80 3 (21.67)/(9.5 3 1.89)] 3 2625.5 5 2439.5 kJ mol21, which is about an order of magnitude larger than that of AMBER. Note that the numbers 1.89 and 2625.5 are the ˚ to bohr and from hartree to kJ conversion factors from A 21 mol , respectively. This difference is simply caused by the fact that the QCT net charge is around three times the AMBER charge, or 1.80/0.59 5 3.05 and 21.67/20.52 5 3.21. Therefore, EQCT is about 9.8 (3.05 3 3.21 5 9.8) times EAMBER. Finally, we ask if the energy difference obtained from AMBER and QCT for a total group-group interaction is smaller than that for individual atom-atom interactions. To answer this question, all the atom-atom interactions between two peptide bonds are summed to obtain the total group-group interaction energy. There are four atoms in a peptide bond and each atom in one peptide bond interacts with all atoms in the other peptide bond, so there are in total 16 atom-atom interactions for a pair of peptide bonds. The red profile in Figure 12 shows that the total energy difference of DE is greatly diminished compared with the atom-atom interactions at the ˚ is 119 kJ same distance. The energy discrepancy at R 5 6.0 A 21 21 mol and then drops to only 23 kJ mol when R reaches ˚ . Hence, the energy difference between AMBER and QCT 26.9 A is much smaller for the total group-group interactions than for individual atom-atom interactions. Two more comments are in place here. Vigne–Maeder and Claverie[70] and subsequently Day et al. [71] have shown that a correct reproduction of the electrostatic potential at relevant distances can be obtained if the multipolar expansion is truncated at quadrupole level, with centers on atoms as well as on bond-midpoints, or at octupole level with solely nuclear centers. Future QCT research could reveal if lower L ranks suffice exploiting work[72] that has already addressed off-nucleus expansion sites in the context of the topological approach. Also, in line with the two afore mentioned papers, other authors[73,74] demonstrated that atom- and bond-centered multipoles up to quadrupoles (augmented with a penetration correction) safely reproduce the Coulomb contribution from energy-decomposition analyses. In summary, a QCT atom-atom electrostatic energy is typically about 10 times larger than its AMBER counterpart. However, the energy difference between AMBER and QCT for the total group-group interactions is smaller than the individual atom-atom interactions. This is because the individual atomatom interaction energy differences cancel each other when summed. Second, the behavior of AMBER becomes closer to that of QCT with increasing peptide bond interaction distance.

Conclusions High-rank topological atomic multipole moments were employed to study the convergence behavior of the atom-atom Journal of Computational Chemistry 2014, 35, 343–359

357

FULL PAPER

WWW.C-CHEM.ORG

electrostatic interaction energy. An energy evaluated at in the interaction L 5 10 is taken as the “exact” (reference) energy, which is computationally faster than calculating exact 6D interaction energies. At HF/6–31G(d,p) level all 15 possible types of atom-atom interactions converge within 0.1 kJ mol21 from 2.1 A˚ ˚ for SS. Although this information was gathered for HH and 3.7 A from crambin it is valid for a completely different polypeptide called Ab1–42, with very few exceptions. The minimum internuclear distances of the convergence region Rc for all 15 interactions can be safely transferred from single amino acids to di and tripeptides. Electrostatic energies between two atoms in a single amino acid differ, on average, by less than 2 kJ mol21 from the energy between the same atoms in a dipeptide containing this amino acid. Almost half of the atom-atom energies differ by less than 1 kJ mol21. The B3LYP/aug-cc-pVTZ level of theory and the much more expensive MP2/aug-cc-pVTZ level yield convergence regions that are shifted to slightly larger interatomic distance (by 0.1 to 0.4 A˚) compared to HF/6–31G(d,p) level. The influence of the neighbors for each possible element (Ca, H, O, N, and S) was investigated by computing the minimum radius of the “horizon sphere” at which the influence on the central atom of new atoms, entering the growing sphere, can be ignored. The minimum radius values are all about 10 A˚ for the five investigated elements. Thus, atoms are not only affected by their nearby neighbors but also by distant ones. Hence, a large environment should be considered to obtain accurate atomic multipole moments as well as the atom-atom electrostatic interaction energies. Finally, a QCT atom-atom electrostatic energy is typically about 10 times larger than its AMBER counterpart. However, the energy difference between AMBER and QCT for the total group-group interactions is smaller than the individual atomatom interactions. This is because the individual atom-atom interaction energy differences cancel each other when summed. Second, the behavior of AMBER becomes closer to that of QCT with increasing peptide bond interaction distance. Keywords: quantum chemical topology • force field • electrostatics • protein • amino acids • peptides • level of theory • ab initio • Multipole Moments • convergence • crambin • Ab1-42

How to cite this article: Y. Yuan, M. J. L. Mills, P. L. A. Popelier, J. Comput. Chem. 2014, 35, 343–359. DOI: 10.1002/jcc.23469

]

Additional Supporting Information may be found in the online version of this article.

[1] W. D. Cornell, P. Cieplak, C. I. Bayly, I. R. Gould, K. M. Merz, D. M. Ferguson, D. C. Spellmeyer, T. Fox, J. W. Caldwell, P. A. Kollman, J. Am. Chem. Soc. 2002, 117, 5179. [2] V. Hornak, R. Abel, A. Okur, B. Strockbine, A. Roitberg, C. Simmerling, Proteins: Struct. Funct. Bioinf. 2006, 65, 712. [3] C. Sagui, L. G. Pedersen, T. A. Darden, J. Chem. Phys. 2004, 120, 73. [4] A. J. Stone, Science 2008, 321, 787. [5] G. M. Day, W. D. S. Motherwell, W. Jones, Cryst. Growth Des. 2005, 5, 1023. [6] J. W. Ponder, D. A. Case, Adv. Protein Chem. 2003, 66, 27.

358

Journal of Computational Chemistry 2014, 35, 343–359

[7] J. G.Vinter, J. Comput. Aided Mol. Des. 1994, 8, 653. [8] N. Gresh, G. A. Cisneros, T. A. Darden, J.-P. Piquemal, J. Chem. Theory Comput. 2007, 3, 1960. [9] R. J. Wheatley, A. S. Tulegenov, E. Bichoutskaia, Int. Rev. Phys. Chem. 2004, 23, 151. [10] A. Holt, J. Bostr€ om, G. Karlstr€ om, R. Lindh, J. Comput. Chem. 2010, 31, 1583. [11] G. M. Day, W. D. S. Motherwell, H. L. Ammon, S. X. M. Boerrigter, R. G. Della Valle, E. Venuti, A. Dzyabchenko, J. D. Dunitz, B. Schweizer, B. P. van Eijck, P. Erk, J. C. Facelli, V. E. Bazterra, M. B. Ferraro, D. W. M. Hofmann, F. J. J. Leusen, C. Liang, C. C. Pantelides, P. G. Karamertzanis, S. L. Price, T. C. Lewis, H. Nowell, A. Torrisi, H. A. Scheraga, Y. A. Arnautova, M. U. Schmidt, P. Verwer, Acta Crystallogr. Sect. B 2005, 61, 511. [12] N. Gresh, J. E.  Sponer, N. A. Spackova, J. Leszczynski, J. Sponer, J. Phys. Chem. B 2003, 107, 8669. [13] C. H. Faerman, S. L. Price, J. Am. Chem. Soc. 1990, 112, 4915. [14] C. Jelsch, V. Pichon-Pesme, C. Lecomte, A. Aubry, Acta Crystallogr Sect. D 1998, 54, 1306. [15] M. Woinska, P. M. Dominiak, J. Phys. Chem. A 2013, 117, 1535. [16] U. Koch, E. Egert, J. Comput. Chem. 1995, 16, 937. [17] C. Kramer, P. Gedeck, M. Meuwly, J. Chem. Theory Comput. 2013, 9, 1499. [18] K. E. Laidig, J. Phys. Chem. 1993, 97, 12760. [19] M. S. Gordon, L. Slipchenko, H. Li, J. H. Jensen, Annu. Rep. Comput. Chem. 2007, 3, 177. [20] N. Gresh, H. Guo, D. R. Salahub, B. P. Roques, S. A. Kafafi, J. Am. Chem. Soc. 1999, 121, 7885. [21] X. Zheng, C. Wu, J. W. Ponder, G. R. Marshall, J. Am. Chem. Soc. 2012, 134, 15970. [22] S. Cardamone, T. Hughes, P. L. A. Popelier, Chem. Soc. Rev. (in press). [23] P. L. A. Popelier, F. M. Aicken, J. Am. Chem. Soc. 2003, 125, 1284. [24] P. L. A. Popelier, F. M. Aicken, Chem. Eur. J. 2003, 9, 1207. [25] R. F. W. Bader, Atoms in Molecules. A Quantum Theory; Oxford University Press: Oxford, Great Britain, 1990. [26] P. L. A. Popelier, Atoms in Molecules. An Introduction; Pearson Education: London, Great Britain, 2000. [27] P. L. A. Popelier, In Structure and Bonding. Intermolecular Forces and Clusters, Vol. 115; D. J. Wales, Ed.; Springer: Heidelberg, Germany, 2005; p. 1. [28] P. L. A. Popelier, E. A. G. Br emond, Int. J. Quantum Chem. 2009, 109, 2542. [29] P. L. A. Popelier, F. M. Aicken, Chem. Phys. Chem. 2003, 4, 824. [30] P. L. A. Popelier, L. Joubert, D. S. Kosov, J. Phys. Chem. A 2001, 105, 8254. [31] P. L. A. Popelier, D. S. Kosov, J. Chem. Phys. 2001, 114, 6539. [32] L. Joubert, P. L. A. Popelier, Mol. Phys. 2002, 100, 3357. [33] M. Rafat, P. L. A. Popelier, J. Chem. Phys. 2006, 124, 144102. [34] M. Rafat, P. L. A. Popelier, J. Chem. Phys. 2005, 123, 204103. [35] D. S. Kosov, P. L. A. Popelier, J. Chem. Phys. 2000, 113, 3969. [36] D. S. Kosov, P. L. A. Popelier, J. Phys. Chem. A 2000, 104, 7339. [37] P. L. A. Popelier, M. Rafat, Chem. Phys. Lett. 2003, 376, 148. [38] C. J. F. Solano, A. M. Pendas, E. Francisco, M. A. Blanco, P. L. A. Popelier, J. Chem. Phys. 2010, 132, 194110. [39] L. Joubert, P. L. A. Popelier, Phys. Chem. Chem. Phys. 2002, 4, 4353. [40] M. S. Shaik, M. Devereux, P. L. A. Popelier, Mol. Phys. 2008, 106, 1495. [41] S. Y. Liem, P. L. A. Popelier, M. Leslie, Int. J. Quantum Chem. 2004, 99, 685. [42] S. Y. Liem, M. S. Shaik, P. L. A. Popelier, J. Phys. Chem. B 2011, 115, 11389. [43] S. Liem, P. L. A. Popelier, J. Chem. Phys. 2003, 119, 4560. [44] S. Y. Liem, P. L. A. Popelier, J. Chem. Theory Comput. 2008, 4, 353. [45] A. J. Stone, The Theory of Intermolecular Forces, 1st ed.; Vol. 32.; Clarendon Press: Oxford, 1996. [46] C. M. Handley, G. I. Hawe, D. B. Kell, Popelier, Phys. Chem. Chem. Phys. 2009, 11, 6365. [47] C. M. Handley, P. L. A. Popelier, J. Chem. Theory Comput. 2009, 5, 1474. [48] M. G. Darley, C. M. Handley, P. L. A. Popelier, J. Chem. Theory Comput. 2008, 4, 1435. [49] M. J. L. Mills, P. L. A. Popelier, Comput. Theor. Chem. 2011, 975, 42. [50] M. J. L. Mills, P. L. A. Popelier, Theor. Chem. Acc. 2012, 131, 1137.

WWW.CHEMISTRYVIEWS.COM

FULL PAPER

WWW.C-CHEM.ORG

[51] P. L. A. Popelier, In AIP Conference Proceedings, Vol. 1456; 2012; pp. 261–268. [52] P. L. A. Popelier, Curr. Top. Med. Chem. 2012, 12, 1924. [53] W. Wang, R. D. Skeel, J. Chem. Phys. 2005, 123, 164107. [54] M. J. Mills, G. I. Hawe, C. M. Handley, P. L. A. Popelier, Phys. Chem. Chem. Phys. 2013, 15, 18249. [55] D. R. Garmer, W. J. Stevens, J. Phys. Chem. 1989, 93, 8263. [56] M. Rafat, P. L. A. Popelier, J. Comput. Chem. 2007, 28, 832. [57] C. H. Van Etten, H. C. Nielson, J. E. Peters, Phytochemistry 1965, 4, 467. [58] J. P. Ritchie, Chem. Phys. Lett. 2004, 387, 243. [59] B. A. Wallace, N. Kohl, M. M. Teeter, Proc. Natl. Acad. Sci. 1984, 81, 1406. [60] L. Lobb, B. Stec, E. K. Kantrowitz, A. Yamano, V. Stojanoff, O Markman, M. M. Teeter, Protein Eng. 1996, 9, 1233. [61] D. Bang, N. Chopra, S. B. H. Kent, J. Am. Chem. Soc. 2004, 126, 1377. [62] P. L. A. Popelier, Comput. Phys. Commun. 1996, 93, 212. [63] M. Rafat, M. Devereux, P. L. A. Popelier, J. Mol. Graph. Model. 2005, 24, 111. [64] F. M. Aicken, P. L. A. Popelier, Can. J. Chem. 2000, 78, 415. [65] M. Rafat, PhD thesis; UMIST: Manchester, 2005. [66] C. Haettig, Chem. Phys. Lett. 1996, 260, 341.

[67] T. A. Keith, AIMAll, 1997–2010. Available at: http://aim.tkgristmill.com. Accessed 14 Oct 2013. [68] A. M. Pendas, V. Luana, PROMOLDEN, 1st ed.; University of Oviedo: Spain, 2002. [69] L. Lorenzo, M. J. G. Moa, M. Mandado, R. A. Mosquera, J. Chem. Inf. Model. 2006, 46, 2056. [70] F. Vigne-Maeder, P. Claverie, J. Chem. Phys. 1988, 88, 4934. [71] P. N. Day, J. H. Jensen, M. S. Gordon, S. P. Webb, W. J. Stevens, M. Kraus, D. Garmer, H. Basch, D. Cohen, J. Chem. Phys. 1996, 105, 1968. [72] L. Joubert, P. L. A. Popelier, Mol. Phys. 2002, 100, 3357. [73] M. S. Gordon, M. A. Freitag, P. Bandyopadhyay, J. H. Jensen, V. Kairys, W. J. Stevens, J. Phys. Chem. A 2001, 105, 293. [74] J.-P. Piquemal, N. Gresh, C. Giessner-Prettre, J. Phys. Chem. A 2003, 107, 10353. [75] W. Humphrey, A. Dalke, K. Schulten, J. Molec. Graphics 1996, 14, 33.

Received: 18 June 2013 Revised: 4 September 2013 Accepted: 29 September 2013 Published online on 29 October 2013

Journal of Computational Chemistry 2014, 35, 343–359

359

Multipolar electrostatics for proteins: atom-atom electrostatic energies in crambin.

Accurate electrostatics necessitates the use of multipole moments centered on nuclei or extra point charges centered away from the nuclei. Here, we fo...
1MB Sizes 0 Downloads 0 Views