Anal Bioanal Chem (2014) 406:1825–1828 DOI 10.1007/s00216-013-7575-9

FEATURE ARTICLE

Computational contributions to chemistry, biological chemistry and biophysical chemistry: the 2013 Nobel Prize in Chemistry

José A. Sordo

Published online: 23 January 2014. © Springer-Verlag Berlin Heidelberg 2014

The Royal Swedish Academy of Sciences has, over the years, awarded several Nobel Prizes in Chemistry to scientists making groundbreaking contributions to theoretical and computational chemistry. Linus Pauling was awarded the Nobel Prize in 1954 for his research into the nature of the chemical bond. In 1966, the Academy awarded Robert S. Mulliken the Nobel Prize for fundamental work concerning chemical bonds and the electronic structure of molecules determined by the molecular-orbital method. In 1981, the Nobel Prize in Chemistry was shared by Kenichi Fukui and Roald Hoffmann for their theories concerning the course of chemical reactions. Rudolph A. Marcus received the 1992 Nobel Prize for his contributions to the theory of electron-transfer (ET) reactions in chemical systems. The previous Nobel Prize for work in theoretical and computational chemistry was awarded in 1998 and divided equally between Walter Kohn, for his development of density functional theory (DFT), and John A. Pople, for his development of computational methods in quantum chemistry. The Academy has now decided to award the 2013 Nobel Prize in Chemistry to Martin Karplus, Michael Levitt, and Arieh Warshel for the development of multiscale models for complex chemical systems. No doubt, the new Nobel laureates might well borrow Newton's statement: "If I have seen further, it is by standing on the shoulders of giants".

J. A. Sordo (*) Departamento de Química Física y Analítica, Laboratorio de Química Computacional, Facultad de Química, Universidad de Oviedo, Julián Clavería 8, 33006 Oviedo, Spain e-mail: [email protected]

Computer simulations

Computer simulations represent a theoretical tool that bridges the gap between the micro- and macroscopic worlds through the use of statistical mechanics [1]. One starts from the microscopic components of matter, namely, atoms (nuclei and electrons), which adopt a specific geometrical disposition, as well as from the laws governing their mutual interactions. Then, statistical mechanics makes use of that information to generate theoretical predictions for the macroscopic properties of experimental interest. Let us focus on a common macroscopic system: a given volume (V) of water at a given temperature (T) containing a given number of molecules (N). From a dynamic microscopic viewpoint, the huge number of water molecules present are in continuous motion. When we measure a given property, P(r^N, p^N, t), we obtain an average value, ⟨P⟩_time, that corresponds to the contributions from the instantaneous values adopted by the property P at different points in phase space (space and momentum coordinates) as the system evolves in time during the measurement process. From a static microscopic viewpoint, an averaged value for P, ⟨P⟩_ensemble, could also be obtained by performing what is called an ensemble average over all possible quantum states of the system, under the assumption that every quantum state with a given energy is equally likely to be occupied. While in the so-called molecular dynamics (MD) approach one estimates the time-averaged property, the Monte Carlo (MC) technique yields the corresponding ensemble-averaged values. It can be shown that, under the so-called ergodic hypothesis, namely, that every accessible point in configuration space can be reached in a finite number of MC steps from any other point, both averages should be equivalent and represent the observed value of the property considered. The MD approach has the great advantage of rendering not only the static properties of the system under study (those obtained from MC simulations) but also its dynamic properties. Some relevant static (thermodynamic) properties are the radial distribution function (a microscopic property that can be experimentally determined) and the temperature, pressure, and heat capacity (macroscopic). The time-correlation functions (microscopic)


and viscosity or thermal conductivity (macroscopic) are among the most important dynamic properties. Special sampling techniques (using appropriate Boltzmann weighting) are required to estimate thermodynamic functions such as the entropy and the Helmholtz or Gibbs free energies. Simulations, particularly the MC technique, partly emerged from wartime scientific research at some of the US DOE National Laboratories, where calculations of nuclear cross sections were crucial to the development of thermonuclear weapons. The very first calculations were published in the 1950s [2, 3]. In an MD simulation, we start from an N-particle system with a given initial geometric disposition and then solve Newton's equations of motion, which are assumed to be valid in the microscopic world (the motions of atoms are treated classically),

m_i (d²r_i/dt²) = −∇_i V(r_1, r_2, … r_N),   i = 1, 2, … N     (1)

in which m_i is the mass and r_i the position of the ith particle, and V(r_1, r_2, … r_N) represents the potential at instant t. The problem is very much like that dealt with more than two centuries ago by Laplace in his celebrated Mécanique Céleste; the main difference is that, in the present case, the electromagnetic forces relevant at the microscopic level replace the gravitational forces that govern the macroscopic (celestial) world. Very efficient numerical algorithms, coupled with appropriate computational resources, allow MD simulations to be carried out today on very large systems (10⁴–10⁶ atoms) and on timescales beyond the microsecond limit. We start from a given initial configuration by assigning initial positions and velocities to all particles in the system and then solve Eq. (1) by employing finite-difference methods to obtain a full trajectory r_i(t) in a number of time steps. Usually, several shorter trajectories run under different initial conditions provide a better understanding of the system under study than a single very long trajectory.
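As an illustration of such a finite-difference solution of Eq. (1), the sketch below implements the velocity-Verlet scheme, one widely used integrator (the article does not single out a particular algorithm), for a single particle in one dimension:

```python
def velocity_verlet(x0, v0, force, mass=1.0, dt=0.01, steps=1000):
    """Minimal velocity-Verlet integration of Newton's equation (Eq. (1))
    for one particle in 1-D. `force` is a callable returning F(x) = -dV/dx."""
    x, v = x0, v0
    f = force(x)
    traj = [x]
    for _ in range(steps):
        v_half = v + 0.5 * dt * f / mass   # half-kick with the old force
        x = x + dt * v_half                # drift
        f = force(x)                       # recompute the force
        v = v_half + 0.5 * dt * f / mass   # second half-kick
        traj.append(x)
    return traj, v

# Usage: a harmonic spring F = -kx with k = m = 1; the total energy of the
# trajectory should stay close to its initial value of 0.5.
traj, v_end = velocity_verlet(x0=1.0, v0=0.0, force=lambda x: -x)
energy = 0.5 * v_end**2 + 0.5 * traj[-1]**2
print(energy)  # ~0.5, conserved to O(dt^2)
```

The bounded energy error of this scheme is one reason symplectic integrators of this family are standard in MD codes.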

Multiscale modeling

In 1975, Levitt and Warshel performed a computer simulation of protein folding [4] using the protein bovine pancreatic trypsin inhibitor (BPTI) as a model; this is a small protein for which the X-ray structure was available at that time. The subject presented an enormous challenge because of the computational resources required. To make the problem computationally tractable, the authors employed a coarse-grained model in which pseudoatoms (spheres with an effective potential) represented the solvated side chains of the protein. Despite the drastic simplifications involved, the model succeeded in that it was able to provide a plausible final conformation for the

protein. That is, the native structure was recovered from the starting unfolded structure. In multiscale modeling, multiple models at different scales are employed simultaneously to describe a given system. To perform simulations of chemical reactions involving large (complex) biological molecules, the small reactive regions (usually tens of atoms) can be treated at higher, computationally more expensive theoretical levels (quantum mechanics, QM), while the chemically "inert" regions (thousands of atoms) are described at the less sophisticated, much less computationally demanding classical molecular mechanics (MM) level, which employs empirically (or semiempirically) parametrized force fields and involves no wave functions. The resulting multiscale model is called the hybrid QM/MM approach. The introduction of the QM/MM model to tackle MD simulations of biomolecules was made by Warshel and Levitt in a seminal 1976 paper on theoretical studies of enzymatic reactions [5]. The whole enzyme–substrate complex was considered: the energy and charge distribution of the atoms directly involved in the reaction were treated quantum mechanically, while the rest of the atoms, including the surrounding solvent, were represented by classical forces. The same authors also implemented Shneior Lifson's consistent force field (CFF) in a computer code that was the seed of some of the most popular programs in today's computational biology, namely CHARMM (Chemistry at HARvard Molecular Mechanics), developed by the Karplus group; AMBER (Assisted Model Building with Energy Refinement), developed by the Kollman group; and GROMOS (GROningen MOlecular Simulation), developed by the van Gunsteren group. Figure 1 illustrates the QM/MM approach for the complex (2665 atoms) formed by the interaction of the MMP-2 metalloenzyme (ribbon model) with a peptide substrate (ball-and-stick model).
The structure was taken from a snapshot of an MD simulation using QM/MM multiscale modeling. The QM region (100 atoms) consists of the catalytically active MMP-2 residues and the peptide linkage. The MM region includes 1700 water molecules. The inset in Fig. 1 shows the charge density embedding in the QM region obtained by DFT. The general form of a force field can be written as,

V(r_1, r_2, … r_N) = V_non-bonded + V_bonded + V_solvation + V_water     (2)

with appropriate analytical expressions for the terms appearing on the right-hand side of Eq. (2); all of them have functional simplicity in common to minimize computational costs. Thus, for example, V_non-bonded usually adopts a form similar to,

V_non-bonded = Σ_i Σ_{j>i} [ q_i q_j / (4π ε_0 ε_r r_ij) + A_ij / r_ij¹² − B_ij / r_ij⁶ ]     (3)
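As a concrete illustration, Eq. (3) can be evaluated directly. The sketch below is a naive O(N²) pair loop in reduced units (the Coulomb prefactor 1/(4πε₀) is set to 1, an assumption made for simplicity); production force-field codes instead use cutoffs, neighbor lists, and Ewald-type summations:

```python
def v_nonbonded(q, A, B, r, coul=1.0, eps_r=1.0):
    """Direct evaluation of the non-bonded energy of Eq. (3):
    a Coulomb term plus a Lennard-Jones term, summed over all i < j pairs.
    q    : atomic charges q_i
    A, B : dicts mapping index pairs (i, j), i < j, to the fitted A_ij, B_ij
    r    : callable returning the interatomic distance r_ij
    coul : Coulomb prefactor 1/(4*pi*eps_0); 1.0 here, i.e. reduced units
           (an assumption made for simplicity)."""
    n = len(q)
    v = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            rij = r(i, j)
            v += coul * q[i] * q[j] / (eps_r * rij)        # Coulomb
            v += A[(i, j)] / rij**12 - B[(i, j)] / rij**6  # Lennard-Jones
    return v

# Usage: two neutral atoms interacting through a pure Lennard-Jones term
# with A = B = 1; the minimum energy -B^2/(4A) = -0.25 occurs at
# r = (2A/B)**(1/6).
rm = 2.0 ** (1.0 / 6.0)
e = v_nonbonded([0.0, 0.0], {(0, 1): 1.0}, {(0, 1): 1.0}, lambda i, j: rm)
print(e)  # ≈ -0.25
```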


Fig. 1 Ribbon model of the catalytic domain of the MMP-2 metalloenzyme. The inset shows the active site in the complex formed with a peptide substrate

Equation (3) is a combined Coulomb and Lennard-Jones potential, in which the q_i are atomic charges, A_ij and B_ij are fitting parameters, and r_ij are interatomic distances. Although the parameters used to define every term in Eq. (2) usually have an empirical or even semiempirical origin, parametrizations based on first-principles (QM) calculations have also been very popular. A good example is provided by the so-called Matsuoka–Clementi–Yoshimine (MCY) potential for water–water interactions (the V_water term in Eq. (2)), developed in the 1970s [6] at the IBM Research Laboratory in San Jose (California). In 1985, Car and Parrinello proposed an alternative approach called ab initio (first-principles) MD, in which Newton's equations (Eq. (1)) are solved for the nuclear dynamics while the Schrödinger equation, within the DFT formalism, is used to describe the electronic motion [7]. The ab initio MD technique is a multiscale modeling technique in which the interatomic forces are computed on the fly by using DFT while the nuclei propagate according to classical mechanics. Despite its high efficiency [8], it represents a much more computer-time-consuming approach than the QM/MM methods.
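The energy bookkeeping behind such multiscale models can be sketched schematically. Note that the original Warshel–Levitt scheme is additive, with explicit QM–MM coupling terms, whereas the subtractive form shown alongside is a later, ONIOM-style alternative; the numerical values in the usage lines are purely illustrative:

```python
def qmmm_additive(e_qm_inner, e_mm_outer, e_coupling):
    """Additive QM/MM energy, in the spirit of the Warshel-Levitt scheme:
    QM energy of the reactive region + MM energy of the environment
    + explicit QM-MM coupling (electrostatics, link bonds, van der Waals)."""
    return e_qm_inner + e_mm_outer + e_coupling

def qmmm_subtractive(e_mm_full, e_qm_inner, e_mm_inner):
    """Subtractive (ONIOM-style) alternative: compute the whole system at
    the MM level, then replace the MM description of the inner region by
    its QM description."""
    return e_mm_full + e_qm_inner - e_mm_inner

# Purely illustrative numbers (arbitrary energy units):
print(qmmm_additive(-3.0, 8.0, 0.5))       # 5.5
print(qmmm_subtractive(10.0, -3.0, 2.0))   # 5.0
```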

Some groundbreaking contributions from the laureates

Of course, there is a plethora of problems for which molecular simulations using multiscale models have provided information of paramount importance to the understanding of processes of physical, chemical, and biological interest. In the following, a sample of subjects is given in which the 2013

Nobel laureates have made representative breakthrough contributions. As early as 1969, Levitt published articles showing the huge utility of the CFF in providing invaluable information on systems of biological interest. In one of those contributions, the first energy minimizations of entire protein structures (myoglobin and lysozyme) were presented [9]. The authors developed a refinement procedure to improve structural information from X-ray diffraction measurements. A second breakthrough consisted of the sequence analysis of tRNA [10]. Levitt used the CFF to refine the Cartesian coordinates and proposed a model for tRNA that was energetically stable and stereochemically plausible. Thus, research into computational biology was underway. Later, Warshel made the transition from energy-minimization calculations towards molecular simulations by publishing, in 1976, the first MD simulation study on the dynamics of a biological process: the vision process [11]. The MD simulation provided a detailed model for the sequence of events in the first step of the vision process, which starts with the absorption of light by the protonated Schiff base of the retinal chromophore bound to the active site of the protein rhodopsin. The structure of rhodopsin was not yet available, and a simplistic model was adopted for the protein–chromophore complex. Even so, the main experimental observations were reproduced, and the simulation explained how the protein makes the photoisomerization process unique. Soon after, in 1977, McCammon, Gelin, and Karplus published their "Dynamics of folded proteins" [12] after performing the very first MD simulation of a macromolecule of biological


interest: the BPTI model protein on which Levitt and Warshel had previously carried out their coarse-grained model study [4]. It consisted of a 9.2-ps trajectory and was performed in vacuum. Despite its crudeness (from today's perspective), Karplus pointed out that the results obtained were "instrumental in replacing our view of proteins as relatively rigid structures with the realization that they were dynamic systems, whose internal motion plays a functional role. The new understanding of protein dynamics subsumed the static picture in that the average positions are still useful for the discussion of many aspects of biomolecular function in the language of structural chemistry. However, the recognition of the importance of fluctuations opened the way for more sophisticated and accurate interpretations of functional properties". In the 1980s, Warshel published the first microscopic simulation of ET reactions in condensed phases [13], as well as the first simulation of an enzymatic reaction [14]. ET reactions represent fundamental processes in photochemistry, for which MD results helped to elucidate the single-step versus stepwise mechanistic alternatives in bacterial photosynthesis. Regarding enzyme catalysis, MD simulations contributed toward understanding the role played by the enzyme in lowering the energy barrier associated with the transition structure (TS), thus controlling the reaction rate. The MD calculations render the dynamic information required to rationalize why enzyme–TS binding is energetically more favorable than enzyme–substrate binding. Also, throughout the 1980s, the so-called simulated-annealing methods, commonly employed for X-ray structure refinement [15] and nuclear magnetic resonance (NMR) spectroscopic structure determination [16], were developed. The interpretations of the relaxation rates (T1 and T2) and nuclear Overhauser enhancement (NOE) measurements in terms of MD calculations are particularly relevant [17].
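The simulated-annealing idea mentioned above can be sketched on a toy objective: Metropolis moves are accepted at a gradually decreasing temperature, so the search can escape local minima early on and settles into a low-energy basin as the system cools. All parameter values below are illustrative assumptions, not those of the refinement protocols cited:

```python
import math
import random

def simulated_anneal(cost, x0, t0=2.0, t_min=1e-3, cooling=0.95,
                     moves=100, step=0.5, seed=7):
    """Toy simulated annealing in 1-D: Metropolis sampling of `cost`
    at a temperature that decreases geometrically. Keeps the best point
    visited over the whole run."""
    rng = random.Random(seed)
    x = best = x0
    t = t0
    while t > t_min:
        for _ in range(moves):
            x_new = x + rng.uniform(-step, step)        # trial move
            dc = cost(x_new) - cost(x)
            # Metropolis criterion: downhill always, uphill with exp(-dc/t)
            if dc <= 0 or rng.random() < math.exp(-dc / t):
                x = x_new
                if cost(x) < cost(best):
                    best = x
        t *= cooling                                    # cooling schedule
    return best

# Usage: a double-well cost whose global minimum lies near x = -1;
# starting from the shallower well at x = +1, annealing should cross over.
x_best = simulated_anneal(lambda x: (x * x - 1) ** 2 + 0.3 * x, x0=1.0)
```

The early high-temperature phase does the barrier crossing; the late low-temperature phase refines the solution, which is the same division of labor exploited in structure refinement.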
Present state-of-the-art massively parallel computers, coupled with the powerful parallel software available, open up the world of MD simulation to very exciting research fields in which ever more complex biological processes can be dealt with in real time. Indeed, continued progress in computational technology (software and hardware) allows one to look forward, with rational optimism, to applications of MD techniques at the cellular scale in the near future. Properly processing the available data libraries, in which the parameters and properties arising from computer simulations are recorded together with the corresponding mechanistic information on a wide variety of chemical, biophysical, and biochemical processes, through the tools provided by bioinformatics should be most useful in stimulating further progress in bioanalytical chemistry [18]. To finish this brief overview of the contributions of the three laureates of the 2013 Nobel Prize in Chemistry, it is apt


to quote the prophetic words of one of the founding fathers of theoretical and computational chemistry, Robert S. Mulliken, from his 1966 Nobel Lecture: "I would like to emphasize strongly my belief that the era of computing chemists, when hundreds if not thousands of chemists will go to the computing machine instead of the laboratory for increasingly many facets of chemical information, is already at hand. There is only one obstacle, namely that someone must pay for the computing time".

Acknowledgments I am indebted to Prof. Alfredo Sanz-Medel (University of Oviedo) for helpful discussions on the relevance of bioanalytics in the present state of the art of analytical chemistry. I also thank Prof. Dimas Suárez (University of Oviedo) for preparing Fig. 1. Financial support from the Ministerio de Educación y Ciencia of Spain (Grant QCT2010-16864) is acknowledged.

References

1. Frenkel D, Smit B (2002) Understanding Molecular Simulation. From Algorithms to Applications. Academic Press, San Diego
2. Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E (1953) J Chem Phys 21:1087–1092
3. Alder BJ, Wainwright TE (1957) J Chem Phys 27:1208–1209
4. Levitt M, Warshel A (1975) Nature 253:694–698
5. Warshel A, Levitt M (1976) J Mol Biol 103:227–249
6. Matsuoka O, Clementi E, Yoshimine M (1976) J Chem Phys 64:1351–1361
7. Car R, Parrinello M (1985) Phys Rev Lett 55:2471–2474
8. Marx D, Hutter J (2009) Ab Initio Molecular Dynamics. Basic Theory and Advanced Methods. Cambridge University Press, Cambridge
9. Levitt M, Lifson S (1969) J Mol Biol 46:269–279
10. Levitt M (1969) Nature 224:759–763
11. Warshel A (1976) Nature 260:679–683
12. McCammon JA, Gelin BR, Karplus M (1977) Nature 267:585–590
13. Warshel A (1982) J Phys Chem 86:2218–2224
14. Warshel A (1984) Proc Natl Acad Sci U S A 81:444–448
15. Brünger AT, Kuriyan J, Karplus M (1987) Science 235:458–460
16. Brünger AT, Clore GM, Gronenborn AM, Karplus M (1986) Proc Natl Acad Sci U S A 83:3801–3805
17. Dobson CM, Karplus M (1986) Methods Enzymol 131:362–389
18. Manz A, Pamme N, Iossifidis D (2004) Bioanalytical Chemistry. Imperial College Press, London

José A. Sordo is Professor of Physical Chemistry at the University of Oviedo (Principado de Asturias, Spain). He carried out postdoctoral research on atomic structure at the University of Alberta (Canada) with Prof. Serafín Fraga and collaborated with Prof. Enrico Clementi at IBM-Kingston (New York) in the development of amino acid–amino acid ab initio pair potentials. His present interests include chemical kinetics studies using statistical theories.
