
Hybrid methods for macromolecular structure determination: experiment with expectations

Gunnar F Schröder1,2

Studies of large and heterogeneous macromolecules often yield low-resolution data that alone does not suffice to build accurate atomic models. Adding information from molecular simulation or other structure prediction methods can lead to models with significantly better quality. Different strategies are discussed to combine experimental data with results from simulation and prediction. This review describes recent approaches for building atomic models with a focus on X-ray diffraction and single-particle cryo-electron microscopy (cryo-EM) data. In addition, both cryo-EM and X-ray diffraction provide information on molecular dynamics. Therefore, the best description of molecular structures is often by an ensemble of models. It furthermore becomes apparent that using raw data for the modeling ensures that all information obtained by the experiment can be fully exploited. It is also important to quantify the errors of both experiment and simulation to correctly weigh their different contributions.

Addresses
1 Institute of Complex Systems, Structural Biochemistry (ICS-6), Forschungszentrum Jülich, 52425 Jülich, Germany
2 Physics Department, Heinrich-Heine Universität Düsseldorf, 40225 Düsseldorf, Germany

Corresponding author: Schröder, Gunnar F ([email protected])

The birth of hybrid modeling can be traced to the method proposed by Jack and Levitt in 1978 [1], who introduced a hybrid energy function to optimize the energy of a protein at the same time as its fit to X-ray diffraction data. The fit of the protein model to the data is defined ad hoc as an energy term $E_{\mathrm{Data}}$, which is added with a weight $w$ to the molecular mechanics energy $E_{\mathrm{MM}}$ of the protein:

$$E_{\mathrm{Hybrid}} = E_{\mathrm{MM}} + w\,E_{\mathrm{Data}}.$$

$E_{\mathrm{Data}}$ in its most basic form simply describes the deviation of experimental observables from those calculated from the model, e.g. for diffraction data

$$E_{\mathrm{Data}} = \sum_{hkl} \left( F_{\mathrm{obs}}(hkl) - F_{\mathrm{calc}}(hkl) \right)^2 .$$

Minimization of this combined hybrid energy function, $E_{\mathrm{Hybrid}}$, yields a refined structure that fulfills both the restraints imposed by the experimental data and the stereo-chemical restraints, which represent information we have on protein structures in general. Such refinement is instrumental in the interpretation of the data, e.g. in the case of crystallographic data, to improve phase information. Later this hybrid approach also made it possible to determine protein structures using restraints derived from NMR experiments.
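The trade-off controlled by the weight w can be illustrated with a deliberately small sketch. Everything in it is an assumption made for illustration: a single coordinate x stands in for the atomic model, E_MM is a harmonic restraint to an ideal value, E_Data is a sum of squared deviations from three hypothetical observations, and the minimizer is plain gradient descent with a step size tuned to these toy numbers. It is not the procedure of Ref. [1].

```python
def e_mm(x, x_ideal=1.5, k=10.0):
    """Toy E_MM: harmonic restraint of one coordinate to an ideal value (assumed form)."""
    return k * (x - x_ideal) ** 2


def e_data(x, observations):
    """Toy E_Data: sum of squared deviations between observed and model values."""
    return sum((obs - x) ** 2 for obs in observations)


def e_hybrid(x, observations, w):
    """E_Hybrid = E_MM + w * E_Data."""
    return e_mm(x) + w * e_data(x, observations)


def minimize_hybrid(observations, w, x_start=1.0, step=1e-3, n_steps=20000, h=1e-6):
    """Gradient descent using a central-difference gradient.

    The fixed step size is only stable for the small weights used below.
    """
    x = x_start
    for _ in range(n_steps):
        grad = (e_hybrid(x + h, observations, w)
                - e_hybrid(x - h, observations, w)) / (2.0 * h)
        x -= step * grad
    return x


# Hypothetical measurements of the same coordinate (mean 1.62).
observations = [1.62, 1.58, 1.66]
for w in (0.0, 1.0, 100.0):
    print(w, round(minimize_hybrid(observations, w), 4))
```

With w = 0 the minimum sits at the ideal value of the restraint, and with increasing w it moves toward the mean of the observations. In this toy setting the weight therefore directly sets the relative influence of prior knowledge and data.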

Current Opinion in Structural Biology 2015, 31:20–27
This review comes from a themed issue on Theory and simulation
Edited by Claire Lesieur and Klaus Schulten

http://dx.doi.org/10.1016/j.sbi.2015.02.016

These early developments led, 30 years later, to the notion that including as much information as possible is the best way of building models that are as accurate as possible [2]. A tour de force in such exhaustive integrative modeling was the determination of the molecular architecture of the nuclear pore complex [3,4].

0959-440X/© 2015 Published by Elsevier Ltd.

Hybrid modeling in structural biology describes the combination of computational modeling with experimental data to determine macromolecular structures (cf. Figure 1). It has become particularly important to determine structures with low-resolution or sparse data, where the data alone would not suffice to build molecular models. Hybrid modeling is often used synonymously with integrative modeling, which emphasizes more the simultaneous use of different types of experimental information in the structure determination process.

This review focuses on the use of intermediate- to low-resolution X-ray diffraction and single-particle cryo-electron microscopy (cryo-EM) data to determine atomic models of protein structures. Cryo-EM in particular has made tremendous progress in the past few years, which has spurred the development of several new computational model-building techniques. The classical approach of building a single model that best fits the data is still prevalent, even though the uncertainty in the modeling process could be captured more appropriately by generating an entire ensemble of models. However, the determination of model ensembles poses significant challenges, as will be discussed further below.

Even though hybrid modeling in structural biology is a highly modern topic and a very active field of research, its birth can be traced back to 1978 [1].

Refinement of a single model

At intermediate to low resolution (4–8 Å) the observation-to-parameter ratio is too low to completely determine the atomic coordinates from the data alone. This problem can be solved either by reducing the number of parameters (e.g. by allowing only torsional degrees of freedom) or by adding information. The additional information could come from very different sources. For example, known structures of homologous proteins can be used to guide the refinement and effectively reduce the number of degrees of freedom, since it is known that homologous proteins fold into similar structures. Approaches that exploit this similarity, such as the deformable elastic network (DEN) [5,6], jelly-body, or reference model [7–9] restraints, have been implemented in crystallographic refinement programs.

Figure 1
[Schematic; axis and scale labels: Amount of Experimental Information; Quality of Prediction / Level of Theory / Sampling; Model Accuracy / Information Content. Marked examples: refined high-resolution X-ray structure; ab-initio prediction; unrefined model; homology model with 95% sequence identity.]
The motivation for hybrid modeling is that the more information is used to build a model the more accurate it will be. Both experiment and simulation should be considered information; both improve the accuracy of a structural model.

Other sources of additional information are simulation and structure prediction techniques, which use molecular mechanics force fields. Such force fields bias and confine the sampling of the conformational space to physically realistic and energetically favorable conformations. When both the experimental observations and the added molecular mechanics energy function are interpreted as general information about the protein structure, the information-to-parameter ratio is increased, which facilitates the structure determination. For example, electrostatic interactions had disappeared completely from standard crystallographic refinement procedures, but reintroducing them has recently been shown to improve the refinement [10], even at high resolution when accounting for polarizability and anisotropic structure factors [11,12]. More exhaustive sampling of protein conformations using all-atom explicit-solvent MD simulations allows for larger conformational changes and can lead to significant improvement of the refinement, with increased radius of convergence and better phases [13], as compared to standard crystallographic refinement. Similarly, the combination of energy-guided remodeling by the program Rosetta with the Autobuild and (now also real-space [14]) refinement tools of the program Phenix shows significant improvement in the refinement [15,16]. This clearly demonstrates that force fields and energy functions can provide valuable information that helps to build better models or even to solve structures that could otherwise not have been solved.

The same strategies have been followed in the refinement of (high-resolution) X-ray protein structures against (lower-resolution) cryo-EM density maps, which is also referred to as flexible fitting. Several flexible fitting methods have been developed and they, again, differ in the type of additional information that is used during the structure refinement. Normal mode based approaches (e.g. NORMA [17], NMFF [18]) directly deform the high-resolution structure along the first few normal modes and thereby reduce the number of degrees of freedom. Other methods use information from a reference model (typically the starting high-resolution structure): for example, the program DireX [19] employs DEN restraints (but no normal modes) and MDfit [20] uses Go-type structure-based potentials. Another class of methods relies mostly on molecular energy functions (force fields), such as the molecular dynamics based fitting methods MDFF [21] or Tama’s approach [22], although in practice some MD based fitting methods often use restraints to the starting structure (e.g. to maintain secondary structure). MDFF has also been applied successfully to very large systems, for example for building a model of the entire HIV capsid [23].

Since flexible fitting is not the focus of this review, the existing programs are far from completely covered, and the reader is referred to very good reviews on flexible fitting and modeling with cryo-EM density maps that have been published recently [24–27]. Most of these flexible fitting methods work in real space; however, crystallographic refinement programs, which work in reciprocal space, have also been used successfully for this task [28]. In that case the EM density map needs to be converted to structure factors. However, how best to translate errors in the EM reconstructions (e.g. quantified by Fourier shell correlation) into errors in the complex structure factors for use in maximum likelihood target functions still remains to be worked out.

When fitting structures against low-resolution data, over-fitting is a major concern. The standard tool to detect over-fitting is cross-validation, where a portion of the data (the test set), which needs to be independent from the rest, is not used for model fitting but only for model validation. In X-ray crystallographic structure refinement, Rfree with randomly selected structure factors for the test set is the standard cross-validated measure [29]. In practice, by searching for structures with low Rfree values one effectively optimizes for Rfree, which is in principle problematic [30]. Furthermore, the dependence of the test set on the work set is not always negligible, in particular for high solvent content or high non-crystallographic symmetry, for which different test-set selection procedures should be applied [31]. It should also be noted that leaving out structure factors has detrimental effects in real space, for which solutions have been described [32]. For cryo-EM, two cross-validation methods for the refinement of models into cryo-EM densities have been presented recently; one splits the set of particle images into two parts [33], and the other splits the density into a high-resolution and a low-resolution part [34]. When using reciprocal-space refinement, the free phase residual could also be used as a validation measure [35].

In general it needs to be carefully assessed whether flexible refinement is justified with a given data set (ideally by cross-validation) or whether rigid-body docking (using fewer degrees of freedom) should be preferred. If the resolution is too low, such that a structure cannot be refined without over-fitting, the placement of rigid domains of the protein (referred to as rigid-body fitting) could yield valuable insights about the arrangement of these domains in a complex. Several programs have been developed to perform this task [36–40]. Recent work has focused on solving the combinatorial problem of placing many models into very low-resolution density maps [41–43]. Furthermore, the error of placing rigid domains into low-resolution density maps has been described recently [44].
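The cross-validation idea discussed above can be made concrete with a small sketch. It computes a conventional R-factor, sum | |F_obs| - |F_calc| | / sum |F_obs|, separately for a randomly selected test set and for the remaining work set. All amplitudes are synthetic, and the "over-fitted model" is a contrived stand-in (it copies the noisy work-set data and knows only the noise-free values elsewhere); the function names are hypothetical and do not correspond to any refinement program.

```python
import random


def r_factor(f_obs, f_calc):
    """R = sum | |F_obs| - |F_calc| | / sum |F_obs|."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def split_work_free(n_reflections, free_fraction=0.05, seed=0):
    """Randomly assign reflection indices to a work set and a free (test) set."""
    rng = random.Random(seed)
    indices = list(range(n_reflections))
    rng.shuffle(indices)
    n_free = max(1, int(round(free_fraction * n_reflections)))
    return sorted(indices[n_free:]), sorted(indices[:n_free])


rng = random.Random(1)
f_true = [rng.uniform(10.0, 100.0) for _ in range(200)]   # synthetic amplitudes
f_obs = [f + rng.gauss(0.0, 5.0) for f in f_true]         # noisy "measurements"
work, free = split_work_free(len(f_obs), free_fraction=0.1)

# Contrived over-fitted model: reproduces the noisy work-set data exactly,
# but only knows the noise-free values for reflections it was never fitted to.
work_set = set(work)
f_calc = [f_obs[i] if i in work_set else f_true[i] for i in range(len(f_obs))]

r_work = r_factor([f_obs[i] for i in work], [f_calc[i] for i in work])
r_free = r_factor([f_obs[i] for i in free], [f_calc[i] for i in free])
print("R_work =", round(r_work, 4), "R_free =", round(r_free, 4))
```

In this toy case R_work is zero because the noise itself has been fitted, while R_free stays at the noise level, which is the kind of gap that cross-validation is meant to expose. The random selection used here ignores the test-set dependence issues (e.g. non-crystallographic symmetry) mentioned in the text.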

De novo modeling at intermediate resolution

In X-ray crystallography at intermediate to low resolution, the phase information is initially often weak, resulting in erroneous and fragmented electron density. Therefore, the complete model cannot be built at once, but needs to be improved and extended iteratively while the phase information (and with it the electron density) improves. Furthermore, sequence assignment to the electron density is very difficult if initially only small model fragments can be built and side-chain densities are not clearly visible. In contrast to X-ray crystallography, cryo-EM also yields phase information and therefore allows for computing a three-dimensional density map directly from the particle images. This makes it possible to build the entire model at once by using global optimization techniques. At intermediate to low resolution (4–8 Å), models can be built by detecting secondary structure elements, which can then be connected by modeling the missing loop segments [45–47].

Cryo-EM has made amazing progress in increasing the resolution in the past few years, mostly due to new electron detector technology [48]. Resolutions below 4 Å have been reported even for particles with no or low symmetry [49–52], and resolutions even better than 3 Å seem to be within reach. At such resolutions it is possible to build protein structures de novo from the density [53,54]. Traditionally, the first step in building protein models is to trace the backbone in the density. The Pathwalker approach [55] treats the tracing procedure as a Traveling Salesman Problem, where each region in the density map needs to be visited exactly once. This provides a powerful additional restraint on finding the correct topology. Another promising, although computationally expensive, global optimization approach is the ACMI program [56,57], which combines a local matching procedure and a global constraint procedure in a probabilistic framework. Breaking the 3 Å barrier also seems possible when correcting for spherical aberrations. In this resolution range, standard automatic modeling tools for X-ray crystallography should also be well applicable, with minor adjustments, to EM-derived density maps.

Uncertainty, error and dynamics

There is always an uncertainty in exactly where to place an atom. This uncertainty can be due to missing (or ambiguous) experimental data and/or conformational variance, and it leads to decreased precision. Errors of the data and the force field as well as incomplete sampling lead to decreased model accuracy. When building a model, the uncertainty in the atomic coordinates needs to be taken into account. A good overview of uncertainty in integrative structural modeling is given in Ref. [58]. The uncertainty can be represented by an ensemble of models, as is usual in NMR, or by B-factors in crystallography. However, how much of this uncertainty is due to just missing data or to actual conformational variance is often unclear and cannot easily be decided. B-factors describe any kind of disorder, which could arise from conformational variance, crystalline disorder, or any mismatch of model and data. It has been suggested that the extent of atomic motions is significantly underestimated by B-factor refinement [59], especially when distinct alternate conformations are sampled [60].

It becomes increasingly clear that even with high-resolution X-ray diffraction data (and even more so with low-resolution data), a single model might often not be sufficient to describe the experimental data, due to conformational heterogeneity and dynamics [61,62,63,64]. Interestingly, careful analysis of electron density by estimating its noise level reveals hidden conformational motions, low-occupancy alternative conformations and ligands [65,66,67].


Two interpretations of a model ensemble need to be distinguished (cf. Figure 2). In the first interpretation, the ensemble represents the uncertainty, and the width of the ensemble reflects the fact that the number of restraints (structural information) that the experiment provides is limited. In that case the aim is to find the broadest ensemble that is in agreement with the data, according to the principle of maximum entropy. In this case increasing the size of the ensemble does not increase the number of parameters. In the second interpretation, it is assumed that the data provide information on the dynamics of the molecule and that the ensemble therefore represents true conformational variance, which means that the refinement of a (restrained) ensemble increases the number of parameters. In that case the aim is to find the smallest ensemble that describes the data well (in accordance with Occam’s razor) to keep the parameter-to-observation ratio low. In this case increasing the size of the ensemble does increase the number of parameters. Following the second interpretation, the Sparse Ensemble Selection (SES) method selects the smallest (sparsest) non-uniformly weighted representative ensemble that explains the experimental data to within a desired error. SES does not require any prior information such as a force field; the only restraint is sparsity [68].

Figure 2
[Four schematic panels, each plotting a distribution over coordinates. Left column, "Distribution describes Uncertainty": (a) Bayes formalism (with experimental error), with curves labeled prior, likelihood and posterior; (b) Minimally biased ensemble by maximum entropy (no experimental error), with the measured and the predicted ensemble average marked. Right column, "Distribution describes Dynamics": (c) Predicted and measured ensemble without errors; (d) Ensembles with errors. Legend: distribution predicted by simulation; experimental data; experiment and prediction combined.]
Different scenarios of combining experimental data (green) with predicted ensembles (orange). Two different interpretations of the distribution of coordinates (structural ensemble) are considered. (a) Using Bayes formalism to combine prior knowledge (predicted ensemble) with the probability distribution of an experimental observable (likelihood to observe measured data for a given set of coordinates), which also encodes the error of the measurement. The posterior is obtained by multiplying the prior and the likelihood. Note that the width of the posterior distribution is smaller since the uncertainty is decreased by the measurement. (b) Schematic plot of a predicted ensemble that is minimally biased by experimental measurement of the ensemble average (green dashed line). Here the biased ensemble (blue) has the same average value as the measured ensemble average, but is otherwise as similar as possible to the predicted ensemble. (c) Predicted and measured ensembles both yield a distribution of coordinates. The best estimate for the distribution is the (possibly weighted) average of both distributions (blue). No errors are considered. Note that the distribution does not necessarily become narrower. In the extreme case, if prediction and experiment both yield the same correct distribution, combining this knowledge should yield that exact same distribution. (d) Same as in (c), but here errors of measured and predicted model ensembles are represented by an ensemble of coordinate distributions. Note that while the error of the combined distribution (blue) becomes smaller, its width does not.
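The difference between panels (a) and (c) of Figure 2 can be checked numerically in the simplest possible setting. The sketch below assumes one-dimensional Gaussian distributions, which the figure does not require; the function names and the numbers are hypothetical. Multiplying a Gaussian prior by a Gaussian likelihood gives a Gaussian posterior whose precision (inverse variance) is the sum of the two precisions, whereas averaging two distributions gives a mixture whose width need not shrink.

```python
import math


def gaussian_posterior(mu_prior, sigma_prior, mu_like, sigma_like):
    """Product of a Gaussian prior and a Gaussian likelihood (panel a).

    Precisions add, and the mean is the precision-weighted average.
    """
    prec_prior = 1.0 / sigma_prior ** 2
    prec_like = 1.0 / sigma_like ** 2
    prec_post = prec_prior + prec_like
    mu_post = (prec_prior * mu_prior + prec_like * mu_like) / prec_post
    return mu_post, math.sqrt(1.0 / prec_post)


def mixture_mean_sigma(mu_1, sigma_1, mu_2, sigma_2, weight_1=0.5):
    """Mean and standard deviation of a weighted average of two Gaussians (panel c)."""
    weight_2 = 1.0 - weight_1
    mean = weight_1 * mu_1 + weight_2 * mu_2
    second_moment = (weight_1 * (sigma_1 ** 2 + mu_1 ** 2)
                     + weight_2 * (sigma_2 ** 2 + mu_2 ** 2))
    return mean, math.sqrt(second_moment - mean ** 2)


# Hypothetical numbers: broad prediction, sharper measurement.
print(gaussian_posterior(0.0, 2.0, 1.0, 1.0))
print(mixture_mean_sigma(0.0, 2.0, 1.0, 1.0))
```

For these inputs the posterior is narrower than both the prior and the likelihood, while the equally weighted average is broader than the sharper of its two components, in line with the notes in the caption.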


Several ensemble refinement methods have been developed for NMR structure determination [69–76] and also for X-ray crystallography [77–79,61,80,81]. While the modeling of ensembles yields a picture of the structural heterogeneity, the combination of NMR with X-ray diffraction data adds information about the timescale of atomic motions and shows that the same picosecond motions observed in solution also occur in the crystalline state at room temperature [82].

Maximum entropy and Bayes

Oftentimes it is easier to measure ensemble averages than the probability distribution of conformational states. For example, the 3D reconstruction from single-particle cryo-EM images is an ensemble average, while information about the conformational distribution is hidden in the collection of very noisy particle images. The determination of the structural distribution is usually an underdetermined problem, and there exist therefore many different ensembles that lead to the same ensemble average. To help define the ensemble, an estimate of the conformational distribution can be obtained from force field based molecular simulations. Recently, Pitera and Chodera presented an approach to bias molecular simulations by experimental data according to the maximum entropy principle, such that the simulated ensemble is minimally biased by the experimental observations [83]. Later it was proven that maximum entropy ensembles are obtained by restrained-ensemble simulations [84–86]. The recently proposed method of experiment directed simulation (EDS) yields such a biasing potential more efficiently [87]. Olsson et al. showed how an expectation maximization algorithm yields a minimally biased native state ensemble with NMR data [88].

Errors of experimental data are best included using Bayes formalism, as this allows for using exact error and noise models. The application of Bayes formalism to NMR structure determination has been described in the influential work by Rieping et al. [89,90]. Recently, the formalism has been applied to modeling structures with single-particle cryo-EM images [91] and X-ray single-particle diffraction images [92].
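A minimal sketch of the maximum entropy idea, applied here as a post hoc reweighting of already simulated conformations rather than as a bias during the simulation, is given below. It assumes uniformly weighted conformations, a single scalar observable and an error-free target average; the function names are hypothetical, and this is not the implementation of Refs. [83–88]. Under these assumptions the minimally biased weights take the exponential form w_i ∝ exp(-λ f_i), and the single parameter λ is fixed by requiring that the reweighted average equals the measured one.

```python
import math


def maxent_weights(values, lam):
    """Weights w_i proportional to exp(-lam * f_i), starting from uniform weights."""
    exponents = [-lam * v for v in values]
    shift = max(exponents)                      # avoids overflow in exp
    unnormalized = [math.exp(e - shift) for e in exponents]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def weighted_average(values, weights):
    return sum(v * w for v, w in zip(values, weights))


def fit_lambda(values, target, lo=-50.0, hi=50.0, tol=1e-10, max_iter=200):
    """Bisection on lam; the reweighted average decreases monotonically with lam.

    The bracket [lo, hi] is an assumption that suits the toy numbers below.
    """
    if not (min(values) < target < max(values)):
        raise ValueError("target average must lie strictly inside the sampled range")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if weighted_average(values, maxent_weights(values, mid)) > target:
            lo = mid        # average still too high: increase lam
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Hypothetical observable computed for four simulated conformations.
values = [1.0, 2.0, 3.0, 4.0]
lam = fit_lambda(values, target=2.0)
weights = maxent_weights(values, lam)
print(lam, weights, weighted_average(values, weights))
```

If the target already equals the unbiased simulation average, λ is zero and the weights stay uniform, which is the sense in which the ensemble is only minimally biased by the data.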

Comparing models with raw data

For the refinement, either a single model or an ensemble of models needs to be compared with experimental data. For this the experimental data are typically processed to some extent, e.g. in the form of peak lists of resonances and reflections for NMR and diffraction data sets, respectively, or three-dimensional density reconstructions for cryo-EM data. But to ensure that all information provided by the experiment can be fully exploited, it should in general be best to compare the model with the raw experimental data. However, oftentimes the raw data sets are huge and difficult to handle. For example, in cryo-EM the number of single-particle images is typically on the order of 10 000 to several 100 000, such that refinement of a model against the individual particle images is computationally expensive. Using class-averages instead reduces the number of images to work with and allows for building the structure of macromolecular assemblies. Velazquez et al. [93] score candidate models of the assembly by the similarity of model projections to class-averages. A Monte-Carlo sampling of this scoring function then yields assembly structures. The refinement of models against class-averages is also possible [94]. Class-averages, however, have the limitation that any variance is lost and sample heterogeneity is averaged out. Cossio and Hummer [91] showed that correct model conformations can be detected by comparing model projections against the raw particle images; even the correct model ensemble can be determined from a set of images containing a mixture of different conformations.

In X-ray diffraction images, too, there is more information than just the Bragg reflection peaks. Diffuse scattering outside the Bragg reflections reports on correlations between the motions of atoms, which could provide valuable restraints on protein motion [95].
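The loss of heterogeneity in class-averages can be illustrated with a toy calculation. The sketch assumes tiny one-dimensional "images" given as lists of eight pixel values, two hypothetical model projections, Gaussian pixel noise, and a plain Pearson correlation as the similarity score; there is no orientation search or contrast transfer function. It is not the Bayesian approach of Ref. [91] or the scoring function of Ref. [93].

```python
import math
import random


def correlation(a, b):
    """Pearson correlation between two equally sized images (flat lists)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def assign_images(images, projections):
    """For each image, the index of the best-correlating model projection."""
    return [
        max(range(len(projections)), key=lambda k: correlation(image, projections[k]))
        for image in images
    ]


def class_average(images):
    """Pixel-wise mean over all images."""
    return [sum(pixels) / len(images) for pixels in zip(*images)]


rng = random.Random(0)
proj_a = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0]   # hypothetical conformation A
proj_b = [0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0]   # hypothetical conformation B


def noisy(projection, sigma=0.3):
    return [p + rng.gauss(0.0, sigma) for p in projection]


# Mixture of 70 images of A and 30 images of B.
images = [noisy(proj_a) for _ in range(70)] + [noisy(proj_b) for _ in range(30)]

labels = assign_images(images, [proj_a, proj_b])
population_a = labels.count(0) / len(labels)
average = class_average(images)
print(population_a, correlation(average, proj_a), correlation(average, proj_b))
```

Scoring each image individually recovers the relative populations of the two conformations, whereas the single average over all images blends both projections and no longer reveals that two states were present.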

Discussion

Model building is starting to become a bottleneck in cryo-EM, in particular since typical molecular systems studied by cryo-EM are large and model building is time consuming. Building atomic models at 4.5 Å or worse is still a considerable challenge and often cannot be done reliably, let alone in an automated way. Ensembles predicted from molecular simulations provide important additional information. However, combining experimental data with predicted ensembles requires accounting for uncertainties not only of the data but also of the prediction, arising from errors in the force fields as well as insufficient conformational sampling. How the error of the force field parameters (e.g. partial charges of atoms) can be determined, and how these errors propagate to the error of the predicted ensemble, in particular the error on ensemble averages or populations of conformational states, is still an open question. But it needs to be resolved to avoid an overly optimistic influence of the predicted information on hybrid modeling.

The error of the resulting hybrid model will depend to different extents on errors of the experimental data, errors in the force field and insufficient sampling. However, a general quality measure of hybrid models, depending on how much information (both predicted and experimental) was used to build them, is still missing. For this the information content of any experimental data set [96] needs to be quantified. Such a measure is necessary to compare the quality of hybrid models obtained with different types and combinations of experimental data and predictions.

Conflict of interest statement

Nothing declared.

References and recommended reading

Papers of particular interest, published within the period of review, have been highlighted as:
• of special interest
•• of outstanding interest

1. Jack A, Levitt M: Refinement of large structures by simultaneous minimization of energy and R factor. Acta Crystallogr A 1978, 34:931-935.

2. Russel D, Lasker K, Webb B, Velázquez-Muriel J, Tjioe E, Schneidman-Duhovny D, Peterson B, Sali A: Putting the pieces together: integrative modeling platform software for structure determination of macromolecular assemblies. PLoS Biol 2012, 10:e1001244.
The Integrative Modeling Platform (IMP) is presented. The paper further envisions publishing integrative models together with the data and applications that produce these models, such that researchers can easily test hypotheses, plan experiments and predict the effect of additional information on the model.

3. Alber F, Dokudovskaya S, Veenhoff LM, Zhang W, Kipper J, Devos D, Suprapto A, Karni-Schmidt O, Williams R, Chait BT et al.: Determining the architectures of macromolecular assemblies. Nature 2007, 450:683-694.

4. Alber F, Dokudovskaya S, Veenhoff LM, Zhang W, Kipper J, Devos D, Suprapto A, Karni-Schmidt O, Williams R, Chait BT et al.: The molecular architecture of the nuclear pore complex. Nature 2007, 450:695-701.

5. Schröder GF, Brunger AT, Levitt M: Combining efficient conformational sampling with a deformable elastic network model facilitates structure refinement at low resolution. Structure 2007, 15:1630-1641.

6. Schröder GF, Levitt M, Brunger AT: Super-resolution biomolecular crystallography with low-resolution data. Nature 2010, 464:1218-1222.

7. Murshudov GN, Skubák P, Lebedev AA, Pannu NS, Steiner RA, Nicholls RA, Winn MD, Long F, Vagin AA: REFMAC 5 for the refinement of macromolecular crystal structures. Acta Crystallogr D 2011, 67:355-367.

8. Nicholls RA, Long F, Murshudov GN: Low-resolution refinement tools in REFMAC5. Acta Crystallogr D 2012, 68:404-417.

9. Headd JJ, Echols N, Afonine PV, Grosse-Kunstleve RW, Chen VB, Moriarty NW, Richardson DC, Richardson JS, Adams PD: Use of knowledge-based restraints in phenix.refine to improve macromolecular refinement at low resolution. Acta Crystallogr D 2012, 68:381-390.

10. Fenn TD, Schnieders MJ, Mustyakimov M, Wu C, Langan P, Pande VS, Brunger AT: Reintroducing electrostatics into macromolecular crystallographic refinement: application to neutron crystallography and DNA hydration. Structure 2011, 19:523-533.

11. Schnieders MJ, Fenn TD, Pande VS, Brunger AT: Polarizable atomic multipole X-ray refinement: application to peptide crystals. Acta Crystallogr D 2009, 65:952-965.

12. Fenn TD, Schnieders MJ: Polarizable atomic multipole X-ray refinement: weighting schemes for macromolecular diffraction. Acta Crystallogr D 2011, 67:957-965.

13. McGreevy R, Singharoy A, Li Q, Zhang J, Xu D, Perozo E, Schulten K: xMDFF: molecular dynamics flexible fitting of low-resolution X-ray structures. Acta Crystallogr D: Biol Crystallogr 2014, 70:2344-2355.

14. Afonine PV, Grosse-Kunstleve RW, Echols N, Headd JJ, Moriarty NW, Mustyakimov M, Terwilliger TC, Urzhumtsev A, Zwart PH, Adams PD: Towards automated crystallographic structure refinement with phenix.refine. Acta Crystallogr D 2012, 68:352-367.

15. DiMaio F, Terwilliger TC, Read RJ, Wlodawer A, Oberdorfer G, Wagner U, Valkov E, Alon A, Fass D, Axelrod HL et al.: Improved molecular replacement by density- and energy-guided protein structure optimization. Nature 2011, 473:540-543.

16. DiMaio F, Echols N, Headd JJ, Terwilliger TC, Adams PD, Baker D: Improved low-resolution crystallographic refinement with Phenix and Rosetta. Nat Methods 2013, 10:1102-1104.

17. Suhre K, Navaza J, Sanejouand Y-H: NORMA: a tool for flexible fitting of high-resolution protein structures into. Acta Crystallogr D 2006, D62:1098-1100.

18. Tama F, Miyashita O, Brooks CL: Flexible multi-scale fitting of atomic structures into low-resolution electron density maps with elastic network normal mode analysis. J Mol Biol 2004, 337:985-999.

19. Wang Z, Schröder GF: Real-space refinement with DireX: from global fitting to side-chain improvements. Biopolymers 2012, 97:687-697.

20. Ratje AH, Loerke J, Mikolajka A, Brunner M, Hildebrand PW, Starosta AL, Donhofer A, Connell SR, Fucini P, Mielke T et al.: Head swivel on the ribosome facilitates translocation by means of intra-subunit tRNA hybrid sites. Nature 2010, 468:713-716.

21. Trabuco LG, Villa E, Mitra K, Frank J, Schulten K: Flexible fitting of atomic structures into electron microscopy maps using molecular dynamics. Structure 2008, 16:673-683.

22. Orzechowski M, Tama F: Flexible fitting of high-resolution X-ray structures into cryoelectron microscopy maps using biased molecular dynamics simulations. Biophys J 2008, 95:5692-5705.

23. Zhao G, Perilla JR, Yufenyuy EL, Meng X, Chen B, Ning J, Ahn J, Gronenborn AM, Schulten K, Aiken C et al.: Mature HIV-1 capsid structure by cryo-electron microscopy and all-atom molecular dynamics. Nature 2013, 497:643-646.

24. Esquivel-Rodríguez J, Kihara D: Computational methods for constructing protein structure models from 3D electron microscopy maps. J Struct Biol 2013, 184:93-102.

25. Villa E, Lasker K: Finding the right fit: chiseling structures out of cryo-electron microscopy maps. Curr Opin Struct Biol 2014, 25:118-125.

26. Lindert S, Stewart PL, Meiler J: Hybrid approaches: applying computational methods in cryo-electron microscopy. Curr Opin Struct Biol 2009, 19:218-225.

27. Lander GC, Saibil HR, Nogales E: Go hybrid: EM, crystallography, and beyond. Curr Opin Struct Biol 2012, 22:627-635.

28. Zhao M, Wu S, Zhou Q, Vivona S, Cipriano DJ, Cheng Y, Brunger AT: Mechanistic insights into the recycling machine of the SNARE complex. Nature 2015, 518:61-67.

29. Brünger AT: Free R value: a novel statistical quantity for assessing the accuracy of crystal structures. Nature 1992, 355:472-475.

30. Kleywegt GJ: Separating model optimization and model validation in statistical cross-validation as applied to crystallography. Acta Crystallogr D 2007, 63:939-940.

31. Fabiola F, Korostelev A, Chapman MS: Bias in cross-validated free R factors: mitigation of the effects of non-crystallographic symmetry. Acta Crystallogr D 2006, D62:227-238.


32. Chen Z, Blanc E, Chapman MS: Improved free R factors for cross-validation of macromolecular structure: importance for real-space refinement. Acta Crystallogr D 1999, 55:219-224.

52. Liao M, Cao E, Julius D, Cheng Y: Structure of the TRPV1 ion channel determined by electron cryo-microscopy. Nature 2013, 504:107-112.

33. DiMaio F, Zhang J, Chiu W, Baker D: Cryo-EM model validation using independent map reconstructions. Protein Sci 2013, 22:865-868.

53. Wang Z, Hryc CF, Bammes B, Afonine PV, Jakana J, Chen D-H, Liu X, Baker ML, Kao C, Ludtke SJ et al.: An atomic model of brome mosaic virus using direct electron detection and real-space optimization. Nat Commun 2014, 5:4808.

34. Falkner B, Schröder GF: Cross-validation in cryo-EM-based structural modeling. Proc Natl Acad Sci U S A 2013, 110:8930-8935.

35. Grigorieff N, Ceska TA, Downing KH, Baldwin JM, Henderson R: Electron-crystallographic refinement of the structure of bacteriorhodopsin. J Mol Biol 1996, 259:393-421.

36. Kawabata T: Multiple subunit fitting into a low-resolution density map of a macromolecular complex using a Gaussian mixture model. Biophys J 2008:95.

37. Rossmann MG: Fitting atomic models into electron-microscopy maps. Acta Crystallogr D 2000, 56:1341-1349.

38. Esquivel-Rodriguez J, Kihara D: Fitting multimeric protein complexes into electron microscopy maps using 3D Zernike descriptors. J Phys Chem B 2012, 116:6854-6861.

39. Wriggers W, Milligan RA, McCammon JA: Situs: a package for docking crystal structures into low-resolution maps from electron microscopy. J Struct Biol 1999, 125:185-195.

40. Woetzel N, Lindert S, Stewart PL, Meiler J: BCL::EM-Fit: rigid body fitting of atomic structures into density maps using geometric hashing and real space refinement. J Struct Biol 2011, 175:264-276.

41. Lasker K, Topf M, Sali A, Wolfson HJ: Inferential optimization for simultaneous fitting of multiple components into a CryoEM map of their assembly. J Mol Biol 2009, 388:180-194.

42. Tjioe E, Lasker K, Webb B, Wolfson HJ, Sali A: MultiFit: a web server for fitting multiple protein structures into their electron microscopy density map. Nucleic Acids Res 2011, 39:167-170.

43. Dror O, Lasker K, Nussinov R, Wolfson HJ: EMatch: an efficient method for aligning atomic resolution subunits into intermediate-resolution cryo-EM maps of large macromolecular assemblies. Acta Crystallogr D 2007, 63:42-49.

44. Volkmann N: Confidence intervals for fitting of atomic models into low-resolution densities. Acta Crystallogr D 2009, 65:679-689.

45. Lindert S, Alexander N, Wötzel N, Karakaş M, Stewart PL, Meiler J: EM-fold: de novo atomic-detail protein structure determination from medium-resolution density maps. Structure 2012, 20:464-478.

46. Baker ML, Baker MR, Hryc CF, Ju T, Chiu W: Gorgon and pathwalking: macromolecular modeling tools for subnanometer resolution density maps. Biopolymers 2012, 97:655-668.

47. Baker ML, Ju T, Chiu W: Identification of secondary structure elements in intermediate-resolution density maps. Structure 2007, 15:7-19.

54. Baker ML, Abeysinghe SS, Schuh S, Coleman RA, Abrams A, Marsh MP, Hryc CF, Ruths T, Chiu W, Ju T: Modeling protein structure at near atomic resolutions with Gorgon. J Struct Biol 2011, 174:360-373.

55. Baker MR, Rees I, Ludtke SJ, Chiu W, Baker ML: Constructing and validating initial Cα models from subnanometer resolution density maps with pathwalking. Structure 2012, 20:450-463.

56. DiMaio F, Shavlik J, Phillips GN: A probabilistic approach to protein backbone tracing in electron density maps. Bioinformatics 2006, 22:e81-e89.

57. Soni A, Shavlik J: Probabilistic ensembles for improved inference in protein-structure determination. J Bioinform Comput Biol 2012, 10:1240009.

58. Schneidman-Duhovny D, Pellarin R, Sali A: Uncertainty in integrative structural modeling. Curr Opin Struct Biol 2014, 28C:96-104.

59. Kuzmanic A, Pannu NS, Zagrovic B: X-ray refinement significantly underestimates the level of microscopic heterogeneity in biomolecular crystals. Nat Commun 2014, 5:3220.

60. Janowski PA, Cerutti DS, Holton J, Case DA: Peptide crystal simulations reveal hidden dynamics. J Am Chem Soc 2013, 135:7938-7948.

61. Forneris F, Burnley BT, Gros P: Ensemble refinement shows conformational flexibility in crystal structures of human complement factor D. Acta Crystallogr D 2014, 70:733-743.

62. Burnley BT, Afonine PV, Adams PD, Gros P: Modelling dynamics in protein crystal structures by ensemble refinement. eLife 2012, 1:e00311. An ensemble refinement method against X-ray diffraction data is presented that represents structural dynamics by an ensemble of models. Local motions are sampled by molecular dynamics simulations and large scale motions are modeled by TLS refinement. Restraints are calculated using time-averages of the dynamics simulation.

63. Furnham N, Blundell TL, DePristo MA, Terwilliger TC: Is one solution good enough? Nat Struct Mol Biol 2006, 13:184-185.

64. Altman RB, Jardetzky O: New strategies for the determination of macromolecular structure in solution. J Biochem 1986, 100:1403-1423.

65. Fraser JS, van den Bedem H, Samelson AJ, Lang PT, Holton JM, Echols N, Alber T: Accessing protein conformational ensembles using room-temperature X-ray crystallography. Proc Natl Acad Sci U S A 2011, 108:16247-16252.

48. Bammes BE, Rochat RH, Jakana J, Chen D-H, Chiu W: Direct electron detection yields cryo-EM reconstructions at resolutions beyond 3/4 Nyquist frequency. J Struct Biol 2012, 177:589-601.

66. Lang PT, Holton JM, Fraser JS, Alber T: Protein structural ensembles are revealed by redefining X-ray electron density noise. Proc Natl Acad Sci U S A 2014, 111:237-242. Carefully estimating the noise levels at each point in electron density maps reveals structural details that are typically not expected. The proposed approach shows structural heterogeneity such as alternate conformations and density for weakly binding ligands.

49. Lu A, Magupalli VG, Ruan J, Yin Q, Atianand MK, Vos MR, Schröder GF, Fitzgerald KA, Wu H, Egelman EH: Unified polymerization mechanism for the assembly of ASC-dependent inflammasomes. Cell 2014, 156:1193-1206.

67. van den Bedem H, Bhabha G, Yang K, Wright PE, Fraser JS: Automated identification of functional dynamic contact networks from X-ray crystallography. Nat Methods 2013, 10:896-902.

50. Bai X-C, Fernandez IS, McMullan G, Scheres SH: Ribosome structures to near-atomic resolution from thirty thousand cryo-EM particles. eLife 2013, 2:e00461.

68. Berlin K, Castañeda CA, Schneidman-Duhovny D, Sali A, Nava-Tudela A, Fushman D: Recovering a representative conformational ensemble from underdetermined macromolecular structural data. J Am Chem Soc 2013, 135:16595-16609. The Sparse Ensemble Selection (SES) method yields a small ensemble of models that describes the data without overfitting. It uses sparsity as a regularizing restraint and automatically yields the optimal ensemble size.

51. Li X, Mooney P, Zheng S, Booth CR, Braunfeld MB, Gubbens S, Agard DA, Cheng Y: Electron counting and beam-induced motion correction enable near-atomic-resolution single-particle cryo-EM. Nat Methods 2013, 10:584-590.



69. Fennen J, Torda A, van Gunsteren WF: Structure refinement with molecular dynamics and a Boltzmann-weighted ensemble. J Biomol NMR 1995, 6:163-170.

84. Roux B, Weare J: On the statistical equivalence of restrained-ensemble simulations with the maximum entropy method. J Chem Phys 2013, 138:084107.

70. Bürgi R, Pitera JW, van Gunsteren WF: Assessing the effect of conformational averaging on the measured values of observables. J Biomol NMR 2001, 19:305-320.

85. Cavalli A, Camilloni C, Vendruscolo M: Molecular dynamics simulations with replica-averaged structural restraints generate structural ensembles according to the maximum entropy principle. J Chem Phys 2013, 138:094112.

71. Kim Y, Prestegard JH: A dynamic model for the structure of acyl carrier protein in solution. Biochemistry 1989, 28:8792-8797.

72. Lindorff-Larsen K, Best RB, DePristo MA, Dobson CM, Vendruscolo M: Simultaneous determination of protein structure and dynamics. Nature 2005:433.

73. Lindorff-Larsen K, Best RB, Vendruscolo M: Interpreting dynamically-averaged scalar couplings in proteins. J Biomol NMR 2005, 32:273-280.

74. Richter B, Gsponer J, Várnai P, Salvatella X, Vendruscolo M: The MUMO (minimal under-restraining minimal over-restraining) method for the determination of native state ensembles of proteins. J Biomol NMR 2007, 37:117-135.

75. Lange OF, Lakomek N-A, Farès C, Schröder GF, Walter KFA, Becker S, Meiler J, Grubmüller H, Griesinger C, Groot BLD: Recognition dynamics up to microseconds ensemble in solution. Science 2008, 320:1471-1475.

76. Linge JP, Williams MA, Spronk CAEM, Bonvin AMJJ, Nilges M: Refinement of protein structures in explicit solvent. Proteins 2003, 50:496-506.

77. Kuriyan J, Osapay K, Burley SK, Brunger AT, Hendrickson WA, Karplus M: Exploration of disorder in protein structures by X-ray restrained molecular dynamics. Proteins 1991, 10:340-358.

78. Levin EJ, Kondrashov DA, Wesenberg GE, Phillips GN: Ensemble refinement of protein crystal structures: validation and application. Structure 2007, 15:1040-1052.

79. Burnley BT, Afonine PV, Gros P: Modelling dynamics in protein crystal structures by ensemble refinement. eLife 2012:e00311.

80. Kohn JE, Afonine PV, Ruscio JZ, Adams PD, Head-Gordon T: Evidence of functional protein dynamics from X-ray crystallographic ensembles. PLoS Comp Biol 2010, 6:1-5.

81. Burling FT, Brunger AT: Thermal motion and conformational disorder in protein crystal structures: comparison of multiconformer and time-averaging models. Israel J Chem 1994, 34:165-175.

82. Fenwick RB, van den Bedem H, Fraser JS, Wright PE: Integrated description of protein dynamics from room-temperature X-ray crystallography and NMR. Proc Natl Acad Sci U S A 2014, 111:E445-E454.

83. Pitera JW, Chodera JD: On the use of experimental observations to bias simulated ensembles. J Chem Theor Comp 2012, 8:3445-3451. The presented approach combines experimental data with an ensemble predicted by simulation. It is shown how a minimally biased ensemble is obtained that is in agreement with measured ensemble averages but is otherwise as similar as possible to the predicted ensemble.


86. Boomsma W, Ferkinghoff-Borg J, Lindorff-Larsen K: Combining experiments and simulations using the maximum entropy principle. PLoS Comp Biol 2014, 10:e1003406.

87. White AD, Voth GA: Efficient and minimal method to bias molecular simulations with experimental data. J Chem Theor Comp 2014, 10:3023-3030.

88. Olsson S, Vögeli BR, Cavalli A, Boomsma W, Ferkinghoff-Borg J, Lindorff-Larsen K, Hamelryck T: Probabilistic determination of native state ensembles of proteins. J Chem Theor Comp 2014, 10:3484-3491.

89. Rieping W, Habeck M, Nilges M: Inferential structure determination. Science 2005, 309:303-306.

90. Nilges M, Bernard A, Bardiaux B, Malliavin T, Habeck M, Rieping W: Accurate NMR structures through minimization of an extended hybrid energy. Structure 2008, 16:1305-1312.

91. Cossio P, Hummer G: Bayesian analysis of individual electron microscopy images: towards structures of dynamic and heterogeneous biomolecular assemblies. J Struct Biol 2013, 184:427-437. A method is presented to reveal structural heterogeneity by comparing models to raw single-particle cryo-EM images. A Bayesian framework yields model ensembles and identifies correct structures from a set of particle images containing a mixture of conformations.

92. Walczak M, Grubmüller H: Bayesian orientation estimate and structure information from sparse single-molecule X-ray diffraction images. Phys Rev E 2014, 90:022714.

93. Velázquez-Muriel J, Lasker K, Russel D, Phillips J, Webb BM, Schneidman-Duhovny D, Sali A: Assembly of macromolecular complexes by satisfaction of spatial restraints from electron microscopy images. Proc Natl Acad Sci U S A 2012, 109:18821-18826.

94. Zhang J, Minary P, Levitt M: Multiscale natural moves refine macromolecules using single-particle electron microscopy projection images. Proc Natl Acad Sci U S A 2012, 109:9845-9850.

95. Wall ME, Adams PD, Fraser JS, Sauter NK: Diffuse X-ray scattering to model protein motions. Structure 2014, 22:182-184. X-ray diffraction yields information not only about average structures but also about correlated motions between atoms, which is encoded in the diffuse scattering between the Bragg peaks. This information will become more important in studying conformational motions and allosteric networks by X-ray diffraction.

96. Berman M, Van Eerdewegh P: Information content of data with respect to models. Am J Physiol 1983, 245:R620-R623.

