Computer simulation of 2D-NMR (NOESY) spectra and polypeptide structure determination.

Prog.Biophys.molec.Biol.,Vol. 58, pp. 61-84, 1992. Printed in Great Britain. All rightsreserved.

0079-6107/92$15.00 © 1992PergamonPress Ltd

C O M P U T E R S I M U L A T I O N O F 2 D - N M R (NOESY) SPECTRA A N D POLYPEPTIDE STRUCTURE DETERMINATION ISTV,~N P . S U G A R * a n d Y U A N X U

Departments of Biomathematical Sciences, and Physiology and Biophysics, The Mount Sinai Medical Center, New York, N Y 10029, U.S.A.

Dedicated to Professor Imre Tarj/m on his 80th birthday

CONTENTS ABBREVIATIONS I. INTRODUCTION

61 62

II. SIMULATIONOF THE N O E S Y SPECTRUM

1. Semi-Empirical Theory of N O E S Y 2. Simulating Polypeptide Dynamics (a) Rigid molecule approximation (b) Fast motion limit (c) Slow motion limit (d) General polypeptide dynamics (e) Lysine side-chain dynamics 3. Simulation of the N O E S Y Spectrum of a Protein III. THE INVERSE PROBLEM. DETERMINATIONOF THE POLYPEPTIDE STRUCTUREAND DYNAMICS FROM THE N O E S Y SPECTRUM

1. Back Transformation in the Case of Small Molecules 2. Back Transformation and Structure Refinement in the Case of Macromolecules (a) Manual refinement (b) Iterative relaxation matrix approach (IRMA) (c) Determination of the relaxation matrix by means of least square minimization (d) NOE constrained molecular dynamics IV. AUTOMATIZED FITTING OF THE SIMULATEDN O E S Y SPECTRATO THE EXPERIMENTALONES

1. The Target Function 2. Application of the Method of Variable Target Function 3. Application to Ten Residue Long Fragments of BPTI V. CONCLUSIONS

62 62 64 64 64 65 66 66 67

69 70 71 71 72 72 73 73 73 74 75 79

VI. APPENDICES

1. Normalized Spherical Harmonics 2. A General Method for Finding Explicit Relationships Between Nonrigid Molecular Models and Spectral Densities ACKNOWLEDGEMENTS REFERENCES

79 79 80 82 83

ABBREVIATIONS B P T I - - B a s i c pancreatic trypsin inhibitor C O R M A - - - C o m p l e t e relaxation matrix analysis COSY---Correlated spectroscopy 2 D - N M R - - T w o dimensional nuclear magnetic resonance E C E P P - - E m p i r i c a l conformation energy program for peptides I R M A - - I t e r a t i v e relaxation matrix approach N O E - - N u c l e a r Overhauser enhancement N O E S Y - - N u c l e a r Overhauser enhancement spectroscopy

* Correspondence to be addressed to I. P. Sugfir. 61

62

I.P. SUGARand Y. Xu I. I N T R O D U C T I O N

Viewing the two dimensional-NMR spectrum of a macromolecule, we are at once both amazed and alarmed by the flood of structural and dynamic information. The experience is similar to looking at the X-ray diffraction pattern of a crystallized macrornolecule. In both cases, the amount of information and the complexity of coding requires computer-assisted analysis of the spectra. While X-ray data analysis has by now been essentially computerized, in the relatively new field of 2D-NMR the methods of automatized spectrum analysis are still in the developing phase (Madi and Ernst, 1988). There are two basically different approaches to the analysis of the NOESY spectra of macromolecules: the geometric and the physical. In the case of the geometric approach proton-proton distance constraints are determined from the NOESY spectra by means of rigid molecule and two-proton approximations (but these approximations result in systematic errors in the distances obtained (Borgias and James, 1988)). Then, the methods of distance geometry or variable target function make it possible to determine macromolecular conformations which optimally satisfy the geometric constraints. These methods have proven to give consistent global macromolecular structures; however, structural features defined by short-range distances, such as bent helices, twisted beta sheets, or variation of helical parameters along a double helix of nucleic acids, cannot be described from distances that contain systematic errors (Koehl and Lefevre, 1990). The NOESY spectrum analysis based on geometric constraints was reviewed extensively by Havel (1991), Crippen and Havel (1988), and W/ithrich (1986). In this paper we focus on an approach to NOESY spectrum analysis based on the physical principles of spectrum formation. There are three main parts to the paper. First, we review the semi-empirical theory of NOESY spectroscopy with emphasis on modelling intramolecular dynamics. This part is detailed enough to provide instruction on how to perform NOESY spectrum simulations. Second, we deal with the inverse problem of spectrum simulation, i.e. determination of macromolecular structure and dynamics by analyzing the respective NOESY spectra. Methods for determination of the relaxation matrix from complete and incomplete integrated peak intensity matrices are reviewed in this part. The third part of the paper introduces a method of NOESY spectrum analysis developed in our laboratory. This is a method for automatized fitting of simulated NOESY spectra to experimental ones. The performance of the method is demonstrated on 10 residue long fragments of the BPTI molecule.

II. SIMULATION OF THE NOESY SPECTRUM Among numerous types of 2D-NMR experiments NOESY is the most important one in the determination of macromolecular structures and dynamics. The NOESY experiment centers around a three-pulse sequence (Jeener et al., 1979): (delay time-90°-tl-90°-Zm-90°-t2)n. Each sequence is repeated with a constant delay- and mixing time zm, and progressively increasing evolution time t l. Free induction decay is detected after each sequence. Double Fourier transformation of the measured free induction decays with respect to the evolution time t~ and detection time t 2 results in the NOESY spectrum. The spectrum is shown in the (~o~, ~2) two-dimensional frequency domain.

1. Semi-Empirical Theory of N O E S Y The NOESY spectra of macromolecules cannot only be measured, but also calculated. Experimental NOESY spectra can be simulated by knowing the macromolecular structure, dynamics and proton chemical shifts. The simulation is based on the semi-empirical theory developed by Macura and Ernst (1980). According to this theory, the NOESY spectrum has the following function:

Simulation of 2D-NMR (NOESY) spectra

63

1

1

1

T2k

S(091, Zm, 0 9 2 ) = 4 E

T2z

1 X [e-a"]kl X

~,z (09 _09~)2 _¢ T~

nl M 0 1 U---p

(1)

(09~-°O~ + T~

where S(ogt, L,, °92) is the intensity of the NOESY spectrum at 09~, 092 angular frequencies; M o is the equilibrium magnetization of the system of Np protons; nt is the number of spins in the lth set of magnetically equivalent protons, and 09kis the Larmor frequency of the kth set of magnetically equivalent protons. The summation is taken over each pair of magnetically equivalent sets of spins. The relaxation matrix R, and the transverse spin-spin relaxation time T2k of the kth set of magnetically equivalent spins are defined in eqns (2)-(5). The diagonal elements of the relaxation matrix Pu are:* Ns

p , , = 2 ( n , - 1)(W~'+ wi~i)dl-

E nJ(WgJ+2W~j+ W~j)

(2)

j=l

j¢i

where N~ is the number of sets of magnetically equivalent protons in the polypeptide, i.e. Np = Z,j=~N"nj. The off-diagonal elements of the relaxation matrix o~j are: (3)

ff ij = ni( Wi2j - W~).

In eqns (2) and (3) Wo, W1 and W2 are the zero, single and double quantum jump transition probabilities, respectively, between the eigenstates of the longitudinal components of the spin operators. The transition probabilities are given in Table 1. These expressions for the relaxation matrix elements are approximations since they neglect cross-correlation terms between separate pairwise and higher-order interactions (Solomon, 1955; Werbelow and Grant, 1977; Bull, 1987). TABLE 1. TRANSITIONPROBABILITIESIN THE FUNCTION OF THE SPECTRAL DENSITIES

Transition probabilitiesbetweenthe eigenstates of the longitudinalcomponentsof the spin operators Zero quantumjump W~j = QJ°°(oJ k - ~oj) Single quantumjump W~x~_- ~aQj~joo(o~) Double quantumjump W~j = 6QJ°~(co~ + o~) Transition probabilitiesbetweenthe eigenstates of the transverse componentsof the spin operators Zero and double quantum jump

Single quantumjump

Uokj_ - U kj 2 = Q{J~joo(0)+ ~~J~joo (m~--mj)} +Q{~ [J°~(cok)+J°~(o~j)+J~(mk+o~i)']} DO(°gk--mfl+Z 3 [J~o Do(Ogk)+JkJ - Do(mk+mJ)]} U kj ' k -_Q { ~ ~JkJ

The definition of the transverse spin-spin relaxation time spins is (Solomon, 1955; Abragam, 1961):

T2kof the kth set of equivalent

1

- - = 2(nk- 1)(U~k+ U~k)

T~

Ns

+ ~, nj(Uokj +2Uxk kj + U2kJ).

(4)

j=l

In Table 1 Q = (2rr/5)74h 2 where 7 is the proton gyromagnetic ratio and h is the Planck constant over 2~. The J~k]"' (09) spectral densities are Fourier cosine transforms of correlation functions Cmm'(t) (Tropp, 1980): * Here, instead of using the logical notation R , and Rij for the matrix elements of R, we used Solomon's (1955)

original notation.

I.P. SUG,~Rand Y. Xu

64

J~"' (co) = nkn~

C""' (t)cos(cot) dt

=nkn~;f /k r~.(*'a~(o))r~.,(*'"~(t))\ ~ -/coskcot/dt , . ,

(5)

where the nknj factor is the number of possible proton-proton interactions between the kth and jth sets of protons. When k =j, the factor can be given by nk(n k - 1)/2. r(t) is the distance between the interacting kth and jth set of protons at time t. Usually, a set of magnetically equivalent protons consists of one proton only (ni = 1), and the distance between two sets of protons is equal to the distance between the respective two protons. In the case of polypeptides, protons of each methyl group form a set of magnetically equivalent protons (n~= 3). When nk > 1 and/or nj > 1 and k ¢:j the distance between the kth andjth set of protons is defined by the distance between any pair that consists of one proton from each set. The distance between a methyl proton and any other proton may serve as an example. Note that this distance usually varies in time because of the intramolecular rotation of the methyl group. If k =j, the distance is defined as the distance between any pair of protons within the same set (e.g. the constant distance of two protons of a methyl group). In eqn (5) q~tab(t) represents the polar angle 0 and q5 of the interproton vector in the laboratory frame of coordinates whose Z axis coincides with the applied constant magnetic field; the Y2,, normalized spherical harmonics are defined in the first Appendix, and the angular brackets signify an ensemble average over all configurations of the proton pair. 2. Simulating Polypeptide Dynamics

Equation (5) is the general definition of the spectral density of two protons. In order to calculate spectral density one has to know the structure and dynamics of the polypeptide. There are two main components of polypeptide dynamics determining proton dynamics: the overall tumbling of the molecule in the solution, and intramolecular motions. In general, motions of a polypeptide can be simulated by molecular dynamics (in a picosecond timescale), or by the Monte Carlo method. (a) Rigid molecule approximation The dynamics of an oligonucleotide have been studied by Monte Carlo simulations (Genest, 1989). The model calculations show that small amplitude intramolecular rotational motions (< 10°) have a negligible effect on spectral density. Witterbort and Szabo (1978) have studied the restricted intramolecular rotations of a lysine sidechain. It was found that rotations smaller than 30° are ineffective in causing relaxation. Thus, in the case of small amplitude intramolecular motions, only the overall tumbling of the molecule contributes to the spectral density. When the overall motion of the molecule is isotropic, the spectral density function of any pair of protons of the molecule can be given in the following simple form (Woessner, 1962): 1

'~M

]

,6,

where r u is the correlation time of the reorientation of the tumbling axis and rij is the interproton distance. In the case of globular proteins, the intramolecular motions are usually strongly restricted in the tightly packed core and eqn (6) is applicable. (b) Fast motion limit In the core region of globular proteins, usually the methyl groups are an exception: the condition of small amplitude rotation is not applicable for the fast-rotating methyl protons. The spectral density of a pair of protons in a methyl group has been determined by Woessner (1962):

Simulation of 2D-NMR (NOESY)spectra

oo

J~i (co) = ~

,[

0.25(3 cos 2 A - 1)2 -1 + -co2z2 + 0"75(sin2 2A + sin 4 A) 1 + co2~2

65

(7)

where A is the angle between the rotation axis of the methyl group and the interproton vector; r is the fixed distance between two methyl protons, and z - l = z ~ l + z 7 1 where z, is the correlation time for the reorientation of the interproton vector in the molecular frame. The spectral density of a pair of protons, where one proton is a rotating methyl proton (i) and the other is not part of the methyl group (j), was determined by Tropp (1980). The result of the 'three site hindered rotation model' is: J~"'(co) = ( - 1)" di._., 5

a F_,rMf(m) zg(m) 1 m=E--2 L 1 + c02z2 + 1 + c02z2.j

(8)

where f(m)=],=~ Y2m(~°~(/)) 2

#(m) =-5 ,=1

~

-f(m)

(9)

(10)

where I represents one of the three states of the rotating methyl proton, a n d (I)~j°1 (jr) gives the polar angle ~ and 0 of the i - j t h interproton vector in the molecular frame. Because the tumbling motion of the globular protein is much slower than the rotation of the methyl proton (Zu~>z~), the spectral density function can be approximated by:

joo(,o) 1 . 2-21 ZMf(m)

(11)

(c) Slow motion limit At the other extreme, there are internal motions which are much slower than the tumbling motion of the protein. Protein may change from one to another metastable conformation as a result of thermal fluctuations. If these metastable conformations are separated by a relatively high free energy barrier (AF> 10kT), then the conformation changes are slow relative to the tumbling motion. Landy and Rao (1989) have pointed out that in the case of slow internal motions, the spectral density functions of the proton pairs can be constructed by averaging over all the possible N protein conformations: N

J~"'(co)= ~ P~J[~Til(co)

(12)

I=1 nn' is the where P~ is the probability of the occurrence of the Ith protein conformation and Join spectral density of the i - j t h pair of protons in the Ith protein conformation. In thermal equilibrium P~ can be calculated by means of the Boltzmann distribution:

e- F(I)/kT Px = ~ = l e- e(j~/kT

(13)

where F(/) is the free energy of the protein in the lth protein conformation and J~o~'ncan be obtained by using eqns (6) or (11) with the polar coordinates of the interproton vector in the Ith protein conformation. The 180° flips of the aromatic rings of phenylalanine and tyrosine (Wiithrich and Wagner, 1978) are an example of the slow internal motions.

66

I.P. SUC;,~Rand Y. Xu

The effects of the above extremes of internal motion on the buildup curves of the NOESY peaks have been studied by Koning et al. (1990). (The buildup curve is the integrated peak intensity as a function of mixing time.) They emphasize that one can make serious errors in the interpretation of the NOESY spectrum by neglecting the effects of internal motion. For example, in the flipping of an aromatic ring, a proton-proton distance of 4-5 A has been obtained by using the rigid molecule model for protons that were actually 8.2 A apart. (d) General polypeptide dynamics The dynamics of a globular protein are most complex on the surface, where a sufficiently long side chain (e.g. lysine side chain) may rotate around its carbon-carbon bonds. The rates of these side-chain rotations are neither extremely slow nor fast relative to the overall tumbling motion of the protein (Richarz et al., 1980). Thus, calculating the spectral densities of the side-chain protons, the above-mentioned approximations of the slow and fast intramolecular motions are not applicable; however, two important characteristics (Witterbort and Szabo, 1978) make side-chain dynamics mathematically tractable. First, side-chain rotations are restricted because of the steric interactions both within the side chain and with the surface of the protein. Second, small amplitude rotations ( < 30°) are ineffective in causing relaxation. These characteristics make it possible to describe side chain dynamics by means of jump models involving a relatively small number of side-chain conformations. Tropp (1980) has derived expressions for spectral densities of A X systems using a general jump model of conformational dynamics, assuming the molecule as a whole undergoes rotational diffusion. In the second Appendix, we reformulated Tropp's equations, obtaining the spectral density of any pair of protons as an explicit function of the conformations and rate constants of the conformational changes:

2 jram'(f'O) --"

( - - 1 ) m + l n =E-2

s

p R I - d- e t (IT - s I, I X t s/ / Y2n(Ol mol )Y2n(OJ , mol )

I,J=IE I eL ~

J

r133rj

(14)

wherePI is the probability of the Ith conformation in thermodynamic equilibrium; Y2n and Y2n normalized spherical harmonics are defined by eqns (32)-(35); r~ is the distance between the considered pair of protons in the Ith conformation; sn = z~-1_ io9, where z, relaxation times are related to the rotational diffusion constants of the whole molecule (see eqn (41)); while d e t ( T - s~l)lj is the IJth cofactor of determinant d e t ( T - s~l) and T is a matrix of the rate constants of conformational changes, defined by eqn (36), and I is the identity matrix. It is important to know that eqn (14) is a general form of the spectral density functions. It transforms into eqn (11) and eqn (12) at the limits of fast and slow intramolecular motions, respectively (Sug/tr, 1992). Thus eqn (14) is applicable for the determination of the spectral density of any pair of protons in the globular protein. (e) Lysine side-chain dynamics

Witterbort and Szabo (1978) have described a general method to determine all possible sterically allowed conformations of the polypeptide side chains and the possible jump modes connecting these conformations. As an example, we consider a lysine residue with partially restricted side-chain dynamics. The lysine side chain may form trans (t) and gauche (g +, g - ) rotational isomers as a result of 0 ° and _ 120° rotations, respectively, around the Cr-C ~ and C~-C~ bonds. The rotation around the Ca-C r bond is assumed to be completely hindered by the nearest neighbor residues. The possible seven lysine conformations are: tt, tg +, tg -, g +t, g - t, g +g +, g - g -. At each conformation, the first letter refers to the rotational state of the Cr-C ~ bond, and the second letter to that of the C~-C' bond. Knowing the lysine side-chain conformations and the possible jump modes connecting these conformations (see the first seven conformations in Table 3 and the jump modes in Fig. 8 in Ref. Witterbort and Szabo, 1978) the following rate constant matrix T can be constructed:

67

Simulationof 2D-NMR (NOESY)spectra ¢" - ~ [ 1 ]

T=

\

bE1, 1]

bE1, 1]

bE2, 1]

bE2, 1]

0

0

bE1, -1]

-~E2]

bE1, O]

0

0

0

bE2, 1]

b[l, -1]

b[1,O]

-~[3]

0

0

b[2, 1]

0

b[2, - 1 ]

0

0

-~[43

b[2, O]

b[1, 1]

0

b[2, -1]

0

0

b[2, 0]

-~[5]

0

b[1, 1]

0

0

b[2,-1]

b[1,-1]

0

-~[6]

0

,

0

b[2, - 1 ]

0

0

b[1, -1]

0

-)"[7]

J

(15)

where ~ [ / J = ~ = 1Tis and/'12(= b[1, 1]) represents the rate constant of the transition from the second to the first conformation. Estimating the numerical values of the rate constants b[n b, Ang] one has to take into consideration the number of C-C and C-N bonds nb moving in a concerted way during the conformational change, and the change in the number of gauche isomers, Ang during the conformational change (Monnerie et al., 1969, and Dubois-Violette et al., 1969). If the principle of detailed balance is satisfied (Haken, 1983) then the rate constant of a conformational change can be calculated as follows:

b[n b, Ang] = v exp( - Ebnb/R T)exp( - AngEg/2R T)

(16)

where E bg 3500 cal/mol is the activation energy of the movement of a C-C or C-N bond during the conformational change, v is the number of trials per unit time for the conformational change, and Eg~ 500 cal/mol is the energy of a gauche isomer (Flory, 1969) relative to the trans conformation. Knowing the rate constant matrix T one can determine the matrix of dynamics D(n) defined by eqn (50). As an example we give a diagonal and an off-diagonal element of the D(n) matrix:

D(n)x 1= - o9- 2[z~- 1+ 4re-t3Eb +~g)/2Rr cosh(Eb/2R T)] D(n)l 2 =¢D-2ve -(2Eb-Eg)/2RT.

(17)

These matrix elements have been determined by means of the high angular frequency approximation (eqn (51)). The matrix of the probability distribution P (defined by eqn (49)) can be determined from the free energies of the conformations by assuming the Boltzmann distribution (eqn (13)). The diagonal elements of the P matrix are: P11 = (1 + 4e-r',/xr + 2e- 2 E g / R T )

-

1

P22= P33= P4,= P55 =e-E~/~rPll P66 = P77 =e-2Ey/RTpll.

(18)

The proton positions at every side-chain conformation can be given on a tetrahedral lattice (Witterbort and Szabo, 1978). Knowing the positions of the considered pair of protons in each conformation one can construct the structure vector S(n) (defined by eqn (48)). Finally, by substituting D(n), P matrices and the S(n) vector into eqn (47) one can get the spectral density function of any pair of protons in the lysine side chain. This is a complicated but precise way of simulating the side chain dynamics on the protein surface.

3. Simulation of the NOES Y Spectrum of a Protein Figure 1 shows the stack- and contour plots of the simulated NOESY spectrum of Basic Pancreatic Trypsin Inhibitor (BPTI), a small protein of 58 residues. The NOESY spectrum

68

I.P. SUGAR and Y. Xu

(a) 10

v

10

B

6

4

o

¢a~ (ppm)

(b) 10.0

•

,

. , ~

0

.

~'"

$

.;. * ~

• @

•

•

•

@

O0 - @

[

'DO"

~

° @

@

•

•

'~a,"

'~,..-

.0

-

o

..

-~,~.,~

"

. ~$' . , " $

•

'

~.

o~O.

~

..

•

~

5.0

r

..

o

i

.,=.

i

•

. '

0

~

•

•

•

- ~~:~.."~..,3F'.'.'.'.'.'.'~&.,~..IO ~/';':-'

v

" u

/B.

""•

,~:,/'~,.,~'

; •

:' 2,5

0

~' •

-

-

.

"

"

"

~ O*- n:e e~ ' °

+

-

$

0

I";

' . 5.0

~

o.

":'i

.

•

•

~O

°

"+'

'

0

.

v ,

•

0010

#

.

.

•

. 0

- - ~0

o •

o

.

.

$

*

".

•

@

o.

o=~a,

.

--"

•

0

'O

.@

qO, O

* ~ o

.~

..

~

•

.

..

i ~ 0

$

= " ' o , , Y ' ~

- - V.

*

•

-

0

. o . .

J ,

¢

•

O

i. -v

$

CASD-NMR 2: robust and accurate unsupervised analysis of raw NOESY spectra and protein structure determination with UNIO.

Simulation of NOESY spectra of DNA segments: a new scaling procedure for iterative comparison of calculated and experimental NOE intensities.

Robust and highly accurate automatic NOESY assignment and structure determination with Rosetta.

Prediction of RNA secondary structure, including pseudoknotting, by computer simulation.

Electron paramagnetic resonance in biochemistry. Computer simulation of spectra from frozen aqueous samples.

Computer-assisted interpretation of 1H-n.m.r. spectra in the analysis of the structure of oligosaccharides.

Computer simulation of immunochemical interactions.

Computer simulation of periacetabular osteotomy.

Computer simulation of protein folding.

Computer simulation of DNA supercoiling.

Comparison of experimental and calculated chiroptical spectra for chiral molecular structure determination.

When Computer Simulation Excels Experiment.

Computer-assisted tracking simulation (CATS).

Computer graphics simulation for teaching.

The simulation of aerial movement--IV. A computer simulation model.

Determination of protein secondary structure using factor analysis of infrared spectra.

Centre of resistance and centre of rotation of a tooth: experimental determination, computer simulation and the effect of tissue nonlinearity.

spectra for structure determination of proteins in solution.

Computer simulation of pathological vocal-cord vibration.

Computer simulation of firefly flash sequences.

Mass-weighted molecular dynamics simulation and conformational analysis of polypeptide.

Computer simulation of human tympanic membrane.

Educational computer simulation of malignant hyperthermia.

A computer simulation of cardiac emergencies.