The determination of the conformational properties of nucleic acids in solution from NMR data.

Biochimica et Biophysica Acta, 1049 (1990) 189-204


Elsevier BBAEXP 92073

Andrew N. Lane

National Institute for Medical Research, The Ridgeway, Mill Hill, London (U.K.)

(Received 10 November 1989)

Key words: DNA structure; NMR; Conformational averaging

A program, NUCFIT, has been written for simulating the effects of conformational averaging on nuclear Overhauser enhancement (NOE) intensities for the spin systems found in nucleic acids. Arbitrary structures can be generated, and the NOE time courses can be calculated for truncated one-dimensional NOEs, two-dimensional NOE and rotating frame NOE spectroscopy (NOESY and ROESY) experiments. Both isotropic and anisotropic molecular rotation can be treated, using Woessner's formalism (J. Chem. Phys. (1962) 37, 647-654). The effects of slow conformational averaging are simulated by taking population-weighted means of the conformations present. Rapid motions are allowed for by using order parameters which can be supplied by the user, or calculated for specific motional models using the formalism of Tropp (J. Chem. Phys. (1980) 72, 6035-6043). NOE time courses have been simulated for a wide variety of conformations and used to determine the quality of structure determinations using NMR data for nucleic acids. The program also allows grid-searching with least-squares fitting of structures to experimental data, including the effects of spin-diffusion, conformational averaging and rapid internal motions. The effects of variation of intra- and internucleotide conformational parameters on NOE intensities have been systematically explored. It is found that (i) the conformation of nucleotides is well determined by realistic NOE data sets, (ii) some of the helical parameters, particularly the base pair roll, are poorly determined even for extensive, noise-free data sets, (iii) conformational averaging of the sugars by pseudorotation has at most a second-order influence on the determination of other parameters and (iv) averaging about the glycosidic bond also has, in most cases, an insignificant effect on the determination of the conformation of nucleotides.

Introduction

The rapid advance in NMR techniques in recent years has opened up the possibility of determining the structures of biologically important macromolecules in solution. The structures of several small proteins (i.e., Mr less than about 8000) in solution have been reported [1-5], and in at least one case the solution structure compares favourably with the independently determined structure in the crystal state [3,4,6]. At the same time, progress in nucleic acid chemistry has allowed the synthesis of large quantities of pure oligonucleotides of defined sequence, which have also been examined by NMR. Most of these have corresponded to biologically important sequences, such

Abbreviations: NOE, nuclear Overhauser enhancement; NOESY, two-dimensional nuclear Overhauser enhancement spectroscopy; ROESY, rotating frame Overhauser enhancement spectroscopy. Correspondence: A.N. Lane, National Institute for Medical Research, The Ridgeway, Mill Hill, London NW7 1AA, U.K.

as operators, promoters and restriction endonuclease sites [7-13]. However, the starting points for the determination of the structures of proteins and nucleic acids are different. In the former, the emphasis has been on the determination of the folding of the polypeptide chain, and the details of the conformational properties, particularly of many of the amino acid side chains, remain underdetermined. In contrast, in nucleic acids, it is usual to know in advance the overall structure (e.g., double stranded, single stranded or hairpin), and inspection of the NOESY spectrum is usually sufficient to establish whether the fragment is predominantly A, B or Z like. The task of NMR is then to determine the local, sequence dependent variations of the conformation, and to characterise multiple conformations that are usually present [14-16]. This puts much greater demands both on quantitation of the NOE values and coupling constants, and on proper interpretation of NOE intensities [9]. An important limiting factor in the determination of distances from NOE intensities is the presence of spin

0167-4781/90/$03.50 © 1990 Elsevier Science Publishers B.V. (Biomedical Division)

diffusion [9,18,19]. Fortunately, it is possible to treat spin diffusion in a rigorous, self-consistent manner [9,20,21]. There is also an important structural difference between DNA and proteins. DNA is essentially a linear polymer, in which to a good approximation only dipolar interactions between (and within) sequential nearest neighbour nucleotides need to be considered, whereas proteins are three dimensional in the sense that more than sequential nearest neighbour residues must be considered.

There are several approaches to interpreting NMR data for structure determination, though all use the information embedded in the intensity of the NOE and three bond coupling constants. Both distance geometry and restrained molecular dynamics make use of NOEs interpreted as ranges of distances, and coupling constants as ranges of torsion angles [1-5]. In current implementations, this requires the assumption of a single potential minimum in conformation space. An alternative method is to analyse the NOEs and coupling constants in greater detail by specifically accounting for spin-diffusion [9,18-21], either to provide more accurate distances for subsequent restrained dynamics optimisation [8], or to determine directly models that are compatible with the primary data using the least squares criterion [9,17].

Altona and coworkers [14,15] have shown by detailed analysis of coupling constants that the deoxyriboses in DNA fragments exist as an equilibrium mixture of at least two conformations (essentially C2' endo and C3' endo). It is important to take this equilibrium into account when building structures of nucleic acids. It has also been shown that some sequences can adopt different conformations at the level of the bases, depending on the conditions [22,23], and that some internal motions of the sugars can be measured by 1H-NMR [24].
The present work describes a computer program (NUCFIT) which has been written to take account of these properties, and to simulate NMR data for nucleic acids, allowing arbitrary structures of any given sequence to be generated, and including the effects of multiple conformations and motional averaging. The same program kernel can also be used to find optimal structures (according to the least-squares criterion) by iterative refinement against the experimental data. The program is an extension and generalisation of work previously described [9,25]. NUCFIT has been used to simulate NOE time courses to investigate the effects of multiple conformations and internal motions. The fitting routines were then applied to simulated one- and two-dimensional NMR data sets, one in which a single rigid structure is assumed, and one in which conformational averaging is allowed. These calculations demonstrate the degree to which the solution conformations of nucleic acids can be described using present NMR methods.

Methods

Model building

The kernel of the model building program consists of geometric routines for rotations and translations, a numerical integration routine, and least-squares optimisation routines. The geometry engine is specific to nucleic acids, and is a development of a program previously described [9,25]. Input consists of the nucleotide sequence, and the torsion angles, rotations and translations necessary to describe the conformations of individual nucleotide units and their relationships to one another. The sugars are described using the pseudorotation formalism as a mixture of south and north conformations [14,15,26]. This requires up to five parameters for a complete specification, namely the pseudorotation phase angles P_S and P_N, where S and N refer to the south (near C2' endo) and north (near C3' endo) conformations, respectively; φ_S and φ_N, the maximum amplitudes; and f_S, the mole fraction of the south conformer. Following Altona [14,15], φ_N is set to 40° and P_N is set to 9°, leaving three parameters to describe the conformation of the sugar. It will be shown that the NOEs are not very sensitive to φ_S, and are indeed covariant with P_S, so that in effect the NOEs can only be used to fit two parameters, namely f_S and the glycosidic torsion angle, χ. φ_S can be set to 37° with small error [14-16]. The glycosidic torsion angle is defined as the rotation about O4'-C1'-N9-C4 for purines, and O4'-C1'-N1-C2 for pyrimidines. With this definition, χ is about -110° for B DNA and -150° for A DNA. The Cartesian coordinate system is centered with the x axis bisecting the C8 and C6 of a base pair, the y axis perpendicular to x at the mid-point, and the z axis perpendicular to x and y, at their point of intersection. Base roll (p) is then defined as a positive rotation about x, base tilt (t) as a positive rotation about y, and the helical twist (Ω) as a positive rotation about z.
The shift of the helix axis (D) is defined as a translation of the base pair along its short axis (x) toward the minor groove, the base pair slide as a translation along its long axis (y), and the helical rise (h) as a positive translation along the local z axis. These correspond to definitions given in Ref. 27. The coordinates of the bases, sugars and phosphates are stored in a standard conformation, as C2' endo (P = 162°, φ_S = 36°, f_S = 1), χ = -110°, and all other parameters equal to zero. A given conformation for the input sequence is generated by appropriate rotation and translation according to the values of the torsion angles and translations given above. The protons are added using standard bond lengths and bond angles taken from Saenger [28]. NOEs can be calculated for three different experiments: the driven truncated NOE [29], NOESY [30] and ROESY [31]. The cross relaxation rate constants are

calculated for a given geometry using the appropriate spectral density functions [32], for the input correlation time and spectrometer frequency. Rapid internal motions are treated by assigning order parameters [33] which can be calculated for specific motional models using the method of Tropp [34], or estimated from experiment [9,24]. The cross relaxation rate constants in the presence of rapid motion are then approximately given by [24,33,34]:

$$\sigma_{ij} = S^{2}\,\alpha\,[6J(2\omega) - J(0)]/r_{ij}^{6} \qquad (1)$$

where S² is the order parameter, α is a constant whose value is 56.92 Å⁶ ns⁻², r_ij is the interproton separation and J(ω) is the spectral density function. Reciprocal T1 (i.e., ρ) values are also calculated assuming pure dipolar relaxation, though there is also the facility to override these values. The calculated values of ρ provide good initial estimates for the relaxation rate constants when the number of spins is large, and the only relaxation mechanism is dipolar. The temporal evolution of the magnetisations, M, is defined by the Bloch-Solomon equations:

$$d[M]/dt = -[R][M] \qquad (2)$$

where [R] is the relaxation matrix [20]. In the presence of multiple conformations, the relaxation matrix has to be calculated as the average over the structures. If the frequency of interconversion among different conformations is much greater than the Larmor frequency, the method of Tropp [34] must be used. If the rates of interconversion are slower than the Larmor frequency, then the cross relaxation rate constants are simply linear weighted averages of the cross relaxation rate constants appropriate for each conformation sampled. For example, within a nucleotide, the mixture of north and south conformations would lead to the following average for the cross relaxation rate constants if sugar repuckering were slow on the Larmor timescale:

$$\langle\sigma\rangle = f_{S}\,\sigma_{S}(\chi_{S}) + (1 - f_{S})\,\sigma_{N}(\chi_{N}) \qquad (3)$$

and

$$\langle\sigma\rangle = f_{S1} f_{S2}\,\sigma_{S1S2}(\chi) + (1 - f_{S2}) f_{S1}\,\sigma_{S1N2}(\chi) + (1 - f_{S1}) f_{S2}\,\sigma_{N1S2}(\chi) + (1 - f_{S1})(1 - f_{S2})\,\sigma_{N1N2}(\chi) \qquad (4)$$
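Eqns. 1 and 3 can be illustrated with a short Python sketch. This is not NUCFIT code: the isotropic spectral density J(ω) = τ/(1 + ω²τ²), the function names, the two interproton distances and the 70% south population are all assumptions chosen for illustration. Only the constant 56.92 and the forms of Eqns. 1 and 3 are taken from the text, and the numerical units of the result follow those of the constant as quoted.

```python
import math

ALPHA = 56.92  # constant of Eqn. 1 as quoted in the text (r in Å, tau in ns)

def spectral_density(omega, tau):
    """Assumed isotropic-rotor form J(w) = tau / (1 + (w*tau)^2); omega in rad/ns."""
    return tau / (1.0 + (omega * tau) ** 2)

def cross_relaxation(r, tau, freq_mhz, s2=1.0):
    """Eqn. 1: sigma = S^2 * alpha * (6 J(2w) - J(0)) / r^6."""
    omega = 2.0 * math.pi * freq_mhz * 1e-3  # MHz -> rad/ns
    j0 = spectral_density(0.0, tau)
    j2 = spectral_density(2.0 * omega, tau)
    return s2 * ALPHA * (6.0 * j2 - j0) / r ** 6

def slow_average(f_south, sigma_south, sigma_north):
    """Eqn. 3: population-weighted mean over south and north conformers."""
    return f_south * sigma_south + (1.0 - f_south) * sigma_north

# Hypothetical distances (Å) for one proton pair in the two pucker states.
sigma_s = cross_relaxation(r=3.8, tau=2.5, freq_mhz=500.0)
sigma_n = cross_relaxation(r=2.8, tau=2.5, freq_mhz=500.0)
sigma_avg = slow_average(0.7, sigma_s, sigma_n)
print(sigma_s, sigma_n, sigma_avg)
```

Because of the r⁻⁶ dependence, the averaged rate constant is dominated by the conformer with the shorter distance even when that conformer is the minor species.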

Because only nearest neighbouring nucleotides contribute to the magnetisation transfer process, higher order averaging is not considered. The exact frequency of sugar repuckering in oligonucleotides is not known with any certainty, though recent experiments and calculations suggest that it is not the dominant motion that affects relaxation of protons in deoxyribose [24]. Indeed, the order parameters calculated for the relatively small amplitudes of the motion of deoxyribose proton-base proton vectors during repuckering between C2' endo and C3' endo states are very similar for both linear averaging and fast averaging calculated using Tropp's formalism (unpublished calculations, and see Results section). Hence, while Eqn. 3 and Eqn. 4 may be approximations that are incorrect for many internal motions, provided that the rapid internal motions have small amplitude, the error introduced is quite small, and much smaller than the error introduced by ignoring motions altogether.

For double-stranded oligonucleotides of greater than about ten base pairs, it may be necessary to take into account anisotropic rotation. For cylindrical DNA, the model of Woessner [35] can be used [36], which states that the spectral density function is:

$$J(0) = a_{1}\tau_{1} + a_{2}\tau_{2} + (1 - a_{1} - a_{2})\,\tau_{3} \qquad (5)$$

where a_1 and a_2 are defined as:

$$a_{1} = 0.25\,(3\cos^{2}\beta - 1)^{2} \qquad (6A)$$

$$a_{2} = 3\cos^{2}\beta\,\sin^{2}\beta \qquad (6B)$$

where β is the angle the proton-proton vector makes with the long (z) axis of the molecule, and τ_1, τ_2 and τ_3 are correlation times defined as:

$$\tau_{1} = \tau_{L} \qquad (7A)$$

$$\tau_{2} = 6\tau_{L}\tau_{S}/(\tau_{L} + 5\tau_{S}) \qquad (7B)$$

$$\tau_{3} = 3\tau_{L}\tau_{S}/(2\tau_{L} + \tau_{S}) \qquad (7C)$$

where τ_L is the correlation time for rotation about the short axis, and τ_S is the correlation time for rotation about the long axis. For vectors that are nearly perpendicular to the helix axis (e.g., Cyt H6-H5, H8/6 to H1', H2', H2" and H3'), the dependence of the spectral density function on the angle β is weak. Indeed, for the Cyt H6-H5 vector, β = 90°, and a_2 = 0. Under these conditions, J(0) is equal to the apparent correlation time for the Cyt H6-H5 vector. If the axial ratio is known, the ratio of the correlation times γ = τ_L/τ_S can be calculated using Perrin's equations [35]. If τ_cyt is the correlation time for the Cyt H6-H5 vector, then from Eqns. 6 and 7 for β = 90°, τ_L and τ_S are given by:

$$\tau_{L} = (2\gamma + 1)\,\tau_{cyt}/(0.5\gamma + 2.5) \qquad (8A)$$

$$\tau_{S} = \tau_{L}/\gamma \qquad (8B)$$
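As a check on Eqns. 5-8, the following Python sketch evaluates J(0) for a vector at angle β and recovers τ_L and τ_S from τ_cyt and γ. The function names are illustrative and are not taken from NUCFIT; the test values (τ_cyt = 6.5 ns, γ = 2.5) are those quoted for a 20-mer in the Results section.

```python
import math

def woessner_j0(beta_deg, tau_long, tau_short):
    """J(0) from Eqns. 5-7 for a vector at angle beta to the long axis."""
    c2 = math.cos(math.radians(beta_deg)) ** 2
    s2 = 1.0 - c2
    a1 = 0.25 * (3.0 * c2 - 1.0) ** 2                                  # Eqn. 6A
    a2 = 3.0 * c2 * s2                                                 # Eqn. 6B
    tau1 = tau_long                                                    # Eqn. 7A
    tau2 = 6.0 * tau_long * tau_short / (tau_long + 5.0 * tau_short)   # Eqn. 7B
    tau3 = 3.0 * tau_long * tau_short / (2.0 * tau_long + tau_short)   # Eqn. 7C
    return a1 * tau1 + a2 * tau2 + (1.0 - a1 - a2) * tau3              # Eqn. 5

def tau_from_cyt(tau_cyt, gamma):
    """Eqns. 8A and 8B: tau_L and tau_S from the Cyt H6-H5 correlation time."""
    tau_long = (2.0 * gamma + 1.0) * tau_cyt / (0.5 * gamma + 2.5)
    return tau_long, tau_long / gamma

tau_long, tau_short = tau_from_cyt(tau_cyt=6.5, gamma=2.5)
print(round(tau_long, 1), round(tau_short, 1))   # 10.4 4.2
print(woessner_j0(90.0, tau_long, tau_short))    # recovers tau_cyt (about 6.5)
print(woessner_j0(0.0, tau_long, tau_short))     # equals tau_L for beta = 0
```

Setting γ = 1 returns τ_L = τ_S = τ_cyt, the isotropic limit, and letting γ grow without bound drives τ_L towards 4 τ_cyt.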

For β = 0, the spectral densities depend only on τ_L, so that the internucleotide cross relaxation rate constants are governed by the correlation time for end-over-end tumbling. Hence, for anisotropic rotors, the axial ratio

must be supplied in addition to the correlation time for the Cyt H6-H5 vectors. It should be noted that the value of τ_L is at most 4 τ_cyt (i.e., for an infinitely long ellipsoid).

Once the relaxation matrix has been generated, the NOEs are calculated as previously described [25], except that the fourth order Runge-Kutta method is used for the numerical integration [38]. The Bloch-Solomon equations are linear and first order, and the magnetisation transfer among the different spins occurs over a relatively narrow time scale. Calculations have shown that the default step size in the integration (i.e., 1/(10 ρ_max), where ρ_max is the largest spin-lattice relaxation rate constant in the spin system) results in an error of less than 1% of the calculated value at any time up to at least 1 s. NOEs are stored in a file at times that facilitate comparison with experimental data, which are usually acquired at multiples of, say, 25 ms. It is, therefore, possible to calculate NOE time courses for an arbitrary conformation of a nucleic acid, taking into account averaging of both the first and second kind [37].

Data fitting

Two modes of data fitting are considered. The first finds the set of σ and ρ values that best represent the NOE data. A set of spins is specified, and the relaxation matrix is varied to minimise the sum of the squares of the deviations of the calculated and observed NOE intensities, using the Marquardt algorithm [38]. In this method, no other constraints are supplied. It is important to ensure that the number of parameters does not exceed the information content of the data. For example, if one spin is irradiated, and two NOEs are observed as a function of time, then it is possible to determine up to five parameters. Two of these are the ρ values of the observed spins, which have little structure content. The remaining three are the cross relaxation rate constants connecting the irradiated and observed spins, and the cross relaxation rate constant between the two observed spins. The precision of the determination is very sensitive to the geometry. Although this method could be used to derive the entire relaxation matrix, from which the structure could then be determined by proper interpretation of the fitted cross relaxation rate constants [8], in practice the smaller cross relaxation rate constants, and those corresponding to indirect interactions, are relatively imprecise, and are usually covariant with other cross relaxation rate constants. This method, however, is useful for tightly coupled dipolar spin systems, e.g., in the sugars, where the geometry fixes many of the internuclear distances. The derived cross relaxation rate constants can then be used to determine order parameters for motions of different vectors [24].

The second method of data fitting makes use of the

linear properties of nucleic acids, and the natural constraints imposed by the covalent structure. First, the conformations of individual nucleotides are determined (i.e., the parameters χ, P_S and f_S are fitted), using base H8/H6 to H1', H2', H2" and H3' NOEs. If values of P_S are available from coupling constants, this is held constant, as the sugar-base NOEs are only weakly dependent on P_S [9,16]. The values of ρ are allowed to vary as empirical parameters that have no special physical significance; it is therefore necessary to collect data to fairly long mixing times. There are then only two structural parameters to find, χ and f_S. First a grid of initial values of χ and f_S is set up, along with initial values of ρ, which can be taken as the calculated values for dipolar relaxation. For each value of f_S, χ and ρ are optimised starting from each initial value of χ using the Marquardt algorithm, and the value of the parameter R is stored. R, the mean absolute error, is defined as:

$$R = (1/N)\sum |NOE(calc) - NOE(obs)| \,/\, |NOE(obs)| \qquad (9)$$

where N is the number of observations. For a fit in which the only error is statistical, arising from random errors in the NOE intensities, the value of R should be close to the standard deviation/mean NOE. The standard deviation can be estimated from the signal-to-noise ratio of the NOE spectra. The grid search allows the conformation space to be adequately sampled. Because the conformation space for a nucleotide contains few minima, a relatively coarse search suffices. The values of χ, P, f_S (or ρ_i) that give the smallest value of R are taken as initial estimates. These estimates are further refined together, again using the Marquardt algorithm. To reduce wild oscillations during the refinement, different parameter deviations are scaled, with the ρ deviations scaled down, and the deviations for f_S scaled up. The scaling factors are brought closer to 1.0 as the refinement proceeds. Note that this is a constrained minimisation. For a given pucker state (which can be a mixture of two conformations), a change in the glycosidic torsion angle merely moves the base with respect to the sugar, leaving all cross relaxation rate constants that do not involve base proton to sugar proton interactions fixed by the covalent structure. Internal motions are treated by using order parameters if available (see above). Further, at this stage, no assumptions about the overall structure of the molecule have been made, only assumptions about the covalent structure of the nucleotide.

The second stage is to find helical parameters for dinucleotides whose conformations have been determined in stage 1. Initially, the roll is held fixed, and values of the rise, tilt, shift and helical twist are found using the hybrid grid search-Marquardt method. The best twist is found for several values of the rise, which has restrictions on its values imposed by the covalent

structure and van der Waals interactions. Optimisation of the roll, tilt and shifts requires that two base pairs be considered. As previously discussed [9], the twist angle must be the same for both dinucleotides in double-stranded nucleic acids, as should be the shift. Propeller twisting allows the roll angles for the two bases within a pair to differ, while buckling of the base pair allows the tilt angles to be different for the two bases. However, unless the buckling is large, its effects will be undetectable and the number of independent internucleotide NOEs is insufficient to determine all of these parameters.
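The grid-search stage of the fitting procedure described above can be sketched in Python as follows. This is a minimal illustration and not the NUCFIT implementation: it uses a hypothetical two-spin system with assumed rate constants (ρ = 1 s⁻¹, σ = -0.30 s⁻¹), noise-free synthetic "observations", and a coarse one-parameter grid with no Marquardt refinement. The Runge-Kutta integrator uses the step-size rule 1/(10 ρ_max) quoted in the text, and `r_factor` implements Eqn. 9.

```python
def rk4_time_course(rmat, m0, t_end, n_out):
    """Integrate dM/dt = -[R][M] (Eqn. 2) by fourth-order Runge-Kutta."""
    n = len(m0)
    rho_max = max(rmat[i][i] for i in range(n))
    h_max = 1.0 / (10.0 * rho_max)            # default step size quoted in the text
    dt_out = t_end / n_out                    # spacing of the stored time points
    substeps = max(1, int(dt_out / h_max) + 1)
    h = dt_out / substeps

    def deriv(m):
        return [-sum(rmat[i][j] * m[j] for j in range(n)) for i in range(n)]

    def shifted(m, k, c):
        return [mi + c * ki for mi, ki in zip(m, k)]

    m, course = list(m0), [list(m0)]
    for _ in range(n_out):
        for _ in range(substeps):
            k1 = deriv(m)
            k2 = deriv(shifted(m, k1, h / 2.0))
            k3 = deriv(shifted(m, k2, h / 2.0))
            k4 = deriv(shifted(m, k3, h))
            m = [mi + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for mi, a, b, c, d in zip(m, k1, k2, k3, k4)]
        course.append(list(m))
    return course

def r_factor(calc, obs):
    """Eqn. 9: mean absolute deviation relative to the observed intensities."""
    return sum(abs(c - o) / abs(o) for c, o in zip(calc, obs)) / len(obs)

def cross_peak(sigma, rho=1.0, t_end=0.5, n_out=10):
    """Magnetisation on spin 2 after perturbing spin 1 (t = 0 point omitted)."""
    rmat = [[rho, sigma], [sigma, rho]]
    return [m[1] for m in rk4_time_course(rmat, [1.0, 0.0], t_end, n_out)[1:]]

observed = cross_peak(-0.30)                  # synthetic, noise-free data (hypothetical)
grid = [-0.05 * k for k in range(1, 13)]      # coarse grid of trial sigma values
best = min(grid, key=lambda s: r_factor(cross_peak(s), observed))
print(best, r_factor(cross_peak(best), observed))
```

In a full treatment the grid minimum would only seed a least-squares refinement, and the trial parameters would be structural (χ, f_S) rather than a rate constant.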


Incorporation of van der Waals constraints

Although the van der Waals energy is slightly attractive for equilibrium structures, its main use in finding conformations is in defining permissible regions of conformation space. Those conformations that produce significant violations of the lower bounds of the van der Waals radii should be rejected. In the present implementation, van der Waals information has been incorporated only as a check on steric acceptability of conformations, using only the repulsive part of the energy function. The Lennard-Jones 6-12 function was used to calculate energies as a function of internuclear separation, and then reparameterised as:

$$U(r_{ij}) = A_{ij}/r_{ij}^{n} \quad (r < r_{min}) \qquad (10)$$

where r_min is the minimum van der Waals contact distance for a pair of atoms i, j. The value of A and the exponent n depend on the pair of atoms considered, and were obtained by reparameterising the data given in Saenger [28]. van der Waals violations are defined only for r < r_min. A list of offending atom pairs is produced, and the total repulsive energy, U_tot, is calculated as the sum of U(r) over all pairs of violating atoms.

NUCFIT is written in FORTRAN, and runs on Apple Macintosh computers.

Results

1. Simulations of NOE intensities as a function of conformation and dynamics

(a) Nucleotide conformation

The conformation of rigid nucleotides can be described by three parameters, the glycosidic torsion angle χ, the pseudorotation phase angle P, and the amplitude of the pseudorotation φ [14,15]. If the sugar exists as a mixture of two pucker states (north and south) [14,15], then P and φ have to be defined for both. In addition, the mole fraction of the south state f_S needs to be specified. A complete description of the two-state conformational equilibrium can be obtained from measurement of a sufficient number of three bond coupling constants. However, in large fragments of DNA, it is not always possible to measure all the required coupling constants with sufficient accuracy, so it may be necessary to supplement this information with NOE data. The dependence of NOEs on these parameters has therefore been simulated to determine the degree to which the nucleotide conformation can be described.

Fig. 1. Dependence of the cross-relaxation rate constant for base to sugar protons on the glycosidic torsion angle. Coordinates were generated as a function of the glycosidic torsion angle at two pseudorotation phase angles (P = 162° and P = 18°) as described in the text. Cross-relaxation rate constants were calculated according to Eqn. 1 with S² = 1. The reduced cross-relaxation rate constant, σ_R, was calculated as σ/τ_R, with τ_R = 2.5 ns. The spectrometer frequency was 500 MHz. ( ) P = 162°; (○ ○) P = 18°. [Horizontal axis: glycosidic torsion angle (deg).]

(i) Dependence of NOEs on the glycosidic torsion angle. Fig. 1 shows calculated reduced cross relaxation rate constants σ_R (defined as the ratio of the cross relaxation rate constant for a correlation time of τ_R to τ_R, i.e., σ/τ_R), as a function of the glycosidic torsion angle, in two pucker states (C2' endo and C3' endo). The figure shows that σ_R of the base proton (H8) to H2' is very sensitive to the value of χ in the anti range characteristic of B DNA (varying up to a factor of 100), and less sensitive to the value of P (variation of 3-fold at a given value of χ). Indeed, as one would expect, the cross relaxation rate constants for the C2' endo and C3' endo states are effectively displaced by a phase angle, apart from small differences due to small changes in bond lengths between the two pucker states. Conversely, σ_R of the base H8/H6 to H3' is very sensitive both to χ and P. However, in the C2' endo conformation, the value of σ_R is small for all values of χ, and observable NOE cross peak intensity for H8-H3' would be a consequence of spin diffusion [9]. In the C3' endo conformation, the value of σ_R varies from 0.006 to 0.6 over the range 0 < P < 180°. The base to H1' σ_R is independent of P, and weakly dependent on χ. The latter σ_R, however, becomes sensitive to χ in the syn range, and thus can be used to discriminate between syn

and anti. Hence, it is important to measure not only the base to H1' and H2' NOEs, which are most useful to fix the value of χ, but also the base to H3' NOE, which provides sensitive information about the equilibrium between the sugar pucker states. This is in agreement with previous calculations [9].

(ii) Dependence of NOEs on the pseudorotation angle and the mole fraction of the south state, f_S. Most of the measurable intraring proton NOEs are insensitive to the sugar pucker, and therefore contain no information about conformation [16,24,39]. There are two intraring NOEs that in principle could be used to determine the phase angle of the deoxyribose, namely H1'-H4' and H2"-H4' [16]. Unfortunately, the H2"-H4' NOE is expected to be weak for 90° < P < 180° and therefore poorly determined. The H1'-H4' NOE becomes intense (comparable to that of the Cyt H6-H5 NOE intensity) only near P = 90°. Further, the motion of the deoxyriboses [24] introduces an unknown error into the calculation of the NOE. Hence in general, intraring NOE intensities are not very useful for fixing the sugar conformation. However, the base to sugar NOE intensities are sensitive to the pseudorotation phase angle, or to the fraction C2' endo (f_S). Fig. 2 shows the dependence of Ade H8 to sugar proton NOEs on the phase angle P at two glycosidic torsion angles corresponding to B and A DNA. As expected, the H8-H1' NOE is independent of the pucker state over the range 0 < P
2 Å). Accurate determination of the helical parameters therefore places great demands on the accuracy and precision of the primary NOE data. Interpreting NOEs as distances automatically adds error, which presumably will be propagated in any procedure that uses distances as constraints in a fitting process.

Fig. 5. Dependence of internucleotide NOE intensities on the helical twist, rise, roll, tilt, shift and slide. The same nucleotide parameters and conditions as those described in Fig. 4 were used, except in E and F, where χ_A = -100°. The NOE values are given at a mixing time of 100 ms, and are assumed to be cross peak intensities normalised to the intensity of a single proton. Except for the varied parameter, the parameters were held fixed at 36° for the twist, 3.4 Å for the rise, and 0° for both the tilt and roll. The tilt and roll are varied only for the 3'-nucleotide (i.e., Cyt). NOE intensities are given from Cyt H6 (diagonal) to Ade H1', H2' and H2". A, dependence of NOEs on the helical twist; B, dependence of NOEs on the rise; C, dependence of the NOEs on the roll of Cyt; D, dependence of the NOEs on the tilt of Cyt; E, dependence of the NOEs on the shift of Cyt; F, dependence of the NOEs on the slide of Cyt. [Panel axes: NOE intensity against twist/deg, rise/Å, roll/deg, tilt/deg, shift/Å and slide/Å.]

If one or both nucleotides in a step are mixtures of pucker states, then additional sources of error can arise. For example, if the 5'-nucleotide is 70% south and 30% north, the internucleotide NOEs can increase by up to a factor of two compared with the case where both nucleotides are 100% south. A complete analysis should therefore take into account averaging of the nucleotide units.
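The four-state average of Eqn. 4 that underlies this example can be sketched in Python. The per-state cross relaxation rate constants below are hypothetical placeholders, not values computed by NUCFIT; they are chosen only so that the north state of the 5'-nucleotide gives a markedly larger internucleotide rate constant than the south state.

```python
def four_state_average(f_s1, f_s2, sigma):
    """Eqn. 4: weight each (pucker of nucleotide 1, pucker of nucleotide 2)
    combination by the product of the corresponding mole fractions."""
    weights = {
        ("S", "S"): f_s1 * f_s2,
        ("S", "N"): f_s1 * (1.0 - f_s2),
        ("N", "S"): (1.0 - f_s1) * f_s2,
        ("N", "N"): (1.0 - f_s1) * (1.0 - f_s2),
    }
    return sum(weights[key] * sigma[key] for key in weights)

# Hypothetical internucleotide rate constants for the four pucker combinations.
sigma = {
    ("S", "S"): -0.10,
    ("S", "N"): -0.12,
    ("N", "S"): -0.40,
    ("N", "N"): -0.45,
}

pure_south = four_state_average(1.0, 1.0, sigma)   # both nucleotides 100% south
mixed = four_state_average(0.7, 1.0, sigma)        # 5'-nucleotide 70% south, 30% north
print(pure_south, mixed, mixed / pure_south)
```

With these assumed inputs a 30% north population nearly doubles the averaged rate constant, which is the kind of effect described in the text.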

(c) Influence of anisotropic rotation and internal motions on NOE intensities

For highly elongated molecules, the cross relaxation rate constants depend on the angle the vector makes with the principal axis of the diffusion tensor [35]. As described in the Methods section, DNA can be modelled as an equivalent ellipsoid of revolution, for which analytical expressions are available for the two rotational correlation times and the spectral density functions. For double-stranded DNA of about ten base pairs or less, the rotational anisotropy has only a small influence on the cross relaxation rate constants. For a twenty base pair fragment, where the axial ratio of the equivalent ellipsoid of revolution is about 2.8, the ratio τ_L/τ_S is about 2.5. At 298 K, the apparent correlation time of a 20-mer is about 6.5 ns [23,41]. Using Eqns. 6-8, the values of τ_L and τ_S are 10.4 ns and 4.2 ns, respectively. As pointed out in the Methods, those vectors oriented nearly perpendicular to the helix axis have cross relaxation rate constants equal to that calculated for an isotropic rotor whose correlation time is the same as that of a vector perpendicular to the long axis, i.e., the Cyt H6-H5 vector. In contrast, the cross relaxation rate constants of vectors oriented nearly parallel to the helix axis are determined mainly by τ_L. For a 20-mer, the assumption of isotropic rotation would introduce an error for such vectors of about 60% in the cross relaxation rate constant, or about 10% in the distance. Calculations with B DNA showed that base proton to sugar H1', H2', H2" or H3' vectors are nearly perpendicular to the helix axis, and therefore the correlation time of the Cyt H6-H5 vector is an appropriate value to use for intranucleotide NOEs. Most internucleotide NOEs are underestimated by only 20-30% assuming isotropic rotation. The exceptions are H1'(i)-H8/6(i+1), and interiminoproton NOEs, where β is close to zero.
For these vectors, the error resulting from the assumption of isotropic rotation is nearer 75%. Hence, the effects of anisotropic rotation on intranucleotide NOEs are minimal, but can be significant for internucleotide NOEs. It will clearly be necessary to take anisotropic motion into account for DNA fragments longer than about fifteen base pairs. Molecular mechanics calculations have shown that the potential well for rotation about the glycosidic bond is broad and shallow [42,43], suggesting that averaging about the glycosidic bond may be important in DNA.

TABLE I

Effect of conformational averaging on cross relaxation rate constants

Coordinates were generated as described in the text. For motions slower than 1/τ, average cross relaxation rate constants were calculated as σ = [σ(χ - 20) + σ(χ) + σ(χ + 20)]/3. For fast motions, the averaged cross relaxation rate constants (⟨σ⟩) were calculated using Tropp's method [34] with the same set of coordinates. σ_med is the cross relaxation rate constant at χ (median value). The values are from the H8 of an adenine residue. The first entry is σ/σ_med, the second is ⟨σ⟩/σ_med.

                      Average σ / median σ
χ (deg)   P (deg)   H8-H1'      H8-H2'      H8-H2"      H8-H3'
-80       162       1.05/1.01   0.80/0.63   1.11/0.96   0.95/0.90
-100      162       1.03/1.01   1.0/1.28    1.15/1.0    0.95/0.88
160       162       1.05/1.02   1.28/1.11   1.08/1.03   1.045/1.02
160       18        1.05/1.02   1.20/1.06   1.03/1.01   1.40/0.70

The effects of averaging about χ have been simulated for two motional regimes, namely slow rotation (i.e., slower than overall tumbling), and very fast rotation. For slow motion, the NOEs are the linear average of the NOEs corresponding to the conformations sampled, whereas in fast motion, the dipolar Hamiltonian must be averaged [34]. The effects of averaging due to slow and fast fluctuations of ±20° are given in Table I. In the C2' endo state, the slow averaging increases the NOEs for χ less than -100°, and increases them for χ larger than -100°. The variation, however, is only up to around 20%. In the C3' endo state, the variation is greater, especially for the H8-H3' NOE. This suggests that averaging about the glycosidic bond will have greater consequences for fitting NOEs in A like DNA than in B like DNA. As expected for the glycosidic torsion angle in the anti range, the averaging has an insignificant effect on the H8-H1' cross relaxation rate constant. In most cases, rapid motion tends to make the cross relaxation rate constant closer to the value for the median. The exceptions occur when the glycosidic torsion gives internuclear distances close to the minimum (cf. Fig. 1), where averaging of any kind can only decrease the cross relaxation rate constant. In this case, rapid motion tends to reinforce the effects of decreasing distances by averaging.

The determination of the glycosidic torsion angle by NOEs will in general give a realistic value for the mean angle, even if internal motion is ignored. For example, at a median value of -120°, the value of χ that would be obtained from the NOE intensities assuming no motion would be -123 ± 5°. At a median value of -80°, which corresponds to the maximum in Fig. 1, there are two equally good solutions for χ, namely -60 ± 5° and -98 ± 5°. This is because the dependence of the NOEs is nearly symmetric about the maximum in the anti range. The precision of derived conformational parameters will be treated in greater detail below.
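The distinction between the two motional regimes can be illustrated numerically. The sketch below is not the NUCFIT code: it assumes equally populated conformers, isotropic overall tumbling, and that the rate constant is proportional to an effective r^-6. For slow motion the r^-6 values are averaged linearly; for fast motion the dipolar interaction is averaged first, written here with the Legendre polynomial P2 of the angle between each pair of conformer vectors (the addition-theorem form of a spherical-harmonic average, in the spirit of Ref. 34). The function names and the coordinates are hypothetical.

```python
import math


def _norm(v):
    """Length of a 3-vector."""
    return math.sqrt(sum(c * c for c in v))


def slow_average(vectors):
    """Mean of r**-6 over equally populated conformers (slow exchange)."""
    return sum(_norm(v) ** -6 for v in vectors) / len(vectors)


def fast_average(vectors):
    """Effective r**-6 for fast jumps between equally populated conformers.

    Evaluates sum_ij P2(cos theta_ij) / (r_i**3 * r_j**3) / N**2, where
    theta_ij is the angle between interproton vectors i and j.
    """
    n = len(vectors)
    total = 0.0
    for a in vectors:
        for b in vectors:
            ra, rb = _norm(a), _norm(b)
            cos_ab = sum(x * y for x, y in zip(a, b)) / (ra * rb)
            p2 = 1.5 * cos_ab * cos_ab - 0.5
            total += p2 / (ra ** 3 * rb ** 3)
    return total / (n * n)


if __name__ == "__main__":
    # Hypothetical interproton vectors (in angstroms) for three conformers,
    # standing in for chi - 20, chi and chi + 20 degrees.
    conformers = [(2.2, 0.0, 1.0), (2.4, 0.4, 1.0), (2.7, 0.9, 1.0)]
    median = _norm(conformers[1]) ** -6
    print("slow / median:", slow_average(conformers) / median)
    print("fast / median:", fast_average(conformers) / median)
```

The two printed ratios play the role of the paired entries in Table I. For vectors of equal length the fast average can never exceed the slow one, because the angular factor P2 is at most 1.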

[Figure 6: plot of NOE against time/sec.]

Fig. 6. Performance of NUCFIT for a nucleotide. The NOE data set was N2 as described in Table II. The initial values for the spin-lattice relaxation rate constants were calculated from the geometry as described in the text. fs was searched from 0.5 to 1.0 using initial values of the glycosidic torsion angle from -80 to -130°. The continuous line is the best least-squares fit with the parameters given in Table II.

Fig. 5 also allows the effects of slow internal motions on internucleotide NOE intensities to be assessed. A fluctuation of ±5° of the helical twist angle at a median value of 36° causes a difference of less than 10% in the NOE intensities compared with the median value, which translates roughly to an error in determining the median value of the twist angle of less than 1°. Similarly, fluctuations of the rise by ±0.3 Å at a median value of 3.4 Å also cause a difference of less than 10% in the NOE intensities, which translates to an error of approx. 0.02 Å. These errors are negligible. Fluctuations of the roll angle even by ±10° have negligible effects on the NOE intensities, because of their near linear dependence on the roll. For median tilts of zero or positive values, negligible errors are produced by large fluctuations (±10°), though for negative tilts, substantial errors can accrue, of the order of 10°, reflecting the highly non-linear, and strong, dependence of the internucleotide NOE intensities on the tilt angles. In general, however, internal motions of modest amplitude will have only a second-order influence on the determination of the helical parameters, at least in the B like family of conformations.

(2) Performance of the fitting routines in NUCFIT

(a) Determination of nucleotide conformations

There are several factors that need to be taken into account when fitting NOEs. First is the question of performance of the algorithm itself: does the output match the input parameters under ideal conditions? The second question is, then, what is the radius of convergence, and the stability of the solution, for data that have random errors of the magnitude found in practice? Third, how do systematic errors affect the determination of structural parameters? It is assumed that systematic errors arise solely from neglecting the variation due to some parameter(s). For example, what errors are introduced by ignoring the conformational equilibrium in the deoxyriboses? Finally, there is the problem of normalising NOE intensities, or using the correct correlation time.

TABLE II

Fitting nucleotide conformations to simulated NOE data

Five data sets were calculated, N1 to N5, with and without noise as described in the text. N1R refers to data set N1 with random noise. In N1, NOEs were accurate to ±0.001, while in N2 they were rounded to the nearest 0.005 units. In sets N1, N2 and N3, the nucleotide was Ade, in which the glycosidic torsion angle was -110°, Ps = 162°, φs = 36°, and all helical parameters set to zero. In N4, χ = -160°, P = 18° and f = 1 (A DNA). In N5, the nucleotide was G with χ = 70°, f = 1, P = 18°. In N5, χ = 65°, P = 18° and f = 1. The calculations were done for a correlation time of 2.5 ns at 500 MHz. The values of ρ (in s^-1) were 3 (H8), 1 (H2), 2 (H1'), 7.5 (H2'), 6.5 (H2"), 2.5 (H3', H4'), 6.5 (H5'), 6 (H5"). In set N1, fs = 1; in set N2, fs = 0.7, PN = 18°; and in set N3, fs = 0.7, PN = 18°, and the order parameter for the H1'-H2" vector was 0.7 and for the H2'-H2" vector was 0.9. Except where noted, the correlation time used in the fitting was 2.5 ns. (a) τc = 2.25 ns; (b) τc = 2.75 ns; (c) fs forced to remain at 1.0; (d) S² = 1 for all vectors; (e) H8-H1' NOEs absent; (f) H8-H2' NOEs absent; (g) H8-H3' NOEs missing; (h) P fixed at 18°. Values in parentheses were constrained to that value.

                                      ρ (s^-1)
Dataset     -χ       fs       Ps      H1'      H2'     H2"     H3'      R
N1          109.7    0.998    162     2.2      7.4     6.8     2.7      0.014
N1R         110.6    0.997    162     1.6      7.0     7.0     3.0      0.175
N1R(a)      108.4    0.999    162     1.4      6.5     6.2     2.3      0.173
N1R(b)      112.6    0.995    162     1.9      7.5     8.0     3.6      0.177
N1R         109.5    1.0      198     1.6      7.4     6.7     2.7      0.181
N1R         109.5    0.997    126     1.7      7.0     6.8     2.9      0.178
N2          111.1    0.7      162     1.7      6.6     6.1     2.4      0.045
N2          110.5    1.0(c)   198     1.5      7.6     6.2     0.9      0.192
N2R         109.4    0.68     162     2.0      7.0     6.49    2.53     0.141
N3          110.4    0.69     162     1.6      6.5     5.8     2.7      0.023
N3(d)       109.1    0.7      162     2.25     7.63    6.7     2.2      0.027
N3R(d)      110.8    0.69     162     1.54     7.0     6.4     2.5      0.143
N3R(d,e)    110.3    0.69     162     (1.59)   7.2     6.4     2.36     0.151
N3R(f,d)    112.3    0.68     162     1.61     (7.4)   6.4     2.44     0.157
N3R(g,d)    110.8    0.76     162     1.55     7.65    6.36    (2.5)    0.16
N4R(h)      -150.2   1.0      (18)    1.74     6.2     6.15    2.25     0.427
N4R         164.8    1.0      0       1.75     5.94    5.97    2.0      0.427
N5R         76.7     (1.0)    0       2.3      6.3     6.2     1.8      0.2

Fig. 6 shows the best fit to calculated NOEs for a model in which the fraction C2' endo was 0.7. The NOEs used in this simulation were rounded to the nearest 0.5%. The mean fractional error, R, for this data set using the input parameters was 0.046, which represents the best that could be hoped for in a fit. The fit obtained in Fig. 6, where fs = 0.7, is excellent. However, when fs is constrained to 1.0, while the fits to H2', H2" and H1' are acceptable, and give an accurate value for χ, the fit to H3' is exceedingly poor, and is the source of the large R factor in Table II. The results in Table II show that the glycosidic torsion angle is well determined, even when the phase angle or the fraction south are incorrect, and when the data are noisy. This is true only if the NOE time courses are reasonably well sampled. The main reason for the good determination of χ is that the large base-H2' NOE is very sensitive to χ, and less sensitive to P and fs. The statistical errors in Table II are small, and are derived from the covariance matrix [38]. When starting from different initial values for the parameters, good convergence is obtained. The fitting error on χ is about ±0.5°, while the accuracy in these calculations is ±2°.
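The floor on R set by rounding the data can be illustrated with a small sketch. The definition of R belongs to the Methods and is not repeated in this section, so the form used here (sum of absolute deviations divided by the sum of absolute observed values) is an assumption, and the NOE values are invented for illustration rather than taken from data set N2.

```python
def r_factor(observed, calculated):
    """Mean fractional error, assumed here to be sum|obs - calc| / sum|obs|."""
    num = sum(abs(o - c) for o, c in zip(observed, calculated))
    den = sum(abs(o) for o in observed)
    return num / den


def quantize(value, step=0.005):
    """Round an NOE to the nearest multiple of `step` (0.5% by default)."""
    return round(value / step) * step


if __name__ == "__main__":
    # Hypothetical exact NOEs (fractional intensities), not from the paper.
    exact = [-0.012, -0.031, -0.058, -0.094, -0.007, -0.019]
    observed = [quantize(x) for x in exact]
    # Even with the true parameters, rounding alone leaves a non-zero R.
    print(f"R from rounding alone: {r_factor(observed, exact):.3f}")
```

A fit cannot be expected to reach an R below the value produced in this way by the true parameters, which is the sense in which 0.046 is described above as the best that could be hoped for.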
When an incorrect value of fs is forced on the calculation, the value of R becomes large, and the fit to the base-H3' becomes very poor. This would be diagnostic of conformational averaging in the sugars, and signal the need to search conformational space more thoroughly. The ρ values are less well determined, and less accurate than the structural parameters, in agreement with previous conclusions [44]. The insensitivity of the fits to the precise values of Ps is shown by the result that the value of R obtained with Ps = 126° was 0.178, and with Ps = 198° was 0.181 (see Table II), while the fitted values of χ were -109.5° and -109.4°, respectively. The fits differed mainly in the values of ρ.

The effect of noise on the primary NOE intensities was simulated using a random number generator giving a series of zero mean and maximum amplitude of ±0.01 (rms = 0.007, i.e., 0.7%), which were added to the exact NOEs. The statistical error increases the value of R (in this case to about 0.18). This is close to the value rms/(mean NOE) = 0.186. As Table II shows, the values of χ and fs remain well determined in the presence of modest noise.

In fitting one-dimensional NOE data, it is necessary to supply a value for the overall correlation time. It has been shown that the correlation time can be accurately determined from the cross relaxation rate constant for Cyt H6-H5 vectors [23,41,45]. Provided that the NOE time courses are well sampled, and preferably recorded at several temperatures, the correlation time at any given temperature has an error of less than ±10%. Table II shows the influence of a 10% error in the correlation time on the fitting for noisy data. The value of χ is still accurately determined (error < 2.5°), implying that modest errors in the scaling of the correlation time do not seriously affect the fit.

Internal motions of the sugar proton-proton vectors scale down some of the cross relaxation rate constants. In particular, the H1'-H2" vectors have order parameters of 0.6 (the value of a1 given in Ref. 9 is equivalent to the order parameter) to 0.7 [24], and the H2'-H2" vectors have order parameters of about 0.9 [24]. A third data set was created for which fs = 0.7, S²(H2"-H1') = 0.7 and S²(H2'-H2") = 0.9. All other order parameters were set to unity, though the cross relaxation rate constants are still averaged according to Eqn. 3. As Table II shows, ignoring these order parameters has insignificant effects on the determination of the nucleotide conformation, indicating that the assumption of a single correlation time for all vectors may be acceptable in practice.

Finally, it is quite common for some NOEs to be unmeasurable, owing to spectral overlap. The effect of leaving out these NOEs has also been calculated.
Not surprisingly, leaving out the H1'-H8 NOEs has an insignificant effect on the fit, because these small NOEs, which are very insensitive to χ and fs, contribute almost nothing to the target function in the anti range. Omitting H2' leads to poorer statistical determination of the parameters, though the accuracy remains acceptable. This is because the H2"-H8 NOE contains almost the same information as the H2'-H8 NOE, partly because of the rapid spin diffusion from H2' to H2", and partly because of the geometry of a nucleotide. Leaving out the H3'-H8 NOE has almost no influence on the determination of χ, as this is largely determined by the H2'-H8 NOEs. In contrast, the value of fs is relatively poorly determined statistically in the absence of the H8-H3' NOE, and is also less accurate.

Although the value of χ is well determined for B like conformations, where base proton to sugar proton NOEs are intense, the same may not be true in other conformations. For example, in the A conformation, where Ps = 18° and χ = -150°, the NOEs become quite small. Data set N4 was generated for the A conformation. As the results in Table II show, the value of χ is less well determined than in data sets N1 to N3, both in terms of the statistical precision (±5°), and the accuracy. Also, the optimum value of P obtained by extensive searching in the north range was 0° compared with the input value of 18°, though the value of R was similar to that obtained for the input parameters. The main reason for this is that, because the NOEs are small, the random noise causes the NOEs to vary substantially around the true values. Similarly, for the G residues in Z DNA (P = 18°, χ = 65°), χ is determined only to within about ±15°, while P was determined to be 0 ± 9°. The fit is dominated in this case by the H1'-H8 NOE, which is intense, while all the other NOEs are weak, and strongly affected by noise.

Hence, it can be concluded that the glycosidic torsion angle is likely to be well determined in the high anti range, even under fairly pathological conditions, followed by the fraction south (probable error ±0.1). The value of Ps is not well determined even under ideal conditions, and can probably be set in the middle of the south range (i.e., P = 162°) without serious influence on the other parameters. The glycosidic torsion angle in A like DNA or Z like DNA will be less well determined, though a probable error of ±10° would still be a useful constraint. For sugars in the north pucker state, the pseudorotation angle can be determined to within about ±10°, compared with ±30° in the south pucker state.

[Figure 7: two panels, A and B, each plotted against twist/deg.]

Fig. 7. Performance of NUCFIT for a dinucleotide. The same NOE data set as in Fig. 4 was used. The helical twist was optimised at different values of the rise, for fixed values of the other four parameters. A, mean absolute error, R, versus optimal twist angle; B, optimal twist angle versus helical rise.
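The two-state treatment of the sugar pucker used in the nucleotide fits above can be sketched as follows. This is a minimal illustration, assuming slow exchange (so that each observed cross relaxation rate constant is the population-weighted mean of the south and north values) and assuming the rate constants of the pure states are known. The function names and the numbers are hypothetical and are not taken from NUCFIT.

```python
def two_state_rates(f_south, sigma_south, sigma_north):
    """Population-weighted mean rate constants for fraction south f_south."""
    return [f_south * s + (1.0 - f_south) * n
            for s, n in zip(sigma_south, sigma_north)]


def fit_fraction_south(observed, sigma_south, sigma_north):
    """Linear least-squares estimate of the fraction south, clipped to [0, 1]."""
    num = sum((o - n) * (s - n)
              for o, s, n in zip(observed, sigma_south, sigma_north))
    den = sum((s - n) ** 2 for s, n in zip(sigma_south, sigma_north))
    if den == 0.0:
        raise ValueError("south and north rate constants are identical")
    return min(1.0, max(0.0, num / den))


if __name__ == "__main__":
    # Hypothetical rate constants (arbitrary units) for two NOEs; the second
    # differs strongly between the two pucker states, as base-H3' does.
    south = [1.00, 0.20]
    north = [0.90, 0.80]
    observed = two_state_rates(0.7, south, north)
    print(fit_fraction_south(observed, south, north))
```

In the least-squares expression each NOE is weighted by the square of the difference between its south and north values, which is consistent with the observation above that fs becomes poorly determined when the H8-H3' NOE is missing.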

(b) Determination of helical parameters

Initial values for the helical parameters can be obtained in a manner similar to that described for the nucleotide conformations, taking into account the relative sensitivity of internucleotide NOEs to changes in the structural parameters (see above). First, NOE data were generated for the dinucleotide AC in the B conformation. The NOEs from A H8 to C H6 and C H5, and from C H6 to A H8, A H1', A H2', A H2" and A H3' were used for fitting. Fig. 7A shows how the value of R varies with the twist angle when the rise is optimized, and Fig. 7B shows how the optimized value of the rise varies according to the fixed value of the twist angle. Clearly, there is a minimum in the target function, which corresponds to the input parameter values (twist = 36°, rise = 3.38 Å). However, the optimized value of the rise decreases with increasing twist angle. This anticorrelation agrees with the considerations given above. If fits with R factors of < 0.04 were accepted, these calculations would indicate that, under optimal conditions, the helical twist can be defined to within about ±2°, while the rise is determined to within about ±0.2 Å.

TABLE III

Internucleotide fitting

Three data sets were generated for dinucleotides having a correlation time of 2.5 ns and a spectrometer frequency of 500 MHz. AC1 consisted of Ade-Cyt in which the glycosidic torsion angles were -100° and -110°, respectively, and Ps was 162°. The rise was 3.4 Å, the twist was 36°, and all rolls, tilts, shifts and slides were zero. NOEs from Cyt H6 and Cyt H5 to Ade H8, H1', H2', H2" and H3' were used. In data set AC2, the same parameters were used, except that the roll and tilt angles of the Cyt residue were +5° and -5°, respectively, and the displacement was -0.5 Å. In data set GT1, the same parameters as for AC2 were used, but only the NOEs from Thy H6 to Gua H8, H1', H2', H2" and H3' were used. U is the van der Waals energy. The values of R for the input data were: 0.01 (AC1), 0.069 (AC2) and 0.011 (GT1). Values in parentheses were constrained to those values during the search and fitting process. The error on the rise and twist was ±0.3 Å and 3°, respectively.

Dataset   Rise (Å)   Twist (deg.)   Roll (deg.)   Tilt (deg.)   Shift (Å)   U (kcal/mol)   R
AC1       3.4        36             (0)           (0)           (0)         0.0            0.01
AC2       3.4        36             (5)           (0)           (0)         0.0            0.08
AC2       3.4        35.5           (0)           -5            -0.5        0.0            0.11
AC2       3.4        37             (0)           0             0           0.0            0.12
AC2       3.7        33.8           (0)           -5            0.0         0.0            0.16
GT1       3.1        39             (0)           (0)           (0)         0.0            0.08
GT1       3.4        35.8           (0)           -5            -0.5        0.0            0.027
GT1       3.4        40.4           (0)           -5            0           0.7            0.037
GT1       3.4        34.5           (0)           -5            0           0.0            0.073

A second data set was prepared for the dinucleotide AC (AC2), in which the roll, tilt and shift of the cytosine residue were set to 5°, -5° and -0.5 Å, respectively, and random noise was added to the data. The resulting value of R using the input parameters was 0.069. If the roll, tilt and displacements are constrained to their input values, a least-squares optimization of the twist, rise and relaxation rate constants of the observed spins to the data yields correct values for the rise and twist (Table III), with a final R value essentially the same as a perfect fit. When the roll angle is constrained to 5° and the rise, twist, tilt and shift are optimized using a hybrid grid search as described above, acceptable solutions are found only for small negative tilts. Table III shows the three best fits obtained with the roll of Cyt set to zero. All other solutions had much larger R values, and many of them had high van der Waals energies.
These results show that, with sufficient data, the rise and twist are relatively well determined, and are tolerant of modest errors in the roll angle. The results also suggest that the tilt and displacement are less well determined.

Not all dinucleotides yield so many NOEs. In the dinucleotide GT, having the same conformational parameters as AC2, only NOEs from Thy H6 to Gua protons were used (i.e., NOEs to the Thy methyl group were ignored). As expected, if the correct roll, tilt and displacements are given, the correct values of the rise and twist are found. If the roll, tilt and shift of Cyt are constrained to zero, a minimum in the function R is found at a twist angle of 39° and a rise of 3.15 Å. Including constraints between imino protons has only a minimal effect, as these NOEs are dominated by the rise, and therefore serve only to restrict the search in the values of the rise. As Fig. 5 shows, the NOE intensities are only weakly dependent on the roll angles, so the residual error in the determination of the helical parameters arises mainly from the incorrect tilt angle. A search over conformation space, with least-squares optimization, generated the three best solutions shown in Table III. In all cases, a small negative roll angle was found, with the twist angle in the range 34.5-40.4°. Allowing all helical parameters to vary in a least-squares fit results in an improvement in the value of R, though because the tilt, shift and twist are quite strongly covariant, their accuracy is not greatly improved (see Table III), and the roll angle barely changes at all.

Evidently, when fewer NOE intensities are available, the accuracy of the best solutions is impaired. This is not surprising as the problem is actually underdetermined. Under these conditions, the best that can be hoped for is to find solutions that are compatible with the data. Simple least-squares optimization is not very suitable for this, because the algorithm will only find solutions very close to the initial conditions. The grid search method, while computationally inefficient, does sample conformational space, and the least-squares refinement can be restricted to the most sensitive parameters (which to some extent depends on the conformation, see Fig. 5).

The van der Waals energy is a useful guide to restricting the search. For example, large, negative tilt angles of the 3' base, when uncompensated by changes in helix shift or slide, result in very unfavourable steric clashes between the nucleotide units. These clashes can be compensated partially by increasing both the rise and the helical twist, and by a negative roll of the 3' base. The effects of rolling base pairs on steric clashes in the minor grooves have been treated in detail by Calladine [46] and Dickerson [47]. In the A family of structures, where the axial rise is relatively small, and the base pair tilts are large, it is the displacement of the base pairs from the helix axis that overcomes the steric clashes.
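The combination of a coarse grid search with local refinement of only the most sensitive parameters can be sketched as below. This is an illustration of the strategy rather than the NUCFIT implementation: a simple derivative-free pattern search stands in for the least-squares routine, and `toy_target` is an invented misfit function whose coupled rise and twist terms and weak roll term merely mimic the qualitative behaviour described in the text. All names and numbers are hypothetical.

```python
import itertools


def grid_search(target, grids, keep=3):
    """Score every grid point and return the `keep` best (score, point) pairs."""
    scored = [(target(p), p) for p in itertools.product(*grids)]
    scored.sort(key=lambda item: item[0])
    return scored[:keep]


def refine(target, start, steps, free, tol=1e-6):
    """Pattern-search refinement of the parameters listed in `free` only."""
    point, steps = list(start), list(steps)
    best = target(tuple(point))
    while max(steps[i] for i in free) > tol:
        improved = False
        for i in free:
            for delta in (steps[i], -steps[i]):
                trial = point[:]
                trial[i] += delta
                score = target(tuple(trial))
                if score < best:
                    best, point, improved = score, trial, True
        if not improved:
            # No move helped at this resolution, so halve the step sizes.
            for i in free:
                steps[i] *= 0.5
    return best, tuple(point)


def toy_target(params):
    """Hypothetical misfit: coupled rise/twist valley plus a weak roll term."""
    rise, twist, roll = params
    x, y = rise - 3.38, (twist - 36.0) / 10.0
    return x * x + y * y + 1.5 * x * y + 1e-4 * (roll - 5.0) ** 2


if __name__ == "__main__":
    grids = [
        [3.0, 3.2, 3.4, 3.6],              # rise / angstrom
        [26.0, 30.0, 34.0, 38.0, 42.0],    # twist / deg
        [-10.0, 0.0, 10.0],                # roll / deg
    ]
    for score, start in grid_search(toy_target, grids):
        # Refine rise and twist only; roll keeps its grid value.
        print(refine(toy_target, start, steps=[0.1, 2.0, 0.0], free=(0, 1)))
```

Keeping several of the best grid points, rather than only the single best, reflects the point made above that in an underdetermined problem the aim is to find the set of solutions compatible with the data.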
It is clear that use of van der Waals constraints can help avoid poor structures, but may not be of much use in discriminating between similar conformations that yield acceptable agreement with NOE data. Further, large values of the rise (> 4 Å) can be excluded from measurements of the NOE between imino protons, as the distance between these protons is at least as large as the rise in B like DNA, and at this range, these NOE intensities become vanishingly small.

Generally, the roll and tilt angles are not zero. Indeed, propellor twists can be substantial. The internucleotide NOEs depend on the rotation about the long axis of the base pair, which is the sum of the base pair roll and propellor twist. The individual roll angles and propellor twists can only be determined for two base pairs, with the propellor twist defined as the difference between the apparent roll angles for the two bases in a base pair [9]. However, unless the propellor twists are large, they are poorly determined by the NOE data.

Because the internucleotide sugar-base NOE intensities are also sensitive to the glycosidic torsion angle of the 5'-nucleotide, once initial helical parameters have been found, it is necessary to refine the structure allowing variation of all parameters. Generally this third stage of refinement causes only small changes in the initial parameter values.

Discussion

The calculations given in the Results section indicate that if NOE intensities as a function of mixing time are analysed by varying the conformations of fragments of the sequence under study, the glycosidic torsion angles are likely to be well determined, and also accurate. Provided that a sufficient number of base proton to sugar proton NOEs are observed (particularly the H8/H6 to H3'), the fraction of the south conformation can be reasonably well determined, but the pseudorotation phase angle is not well determined. For this, the NOE data should be supplemented with spin-spin coupling constants. In the range of glycosidic torsion angles typical of the A and B forms of DNA, averaging about the glycosidic torsion bond has only small effects on the NOE intensities, and the derived glycosidic torsion angle will be close to the median value of the range being sampled. Incorrect determination of the fraction of the south conformation present, and the assumption that a single correlation time exists for all vectors, have a relatively small influence on the determination of the nucleotide conformation. Nevertheless, the simplified treatment of internal mobility on different timescales remains one of the greatest sources of uncertainty in the determination of conformational parameters.

These conclusions differ from those reached by Pardi et al. [48], who applied distance geometry to synthetic data derived from the crystal coordinates of the EcoRI sequence solved by Dickerson and coworkers [49], in which no intrasugar distances were used. Pardi et al. [48] found that the best determined parameter was the pseudorotation phase angle, even though all sugars were in the south domain. Somewhat surprisingly, the backbone angle δ was considerably less well determined, even though P and δ are closely related [14,15]. In agreement with Pardi et al. [48], the shift and propellor twists are poorly determined.
However, the roll angles in their study appeared to be better determined than the glycosidic torsion angles, though this may be an artefact of their determination criteria. Thus, although the average of the roll angle from several independent runs produced values within a few degrees of the target values (though sometimes of opposite sign), the range of values determined was very large, indicating that the precision of the determination is poor. The calculations shown here indicate that small roll angles (i.e., less than 10° of either sign) are not well determined by NOE data. Only large roll angles have a substantial influence on internucleotide NOEs, and require information from van der Waals interactions to obtain reasonable precision in their determination. Further, in many instances, there are insufficient NOEs to determine all the parameters; conformational space should be searched to find combinations of parameters that are consistent with the data. This is unfortunate, because the global features of DNA are determined in considerable part by the roll angles. A run of roll angles of the same sign will cause bending of the DNA. It seems unlikely that long-range properties of DNA can be adequately described by present methods of NMR data collection and analysis.

The requirement for adequate consideration of spin diffusion and conformational averaging is clear. If these complications are ignored, then derived distances used as input for distance geometry can severely distort the resulting structures. It is common for the actual distances used as input for distance geometry and restrained dynamics calculations to be mutually inconsistent, and sometimes physically impossible [10,50-52]. Only by using wide upper and lower bounds can a set of solutions be found [10].

Anisotropic motions have the greatest effect on the cross relaxation rate constants for vectors parallel to the helix axis. For a twenty base pair fragment of B DNA, the axial ratio is near 3, and the ratio of the correlation times for end-over-end tumbling and spinning around the long axis is also about 3. If the correlation time for the cytosine H6-H5 vectors is used as an isotropic rigid body correlation time, then internucleotide distances are overestimated by up to 10%, which amounts to more than 0.3 Å for the H1'(i)-H8/6(i+1) and NH(i)-NH(i+1) vectors. This effect would presumably tend to stretch the resulting structures. On the other hand, the influence on intranucleotide NOE intensities is insignificant.
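The link between an error in a cross relaxation rate constant and the resulting error in the derived distance can be checked with a few lines of arithmetic. The sketch below assumes only that the rate constant is proportional to r^-6 at a fixed correlation time; the function name is illustrative and is not part of NUCFIT.

```python
def distance_error(rate_error: float) -> float:
    """Fractional distance error implied by a fractional rate-constant error.

    Assumes sigma is proportional to r**-6 at fixed correlation time, so a
    rate constant wrong by a factor (1 + rate_error) maps onto a distance
    wrong by a factor (1 + rate_error) ** (1/6).
    """
    return (1.0 + rate_error) ** (1.0 / 6.0) - 1.0


if __name__ == "__main__":
    for err in (0.25, 0.60, 0.75):
        print(f"rate error {err:.0%} -> distance error {distance_error(err):.1%}")
```

Because of the sixth-root dependence, rate-constant errors of 60% and 75% correspond to distance errors of roughly 8% and 10%, respectively, in line with the figure of up to 10% quoted above.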
It has been implied that the conformations of macromolecules determined by NMR spectroscopy approach the precision of single crystal X-ray diffraction. In the case of nucleic acids, this can only be partially true at present, because the conformation of the phosphate backbone is not determined [43]. It may be true for some local structural details such as the glycosidic torsion angle, but is not so for global features. However, one area where the NMR method scores over the crystallographic method is the detection and characterisation of multiple conformations, particularly of the sugars [14,15], and in some instances of variation in propellor twists [9,53-55]. The crystal studies have uniformly demonstrated the presence of a single conformation [49,56], which can represent only a fraction of the conformations actually present unrestrained in solution. In this sense, the NMR and crystallographic methods are nicely complementary, and fortunately there is a significant overlap in the information content of the two methods. Where applicable, the two methods tend to agree in many of the features observed, the best example being the EcoRI site studied by Dickerson and coworkers [49], and by Patel [53-55]. Given the entirely different nature of the NMR and diffraction experiments, it may well be misleading to try to force the NMR-determined structures to resemble crystallographic structures (i.e., a single conformation with all atoms precisely located in space). On the contrary, it might be more useful to concentrate on the differences in the views obtained by the two methods, and thereby learn more about the properties of the molecule in question.

Acknowledgements

This work was supported by the Medical Research Council of the U.K. I am grateful to Dr. M.J. Forster for helpful discussions on anisotropic motions, and to Dr. C.J. Bauer for assistance with rotation matrices. I thank also Dr. J. Feeney for critical discussions of the manuscript.

