[23]

PREDICTION OF HISTONE OCTAMER BINDING SITES

387

partial flipping in longer stretches of alternating purine-pyrimidine residues has been previously observed at threshold negative superhelical densities.17 In our case with AN500, a second transition step is therefore expected when the energy content of the minicircles is increased by the introduction of additional (-) supercoils, that is, with topoisomer (-4, Z). However, in order to understand the described phenomena in more detail, chemical footprinting of the Z-stretch in different topoisomers, for example, Jis necessary.

Acknowledgments

We thank H. Schröter and K. Geider for protein samples and K. Meese for technical assistance. We are grateful to F. M. Pohl, University of Konstanz, for an ample supply of monoclonal Z-DI 1. The gift of plasmid pUC18M by M. Caserta is acknowledged. This work was funded by grants from the BMFT (BCT-0381-5 and Forschungsschwerpunkt "Grundlagen der Bioprozesstechnik") and by the Fonds der Chemischen Industrie.

17 B. H. Johnston, W. Ohara, and A. Rich, J. Biol. Chem. 263, 4512 (1988).

[23] Algorithms for Prediction of Histone Octamer Binding Sites

By WILLIAM G. TURNELL and ANDREW A. TRAVERS

Nucleosome Positioning

The complex of the histone octamer with 145 base pairs (bp) of DNA, the core nucleosome particle, can be precisely positioned with respect to a given DNA sequence, yet, paradoxically, the octamer can associate with an immense variety of DNA sequences, both in vivo and in vitro. The resolution of this apparent paradox was first proposed by Trifonov and Sussman,1 who suggested that the major determinant of nucleosome positioning was the bendability, or anisotropic flexibility, of DNA. Thus a DNA of particular sequence would bend most easily in a certain direction and so assume a preferred configuration. Subsequently it has been demonstrated that the direction of curvature of both a natural octamer binding site and a bacterial DNA sequence that has not been subject to the selective constraints of nucleosome formation is the same when the respective DNA molecules are bent in the absence of any protein and also when they are

1 E. N. Trifonov and J. L. Sussman, Proc. Natl. Acad. Sci. U.S.A. 77, 3816 (1980).

METHODS IN ENZYMOLOGY, VOL. 212

Copyright © 1992 by Academic Press, Inc. All rights of reproduction in any form reserved.

388

INTERACTION OF DNA AND PROTEINS


reconstituted into core nucleosome particles.2 The available evidence is thus wholly consistent with the notion that bendability is a determinant of nucleosome positioning.

The anisotropic flexibility of a DNA molecule is a sequence-dependent property that is determined by the physicochemical characteristics of individual base steps. It is thus an intrinsic property of the DNA. This means that although precise binding sites for the histone octamer can be determined, such sites cannot be described or defined in terms of a consensus sequence of particular nucleotides such as would characterize the binding site of a sequence-specific DNA-binding protein. Instead, the positioning of a histone octamer on a particular DNA sequence can be described in terms of two parameters: rotational, which determines the orientation of the DNA molecule relative to the protein surface, or more accurately relative to the direction of curvature, and translational, which determines the preferred precise binding site of the histone octamer.2

The average sequence organization of nucleosome core DNA has been derived from a collation of 177 sequences of cloned DNA molecules isolated from chicken erythrocyte core nucleosomes.3 These sequences were aligned on and averaged about their midpoints. From this alignment it was shown that for these sequences the probability of occurrence of particular base steps at equivalent positions is nonrandom. Moreover, the positions of preferential occurrence of any such base steps recur periodically, with an average period over the entire octamer binding site of 10.2 bp. This number is close to the helical periodicity of DNA and arises because the wrapped DNA duplex tends to present the same side of the double helix toward the protein core of the nucleosome. It was consequently inferred that these preferred occurrences represented the sequence dependence of the smooth anisotropic bending of DNA. In particular, short A/T-rich sequences are preferred where the minor groove points in toward the histone octamer, and conversely, where the minor groove points out, short G/C-rich sequences are preferred.4 These sequence preferences and the periodicity of their occurrence constitute the essential determinants of rotational positioning.

The sequence periodicity directly reflects the relative or local twist of the DNA on the surface of the protein, that is, it is the number of base pairs between, for example, successive outward-facing minor grooves and is thus a property of the geometry of the complex. This parameter should be clearly distinguished from the intrinsic twist of the DNA molecule. The latter

2 H. R. Drew and A. A. Travers, J. Mol. Biol. 186, 773 (1985).
3 S. C. Satchwell, H. R. Drew, and A. A. Travers, J. Mol. Biol. 191, 659 (1986).
4 A. A. Travers and A. Klug, Philos. Trans. R. Soc. London B 317, 537 (1987).

[Figure 1: schematic of nucleosome core DNA on an axis running from -40 to +40 bp about the dyad (position 0), marking the minor groove orientation ("out") and the average sequence periodicity (bp) of successive segments; labeled values: 10.2, 10.0, 10.7, 10.0.]

FIG. 1. Sequence organization of nucleosome core DNA. The rotational sequence periodicities are from Travers and Klug.4

quantity is a measure of the average rotation between successive base pairs measured about the local axis of the DNA double helix.2,4,5

An essential characteristic of rotational signals is their regular periodic nature. By contrast, translational signals would be expected to identify unique structural features in the path of the double helix around the histone octamer. It has been demonstrated experimentally that within the octamer binding site the major translational determinants are confined to the four central double-helical turns.6,7 Within this region distinct sequence features have been identified. The sequence "dyad" itself has a distinct pattern of sequence preferences,8 whereas other features are asymmetrically arranged with respect to the midpoint of the binding site (the sequence dyad) and include an enhanced probability of occurrence of "flexi-DNA" sequences,9 together with significant variations in the local sequence periodicity (Fig. 1). To date the significance of this sequence organization has not been tested experimentally.

It should be emphasized that the general utility of positioning algorithms, particularly with respect to the prediction of "natural" positions, is potentially dependent on the provenance of the basis set of sequences from which the general sequence organization has been deduced. For example, the core nucleosomal DNA analyzed by Satchwell et al.3 was isolated from core nucleosomes that were positioned under conditions of relatively high ionic strength. Consequently it cannot be assumed a priori that the parameters of periodicity and sequence preference which are

5 J. H. White and W. R. Bauer, Cell (Cambridge, Mass.) 55, 9 (1989).
6 P. C. Fitzgerald and R. T. Simpson, J. Biol. Chem. 260, 15318 (1985).
7 N. Ramsay, J. Mol. Biol. 189, 179 (1986).
8 W. G. Turnell, S. C. Satchwell, and A. A. Travers, FEBS Lett. 232, 263 (1988).
9 S. C. Satchwell and A. A. Travers, EMBO J. 8, 229 (1989).


characteristic of this set of DNA molecules are also characteristic of nucleosomes assembled in vivo, although they should, and do, predict accurately the positions of nucleosomes reconstituted under similar conditions in vitro.10 Nevertheless the dominant sequence periodicity (ν = 1/10.2), and therefore the structural periodicity of the DNA contained in these particles, is identical to the periodicity of the probability of pyrimidine dimer formation in nucleosomes in intact nuclei.11,12 Because this latter parameter is also structure-dependent it is reasonable, at least to a first approximation, to assume that the rotational signals are conserved.

Algorithms for Nucleosome Positioning

Methods for the prediction of preferred binding sites for the histone octamer on a defined DNA sequence are of two types. In statistically based algorithms the tested sequences are compared against the sequence organization of experimentally determined binding sites.13,14 By contrast, for structurally based algorithms the preferred configuration of a sequence is calculated on the basis of known and inferred structures of individual base steps.15,16 This approach is essentially a calculation of preferred curvature. The direction of the local axis of the double helix will change whenever the planes of adjacent base pairs are inclined relative to each other (Fig. 2). The assignment of this principal component of curvature, the "wedge" angle of a base step, depends heavily on the limited number of available crystal structures of short DNA oligomers. This wedge angle has two components, a relative roll, ρ, and a relative tilt, τ. Of these two components the relative roll is generally substantially larger and energetically more favorable.17 Therefore in some algorithms15 τ is justifiably ignored.
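The calculation of preferred curvature from roll angles alone can be sketched as a vector sum, in which each base step contributes its roll angle rotated by the helical phase of that step. The sketch below is illustrative only: the function names, the toy rule in `toy_roll` (a hypothetical assignment, not a published table of roll angles), and the constant period of 10.2 bp per turn are assumptions made for the example, and tilt is ignored.

```python
import cmath
import math


def toy_roll(step):
    """Toy roll angle (degrees) for a dinucleotide step.

    Hypothetical rule for illustration only, not a published parameter set:
    A/T-only steps roll toward the minor groove (negative), G/C-only steps
    roll toward the major groove (positive), mixed steps are neutral.
    """
    if all(base in "AT" for base in step):
        return -3.0
    if all(base in "GC" for base in step):
        return 3.0
    return 0.0


def bend_vector(seq, period=10.2):
    """Vector sum of roll angles, each rotated by the helical phase of its step."""
    total = 0j
    for step in range(len(seq) - 1):
        rho = toy_roll(seq[step:step + 2])
        total += rho * cmath.exp(2j * math.pi * step / period)
    return total


def bend_summary(seq, period=10.2):
    """Return (magnitude, direction in degrees) of the net bend vector."""
    vector = bend_vector(seq, period)
    return abs(vector), math.degrees(cmath.phase(vector))


if __name__ == "__main__":
    phased = "AAAAAGGGGG" * 5      # A/T and G/C runs alternating every half turn
    unphased = "AG" * 25           # only mixed steps under the toy rule
    print(bend_summary(phased))
    print(bend_summary(unphased))
```

Under these assumptions a sequence whose A/T-rich and G/C-rich runs alternate with the helical repeat accumulates a large net bend vector, whereas a sequence without such phasing does not. The direction of the vector, not its absolute size, is the quantity relevant to rotational positioning.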
Where these parameters are not available from crystal structures they are estimated either from conformational energy calculations16 or by reference to rotational sequence preferences.15 An alternative and less direct approach is to calculate the wedge angles from the anomalous gel

10 H. R. Drew, J. Mol. Biol. 219, 391 (1991).
11 J. M. Gale, K. A. Nissen, and M. J. Smerdon, Proc. Natl. Acad. Sci. U.S.A. 84, 6644 (1987).
12 J. R. Pehrson, Proc. Natl. Acad. Sci. U.S.A. 86, 9149 (1989).
13 H. R. Drew and C. R. Calladine, J. Mol. Biol. 195, 143 (1987).
14 B. Pina, M. Truss, H. Ohlenbusch, J. Postma, and M. Beato, Nucleic Acids Res. 18, 6981 (1990).
15 C. R. Calladine and H. R. Drew, J. Mol. Biol. 192, 907 (1986).
16 D. Boffelli, P. De Santis, A. Palleschi, and M. Savino, Biophys. Chem. 39, 127 (1991).
17 R. E. Dickerson, M. L. Kopka, and P. Pjura, in "Biological Macromolecules and Assemblies" (F. A. Jurnak and A. McPherson, eds.), Vol. 2, p. 37. Wiley, New York, 1985.

FIG. 2. Bending of the double-helical axis of DNA. The direction of the helical axis changes at positions where adjacent base pairs are inclined along their short axes relative to each other. This inclination or relative roll, ρ, can be toward either the major (+) or minor (-) groove. The path of the minor groove is outlined and shows the effect of positive and negative roll angles separated by half a double-helical turn separately producing a bend in the same direction.

mobilities of a variety of DNA fragments of different sequence.18,19 These methods yield widely disparate values for particular individual base steps. However, it should be emphasized that for the calculation of the preferred configuration of a DNA molecule of mixed sequence the important parameter is not the absolute value for an individual base step but the relative differences between base steps. Put another way, the preferred direction of curvature of a particular sequence would be unchanged if the values of all the assigned roll angles were overestimated by, for example, a uniform +10°.

This approach necessarily assumes that the wedge angle is an intrinsic property of a base step and is largely independent of the sequence context. This assumption is clearly a simplification but is a useful approximation. A second general limitation of this approach is that only a single wedge value is assigned to each base step, whereas there is considerable evidence that certain base steps, notably YpR (where Y is a pyrimidine and R a purine) and also GpC, are bistable and can adopt alternative conformations with different wedge components.20 However, because the rotational sequence preferences are redundant, in any 145-bp stretch of DNA of mixed sequence it is unlikely that there will be sufficient occurrences of a wrongly assigned wedge angle to influence the assignment of a rotational position

18 C. R. Calladine, H. R. Drew, and M. J. McCall, J. Mol. Biol. 201, 127 (1988).
19 A. Bolshoy, P. McNamara, R. E. Harrington, and E. N. Trifonov, Proc. Natl. Acad. Sci. U.S.A. 88, 2312 (1991).
20 C. R. Calladine and H. R. Drew, J. Mol. Biol. 178, 773 (1984).


TABLE I
BASES FOR ALGORITHMS

Algorithm: Calladine and Drew (1986)a
  Basis: Vectorial addition of roll angles for each base step; constant periodicity, phase change at dyad
  Comments: Roll angles derived from crystal structures

Algorithm: Boffelli et al. (1991)b
  Basis: Vectorial addition of roll and tilt angles for each base step
  Comments: Roll and tilt angles from energy calculations

Algorithm: Drew and Calladine (1987)c
  Basis: Probability of occurrence of base steps at given rotational orientation, constant periodicity; also translational parameters
  Comments: Probability table calculated by reiterative procedure

Algorithm: Pina et al. (1990)d
  Basis: Same probability table as Drew and Calladine (1987)c
  Comments: Used to calculate rotational orientation for experimentally defined site

a C. R. Calladine and H. R. Drew, J. Mol. Biol. 192, 907 (1986).
b D. Boffelli, P. De Santis, A. Palleschi, and M. Savino, Biophys. Chem. 39, 127 (1991).
c H. R. Drew and C. R. Calladine, J. Mol. Biol. 195, 143 (1987).
d B. Pina, M. Truss, H. Ohlenbusch, J. Postma, and M. Beato, Nucleic Acids Res. 18, 6981 (1990).

significantly. In other words, in this respect the algorithms are relatively robust.

Statistical algorithms depend on the assignment of a rank order of the probability of occurrence of di- or trinucleotides at defined positions, usually corresponding to rotational orientations. In the simplest case the rank order for rotational signals is determined by the amplitude of the periodic modulation of occurrences in a set of aligned sequences.3 Because the alignment of sequenced binding sites is inevitably imprecise, attempts have been made to refine the coefficients by reiterative procedures which realign individual sequences within the set by selecting for the strongest rotational signal with constant periodicity (e.g., Ref. 13). Such procedures may, however, obscure translational information. The bases of published algorithms are summarized in Table I.

Principles of Positioning Algorithms

If we describe the DNA as a sequence of steps between successive bases in one dimension, the predictions can be made either in real space


or, via the Fourier transform, in reciprocal space. For example, Calladine and Drew15 have shown that with a constant relative twist the corresponding roll angles, ρ, derived from structural models and from crystal structures determine the plane curve that a particular DNA sequence would adopt in space. If the left-handed supercoil of DNA around the histone octamer21 is approximated to an idealized plane curve, then a fit between the curve of DNA generated by probable roll angles ρ adopted by successive base steps and the ideal curve may be calculated. The closeness of fit is then a measure of the likelihood of a particular segment of DNA adopting the idealized shape for wrapping around the histone core. Calladine and Drew15 calculated

$$\mathrm{fit}_k = \sum_{j=1}^{W} \left[ (\rho_I - \rho_S)^2 \right]_j \tag{1}$$

Here the measure of fit is determined for each position k along the sequence of a moving window of W base steps. The fit is derived from the differences between the ideal DNA curvature described by the roll angle ρ_I at each of the steps j and the curvature ρ_S accessible to the equivalent steps in the natural sequence. The squared differences are summed over W for each position k of W, and the minimal values of the sum give predicted core positions. The predictions are thus calculated in real space. Although the derivation is relatively straightforward, these predictions are largely consistent with known experimentally determined positions, but of limited accuracy.

The inherent sequential and structural periodicity along the wrapped DNA of the nucleosome confers automatic advantages to a description of the sequence of base steps in one-dimensional reciprocal space. Calladine and Drew15 modeled their ideal plane curve for DNA wrapped around the histones as

$$[\rho_I]_j = (2 \times 4.5^\circ)\cos(2\pi j/10.2) + 2^\circ \tag{2}$$
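The real-space fit of Eq. (1) against the ideal curve of Eq. (2) can be illustrated with a short sketch. It assumes that a roll angle (in degrees) has already been assigned to every base step of the sequence; the function names and the list-based representation are choices made for this example and are not taken from the published algorithm.

```python
import math


def ideal_roll(j, period=10.2, amplitude=9.0, mean=2.0):
    """Ideal roll angle (degrees) at step j, as in Eq. (2): (2 x 4.5) cos(2 pi j / 10.2) + 2."""
    return amplitude * math.cos(2 * math.pi * j / period) + mean


def fit_profile(rolls, window):
    """Eq. (1): for each window position k (0-based here), sum over j = 1..W of
    the squared difference between the ideal and the sequence roll angle."""
    fits = []
    for k in range(len(rolls) - window + 1):
        fit = sum(
            (ideal_roll(j) - rolls[k + j - 1]) ** 2
            for j in range(1, window + 1)
        )
        fits.append(fit)
    return fits


def best_position(rolls, window):
    """Window position with the smallest fit value, i.e. the predicted position."""
    fits = fit_profile(rolls, window)
    return min(range(len(fits)), key=fits.__getitem__)


if __name__ == "__main__":
    # Synthetic test: an ideal 30-step segment embedded in a flat background.
    window = 30
    rolls = [2.0] * 7 + [ideal_roll(j) for j in range(1, window + 1)] + [2.0] * 20
    print(best_position(rolls, window))
```

A window of 30 steps is used here only to keep the example small; the text describes a window of 145 base pairs. Minima of the profile, rather than a single global minimum, would correspond to the set of predicted core positions.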

We may generalize this as

$$[\rho_I]_j = A\cos(2\pi j\nu) + [\bar{\rho}_I]_W \tag{3}$$

where A is the amplitude of the modulation and [ρ̄_I]_W is the mean roll angle over the window. Rearranging, we have

$$\frac{1}{R} \sim \left([\rho_I]_j - [\bar{\rho}_I]_W\right)\cos(2\pi j\nu) \tag{4}$$

Equation (4) describes how the radius of curvature R of an ideally wrapped DNA segment is locally modulated by the helical periodicity of the duplex,

21 T. J. Richmond, J. T. Finch, B. Rushton, D. Rhodes, and A. Klug, Nature (London) 311, 532 (1984).


as shown in Fig. 2. The equation is the real part of a Fourier transform that describes the distribution of ρ_I about a mean at intervals 1/ν within a window of W base steps. Expressing the path of DNA in this form enabled Calladine and Drew15 to refine empirically their model set of roll angles ρ_I against a basis set of 177 aligned sequences from chicken erythrocyte nucleosomes.3 Incorporating this refined model into their real-space fitting algorithm, and adopting a window of 145 base pairs, the authors were able to predict rotational core positions successfully in 75% of the 177 sequences that made up the basis set. The results indicated how realistic was the idealized model generated by [ρ_I]_j in Eqs. (2)-(4).

If the occurrence of a particular angle ρ_I is strongly periodic with a frequency ν, then the intensity, I, of the fluctuations in R will be relatively high:

$$I_k(\nu) = \left[\sum_{j=1}^{W}\left([\rho_I]_j - [\bar{\rho}_I]_W\right)\cos(2\pi j\nu)\right]^2 \tag{5}$$

for each position k of the window W along the sequence. Equation (5) describes a power spectrum of which each component ν has an amplitude [ρ_I]_j − [ρ̄_I]_W at positions j along the sequence, together with a phase jν. Structurally, [I_k(ν)]^{1/2} represents the magnitude of change in curvature of the DNA supercoil between successive positions j ± (1/ν). The phase, φ = jν, represents the relative direction of curvature at j for the Fourier component of the spectrum with frequency ν. Each component represents an idealized path through space of DNA whose sequence of bases has been rewritten as successive coefficients, [ρ_I]_j, assigned to base steps j having a periodicity 1/ν.

The idealized model for the path of DNA around the histone core can be made more realistic by allowing for the sigmoidal shape of the supercoil that occurs in the vicinity of the structural dyad of the nucleosome.21 Only the first and last 60 base steps of the wrapped DNA are approximated by a plane curve generated by ρ_I with ν = 1/10.2 and a constant phase angle. When the sequence organization is symmetrized about the sequence dyad, approximately 10 base steps on either side of the dyad are about 180° out of phase with the remainder of the nucleosome supercoil. Phase relationships between segments of the string of coefficients assigned to the DNA sequence can be incorporated into Eq. (5) as

$$I_k(\nu) = \left[\sum_{j=1}^{W}\left([\rho_I]_j - [\bar{\rho}_I]_W\right)\cos 2\pi\left([\phi + \Delta\phi]_j\right)\right]^2 \tag{6}$$

where Δφ is a relative shift in phase φ = jν of component ν at position j.
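A single component of the spectrum in Eqs. (5) and (6) can be computed directly from a string of coefficients, as in the sketch below. The function name, the 0-based window position, and the optional `phase_shift` callable (returning Δφ in units of full turns for step j) are assumptions introduced for this illustration; with no phase shift the expression reduces to Eq. (5).

```python
import math


def spectrum_component(coeffs, k, window, nu, phase_shift=None):
    """Intensity I_k(nu) in the form of Eq. (6) for the window starting at k (0-based).

    coeffs      : one coefficient (e.g. a roll angle) per base step
    nu          : frequency in cycles per base step, e.g. 1 / 10.2
    phase_shift : optional function j -> delta phi (in turns); None gives Eq. (5)
    """
    segment = coeffs[k:k + window]
    mean = sum(segment) / len(segment)
    total = 0.0
    for j, value in enumerate(segment, start=1):
        dphi = phase_shift(j) if phase_shift is not None else 0.0
        total += (value - mean) * math.cos(2 * math.pi * (j * nu + dphi))
    return total ** 2


if __name__ == "__main__":
    # Synthetic coefficients with a pure 10.2-step periodicity.
    coeffs = [math.cos(2 * math.pi * j / 10.2) for j in range(1, 146)]
    on_period = spectrum_component(coeffs, 0, 145, 1 / 10.2)
    off_period = spectrum_component(coeffs, 0, 145, 1 / 7.0)
    print(on_period, off_period)
```

Because only the cosine (real) part is used, this form measures the component that is in phase with the chosen origin of j; the complex exponential of the generalized expression removes that restriction. A phase function that returns 0.5 for steps near the window center would model a segment 180° out of phase with the rest of the window.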


The improvement in their model around the structural dyad enabled Drew and Calladine13 to predict, with moderate success, the translational positions of nucleosome cores along a piece of frog DNA that had not formed part of the basis set used to refine ρ_I empirically.

The one-dimensional Fourier transform provides a formulation for more generalized predictions of DNA wraps. For example, ρ_I in Eq. (6) could be replaced by ρ_S and [I_k(ν)]_S compared with [I_k(ν)]_I to produce a calculation of fit in reciprocal space. Indeed, Eq. (6) may be generalized further:

$$I_k(\nu) = \left\{\sum_{j=1}^{W}\left([C_{M,S}]_j - [\bar{C}_{M,S}]_W\right)\exp\left\{2\pi i\left[(j\nu) + [\Delta\phi]_j\right]\right\}\right\}^2 \tag{7}$$

where C_{M,S} are coefficients ascribed to base steps in either a model, M, or a sequence, S. The coefficients C_M need not bear a structural interpretation, such as roll angle, but may, in principle, be derived statistically by comparison of sequences known to wrap around cores. Boffelli et al.16 have built up a set of coefficients C_M by correlating structurally derived base step twists as well as roll angles with the statistical data provided by the 177 sequences from Satchwell et al.3 With ν = 1/10.4 and W = 31 base steps, these authors were able to predict the position of the most stable nucleosome in the regulatory region of the SV40 gene22 to ±2 rotational positions.

Future Developments

Predictions based on fits between spectra derived in reciprocal space by generalized expressions such as Eq. (7) offer scope for variation on and extension of the work by previous authors already described. We list here three main aspects.

1. The global shape described by a single component ν of Eq. (7) is unlikely to be sufficiently accurate to describe the path of DNA wrapped around the histone core, and especially the set of conformations that a particular sequence could adopt in practice. With the generalized form of Eq. (2), as in Eq. (7), simultaneous comparison of several components of the power spectrum generated by C_M becomes trivial. Expectation of improvements in predictions resulting from multicomponent fitting is tantamount to saying that the necessary assumption held in previous work that [C_S]_j is independent of j − Δj (i.e., that the assignment of coefficients is independent of the context of each base step) is unlikely to hold true

22 C. Ambrose, A. Rajadhyaksha, H. Lowman, and M. Bina, J. Mol. Biol. 209, 255 (1989).


beyond a first approximation. In particular, information for translational positioning will come from Fourier components with ν ≠ 1/10. Rotational information predominates for components ν ≈ 1/10.

2. A prefixed phase relation, Δφ, between different parts of the DNA wrap is also unlikely to be sufficiently accurate for reliable predictions. In Eq. (7) the phase term is not only a function of frequency ν but also of j, and therefore of k. In practice this does not introduce as much noise as might be expected provided that the moving window is long (W >> 1/ν), as adopted by previous authors. Furthermore, a certain amount of smoothing is introduced by ignoring the dependence of ν on W. This dependency is described as follows. For W >> 1/ν, [C_S]_j may be regarded as a smooth function of j. More realistically, the DNA sequence of base steps is better represented by a string of discrete coefficients [C_S]_j in which successive values of j are separated by increments in phase of 2π/N, where N equals W − 1. The Fourier transform of such a string of W coefficients has components given by

$$\nu = n/N \tag{8}$$

where n is an integer. Moreover, all the information contained in these components will be obtained by sampling φ by a set of discrete phases [φ]_n, such that

$$\phi = j\nu = jn/N = [\phi]_n \tag{9}$$

where integer n ranges from zero to N − 1. Thus, spectra [I_k(ν)]_n may be calculated. This is illustrated in Fig. 3. For each k and ν, the dependence of I_k(ν) on n is due to local deviations of the DNA wrap from a plane curve supercoil, and these discontinuities provide translational information.

Fourier transformation is often used to calculate sequence periodicities from an aligned set of sequenced binding sites.3 The validity of the periodicities so obtained depends crucially on both the average amplitude and regularity of the periodic modulations in sequence as well as on the limitations inherent in the method itself. We have shown how the sampling theorem limits the selection of components that contribute to the power spectrum calculated from a string of discrete coefficients. It follows from Eq. (8) that short window lengths permit few components, thus severely limiting the obtainable resolution of periodicities. This problem can be circumvented by application of pseudowindows, p times longer than N coefficients. A pseudowindow (Fig. 4) contains (pN) coefficients of magnitude [fC_S]_j, where

$$f_{j,k} = \tfrac{1}{2}\left\{[\bar{C}_S]_W + \cos 2\pi\left([j\nu]_p\right)\right\} \tag{10}$$

[Figure 3: a DNA sequence drawn as a horizontal string of coefficients (vertical arrows), with two positions of the window W, k and k + 15, marked by dashed boxes; panels are labeled n = 1 and n = 5.]

FIG. 3. A DNA sequence is depicted as a horizontal string of j coefficients. The relative magnitudes of the coefficients are represented by vertical arrows. Two positions k and k + 15 of a window W are shown as dashed boxes. At each position the contribution to a component ν of the power spectrum [I_k(ν)]_n is calculated from Eqs. (7) and (9) (see text). Here W = 7 coefficients, and ν = ~ [from Eq. (8)]. At both these positions of the window the contribution to I_k(ν) would be relatively high, whereas intervening coefficients would contribute less.

Here, j_p = 1 to (pN); ν_p = 1/2(pN). The effect of this operation is to raise a positive cosine wave of periodicity ν_p to the mean value of C_{S,j} for values of j within the window W, and thence to multiply C_{S,j} by the part of the cosine wave f_j. This is the portion of the cosine function that corresponds to the real window as shown in

[Figure 4: the curve f_{j,k} drawn over a DNA sequence, with the real window W at position k lying inside a pseudowindow of length (pN); the axis is marked in units of 2N.]

FIG. 4. The creation of a pseudowindow pN from Eq. (10) is shown for a position k of a real window W. The coefficients within the real window (vertical arrows) are to be multiplied by the curve f_{j,k}. Here W = 5 and p = 5, so pN = 20 (see text). The resulting permitted components ν_p are listed in Table II.


TABLE II
PERIODICITIES PERMITTED WITH PSEUDOWINDOW OF LENGTH pN a

Periodicities 1/ν (bp) b

p = 1: 20.00, 10.00, 6.67
p = 3: 20.00, 15.00, 12.00, 10.00, 8.57, 7.50, 6.67, 6.00
p = 5: 20.00, 16.67, 14.29, 12.50, 11.11, 10.00, 9.09, 8.33, 7.69, 7.14, 6.67, 6.25
p = 7: 20.00, 17.50, 15.56, 14.00, 12.73, 11.67, 10.77, 10.00, 9.33, 8.75, 8.24, 7.78, 7.37, 7.00, 6.67, 6.36, 6.09
p = 9: 20.00, 18.00, 16.36, 15.00, 13.85, 12.00, 11.25, 10.59, 10.00, 9.47, 9.00, 8.57, 8.18, 7.83, 7.50, 7.20, 6.92, 6.67, 6.21, 6.00

a For a moving window of 22 base pairs: W = 21 base steps, W = 21 coefficients; N = 20.
b Periodicities, 1/ν, are listed for each value of p. The values independent of p are 20.00, 10.00, and 6.67.
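The entries of Table II follow from Eq. (8) with N replaced by pN, so that the permitted periodicities are 1/ν = pN/n for integer n. The sketch below enumerates them; the function name and the restriction to the range 6 to 20 bp (chosen here to match the range of values shown in the table) are assumptions of this example.

```python
def permitted_periodicities(n_steps, p=1, longest=20.0, shortest=6.0):
    """Periodicities 1/nu = pN/n permitted by Eq. (8) when N is replaced by pN.

    n_steps  : N, the number of increments in the real window (N = W - 1)
    p        : pseudowindow multiplier
    longest, shortest : assumed display range in base pairs
    """
    length = p * n_steps
    values = []
    for n in range(1, length):
        period = length / n
        if shortest <= period <= longest:
            values.append(round(period, 2))
    return values


if __name__ == "__main__":
    for p in (1, 3, 5, 7, 9):
        print(p, permitted_periodicities(20, p))
```

With N = 20 the real window alone (p = 1) permits only three periodicities in this range, whereas larger values of p sample the region around 10 bp progressively more finely.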

Table II. Permitted values of 1/ν resulting from the replacement of N by pN in Eq. (8) are shown in Table II.

3. Advantages of this approach are as follows: (a) Long-range features of the DNA wrap (i.e., the overall shape of the supercoil) need not be anticipated because, as Fig. 3 shows, the phase relationships between different segments of coefficients [C_S]_j are determined empirically. (b) A single sequence, either from or independent of a basis set, but which is known from experiment to wrap around the histone core may be assigned


coefficients. Components of the resulting power spectrum can then be fitted empirically to the equivalent components from a sequence of unknown properties, and predictions made. The only assumption inherent in this operation is the nature of the coefficients assigned to both sequences. However, even these may be empirical, being derived statistically from a basis set of aligned experimental sequences. No physical or topological assumptions need underlie the set of coefficients. (c) Crystallography has shown that the Fourier synthesis of an accurate image of an object depends more on the faithful reproduction of phase relationships between Fourier components than of their respective amplitudes. Analogously, accurate knowledge of the relative magnitudes of coefficients assigned to different base steps should prove to be of less importance than retention of the phase relations between different segments of the string of coefficients, as illustrated in Fig. 3.

General Applications

In principle, generalized algorithms for nucleosome positioning may also be applied to determine potential DNA binding sites for other protein complexes whose selectivity depends on the utilization of structural properties of the DNA double helix. The principal constraint for most algorithms is a requirement for a regular sequence periodicity. This condition is met by the Escherichia coli DNA gyrase but not by the bacterial RNA polymerase. In the latter case the rotational signals in promoter sites are not sufficiently regular to be accurately averaged by the assignment of a constant periodicity. Nevertheless, in such examples, the derivation of the power spectrum of the sequence is potentially of considerable utility. This is also particularly relevant where structural changes in a nucleoprotein complex may change the structural constraints on the bound DNA and hence the pattern of sequence preferences. In this context the transition from the nucleosome core particle to the chromatosome is a pertinent example.

We would also point out that structural periodicities reflected by sequence periodicities are also characteristic of certain proteins; the coiled coils present in myosin and keratins23,24 are notable examples. Consequently the same algorithms that are used for nucleosome positioning could, with appropriate modification, be used for analysis of protein sequences.

23 A. D. McLachlan and J. Karn, J. Mol. Biol. 164, 695 (1983).
24 A. D. McLachlan, J. Mol. Biol. 124, 297 (1978).
