
Biochimica et Biophysica Acta, 1076 (1991) 233-238

© 1991 Elsevier Science Publishers B.V. (Biomedical Division) 0167-4838/91/$03.50 ADONIS 0167483891000828 BBAPRO 33828

The C-domain in the H1 histone is structurally conserved

Dennis L. Maeder 1 and Lothar Bohm 2

1 Chromatin Research Centre, Department of Biochemistry, University of Cape Town, Rondebosch and 2 Radiobiology Laboratory, Department of Radiotherapy, Faculty of Medicine, University of Stellenbosch, Tygerberg (Republic of South Africa)

(Received 16 May 1990)

Keywords: Histone H1; C-domain; Composition; Residue spacing; Conservation; Helicity; Sequence analysis

The C-domain of H1 is conserved in composition and not in sequence. The following regularities have been identified: the distribution of lysine, alanine and proline is non-random; alanine occurs in doublets and at intervals of 4-6 significantly more often than expected for random sequences of equal composition; and lysine also deviates from random distribution in that doublets are under-represented and intervals of 2-7 are over-represented. Lysine preferentially occurs in singlets and alanine in doublets rather than triplets or quadruplets. This discourages the formation of helices without neutralization of lysine charges. When lysine residues are paired with DNA phosphate residues, helices are highly probable. Interproline spacing promotes short helical segments. The regularities arising from the conservation of composition and non-random residue distribution suggest that C-domains adopt similar structures and in fact are structurally conserved.

Introduction

Histone H1 consists of a central globular domain called G-H1 and two flanking domains, N-H1 and C-H1, which comprise the N- and C-terminal residues. The three domains separate the H1 molecule into conserved and non-conserved regions which differ chemically, structurally and functionally (reviewed in Ref. 1). That different regions of the H1 molecule indeed perform different functions in chromatin has been shown by reconstituting H1-depleted chromatin with H1 peptides. These experiments indicated that G-H1 protects 167 bp of DNA and delineates two full turns of superhelical DNA [2]. Use of overlapping peptides furthermore established that C-H1 facilitates chromatin condensation while N-H1 plays a role in correctly locating G-H1 on the nucleosome [2,3]. The details of H1 function, in particular the condensation mechanisms, the location of the two flanking domains, their mode of binding and structure in chromatin, are not yet clear.

The C-domain of H1 is by far the largest subregion and shows great variability of primary structure [1]. Since condensation is a universal requirement of the dynamics of chromatin in the cell cycle, it is remarkable that the condensing function should be performed by a wide variety of different protein structures. Free in solution, C-H1 has a random coil structure [4]. The direct examination of the in situ structure of C-H1 in chromatin has not yet been feasible, but secondary structure predictions under conditions of charge neutralization indicate that helices form when C-H1 binds to DNA [5-9]. In the following we assess the regularities of the charge-induced helices and show that they are similar in size and in number.

Correspondence: L. Bohm, Radiobiology Laboratory, Department of Radiotherapy, Faculty of Medicine, University of Stellenbosch, P.O. Box 63, Tygerberg, 7505, Republic of South Africa.

C-domains closely resemble each other in terms of amino acid composition, residue distribution and occurrence of peptide repeats

Depending upon the origin of the H1 molecule, C-H1 comprises approx. 100 residues which are dominated by lysine (40 mol%), alanine (30 mol%) and proline (12 mol%). Threonine and serine are minor constituents (3-5 mol%) followed by glycine, valine and isoleucine. Acidic and aromatic amino acids are absent. The C-domain contains 66% of all the lysines in H1 and is the most basic region of the molecule. The composition of C-H1 shows a certain resemblance to N-H1, but the distribution and sequential spacing of residues is different. Whereas the N-domain consists of a cluster of basic residues and a region rich in proline and alanine [1,10], no such segregation is apparent in the C-domain. The distribution of the three major constituents, alanine, lysine and proline, nevertheless is non-random. Inspection of available sequences shows that alanine occurs in doublets (interval of 1) and at intervals of 4-6 significantly more often than expected for random sequences of identical composition (Fig. 1A). Lysine also deviates from a random distribution in that doublets are under-represented and intervals of 2-7 are over-represented (Fig. 1B). Proline always occurs singly and at preferred intervals of 4-7 and 12-15 (Fig. 1C). Successive prolines are spaced further apart than expected. The low tendency of lysine to cluster and the preference for alanine doublets rather than triplets discourage the formation of helices without charge neutralization. The spacing of prolines shows that helices can never be very long.

A consequence of the high abundance of lysine, alanine and proline is that peptides containing these three amino acids are very frequent. Tripeptide identity matrixes of H1 show that short sequential motifs are repeated and dominate the C-domain (Fig. 2). This is indicated by the multiplicity of lines parallel to the diagonal (Fig. 2). The 10 most common tripeptides in the 9 H1 sequences and their frequencies are found to be as follows: AAK (60 ×), AKK (57 ×), KKA (54 ×), KAA (44 ×), PKK (43 ×), KPK (42 ×), KAK (36 ×), KKP (31 ×), AAA (28 ×) and AKP (28 ×). Fig. 2 shows that the identity between C-H1 (calf thymus) and C-H5 (chicken erythrocyte) is poor and that the incidence of repeated C-terminal motifs is markedly higher between H1 species in general and between species which are close on the evolutionary scale, e.g., calf thymus, rabbit and Xenopus, respectively. The continuous 45° diagonal indicates sequence identity which predominates in the G-domain [1,10] and in the case of closely related species also in the N-domain as shown [9]. The repetition of short sequential motifs is also seen in the randomized C-terminal sequence (Fig. 2). This indicates that the high abundance of Lys, Ala and Pro alone induces certain sequential homologies. These homologies are seen at tripeptide (Fig. 2) and at tetrapeptide level. Amongst the nine sequences listed in Table I, tetrapeptides containing alanine and 1, 2 or 3 lysines are repeated 146 times, whereas tetrapeptides containing a single proline residue recur 16-76 times, depending on the position of the proline. Van Vleteren et al. [25] identify the multiple repeat XS(T)PX where X = K or A which recurs 30 times in the nine H1 sequences (Table I). More recent work of Suzuki assigns

TABLE I

Number of predicted helices a and number of residues per helical segment in the C-domains of histones H1 and H5

Organism b          Residues in C-domain c    Number of    Number of residues per
                    total      helical        helices      helical segment d

H1
  bovine CTL-1      107        62             10           8, 6, 10, 7, 5, 6, 5, 6, 4, 5
  rabbit RTL-3      119        61              9           8, 13, 5, 4, 7, 6, 5, 4, 9
  chicken           113        72             10           5, 8, 4, 6, 8, 4, 11, 7, 9, 10
  Xenopus           114        77              9           22, 6, 8, 4, 5, 4, 6, 9, 13
  trout e            99        74             11           11, 6, 9, 5, 13, 5, 5, 7, 4, 5, 4
  Drosophila f      142        91              9           8, 21, 4, 24, 7, 6, 6, 11, 4
  Parechinus        138        95              7           4, 57, 11, 5, 6, 6, 6
  Strongyloc.       115        70              7           29, 5, 6, 11, 4, 8, 3
  Caenorhab.        100        74             13           8, 5, 5, 7, 4, 4, 5, 6, 4, 8, 5, 5, 8
H5
  chicken            97        35              5           4, 4, 8, 5, 14
  goose             100        25              4           8, 4, 8, 5

a According to Chou and Fasman [28].
b Source of sequences given in Fig. 1.
c Using the conserved phenylalanine residue as N-terminal boundary.
d From N- to C-terminus of domain.
e Reference 30.
f Reference 31.

DISTRIBUTION OF RESIDUES IN C-TERMINUS OF H1

Fig. 1. Spacing of successive alanine, lysine and proline residues in the C-domain of histones H1 and H5 from various organisms. The distance between each residue in each sequence was recorded. By exchanging each residue with a randomly selected residue from the same sequence, new sequences of identical composition were generated. 1000 such sequences served as a statistical sample against which the interval count of the various sequences was compared. The direction of deviation of the interval count from random is indicated by + and -. Deviations exceeding one standard deviation are circled. Areas of preferred intervals are boxed, showing that non-random intervals are preferred. Alanine shows short-range clustering and preferentially occurs in pairs. The interval in lysine never exceeds 7. Successive prolines are spaced further apart than expected for a random distribution. H1 sequences used were: calf thymus CTL-1 [19], rabbit thymus RTL-3 (Cole, R.D., unpublished data) [19], chicken [20], Xenopus [21], Parechinus [22,23], Strongylocentrotus [24] and Caenorhabditis [25]. H5 sequences were: chicken [17] and goose [18].
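The randomization test described in the legend to Fig. 1 can be illustrated with a minimal sketch. It is not the authors' program: the function names, the fixed seed and the use of a full random shuffle (in place of the residue-by-residue exchange mentioned in the legend) are assumptions made for illustration. Both procedures preserve composition. An interval of 1 corresponds to a doublet.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def interval_counts(seq, residue):
    """Count the distances between successive occurrences of `residue`."""
    positions = [i for i, aa in enumerate(seq) if aa == residue]
    return Counter(b - a for a, b in zip(positions, positions[1:]))


def shuffled(seq, rng):
    """Return a random sequence with the same composition as `seq`."""
    chars = list(seq)
    rng.shuffle(chars)  # assumption: a full shuffle stands in for pairwise exchange
    return "".join(chars)


def interval_deviation(seq, residue, n_random=1000, seed=0):
    """Compare observed interval counts with those of shuffled sequences.

    Returns {interval: (observed, random_mean, random_sd, sign, exceeds_one_sd)}.
    """
    rng = random.Random(seed)
    observed = interval_counts(seq, residue)
    samples = [interval_counts(shuffled(seq, rng), residue)
               for _ in range(n_random)]
    intervals = set(observed)
    for sample in samples:
        intervals.update(sample)
    result = {}
    for d in sorted(intervals):
        values = [sample[d] for sample in samples]  # a Counter yields 0 if absent
        mu, sd = mean(values), pstdev(values)
        diff = observed[d] - mu
        sign = "+" if diff > 0 else "-" if diff < 0 else "0"
        result[d] = (observed[d], mu, sd, sign, abs(diff) > sd)
    return result


if __name__ == "__main__":
    toy = "AAKAKKA" * 3  # hypothetical toy sequence, not a real H1 C-domain
    for d, row in interval_deviation(toy, "A", n_random=200).items():
        print(d, row)
```

The sign and the one-standard-deviation flag correspond to the + / - marks and the circled entries of Fig. 1. Applying the function to a real C-domain sequence for each of A, K and P would give the three panels.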

[Fig. 2: identity matrix panels for calf thymus CTL-1, calf thymus CTL-1 with the C-domain randomized, and rabbit thymus RTL-3]
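The tripeptide frequencies and the tripeptide identity matrixes referred to in the text (Fig. 2) can be reproduced in outline with a short sketch. The function names and the toy sequences are hypothetical and are used only for illustration; real C-domain sequences would have to be supplied from the cited sources.

```python
from collections import Counter


def peptide_counts(sequences, k=3):
    """Count every overlapping k-peptide (k=3: tripeptide) in the sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq[i:i + k] for i in range(len(seq) - k + 1))
    return counts


def identity_points(seq_a, seq_b, k=3):
    """Return (i, j) pairs where the k-peptide at i in seq_a equals that at j in seq_b.

    Plotting these points gives an identity (dot) matrix: a continuous
    diagonal indicates sequence identity, and lines parallel to the
    diagonal indicate repeated motifs.
    """
    index = {}
    for j in range(len(seq_b) - k + 1):
        index.setdefault(seq_b[j:j + k], []).append(j)
    return [(i, j)
            for i in range(len(seq_a) - k + 1)
            for j in index.get(seq_a[i:i + k], [])]


if __name__ == "__main__":
    toy_a = "AAKKAAKPKK"  # hypothetical toy sequences
    toy_b = "KAAKKPKKAA"
    print(peptide_counts([toy_a, toy_b]).most_common(5))
    print(identity_points(toy_a, toy_b))
```

Comparing a sequence with a shuffled copy of itself shows how much of the off-diagonal signal arises from composition alone, which is the point made in the text about the randomized C-terminal sequence.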
