J. theor. Biol. (1975) 49, 25-33

Automatic Comparison of the Sequences of Calf Thymus Histones

P. A. TEMUSSI

Istituto Chimico, University of Naples, Via Mezzocannone 4, 80134 Naples, Italy

(Received 20 March 1974, and in revised form 31 May 1974)

The sequences of five histones have been compared pairwise by means of an automatic computer-assisted procedure. All pairs are significantly similar beyond chance, pointing to a common origin for all histones. All features of the comparisons are consistent with spectroscopic data on the structure and interactions of histones.

1. Introduction

It is well known that the genetic material of higher organisms is always closely associated with a family of basic proteins called histones. Their function is as yet unknown, but there is little doubt about their importance, since they constitute approximately half of the chromosomal material by weight. Two hypotheses have been formulated on their role. One is that they are directly involved in control mechanisms of genetic regulation, much in the way enzymic proteins would behave (Stedman & Stedman, 1950). The second hypothesis is that histones control the structure of DNA (Bradbury & Rattle, 1972), being responsible, for instance, for the very large contraction of DNA during mitosis and for the conformational changes that might regulate the detailed functioning of DNA at different times of the cell cycle. The possibilities offered by the various conformations of five different histones are probably sufficient to account for the complex behaviour of chromatin.

Early in evolution the structural function was likely to be much simpler, and it is conceivable that it was performed by a single protein. A proof of a common origin for all histones would thus substantiate the hypothesis of a structural role. If all histones are indeed modifications of a single protein, this must be reflected by non-random similarities of their sequences. A careful pairwise comparison of histone sequences may then prove their common origin if every sequence is found to be similar to the others beyond chance. A comparison of this type has in fact already been performed by Phillips (1971) on calf thymus histones, but the method of visual comparison employed by this author is not accurate enough; besides, at the time of this first comparison, substantial parts of the sequences were either not available or incorrect, while now only the sequence of histone F1 is incompletely known. This paper reports the results of an extensive comparison of the sequences of calf thymus histones by means of an automatic computer method.

2. Methods

Most of the methods described for the automatic comparison of protein sequences are based on codon mutational distance (Cantor & Jukes, 1966; Fitch, 1966a,b). To compare many sequences rapidly, it is probably more convenient to use a method developed by Haber & Koshland (1970), based on the direct comparison of the amino acid residues according to a predetermined set of homologies. The residues corresponding to naturally occurring amino acids are grouped according to criteria such as conventional chemical similarities or, say, helix-forming tendency; each residue is then assigned a number so that differences between the numbers of homologous residues fall within an appropriate numerical cut-off value. In comparing two sequences, each residue is given the numerical value characteristic of the chosen homology set and the two chains are compared locus by locus. The total number of identities is found if the computer is asked to record zero differences, and the total number of homologies can be recorded by identifying differences less than the cut-off value. In the actual programme written for the present work, provisions were made to slide the two chains so that a whole range of different lengths could be compared; in each case the numbers of identities and homologies expected by chance were calculated from the frequencies of the residues in the segments compared. The programme was written in FORTRAN IV for an IBM 360/44 computer. Six different homology sets were used in all comparisons; the corresponding numerical values for each residue are reported in Table 1.
Set I is based on conventional chemical similarities of the side-chains (Haber & Koshland, 1970); set II is based on the homologous substitutions used by Smith & Margoliash (1964); set III is based on the catalogue of conservative substitutions in haemoglobins recorded by Zuckerkandl & Pauling (1965); set IV reflects the intrinsic ability of residues to form or to break a helical segment, as estimated by Lewis & Scheraga (1971); set V groups the residues simply on the basis of their hydrophobic or hydrophilic character (Manwell, 1967); finally, set VI is based on the tendencies of the residues to form helices, β structures or loops, as calculated by Nagano (1973).
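The procedure described above can be illustrated with a minimal Python sketch; it is not the original FORTRAN IV programme. Several things here are assumptions made only for illustration: the numerical values in `HYPOTHETICAL_SET` and the cut-off of 3 are invented and are not those of Table 1; homologies are counted inclusive of identities; the chance expectation is taken as the segment length times the summed products of residue frequencies in the two segments; and the sliding scheme simply lets one chain start at its first residue while the other starts later.

```python
import math
from collections import Counter
from itertools import product

# Invented homology set for illustration only (NOT the values of Table 1):
# residues regarded as similar receive numbers that differ by less than CUTOFF.
HYPOTHETICAL_SET = {
    "G": 1, "A": 2, "S": 10, "T": 11, "V": 20, "I": 21, "L": 22, "M": 23,
    "F": 30, "Y": 31, "W": 32, "K": 40, "R": 41, "H": 42, "D": 50, "E": 51,
    "N": 60, "Q": 61, "C": 70, "P": 80,
}
CUTOFF = 3  # assumed cut-off: differences strictly below it count as homologies


def compare(seg_a, seg_b, values=HYPOTHETICAL_SET, cutoff=CUTOFF):
    """Compare two equal-length segments locus by locus.

    Returns (identities, homologies). Identities are zero differences;
    homologies are differences below the cut-off and, in this sketch,
    include the identities.
    """
    identities = homologies = 0
    for a, b in zip(seg_a, seg_b):
        diff = abs(values[a] - values[b])
        if diff == 0:
            identities += 1
        if diff < cutoff:
            homologies += 1
    return identities, homologies


def expected_by_chance(seg_a, seg_b, values=HYPOTHETICAL_SET, cutoff=CUTOFF):
    """Chance expectation from residue frequencies in the two segments.

    Assumed formula: n * sum over residue pairs of f_a(r) * f_b(s),
    restricted to pairs that would score as identical or homologous.
    """
    n = len(seg_a)
    count_a, count_b = Counter(seg_a), Counter(seg_b)
    exp_id = exp_hom = 0.0
    for (ra, ca), (rb, cb) in product(count_a.items(), count_b.items()):
        p = (ca / n) * (cb / n)
        diff = abs(values[ra] - values[rb])
        if diff == 0:
            exp_id += p
        if diff < cutoff:
            exp_hom += p
    return n * exp_id, n * exp_hom


def slide(seq_a, seq_b, min_len, values=HYPOTHETICAL_SET, cutoff=CUTOFF):
    """Slide the chains against each other and compare every overlap
    of at least min_len residues. Start positions are 1-based."""
    starts = [(i, 0) for i in range(len(seq_a))]
    starts += [(0, j) for j in range(1, len(seq_b))]
    results = []
    for i, j in starts:
        n = min(len(seq_a) - i, len(seq_b) - j)
        if n < min_len:
            continue
        seg_a, seg_b = seq_a[i:i + n], seq_b[j:j + n]
        found = compare(seg_a, seg_b, values, cutoff)
        expected = expected_by_chance(seg_a, seg_b, values, cutoff)
        results.append((i + 1, j + 1, n, found, expected))
    return results


if __name__ == "__main__":
    # Made-up toy sequences, not histone sequences.
    chain_a, chain_b = "AKTSGRKAVLE", "GKSSGRRAILD"
    shorter = min(len(chain_a), len(chain_b))
    min_len = math.ceil(2 * shorter / 3)  # two-thirds of the shorter chain
    for row in slide(chain_a, chain_b, min_len):
        print(row)
```

Representing each homology set as a plain residue-to-number mapping keeps the comparison routine independent of the criterion used, so a different set can be tried by passing another dictionary and cut-off.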


TABLE 1
Numerical values assigned to residues according to the six homology sets used in the comparisons

Columns: Amino acid; Set I; Set II; Set III; Set IV; Set V; Set VI. The final row gives the cut-off value for each set.
[Numerical entries of this table are illegible in the source.]

3. Results

All pairs of histones were compared in such a way that the minimum length examined contained a number of residues not smaller than two-thirds of those of the shorter chain. A total of 2856 comparisons were accordingly computed. As an example of the kind of results obtained, the best comparisons of the pairs F2A1/F2A2, F2A1/F2B and F3/F1 are reported in Figs 1, 2 and 3. The most significant comparisons for every pair, in terms of the ratio between the number of identities found and that expected by chance, are reported in Table 2. This table also shows the numbers of homologies for the six sets used; the numbers of identities and homologies expected by chance are given in parentheses after each of the figures found in the computed comparisons. It is easy to see that, for all pairs of histones, the number of identities found is always greater than that expected by chance. In fact, for some pairs the number of identities is more than 100% greater than that expected by chance. If the comparison is limited to those segments that are known from NMR and theoretical predictions (Bradbury & Rattle, 1972) to interact with DNA, the similarity is even higher. For instance, it can be seen from Fig. 1 that the comparison of segments 1-49 of F2A1 and 1-46 of F2A2 comprises 12 identities and 7 homologies, i.e. over 40% of the residues compared are similar.

Even more significant, perhaps, is the distribution of the best results among the comparisons: only very few comparisons have a number of identities close to that shown in Table 2, while most of them have numbers comparable to those expected by chance. Besides, nearly all the best comparisons correspond to an alignment of the two chains starting with one of the first residues of each protein. It is also interesting to note the very high numbers of homologies found for some of the sets.

TABLE 2
Best comparisons computed for each histone pair†

                                                    Number of homologies
Histones   First residues   Number of     Set I    Set II   Set III  Set IV   Set V    Set VI
           compared         identities
F2B/F3     A(4), A(1)       13(8)         24(19)   31(24)   40(34)   44(43)   45(41)   53(48)
F2B/F1     A(9), S(1)       18(10)        27(18)   34(23)   43(33)   52(38)   49(38)   50(45)
F3/F1      A(1), A(3)       19(11)        30(23)   32(28)   50(40)   48(48)   52(46)   57(56)

[Rows for the remaining seven pairs cannot be reliably assigned from the source.]
† Numbers in parentheses are those expected by chance.
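The significance criterion used in this section, the ratio between the number of identities found and that expected by chance, is simple arithmetic and can be checked with a short Python sketch. The figures below are the identity counts for the three pairs whose rows are clearly legible in Table 2 (F2B/F3, F2B/F1 and F3/F1). The helper names are hypothetical, and the length of 46 loci used for the Fig. 1 example is an assumption (the shorter of the two quoted segments), since the text does not state the number of loci compared.

```python
# (found, expected) identities as read from the legible rows of Table 2.
IDENTITIES = {
    ("F2B", "F3"): (13, 8),
    ("F2B", "F1"): (18, 10),
    ("F3", "F1"): (19, 11),
}


def ratio(found, expected):
    """Ratio between the number found and the number expected by chance."""
    return found / expected


def excess_percent(found, expected):
    """Percentage by which the number found exceeds the chance expectation."""
    return 100.0 * (found - expected) / expected


def similar_fraction(identities, homologies, loci):
    """Fraction of compared loci that are identical or homologous,
    with homologies counted separately from identities as in the text."""
    return (identities + homologies) / loci


if __name__ == "__main__":
    for (first, second), (found, expected) in IDENTITIES.items():
        print(f"{first}/{second}: ratio {ratio(found, expected):.2f}, "
              f"excess {excess_percent(found, expected):.0f}%")
    # Fig. 1 example: 12 identities and 7 homologies; 46 loci is an assumption.
    print(f"Fig. 1 similar fraction: {similar_fraction(12, 7, 46):.3f}")
```

An excess above 100% corresponds to a ratio above 2, which is the threshold the text refers to when it says that some pairs show more than twice the identities expected by chance.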

FIG. 1. Comparison of histones F2A1 and F2A2. The sequence of F2A1 is that reported by De Lange, Fambrough, Smith & Banner (1969), that of F2A2 was taken from Yeoman, Olson, Sugano, Jordan, Taylor, Starbuck & Bush (1972). The homologies are those computed with set I. [Sequence alignment not recoverable from the source.]

FIG. 2. Comparison of histones F2A1 and F2B. The sequence of F2B was taken from Iwai, Ishikawa & Hayashi (1970) plus a few corrections from a personal communication from Iwai (1973). The homologies are those computed with set I. [Sequence alignment not recoverable from the source.]

FIG. 3. Comparison of histones F3 and F1. The sequence of F3 is that reported by De Lange, Hooper & Smith (1972), that of F1 was taken from Rall & Cole (1971) and from Cole (1973). The homologies are those computed with set I. [Sequence alignment not recoverable from the source.]

Most of the pairs are best compared in terms of conventional chemical similarity of the side-chains, but fairly high ratios between the number of homologies found and those expected by chance are also computed for the sets based on homologous and conservative substitutions. The figures computed for set IV, i.e. that based on helix-forming ability, show a peculiar trend, in very good agreement with experimental results. Both ORD and NMR results (Bradbury, Crane-Robinson, Goldman, Rattle & Stephens, 1967; Boublik, Bradbury & Crane-Robinson, 1970; Li, Wickett, Craig & Isenberg, 1972) indicate for histone F1, in various experimental conditions, lower helical contents than for other histones; it is reassuring to find that all pairs containing F1 have rather low ratios between computed homologies and those expected by chance for set IV.

Another interesting result is that all pairs containing histone F2A1 give the highest ratios between the identities found and expected; in particular, the two most similar sequences are those of F2A1 and F2A2. Studies of molecular weight and self-association (Diggle & Peacocke, 1971) show that F2A1 is the most strongly associating of the histones, e.g. it is precipitated from dilute solution by NaCl concentrations as low as 0.2 M (Boublik et al., 1970), whereas other histones can withstand salt molarities higher than 2.0 M. Solutions containing other histones besides F2A1 behave in a very different way, suggesting the existence of specific interactions between F2A1 and other histones. Mixtures of F2A1 and F2A2, for instance, remain in solution with NaCl concentrations up to 1.5 M (Bradbury, 1973). The great similarity we find between the sequences of these two histones is perhaps connected with their experimental behaviour. It may be mentioned that the singularity of F2A1 is also supported by an independent comparison with non-histone proteins performed by Bauer (1971, 1972). This author showed that histone F2A1 contains an ancestral dodecapeptide closely homologous with partial sequences of a variety of non-histone proteins, including protamines, cytochrome c, ferredoxins, immunoglobulin L-chains and human encephalitogenic protein.

4. Conclusion

The results of our pairwise comparison of histones F2A1, F2A2, F2B, F3 and F1 indicate that all sequences are significantly similar beyond chance. The hypothesis of a common origin of all histones and, perhaps, of the simple structural role of the ancestral histone, is thus substantiated. The great similarity of F2A1, F2A2 and F2B may indicate a first differentiation of the ancestral histone into three proteins, and specialization into five proteins only at a later stage.


The most significant result is the agreement between experimental spectroscopic data and sequence comparisons in the cases of histones F1 and F2A1. This points to a possible use of these comparisons for planning and rationalizing future experimental work, since the very detailed information furnished by high-field NMR spectroscopy can be correlated with the structure of even very short chain segments.

REFERENCES

BAUER, K. (1971). Int. J. Protein Res. 3, 165.
BAUER, K. (1972). Biochem. J. l-1245.
BOUBLIK, M., BRADBURY, E. M. & CRANE-ROBINSON, C. (1970). Eur. J. Biochem. 14, 486.
BRADBURY, E. M. (1973). Personal communication.
BRADBURY, E. M. & RATTLE, H. W. E. (1972). Eur. J. Biochem. 27, 270.
BRADBURY, E. M., CRANE-ROBINSON, C., GOLDMAN, H., RATTLE, H. W. E. & STEPHENS, R. M. (1967). Eur. J. Biochem. 29, 507.
CANTOR, C. R. & JUKES, T. H. (1966). Proc. natn. Acad. Sci. U.S.A. 56, 177.
COLE, R. D. (1973). Personal communication.
DE LANGE, R. L., FAMBROUGH, D. M., SMITH, E. L. & BANNER, J. (1969). J. biol. Chem. 224, 319.
DE LANGE, R. L., HOOPER, J. A. & SMITH, E. L. (1972). Proc. natn. Acad. Sci. U.S.A. 69, 882.
DIGGLE, J. H. & PEACOCKE, A. R. (1971). FEBS Lett. 18, 138.
FITCH, W. M. (1966a). J. molec. Biol. 16, 1.
FITCH, W. M. (1966b). J. molec. Biol. 16, 9.
HABER, J. E. & KOSHLAND JR, D. E. (1970). J. molec. Biol. 50, 617.
IWAI, K. (1973). Personal communication.
IWAI, K., ISHIKAWA, K. & HAYASHI, H. (1970). Nature, Lond. 226, 1056.
LEWIS, P. N. & SCHERAGA, H. A. (1971). Archs Biochem. Biophys. 144, 576.
LI, H. J., WICKETT, R., CRAIG, A. M. & ISENBERG, I. (1972). Biopolymers 11, 375.
MANWELL, C. (1967). J. comp. Biochem. Physiol. 23, 383.
NAGANO, K. (1973). J. molec. Biol. 75, 401.
PHILLIPS, D. M. P. (1971). In Histones and Nucleohistones (D. M. P. Phillips, ed.). New York: Plenum Press.
RALL, S. C. & COLE, R. D. (1971). J. biol. Chem. 246, 7175.
SMITH, E. L. & MARGOLIASH, E. (1964). Fedn Proc. Fedn Am. Socs exp. Biol. 23, 1243.
STEDMAN, E. & STEDMAN, E. (1950). Nature, Lond. 166, 780.
YEOMAN, L. C., OLSON, M. O. J., SUGANO, N., JORDAN, J. J., TAYLOR, C. W., STARBUCK, W. C. & BUSH, H. (1972). J. biol. Chem. 247, 6018.
ZUCKERKANDL, E. & PAULING, L. (1965). In Evolving Genes and Proteins (V. Bryson & H. J. Vogel, eds). New York: Academic Press.
