J. Mol. Biol. (1975) 98, 739-747

Some New Methods and General Results of Analysis of Protein Crystallographic Structural Data†

R. SRINIVASAN, R. BALASUBRAMANIAN AND S. S. RAJAN

Centre of Advanced Study in Physics, University of Madras, Madras 600025, India

(Received 22 May 1975, and in revised form 14 July 1975)

Some methods of analysis and representation of the folding of the chain, using structural data from X-ray analysis of protein crystals, are discussed. Considering a set of four consecutive Cα-atoms to be the smallest segment for purposes of helical analysis, parameters are defined to characterize the helical axes and also the dihedral angles involving the virtual bonds. These are used to devise one- and two-dimensional representations which yield meaningful information regarding chain folding. Results of application to actual protein data are discussed. A method of using stereographic projection involving helical axes is discussed with illustrations.

1. Introduction

By far the largest amount of direct information about the specific way protein chains fold is available from X-ray crystallographic analysis of globular proteins and the resulting structural data. The direct evidence of the α-helix (Pauling & Corey, 1951) and other types of protein conformations has resulted from these analyses. Although the three-dimensional atomic co-ordinates specify precisely the complete architecture of these protein molecules, for certain purposes a simplified representation is sufficient, and the most common one involves the use of the (φ, ψ) diagram. Here the twin parameters φ, ψ, which represent the rotations around two single bonds at each of the Cα-atoms of the chain, specify the relative orientation of a pair of peptide units. For assumed average dimensions‡ of the peptide unit, the backbone atoms are completely specified by the (φ, ψ) values at a given Cα-atom. Other parameters to describe the side chain conformations are also available (IUPAC-IUB Commission on Biochemical Nomenclature, 1970), to make the specification of a protein molecule complete.

It is now well established that theoretical and semi-empirical methods of prediction of protein conformation using energy considerations are conveniently carried out in terms of the (φ, ψ) diagram. The data available from protein crystal structures have been used to verify the deductions, and conversely, in certain instances, the (φ, ψ) diagram is used for correcting improbable conformations during model building.

† This is contribution no. 408 from the Centre of Advanced Study in Physics.
‡ The non-planarity of the peptide unit can be specified by the rotation angle ω around the N–C bond.

In the various attempts to predict the precise conformation of a protein molecule, given the sequence, systematic methods of characterization of the folding such as


the α-, β-types, bends, coils, etc., have been developed (Burgess et al., 1974; Nagano, 1974), and the available structural data have been analysed statistically on the (φ, ψ) plane with a view to extracting information which is in turn fed back into the prediction methods.

Although the (φ, ψ) diagram has been the basis for all these investigations, it has been felt that, in certain situations, it is rather inadequate for extracting certain types of information, as for instance when one wishes to examine which parts of the chain are close to each other. While the three-dimensional structure from X-ray analysis contains the complete answer to this, it is rather unwieldy to handle. The so-called distance map has been suggested for this purpose (Phillips, 1970; Ooi & Nishikawa, 1973), which is again, like the (φ, ψ) diagram, a two-dimensional representation of the three-dimensional information. Also, study of the characteristic patterns of these maps leads to interesting information. For instance, general similarities of patterns of different proteins point to possible evolutionary relations among them (Rossmann & Liljas, 1974).

Other techniques are also available which yield specific information about protein folding. For instance, distances involving a given Cα-atom and its third neighbour (i.e. l_{i,i+3}) have been plotted (Lewis et al., 1973) for successive residues, and such a chain plot† can discriminate the α-helical regions from the others. So also a chain plot of the number of Cα-atoms within a specified distance from a given Cα-atom, excluding a few nearest neighbours, has been used (Ooi & Nishikawa, 1973) to get an idea of the accessibility of that part of the chain to solvent molecules.

The distance map mentioned above cannot give the sense of folding of the protein chain, since it is in the nature of a Patterson function known to crystallographers (Rossmann & Liljas, 1974).
Also, the precise way the chain folds as one progresses along the chain does not become obvious in this representation. While, in principle, both these could be ascertained by tracing the points of successive residues on the (φ, ψ) diagram and checking in which characteristic regions they fall, it would appear worthwhile to develop more powerful methods, if possible, for this purpose.

Recently we have been interested in examining this aspect of general methods of analysis and representation of protein chain folding with a view to extracting specific types of information from crystallographic data. It is obvious that all of these, which involve reduction of three-dimensional information into one- or two-dimensional representations, must necessarily have their inadequacies. However, by a proper choice of parameter and representation it is possible to make them efficient in yielding the specific type of information being sought. In this paper we consider a few such possibilities with illustrations. Details of applications are reserved for other papers to be published in this series.

† We use the term "chain plot" for the graphical plotting of a chosen parameter as a function of the residue number in the chain.

2. Parameters for the Analysis of Chain-folding and Notations

The various conformational parameters that are used to characterize protein molecules are all connected with real bonds in the chain. We shall now consider a few new parameters involving virtual bonds that would be found useful in the characterization of chain folding. The concept of virtual bonds in polypeptides is nothing new


(Brant & Flory, 1965). The consecutive peptide units may be imagined to be connected by virtual bonds of constant length (Pauling & Corey, 1951)† equal to the distance between consecutive Cα-atoms. We define the virtual bond angle δ_i to be the angle at the atom C^α_i given by C^α_{i-1}–C^α_i–C^α_{i+1}. Similarly the dihedral angle θ_i‡ is defined to be the torsion angle C^α_{i-1}–C^α_i–C^α_{i+1}–C^α_{i+2} (Fig. 1). It is surprising that little use seems to have been made in protein data analysis of parameters connected with virtual bonds, although the virtual bond concept is well known and used in polymer statistics (Brant & Flory, 1965). Recently the angle θ has been used for energy calculations (Levitt & Warshel, 1975)§.

It may be noted that θ_i has, theoretically, a full range of variation from −180° to +180°. However, the angle δ_i has a restricted range. This may seem to arise from the fact that, for a given pair of peptide units, the rotations φ and ψ at the middle Cα-atom allow the two adjoining atoms C^α_{i-1} and C^α_{i+1} to assume different spatial positions. Thus θ_i and δ_i may serve as parameters to characterize the gross features of folding of the polypeptide chain, although the specific information on the relative orientation of the peptide planes is not readily available in this picture.

Fig. 1. Schematic diagram of a peptide chain consisting of three peptide units. Virtual bonds involving the four Cα-atoms are shown by solid lines, and real bonds of the peptide units are shown by broken lines. The new parameters involving virtual bonds are marked.

† More generally the distance between two atoms, C^α_i and C^α_j, is denoted by l_{ij} (from this definition the distance map (Phillips, 1970; Ooi & Nishikawa, 1973) is seen to be an l_{ij} plot). We have taken above all l_{i,i+1} to be a constant equal to l.
‡ In order to conform to the other double-subscript notation being used later, this angle strictly needs to be specified by a double-subscript symbol such as θ_{i,i+1}, the dihedral angle between the planes (C^α_{i-1}–C^α_i–C^α_{i+1}) and (C^α_i–C^α_{i+1}–C^α_{i+2}), or in general θ_{ij}, the dihedral angle (C^α_{i-1}–C^α_i–C^α_j)(C^α_i–C^α_j–C^α_{j+1}). However, we shall not be dealing with θ_{ij}, and hence use the simplified symbol θ_i for θ_{i,i+1}, optionally dropping the subscript i+1.
§ After this work was written up the above reference was brought to our attention by the referee, whom we would like to thank here.
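The definitions of δ_i and θ_i above can be sketched numerically as follows. This is only an illustrative sketch, not the authors' program: the function names are hypothetical, and the test coordinates are an idealized α-helix built from assumed textbook values (Cα radius about 2.28 Å, 100° twist and 1.5 Å rise per residue). For such a helix the two angles should come out close to the α-helix entries of Table 1.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    return math.sqrt(dot(a, a))

def virtual_bond_angle(p_prev, p_mid, p_next):
    """delta_i in degrees: angle at p_mid between the virtual bonds to its neighbours."""
    u, v = sub(p_prev, p_mid), sub(p_next, p_mid)
    c = dot(u, v) / (norm(u) * norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))

def virtual_dihedral(p0, p1, p2, p3):
    """theta_i in degrees (-180..180) for four consecutive C-alpha atoms.

    Positive values correspond to a right-handed twist."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    y = dot(b2, cross(n1, n2)) / norm(b2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def ideal_helix(n, radius=2.28, twist_deg=100.0, rise=1.5):
    """Hypothetical test data: C-alpha positions on an ideal right-handed helix."""
    t = math.radians(twist_deg)
    return [(radius * math.cos(i * t), radius * math.sin(i * t), rise * i)
            for i in range(n)]

ca = ideal_helix(6)
delta = virtual_bond_angle(ca[0], ca[1], ca[2])
theta = virtual_dihedral(ca[0], ca[1], ca[2], ca[3])
print(round(delta, 1), round(theta, 1))
```

Reflecting the coordinates in a mirror plane reverses the sign of θ_i but leaves δ_i unchanged, which is why θ_i can carry the sense of the twist while δ_i cannot.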


Helical axis

The chain folding may be depicted from another angle. Thus, considering a set of four consecutive Cα-atoms in a chain, these may be taken to constitute the smallest segment of a helical arrangement. It is useful to define the direction of the axis of the helix for such a segment. The unit vector for such an axis is denoted by h_i†, which involves the atoms C^α_{i-1}, C^α_i, C^α_{i+1} and C^α_{i+2}. The angle between the axes h_i and h_j is denoted by η_{ij}. However, following the earlier practice, the angle η_{i,i+1} is denoted simply by η_i.

3. Data Representations

(a) Single-parameter representation and chain plot analysis

Several possibilities exist for the choice of a parameter for chain plot analysis. The l_{i,i+3} plot mentioned earlier is one among them. The choice is, however, dictated by the specific aim in view. Here we consider two possibilities for such a plot involving parameters which are in the nature of angles, viz. θ_i and η_i. Since θ_i is in the nature of a dihedral angle it has a full range of −180° to +180°. Also θ_i will characterize the sense of folding. Thus the ranges θ = 0° to 180° and −180° to 0° correspond to positive (or right-helical) twist and negative (or left-helical) twist, respectively. In this respect it furnishes information not obtainable from, say, the chain plot of l_{i,i+3}. Also the range of variation of θ is large compared to that of a distance parameter such as l_{i,i+3}, so that the discriminatory ability may be expected to be greater in characterizing the chain folding. Table 1 gives the θ as well as δ values for some of the standard conformations, and these correspond to the (φ, ψ) values of Ramachandran & Sasisekharan (1968).

TABLE 1

θ values for some standard conformations

Structure                      θ (°)      δ (°)
α-Helix                         50.2       91.1
3_10-Helix                      85.2       83.1
γ-Helix                          7.1      107.4
π-Helix                         29.2      102.8
2.2_7-Helix                    165.0       92.3
2.0_7-Helix                   −179.6       94.3
4.3_14-Helix                   100.5      104.7
ω-Helix                         28.2       94.6
Poly-L-Pro I                  −105.1      125.2
Poly-L-Pro II                 −109.3      120.5
Polyglycine II                −109.2      121.9
Poly-L-hydroxyproline A       −107.6      121.0
Collagen chain I               −96.9      124.3
Collagen chain II              −93.2      121.4
Collagen chain III             −81.0      128.9
Silk                           179.2      131.9

† This unit vector h_i is defined as the cross product (properly normalized) of the two vectors bisecting the angles δ_i and δ_{i+1}. For the case of a perfect helix (i.e. all l_i's equal and all δ_i's equal) this exactly coincides with the axis of the whole helix, on which the four Cα-atoms are arranged. Even in actual cases where the l_i's and δ_i's are unequal, h_i, as defined above, serves as the helical axis of the segment as a first approximation. Details of this will be discussed in a later paper.
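The construction in the footnote above may be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, the order of the two bisectors in the cross product (atom i first, then atom i+1) is an assumption that fixes the sign of h_i, and the test coordinates are an idealized helix along +z.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a):
    n = math.sqrt(dot(a, a))
    if n < 1e-12:
        # Happens for collinear atoms or for parallel bisectors (planar zigzag).
        raise ValueError("degenerate geometry: zero-length vector")
    return tuple(x / n for x in a)

def bisector(p_prev, p_mid, p_next):
    """Unit vector bisecting the virtual bond angle at p_mid."""
    return unit(add(unit(sub(p_prev, p_mid)), unit(sub(p_next, p_mid))))

def helical_axis(p0, p1, p2, p3):
    """h_i for atoms i-1, i, i+1, i+2: normalized cross product of the
    bisectors at atoms i and i+1 (the order is an assumption of this sketch)."""
    return unit(cross(bisector(p0, p1, p2), bisector(p1, p2, p3)))

def axis_angle(h_a, h_b):
    """eta in degrees (0..180) between two unit helical axes."""
    return math.degrees(math.acos(max(-1.0, min(1.0, dot(h_a, h_b)))))

# Hypothetical test data: ideal right-handed helix along +z.
t = math.radians(100.0)
helix = [(2.28 * math.cos(i * t), 2.28 * math.sin(i * t), 1.5 * i) for i in range(8)]

axes = [helical_axis(*helix[k:k + 4]) for k in range(len(helix) - 3)]
etas = [axis_angle(axes[k], axes[k + 1]) for k in range(len(axes) - 1)]
print(axes[0], max(etas))
```

For the ideal helix every bisector is perpendicular to the helix axis, so each h_i lies along the axis and successive η_i are essentially zero. For a nearly planar zigzag (θ close to 180°) the two bisectors become almost antiparallel and the cross product is ill-conditioned, so the sketch raises an error in the exactly degenerate case.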


The parameter η_i is somewhat similar. It has theoretically a smaller range, namely 0° to 180°, and represents the angle between helical axes of successive segments. Both θ_i and η_i may be expected to have characteristic values for α-helical regions. The use of the virtual bond angle δ_i for chain plot analysis does not appear to be promising.

(b) Two-parameter representation

The (φ, ψ) diagram and the l_{ij} plot are two examples of two-parameter representations† in two dimensions. It is obvious that one can have more such representations, particularly of the latter category. Thus, for instance, one can plot η_{ij} as a function of i and j, and this would map the relative orientation of the helical axes h_i and h_j.

(c) Use of stereographic projection

An entirely different type of two-dimensional representation is also possible and this involves the use of stereographic projection, well known to crystallographers and mineralogists. In crystallography one uses the stereographic projection especially for studying the interfacial angles in crystals (see, for example, Phillips, 1954; Cullity, 1967; Azaroff, 1968). Briefly the method is as follows: imagine an idealized crystal to be at the centre of an imaginary sphere of unit radius. The normals to the various crystal faces are constructed and these directions, passing through the origin, intersect the surface of the sphere at points which yield the spherical projection. An arbitrary polar axis is chosen (conventionally the N–S direction) which may or may not coincide with an important axis of the crystal. The points on the sphere are joined to either of the two poles‡ and the intersections of these lines with the diametral plane (i.e. the plane normal to the polar axis) yield the stereographic projection. For convenience the points corresponding to the two hemispheres separated by the diametral plane are distinguished in the projection by different symbols.
The advantages of the stereographic projection are that, firstly, the angular relations are preserved in the projection, and secondly, any circle on the sphere projects as a circle on the stereogram. Using standard charts, such as the Wulff net, the angle between any two normals can readily be obtained by the standard procedure of measuring the distance along the great circle between the points on the stereogram. One may also use other types of projections, such as the orthographic projection, which is similar to the stereographic projection except that the pole is taken at infinity on the polar axis instead of on the sphere. This projection has the disadvantage that a relatively high density of points results near the periphery of the projection circle.

† It seems preferable to use the term two-parameter representation since it is more specific and corresponds to representation of the values of some quantity such as, say, the energy as a function of the twin parameters φ, ψ in the case of the (φ, ψ) plot, and the distance as a function of the residue numbers i, j in the distance map. Although the term two-dimensional representation may also be used, its use appears to be more appropriate for other cases such as the stereographic and other projections being discussed in later sections.
‡ The points on the northern hemisphere are joined to the south pole and those on the southern hemisphere are joined to the north pole, so that all the lines intersect the diametral plane within the sphere.

In applying these techniques to our present analysis we may use some conveniently chosen directions in the protein molecule to replace the face-normals for crystals. The directions specified by the vector h_i would appear to be a good choice, since


they represent the axes of helical segments and one may expect a concentration of points on the stereogram corresponding to, say, a good α-helix. Although one may use standard charts such as the Wulff net for analysing these stereograms, if a computer is available they may be readily handled, and the latter has the advantage that the scale of the unit circle may be chosen at will. In our present studies we have developed computer programs for these calculations.
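The projection described in section (c) can be sketched in a few lines. The sketch below is not the authors' program; it assumes that the polar axis is the z-axis of the co-ordinate system, that directions lying exactly on the diametral plane are counted with the upper hemisphere, and the function name is hypothetical. Joining an upper-hemisphere point to the south pole (or a lower-hemisphere point to the north pole) and intersecting with the plane z = 0 gives the scale factor 1/(1 + |z|).

```python
import math

def stereographic(h):
    """Project a direction h onto the diametral plane z = 0 of the unit sphere.

    Returns (X, Y, hemisphere). Upper-hemisphere points are joined to the
    south pole and lower-hemisphere points to the north pole, so every
    projected point lies inside the unit circle."""
    n = math.sqrt(sum(c * c for c in h))
    if n == 0.0:
        raise ValueError("zero-length direction")
    x, y, z = (c / n for c in h)
    scale = 1.0 / (1.0 + abs(z))
    hemisphere = "upper" if z >= 0.0 else "lower"  # equator counted as upper (assumption)
    return (x * scale, y * scale, hemisphere)

# A direction 60 degrees away from the +z pole projects at radius tan(30 degrees).
tilted = (math.sin(math.radians(60.0)), 0.0, math.cos(math.radians(60.0)))
print(stereographic(tilted))
```

A direction at polar angle α from the nearer pole lands at radius tan(α/2) from the centre, which is the relation that the standard nets encode graphically.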

4. Application to Actual Data

The results of applying the different methods of analysis to actual data of protein crystals are summarized below with illustrative examples. The chain plot of θ for ribonuclease S (Wyckoff et al., 1970) is shown in Figure 2. It may be seen that the α-helical regions are characterized by θ values of about +50°. β-Structure has values close to 180°. The chain plot of η is shown in Figure 3 for ribonuclease S. For convenience of ready reference, the schematic diagram of the secondary structure of ribonuclease S, taken from Wyckoff et al. (1970), is given in Figure 4. The helical regions have η values close to zero (Fig. 3).

A representative η_{ij} map for myoglobin (Watson, 1969) is shown in Plate I. In contrast to the distance map (Ooi & Nishikawa, 1973), the η_{ij} map has characteristic triangular regions of approximately constant value along the diagonal and these could be identified with the α-helices. Also, in the non-diagonal regions, dark areas denote directions within a cone of 60°. Thus, for example, the B and F helices point in directions within 60° of each other, and similarly the G and H helices point almost in opposite directions (blank region in Plate I). These also become obvious from Figure 5 below.
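The η_{ij} map and its three-level shading can be sketched as below. This is an illustration only: the function names are hypothetical, the four axis vectors are made-up test directions rather than myoglobin data, and the assignment of the boundary values 60° and 120° to the upper class is an assumption, since the plate legend gives only the ranges 0° to 60°, 60° to 120° and 120° to 180°.

```python
import math

def axis_angle(h_a, h_b):
    """Angle eta_ij in degrees (0..180) between two axis vectors."""
    dot = sum(a * b for a, b in zip(h_a, h_b))
    na = math.sqrt(sum(a * a for a in h_a))
    nb = math.sqrt(sum(b * b for b in h_b))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / (na * nb)))))

def eta_map(axes):
    """Square matrix of eta_ij for every pair of helical axes."""
    return [[axis_angle(a, b) for b in axes] for a in axes]

def shade(eta):
    """Three-level shading following the ranges quoted for Plate I."""
    if eta < 60.0:
        return "dark"
    if eta < 120.0:
        return "grey"
    return "white"

# Made-up axes: two parallel, one perpendicular, one antiparallel to the first.
axes = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 0.0, -1.0)]
shades = [[shade(e) for e in row] for row in eta_map(axes)]
for row in shades:
    print(row)
```

Runs of residues whose axes are mutually parallel produce dark blocks along the diagonal, while pairs of antiparallel helices show up as white off-diagonal blocks.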

Fig. 2. Chain plot of θ_i for ribonuclease S (abscissa: residue number). For convenience every tenth residue is marked by a circle round the point.

Fig. 3. Chain plot of η_i for ribonuclease S (abscissa: residue number). For convenience every tenth residue is marked by a circle round the point.

PLATE I. η_{ij}-plot for myoglobin (both axes: residue number). The various helical segments A, B etc. are marked along the diagonal. Dark, grey and white regions correspond, respectively, to η ranges of 0° to 60°, 60° to 120° and 120° to 180°.


Fig. 4. Schematic diagram of the secondary structure of ribonuclease S.

Figure 5 shows the stereographic projection for myoglobin with the crystallographic z-direction as the polar axis, with +z pointing upwards from the plane of the paper. Although the complete sequence of the chain is difficult to trace, one can see regions of concentration of points, which correspond to α-helices. It is clear that if the helix were an ideal one, all the points corresponding to the various residues in a given helix would occur at the same point on the stereogram. In actual practice one might see only a scatter of these points about the one corresponding to the mean axis. Figure 5 also shows the relative orientation of the helical axes. The G helix may be seen to have its mean axis pointing in the positive direction (marked by ×) and the H helix in the negative direction (marked △). The relative orientation of two α-helical cylinders can be readily obtained from the stereogram, which is an elegant property of such a representation. (For obtaining the angle between two points in a stereogram, see section 3(c) above on the use of stereographic projection.)

Fig. 5. Stereographic projection of helical axes for myoglobin with +z pointing upwards. Helical regions A, B, C etc. are marked near the respective segments. For the explanation of × and △, see text.

As a part of the present analysis, the statistical distribution of some of the parameters as observed in protein crystals has also been studied. Figure 6 shows the P(θ) distribution for carboxypeptidase A (Lipscomb et al., 1970) and α-chymotrypsin (Birktoft et al., 1969). For the former, a high content of both α and β conformations is obvious (see Table 1). In the latter the rather low α-helical content is reflected.

Fig. 6. Observed frequency distribution of θ for carboxypeptidase A and α-chymotrypsin (one panel per protein; abscissa: θ from −180° to +180°).
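A frequency distribution of the kind shown in Figure 6 can be tabulated as in the sketch below. The bin width of 30° is an assumption (the figure is ticked every 30°, but the bin width actually used is not stated), the function name is hypothetical, and the sample angles are invented for illustration and are not taken from either protein.

```python
def theta_histogram(thetas, width=30):
    """Count theta values (degrees, -180..180) in bins of the given width.

    Bin k covers [-180 + k*width, -180 + (k+1)*width); the value +180
    itself is placed in the last bin."""
    if 360 % width != 0:
        raise ValueError("width must divide 360")
    nbins = 360 // width
    counts = [0] * nbins
    for t in thetas:
        if not -180.0 <= t <= 180.0:
            raise ValueError("theta out of range: %r" % (t,))
        k = min(int((t + 180.0) // width), nbins - 1)
        counts[k] += 1
    return counts

# Invented sample: three helix-like values near +50 and three extended-like values.
sample = [50.0, 52.0, 48.0, 179.0, -179.0, 180.0]
counts = theta_histogram(sample)
print(counts)
```

Because θ = +180° and θ = −180° describe the same conformation, extended (β-like) segments populate both ends of such a histogram, whereas α-helical segments pile up in the bin containing +50°.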

REFERENCES

Azaroff, L. V. (1968). In Elements of X-ray Crystallography, pp. 23-38, McGraw-Hill, Canada.
Birktoft, J. J., Matthews, B. W. & Blow, D. M. (1969). Biochem. Biophys. Res. Commun. 36, 131-137.
Brant, D. A. & Flory, P. J. (1965). J. Amer. Chem. Soc. 87, 2788-2791.
Burgess, A. W., Ponnuswamy, P. K. & Scheraga, H. A. (1974). Israel J. Chem. 12, 239-286.
Cullity, B. D. (1967). Elements of X-ray Diffraction, pp. 60-75, Addison-Wesley, London.
IUPAC-IUB Commission on Biochemical Nomenclature (1970). Biochemistry, 9, 3471-3479.
Levitt, M. & Warshel, A. (1975). Nature (London), 253, 694-698.
Lewis, P. N., Momany, F. A. & Scheraga, H. A. (1973). Biochim. Biophys. Acta, 303, 211-229.
Lipscomb, W. N., Reeke Jun., G. N., Hartsuck, J. A., Quiocho, F. A. & Bethge, P. H. (1970). Phil. Trans. Roy. Soc. ser. B, 257, 177-214.
Nagano, K. (1974). J. Mol. Biol. 84, 337-372.
Ooi, T. & Nishikawa, K. (1973). In The Jerusalem Symposia on Quantum Chemistry and Biochemistry (Bergmann, E. D. & Pullman, B., eds), no. 5, pp. 173-187, Academic Press, London.
Pauling, L. & Corey, R. B. (1951). Proc. Nat. Acad. Sci., U.S.A. 37, 235-240.
Phillips, D. C. (1970). In British Biochemistry, Past and Present (Goodwin, T. W., ed.), pp. 11-28, Academic Press, London.
Phillips, F. C. (1954). The Use of Stereographic Projection in Structural Geology, Edward Arnold, London.
Ramachandran, G. N. & Sasisekharan, V. (1968). Advan. Protein Chem. 23, 323.
Rossmann, M. G. & Liljas, A. (1974). J. Mol. Biol. 85, 177-181.
Watson, H. C. (1969). Progress in Stereochemistry, 4, 229-333.
Wyckoff, H. W., Tsernoglou, D., Hanson, A. W., Knox, J. R., Lee, B. & Richards, F. M. (1970). J. Biol. Chem. 245, 305-328.
