Int. J. Peptide Protein Res. 13, 1979, 320-326. Published by Munksgaard, Copenhagen, Denmark. No part may be reproduced by any process without written permission from the author(s).

DISTANCE CONSTRAINTS ON MACROMOLECULAR CONFORMATION I: The Effectiveness of the Experimental Studies on Tobacco Mosaic Virus Protein

GORDON M. CRIPPEN

Department of Pharmaceutical Chemistry, School of Pharmacy, University of California, San Francisco, California, U.S.A.

Received 5 July, accepted for publication 21 July 1978

Many physico-chemical studies are made on proteins to determine something of their solution conformation. For example, the coat protein of Tobacco Mosaic Virus has been subjected to more non-crystallographic experimental studies to determine its native conformation than perhaps any other protein. Yet the sum of the experimentally determined constraints on its tertiary structure is surprisingly inadequate to fix its conformation. We are able to detect and remove minor inconsistencies in the data and then calculate a sampling of conformations consistent with all the data, which differ among themselves by r.m.s. deviations of the respective interresidue distances ranging from 5.7 Å to 15.8 Å. Some individual interresidue distances differ by as much as 50 Å from structure to structure. In order to restrict the range of possible conformations to something corresponding to the errors in a 10 Å resolution X-ray crystal structure, chemical and spectroscopic studies will have to be much more detailed than anything done to date. Our calculations appear to be useful in deciding which further experiments would be most productive.

Key words: distance geometry; protein solution conformation; protein tertiary conformation; structure determination; Tobacco Mosaic Virus.

There are innumerable accounts in the literature of experimental studies attempting to determine the conformation of various proteins in solution by the many spectroscopic techniques available or by chemical modification methods. It is very desirable to make such structural determinations, but our feeling has been that the experimental data are easily interpreted as determining the conformation far more completely than they actually do. Instead of proposing a single conformation that fits the data, it would be more proper to examine the range of allowed conformations. As a test of this hypothesis, we have examined the structural data as of 1975 for Tobacco Mosaic Virus coat protein and have calculated five different, randomly chosen conformations consistent with all constraints, and then compared these with each other. TMV protein was selected as our test case because an extraordinarily great deal of work has been done on the determination of its conformation by a wide range of techniques, as has been neatly summarized by Durham & Butler (1975). The evidence comes from X-ray studies on oriented gels of the virus (yielding radial distances for some residues), electron microscopy, hydrogen exchange, immunochemistry, various kinds of spectroscopy, and determination of accessibility to attacking reagents. More recently, 4 Å resolution X-ray diffraction studies on TMV fibers (Stubbs et al., 1977) and a 5 Å crystal structure (Champness et al., 1976) have shown much more of TMV protein structure, but it is not our purpose in this paper to predict the native conformation. Instead we simply show how much a given data set, that of Durham & Butler, tells us about the conformation.

METHODS

Since our “distance geometry” algorithms (for calculating molecular conformations consistent with given constraints) are a novel recent development (Crippen, 1977, 1978; Crippen & Havel, 1978) we briefly summarize the approach here, as applied to the TMV protein problem at hand. The reader is referred to the above references for more detail.

Step 1: The molecule is represented by a collection of $n$ points. For high resolution work, each point would correspond to an atomic center, but for the purposes of this study, one point located at the Cα of every other amino acid residue gives sufficient detail. Each TMV protein subunit consists of a single polypeptide chain of 158 residues, so we used 79 residue points to represent the chain. Using one point for each residue would have been eight times as time consuming for some parts of the calculation without adding anything to the conclusions we reach in this paper concerning the gross conformational freedom of the chain.

Step 2: The geometric constraints are expressed as upper and lower bounds on some of the interpoint distances. The task is to calculate a set of three dimensional Cartesian coordinates $v_i = (x_i, y_i, z_i)$, $i = 1, \ldots, n$, for $n$ points such that

$$ l_{ij} \le \| v_i - v_j \| \le u_{ij} \quad \text{for } i, j = 1, \ldots, n \tag{1} $$

where $\| \cdot \|$ denotes the ordinary Euclidean distance between points:

$$ d_{ij} = \left[ (x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2 \right]^{1/2} \tag{2} $$

Thus we must determine an entire $n \times n$ matrix of upper bounds $U = (u_{ij})$ and lower bounds $L = (l_{ij})$. Table 1 summarizes the data of Durham & Butler on the left, and the corresponding upper and lower bounds on the distances are shown on the right. Note that only experimental structural constraints have been included in this study, and theoretical constraints, particularly secondary structure predictions, have been excluded. We have several reasons for this: (1) The purpose of this investigation is to show how much a great body of experimental data limits the range of possible conformations of a protein. The effect of theoretical constraints is considered in another paper of this series (Havel et al., 1978). (2) The distance geometry approach makes use of bounds on the distances between labelled points, so that knowing the percent helix from CD studies, for example, is not useful because it is not known which residues are helical, but only how many. (3) Even assuming that a prediction of helical segments was 100% correct, we find from another series of calculations (Havel et al., 1978) that a full knowledge of secondary structure usually adds very little to the determination of the gross tertiary structure.

Part of the constraints is that each protein subunit is known to occupy a sector-shaped wedge of a certain size so that the subunits form the hollow-cored, cylindrical virus particle (see Fig. 1).

FIGURE 1
Plan view of the TMV protein wedge (trapezoid in heavy lines) showing the placement of outrigger points 80-83 to scale. Arcs indicate the upper and lower bound distances allowed between points of the polypeptide chain and the various outrigger points. Point 83 is 200 Å out of plane.
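The condition of eqn. 1 can be made concrete with a short sketch. The Python below is only an illustration, not the author's program: the function names `distance` and `violations` and the three-point toy data are assumptions introduced here. It evaluates eqn. 2 for every pair of points and lists the pairs whose distance falls outside its bounds.

```python
import math

def distance(p, q):
    """Ordinary Euclidean distance between two 3-D points (eqn. 2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def violations(coords, lower, upper, tol=1e-9):
    """Return (i, j, d_ij) for every pair that breaks l_ij <= d_ij <= u_ij (eqn. 1)."""
    bad = []
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            d = distance(coords[i], coords[j])
            if d < lower[i][j] - tol or d > upper[i][j] + tol:
                bad.append((i, j, d))
    return bad

# Hypothetical toy data: three chain points, neighbours bounded by 6.0 and
# 7.3 (the chain bounds of Table 1), end points bounded by 6.0 and 12.0.
coords = [(0.0, 0.0, 0.0), (7.0, 0.0, 0.0), (14.0, 0.0, 0.0)]
lower = [[0.0, 6.0, 6.0], [6.0, 0.0, 6.0], [6.0, 6.0, 0.0]]
upper = [[0.0, 7.3, 12.0], [7.3, 0.0, 7.3], [12.0, 7.3, 0.0]]

print(violations(coords, lower, upper))  # [(0, 2, 14.0)]
```

In this toy case only the end-to-end distance (14.0) exceeds its upper bound (12.0); a conformation is acceptable when the returned list is empty.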

TABLE 1
Geometric constraints on TMV protein used by Durham & Butler (1975) and the corresponding bounds on interpoint distances used in distance geometry calculations

Experimental constraints                    Interpoint distance bounds (a)
                                            i     j                              u_ij (Å)    l_ij (Å)

Radial distance of residue
  1 is 90 Å                                 80    1                              90          90
  27 is 57 Å                                80    14                             57          57
  64 between 80 and 90 Å                    80    32                             90          80
  66 between 80 and 90 Å                    80    33                             90          80
  68 is 70 Å                                80    34                             75          65b
  90 is 39 Å                                80    45                             44          34b
  92 is 39 Å                                80    46                             44          34b
  95 between 20 and 30 Å                    80    48                             30          20
  97 between 20 and 30 Å                    80    49                             30          20
  113 is 39 Å                               80    57                             44          34b
  115 is 25 Å, 116 is 25 Å                  80    58                             30          20b
  131 is 84 Å                               80    66                             89          79b
  139 is 72 Å                               80    70                             72          72b
  145 is 84 Å                               80    73                             84          84
  158 is 90 Å                               80    79                             90          90
  all others between 20 and 90 Å            80    other                          90          20

158 residues in single polypeptide chain    i     i+1, for i = 1, ..., 78        7.3         6.0
The chain is self avoiding                  i     other j, for i, j = 1, ..., 79 500         6.0

Chain within 22° wedge, 25 Å thick          80    81                             188.272     188.272
                                            80    82                             188.272     188.272
                                            80    83                             205.760     205.760
                                            81    82                             332.513     332.513
                                            81    83                             263.137     263.137
                                            82    83                             263.137     263.137
                                            81    1, ..., 79                     183.429     149.084
                                            82    1, ..., 79                     183.429     149.084
                                            83    1, ..., 79                     225.        200.

a There are only 79 points corresponding to the 158 residues of the protein; i and j refer to the point numbering scheme; u_ij is the upper bound distance allowed between points i and j, and l_ij is the lower bound distance.
b Upper and lower distance bounds had to be relaxed ±5 Å to relieve inconsistencies in experimental data.
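Footnote b can be illustrated with a simple necessary-condition check on the radial data alone. The sketch below is an assumption-laden illustration, not the procedure used in the paper: it maps residue numbers to point numbers as `(residue + 1) // 2`, which reproduces the j column of Table 1, and flags every pair of points whose radial intervals lie farther apart than the chain could span at 7.3 Å per step. The names `RADIAL`, `point_of`, and `radial_conflicts` are hypothetical.

```python
# Radial data from the left side of Table 1: residue -> (min, max) distance
# from the cylinder axis in Angstroms, before any relaxation.
RADIAL = {
    1: (90, 90), 27: (57, 57), 64: (80, 90), 66: (80, 90), 68: (70, 70),
    90: (39, 39), 92: (39, 39), 95: (20, 30), 97: (20, 30), 113: (39, 39),
    115: (25, 25), 116: (25, 25), 131: (84, 84), 139: (72, 72),
    145: (84, 84), 158: (90, 90),
}
MAX_STEP = 7.3  # upper bound between consecutive points (Table 1)

def point_of(residue):
    """Assumed mapping from residue number to point number (two residues per point)."""
    return (residue + 1) // 2

def radial_conflicts(radial, max_step=MAX_STEP):
    """Pairs of points whose radial intervals cannot be bridged by the chain."""
    pts = {}
    for residue, (lo, hi) in radial.items():
        p = point_of(residue)
        old_lo, old_hi = pts.get(p, (lo, hi))
        pts[p] = (max(lo, old_lo), min(hi, old_hi))  # intersect repeated points
    found = []
    ids = sorted(pts)
    for a, i in enumerate(ids):
        for j in ids[a + 1:]:
            (lo_i, hi_i), (lo_j, hi_j) = pts[i], pts[j]
            gap = max(0, lo_i - hi_j, lo_j - hi_i)  # smallest possible radial difference
            if gap > max_step * (j - i):
                found.append((i, j))
    return found

print(radial_conflicts(RADIAL))  # [(33, 34), (57, 58), (58, 66)]
```

With the unrelaxed data the check flags the point pairs (33, 34), (57, 58) and (58, 66); the second corresponds to residues 113 and 115. With the relaxed bounds on the right of Table 1 no pair is flagged. Because only the radial data are examined, the check need not account for every entry marked b.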

In order to represent such constraints in terms of bounds on distances, it was necessary to introduce four “outrigger” points, which do not correspond to the location of any real atom. Outrigger point 80 was taken to be at the cylinder axis of the virus. Distances from #80 to certain residues are known from the experimental data (see Table 1). The upper and lower bounds on the distance from all parts of the chain to point #80 determine the bottom and top, respectively, of the wedge shaped allowed region shown in Fig. 1. Outrigger points 81 and 82 were located to the sides by means of appropriately chosen 80-81, 80-82, and 81-82 distances so that the allowed distances between them and the residues confined the chain to the experimentally determined 22° wedge shape. Point 83 was then placed out of the plane of the previous three outriggers to confine the protein to the desired 25 Å thickness of wedge. (See Durham & Butler (1975) for further description and citations of the original papers which led to this picture.) The resultant arrangement is shown in Fig. 1. We wish to make clear from this example that one can represent experimental structural constraints in terms of distances between points, even if the usual way of expressing them makes no reference to interpoint distances.

Step 3: Even with such a long list of experimental constraints, some elements of the $U$ and $L$ matrices may still be undetermined. The $U$ matrix is completed by initially assigning a very large value to all undetermined elements and then exhaustively applying the triangle inequality

$$ u_{ij} \le u_{ik} + u_{kj} \tag{3} $$

to all triplets of points, $i$, $j$, and $k$. If $u_{ij}$ is larger than the right side of eqn. 3, $u_{ij}$ is set equal to it, and so on until no further alterations can be made on the matrix. This has the effect of, for example, spreading out the consequences of small upper bound distances between two residues to the upper bounds between neighboring residues. Similarly, one must complete the lower bound matrix according to

$$ l_{ij} \ge l_{ik} - u_{jk} \tag{4} $$

for all triplets of points, $i$, $j$, and $k$.

Step 4: Randomly choose a proposed $n \times n$ distance matrix, $D$, such that each entry satisfies the constraints $l_{ij} \le d_{ij} \le u_{ij}$ for all $i$ and $j$. This is the stage that produces a variety of conformations, a different one for each $D$. $D$ obeys all the constraints, but does not necessarily correspond to any realizable configuration of points in three-dimensional space.

Step 5: A sufficient condition that $D$ does correspond to points in three dimensions is that the metric matrix, $G$, corresponding to $D$ has rank three (Crippen & Havel, 1978). $G = (g_{ij})$ simply consists of the dot products of vectors from the origin to the points $i$ and $j$, for all $i$ and $j$. $G$ may be calculated from $D$ by

$$ g_{ij} = \tfrac{1}{2} \left( d_{i0}^2 + d_{j0}^2 - d_{ij}^2 \right) \tag{5} $$

where the origin, 0, is taken to be the center of mass, and the $d_{i0}$ may be calculated (Crippen & Havel, 1978) by

$$ d_{i0}^2 = n^{-1} \sum_{j=1}^{n} d_{ij}^2 - n^{-2} \sum_{j=2}^{n} \sum_{k=1}^{j-1} d_{jk}^2 \tag{6} $$

Basing the calculation of $G$ on the center of mass increases the numerical stability of the algorithm.

Step 6: Calculate the three largest eigenvalues, $\lambda_1$, $\lambda_2$, and $\lambda_3$, of $G$ and their corresponding eigenvectors, $(w_{11}, \ldots, w_{n1})$, $(w_{12}, \ldots, w_{n2})$, and $(w_{13}, \ldots, w_{n3})$. In effect, this determines the metric matrix of rank 3 which is closest to $G$ in a matrix norm sense. Then coordinates may be immediately calculated according to

$$ v_i = \left( \lambda_1^{1/2} w_{i1},\ \lambda_2^{1/2} w_{i2},\ \lambda_3^{1/2} w_{i3} \right), \quad i = 1, \ldots, n \tag{7} $$

These coordinates, however, do not in general satisfy completely the distance constraints of eqn. 1.

Step 7: It is therefore necessary to refine the positions of the points by minimizing an error function $f(v_1, \ldots, v_n)$ (eqn. 8) with respect to the $3n$ Cartesian coordinates; $f$ penalizes every violation of $l_{ij} \le d_{ij} \le u_{ij}$, where the $d_{ij}$ are calculated from the coordinates. The coordinates of Step 6 provide the starting point for the minimization, so convergence is usually rapid. The rate of convergence was substantially improved, however, by fixing the coordinates of the four outrigger points initially at the desired distances from one another given in Table 1, and then making the final iterative improvement of the conformation by moving only the 79 protein points.
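Steps 3 to 5 can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the author's code: the function names, the value `BIG` standing in for the "very large value", the uniform sampling of each distance between its bounds, and the three-point toy bounds are all choices made here. The metric matrix is built from the law-of-cosines relation about the center of mass described in Step 5. The eigenvalue step (Step 6) and the refinement (Step 7) are not covered.

```python
import random

def smooth_bounds(upper, lower):
    """Step 3: apply eqns. 3 and 4 until no bound changes (edits the matrices in place)."""
    n = len(upper)
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                for k in range(n):
                    if upper[i][j] > upper[i][k] + upper[k][j]:      # eqn. 3
                        upper[i][j] = upper[j][i] = upper[i][k] + upper[k][j]
                        changed = True
                    if lower[i][j] < lower[i][k] - upper[j][k]:      # eqn. 4
                        lower[i][j] = lower[j][i] = lower[i][k] - upper[j][k]
                        changed = True
    return upper, lower

def trial_distances(upper, lower, rng):
    """Step 4: pick each d_ij at random between its bounds (uniform sampling assumed)."""
    n = len(upper)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = rng.uniform(lower[i][j], upper[i][j])
    return d

def metric_matrix(d):
    """Step 5: g_ij = (d_i0^2 + d_j0^2 - d_ij^2) / 2, origin at the center of mass."""
    n = len(d)
    pair_sum = sum(d[j][k] ** 2 for j in range(1, n) for k in range(j))
    d0_sq = [sum(d[i][j] ** 2 for j in range(n)) / n - pair_sum / n ** 2
             for i in range(n)]
    return [[0.5 * (d0_sq[i] + d0_sq[j] - d[i][j] ** 2) for j in range(n)]
            for i in range(n)]

# Hypothetical toy bounds for three points; the 0-2 pair starts undetermined.
BIG = 1000.0
upper = [[0.0, 10.0, BIG], [10.0, 0.0, 4.0], [BIG, 4.0, 0.0]]
lower = [[0.0, 9.0, 0.0], [9.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

smooth_bounds(upper, lower)
print(upper[0][2], lower[0][2])  # 14.0 5.0

d = trial_distances(upper, lower, random.Random(0))
g = metric_matrix(d)
```

In the toy case the undetermined 0-2 pair inherits an upper bound of 10 + 4 = 14 from eqn. 3 and a lower bound of 9 - 4 = 5 from eqn. 4. A lower bound that ends up above its upper bound after smoothing would signal mutually contradictory input constraints.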

RESULTS

Our first discovery was that if one takes the residue radial positions given by Durham & Butler (1975) to be literally the distances from the Cα of the residue to the cylinder axis (point 80), then there are some contradictions in the data. The problem is not serious, since the quoted radial distances refer to locations of heavy metal ions which are bound to certain amino acid residue side chains, which are in turn a few Angstroms away from the corresponding alpha carbons. Thus we see from Table 1 that residue 113 is supposed to be 39 Å from the cylinder axis, while residue 115 must be only 25 Å away, yet the distance from 115 to 113 could not possibly be greater than 2(3.8 Å) = 7.6 Å < 14 Å = 39 Å - 25 Å. All such inconsistencies have been marked with a “b” in Table 1 and have been relieved by relaxing the constraints by merely ±5 Å, as shown in the right-hand side of the table. The experimental inconsistencies themselves are the trivial result of an overliteral interpretation of good data, but they were not obvious amid the long list of distance constraints used. One could imagine cases where the errors would be much more subtle and where they stem from genuine error. Therefore, it is noteworthy that the structure generating algorithm is capable of automatically detecting geometric contradictions in the data given it by failing to converge to an allowed conformation and then showing which interpoint distance constraints could not be satisfied. These distances involve primarily only a few points, which are connected by the mutually contradictory constraints. Hence it is usually easy to trace the errors back to their source and remedy them.

We sampled the range of conformations allowed by the now self-consistent constraints by calculating coordinates for five otherwise random conformations. Of course they are not completely random since they all satisfy the experimental data. We do not claim that five conformations is an exhaustive exploration of the range of possibilities, but it is an adequate sample for the point we need to make. All five structure generations were successful, having no distance between points in error by more than a few tenths of an Angstrom. For example, a computer graphics display of the second conformation is shown in Fig. 2. An average of 13 seconds of computer time per structure was required on a CDC 7600.

FIGURE 2
Computer graphics display of calculated conformation #2 in a plan view similar to that of Fig. 1.

Table 2 shows the r.m.s. interpoint distance deviation, calculated for each pair of conformations. If the coordinates of conformation A are designated by vectors $a_i$, $i = 1, \ldots, 83$, and those of conformation B are given by $b_i$, $i = 1, \ldots, 83$, then $\Delta(A, B)$ is calculated as the r.m.s. difference between the corresponding interpoint distances $\| a_i - a_j \|$ and $\| b_i - b_j \|$ (eqn. 9).
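As a concrete reading of this measure, the sketch below computes the r.m.s. difference between corresponding interpoint distances of two conformations, together with the largest single difference (the quantity given in parentheses in Table 2). It is an illustration only: the function name `distance_deviation`, the averaging over the n(n - 1)/2 distinct pairs, and the three-point toy coordinates are assumptions made here, not details taken from the paper.

```python
import math

def distance_deviation(conf_a, conf_b):
    """Return (rms, largest) difference between corresponding interpoint distances."""
    n = len(conf_a)
    diffs = []
    for i in range(n):
        for j in range(i + 1, n):
            d_a = math.dist(conf_a[i], conf_a[j])  # distance i-j in conformation A
            d_b = math.dist(conf_b[i], conf_b[j])  # same pair in conformation B
            diffs.append(abs(d_a - d_b))
    rms = math.sqrt(sum(x * x for x in diffs) / len(diffs))
    return rms, max(diffs)

# Hypothetical toy conformations: a straight three-point chain and a bent one.
straight = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
bent = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]

rms, largest = distance_deviation(straight, bent)
print(round(rms, 4), largest)  # 0.8165 1.0
```

Because only interpoint distances enter the measure, the two structures do not need to be superimposed before they are compared.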

TABLE 2
The r.m.s. interpoint distance deviation Δ (in Å) between all pairs of the five calculated conformations of TMV protein. The values in parentheses are the largest single differences in corresponding distances

Conformation    1    2                3                4                5
1               0    12.292 (42.7)    13.986 (56.4)    14.936 (56.0)    15.850 (53.2)
2                    0                7.818 (29.7)     8.175 (36.56)    10.696 (50.6)
3                                     0                5.709 (30.3)     8.468 (35.3)
4                                                      0                6.830 (24.4)
5                                                                       0
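The summary figures quoted in the abstract and in the text can be read off Table 2 directly. The snippet below simply transcribes the ten off-diagonal entries, assuming the upper-triangular layout of the table, and reports the smallest and largest r.m.s. deviation and the largest single distance difference. The dictionary name `TABLE2` is an arbitrary choice.

```python
# (conformation pair) -> (r.m.s. deviation, largest single difference), in Angstroms
TABLE2 = {
    (1, 2): (12.292, 42.7), (1, 3): (13.986, 56.4), (1, 4): (14.936, 56.0),
    (1, 5): (15.850, 53.2), (2, 3): (7.818, 29.7), (2, 4): (8.175, 36.56),
    (2, 5): (10.696, 50.6), (3, 4): (5.709, 30.3), (3, 5): (8.468, 35.3),
    (4, 5): (6.830, 24.4),
}

rms_values = [rms for rms, _ in TABLE2.values()]
closest = min(TABLE2, key=lambda pair: TABLE2[pair][0])    # most similar pair
farthest = max(TABLE2, key=lambda pair: TABLE2[pair][0])   # least similar pair
largest_single = max(single for _, single in TABLE2.values())

print(closest, min(rms_values))    # (3, 4) 5.709
print(farthest, max(rms_values))   # (1, 5) 15.85
print(largest_single)              # 56.4
```

The extremes, about 5.7 Å and 15.8 Å, are the range of r.m.s. deviations quoted in the abstract.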

Eqn. 9 is just the r.m.s. deviation between two distance matrices, and has been used as a rough measure of similarity of conformation in previous work (Levitt & Warshel, 1975; Kuntz et al., 1976, 1978). The deviations are clearly very large in spite of the long list of distance constraints. Each conformation is well separated from the others, and some individual distances differ enormously from one conformation to another, by as much as 56 Å, although the whole molecule has dimensions only of the order 70 x 34 x 25 Å. Of course since all the structures must meet the constraints, certain parts of the chain will always be at specified radial distances, for instance, while the intermediate chain segments will have more freedom. This is shown in Fig. 3 as a plot of (the radial) distance from (the axial) point #80 for the first 79 points (representing the chain), superimposing the curve for conformation #1 on the curve for conformation #2. There are otherwise no apparent trends as to which interpoint distances are responsible for the r.m.s. deviations.

It is interesting to compare the five calculated conformations with the X-ray diffraction structures (Champness et al., 1976; Stubbs et al., 1977). Unfortunately the comparison must be strictly qualitative because the published experimental results are still at rather low resolutions (5 and 4 Å, respectively), and atomic coordinates are not yet available. The most unambiguous feature found in both the virus and the disk studies is a bundle of four helices, all roughly parallel to each other, running radially outward from the cylinder axis. In the calculations, it would be possible to form the equivalent of α-helices, but there were no explicit constraints to do so. In none of the five calculated structures are there any features resembling helices, but rather only extended strands and coil segments. Thus the experimental evidence in Table 1 is insufficient to produce helices in the calculations. Further, only two out of the five calculated conformations show four strands of any sort running radially. One such example is shown in Fig. 2. It is immediately clear from Table 1 that the N- and C-termini are constrained to be far from the axis, while the middle (residue 95, for instance) must be close, so that necessarily there will be at least two strands running radially. The calculated structure shown in Fig. 2 proves that the data allow four strands, but this conformational feature is not a necessary consequence. We otherwise calculate a variety of foldings of the chain at low and high radius, but because these parts of the electron density maps are somewhat unclear in one or the other of the experimental studies, it is not worthwhile to compare our results to theirs.

FIGURE 3
Plot of distance, d, of the 79 alternate residue points (n = 1, ..., 79) from outrigger point #80 vs. point number, n. Curve for conformation #1 superimposed on curve for conformation #2.
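The profile plotted in Fig. 3 is simply the distance of each chain point from outrigger point 80. A minimal sketch, assuming each conformation is held as a list of (x, y, z) tuples; the function names and the toy coordinates are hypothetical and are not data from the paper. The second function gives the point-by-point difference between two profiles, one simple way to see where two conformations disagree in radius.

```python
import math

def radial_profile(chain, axis_point):
    """Distance of every chain point from the axial outrigger point."""
    return [math.dist(p, axis_point) for p in chain]

def profile_difference(profile_a, profile_b):
    """Absolute difference in radial distance, point by point."""
    return [abs(a - b) for a, b in zip(profile_a, profile_b)]

# Hypothetical three-point chains with the axial point at the origin.
axis = (0.0, 0.0, 0.0)
conf_1 = [(90.0, 0.0, 0.0), (60.0, 0.0, 0.0), (0.0, 25.0, 0.0)]
conf_2 = [(0.0, 90.0, 0.0), (30.0, 40.0, 0.0), (25.0, 0.0, 0.0)]

p1 = radial_profile(conf_1, axis)
p2 = radial_profile(conf_2, axis)
print(p1)                          # [90.0, 60.0, 25.0]
print(p2)                          # [90.0, 50.0, 25.0]
print(profile_difference(p1, p2))  # [0.0, 10.0, 0.0]
```

In this toy case the two end points sit at the same radius in both conformations and only the middle point differs, which mimics the situation of constrained residues separated by a freer chain segment.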

DISCUSSION

One sobering conclusion from this work is that even the extensive chemical and physical studies on TMV protein cited by Durham & Butler (1975) were insufficient to narrow the range of possible conformations down to something that would be as well defined as a 10 Å resolution X-ray crystallographic structure. It is clear that far more extensive experimentation is required for useful conformational results than has been generally realized.

The encouraging result of this study is that we have demonstrated the ability of our computer algorithms (Crippen, 1977, 1978; Crippen & Havel, 1978) to generate conformations consistent with a great body of experimental data, and even to detect contradictions in the given data. Clearly this is of considerable importance for conformational studies, where devising a structure to fit the results is often difficult. The distance geometry formalism is capable of making use of data from many different sources. For instance in this study, cylindrically averaged X-ray information has been combined with the outcome of tyrosine iodination experiments (Durham & Butler, 1975), to name only two. As long as an experimental result can be expressed as a constraint on the allowed distance between certain atoms, groups of atoms, or reference points, this method can make use of the data. Discovery of close spatial proximity (by crosslinking reagents, for instance) seems to be the most powerful sort of distance constraint, but even knowing that two groups must be far apart can be useful. Furthermore, we are able to see, as in Fig. 3, what parts of the molecule still have poorly defined conformation, as a guide to planning future experiments. For example, determining the radial position of point 20 would do more toward establishing the conformation than finding that of point 78. We intend to look into using such calculations to plan series of experiments so as to achieve maximum return on the definition of solution conformation for a given investment of labor.

ACKNOWLEDGMENT

I would like to thank Dr. I. D. Kuntz for his helpful and encouraging discussions and Dr. R. Langridge for the use of the computer graphics system (Grant number NIH RR 1081). This work was supported by a grant from the Academic Senate of the University of California.

REFERENCES

Champness, J. N., Bloomer, A. C., Bricogne, G., Butler, P. J. G. & Klug, A. (1976) Nature 259, 20-24
Crippen, G. M. (1977) J. Comp. Physics 24, 96-107
Crippen, G. M. (1978) J. Comp. Physics 26, 449-452
Crippen, G. M. & Havel, T. F. (1978) Acta Cryst. A 34, 282-284
Durham, A. C. H. & Butler, P. J. G. (1975) European J. Biochem. 53, 397-404
Havel, T. F., Crippen, G. M. & Kuntz, I. D. (1978) Biopolymers, in press
Levitt, M. & Warshel, A. (1975) Nature 253, 694-698
Kuntz, I. D., Crippen, G. M., Kollman, P. A. & Kimelman, D. (1976) J. Mol. Biol. 106, 983-994
Kuntz, I. D., Crippen, G. M. & Kollman, P. A. (1978) Biopolymers, in press
Stubbs, G., Warren, S. & Holmes, K. (1977) Nature 267, 216-221

Address:
Gordon M. Crippen
Department of Pharmaceutical Chemistry
School of Pharmacy
University of California
San Francisco, CA 94143
USA
