
Structural Rules for Globular Proteins

By Georg E. Schulz[*]

Is it possible to reach a detailed understanding of the complex three-dimensional structures of native polypeptide chains? In view of the wealth of common physicochemical and phylogenetic features discovered among proteins this question has become reasonable. The current state of discussion is presented in this report.

1. Introduction

The structure of a globular protein was first elucidated 16 years ago [1]. Meanwhile about 40 proteins have been analyzed at atomic resolution, and the polypeptide chain fold is known in about ten more cases. Thus the pioneering era has undoubtedly come to an end. Protein structure models have become everyday tools of the biochemist; with their aid he can design and interpret his experiments much more soundly than ever before.

The wealth of structural data available not only favors biochemical aspects, but also challenges us to understand the architecture of the observed structures and thus to trace biology back to its physical roots on a much more fundamental scale than was previously possible. Given an intimate knowledge of their architecture, a substantial increase in the number of analyzable proteins can be predicted, because proteins that do not crystallize also become accessible. Furthermore, the study of historical relationships, and especially very early stages of evolution, can be put on a firmer basis. Therefore, efforts invested in this field may be expected to yield considerable rewards.

2. Energy Balance

Globular proteins consist of linear polypeptide chains (Fig. 1), which are synthesized from 20 different amino acids [4]. During or after synthesis these chains fold spontaneously to an exact three-dimensional structure. The spontaneity of folding is regarded as a generally valid principle, because it could be demonstrated in renaturing experiments with several proteins [5-7]. Thus all information pertaining to three-dimensional structure is already implicit in the chemical structure of the chain, i.e. in the amino acid sequence. Folding is merely a transition to an energetically more favorable state of the chain.

[*] Dr. G. E. Schulz, Max-Planck-Institut für medizinische Forschung, Jahnstrasse 29, D-6900 Heidelberg (Germany)

Angew. Chem. Int. Ed. Engl. 16, 23-32 (1977)

Fig. 1. Part of a peptide chain. Because of resonance the peptide bonds are planar [2]. Therefore the dihedral angles φ and ψ at the Cα atoms determine the chain fold. Steric hindrance permits adoption of only 15% of all possible φ, ψ orientations [3]. The relatively low mobility of the side chains should be noted.

The principal types of interactions involved in folding are van der Waals forces between nonpolar groups, dipole forces between polar groups (in particular hydrogen bonds), and "hydrophobic forces" (a synonym for water entropy), which describe the tendency of the apolar side chains to form a separate hydrophobic phase, i.e. a kind of oil droplet [8]. In the free energy balance [eq. (1)] the binding enthalpies and the water entropy together oppose the chain entropy term. Only when their contribution is sufficient can the chain entropy be overcome and the chain "immobilized", i.e. constrained to a definite structure. For chains containing about 100 residues a rough estimate yields values of several hundred kcal/mol for the opposing contributions. However, the resulting ΔG is of the order of 10 kcal/mol [9]. Thus we are dealing with a delicately balanced system, which renders detailed understanding more difficult.

Hydrophobic nuclei are found in all structurally known proteins, confirming the importance of the water entropy term in eq. (1). The packing density of these nuclei is as high as in the most densely packed crystals stabilized by van der Waals forces [10, 11], i.e. the number of van der Waals contacts and thus the corresponding bond enthalpy term are maximized. This is also reflected in the comparatively low compressibility of proteins [12, 13] ($\beta_{oil}/\beta_{protein} \approx 20$). However, the interior of proteins is not entirely apolar but contains about half of all the polar groups, in particular the C=O and N-H groups of the peptide bonds [14]. Being shielded from water, 90% of these dipoles form hydrogen bonds with a relatively high energy contribution [15]. Thus, although proteins are stabilized by the oil drop effect, they bear little resemblance to a disordered and loosely packed oil droplet; rather, they are similar to a densely packed crystal.

Equation (1) rationalizes the natural selection of α-amino acids and not β-amino acids, although the latter are produced in comparable quantities by experiments designed to simulate prebiotic synthesis [16]. A peptide chain made up of β-amino acids would exhibit almost free rotation about the Cα-Cβ bond and thus have a much higher chain entropy content. Hence much greater bond enthalpies and/or water entropies would be necessary for stabilizing the three-dimensional structure. In contrast, a chain composed of α-amino acids is far less flexible (Fig. 1) and can be immobilized much more easily. For analogous reasons the side chains used are also comparatively rigid (Fig. 1).
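The delicacy of this balance can be illustrated numerically. The following is a minimal sketch, not a calculation from the article: the two opposing terms are assumed values, chosen only to match the orders of magnitude quoted above (several hundred kcal/mol each, about 10 kcal/mol net), and the function name is hypothetical.

```python
# Illustrative numbers only. The text gives "several hundred kcal/mol" for the
# opposing contributions and a net value of the order of 10 kcal/mol.
stabilizing = -310.0    # assumed: binding enthalpies plus water entropy term, kcal/mol
destabilizing = 300.0   # assumed: chain entropy term, kcal/mol


def net_free_energy(stabilizing_term: float, destabilizing_term: float) -> float:
    """Return the net folding free energy as the sum of the two opposing terms."""
    return stabilizing_term + destabilizing_term


delta_g = net_free_energy(stabilizing, destabilizing)

# A 5% error in the stabilizing term alone is enough to change the sign.
delta_g_perturbed = net_free_energy(stabilizing * 0.95, destabilizing)

print(f"net dG               = {delta_g:+.1f} kcal/mol")
print(f"net dG (5% too weak) = {delta_g_perturbed:+.1f} kcal/mol")
```

With these assumed inputs a small relative error in either large term exceeds the net stabilization, which is why energy calculations on such a system are so demanding.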

3. Folding Pathway

Folding of a chain of 100 α-amino acid residues requires the adjustment of about 300 parameters (Fig. 1). Even if each parameter is restricted to only two values, some $2^{300}$ conformations become possible. Random adoption of each conformation until the chain stumbles upon an energetically stable one would require a time exceeding the age of the earth, even if only $10^{-13}$ s were assumed for each trial. Hence the parameter values must be run through in a directed manner, i.e. there must be a folding pathway.

For pancreatic trypsin inhibitor the existence of such a pathway has been proved experimentally. This was done by tracing the sequence of S-S bridge formation on going from a random coil to the native state [17]. Of the $\binom{6}{2} = 15$ possibilities, initially one of the native bridges is formed with maximum yield (Fig. 2). The cysteines of the accompanying non-native bridges are proximate in the native three-dimensional structure (Fig. 2). Accordingly, the chain regions around residues 5, 30, and 53 approach each other in the first stage of folding. Various false linkages occur also in the ensuing stages (Fig. 2). The conformational mixture is therefore broad; folding is not absolutely "single tracked".

Renaturation experiments were performed with lysozyme at 84 °C [19]. Although this temperature exceeds by far the denaturation temperature of lysozyme (ca. 50 °C), a transient renaturation is observed. Hence, even at such a high temperature folding is faster than thermal unfolding. However, fast folding is feasible only if a defined folding pathway exists and little time is lost with trial-and-error fitting. Since such a pathway greatly reduces the conformational space to be considered, there is hope that it can be elucidated and reproduced with theoretical models.

As the native three-dimensional structure is reached via a definite pathway, it need not correspond to the absolute energy minimum in the entire conformational space. Extremely uneven energy surfaces with numerous local minima are to be expected for complicated structures [9]. Thus a defined pathway probably leads to a local minimum. However, it is fundamentally impossible to distinguish between a local and the global minimum because scrutinization of the entire conformational space would be required. Neither the chain itself nor any kind of computer simulation can do that. Moreover, all kinetically inaccessible minima are irrelevant even if they include the absolute minimum. Any discussion of this question [6, 7, 9] therefore resembles the arguments of the scholastics.
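The counting argument of this section can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the number of parameters, the two values per parameter, and the trial time are taken from the text, whereas the age of the earth and the length of a year are commonly quoted values supplied here, and the six cysteines are the known cysteine count of pancreatic trypsin inhibitor.

```python
import math

N_PARAMETERS = 300           # from the text: about 300 adjustable parameters
VALUES_PER_PARAMETER = 2     # from the text: only two values per parameter
TIME_PER_TRIAL_S = 1e-13     # from the text: 10^-13 s per trial
AGE_OF_EARTH_YEARS = 4.5e9   # assumption: commonly quoted value, not from the text
SECONDS_PER_YEAR = 3.156e7   # assumption: about 365.25 days

# Number of conformations if every parameter is sampled independently.
n_conformations = VALUES_PER_PARAMETER ** N_PARAMETERS

# Time needed to try each conformation once, in years.
search_time_years = n_conformations * TIME_PER_TRIAL_S / SECONDS_PER_YEAR
ratio_to_age_of_earth = search_time_years / AGE_OF_EARTH_YEARS

# Six cysteines can form C(6, 2) different single S-S bridges.
n_single_bridges = math.comb(6, 2)

print(f"conformations:           about 10^{len(str(n_conformations)) - 1}")
print(f"random search time:      {search_time_years:.1e} years")
print(f"ratio to age of earth:   {ratio_to_age_of_earth:.1e}")
print(f"possible single bridges: {n_single_bridges}")
```

Even with the generous trial time assumed in the text, the random search exceeds the age of the earth by dozens of orders of magnitude.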

Fig. 2. Sequence of S-S bridge formation on renaturing of pancreatic trypsin inhibitor, according to Creighton [17]. The cysteine side chains are linked by disulfide exchange reagents. It is generally assumed that bridge formation is not necessary for chain folding; it only indicates the vicinity of the cysteines. The native bridges are shown in boxes. Premature formation of the native bridge 14-38 obstructs further folding. Creighton's experiment shows unequivocally that a specific folding pathway is followed. The chain fold is sketched according to Huber et al. [18]. R = random coil, N = native.

4. Hierarchy

Several degrees of organization are perceptible in the proteins of known structure (Table 1). The amino acid sequence represents the basis. As mentioned above, it implicitly contains the entire structural information. The next higher stage is seen in the secondary structures depicted in Figure 3. These are regular chain arrangements permitting optimal hydrogen bonding. Secondary structures are very common, extending over 75% of the chain in some proteins. This is probably related to the energetic necessity of compensating internal polar groups by forming hydrogen bonds.

Table 1. Levels of organization in proteins. The order is vertical and partially hierarchic, i.e. the structural elements of a particular level, e.g. α-helices (a secondary structure), are essentially determined only by the elements contained therein from the next lower level (e.g. amino acid residues).

Aggregates
↑
Globular proteins
↑
Domains
↑
Secondary structure aggregates
↑
Secondary structure
↑
Sequence of amino acid residues

Fig. 3. Secondary structures in polypeptide chains. a) α-helix, b) antiparallel β-pleated sheet, c) parallel β-pleated sheet, d) twist found in all pleated sheets, e) β-bend. In all secondary structures hydrogen bonds are formed between the C=O and N-H groups of the peptide bonds.

In some cases, secondary structures form regular aggregates, so-called "supersecondary structures". One undisputed example is the double α-helix [20], in which the side chains of two α-helices are meshed together and the helices are wound around each other with a repeat distance of about 180 Å (Fig. 4). Another supersecondary structure is the pleated sheet/helix combination found in some dehydrogenases [27] and other proteins (Fig. 4). Apart from these clearly defined supersecondary structures there are several structural preferences to be regarded as less well-defined manifestations of the same principle. They are described below.

Fig. 4. Secondary structure aggregates or "supersecondary structures". a) Double α-helix according to Crick [20], observed in paramyosin [21], myosin [22], α-keratin [23], and tropomyosin [24]. Similar structures seem to occur in hemerythrin [25] and bacteriorhodopsin [26]. b) Pleated sheet/helix combination, consisting of three strands of pleated sheet and two α-helices, found in lactate dehydrogenase [27], malate dehydrogenase [28], alcohol dehydrogenase [29], and glyceraldehyde 3-phosphate dehydrogenase [30], adenylate kinase [31], flavodoxin [32, 33], subtilisin [34], phosphoglycerate kinase [35], and phosphorylase [36].

The next higher degree of complexity is encountered in the domain. Most of the proteins with more than about 150 amino acid residues can be structurally subdivided into two or more spatially separated regions or domains. This is seen most clearly in the immunoglobulins, whose light and heavy chains form two and four separate domains, respectively (Fig. 5). The domains appear to contain those regions of the chain which fold independently to their three-dimensional structure.

Fig. 5. Domain structure of globular proteins. a) The light (L) and heavy (H) polypeptide chains of immunoglobulins fold into two and four domains [37], i.e. spatially separated structural regions. b) Adenylate kinase forms one large and one small domain which are not linked in the usual manner by one, but by two strands. Consequently, the large domain shows a smaller neighborhood correlation than the small one.

Globular proteins with one or more domains (Table 1) are not always the ultimate stage of self-assembly. In many cases protein surfaces are constructed in such a way that globules combine into larger aggregates, e.g. enzyme complexes, ribosomes, virus coats, muscle fibers. If only one or two kinds of components join together the aggregates are usually symmetrical (e.g. subunits of glutathione reductase [38] or lactate dehydrogenase [39], regulatory and catalytic subunits of aspartate transcarbamylase [40], pentone and hexone subunits of the adenovirus [41], myosin forming a myosin filament). Symmetry arises because surface contacts are optimized and everywhere the same.

Understanding of chain folding would be greatly facilitated if this vertical order were strictly hierarchic, i.e. if the structural elements of a given level (e.g. α-helix) were influenced only by the elements of the immediately preceding level contained therein (e.g. amino acid residues) and not by other parts of the protein or other structural levels. The folding could then be analyzed for each step separately, and the difficult overall problem resolved into several much easier problems.

Fairly strict hierarchy is found between globular proteins and aggregates (Table 1) because in most cases the globular monomer is rather stable and hardly altered during association. If the monomer structure is known, the structure of the aggregate can be derived from surface characteristics of the monomer. However, this does not apply to some ribosomal proteins with almost completely extended chains [42], the monomer being too flexible to assume a defined structure on its own. Its structure is determined only on association with other ribosomal components. Hierarchy is also found between amino acid sequence and α-helices (Table 1) because the formation of α-helices is probably largely independent of the three-dimensional structure eventually adopted. The same applies to the association of domains to globular proteins. Accordingly, while the vertical order of Table 1 is not strictly hierarchic, it is nevertheless sufficiently hierarchic to provide an acceptable working hypothesis.

5. Separation of the Secondary Structure

Once the secondary structures postulated by Pauling and Corey [43, 44] had been detected in several globular proteins, a test for hierarchy between secondary structure and amino acid sequence (Table 1) suggested itself. Assuming strict hierarchy, an α-helix, for example, should be determined exclusively by its constituent amino acids and by no other part of the chain; and the correlation function between helix formation and amino acid sequence should be deducible from experimental data. Knowing the correlation function, the amino acid sequences of structurally uncharacterized proteins could then be examined and helical regions predicted.

This approach has been pursued for some time by several research teams, who predicted α-helices, β-pleated sheet strands, and β-bends (Table 2). The purely statistical and readily applicable methods stem from early optimistic days. Meanwhile they have been overtaken by more elaborate systems. Furthermore, attempts have been made to improve the correlation by including other physical data. On the whole, readily applicable methods are far more popular than those which in effect can be used only by the authors themselves. However, their information content suffers as a result of their simplifying formulation.

For a long time predictions attracted little interest because they either determined known secondary structures a posteriori or, when dealing with proteins of unknown structure, were rather out on a limb. This situation changed with the test case of adenylate kinase, where the object of study was a freshly determined but still generally unknown structure [56, 57]. The results depicted in Figure 6 clearly demonstrate the success of an approach based on strict hierarchy (Table 3). However, a second test with T4 phage lysozyme [58], which has less pronounced secondary structure, revealed only slight correlations (Table 3). This modified the overall impression but did not efface it.

Table 2. Classification of prediction methods for secondary structures.

Purely statistical evaluation of structural data
    Applicable without computer program and data bank: References [44-47]
    Applicable only with computer program and data bank: References [51-54]
Incorporation of further data: energy balance, structures of synthetic polymers, steric considerations
    Applicable without computer program and data bank: References [48-50]
    Applicable only with computer program and data bank: References [55]

Table 3. Correlation coefficients between predicted and experimentally observed secondary structure [56, 58]. Because of the high helix content (54%) of adenylate kinase a relatively good helix prediction leaves only small regions for other secondary structures. This is one reason for their correlation coefficients being so high.

Adenylate kinase    T4 phage lysozyme

α-Helix

β-Pleated sheet

β-Bend

+0.44 +0.28

+0.51

+0.49 +0.13

+0.05
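How an equally weighted summary of several predictions can be compared with an observed structure is sketched below. This is an illustration under stated assumptions, not the procedure of refs. [56, 58]: the article does not specify which correlation coefficient was used, so a Pearson coefficient on per-residue 0/1 states is assumed; the majority threshold, the function names, and the 12-residue data are hypothetical.

```python
from math import sqrt


def additive_summary(predictions):
    """Sum equally weighted 0/1 predictions position by position."""
    return [sum(column) for column in zip(*predictions)]


def correlation(predicted, observed):
    """Pearson correlation of two equally long 0/1 sequences."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / sqrt(var_p * var_o)


# Hypothetical helix predictions of three methods (1 = helix, 0 = not helix).
method_predictions = [
    [0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
]
# Hypothetical observed helix positions for the same stretch.
observed_helix = [0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

votes = additive_summary(method_predictions)
# Assumption: a residue counts as predicted helix if at least two methods agree.
consensus = [1 if v >= 2 else 0 for v in votes]
r = correlation(consensus, observed_helix)

print("votes per residue:", votes)
print(f"correlation with observed structure: {r:+.3f}")
```

A coefficient of +1 would mean that every residue is assigned correctly, and 0 that the prediction is no better than chance; the hypothetical consensus above is shifted by one residue against the observed helix and therefore scores in between.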

In both tests α-helices were found much more exactly than β-pleated sheets and β-bends (Table 3). Since the data bases for all three types of secondary structures are of comparable size, this shows that the assumption of hierarchy (Table 1) is best fulfilled for α-helices. Thus, helix formation is least influenced by distant parts of the chain. Therefore helices are most likely to fold on themselves; they are candidates for nucleation sites, i.e. regions of initial chain folding. The better they can be predicted, the more likely candidates they are.

Conspicuously, in both tests the α-helices were detected better in the N-terminal than in the C-terminal region of the chain, indicating that folding begins in the N-terminal region. Since on ribosomes the chain is synthesized from the N- to the C-terminal end, folding may already start during chain production. Therefore, renaturation experiments using complete polypeptide chains may not simulate conditions in vivo, and it is hardly surprising that some do not renature under these circumstances. In adenylate kinase three helices (residues 41-84) in the N-terminal half of the chain forming a small domain were fairly accurately recognized (Fig. 6). This indicates that folding may start by forming this domain.

Fig. 6. Additive summary of all secondary structure predictions (hatched) for adenylate kinase [56]. Each method was given the same weight. The experimentally observed secondary structure (boxes) is included for comparison. [Panels: β-pleated sheet and α-helix; horizontal axis: residue number along the polypeptide chain.]

The strands of β-pleated sheets may be widely separated along the polypeptide chain. They come together only when the connecting chain segment assumes particular conformations. With such a dependence upon other parts of the chain strict hierarchy cannot be expected. Accordingly, the accuracy of corresponding predictions is low (Table 3).

β-Bends are usually located on protein surfaces [59]. Energetically they are not much favored, for they contain just a single hydrogen bond, which in many cases has to compete with the surrounding water; therefore they are best regarded as passive kink sites. Probably the kink position is defined not only by the local amino acid sequence but also by the folding of adjacent chain regions, giving rise to a rather ill-defined hierarchic order. β-Bends were predicted with considerable accuracy in adenylate kinase (Fig. 6) but with poor accuracy in T4 phage lysozyme (Table 3).

The varying success with these two proteins reflects the differences between their structures. While adenylate kinase contains a central pleated sheet surrounded by helices, and is therefore highly ordered, T4 phage lysozyme has less and hardly ordered secondary structure. Moreover, one has to expect that the folding mechanics differ greatly.
Therefore it seems reasonable to classify proteins according to their secondary structure (Table 4), because the folding problem will probably have to be tackled differently in the individual classes.

Table 4. Structural classes of globular proteins.

Structural class: Proteins with few and hardly ordered secondary structures
Examples: Ribonuclease [60], lysozyme [61], ferredoxin [62]

Structural class: Proteins with predominantly antiparallel pleated sheet
Examples: Immunoglobulins [63-66], superoxide dismutase [67], concanavalin A [68], prealbumin [69], chymotrypsin [70], bacteriochlorophyll protein [71]

Structural class: Proteins with a central, generally parallel pleated sheet and surrounding helices
Examples: Malate dehydrogenase [28], lactate dehydrogenase [39], adenylate kinase [31], phosphoglycerate kinase [35], triose phosphate isomerase [72], carboxypeptidase [74], several glycolytic enzymes [73]

Structural class: Proteins with predominant α-helix
Examples: Myoglobin [1], hemoglobin [75], hemerythrin [25], bacteriorhodopsin [26]

6. Structural Preferences

Secondary structures are very common in globular proteins. Among the helices, the α-helix having straight hydrogen bonds between the nth and (n+4)th residues predominates. To a slight extent one observes $3_{10}$-helices with n and n+3, but never π-helices with n and n+5. The scarcity of the $3_{10}$-helix can be explained by the energetic disadvantage of the bend in its hydrogen bonds. The π-helix does not exploit all the possible van der Waals contacts because it forms a large cavity around its axis. Therefore it is energetically disfavored in spite of its straight hydrogen bonds.

Among pleated sheets no preference for the parallel, antiparallel, or mixed type can yet be discerned (Fig. 3). It is, however, striking that no pleated sheet is flat; they all have a right-handed twist (Fig. 3). This is plausible because sheet twisting compensates for a twist of the individual chains around their axis (about 20° per residue) along a distance of about five residues; and the chain twist itself corresponds to an energetically favorable chain conformation, i.e. particularly low steric hindrance at the Cα atoms (Fig. 1). Energetically, the twisted sheet is about 0.5 kcal/mol per residue more favorable than the flat sheet [76]. This demonstrates again how important small energy contributions are in protein folding.

The only undisputed secondary structure aggregate (Table 1) is the double α-helix. In addition, there are several less well-defined folding preferences of the chain. The individual strands in β-pleated sheets, for instance, display a very clear neighborhood correlation [67], which is even more pronounced for the antiparallel strands than for the parallel ones (Fig. 7). Regions that are not very distant from each other along the chain tend to lie close together in the three-dimensional structure. In a crude approximation, the polypeptide chain behaves like a piece of string held vertically and allowed to fall onto a flat surface. The resulting coil is not random: it shows neighborhood correlation, it does not entangle, and it can easily be lifted up again. This picture is confirmed by the absence of "knots" in all the structures elucidated so far, the term knot not being used in a mathematical sense (mathematical knots exist only in closed threads), but in the everyday sense (Fig. 7).

Fig. 7. Structural preferences in globular proteins. a) Observed neighborhood correlation in antiparallel and parallel β-pleated sheets, i.e. relative probability of a given strand of a pleated sheet to be connected in the chain direction to the 1st, 2nd, ... neighboring strand. b) Knots in a polypeptide chain; not yet observed.

Between strands of parallel pleated sheets the chain must go from the upper to the lower end of the sheet (Fig. 8). It thereby passes either over the front or over the back side of the sheet, forming a right- or a left-handed screw with the two pleated sheet strands (Fig. 8). In the known structures the right-handed screw predominates in a ratio of about 50:1 [77]. Owing to the right-handed twist of pleated sheets, the right-handed screw permits a better neighborhood correlation than the left-handed screw (Fig. 8). Conversely, the observed predominance shows how important the neighborhood correlation is.

Fig. 8. Chirality of the β-strand-α-helix-β-strand group in parallel pleated sheets. The α-helix is shown as a smooth ribbon. The pleated sheet strands are not necessarily adjacent. The right-handed group has a much better neighborhood correlation than the left-handed one. a) Frontal view, b) side view.

Other structural preferences are seen in the formation of domains, typical examples being immunoglobulin molecules (Fig. 5). Domains are usually joined together by a single strand of the peptide chain, and are therefore separated not only spatially but also along the chain [37]. This single chain linkage ensures a better neighborhood correlation within each domain than multiple chain linkages (Fig. 5). Since the neighborhood correlation within the domains is much greater than between domains, the domains are expected to fold separately. This is supported by the structure of chymotrypsin [70]. In the interior of the protein there are 13 water molecules between the two pleated sheet barrels, which can be regarded as domains (Fig. 9). The barrels are presumed to fold separately and capture the water molecules on association.
The active center of chymotrypsin is located between the two domains. A similar location of the active center is observed in many other enzymes.

The limiting size of a domain is about 150 amino acid residues. As derived from the energy balance [eq. (1)], the formation of larger domains should be favored because in larger globules the surface/volume ratio is smaller, and the energy of the hydrophobic bonds ($\Delta S_{water}$) and that of the internal hydrogen bonds can be increased. In spite of this fact long chains always fold to several smaller domains with high neighborhood correlation. Therefore, folding of several smaller domains seems to be much easier than folding of a single large and thus more complex domain.

As in the case of adenylate kinase there are often two linkages between smaller domains. With two linkages one domain has a higher neighborhood correlation than the other (Fig. 5). One has to expect that the domain with the higher correlation folds first and then serves as a support for construction of the remaining molecule. For adenylate kinase this assumption is corroborated by the great accuracy of secondary structure predictions in the domain with the higher neighborhood correlation (Fig. 6).

As described above, symmetric structures arise on aggregation of monomeric proteins when only few kinds of building units are utilized. This symmetry is also apparent in the aggregation of domains, particularly if they are structurally similar like those of the light and heavy chains of immunoglobulins [63-66], the domains of rhodanese [78], ferredoxin [62], dehydrogenases [79], and triose phosphate isomerase [72] (Fig. 9). Not only domains, but also secondary structures appear to aggregate symmetrically, e.g. in parvalbumin [80, 81] and within the pleated sheet barrels of chymotrypsin (Fig. 9). The reason for symmetry lies in the optimization of contacts (Fig. 9). Obviously, this optimization is also important in very early stages of chain folding.

Fig. 9. Symmetry in chain folds (schematic representation). a) Chymotrypsin [70]. The pleated-sheet barrels correspond to one domain each; they are antiparallel and possess twofold internal symmetry. b) The double nucleotide-binding domain of lactate dehydrogenase and other proteins [39, 79, 93]. It contains a vertical twofold axis. c) The extremely symmetrical structure of triose phosphate isomerase [72], consisting of a parallel pleated-sheet barrel and eight surrounding α-helices. d) Optimization of contacts and symmetry: when the A-B contact is particularly favorable it is repeated, giving rise to twofold symmetry. In monomers, only twofold symmetry axes have so far been observed, while the point groups 2, 3, 17 (protein of tobacco mosaic virus), 222, 32, 432 (apoferritin), and 532 (spherical viruses) are found in aggregates.
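The surface/volume argument used in this section for the limiting domain size can be made concrete with a simple geometric sketch. It assumes that a domain is a compact sphere whose volume grows linearly with the number of residues; the volume per residue is an assumed round figure and the function name is hypothetical, neither being taken from the article.

```python
import math

RESIDUE_VOLUME_A3 = 130.0  # assumption: rough mean volume per residue, cubic angstroms


def surface_to_volume(n_residues, residue_volume=RESIDUE_VOLUME_A3):
    """Surface/volume ratio (1/angstrom) of a sphere holding n_residues residues."""
    volume = n_residues * residue_volume
    radius = (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)
    surface = 4.0 * math.pi * radius ** 2
    return surface / volume


# Compare half, equal to, and twice the limiting domain size of about 150 residues.
ratios = {n: surface_to_volume(n) for n in (75, 150, 300)}

for n, ratio in ratios.items():
    print(f"{n:4d} residues: surface/volume = {ratio:.3f} per angstrom")
```

In this idealized picture the ratio falls only with the cube root of the chain length, so doubling a domain lowers it by a factor of about 1.26. The energetic gain from larger domains is thus modest, which fits the observation that ease of folding outweighs it.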

7. Structural Rules from Phylogenesis

The historical background of proteins is a great help in understanding their structures. In the course of biological evolution all these structures have continuously evolved either in parallel or by diverging from each other. Thus, nature has provided us with a wealth of completed experiments. We only need to interpret the results, i.e. extract information of general validity from common structural features.

The basic step in the evolution of proteins is the modification of a side chain. Others are insertions and deletions of amino acid residues, which also affect the main chain and which can extend over long chain segments. Usually exchanges in the interior of proteins are conservative (apolar for apolar side chains, etc.), and the energy balance is hardly perturbed. Furthermore, any gaps that may arise (e.g. by omission of a methyl group on isoleucine → valine) are compensated by another exchange, i.e. the high packing density is retained. Apparently the packing density is of great importance for the energy balance.

The influence of amino acid exchanges on the folding process has been studied for the globin family [82]. It was found that all α-helices carry at their N-terminal end either a proline or a short polar side chain which can hydrogen-bond to the main chain (Asn, Asp, His, Ser, Thr). These hydrogen bonds and/or the proline, which fixes the dihedral angle φ (Fig. 1) to the α-helix value, are probably required for helix initiation because they limit the number of conformations at this site at an early stage of folding, i.e. form a miniature folding nucleus.

In the course of time the modifications accumulate; and when about 80% of the amino acids have been exchanged, common features can be recognized only in special cases from the sequence.
However, it was found, first on comparison of myoglobin [1] with hemoglobin chains [83] and later in cytochrome c [84] and the serine proteases [85], that the amino acids vary much faster than the chain fold. The course of the chain and the sites of prosthetic groups (e.g. hemes) are exceptionally well conserved. The observed chain fold conservation cannot be a consequence of keeping the delicate energy balance [eq. (1)] intact, because in meeting the energy requirements the amino acids utilized appear to be much more important than the chain fold. Therefore, chain fold conservation indicates that the folding mechanics, and with it the folding pathway, impose stringent conditions on the protein.

Apart from continuous small evolutionary changes there are drastic discontinuous ones caused by gene fusion and gene duplication. A recent example of fusion is observed in the immunoglobulins, where the genes of the variable chain regions are probably linked with those of the constant regions [86]. Gene separation and fusion must also have occurred in dehydrogenases, because lactate dehydrogenase has the NAD-binding domain as N-terminal half, whereas an extremely similar domain is utilized in the C-terminal half of alcohol dehydrogenase [79]. Most gene duplications lead to separate chains and thus to proteins which develop separately [87]: globins, serine proteases, midbrain hormones, lysozyme and α-lactalbumin, etc. However, some duplications double the chain length, e.g. in ferredoxin [62], rhodanese [78], chymotrypsin [88], parvalbumin [80, 81], troponin C [89], within the NAD-binding domains of dehydrogenases [79], in tropomyosin [90], etc. As described above, most such chains display internal symmetry. The folding mechanism is probably modified only insofar as now two nuclei fold similarly but separately and subsequently aggregate. The wealth of examples discovered shows [87] that these discontinuous modifications play a considerable role in the production of new protein structures.

8. Phylogenesis on the Basis of Structures

The previous section demonstrated that the historical development of proteins permits deduction of structural rules. Conversely, structural analysis provides information about phylogenetic relationships. It becomes particularly important when proteins have diverged so much that there are no longer similarities between the amino acid sequences, but only between chain folds. Since chain folds are particularly well conserved, such resemblances indicate interrelationships in very early stages of evolution. Of special interest in this context is the evolution of metabolic pathways which are otherwise hardly accessible to study.

Structural similarity is not necessarily of phylogenetic origin, but may be an expression of a structural rule. For instance, nobody would think that α-keratin[23] is related to tropomyosin[24] merely because they both form a double α-helix structure (Fig. 4). This structural aggregate just happens to be energetically favored; therefore it is adopted independently by several proteins and the structural coincidence is not significant. A different situation is encountered in T4 phage lysozyme[91], hen egg white lysozyme[61], and α-lactalbumin[92], whose structures are complicated but nevertheless similar. Their chain folds are not unique in any particular way, for there are many other kinds of complicated protein structures. Consequently, the structural coincidence of these proteins is a significant event which can only be explained by assuming a phylogenetic relation. These examples show that the degree of significance is of decisive importance in structural resemblance and that this significance can only be established with a knowledge of structural rules. This significance becomes extremely high when the amino acid sequences display common features.
For example, 45 amino acids are common to hen egg white lysozyme and α-lactalbumin[92]. The probability of a random coincidence is 1:20⁴⁵, giving rise to a significance of 20⁴⁵:1. Admittedly, this value has to be reduced because 84 other amino acids are different, the sequences have been shifted relative to each other to achieve optimal fit, and the amino acids are not distributed randomly. Nevertheless the significance still remains so high that a phylogenetic relationship can be unequivocally concluded.

Significances based solely on three-dimensional structural features have been calculated for a group of proteins having central pleated sheets[93] which no longer display any similarity in the amino acid sequences (Fig. 9). The pleated sheet was defined as reference structure and the chain fold referred thereto. Viewed in such a way, an n-stranded pleated sheet can assume M = 2ⁿ⁻¹·n! different chain topologies. If two proteins have identical topologies the significance of this coincidence[93] is M:1 (Table 5). These values are reduced, however, by the observed predominance of the right-handed screw in connections between pleated sheet strands (Figs. 8 and 9)[77]. A further reduction arises from the neighborhood correlation in pleated sheets (Fig. 7) because in the case considered most of the strands are proximate. If the symmetry in the pleated sheet is caused by gene duplication another reduction is necessary. On the other hand the significance is raised if not only the chain topology but also the location of active center and substrates is taken into account (Table 5). Moreover, it should be remembered that this comparison is restricted to a single structural class (Table 4). Extension of the comparison to all proteins enhances the significance. Some of these contributions are difficult to estimate. In conclusion, only within the dehydrogenases are the resulting significances large enough to indicate a phylogenetic relationship (Table 5); the other structural similarities can be of purely physical origin.

In a similar way the β-pleated sheets of superoxide dismutase and immunoglobulin domains were subjected to a topological comparison[67]. Including neighborhood correlations a significance of about 3000:1 was found, which clearly indicates a phylogenetic relationship.

The calculations given above require the definition of a chain topology which is only possible if the chain fold is referred to a rigid substructure such as a pleated sheet. Comparisons are then restricted to the protein class containing this substructure (Table 4). In contrast, direct geometrical comparison of chain folds is a much more general approach. In this case, similarity is expressed in terms of the average distance of corresponding Cα atoms ⟨ΔCα⟩, or by closely related indices. However, this comparison has the drawback that structure rules can be considered only with difficulty and no value can be derived for the significance, which eventually decides whether or not a resemblance is of phylogenetic origin. In principle, a relationship between ⟨ΔCα⟩ value, structure rules, and significance can be established by simulating chain folds with the aid of a random number generator on a computer and comparing them with one another[99].
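The kind of simulation referred to above can be illustrated with a short Python fragment. It is only a sketch under stated assumptions and not the procedure of ref. [99]: the chain is modelled as a random walk with a fixed step of 3.8 Å between consecutive Cα atoms (an approximate value assumed here, not taken from the text), no structural rules are applied, and the search for the relative orientation giving the minimal ⟨ΔCα⟩ is omitted. The function names random_chain and mean_ca_distance are hypothetical.

```python
import math
import random

# Assumed approximate distance between consecutive C-alpha atoms, in angstroms.
CA_CA_STEP = 3.8


def random_chain(n_residues, rng):
    """Toy C-alpha trace: a random walk with fixed step length.

    No structural rules (neighborhood correlations etc.) are imposed here.
    """
    coords = [(0.0, 0.0, 0.0)]
    for _ in range(n_residues - 1):
        # Draw a direction uniformly distributed on the unit sphere.
        z = rng.uniform(-1.0, 1.0)
        phi = rng.uniform(0.0, 2.0 * math.pi)
        r = math.sqrt(1.0 - z * z)
        x0, y0, z0 = coords[-1]
        coords.append((x0 + CA_CA_STEP * r * math.cos(phi),
                       y0 + CA_CA_STEP * r * math.sin(phi),
                       z0 + CA_CA_STEP * z))
    return coords


def mean_ca_distance(chain_a, chain_b):
    """Average distance between corresponding C-alpha atoms.

    The chains are compared in the orientation given; no superposition
    is carried out.
    """
    if len(chain_a) != len(chain_b):
        raise ValueError("chains must have the same number of residues")
    total = sum(math.dist(p, q) for p, q in zip(chain_a, chain_b))
    return total / len(chain_a)
```

With rng = random.Random(0), two calls of random_chain(100, rng) give two independent toy folds, and mean_ca_distance applied to them gives ⟨ΔCα⟩ for the orientation in which they were generated. A realistic comparison would first bring the two chains into the orientation of best fit.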
In this procedure all known structural rules, such as neighborhood correlations, etc., should be considered. The chain length should correspond approximately to that of a domain (Fig. 5). The initial task in the comparison is to find the relative orientation which corresponds to the minimal ⟨ΔCα⟩ value. This would lead to a frequency distribution of minimal ⟨ΔCα⟩ values resembling that sketched out in Figure 10. If we want to derive the significance of the similarity of two protein structures having, for example, ⟨ΔCα⟩ = 3 Å, then we are actually enquiring as to how many structures N there are with ⟨ΔCα⟩ exceeding 3 Å, because this quantity N corresponds to the significance. To determine N it is by no means necessary to simulate all possible structures. One only has to randomly generate G structures and compare them with each other. This yields Z coincidences with ⟨ΔCα⟩ ≤ 3 Å (Fig. 10). As in the counting of radiation quanta[94] the simulated structures obey Poisson statistics, i.e. the coincidence ratio Z/G equals the average population G/N. Thus the total number N of different structures with ⟨ΔCα⟩ ≥ 3 Å, i.e. the desired significance, can be determined from N = G²/Z. Using the frequency distribution of ⟨ΔCα⟩, the significance of a structural similarity between two proteins can be deduced for any mean Cα distance ⟨ΔCα⟩, because the number of coincidences Z is known for any ⟨ΔCα⟩ (Fig. 10).

Fig. 10. Schematic frequency distribution of average Cα distances ⟨ΔCα⟩ between all pairs taken from a set of G simulated chain folds. The significance N of a structural similarity between any two proteins having a given ⟨ΔCα⟩ is derived from N = G²/Z, where Z is the integral of the frequency distribution taken from zero to the ⟨ΔCα⟩ value considered.

Table 5. Significance of structural similarities on topological comparison of proteins with parallel pleated sheets.

                                        Significance including    Significance after accounting
                                        the location of the       for the preferred chirality
                                        active center [93]        of the β-α-β groups [77]
Adenylate kinase/subtilisin             2400:1                    340:1
Adenylate kinase/flavodoxin             240:1                     30:1
Adenylate kinase/any dehydrogenase      160:1                     20:1
Subtilisin/flavodoxin                   4800:1                    350:1
Subtilisin/any dehydrogenase            2400:1                    180:1
Flavodoxin/any dehydrogenase            4800:1                    350:1
Any dehydrogenase/other dehydrogenase   115200:1                  4420:1

9. Outlook: Determination of Protein Structures without Crystals

Usually, protein structures can only be solved if the proteins have been crystallized and subsequently analyzed by X-ray diffraction. However, one of the ultimate goals, determination of a three-dimensional structure from the amino acid sequence alone, has already been reached in a number of special cases. For example, the known three-dimensional structure of a given protein can be used in determining the structure of all its phylogenetic relatives once their sequences are known. For that purpose the sequence is fitted into the known chain fold. Remaining discrepancies are ameliorated by subsequent energy minimization. Since the chain fold is extremely well conserved during evolution the method can be applied even to such distant relatives as thrombin and trypsin[85]. In the case of the muscle proteins parvalbumin and troponin C, not only the same chain fold but also internal molecular
symmetry are postulated. Similarities between the sequences indicate that troponin C contains four of the substructures found in parvalbumin (one Ca²⁺ binding site and two neighboring α-helices)[89]. In parvalbumin, these substructures have aggregated symmetrically, forming a twofold axis. It therefore appears reasonable to assume that in troponin C the two other substructures are added with twofold symmetry, leading to overall 222 (D2) symmetry. Hence a structural model of troponin C can be built on the basis of the sequence[96]. Once again these examples demonstrate the utility of the phylogenetic relationships for solving protein structures.

Although the direct determination of a three-dimensional structure from an amino acid sequence is still a distant aim, the first steps in this direction have already been taken: an attempt was made to derive the known three-dimensional structures of myoglobin and pancreatic trypsin inhibitor a posteriori with the aid of simple rules. Strict hierarchy was assumed for the α-helices in both cases. In other words, it was assumed that they can be determined by secondary structure prediction methods and that they do fold first. The folding simulation then assumes ready-made α-helices. Since myoglobin contains practically only helices, in this case folding is merely helix association; the accessible conformational space is appreciably reduced. A simple scheme was postulated for calculating the packing energy of helices, and the folding pathway was determined as the direct route to the energy minimum[97]. At this minimum the native structure was obtained, confirming the validity of the method. However, this method is limited to purely helical proteins.

Pancreatic trypsin inhibitor, on the other hand, contains only one short α-helix. In this case, the geometry of peptide bonds and side chains as well as the determination of the bond energies were simplified. Folding to reach the energy minimum was then simulated by an iterative energy minimization procedure. The most favorable simulation run yielded a structure with an average Cα distance to the native structure of 6.3 Å[98].

These simulations and all the structural analyses of proteins described above nurture our hope that in the distant future three-dimensional structures can be derived merely from the amino acid sequence.
In any case, they show how well the initial shock caused by the complexity of protein structures has been overcome. Moreover, some degree of order has already been established in the wealth of existing data. It may be expected that this order will become all the greater the more data become available, i.e. the more X-ray structure analyses succeed.

Received: June 29, 1976 [A 141 IE]
German version: Angew. Chem. 89, 24 (1977)

[1] J. C. Kendrew, R. E. Dickerson, B. E. Strandberg, R. G. Hart, D. R. Davies, D. C. Phillips, and V. C. Shore, Nature 185, 422 (1960).
[2] R. E. Marsh and J. Donohue, Adv. Protein Chem. 22, 235 (1967).
[3] G. N. Ramachandran and V. Sasisekharan, Adv. Protein Chem. 23, 283 (1968).
[4] A. L. Lehninger: Biochemie. Verlag Chemie, Weinheim 1975.
[5] C. B. Anfinsen, Science 181, 223 (1973).
[6] D. B. Wetlaufer and S. Ristow, Annu. Rev. Biochem. 42, 135 (1973).
[7] C. B. Anfinsen and H. A. Scheraga, Adv. Protein Chem. 29, 205 (1975).
[8] W. Kauzmann, Adv. Protein Chem. 14, 1 (1959).
[9] B. Furie, A. N. Schechter, D. H. Sachs, and C. B. Anfinsen, J. Mol. Biol. 92, 497 (1975).
[10] F. M. Richards, J. Mol. Biol. 82, 1 (1974).
[11] A. I. Kitaigorodsky: Molecular Crystals and Molecules. Academic Press, New York 1973.
[12] J. F. Brandts, R. J. Oliveira, and C. Westort, Biochemistry 9, 1038 (1970).
[13] D'Ans-Lax: Taschenbuch für Chemiker und Physiker, Vol. 1. Springer-Verlag, Berlin 1967.
[14] B. Lee and F. M. Richards, J. Mol. Biol. 55, 379 (1971).
[15] C. Chothia, Nature 248, 338 (1974).
[16] S. L. Miller, J. Am. Chem. Soc. 77, 2351 (1955).
[17] T. E. Creighton, J. Mol. Biol. 95, 167 (1975).
[18] R. Huber, D. Kukla, A. Rühlmann, O. Epp, and H. Formanek, Naturwissenschaften 57, 389 (1970).
[19] D. Wetlaufer, E. Kwok, W. L. Anderson, and E. R. Johnson, Biochem. Biophys. Res. Commun. 56, 380 (1974).
[20] F. H. C. Crick, Acta Crystallogr. 6, 689 (1953).
[21] C. Cohen and K. C. Holmes, J. Mol. Biol. 6, 423 (1963).
[22] H. E. Huxley, Science 164, 1356 (1969).
[23] R. D. B. Fraser and T. P. MacRae, Nature 233, 138 (1971).
[24] C. Cohen, Sci. Am. 234, 36 (Nov. 1975).
[25] W. A. Hendrickson and K. E. Ward, Biochem. Biophys. Res. Commun. 66, 1349 (1975).
[26] R. Henderson and P. N. T. Unwin, Nature 257, 28 (1975).
[27] S. T. Rao and M. G. Rossmann, J. Mol. Biol. 76, 241 (1973).
[28] E. Hill, D. Tsernoglou, L. Webb, and L. J. Banaszak, J. Mol. Biol. 72, 577 (1972).
[29] I. Ohlsson, B. Nordström, and C. I. Brändén, J. Mol. Biol. 89, 339 (1974).
[30] M. Buehner, G. C. Ford, D. Moras, K. W. Olsen, and M. G. Rossmann, Proc. Natl. Acad. Sci. USA 70, 3052 (1973).
[31] G. E. Schulz, M. Elzinga, F. Marx, and R. H. Schirmer, Nature 250, 120 (1974).
[32] K. D. Watenpaugh, L. C. Sieker, L. H. Jensen, J. LeGall, and M. Dubourdieu, Proc. Natl. Acad. Sci. USA 69, 3185 (1972).
[33] R. D. Andersen, P. A. Apgar, R. M. Burnett, G. D. Darling, M. E. LeQuesne, S. G. Mayhew, and M. L. Ludwig, Proc. Natl. Acad. Sci. USA 69, 3189 (1972).
[34] C. Schubert Wright, R. A. Alden, and J. Kraut, Nature 221, 235 (1969).
[35] C. C. F. Blake and P. R. Evans, J. Mol. Biol. 84, 585 (1974).
[36] R. J. Fletterick, personal communication (1976).
[37] P. A. Peterson, B. A. Cunningham, I. Berggård, and G. M. Edelman, Proc. Natl. Acad. Sci. USA 69, 1697 (1972).
[38] G. E. Schulz, H. Zappe, D. J. Worthington, and M. A. Rosemeyer, FEBS Lett. 54, 86 (1975).
[39] M. J. Adams, G. C. Ford, R. Koekoek, P. J. Lentz, A. McPherson, M. G. Rossmann, I. E. Smiley, R. W. Schevitz, and A. J. Wonacott, Nature 227, 1098 (1970).
[40] S. G. Warren, B. F. P. Edwards, D. R. Evans, D. C. Wiley, and W. N. Lipscomb, Proc. Natl. Acad. Sci. USA 70, 1117 (1973).
[41] M. Grütter and R. M. Franklin, J. Mol. Biol. 89, 163 (1974).
[42] G. W. Tischendorf, H. Zeichhardt, and G. Stöffler, Proc. Natl. Acad. Sci. USA 72, 4820 (1975).
[43] L. Pauling and R. B. Corey, Proc. Natl. Acad. Sci. USA 37, 729 (1951).
[44] A. V. Guzzo, Biophys. J. 5, 809 (1965).
[45] J. W. Prothero, Biophys. J. 6, 367 (1966).
[46] R. Leberman, J. Mol. Biol. 5, 23 (1971).
[47] P. N. Lewis, F. A. Momany, and H. A. Scheraga, Proc. Natl. Acad. Sci. USA 68, 2293 (1971).
[48] M. Schiffer and A. B. Edmundson, Biophys. J. 8, 29 (1968).
[49] P. Y. Chou and G. D. Fasman, Biochemistry 13, 222 (1974).
[50] V. I. Lim, J. Mol. Biol. 88, 873 (1974).
[51] B. Robson and R. H. Pain, Biochem. J. 141, 883 (1974).
[52] K. Nagano, J. Mol. Biol. 84, 337 (1974).
[53] E. A. Kabat and T. T. Wu, Proc. Natl. Acad. Sci. USA 70, 1473 (1973).
[54] A. V. Finkelstein and O. B. Ptitsyn, J. Mol. Biol. 62, 613 (1971).
[55] P. N. Lewis, N. Gō, M. Gō, D. Kotelchuck, and H. A. Scheraga, Proc. Natl. Acad. Sci. USA 65, 810 (1970).
[56] G. E. Schulz, C. D. Barry, J. Friedman, P. Y. Chou, G. D. Fasman, A. V. Finkelstein, V. I. Lim, O. B. Ptitsyn, E. A. Kabat, T. T. Wu, M. Levitt, B. Robson, and K. Nagano, Nature 250, 140 (1974).
[57] E. A. Kabat and T. T. Wu, Proc. Natl. Acad. Sci. USA 71, 4217 (1974).
[58] B. W. Matthews, Biochim. Biophys. Acta 405, 442 (1975).
[59] I. D. Kuntz, J. Am. Chem. Soc. 94, 4009 (1972).
[60] H. W. Wyckoff, D. Tsernoglou, A. W. Hanson, J. R. Knox, B. Lee, and F. M. Richards, J. Biol. Chem. 245, 305 (1970).
[61] C. C. F. Blake, D. F. Koenig, G. A. Mair, A. C. T. North, D. C. Phillips, and V. R. Sarma, Nature 206, 757 (1965).
[62] E. T. Adman, L. C. Sieker, and L. H. Jensen, J. Biol. Chem. 248, 3987 (1973).
[63] P. M. Colman, J. Deisenhofer, R. Huber, and W. Palm, J. Mol. Biol. 100, 257 (1976).
[64] R. J. Poljak, L. M. Amzel, B. L. Chen, R. P. Phizackerley, and F. Saul, Proc. Natl. Acad. Sci. USA 71, 3440 (1974).
[65] D. M. Segal, E. A. Padlan, C. H. Cohen, S. Rudikoff, M. Potter, and D. R. Davies, Proc. Natl. Acad. Sci. USA 71, 4298 (1974).
[66] A. B. Edmundson, K. R. Ely, E. E. Abola, M. Schiffer, and N. Panagiotopoulos, Biochemistry 14, 3953 (1975).
[67] J. S. Richardson, D. C. Richardson, K. A. Thomas, E. W. Silverton, and D. R. Davies, J. Mol. Biol. 102, 221 (1976).
[68] K. D. Hardman and C. F. Ainsworth, Biochemistry 11, 4910 (1972).
[69] C. C. F. Blake, M. J. Geisow, I. D. A. Swan, C. Rerat, and B. Rerat, J. Mol. Biol. 88, 1 (1974).
[70] J. J. Birktoft and D. M. Blow, J. Mol. Biol. 68, 187 (1972).
[71] R. E. Fenna and B. W. Matthews, Nature 258, 573 (1975).
[72] D. W. Banner, A. C. Bloomer, G. A. Petsko, D. C. Phillips, C. I. Pogson, I. A. Wilson, P. H. Corran, A. J. Furth, J. D. Milman, R. E. Offord, J. D. Priddle, and S. G. Waley, Nature 255, 609 (1975).
[73] C. C. F. Blake, Essays Biochem. 11, 37 (1975).
[74] F. A. Quiocho and W. N. Lipscomb, Adv. Protein Chem. 25, 1 (1971).
[75] M. F. Perutz, H. Muirhead, J. M. Cox, and L. C. G. Goaman, Nature 219, 131 (1968).
[76] C. Chothia, J. Mol. Biol. 75, 295 (1973).
[77] M. J. E. Sternberg and J. M. Thornton, J. Mol. Biol. 105, 361 (1976).
[78] J. Bergsma, W. G. J. Hol, J. N. Jansonius, K. H. Kalk, J. H. Ploegman, and J. D. G. Smit, J. Mol. Biol. 98, 637 (1975).
[79] M. G. Rossmann, D. Moras, and K. W. Olsen, Nature 250, 194 (1974).
[80] A. D. McLachlan, Nature New Biol. 240, 83 (1972).
[81] R. H. Kretsinger, Nature New Biol. 240, 85 (1972).
[82] O. B. Ptitsyn, J. Mol. Biol. 88, 287 (1974).
[83] M. F. Perutz, J. Mol. Biol. 13, 646 (1965).
[84] T. Takano, O. B. Kallai, R. Swanson, and R. E. Dickerson, J. Biol. Chem. 248, 5234 (1973).
[85] R. M. Stroud, Sci. Am. 233, 74 (July 1974).
[86] G. M. Edelman, Science 180, 830 (1973).
[87] M. O. Dayhoff: Atlas of Protein Structure. Natl. Biomed. Res. Foundation, Washington 1972.
[88] B. S. Hartley, Philos. Trans. R. Soc. London, Ser. B 257, 77 (1970).
[89] J. H. Collins, J. D. Potter, M. J. Horn, G. Wilshire, and N. Jackman, FEBS Lett. 36, 268 (1973).
[90] A. D. McLachlan, M. Stewart, and L. B. Smillie, J. Mol. Biol. 98, 281 (1975).
[91] B. W. Matthews and S. J. Remington, Proc. Natl. Acad. Sci. USA 71, 4178 (1974).
[92] W. J. Browne, A. C. T. North, D. C. Phillips, K. Brew, T. C. Vanaman, and R. L. Hill, J. Mol. Biol. 42, 65 (1969).
[93] G. E. Schulz and R. H. Schirmer, Nature 250, 142 (1974).
[94] G. Friedlander and J. W. Kennedy: Introduction to Radiochemistry. Wiley, New York 1959.
[95] M. Levitt, J. Mol. Biol. 82, 393 (1974).
[96] R. H. Kretsinger and C. D. Barry, Biochim. Biophys. Acta 405, 40 (1975).
[97] O. B. Ptitsyn and A. A. Rashin, Biophys. Chem. 3, 1 (1975).
[98] M. Levitt and A. Warshel, Nature 253, 694 (1975).
[99] G. E. Schulz, J. Mol. Evol., in press.

Experimental Electron Densities and Chemical Bonding
By Philip Coppens[*]

Some recent results of charge density analysis by X-ray and neutron diffraction are discussed. Problems that have been studied in a number of laboratories include the nature of single, double, and triple bonds, lone-pair hybridization, bonding in some metals, alloys, and organometallic compounds, and the derivation of physical properties from X-ray diffraction densities. At the present stage of development of methods, studies of series of related compounds are feasible and expected to find widespread application.

Clearly a molecule is much more complicated than an atom. So the question arises: what happens to an atom when it becomes part of a molecule? [C. A. Coulson, Mem. Soc. R. Sci. Liège Collect. 8° 6 (2), 143 (1971)].

1. Introduction

Part of the answer to this deceptively simple question can be provided by X-ray and neutron crystallography. Since X-ray scattering is primarily a result of the interaction between X-rays and electrons, it contains information on the electron distribution in the scattering material. As early as 1918, Debye and Scherrer discussed the determination of electron orbits[1], but for reasons more easily understood now their determination proved to be an elusive goal. Much pioneering work in the fifties and early sixties[2] was necessary before technical breakthroughs such as automatic diffractometry and high speed computing led to improvements in accuracy which made details of the electron distribution accessible to the experimentalist.

One of the problems to be faced has been that both the electron distribution and the positions and thermal vibrations of the atoms had to be deduced from the measurements. A significant step forward was therefore made possible with the advent of high-flux neutron beams at nuclear reactors which allowed an independent determination of the atomic parameters because neutrons interact with atomic nuclei (through nuclear forces) rather than with electrons. More recent developments in methodology described below have also been centered on obtaining the combined information from accurate low-temperature X-ray measurements alone so that the double effort and expense of measuring both X-ray and neutron data sets is not always necessary.

Today, charge density analysis with accurate diffraction methods is becoming increasingly sophisticated and applicable to crystals of sufficient complexity to be of chemical interest. Here, we will outline some typical chemical and physical information that can be obtained at the present stage of development, rather than attempt to describe all recent work in the field. A more comprehensive review (up to early 1974) and a comparison of experimental and theoretical results have been published elsewhere[3,4].

[*] Prof. Dr. P. Coppens, Chemistry Department, State University of New York at Buffalo, Buffalo, New York 14214 (USA)

1.1. Difference Densities and Experimental Considerations

The effect of bonding on the electron density is of a rather small magnitude when compared with the total number of electrons in a solid. This means that to a fairly good approximation the electron density ρ is equal to the sum of the densities the constituent atoms would have if all interactions with their neighbors were neglected. Since we are interested precisely in the small bonding effects it is most opportune to calculate the difference between the observed density and the density of the free atoms in their electronic ground state centered at the atomic positions in the crystal. This function may be called the deformation density ρ_deformation because it represents the deformation that occurs when the atoms interact to form chemical bonds. Thus:

Angew. Chem. Int. Ed. Engl. 16, 32-40 (1977)
