J. Mol. Biol. (1991) 218, 397-412

Influence of Proline Residues on Protein Conformation

Malcolm W. MacArthur1,2 and Janet M. Thornton1

1Biomolecular Structure and Modelling Unit, Department of Biochemistry and Molecular Biology, University College London, Gower Street, London WC1E 6BT, U.K.

2Laboratory of Molecular Biology, Crystallography Department, Birkbeck College, Malet Street, London WC1E 7HX, U.K.

(Received 5 September 1990; accepted 16 November 1990)

To study the influence of proline residues on three-dimensional structure, an analysis has been made of all proline residues and their local conformations extracted from the Brookhaven Protein Data Bank. We have considered the conformation of the proline itself, the relative occurrence of cis and trans peptides preceding proline residues, the influence of proline on the conformation of the preceding residue and the conformations of various proline patterns (Pro-Pro, Pro-X-Pro, etc.). The results highlight the unique role of proline in determining local conformation.

0022-2836/91/060397-16 $03.00/0 © 1991 Academic Press Limited

1. Introduction

Proline is unique among the amino acids in that the end of the side-chain is covalently bound to the preceding peptide bond nitrogen. This leaves the backbone at this point with no amide hydrogen, so that no hydrogen bonding is possible. The five-membered ring also imposes rigid constraints on the N-Cα rotation. As a result the conformational energy of a proline residue depends largely on the value of ψ. For an isolated proline residue there are two minima at ψ = -55° and ψ = +145° (Schimmel & Flory, 1968). Proline residues also have a relatively high intrinsic probability (0.1 to 0.3) of having the cis rather than the trans isomer of the preceding peptide bond (Brandts et al., 1975), as compared with less than 10^-3 for other amino acids (Ramachandran & Mitra, 1976). Energy calculations by Wüthrich & Grathwohl (1974) suggest that the standard free energy ΔG° for the equilibrium is of the order of 1 to 2 kcal/mol (1 cal = 4.184 J). The activation energy barrier for cis-trans isomerization is also less for proline: 13 kcal/mol, compared with 20 kcal/mol at other peptide bonds (Schultz & Schirmer, 1978). This is partly due to the greater length of the X-Pro peptide bond (1.36 Å instead of 1.33 Å; 1 Å = 0.1 nm), which results from the redistribution of charge and lack of resonance stabilization caused by the loss of the imide hydrogen (Kartha et al., 1974). The residue preceding proline must also be given special consideration because the bulky pyrrolidine ring restricts the available conformational space. Proline residues are therefore recognized as being of special significance in their effect on chain conformation and the process of protein folding.

In view of these exceptional properties it is not surprising that proline tends to be a conserved residue and plays a special role in protein structure and sometimes function. It has been suggested that a proline may be actively involved in the regulation of transmembrane proteins such as the sodium pump, by having cis/trans isomerization synchronous with ion translocation. In many of these active transport and channel proteins, proline residues are located in the middle of transmembrane helices and are highly conserved (Brandl & Deber, 1986). In water-soluble proteins, proline residues found in the centre of α-helices cause a sharp kink of 20° or more, but are conserved, which suggests that they are functionally or structurally important (Barlow & Thornton, 1988).

Until recently, it was thought that proline residues occurred as isolated residues, and that sequences of two or more were generally absent or rare in globular proteins. With the increasing size of the protein sequence database it is becoming apparent that proline residues are found at a much higher frequency than average in many proteins. They may be present as random single units, in pairs or included in multiple tandem repeats. Perhaps the most striking of these is the one observed in the circumsporozoite protein of Plasmodium falciparum (the malarial parasite), where Asn-Ala-Asn-Pro is repeated 37 times (Dame et al., 1984). Another example occurs in a class of proteins found in parotid gland salivary secretions, in which groups of up to five proline residues may be found repeated at short intervals (Kauffman et al., 1986). Similar proline-rich proteins have been isolated and characterized from diverse sources, including ovine colostrum, rat prostate, serum chylomicrons and the respiratory tract, in addition to the saliva of rat, human, rabbit and Drosophila. Many viral proteins are now known to contain segments rich in proline repeats, for example, SFV capsid C protein, polyoma VP1 protein, simian virus 40 VP1, influenza virus haemagglutinin, and hepatitis B core antigen. A nuclear protein in Epstein-Barr virus has no fewer than 29 proline residues in succession. Multiple repeat sequences of the type Pro_n, (Pro-X)_n, (Pro-X-Y)_n, etc., have thus been observed. In no case has the structure been determined.

Reference to the work of McCaldon & Argos (1988) shows just how unlikely such sequences are. They have, however, shown that certain oligopeptides within proteins occur at a far greater frequency than expected, with a striking preference for repetitive sequences. They further observed that such over-represented oligopeptides tended to be structurally conservative.

The aim of this study is to analyse the effects of proline residues on conformation, including such repeating patterns. A search was made of the available protein crystal structural data to define a correlation between sequence, backbone geometry and structural preferences.
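As a rough check on the energetics quoted in the Introduction, the sketch below converts a standard free energy difference between the cis and trans isomers into an equilibrium cis fraction, using the Boltzmann relation K = exp(-ΔG°/RT) for a two-state equilibrium. The temperature of 298.15 K and the function name `cis_fraction` are assumptions made for illustration; they are not taken from the calculations cited.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def cis_fraction(delta_g_kcal: float, temperature_k: float = 298.15) -> float:
    """Equilibrium fraction of the cis isomer when cis lies delta_g_kcal
    (kcal/mol) above trans, assuming a simple two-state equilibrium."""
    k_eq = math.exp(-delta_g_kcal / (R_KCAL * temperature_k))  # [cis]/[trans]
    return k_eq / (1.0 + k_eq)


# Assumed illustrative inputs: the 1 and 2 kcal/mol bounds quoted in the text.
for dg in (1.0, 2.0):
    print(f"dG = {dg:.1f} kcal/mol -> cis fraction = {cis_fraction(dg):.3f}")
```

Under these assumptions, 1 to 2 kcal/mol corresponds to a cis population of roughly 16% down to 3%, which overlaps the lower end of the intrinsic probability range quoted above.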


Figure 1. A sample query performed using the ORACLE relational database STEP. The dots indicate records that have been omitted for brevity. The query is an extract of all cis proline residues with the residue to either side, for structures determined to a resolution ≤2.5 Å.
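The kind of query described in the Figure 1 caption can be sketched with Python's built-in sqlite3 module. This is only an illustration: the original query was written in SQL*Plus against STEP, whose schema is not given here, so the table and column names (`residue`, `structure`, `resolution`, `seq_index`, `name`, `omega`), the toy rows, and the |ω| < 90° cutoff used to call a peptide bond cis are all hypothetical.

```python
import sqlite3

# Hypothetical schema (not the STEP schema): one row per residue, where
# `omega` is the torsion angle (degrees) of the peptide bond preceding it.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE residue (structure TEXT, resolution REAL, "
    "seq_index INTEGER, name TEXT, omega REAL)"
)

# Made-up toy rows, for illustration only.
toy_rows = [
    ("demo1", 1.8, 1, "ALA", None),
    ("demo1", 1.8, 2, "TYR", 179.0),
    ("demo1", 1.8, 3, "PRO", 2.0),     # cis peptide bond before Pro
    ("demo1", 1.8, 4, "GLY", -178.0),
    ("demo1", 1.8, 5, "PRO", 180.0),   # trans peptide bond before Pro
    ("demo2", 3.0, 1, "SER", None),
    ("demo2", 3.0, 2, "PRO", -4.0),    # cis, but resolution worse than 2.5
    ("demo2", 3.0, 3, "ASN", 177.0),
]
conn.executemany("INSERT INTO residue VALUES (?, ?, ?, ?, ?)", toy_rows)

# Self-join to pick up the residue on either side of each cis proline.
QUERY = """
SELECT p.structure, prv.name, p.name, nxt.name
FROM residue AS p
JOIN residue AS prv
  ON prv.structure = p.structure AND prv.seq_index = p.seq_index - 1
JOIN residue AS nxt
  ON nxt.structure = p.structure AND nxt.seq_index = p.seq_index + 1
WHERE p.name = 'PRO'
  AND ABS(p.omega) < 90.0      -- assumed cutoff for a cis peptide bond
  AND p.resolution <= 2.5
ORDER BY p.structure, p.seq_index
"""
cis_triplets = conn.execute(QUERY).fetchall()
print(cis_triplets)
```

The self-join keeps the query close to the description in the caption (each cis proline together with its two neighbours); a real schema would also need chain identifiers and handling of chain breaks.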

2. Methods and Data

X-ray crystallographic data from the Brookhaven Protein Data Bank (Bernstein et al., 1977) were used in the analysis. Only non-homologous structures (<20% identity) determined to 2.5 Å resolution or better were used in the study (M. Johnson, personal communication). The new relational database of protein structure STEP was used to extract the data (Akrigg et al., 1988; Islam & Sternberg, 1989; D. Smith et al., unpublished results). Information was retrieved using the query language SQLPLUS. A typical sample query is shown in Fig. 1. As a first step, all examples of prolines and proline patterns of interest such as X-Pro, X-cisPro, XPPX, XPXPX, etc. were collected from the database together with basic information on secondary structure assignments and torsion angles φ, ψ, ω. The data set representing each pattern was then divided into groups according to structure type, based on φ, ψ values. In order to see how these patterns relate to neighbouring residues, local secondary structure elements and the molecule as a whole, they were examined in greater detail on an Evans & Sutherland graphics display system using the programme FRODO (Jones, 1978).

Throughout this paper the φ, ψ conformation of residues is described using a shorthand nomenclature (Efimov, 1981), in which α = helical region, β = β-strand region, and αL = left-handed α-helical region. The β region is further subdivided into βP = polyproline region and βE = extended sheet region (for a complete classification, see Wilmot & Thornton, 1991). Note that although α represents the "α-helical" portion of the Ramachandran plot, it does not necessarily imply that this residue is part of an α-helix. Secondary structure assignments were made using a modified form of the Kabsch & Sander (1982) algorithm (D. Smith, personal communication). Residues at the ends of strands and helices which form the appropriate hydrogen bonds, but do not necessarily have the appropriate φ, ψ values, are included in the secondary structure. Thus, in practice many helices and strands are extended by 1 residue at their termini.

3. Proline Conformation

(a) φ, ψ Conformation of proline

As shown by the φ, ψ plot for 963 trans proline residues in Figure 3(a), proline in proteins adopts two distinct conformations that are almost evenly divided (α : β = 44 : 56) between the two theoretically predicted minima, with the broader energy well of the polyproline region being slightly favoured. The two groups are tightly clustered about their mean values of φ, ψ = -61°, -35° for the α region and φ, ψ = -65°, 150° for the β region. The mean value of φPro = -63° (±15°). The conformation adopted by the proline appears to be influenced by the nature of the preceding residue (see Table 1). When it follows an Asp, for example, there is a very high probability of the α conformation being adopted (α : β = 9 : 1). Conversely, in the case of Val, it is much more likely to be found in the β region (α : β = 1 : 4). When preceded by hydrophobics it generally favours the β conformation (see below).

When proline is present in the cis arrangement (see Fig. 2) there is a stronger preference for it to occur in the β region (α : β = 24 : 76), as shown in Figure 3(b). Compared with trans proline residues the distribution shows a pronounced displacement of the two clusters, in particular a shift to more negative values of φ in both regions and to a more positive ψ value in the α region. For the α region the mean values are φ, ψ = (-86°, 1°) and for the β region they are φ, ψ = (-76°, 159°). The displacement to more negative φ values arises from the need to reduce the steric clash between the Cα hydrogen of the preceding residue and the carbonyl carbon of the proline, while the shift to a more positive ψ value helps to reduce the steric conflict between the Cα of the preceding residue and the carbonyl oxygen of the proline. The β region values are similar to those found in the polyproline I helix, φ, ψ = (-83°, 158°). It will be noted that these shifts involve movements away from the lowest energy regions that are available to trans proline. This more highly strained ring geometry, which is a necessary compromise in the cis conformation, may be another reason why cis proline residues are less frequently observed than would otherwise be expected on theoretical grounds based on calculations that consider only the cis peptide bond.

Figure 2. Trans and cis arrangements of planar X-Pro peptides.

4. Cis and Trans Peptides Preceding Proline Residues

Nuclear magnetic resonance experiments on dipeptides have indicated that the cis : trans proline ratio (see Fig. 2) depends on the amino acid sequence in the immediate environment of the proline residue, and that the interconversion rates may vary by as much as a factor of 10 depending on the type of preceding residue X. Brandts et al. (1975) have shown that the cis to trans isomerization becomes slower as the bulkiness of the side-chain of residue X increases in the series Gly-Pro, Ala-Pro and Val-Pro. Aromatic residues show a tenfold reduction in the cis to trans isomerization rate (Grathwohl & Wüthrich, 1981). They also found that the equilibrium is markedly affected by variation of the sequence outside the immediate environment of the proline residue. In deuterated dimethylsulphoxide, Thr-Phe-Pro contains approximately 60% cis proline, while Phe-His-Thr-Phe-Pro has only 15% cis proline. Nuclear magnetic resonance studies by Dyson et al. (1988) on peptides having the sequence YPXDV in aqueous solution have shown that the cis : trans ratio can be influenced by the nature of the residue following the proline. They found that aspartate and asparagine significantly increase the population of the cis isomer. For the sequence YPYXV they also found that positively charged side-chains appear to destabilize cis relative to trans, while Asp, Asn and Gly slightly stabilize the cis form.
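The rate differences quoted for X-Pro isomerization can be related to barrier heights with a simple Arrhenius-type estimate, k ∝ exp(-Ea/RT). The sketch below assumes a common pre-exponential factor and a temperature of 298.15 K, neither of which is stated in the studies cited; the function names `barrier_shift` and `rate_ratio` are illustrative only.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def barrier_shift(fold_change: float, temperature_k: float = 298.15) -> float:
    """Barrier difference (kcal/mol) equivalent to a given fold change in
    rate, assuming identical pre-exponential factors."""
    return R_KCAL * temperature_k * math.log(fold_change)


def rate_ratio(barrier_low: float, barrier_high: float,
               temperature_k: float = 298.15) -> float:
    """Ratio k_low / k_high for two barriers (kcal/mol) under the same
    assumption of identical pre-exponential factors."""
    return math.exp((barrier_high - barrier_low) / (R_KCAL * temperature_k))


# A tenfold change in isomerization rate, as reported for aromatic X.
print(f"tenfold rate change ~ {barrier_shift(10.0):.2f} kcal/mol")

# The 13 and 20 kcal/mol barriers quoted in the Introduction.
print(f"k(13 kcal/mol) / k(20 kcal/mol) ~ {rate_ratio(13.0, 20.0):.1e}")
```

On these assumptions a tenfold change in rate corresponds to a barrier difference of only about 1.4 kcal/mol, so modest changes in the steric environment of the X-Pro bond are enough to account for the variation described.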

Table 1
Secondary structure and conformation of proline in X-Pro for 963 non-identical trans proline residues from structures determined to ≤2.5 Å

X       Totals   Helix   β-sheet   Turn    Other   %α:%β
Gly       62     18.1     16.9     20.8    44.2    55:45
Ala       77     24.4     15.4     19.2    41.0    29:71
Val       73     24.3     25.7     16.7    33.3    22:78
Leu       88     21.8     12.6     25.3    40.3    30:70
Ile       65     33.3     12.1     19.7    34.9    36:64
Phe       44     25.6     14.0     18.6    41.8    51:49
Tyr       38     23.1     17.9     15.4    43.6    38:62
Trp        9     33.3     11.1     22.2    33.4    44:56
Cys       25     25.9      7.4     33.3    33.4    60:40
Met       12     25.0     16.7     25.0    33.3    27:73
Ser       52     25.5      9.8     25.5    39.2    52:48
Thr       74     24.7     14.3     23.3    37.7    42:58
Lys       49     26.0     18.0     18.0    38.0    45:55
Arg       34     13.9     19.4     30.6    36.1    24:76
His       36     32.4      2.7     40.5    24.4    81:19
Asp       56     50.0      0.9     30.3    18.8    89:11
Asn       54     42.9      -       23.2    33.9    69:31
Glu       46     22.9     12.5     12.5    52.1    35:65
Gln       36      8.6     11.4     22.8    57.2    28:72
Pro       33     21.6      5.4     24.3    48.7    52:48
Total    963     26.9     10.9     23.8    38.4    44:56

Assignments (Kabsch & Sander, 1983) are modified as defined in the text and numbers represent percentages.

Figure 3. (a) φ, ψ plot for trans proline residues. The 963 examples are drawn from non-homologous structures determined to ≤2.5 Å resolution. A total of 44% are in the α region, compared to 56% in the β region. (b) φ, ψ plot for cis proline residues. The 58 examples are from non-homologous structures determined to ≤2.5 Å resolution. For the cluster in the β region φav = -76°, ψav = 159°. For the α cluster φav = -86°, ψav = 1°.

(a) Frequency of X-cisPro in globular proteins

The total number of proline records in the Brookhaven Data Bank determined to a resolution of ≤2.5 Å is 2373. After elimination of identical and homologous entries this number reduces to 1021. Of these, 58 have a cis peptide bond. The percentage of proline residues that form cis peptide bonds is thus 5.7%. The frequency of the preceding residue is shown in Table 2.

Table 2
Frequency of residues forming cis peptide bonds in proteins from non-identical structures determined to a resolution ≤2.5 Å

Residue   Number on database     Number with cis     %      ββ    βα
type      preceding proline      peptide bond
Tyr              47                    9            19.1     7     2
Pro              37                    4            10.8     2     2
Ser              58                    6            10.3     4     2
Gly              68                    6             8.8
Phe              47                    3             6.4     3     0
Glu              49                    3             6.1     3     0
Lys              52                    3             5.8     1     2
Arg              36                    2             5.6     1     1
Leu              93                    5             5.4     3     2
His              38                    2             5.3     2     0
Gln              38                    2             5.3     1     1
Asn              57                    3             5.3     3     0
Thr              77                    3             3.9     3     0
Ile              67                    2             3.0     1     1
Val              75                    2             2.7     2     0
Ala              79                    2             2.5     2     0
Asp              57                    1             1.8     1     0
Trp               9                    0             0       0     0
Met              12                    0             0       0     0
Cys              25                    0             0       0     0
Total          1021                   58             5.7    39    13

Columns 5 and 6 show the conformations of the residues in the X-Pro pair. With the exception of Val315 in rhizopuspepsin, X is always in the β conformation. Glycine is taken to be a special case. Of the 6 examples observed, 5 have positive φ. The 6th is from bacteriochlorophyll protein, for which no φ value is available.

The number of occurrences of X-cisPro for every residue is less than ten, which is a small sample size. However, the high occurrence of tyrosine is noted. While this almost certainly reflects the slow cis to trans conversion rate noted above, there appears to be no clear correlation amongst the other residues between size and frequency of occurrence as the cis form. Glycine can be regarded as a special case, as all the observed glycine residues have φ values that are large and positive. However, the high frequency of serine is anomalous. The overall frequency of Trp-Pro is too low for any conclusion to be drawn.

Figure 4. (a) Residues (95, 96) in adenylate kinase illustrating Tyr-cisPro in the syn orientation in a type VIb turn. (b) Residues (11, 12) in ovomucoid third domain showing Tyr-cisPro in the anti orientation, which is also in a type VIb turn.

In the case of tyrosine in oligopeptides, theoretical calculations had led to speculation that interaction between the aromatic and proline rings might be implicated (Hetzel & Wüthrich, 1979). Examination on the graphics display system of Tyr-cisPro in proteins shows two different orientations of one ring relative to the other, which might be described as syn and anti, with close interaction of the tyrosine ring in the former. These are illustrated in Figure 4(a) and (b). Since ψTyr is always in the β conformation (see below), the interaction is solely determined by the values χ1 and χ2. Either orientation can be adopted whether the backbone conformation be extended or involved in a turn. The relative frequencies are syn : anti = 5 : 3. Thus, although the aromatic-proline interaction does occur in cis-proline residues, it is not always found.
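The percentages in Table 2 can be recomputed directly from the counts, which is a useful consistency check on the table. The sketch below does this; the dictionary layout and the function name `percent_cis` are illustrative, and the Tyr and Phe totals are taken as 47 (an assumption, chosen because it reproduces the tabulated 19.1% and 6.4% and the overall total of 1021).

```python
# Counts read from Table 2: residue X -> (X-Pro pairs in data set, cis pairs).
# Tyr and Phe totals are assumed to be 47 (see the lead-in above).
TABLE2 = {
    "Tyr": (47, 9), "Pro": (37, 4), "Ser": (58, 6), "Gly": (68, 6),
    "Phe": (47, 3), "Glu": (49, 3), "Lys": (52, 3), "Arg": (36, 2),
    "Leu": (93, 5), "His": (38, 2), "Gln": (38, 2), "Asn": (57, 3),
    "Thr": (77, 3), "Ile": (67, 2), "Val": (75, 2), "Ala": (79, 2),
    "Asp": (57, 1), "Trp": (9, 0), "Met": (12, 0), "Cys": (25, 0),
}


def percent_cis(total: int, cis: int) -> float:
    """Percentage of X-Pro pairs with a cis peptide bond, to 1 decimal."""
    return round(100.0 * cis / total, 1)


n_total = sum(total for total, _ in TABLE2.values())
n_cis = sum(cis for _, cis in TABLE2.values())
overall = percent_cis(n_total, n_cis)

# List residues from the highest to the lowest cis percentage.
for name, (total, cis) in sorted(
    TABLE2.items(), key=lambda item: item[1][1] / item[1][0], reverse=True
):
    print(f"{name}: {cis}/{total} = {percent_cis(total, cis)}%")
print(f"All: {n_cis}/{n_total} = {overall}%")
```

With every per-residue cis count below ten, a change of one or two observations shifts most of these percentages noticeably, which is why only the tyrosine value is singled out in the text.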

5. Influence of Proline on the Conformation of the Preceding Residue

In general the φ, ψ conformation of a residue within a free polypeptide chain is independent of the conformation of the preceding residue. In the case of proline, however, energy calculations by Schimmel & Flory (1968) have shown that the space available to the preceding residue is severely curtailed by steric conflicts between the CδH2 attached to the imide nitrogen and the NH and CβH2 atoms of the preceding residue (see Fig. 5). For example, for alanine preceding proline
