J. Mol. Biol. (1992) 227, 271-282
Human Epidermal Growth Factor
High Resolution Solution Structure and Comparison with Human Transforming Growth Factor α

Ulrich Hommel, Timothy S. Harvey, Paul C. Driscoll and Iain D. Campbell†
Department of Biochemistry, South Parks Road, Oxford OX1 3QU, U.K.
(Received 25 March 1992; accepted 13 May 1992)
The solution structure of the 53 amino acid peptide hormone, human epidermal growth factor (hEGF), has been determined to high resolution from nuclear magnetic resonance (n.m.r.) data. A large number of internuclear distance and dihedral restraints was obtained, including data from uniformly 15N-labelled hEGF. Dynamical simulated annealing methods using the program XPLOR were used for structure calculation. An improved protocol was developed, combining efficient conformational searching with a reduced computational cost. The general fold of the calculated structures compared well with that of a carboxy-terminally truncated hEGF derivative determined previously. A group of 44 structures was calculated with no violations greater than 0.3 Å and 3° for distance and dihedral restraints, respectively. The average pairwise root mean square (r.m.s.) deviation of all backbone atoms for these structures was 2.25 Å for all 53 residues, 0.92 Å for the bulk of the protein, and 0.23 Å for the functionally important carboxy-terminal domain. Two new helical segments containing highly conserved amino acids have been identified: one between cysteines 6 and 14, and a second at the end of the carboxy-terminal domain. New insight into the molecular architecture of the putative receptor binding site was provided by comparing the structure of hEGF with its biologically equipotent analogue, human transforming growth factor alpha. This comparison revealed a close structural relationship between the two growth factors and provides an improved understanding of the structure/function relationships in EGF.
Keywords: epidermal growth factor; transforming growth factor α; nuclear magnetic resonance; dynamical simulated annealing; protein structure
1. Introduction
Human epidermal growth factor (hEGF‡) is a 53 amino acid peptide, which acts on a wide variety of different cell types (Carpenter & Cohen, 1978; Ushiro & Cohen, 1980). EGF is homologous to other growth-promoting proteins such as human transforming growth factor α (TGFα) (Derynck et al., 1984), amphiregulin (Shoyab et al., 1989), heparin-binding growth factor (Higashiyama et al., 1991), and several pox virus proteins: Shope fibroma growth factor, myxoma virus protein and vaccinia virus protein (Lin et al., 1988; Upton et al., 1987; Twardzik et al., 1985). In addition, a number of other proteins, including several present in serum as well as cell surface receptors, contain structural units or modules (Patthy, 1985).
† Author to whom all correspondence should be addressed.
‡ Abbreviations used: hEGF, human epidermal growth factor; TGFα, transforming growth factor alpha; n.m.r., nuclear magnetic resonance; 2D, two-dimensional; NOE, nuclear Overhauser enhancement; NOESY, 2D NOE spectroscopy; COSY, 2D correlated spectroscopy; E.COSY, exclusive COSY; HOHAHA, 2D homonuclear Hartmann-Hahn spectroscopy; HSQC, 2D heteronuclear single quantum coherence spectroscopy; HMQC, 2D heteronuclear multiple quantum coherence spectroscopy; ³J_HNα, vicinal spin-spin coupling constant between amide proton and α-proton; ³J_αβ, vicinal spin-spin coupling constant between the α-proton and a β-proton; d_HNHβ and d_HαHβ, intraresidue proton-proton distances between a CβH and NH, and CαH and CβH; d_αN, d_NN and d_βN, sequential proton-proton distances involving CαH, NH and CβH; r.m.s., root mean square; p.p.m., parts per million.
© 1992 Academic Press Limited
Despite the biological importance of EGF as a growth factor and as a module in many other proteins, there is a lack of high quality structural information. No X-ray structures of EGF have been determined, although in recent years several low resolution n.m.r. structures of EGF analogues have been reported, including an hEGF† derivative, hEGF(1-48), murine EGF and hTGFα (Cooke et al., 1987; Montelione et al., 1987; Kohda et al., 1989; Kline et al., 1990; Harvey et al., 1991), and recently a somewhat higher resolution model of murine EGF has been presented (Montelione et al., 1992). A high resolution description of the three-dimensional structure of hEGF is needed to interpret the many studies of structure/function relationships in EGF and TGFα (Defeo-Jones et al., 1989; Moy et al., 1989; Dudgeon et al., 1990; Engler et al., 1990; Hommel et al., 1991). Here we describe a high resolution structure determination of native hEGF(1-53). A major improvement in resolution over the previously published structures has been achieved by taking advantage of improved n.m.r. technology, the recombinant expression of an isotope-labelled derivative, the use of stereospecific assignments and the inclusion of many experimentally derived dihedral restraints. The procedures used for the 15N-labelling and assignment of the recombinant hEGF(1-52), which have in addition provided the basis for studies on the backbone dynamics of the protein (unpublished results), are also described. The major advance presented here, however, is that the new hEGF structure has facilitated a detailed comparison with the structure of the equipotent receptor-binding molecule hTGFα (Harvey et al., 1991). This comparison affords new insight into the molecular architecture of the putative receptor binding site of these molecules at the atomic level, and also opens the way for more accurate modelling studies of the many homologues within the EGF family (Baron et al., 1991).
2. Materials and Methods
(a) Protein
hEGF was a gift from G. D. Searle, U.K., produced as a trypsin cleavage product from a poly-arginine fusion protein. The uniformly 15N-labelled derivative of hEGF was produced from a synthetic gene cloned into a yeast 2 micron vector with LEU2d as marker for selection, and the expressed protein was secreted into the medium under the control of the yeast alpha-factor leader sequence (Dudgeon et al., 1990; Brake et al., 1984). It was found earlier that the product of this expression is the des(Arg53) protein, lacking the carboxy-terminal arginine residue. The transformed yeast strain MD50 (a/α leu2/leu2 pep4-3/+ his3/+) was grown at 30°C on a minimal medium containing growth-limiting amounts of [15N]ammonium sulphate (Isotec Corp.) as the only nitrogen source. This medium contained 2% (w/v) glucose, 0.5% (w/v) yeast nitrogen base without amino acids and ammonium sulphate, and 0.05% [15N]ammonium sulphate. The protein was recovered from the medium and purified to homogeneity by a combination of reverse-phase and ion-exchange high pressure liquid chromatography as described previously (Hommel et al., 1991). For the n.m.r. experiments, the samples were dissolved in either H2O or 2H2O, and the pH was adjusted to 2.9 (uncorrected meter reading for 2H2O solutions). The concentration of the samples, as measured using a calculated extinction coefficient for hEGF (A280 = 2.89 in a 1 cm light path), ranged from 1 mM to 3 mM protein in a volume of 0.5 ml. The temperature during data acquisition was set to either 15°C or 30°C.
† Throughout the text hEGF denotes the protein with its native amino acid sequence comprising 53 residues, while [15N]EGF indicates the recombinant protein with the C-terminal arginine missing.
(b) N.m.r. spectroscopy
Homonuclear n.m.r. spectra were acquired at 500 and 600 MHz on Bruker AM 500 and 600 spectrometers. Heteronuclear spectra were recorded on a home-built spectrometer at 500 MHz. Quadrature detection was achieved either by the TPPI or the States method (Redfield & Kuntz, 1975; Marion & Wüthrich, 1983; States et al., 1982). In homonuclear experiments, typically 512 to 800 t1 increments of 2K real data points were recorded. The digital resolution was 3.4 Hz/point in F2 and 6.8 Hz/point in F1. Heteronuclear experiments were performed with a spectral width of 1400 Hz in the 15N dimension and 256 t1 increments. WALTZ-16 broad-band decoupling of 15N was used. The water signal was suppressed in the homonuclear experiments by saturating the residual H2O resonance when 2H2O was the solvent, and in all heteronuclear spectra acquired in H2O the jump-return sequence was used (Plateau & Guéron, 1982; Bax et al., 1987). Homonuclear NOESY spectra (Jeener et al., 1979; Macura et al., 1981) were recorded with mixing times of 200 ms and 70 ms, and HOHAHA spectra (Braunschweiler & Ernst, 1983; Davis & Bax, 1985) with a WALTZ-17 mixing time of 50 ms; at an earlier stage of this study NOESY spectra were also acquired at 70 ms, 100 ms and 200 ms with presaturation of the water resonance. The 15N HMQC-J, HSQC-COSY and HSQC-NOESY experiments were performed as described earlier (Bax et al., 1990; Norwood et al., 1990). All spectra were processed using the FELIX software package (Hare Inc.). Lorentzian-to-Gaussian and squared sine-bell window functions were applied in F2 and F1, respectively. Residual water signal was reduced by deconvolution of the free induction decay prior to Fourier transformation in F2 (Marion et al., 1989).
(c) Amide exchange
A series of 9 HSQC-COSY spectra was recorded after rapid dissolution of [15N]EGF in 2H2O in order to measure the exchange rates of slowly exchanging amide protons at pH 3.1 and 30°C. The total acquisition time per 2D spectrum was 70 min. The rates of hydrogen-deuterium exchange were determined by non-linear least-squares fitting of the time dependence of peak intensities. Either peak volumes or peak heights of a particular cross peak were used in this analysis, depending upon peak overlap.
(d) Coupling constants
The vicinal coupling constants between the α-proton and the β-protons of AMX spin systems were determined from E.COSY spectra (Griesinger et al., 1987) acquired in 2H2O. The displacement of the passive component in a CαH-CβH cross peak along F2 was measured after processing the data to a digital resolution of 0.33 Hz/point. The vicinal coupling constant between the amide proton and the α-proton, ³J_HNα, was determined from J-modulated (15N, 1H) COSY spectra, in which the autopeaks of a particular residue are split into a doublet in the 15N dimension due to homonuclear coupling (Kay & Bax, 1990). A non-linear least-squares fitting procedure was programmed to extract the coupling constants from traces through each autopeak by fitting theoretical lineshapes to the observed ones, as suggested earlier (Redfield & Dobson, 1990; P. C. Driscoll, unpublished results). This procedure takes into account the fact that, due to cross-correlation effects between the chemical shift anisotropy relaxation mechanism and the homonuclear dipolar relaxation mechanism for the NH-CαH pair, 2 different linewidths are observed for the two components of the doublet (Kay & Bax, 1990; Boyd et al., 1990, 1991). The spectral range for the fitting procedure was 35.5 Hz for each peak and the digital resolution was 1.4 Hz/point.
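The doublet lineshape fit can be illustrated with a minimal sketch. It assumes each trace is modelled as two absorptive Lorentzians of unit peak height, split by J about a known centre frequency, with an independent linewidth for each component to mimic the cross-correlation effect described above; the amplitude is solved in closed form and J and the two widths by a coarse grid search. The model, the grids and all function names are hypothetical simplifications, not the actual Redfield & Dobson procedure, whose exact lineshape is not given here.

```python
def lorentzian(nu, center, width):
    """Absorptive Lorentzian of unit peak height; width is the full width at half height (Hz)."""
    half = width / 2.0
    return half * half / ((nu - center) ** 2 + half * half)


def doublet(nu, center, j, width_a, width_b):
    """Doublet split by j (Hz); each component has its own linewidth."""
    return (lorentzian(nu, center - j / 2.0, width_a)
            + lorentzian(nu, center + j / 2.0, width_b))


def fit_doublet(freqs, trace, center, j_grid, width_grid):
    """Grid search over (j, width_a, width_b) with a closed-form amplitude.

    Returns (residual sum of squares, j, width_a, width_b, amplitude).
    """
    best = None
    for j in j_grid:
        for wa in width_grid:
            for wb in width_grid:
                model = [doublet(nu, center, j, wa, wb) for nu in freqs]
                # Linear least-squares scale factor for trace ~ amp * model.
                amp = sum(m * y for m, y in zip(model, trace)) / sum(m * m for m in model)
                rss = sum((y - amp * m) ** 2 for m, y in zip(model, trace))
                if best is None or rss < best[0]:
                    best = (rss, j, wa, wb, amp)
    return best


if __name__ == "__main__":
    # Synthetic trace: a ~35 Hz window sampled every 1.4 Hz, as quoted in the text.
    freqs = [-17.5 + 1.4 * k for k in range(26)]
    trace = [2.0 * doublet(nu, 0.0, 7.3, 8.0, 12.0) for nu in freqs]
    j_grid = [round(2.0 + 0.1 * i, 1) for i in range(91)]   # 2.0 ... 11.0 Hz
    width_grid = [float(w) for w in range(4, 17, 2)]         # 4 ... 16 Hz
    rss, j, wa, wb, amp = fit_doublet(freqs, trace, 0.0, j_grid, width_grid)
    print(f"J = {j:.1f} Hz, widths = {wa:.0f}/{wb:.0f} Hz")
```

With noise-free synthetic data the search recovers the generating J and widths exactly; with real traces a finer grid or a gradient-based refinement around the best grid point would be the natural next step.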
(e) Structure calculation
The distance restraints used as input for the structure calculation were derived from NOEs classified on the basis of intensity into 3 qualitative bands, with upper limits of 2.7 Å, 3.3 Å and 5.0 Å (1 Å = 0.1 nm). The lower limits were explicitly set to 0.0 Å, rather than 1.8 Å, which is the commonly used lower limit for interproton distances with standard radii. In the simulated annealing protocols commonly used for n.m.r. structure determination (Nilges et al., 1988), the scale of the repulsive term representing van der Waals non-bonded interactions (r_cons) is initially very small, allowing atoms to pass much closer to each other than would otherwise be possible. Applying the normal lower bound of 1.8 Å is then inappropriate and will adversely affect the sampling properties of the method. Similarly, hydrogen bonds were defined as upper limits only, and were included only after initial structure calculations had (a) unambiguously identified a hydrogen bond acceptor, and (b) shown that the pair was present in a regular protein secondary structure element. Several amide resonances observed to be slowly exchanging were not used as hydrogen bond restraints; these are discussed later. Restraints for the backbone and side-chain dihedral angles φ, ψ and χ1 were derived from the measured coupling constants (³J_HNα and ³J_αβ) and from short intraresidue (d_HNHβ, d_HαHβ) and sequential (d_αN, d_βN) distances using the systematic grid-search method STEREOSEARCH (Nilges et al., 1990). The dihedral angle restraints were introduced conservatively, following initial structure calculation. Where multiple solutions for φ were found, some could be eliminated after inspection of the initial structures. Similarly, where STEREOSEARCH gave multiple but well-defined ranges for ψ and χ1, certain regions of dihedral space could be excluded. For φ, ψ and χ1, dihedral restraints were given minimum allowed ranges of ±20°, ±50° and ±20°, respectively.
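The classification of NOEs into distance restraints can be sketched as follows. The band upper bounds and the 0.0 Å lower bound are the values given above, and the force constant of 50 kcal mol⁻¹ Å⁻² is the final NOE value listed in the Table 2 footnotes. The intensity thresholds separating the bands, the atom labels and the function names are hypothetical, and a plain flat-bottomed harmonic form is assumed for the square-well penalty.

```python
from dataclasses import dataclass

# Upper bounds (Å) of the three qualitative NOE bands quoted in the text.
BAND_UPPER = {"strong": 2.7, "medium": 3.3, "weak": 5.0}
LOWER_BOUND = 0.0  # explicit 0.0 Å lower limit instead of the usual 1.8 Å


@dataclass
class DistanceRestraint:
    atom_i: str
    atom_j: str
    lower: float
    upper: float


def classify(intensity, strong_cut, medium_cut):
    """Map a normalised cross-peak intensity to a band (thresholds are hypothetical)."""
    if intensity >= strong_cut:
        return "strong"
    if intensity >= medium_cut:
        return "medium"
    return "weak"


def make_restraint(atom_i, atom_j, intensity, strong_cut=0.7, medium_cut=0.3):
    band = classify(intensity, strong_cut, medium_cut)
    return DistanceRestraint(atom_i, atom_j, LOWER_BOUND, BAND_UPPER[band])


def square_well_energy(distance, restraint, k=50.0):
    """Flat-bottomed harmonic penalty in kcal/mol (k in kcal mol^-1 Å^-2):
    zero between the bounds, quadratic outside them."""
    if distance > restraint.upper:
        return k * (distance - restraint.upper) ** 2
    if distance < restraint.lower:
        return k * (restraint.lower - distance) ** 2
    return 0.0


if __name__ == "__main__":
    r = make_restraint("H_a", "H_b", 0.5)   # hypothetical atom labels
    print(r.upper)                          # medium band: 3.3
    print(square_well_energy(3.5, r))       # 50 * 0.2**2, i.e. about 2.0
```

Because the lower bound is 0.0 Å, the penalty can only ever act on distances that are too long, which is the behaviour the text argues for while r_cons is still small.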
Initial structures were calculated using a dynamical simulated annealing method starting from a structure generated with randomized backbone dihedral angles (Brünger, 1988). Disulphides were included at this stage as upper distance bounds of 2.1 Å between the sulphur atoms. These were removed, and the disulphide bridges were included as bonds in the subsequent refinement stages. The resulting structures were then refined using successive rounds of an efficient simulated annealing protocol based on the hybrid method of Nilges et al. (1988) and described briefly by Downing et al. (1992). The protocol consists of 3 stages. The first stage, at 1000 K, scales the weights of the NOE, restrained dihedral and non-bonded terms from initially small values to more realistic ones. The second stage performs a slow cooling of the system to 300 K. The final stage is an energy minimization. Bond lengths are constrained in all but the minimization stage using the SHAKE algorithm (Ryckaert et al., 1977; van Gunsteren & Berendsen, 1977), with a comparatively high tolerance of 10⁻² for the geometrical accuracy term. This, in combination with increased proton masses, allows longer time steps, of up to 5 fs, to be taken. A less rigid force field than originally described was also used, with the force constants for bonds (k_bon), chirality constraints (k_chi) and planarity constraints (k_pla) reduced to 140 kcal mol⁻¹ Å⁻², 40 kcal mol⁻¹ rad⁻² and 40 kcal mol⁻¹ rad⁻², respectively. The extra freedom given to the system allows it to respond more rapidly to changes in potential energy as the repulsive van der Waals, NOE and restrained dihedral terms are increased toward their final values during the first stage of the refinement. A further difference is the linear increase of the van der Waals scaling factor r_cons. The original protocol increased r_cons exponentially from an initial value of 0.001 to 0.25 cal mol⁻¹ Å⁻⁴. This results in most of the conformational searching occurring in a force field where the atoms are unrealistically small, so that the available conformational space is correspondingly exaggerated. With linear scaling at this stage, a longer search of conformational space takes place at values of r_cons nearer to its final value, improving the efficiency of the protocol in general. During the cooling stage there are 2 further noteworthy changes in the protocol. The first is that k_repel is reduced gradually from 1.0 to its final value of 0.8.
A more rapid change was found to produce structures which, although they agreed very well with the experimental data and had low total energies in the simplified repel force field used, contained poor non-bonded contacts when placed in a full force field such as that of CHARMm (Brooks et al., 1983). This was particularly noticeable in structure calculations where the quality of the n.m.r. data was low (T. S. Harvey, unpublished results). Slowly reducing k_repel was found to overcome this problem. The second change concerns the method for cooling the system from its initial temperature of 1000 K to 300 K. This was previously accomplished by successively reducing the temperature in 25 K steps. For this method to work efficiently, the system must reach equilibrium at each temperature. Given the limited time of the simulation, this is often not the case, and incomplete sampling will tend to favour conformations found at elevated temperatures. With these considerations in mind, this protocol cools the system relatively rapidly, by coupling to a bath at 300 K with a coupling time constant of 0.05 ps (Berendsen et al., 1984), and then performs an extended search of conformational space at 300 K. The protocol described above achieves an order of magnitude improvement in efficiency over that described previously (Nilges et al., 1988).
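Two of the scheduling choices discussed above can be compared numerically in a small sketch. The first part ramps r_cons from 0.001 to 0.25 (the start and end values quoted for the original exponential schedule) and counts the fraction of steps spent at or above half the final value, an arbitrary yardstick chosen here for illustration. The second part applies only the Berendsen weak-coupling velocity rescaling from 1000 K towards a 300 K bath; the coupling constant of 0.05 ps and the 5 fs step are assumed inputs, and all other heat flow in a real simulation is ignored. Function names are hypothetical.

```python
def linear_ramp(start, end, fraction):
    """Value after a given fraction (0..1) of a linear ramp."""
    return start + (end - start) * fraction


def exponential_ramp(start, end, fraction):
    """Value after a given fraction (0..1) of a geometric (exponential) ramp."""
    return start * (end / start) ** fraction


def fraction_of_steps_at_or_above(ramp, start, end, threshold, n_steps=1000):
    """Fraction of equally spaced ramp steps whose value is >= threshold."""
    hits = sum(1 for s in range(n_steps + 1)
               if ramp(start, end, s / n_steps) >= threshold)
    return hits / (n_steps + 1)


def berendsen_cooling(t_start, t_bath, tau_ps, dt_ps, n_steps):
    """Temperature history under Berendsen weak coupling alone.

    Each step rescales velocities by lambda, with
    lambda**2 = 1 + (dt/tau) * (T_bath/T - 1), so the kinetic temperature
    becomes T + (dt/tau) * (T_bath - T).
    """
    temps = [t_start]
    t = t_start
    for _ in range(n_steps):
        lam_sq = 1.0 + (dt_ps / tau_ps) * (t_bath / t - 1.0)
        t *= lam_sq
        temps.append(t)
    return temps


if __name__ == "__main__":
    start, end = 0.001, 0.25
    half = end / 2
    print(fraction_of_steps_at_or_above(linear_ramp, start, end, half))       # about 0.50
    print(fraction_of_steps_at_or_above(exponential_ramp, start, end, half))  # about 0.13
    temps = berendsen_cooling(1000.0, 300.0, tau_ps=0.05, dt_ps=0.005, n_steps=63)
    print(round(temps[-1], 1))  # within 1 K of the 300 K bath after about 0.3 ps
```

Under these assumptions the linear ramp spends about half of its steps in the upper half of the r_cons range against roughly an eighth for the exponential one, and the bath coupling removes the 700 K excess with a time constant of the order of τ rather than over many 25 K equilibration steps.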
3. Results
(a) Assignment of the 15N chemical shifts of [15N]EGF
The spectra of [15N]EGF showed good dispersion, with only a few overlapping peaks due to chemical shift degeneracy in both 1H and 15N dimensions. It was therefore straightforward to assign these spectra from a combination of heteronuclear COSY and NOESY spectra, together with information from the assignments of the proton chemical shifts of unlabelled hEGF (Cooke et al., 1990). Figure 1 shows a section of an HSQC-NOESY spectrum of [15N]EGF in which most of the sequential NH-NH connectivities seen in hEGF are depicted. NOEs linking residues 3 to 6, 8 to 14, 24 to 28, 41 to 44 and 47 to 52 correspond well to positions within the protein where turns and helical regions are expected to occur, and analogous connectivities are also visible in the homonuclear spectra. The 15N shifts of all other residues, which are mainly those within the β-strands of the protein, were identified either from their sequential CαH-NH peaks in the HSQC-NOESY spectra or from their direct correlation to the intraresidue CαH. Delineating direct and sequential connectivities for each residue also provided independent information about the proton chemical shifts of this hEGF derivative, which lacks the C-terminal arginine residue. This revealed differences in the proton chemical shifts of some residues at the carboxy-terminal end of the molecule and at a site (Val35, Gly36) that interacts with it. The assigned 15N chemical shifts are summarized in Table 1.

Figure 1. A spectrum depicting the sequential assignment of [15N]EGF. The plot represents part of a 250 ms HSQC-NOESY spectrum in which the observed connectivities between adjacent amide protons are highlighted (horizontal axis: F2, p.p.m.).

Table 1
Chemical shifts and ³J_HNα coupling constants of [15N]EGF
Residue   δNH (p.p.m.)   δ15N (p.p.m.)   ³J_HNα (Hz)
Ser2      –              116.7           6.9
Asp3      8.50           121.0           7.3
Ser4      8.46           115.5           –
Glu5      8.25           121.3           6.3
Cys6      8.67           120.1           6.1
Leu8      8.39           122.8           3.6
Ser9      8.24           113.1           4.1
His10     8.31           118.9           9.2
Asp11     8.03           120.6           4.7
Gly12     8.56           –               ᵃ
Tyr13     7.63           120.8           4.4
Cys14     8.50           114.6           8.1
Leu15     8.03           123.4           6.6
His16     8.65           113.5           6.5
Asp17     8.98           112.9           7.0
Gly18     7.20           103.6           ᵃ
Val19     8.19           120.6           9.0
Cys20     8.88           128.9           6.3
Met21     9.44           128.3           –
Tyr22     8.74           123.3           8.4
Ile23     8.43           129.0           9.4
Glu24     8.24           128.5           –
Ala25     8.80           118.8           3.4
Leu26     6.53           111.9           9.8
Asp27     7.78           118.0           7.4
Lys28     6.82           114.4           8.8
Tyr29     8.46           121.0           9.1
Ala30     9.36           125.2           8.3
Cys31     8.84           115.2           9.0
Asn32     9.44           124.1           9.5
Cys33     8.94           124.4           6.5
Val34     8.61           121.0           7.0
Val35     7.84           121.3           3.0
Gly36     8.22           182.1           ᵃ
Tyr37     8.16           120.9           9.6
Ile38     9.24           115.9           –
Gly39     8.01           106.9           ᵃ
Glu40     9.29           122.9           3.0
Arg41     8.44           113.6           9.5
Cys42     7.77           111.8           7.0
Gln43     9.69           116.9           5.4
Tyr44     8.71           121.5           ᵇ
Arg45     8.71           121.8           ᵇ
Asp46     –              122.8           7.3
Leu47     8.22           124.7           5.1
Lys48     7.96           118.2           6.3
Trp49     7.47           119.5           5.3
Trp50     7.13           118.6           6.4
Glu51     7.49           119.4           –
Leu52     7.70           123.6           7.4
The 1H chemical shifts are referenced to TMS, while those of the 15N nuclei are referenced to NH3 (Live et al., 1984). The coupling constants were measured as described in Materials and Methods.
ᵃ The HMQC lineshapes of resonances corresponding to glycine residues include contributions from additional couplings and so were not fitted with this procedure.
ᵇ Severe spectral overlap of these residues in the HMQC spectra prevented a reliable measurement of their coupling constants.

(b) Distance restraints
The main data input for the calculation of the protein structures came from NOE measurements and dihedral restraints, as described below. Figure 2 shows part of a homonuclear NOESY spectrum acquired without presaturation, allowing the observation of CαH protons near the water resonance. Several structurally important connectivities unobserved in earlier studies on hEGF(1-48) can be seen, namely the sequential d_αN connectivities between Ser4 and Glu5, Cys33 and Val34, and Ala30 and Cys31. The importance of this improvement in spectral performance is further illustrated by the acquisition of new secondary structural information for the region between His10 and Tyr13. New medium-range (i, i+2) and (i, i+3) NOEs from His10 and Asp11 to Tyr13 indicate the presence of a short helical segment. Similarly, the functionally important sequence between Ile38 and Gln43 is now much better defined by the identification of sequential NOEs from Ile38 to Gly39 and from Arg41 to Cys42.

Figure 2. Part of a 200 ms NOESY spectrum of hEGF acquired at 15°C. Connectivities are indicated.
Figure 3. A summary of the observed internuclear distances between backbone and backbone, backbone and side-chain, and side-chain and side-chain atoms. In cases where NOEs of different strengths were found, the symbol represents the strongest observed. Helical segments (α) and β-strands (β) are also shown. A division of the plot at residue 34 emphasizes the two-domain structure of hEGF.

A survey of the spatial distribution of non-sequential distance restraints used in the structure calculation is shown in the diagonal plot of Figure 3. This Figure emphasizes the two-domain topology of hEGF, indicated by two stretches of NOEs running perpendicular to the main diagonal, which represent interactions within the major (five-residue strand) and minor (two-residue strand) β-sheets, respectively. Besides these secondary structure connectivities, there are a number of tertiary ones linking the carboxy-terminal pentapeptide to residues 35 to 37. Few long-range NOEs have been found involving the amino-terminal residues 1 to 12, and this part of the molecule is, in terms of the experimental data, completely uncoupled from the carboxy-terminal domain, which results in a relatively poorly defined relative orientation, as will be shown below. A similar situation applies to the segment between residues 17 and 33, i.e. the major β-sheet. In contrast, a number of long-range NOEs have been found connecting residues 13 to 16 with residues 40 to 43. Specifically, new, structurally significant NOEs between Arg41 and both Tyr13 and Leu15 have been obtained.
(c) Dihedral restraints
The side-chain orientation of 14 out of 25 AMX-type residues of hEGF was investigated by measuring vicinal coupling constants between CαH and CβH protons. Eleven residues could not be analysed in this way owing to chemical shift degeneracy of the CβH proton resonances or to spectral overlap. For the remaining residues, stereospecific assignments were made using STEREOSEARCH (see Materials and Methods). For six residues the combination of observed coupling constants (≈7 Hz) suggested rapid mixing of different staggered rotamers (Nagayama & Wüthrich, 1981), and hence the corresponding methylene protons were not stereospecifically assigned. Further stereospecific assignments were achieved for the methyl groups of two of the three valine residues (Val19, Val35) and for the methine protons of both isoleucine residues (Ile23, Ile38) from measurements of ³J_αβ and intraresidue distances. At a later stage of the refinement process, the CδH3 methyl group resonances of two of the five leucine residues (Leu26, Leu15) were also stereospecifically assigned, since it was possible to correlate experimentally observed differences in their intra- and interresidue distances with those measured in the calculated structures. For the determination of ³J_HNα coupling constants, traces parallel to F1 were taken from HMQC-J spectra through each cross peak containing the amide proton resonance of a particular residue. These traces were analysed using a non-linear least-squares fitting procedure (Redfield & Dobson, 1990). A wide range of different starting values for the fitted parameter ³J_HNα was chosen, but the result was in most cases independent of the initial guess values and variations were usually less
Figure 4. The distribution of the 127 calculated structures as a function of NOE restraint energy, grouped into classes with divisions every 12.4 kJ mol⁻¹; the inset shows the major peak using a finer class division. The classes containing the final 44 hEGF structures are shown filled.

Table 2
Structural statistics and atomic r.m.s. differences

A. Structural statistics
                                                 ⟨SA⟩ᵃ               (⟨SA⟩)r
R.m.s. deviations from experimental distance restraints (Å)ᵇ
  All (572)                                      0.042 (0.001)       0.036
  Sequential (189)                               0.048 (0.002)       0.042
  Short range (79)                               0.044 (0.004)       0.033
  Long range (157)                               0.028 (0.003)       0.026
  H-bond (18)ᶜ                                   0.027 (0.008)       –
R.m.s. deviations from experimental
  dihedral restraints (°)ᵇ                       0.46 (0.09)         0.285
F_NOE (kJ mol⁻¹)ᵈ                                128.3 (7.98)        91.33
F_tor (kJ mol⁻¹)ᵈ                                5.43 (2.84)         1.84
F_repel (kJ mol⁻¹)ᵉ                              64.4 (7.40)         42.93
E_L-J (kJ mol⁻¹)ᶠ                                −470.25 (33.15)     −397.4
Deviations from idealized geometry
  Bonds (838) (Å)                                0.009 (0.0001)      0.009
  Angles (1495) (°)                              2.26 (0.004)        2.246
  Impropers (402) (°)                            0.90 (0.010)        0.876

B. Atomic r.m.s. differences (Å)
                   Residues 1-53   Residues 12-50   Residues 12-34   Residues 35-50
Backbone atoms     2.25 ± 0.64     0.92 ± 0.36      0.68 ± 0.18      –
All heavy atoms    2.74 ± 0.57     1.47 ± 0.30      –                –
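Table 2B reports average pairwise r.m.s. differences over the ensemble. A sketch of how such numbers can be computed is given below, assuming NumPy, a Kabsch (SVD-based) least-squares superposition, and that each pair is fitted on the same atom subset on which the deviation is measured; these fitting details are assumptions rather than a description of the software actually used, and the function names are hypothetical.

```python
import itertools

import numpy as np


def superposed_rmsd(p, q):
    """R.m.s. deviation between two (N, 3) coordinate arrays after optimal
    rigid-body superposition (Kabsch algorithm; reflections are excluded)."""
    p = p - p.mean(axis=0)
    q = q - q.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # -1 would indicate an improper rotation
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rot.T - q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def mean_pairwise_rmsd(ensemble, atom_indices):
    """Mean and standard deviation over all pairs of structures, fitting and
    measuring on the same atom subset (e.g. backbone atoms of residues 12-50)."""
    values = [superposed_rmsd(a[atom_indices], b[atom_indices])
              for a, b in itertools.combinations(ensemble, 2)]
    return float(np.mean(values)), float(np.std(values))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.normal(size=(40, 3))
    # Toy ensemble: randomly displaced copies of one reference (not real hEGF data).
    ensemble = [reference + rng.normal(scale=0.3, size=reference.shape) for _ in range(5)]
    mean, sd = mean_pairwise_rmsd(ensemble, np.arange(40))
    print(f"{mean:.2f} ± {sd:.2f}")
```

Restricting atom_indices to a residue range reproduces the logic of the separate columns in Table 2B: a well-defined subregion gives a small value even when the whole chain does not superimpose well.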
The calculation of 140 structures was started on the basis of 572 NOE-derived internuclear distance restraints, including 189 sequential, 79 short-range (1 < |i−j| ≤ 5) and 157 long-range (|i−j| > 5), 101 dihedral restraints (44 φ, 40 ψ and 17 χ1) and 18 restraints defining nine hydrogen bonds. Of these 140 runs, 23 failed to complete the initial structure generation protocol because of crashes in the SHAKE algorithm. The resulting structures were then refined using the faster protocol described earlier. The distribution of the NOE energies of the final 127 structures is shown in Figure 4, with a finer class division of the major peak shown inset. Figure 4 shows that many structures agree poorly with the experimental data. This is in contrast to work on other systems (Downing et al., 1992; Norman et al., 1991), where no misfolded structures were produced. The more detailed breakdown of the distribution in Figure 4 (inset) shows that there are clearly two populations of structures within the main peak. The group with the higher NOE energies exhibits one or two larger violations (over 0.3 Å) when compared to the lower energy group, all of which had NOE energies less than 160 kJ mol⁻¹. Using this objective criterion, the lower energy group of 44 final structures was selected as representative of the solution structure of hEGF. It is possible to divide this group further by similar criteria, but this is unjustifiable in view of the assumed experimental error of at least 0.5 Å in upper bound NOE distances. Table 2 gives a summary of the various energy terms and deviations from idealized geometry. All of the structures show good covalent geometry and good agreement with the experimental distance restraints. There are no distance restraint violations over 0.3 Å and no dihedral angle restraint violations greater than 3°. An assessment of the φ, ψ space covered by the calculated structures is shown in the Ramachandran plot of Figure 5. Most residues adopt φ, ψ combinations allowed for non-glycine residues (Richardson, 1981). Few residues appear in the first quadrant (+φ, +ψ), and notably all of those residues are located at turns classified as type I (Asp27) and a distorted type I′ (His16, Asp17), or near a β-bulge (Cys42) (Richardson et al., 1978; Sibanda et al., 1989; Wilmot & Thornton, 1990).

Notes to Table 2
ᵃ ⟨SA⟩ refers to the 44 final structures. (⟨SA⟩)r is the mean structure obtained by averaging the co-ordinates of the final structures best fitted to each other over the backbone atoms of residues 12 to 50, with the co-ordinates subsequently minimized with restraints.
ᵇ None of the final structures exhibited distance restraint violations greater than 0.3 Å or dihedral angle violations over 3°.
ᶜ There are 2 distance restraints for each hydrogen bond, d_HN-O (2.3 Å) and d_N-O (3.3 Å). These were included as restraints after they were unambiguously assigned as being involved in regular secondary structural interactions.
ᵈ The final values of the square-well NOE and torsion angle potentials are calculated with force constants of 50 kcal mol⁻¹ Å⁻² and 200 kcal mol⁻¹ rad⁻², respectively.
ᵉ The quadratic van der Waals term was calculated with a force constant of 4 kcal mol⁻¹ Å⁻⁴, with the van der Waals radii set to 0.8 times the standard values used in the CHARMm empirical energy function (Brooks et al., 1983).
ᶠ E_L-J was calculated using the CHARMm empirical energy function; since it is not included explicitly in the target function during the refinement protocol, it serves as an indication of the quality of the non-bonded contacts within the structures.

(e) Description of the three-dimensional structure
The overall appearance of hEGF is dominated by its two-domain structure-(Fig. 6(a)); i.e. two anti-
Human Epidermal Growth Factor Structure
Table 3 Hydrogen bonds Residue
Figure 5. A Ramachandran plot for the final 44 structures; the inset is an expanded view of the (+ , + ) quadrant indicating the location of His16 (Cl), Asp17 (A) and cys43 (53).
parallel P-sheets (Va119 to Asn32, Tyr37 to Arg45), which are linked by a tight turn of type II (Va134 to Gly37). Both P-sheets have a right-handed twist (Chothia, 1984). No regular secondary structure was found in the amino-terminal tail comprising residues up to the first cysteine. This observation is in accordance with the lack of non-sequential NOES (Fig. 3) and much larger r.m.s. deviations (Table 2), indicating the structural disorder in that part of the protein. In addition, two helical segments (Leu8 to Tyrl3, Leu47 to Glu51) have been found in the structures reported here. While the first of these seems to be only loosely connected to the major P-sheet via cysteine bridges 6-20 and 14-31, the second shows a number of NOES connecting it to the type II turn at the carboxy-terminal tail of the protein. This helix displays some amphipathic character with Lys48 and Glu51 on the hydrophilic and Leu47, Trp50 and Trp49, Leu52 on the hydrophobic face. As shown in Figure 6(b), this helix is also involved in the formation of a hydrophobic core clustering Va134, Arg45, Trp50 around the conserved residue Tyr37. The interface between the amino and carboxyterminal domain of hEGF is composed of residues 13 to 16: 37 and 41 to 43. A number of stereospecific assignments for p-protons (Cysl4, Leu15, Hisl6, Tyr37 and Cys42) and &protons (Leul5), as well as the wealth of additional inter-domain NOES (c.f. Fig. 3) allow a detailed description of its threedimensional structure (Fig. 6(b)). This interface also comprises most of the highly conserved residues, and thus its three-dimensional conformation is of considerable interest for comparison with other EGF-like proteins (see Discussion). There is, in general, good agreement between the observation of slowly exchanging amide protons and the structure described above. 
We found 21 residues whose amide protons exchanged slowly with solvent deuterons; these residues are listed in Table 3, together with the potential hydrogen-bonding partners identified by analysis of the three-dimensional structures. Only nine of these hydrogen bonds were used in the structure calculation. Four amide protons make contacts within tight turns (Gly18, Leu26 and Asp27) and the β-bulge (Gln43). A number of the additional hydrogen bonds are located at the interface between the two domains of hEGF (Leu15-HN, Cys31-HN, Val34-HN).

Table 3
Slowly exchanging amide   Potential hydrogen-bonding partner
Leu15                     Arg41 CO
Gly18                     Cys14 CO
Val19ᵃ                    Asn32 CO
Met21ᵃ                    Ala30 CO
Tyr22                     Glu? CO
Ile23ᵃ                    Lys28 CO
Leu26                     Ile23 CO
Asp27                     Glu24 CO
Lys28ᵃ                    Ile23 CO
Tyr29                     Tyr29 OH
Ala30ᵃ                    Met21 CO
Cys31                     Gly39 CO
Asn32ᵃ                    Val19 CO
Val34                     His16 CO
Tyr37ᵃ                    Val34 CO
Ile38ᵃ                    Tyr44 CO
Gly39                     ᵇ
Cys42                     ᵇ
Gln43                     Ile38 CO
Tyr44ᵃ                    Ile38 CO

Hydrogen bonds were assigned only on a qualitative basis, when they gave significantly negative energies calculated with the CHARMm empirical hydrogen-bond potential.
ᵃ Hydrogen bonds used in the structure calculation as distance restraints (see Materials and Methods).
ᵇ No specific hydrogen-bonding partner could be identified in any of the 44 structures.
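The precision figures used throughout this paper (for example 0.23 Å for residues 34 to 50 and 2.25 Å for all 53 residues) are average pairwise backbone r.m.s. deviations, each computed after best-fitting one structure onto another over a chosen set of residues. The sketch below shows that calculation in Python with NumPy. It assumes the Kabsch SVD method for the best fit (the text does not say which fitting algorithm was used) and coordinates already extracted as (N, 3) arrays; `kabsch_rmsd`, `mean_pairwise_rmsd`, `random_rotation` and the toy ensemble are hypothetical illustrations, not the authors' code.

```python
import itertools

import numpy as np


def kabsch_rmsd(a: np.ndarray, b: np.ndarray) -> float:
    """r.m.s. deviation between two (N, 3) coordinate sets after best-fit superposition.

    Kabsch method: move both sets to their centroids, then apply the proper
    rotation (no reflection) that minimises the summed squared distances.
    """
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    u, _, vt = np.linalg.svd(a.T @ b)        # SVD of the 3x3 covariance matrix
    d = np.sign(np.linalg.det(u @ vt))        # -1 would signal a reflection
    rot = u @ np.diag([1.0, 1.0, d]) @ vt     # optimal rotation for row vectors
    diff = a @ rot - b
    return float(np.sqrt((diff * diff).sum() / len(a)))


def mean_pairwise_rmsd(ensemble, atom_idx) -> float:
    """Average kabsch_rmsd over all distinct pairs of structures in `ensemble`,
    fitting and measuring only the atoms selected by `atom_idx`."""
    coords = [np.asarray(x, dtype=float)[atom_idx] for x in ensemble]
    pairs = itertools.combinations(coords, 2)
    return float(np.mean([kabsch_rmsd(p, q) for p, q in pairs]))


def random_rotation(rng: np.random.Generator) -> np.ndarray:
    """Random proper rotation matrix from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q @ np.diag(np.sign(np.diag(r)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]                    # flip one axis to make det = +1
    return q


if __name__ == "__main__":
    # Hypothetical toy ensemble: 10 copies of a 30-atom model, each randomly
    # rotated and translated; atoms 0-9 (a "floppy tail") also get random noise.
    rng = np.random.default_rng(1)
    model = rng.normal(scale=5.0, size=(30, 3))
    ensemble = []
    for _ in range(10):
        x = model.copy()
        x[:10] += rng.normal(scale=2.0, size=(10, 3))
        ensemble.append(x @ random_rotation(rng).T + rng.normal(size=3))
    core = np.arange(10, 30)
    print(f"core atoms only: {mean_pairwise_rmsd(ensemble, core):.3f}")
    print(f"all atoms:       {mean_pairwise_rmsd(ensemble, np.arange(30)):.3f}")
```

Fitting on the rigid core atoms alone gives a near-zero value for the toy ensemble, while including the noisy tail raises the average, in the same way that the precision of hEGF drops when the disordered amino-terminal residues are included in the fit.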
of the final 44 structures, using the backbone atoms of residues 12 to 50 (a) and of residues 13 to 16 and 34 to 50 (b). The average pairwise r.m.s. distance is 0.92 Å and 0.4 Å, respectively. The positions and relative orientations of the N and C-terminal domains are exceptionally well defined.

4. Discussion

Several solution structures of hEGF and of its functional homologues murine EGF and hTGFα have recently been described. Because of their relatively low resolution, none of these studies allowed a detailed description of EGF at the atomic level. Here we have extended previous work on the structure/function relationships of EGF by acquiring a completely new data set for native hEGF and for a ¹⁵N-labelled recombinant derivative. Considering the relatively large number of structural restraints used in the calculation protocol, the overall precision of the calculated structures was lower than might have been expected. This is, however, readily understood from the distribution of interproton distance restraints along the amino acid sequence (cf. Fig. 3). Two major problems are apparent: (1) the lack of long-range NOEs for residues 1 to 12, and (2) the lack of contacts between the two major stretches of secondary structure, i.e. the major and the minor β-sheet. As illustrated in Figure 6(a) and Table 2, both deficiencies reduce the overall precision of the calculated structures. For instance, while the r.m.s. deviation for the carboxy-terminal domain (residues 34 to 50) is as low as 0.23 Å, it increases to 0.92 Å when the bulk of the
protein (residues 12 to 50) is included, and rises further to 2.25 Å for all 53 residues. A similar observation was made in the studies of hTGFα (Harvey et al., 1991) and murine EGF (Montelione et al., 1992), so this seems to be an intrinsic property of the solution structures of EGF-like growth factors.

Table 4
… of hEGF and hTGFαᵃ
hEGF     hTGFα    hEGF (°)   hTGFα (°)
Tyr13    Phe15    +175       +174
Cys14    Cys16    -75        -65
Leu15    Phe17    -48        -63
His16    His18    -34        -50
Arg41    Arg42    -93        -68
Cys42    Cys43    -43        -61
Gln43    Glu44    -165       +5?
ᵃ Angles are given in degrees; values for hTGFα are taken from the average minimized structure reported by Harvey et al. (1991).

Figure 7. Structural comparison of hEGF and hTGFα. The Figure shows an overlay of the best fit for the backbone atoms of hEGF (cyan, residues 13 to 15, 31 to 37 and 41 to 47) and hTGFα (blue, residues 15 to 17, 32 to 38 and 42 to 48), with an r.m.s. deviation of 0.71 Å. Note that a large part of both molecules is included in this comparison. Several conserved residues at the domain-domain interface of both growth factors are highlighted in green and labelled according to hEGF. The structurally conserved cysteine residues are coloured yellow.

The general polypeptide fold of proteins belonging to the EGF family has been described extensively, but none of the published structures has provided much insight into the conformation of any putative receptor-binding site (Campbell et al., 1989, and references therein). We have previously proposed a receptor-binding patch of EGF on the basis of low-resolution structures of hEGF(1-48) and hTGFα and of sequence comparisons of proteins with EGF-like activity. With the present three-dimensional model it is now possible to look more closely at the structural details of this site. Mutational studies have so far revealed only two residues that are absolutely required for EGF to bind to its receptor: Arg41 and Leu47 (Defeo-Jones et al., 1989; Moy et al., 1989; Engler et al., 1990; Dudgeon et al., 1990; Hommel et al., 1991). The three-dimensional structure shows both residues to be located in the carboxy-terminal domain of hEGF (Fig. 6(b)). While Arg41 is situated in the highly conserved domain-domain interface of hEGF, Leu47 lies far away from this site, on a different face of the molecule. Close inspection of this surface region shows that the side-chain of Arg41 lies in a hydrophobic pocket near the domain-domain interface that includes residues Tyr13 and Leu15. Arginines are found in this kind of restrained conformation more often than lysines (Richardson, 1981) and, consistent with this observation, substitution of Arg41 by Lys almost completely abolishes EGF-like activity in both murine EGF and hTGFα (Engler et al., 1990; Defeo-Jones et al., 1989). This suggests that the guanidinium group in particular might be involved in specific contacts with the receptor molecule. Further structural stabilization of this interdomain interface may come from a number of hydrogen-bonding interactions (Table 3). For instance, the backbone amide group of Leu15 is in close contact with the carbonyl group of Arg41, and the observation of a slowly exchanging amide proton of Leu15 in several EGFs provides evidence for a hydrogen bond between the two domains (Montelione et al., 1988; Mayo et al., 1989); indeed, a similar interaction is found in hTGFα (Harvey et al., 1991). Other potential hydrogen-bond interactions include Arg14(H”)-Tyr13(CO), Arg41(H