Secondary-Structure Dependent Chemical Shifts in Proteins
MICHAEL P. WILLIAMSON*
Physical Methods Department, Roche Products Ltd., PO Box 8, Welwyn Garden City, Herts AL7 3AY, United Kingdom
Chemical shift data have been collected on eight proteins that have the same conformation in solution as in their crystal structures. Ring-current shifts have been calculated and subtracted from the experimentally measured shifts, to leave shifts that depend only on local conformation. Overall, the shifts show an approximately normal distribution with no appreciable skewness, thus confirming that ring-current shifts have the overall effect of skewing the distribution to high field. In helices, NH and CαH have a highly significant tendency to resonate to high field, whereas they resonate to low field in β-sheets. Side-chain protons resonate slightly to high field in β-sheets. Chemical shift distributions are narrowest for side-chain protons, and widest for amide protons. When only slowly exchanging amide protons are considered, the high field shift for amide protons in helices is more pronounced, but there is only a small difference in sheets. CαH signals at the N-terminal end of helices tend to resonate to higher field than those at the C-terminal end, whereas for NH signals it is the C-terminal end that resonates to higher field. There is no significant effect of position within the helix on side-chain signals, implying that the helix dipole has little effect on shifts within the helix.
INTRODUCTION
The measurement and calculation of chemical shifts played an important part in early nmr studies of proteins, because of the simplicity of measuring chemical shifts. It has long been recognized that the local fields caused by aromatic rings could be calculated with a good degree of precision, to give useful estimates of the chemical shifts of many side-chain methyl groups, and such calculations have now been developed extensively, for example, to account for the averaged ring-current shifts caused by motion in proteins.3 However, it is abundantly clear that there are other influences on chemical shifts in proteins besides aromatic ring currents, most of which are still undefined, so that the attractive goal of using chemical shift values in proteins to predict and refine their structures is still some way away.4 Nevertheless, some progress has been made toward this goal.1,2
© 1990 John Wiley & Sons, Inc. CCC 0006-3525/90/10-111423.09 $04.00 Biopolymers, Vol. 29, 1423-1431 (1990)
* Present address: Dept. of Molecular Biology and Biotechnology, University of Sheffield, Sheffield S10 2TN, U.K.
A good correlation has been found between the chemical shift of NH or CαH and the inverse third power of the distance to its nearest carbonyl oxygen.5,6 This correlation will no doubt be of considerable usefulness in refining protein structures, and has already been used in verifying a protein structure determined by nmr data,7 but is limited at present by the high level of accuracy required from the structure before the correlation can be applied. Of less ultimate utility but of greater applicability at present is the tendency, observed by several authors,8-10 for amide and Cα protons to resonate to high field of their random-coil values in helices and to low field in sheets. In this paper, we have attempted to verify and quantify this general tendency by a more rigorous and statistical treatment than has been used to date, and to determine what aspects of regular secondary structure have the greatest influence on chemical shifts. To this end, we have calculated and subtracted out the ring-current shifts expected to be produced by the aromatic systems present. In the rest of this paper, the chemical shift obtained by subtracting the sum of random-coil shift plus ring-current shift from the experimental shift is referred to as the residual shift, and is defined such that a positive residual shift implies an experimental shift to lower field than the random-coil shift.
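The definition of the residual shift can be illustrated with a minimal Python sketch. The function names and the numerical values are hypothetical and are not taken from the paper; the sketch also assumes that all three shifts are expressed in ppm on the usual δ scale, so that a larger value means lower field.

```python
def residual_shift(experimental_ppm, random_coil_ppm, ring_current_ppm):
    """Residual shift = experimental - (random-coil + ring-current), in ppm.

    Assumption: all inputs are on the delta scale, where a larger value
    corresponds to lower field.
    """
    return experimental_ppm - (random_coil_ppm + ring_current_ppm)


def field_direction(residual_ppm):
    """Describe the sign convention used in the text for a residual shift."""
    if residual_ppm > 0:
        return "low field"   # positive residual: to lower field than random coil
    if residual_ppm < 0:
        return "high field"  # negative residual: to higher field than random coil
    return "no shift"


# Hypothetical alpha-proton values, for illustration only.
example = residual_shift(experimental_ppm=4.10,
                         random_coil_ppm=4.35,
                         ring_current_ppm=-0.05)
print(round(example, 2), field_direction(example))
```

With these illustrative numbers the residual is negative, which under the convention above corresponds to a shift to high field of the random-coil value.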
METHODS
Proteins were selected for inclusion in the data base only when crystal structures were also available, and when the structures in solution have been shown to be essentially the same as the crystal structure. This meant that metallothionein-2a was not included, as its crystal structure is completely different from its solution structure, and that parts of the sequence of C3a were not included, as they are different from the crystal structure. Thioredoxin has both a crystal structure and nmr data, but the crystal data in the Brookhaven Data Bank only include the Cα coordinates, and so it was not included. The crystal data used were taken from the Brookhaven Data Bank, with the exception of C3a and Tendamistat, for which the coordinates were kindly provided by Prof. Huber. Where there was a choice of crystal structure, the highest resolution structure was used. The proteins included are listed in Table I. Ideally, all data would refer to protein solutions at the same temperature and pH. Inevitably, this was not possible, which no doubt accounts for some of the variability in the data, especially for amide protons. Where a choice could be made [with basic pancreatic trypsin inhibitor (BPTI),16,30 C3a,18,31 and ubiquitin], the data set was preferred that was nearest to pH 4.5 and 35°C (the temperature at which the random-coil data were collected). Where the assignment was uncertain, or where only
one of a methylene pair was assigned, the chemical shifts were not included. The chemical shifts for methylene pairs were averaged, because of the lack (in most cases) of stereospecific assignments. The random-coil shift values used were those given by Bundi and Wüthrich.33

The locations of secondary structure for the proteins were those listed in the nmr papers giving the assignment and/or structure determinations of the proteins. Where no delineation of secondary structure was given, the assignments of Kabsch and Sander34 were used. No attempt has been made to distinguish between the different types of helix or sheet. The locations of slowly exchanging amide protons are those given in the references listed in Table I. In two cases (BPTI and Tendamistat) detailed exchange rates are available, while for lysozyme, the exchange rates are listed as slow, intermediate, or fast. Statistical analyses were made with various categories for exchange rates in these proteins, but with almost identical results, and therefore slow exchange is here taken to include the intermediate exchange rates found in lysozyme, and all rates slower than 0.4 min⁻¹ in Tendamistat (50°C, pD 3.0) and 0.0025 min⁻¹ in BPTI (36°C, pD 3.5).

Hydrogen atoms were added to the protein structures using standard bond lengths and angles, adding the protons of methyl groups in staggered conformation. This was not necessary for BPTI, as the structure used was refined jointly from x-ray and neutron diffraction data, and had deuterium atom positions, which were used without correction. Ring-current shifts were calculated using the parameters of Johnson and Bovey,35 with the ring-current intensity factors of Giessner-Prettre and Pullman.36 Ring-current shifts were calculated for each proton in the structure, and then averaged for methyl groups, methylene pairs, and the 3,5 and 2,6 protons of tyrosine and phenylalanine. Thus, only a single value is generated for the CαH shift in glycine, the CβH shift in long-chain amino acids, and so on. Because of the inherent errors in ring-current calculations (arising mainly from errors and motions in the positions of the rings), chemical shift data were excluded when the calculated ring-current shifts exceeded ±0.5 ppm. The addition of hydrogen atoms and calculation of ring-current shifts was done on a Vax 11/750, using the program VNMR.37

Statistical analyses were done on a Vax 8820, using the program RS1 (version 3.0). For normal distributions, significance levels were calculated using the Student's t test, while for non-normal distributions, the t test is not appropriate38 and Wilcoxon's signed rank test was used for comparing a single sample and the Mann-Whitney U test for comparing two samples.

Table I. Proteins Used for Chemical Shift Data Base
Protein | Source | pH | Temp. (°C) | Refs.
Carboxypeptidase inhibitor | | | |
Ovomucoid third domain | | | |
Basic pancreatic trypsin inhibitor | | 4.6 | 36 | 16, 17
C3a | Man | 5.5 | 27 | 18, 19
Tendamistat (α-amylase inhibitor) | Streptomyces tendae | 3.2 | 50 | 20, 21
Ubiquitin | Man | | |
Ribonuclease T1 | Aspergillus oryzae | | |
Lysozyme | Hen | | |
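The data-preparation rules described in the Methods (averaging over methylene pairs and other equivalent groups, discarding groups with an unassigned proton, and excluding data with large calculated ring-current shifts) can be sketched in Python as follows. This is an illustrative sketch only: the function and variable names are hypothetical, the input values are invented, and it assumes that the 0.5 ppm limit is applied to the group-averaged ring-current shift, which the text does not state explicitly.

```python
from statistics import mean

# Exclusion limit on the calculated ring-current shift, as stated in the text.
RING_CURRENT_LIMIT_PPM = 0.5


def prepare_group(experimental_ppm, ring_current_ppm):
    """Average one group of equivalent protons, or return None if excluded.

    experimental_ppm: measured shifts for the group (e.g. a methylene pair);
                      None marks a proton that was not assigned.
    ring_current_ppm: calculated ring-current shift for each proton.
    Returns (averaged experimental shift, averaged ring-current shift).
    """
    # Groups with a missing assignment are left out of the data base.
    if any(value is None for value in experimental_ppm):
        return None
    averaged_ring_current = mean(ring_current_ppm)
    # Assumption: the limit is tested on the averaged ring-current shift.
    if abs(averaged_ring_current) > RING_CURRENT_LIMIT_PPM:
        return None
    return mean(experimental_ppm), averaged_ring_current


# Hypothetical methylene pairs, for illustration only.
kept = prepare_group([3.10, 2.90], [0.10, 0.20])
half_assigned = prepare_group([3.10, None], [0.10, 0.20])
near_a_ring = prepare_group([1.00, 1.20], [-0.70, -0.50])
print(kept, half_assigned, near_a_ring)
```

Returning None for excluded groups keeps the exclusion explicit, so that later statistics are computed only over groups that passed both checks.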
RESULTS AND DISCUSSION
Overall Shift Distribution
The data base contains eight proteins, with a total of 525 NH shifts, 545 CαH shifts, and 1040 side-chain shifts. An overview of the residual shifts for NH and CαH (experimental shifts corrected for random-coil and ring-current shifts) is presented in Figure 1. It is apparent from the figure that amide shifts have a wider distribution than CαH, and this is illustrated further by the distributions shown in Figure 2. A more complete statistical analysis is presente