Computational method for the design of enzymes with altered substrate specificity.

J. Mol. Hiol. (1991) 220, 495-506

Computational Method for the Design of Enzymes with Altered Substrate Specificity Charles Wilson, James E. Mace and David A. Agard Howard

Hughes Medical Institute, Graduate Group in Biophysics and Biophysics, Department of Biochemistry liniversity of CalifornGa Ran Fra.ncisco, CA 94143-0448.

and

T:.S.A.

(Received 6 November 1990; accepted 12 March

1991)

A combination of enzyme kinetics and X-ray crystallographic analysis of site-specific‘ mutants has been used to probe the determinants of substrate specificity for the enzyme a-lytic protease. We now present a generalized model for understanding the effects of mutagenesis on enzyme substrate specificity. This algorithm uses a library of side-chain rotamers to sample conformation space within the binding site for the enzyme-substrate complex. The free energy of each conformation is evaluated with a standard molecular mechanics force field, modified to include a solvation energy term. This rapid energy calculation based on coarse conformation sampling quite accurately predicts the relative catalytic efficiency of over 40 different a-lytic protease-substrate combinations. Cnlikr other computational approaches, with this method it is feasible to evaluate all possible mutations within the binding site. Using this algorithm, we have successfully designed a protease that is both highly active and selective for a non-natural substrate. These encouraging results indicate that it is possible to design alt,ered enzymes solely on the basis of empirical energy calculations. protein engineering;

Keywords:

rotamer searching; solvation energy; alpha-lytic

1. Introduction

accuracy of any algorithm designed to predict the energetic effects of altering a macromolecular complex. There is considerable interest in engineering biological molecules to form specific complexes (e.g. rational drug design, enzyme engineering). Discovering which mutations to a protein or alterations of functional groups on a drug will enable it to form a tight complex is a combinatorial problem in which thousands of candidate molecules may need to be tested. Recently developed free energy perturbation (FEPT) methods should theoretically be able to calculate accurately the effects of mutagenesis on an enzyme-substrate complex or for a drug binding to its target. A number of test cases have been reported in which the components of a complex are slightly altered and the energetic consequences are successfully predicted by FEP (Warshel et cd., 1988; Rao et al., 1987). While these methods can yield accurate energy estimates for some cases, they are extremely computationally intensive and thus have limited utility in the more general problem of drug design or protein engineering.

Many biological processes are characterized by the formation of a highly specific macromolecular complex. In an attempt to understand the energetic terms important for molecular recognition, our laboratory has been studying the interaction between the enzyme a-lytic protease and its peptide substrates (Bone et al., 1989a,b, 1991a; Kettner et al., 1988). We have used a wide variety of peptide substrates to obtain accurate estimates for the binding energy of the reaction transition state. These functional data have been complemented by crystallographically-determined high-resolution structures of enzymes complexed to a parallel set of peptide inhibitors. Site-directed mutagenesis of residues in the substrate binding pocket has provided a large number of variant enzymes with activity towards different substrates. altered Kinetic and crystallographic characterization of these mutants has led to a better understanding of the factors important in determining substrate specificity. In the course of these studies, we developed a data base of 42 enzyme-substrate kinetics measurements and structures and Kj data for 18 different enzyme-inhibitor complexes. This large body of data provides a rigorous test of the

t Abbreviations used: FEP, free energy perturbation; ASP, atomic solvation parameter. 495

0022-2836/91/140495~12

$03.00/O

proteasr

G I991 Academics Frrss Limited

Hwcl we desc*ribct tiorl

of

itn

the developmenl and rnrr#y-based c>onfi)rmat.ion

applic~asearc4

rncathod to the problem of predicting changes in the of an enzymeesubstrate complex. (‘onformational spa(:e is broadly sampled using a library of idealized rotamers. The energy of each rotamer combination is evaluated using a molecular mechanics force field that contains an additional salvation energy term. Remarkably good agreement st,ability

is found

between

the

calculated

substrate

binding

cwqirs and the experimentally observed values for r-lytic protease. We have used this algorithm t)o

I

Wild-type

design a highly w%ivr rw?;nw Con in substrate specitieity.

wit,h

;t tlesirwi

itlt.rri\

Thus, t Isis twt.hotl should be a generally useful tool bot.h for c.oml,utt+ based drug &sign and for the design of’ rnut,;tt~t enzymes.

2. Materials and Methods Fig. 1 outlines t.hr various steps to our alyorithtn. ‘1’1) the rrlativr cahangr in t)htl stahilit,y of a nli~(~r(~ molecular complex, we individually c-alculatc~ t h+- t’rc~c~ energy for each of the rnolrc*ulw in tht, c~)rnplex a11c1thcsrr for t,he complex itself. All allowt~tl c~orlfortnatic)rIs a~ calculate

enzyme + valine inhibitor

I

Remove the side-chains from the Pl site and the binding pocket I Separate the substrate from the enzyme

Free substrate

Complex

r-7 &I

Add rotamer side-chains to the binding pocket + Pl site

Add the Pl rotamer side-

chain to the substrate

Calculate A&.JB,Es)

Store AGG,E) and loop back for any more rotamer

f

f

-c Calculate the overall

Calculate the overall

Calculate the overall

AGE,

A&E)

~6~s)

I

I

I

Calculate the relative free energy for complex formation:

/ Figure 1. Algorithm binding

energy

MG = AG(ES) - A&

-

AG(s)

for calculating relative binding energies. The flow chart shows the steps in calculating for an enzyme-substrate complex. Details are given in Materials and Methods.

the relative

Engineering

Enzyme

evaluated and the corresponding free energies are used to calculate the overall free energy for each molecule: A0 = AG, - RT In 1 exp

- ‘^?i

‘%!

,

(1)

where AG, is the ground state energy, AG, is the energy of the ith conformation, and the summation is done over all allowed conformations. The difference between the free energy of the complex, AG,,, and the free energy sum of the free substrate and enzyme (AG,+AG,) yields a relative AG value for complex formation. Repeating these calculations on a slightly modified complex gives another between them energy, AG’, and the difference (AAG = AC’- AG) directly determines the energetic cost of the modification. (a) Motamer

representation of conformational space

The challenge in using such an energetic approach is to develop a practical method for adequately sampling conformational space. To limit the set of conformations to a computationally manageable number, we have ignored alternate enzyme and substrate main-chain conformations and fixed the backbone atoms using co-ordinates from a wild-type enzyme-inhibitor structure. Our representation of side-chain conformational space is based on the observations of Ponder & Richards (1987). By examining a collection of high-resolution protein crystal structures, these authors have shown that the conformations sampled by amino acid side-chains are relatively limited and can he reasonably approximated as a set of idealized rotamers. These rotamers, specified by the torsion angles between non-hydrogen side-chain atoms. generally correspond to staggered low-energy conformations. Because their work is based on a small set of highresolution structures, amino acid residues defined by many torsion angles (e.g. methionine) are relatively under-sampled and thus not all possible rotamers are included in the Ponder-Richards library. To correct for

Figure 2. Sample rotamer mode1 of the protease mutant with the phenylalanine enzyme residues 192 (red), 213 (yellow) C:onformational space is defined as the set

Substrate

Specificity

497

this we examined all staggered side-chain conformations and added those with no bad intraresidue contacts to create a new library (thereby introducing another asparagine rotamer and 9 additional methionine rotamers). A total of 95 rotamers are used to represent the possible conformations of all amino acid residues except lysine and arginine. The complete conformational space of a defined site is given by all of the different rotamer combinations for the side-chains within the site. Our calculations for a-lytic protease have used a site that contains the Pl substrate residue and those’ enzyme residues within 4 A (1 A = til nm) of it (a total of 4 amino acid residues).

(b) Energy calculation The free energy for each conformation is calculated as the sum of a non-bonded term, En,. and a solvation term, @so,, : AG = +w%s + wsadGso,v (4 where wNB and r~,,,,~are empirically determined weights for the non-bonded and salvation terms. respectively. The reference state for the force field (conditions under which AG = 0) assumes that (1) the atoms in each molecule are non-interacting with respect to electrostatic, hydrogenbonding, or van der Waals’ terms, and (2) t.he atoms are completely buried in a protein-like hydrophobip environment. The non-bonded contribution is estimated using our implementation of the AMBER all-atom molecular mechanics force field (Weiner et al., 1984). Since t,he rotamers calculated from the standard library have idealized internal geometry, the energy for distorting bond lengths and bond angles can be neglected. The interactions of hydrogen atoms are included. using the PROPAK program to calculate their atomic co-ordinates (Ponder & Richards, 1987). For hydroxyl protons whose position is undetermined (e.g. serine HOG). all staggered darners for the proton were evaluated. We have incor-

cc-lytic protease active site. All rotamers corresponding to the Met.l92+Ala substrate bound are shown simultaneously. The side-chain atoms for the are shown. and 217A (magenta), and the substrate I’, residue (green) of all rotamer combinations.

1!4H

( ‘. wilMlrl, et, al. ~-~

porattd the following changrs to the standard AM I1EK force field. Rather than using partial chargrs for cvcr? atom type. partial charges were assigned only to atoms with an absolute charge IpI > We-. All polar hydroprn at.oms were assigned the same partial charge (+@:)I e -). Instead of using a multipole representation for sulfur atoms (nucleus + 2 electron lone pairs), they werr rrprc sentrd as a single neutral atom, Non-bonded interactions wt’rc arbitrarily capped at 100 kcal/mol (1 cal = 4.181 J) and were evaluated (1) between all pairs of rotamer atoms wit,hin a sit.e, and (2) between pairs of fixed and rotamer at.oms for tixed atoms that lay wit.hin 8 ‘4 of the site. :I tiistanc~e-tleperldent dielect.ric. E = r, was used to ncalc electrostatic interactions. Thr salvation t,rrm was evaluated using an approach based on the method of Eisenberg 8 McLachlan (1986). In their formalism. the free energy of transfer for an atom i is tdwlattY~ as:

A(& = Agi.-li.

v,

where AG, is the free energy change in bringing atom i from a hydrophobic protein interior to aqueous solution. Aoi is an atomic solvation parameter. and A, is the solvent-ac’cessible surface area of the atom. Tn the original Eisenberg-Mclachlan algorithm, a molecular surface (as drfined by Lee & Richards, 1971) must first be calculated to estimate A i. The ralculation of a molecular surface is a computationally intensive process which cannot be done feasibly when carrying out a combinatorial rotamer search (in which thousands of different structures are generated for each site). We have developed a.n alternat,ivr grid-based approach for estimating Ai which increases the speed of the solvation calculation by several orders of magnitude. As a first. step. positions in a bodycentered cubic lattice. spanning the entire site and corresponding tjo allowed water positions, are calculated. A water molecule is assumed to occupy a lattice point, if it, is not within van der Waals’ contact distance of any ot,hrr Jrrot.ein atoms and if it forms a minimum number of hydrogen bonds. The van der Waalx’ contact radii defined by Shrake &Z Rupley (1973) are used to define the protein rxcluded volume. The solvent-accessible surface area for each atom is t,hen calculated as: 4nr2.n Ai = ~ 62

’

where r is the atom’s van der Waals contact radius and n is the number of solvent-occupied lattice points lying within a shell 3.3 to 3.6 A from the atom (a completely rxposed atom has 62 neighboring lattice points occupied). Non-bonded interactions between real atoms and the lattice solvent molecules are not considered, since these are a.ssumed to be accounted for by the solvation term. For aspartate, glutamate and arginine side-chains, both terminal heteroatoms are assigned the charged atorn (S +/(I-) atomic solvation parameter (see below).

(v) Structural model of the enzyme-substrate complex l’eptide boronic acids form transition state-like complexes with serine proteases and serve as good structural models for the active complex between substrate and enzyme (Matthews et al., 19’75; Bone et al., 1987). For t,hc cc-lytic protease test case, we used the X-ray crystal structure of the wild-type enzyme complexed to the methoxy-succinyl-alanine-alanine-proline-borovaline peptide inhibitor (Protein Data Bank name lPO3: Bone, 19896) as a basis for all calculations. This is the highest affinit,y

Table 1 h’inetics

MMV MMV ;MMV $1MV MMV MMV MMV HMV MMV MMV AM\’ AM\ AM1 AMV AMY MA\ MA\ MA\ MA\ MAL MMA MMA MMA MMA SM\ SM\ SM\ SM\ SMV SM\ SM\ SMV SMV SMV SM\ SMV SMV SMV SMV SMV

meusuremrnt~s jar cc-l?ytic protw~sr mutants

2 1,000 34 9600 680 330 $1 1800 2000 ‘00 790 10,000 110,000 :rAJ,o(HJ 310,000 :3000 600 160 9x0 340 340 15,000 5.1 17 000 “80 6100 6100 99.(HJO 3.X

(;t)

iii

(1))

(a) (a)

%(HJO

Cd)

(a) 6.4 64 (c) (c) 0~58 (c) WI50 (c) 1.3 ((‘) 270 (c) 66 (c) (c) 240 (c) ((8) %I0 (e) 47 (e) 1400 (0) 30 (P)

(b) 0)) (d) (c) ((1) ((1) (4 C(i) cc.1 (e) (e) ((‘1

2600 1%

1700 tie %8 18,000 6700 26.000 XI0 11 ,000 1700 d%,OOO

P1 refers to X in the indicator substrate Yuccinvl~Ala-Ala-I’ro~ X-p-nitroanilide or in the boronic acid inhibitor methoxv succinyl-alanyl-prolyl-boro-X. kc,, (s-l)> KM(-)> &a,/&~ (S -’ M-’ ), kinetic constants determined from the best-tit lines of constants (IN) were Lineweaver-Burke plots. K,, inhibition determined by competition with substrate. Substrate assays used O.Ol+ 1.0 PM-enzyme, in @l-+100 mw-substrate, 100 rn.M‘Iris. HCl (pH 8.0). t One-letter codes for the substrate binding pookvt amin~~ acids at 192. 213 and 217A (e.g. MMV, the wild-type mzyrnc. is Metl92, Met213 and Val217A). $ Values taken from: (a) Bone et ul. (1989b): (b) Kettner cf nl. (1988); Uonr ~1nl. (1989e); (d) Bone et al. (1991h); ((J) I P 1

Substrate. refers to X in t,he indicator substrate succinyl-Ala-Ala-Pro-X-p-nitroanilide. Experimental, binding energies (kcat/mol) calculated as described in Materials and Methods. wNB= IL’~~,~ = 1, calculated values using non-optimized parameters (+,a = wralv = 1). optimal wNB, wpolv, calculat,ed values using optimized weights (w,, = 0.031, w~Olv= 1.98), optimal weights+ NGS, caiculated values using fully optimized parameters (20~~= 0031, w,,,, = 1.98, scale for non-ground state energy term = 57. scale for AAG = 25). P/T, whether the data point was included in the parameterization set (P) or in the testing set (T). See alsn the legend to Table 1

were then calculat,ed for earh CB position. Different sidechain substitutions were made by generating atomic coordinates for all rot.amers at each of the varying sites (using the Ponder Richards (1987) PROPAK program). All rotamers corresponding to the appropriate amino acid at each of the varying sites were then evaluated combinat,orially. As an example, Fig. 2 shows the rotamers used to sample conformation space for a representative enzymesubstrate combination. In this Figure all rotamers are shown simult.aneously. (d)

h’rqyme-substrate

kinetics

The relative hinding energy between the enzyme in the transition state

the substrate and can be obtained

directly from experimental using the following formula A(;:=-,,.,,(~)-,,,,,(~~).

measurements of k,,, and K, (Fersht. 1985): (5)

Table I summarizes the kinetics measurements we determined for a-lytic protease with substrates of the type succinyl-alanyl-alanyl-prolyl-X-p-nitroanilide (where X = all 20 naturally occurring amino acids except for proline, tryptophan, arginine and lysine). It is important, to note that the experimental data contains information from polar, non-polar and charged residues and thus provides a stringent test for all aspects of the energetic

r,O()

_____.-

( ‘_ I/C’ilson et al.

model. The following proteases have been included in t,hr tL4ti1 set : wild-type. Met+Alal92. Met+Ser192. Mrt+AlablY and Val-tAla217A. Each AAG’ value is caalculated relative to the wild-type enzyme for the alaninr substrate. using the formula: AAG(~)=-lWln(~)i+~7’ln(~)w~+~,,

(6)

This arbitrary reference state was chosen because alanine is the best substrate for the native enzyme and it has been used as the starting point for other calculations of enzyme--substrate specificity for a-lytic protease (Caldwell rl a.!., 1991). (e)

Parameterizutio7~

The data set of kinetics measurements was segregated randomly into 2 groups. Twenty measurements were used to determine the scale factors in eqn (Z), which optimize the correlation between calculated and experimental binding energies (the parameterization data set). Because the calculated binding energies are a non-linear function of the free parameters, the scale factors were iteratively adjusted until the correlation was maximized. These optimized parameters were then used to calculate binding energies for 20 other enzyme-substrate combinations (the testing data set). Table 2 indicates which measurments formed each data set.

3. Results (a) Parameterizing

the solvation

model

To calculate solvation energies accurately, the at,omic solvation parameters (ASPS) for different atom types were adjusted such that the fit to experimental n-octanol + water transfer free energies described by Fauchere hz Pliska (1983) for N-acetyl amino acid amides was optimized. Figure 3 shows a plot of the calculated versus experimental values. The correlation coefficient between the experimental and calculated transfer free energies gives a measure of the accuracy of the solvation model. Using five atom types (carbon, neutral nitrogen, neutral oxygen, charged nitrogen/oxygen, and sulfur), both the molecular surface approach of Eisenberg & McLachlan (1986), and the grid approach developed here yield a correlation coefiGent of @95. This suggests that the errors in ‘4, introduced by the grid approach are less significant’ than the errors inherent to the formalism defined by equation (3). The ASPS for the five atom t.ypes listed above were refined to the following values: carbon sulfur nitrogen oxygen ru’( + )/(I( - )

20( 53( - 40( - 64( - 1 l2(

* f + f f

3) cal/mol/A2 20) cal/mol/A2 9) cal/mol/A2 11) cal/mol/W2 13) cal/mol/b2

All of the enzyme-substrate complexes considered in the data set contain at least one methionine residue in the active site. Given that the sulfur ASP is determined by only two experimental measurements (methionine and cysteine transfer free energies), we questioned the accuracy of this

AGtcolc) (kcol/mol)

Figure 3. Parameterization of the solvation model. Five atomic solvation parameters (corresponding to carbon, non-charged nitrogen, non-charged oxygen, charged nitrogen/oxygen and sulfur) were adjusted to optimize the fit to experimental n-octanol + water transfer free energies for 20 N-acetyl amino acid amides (Fauchere & Pliska, 1983). Residue 2 of cr-lytic protease (in an extended conformation) was used to model the backbone of the blocked amino acid. Side-chain atoms corresponding to all possible rotamers were added using the Ponder-Richards PROPAK program (Ponder. 1987). The free energy for each rotamer was calculated as described in Materials and Methods with either (1) all ASPS set to zero (corresponding to an n-octanol environment), (2) ASPS set to the optimal values (corresponding to an aqueous environment). The free energy of transfer for each amino acid was calculated as the ditference between the Boltzmann-weighted sums over all rotamers for each of the 2 states. The experimental values relative to the glycine free energy are shown as a function of the calculated free energies relative to glvcinr.

ASP. Eisenberg & McLachlan (1986) note that. it is impossible to determine a reliable ASP for the three different sulfur atom types (methionine, cysteinr and cystine) and in their calculations the cysteine value is ignored. Early comparison of the calculated and experimental binding energies suggested that. a better fit could be obtained if the hydrophobicity of the methionine sulfur atom was increased. The ASP for methionine sulfur atoms was thus increased to 80 cal/mol per A’, and this value was used in all subsequent calculations. (b) Optimizing

the model

In modelling the effects of mutagenesis on enzyme-substrate specificity, we assumed that the enzyme-inhibitor complex is an accurate representation of the transition state for substrate hydrolysis. Figure 4 shows a plot of the experimental binding energies for boronic acid inhibitors versus the experimental values for the corresponding

Engineering

Enzyme

3 2

E \ fz

’ 0

70 I I .z a” -2 -3

-4

-2

0 2 4 &subs+rote) (kcahal)

6

Figure 4. Boronic acid peptides as a model of’ the transition state. The binding energy of the boronic acid inhibitors Ala-Ala-Pro-X-boronic acid (calculated as RT In (Ki) using the K, values reported in Table 1) are plotted as a function of the corresponding binding energies for the substrates .4la-Ala-Pro-X-p-nitroanilide (calculated as - RT In (k,,,/K,)). Both values are normalized by subtracting the binding energy calculated for the alaninr inhibitor-substrate.

substrates. In general there is good agreement, indicating that this basic assumption is reasonable. The correlation between the two sets of energies (0.87) sets an upper limit on how well we might expect to calculate substrabe binding energies. In principle, the non-bonded and solvation contributions to the energy are properly scaled (wNB = W solv = 1) and could be used directly to calculate binding energies. Table 2 shows the results obtained using this assumption. The correlation between all calculated and experimental binding energies is 926 (f’> 010) and the slope of the best-fit line is 0013. An analysis of the results shows clearly that the van der LVaals’ interactions between atoms in close contact with one another are drasticallv overestimated for certain complexes. This “is not surprising since the rotamers used to model the active site are fixed and thus unable to accommodate slightly bad bumps. The electrostatic terms also appear to be overweighted, giving unreasonable results. For example, the calculated difference between serine and alanine substrates binding to the Met’l92+Ser mut’ant (-23.5 kcal/mol) is an order of magnitude greater than the experimentally observed difference (+ 1.2 kcal/mol). The calculated difference in binding energies drops to -0.9 kcal/ mol if electrostatic terms are ignored. The above observations suggest that the various components of the force field are not properly scaled to one another and should be adjusted to take into account the various assumptions of the model, e.g. that side-chains are rigid idealized rotamers, that the protein backbone is fixed, etc. Because the ASPS were parameterized using the experimental data of Fauchere &, Pliska (1983), the solvation model assumes that the hydrophobicity in the buried pro-

Substrate

Specijkity

501

tein is mimicked by n-oct’anol. Previous studies have indicated t,hat proteins are somewhat more hydrophobic than n-octanol ( Dorovska-Taran, 1982), indicating that the solvation t.erm may need t,o be upweighted. The relat’ive weights of the norrbonded component and the salvation component to the total energy were t’herefore adjusted t.o maximize the correlation bet’ween calculatad and experimental binding energies for the parametrrization set of data points. Scaling the non-bonded tertn II! 0031 and the salvation term by I.98 yielded an overa, correla,tion of 0.77 (f’ < 10VR) and ii best-tit line with a slope of 1.06 (results shown in Table 2). As shown in equation (1). the free energy cnalcu lated for a molecule c+ontains a temperat~urf~dependent term representing the entropic3 c.ontribrrtion of non-ground state conformations (- ff7’ In c exp (-AAGIRT)). Equation (1) is valid if the summation over discrete states is a good approximation of the integral over all possible states. Tn sampling a limited number of rotamers. we are estimating the contribution of non-rotamcr confor mations poorly a.nd may thus underestimate the part.ition function. In addition, if components to the free energy are not scaled properly, the value of the entropic term will be distorted because the partition function is a non-linear combination of the free energies for each conformation. For t,hese combined reasons. it seemed likely that additional parameters could help offset errors in the model and allow a more accurate calculation of the contribut,ion of non-ground stat)e conformations to the free energy. These parameters were introduced by scaling the AAG terms before calculating the summation in equation (1) and by applying a compensatory overall scale to weight the entropic term properly. The two parameters had strongly coupled effects on the correlation coefficient and hence are not truly independent. After optimization (during which the overall scale factor rose to 5.7 and t#hr AAG scale factor inside the summation rose to 25). t,hr correlation between calculated and experimental binding energies increased to 988 (Fig. 5). The significant, improvement in the correlation validates use of the additional parameters as judged bv the $test. The non-physical nature of these additional scale fa.ctors makes their validity for other applications question able. their although introduction definitely improves the ability to reproduce experimental dat>a for the cr-lytic protease test system. An important validation of the true capabilities of this approach is to test the model on data not used in the parameterization. Using optimized parameters derived from a subset of 20 kinetic values, the correlation coefficient for the remaining 20 observations was 0.85 (P< 10 ll; data shown in Table 2). Performance on the cotnplete data set of 40 measurements also gave a 085 correlation cboeffcient We resegregated the data points repeatedly into new parameterization and testing sets and reoptimized to determine how sensitive the parameters were to the particular choice of da.ta. The final parameters generally fell within :i(?;, of those

50% ._

Table 3

.. 0

2 AG(,,,,l (kcol/mol)

4

Figure 5. Calculated versus experimental binding energies. Binding energies calculated for each of the rnzyme~ substrate combinations listed in Table d (both paramrterization and testing data sets) are shown as a function of the experimentally observed binding energy (using the wild-type enzyme with the alaninr substrate as the reference st,ate). The solid line is the least-squares best fit line (experiment = 099 X calculat~ed + 0%).

reported 0+35.

above and correlation

coefficients

averaged

of different aspects of the model in. reproducing experimental data

((a) importance

a statistically significant correlation Given between calculated and experimental values for the binding energies, we can ask what aspects of the algorithm are essential for reproducing experimental results accurately. Since parameters have been adjusted to optimize the fit to experimental values, any changes in the force field would be expected to worsen the correlation. The importance of each part of the algorithm can be assessed by removing it from the calculation and then noting the drop in the correlation obtained using the recalculated binding energies. These results are summarized in Table 3. We have assumed that conformational space must be broadly sampled (using a complete search of all rotamer combinations) to estimate the free energy change accurately for a particular mutation. To test this assumption, we have repeated the calculation using a single rotamer at each site. When only the most commonly observed rotamer in the rotamer library is used for each amino acid, the correlation between calculated and experimental binding energies drops from OS5 to -0.26. Thus, carrying out the rotamer search is a key part of the dependence of the algorithm. The temperature correlation indicates how the non-ground stat,e rotamer combinations contribute to the overall AAG value. If the temperature is arbitrarily set to zero, the correlation drops to 0.75. Thus, while the gound state (lowest energy) rotamer combination provides most of the information needed to estimate the

energy of a complex, the rnt,ropy term given 1)). t 11~3 functions is also informati\-e ratio of the partition and improves the ability to reproducing the experiment. By omit,ting the non-bonded or salvation t,erms from the energy calculation, we can estimatch how important rither is in fitting experiment,al valurts. When only non-bonded terms are included. the correlation bet,ween calculat,ed and experimental binding energies drops from 085 to -0+9. I:siny only solvation terms, the correlation drops t.o (1-X. Clearly, both terms are essential for obt,a.ining a good fit. Tt is important to note t#hat the nor)bonded terms contribute significantly to thr c:orrel;ttion, despite the relatively low weight, givcan to them after optimization (uINB = 0.031). The AG value calculated for complex formation is a function of the energy of the free enzvme. t)hc: free substrate and of the complex itself. If dither t’he A(:, the correlat’ion drops or AGE, term is ignored, dramatically (data shown in Table 3). If thr ACt term is neglected from the calculation, i.ca. if t>htl f’rtv energy for t.he unbound enzyme is assumed to tw constant,, t,he correlation drops significantly to O-66. (d) Design

of a protease with speci,ficity

nltwed

slrbstratr

To test the algorithm rigorously, we attempted to design a mutant protease that is both specific and highly active for a substrate not normally hydralyzed by the wild-type enzyme. All possible smgles>te mutations within the binding pocket were evaluated computationally for the ability to cleave substrates with P, = leucine, since leucine is a poor substrate for the native enzyme. An enzyrne active for leucine substrates might also be expected to cleave isoleucine substrates, since these residues are

Engineering

1 1

Enzyme

Substrate

60

Wild-type

0.10

0.20

0.30

0.40

0.50

Met192 + -005

0.00

0.05

040

0.15

0.20

0.60

Vol 0.25

l/(S)

Figure 6. Kineticas measurements for an engineered protease. Kinetic constants were determined for both (a) the wild-type and (h) the Met192+Val mutant for succinyl-alan~l-alanvl-prolyl-X-p-nitroanilide substrates. with X = Ieucine (n) or isolrucine (0). For the wild-type enzyme only X.ca,jICM could br measured accurately: with the leucine substrate. k,,,/h’, = 4.2( kO.05) Me’ s-“: with the isolrucinr substrate. k,,,/K, = 20( f 0+4) .M- ’ s- ’ For the Mrt+VallSB mutant with the leucinr substrate. k,,, = 86( k 7) s- ‘. k,,JKM = 5320 Km = IS( + 1.5) mM. (k210) IV- ’ s- ‘: for t,he isoleucine substrate. h’,,, = 26( + 3) mnl.

k,,, = 0+54(+ @07) s- ‘.

m -’ s- ’ Results

UP summarized

k,,,/h;, = 24( k @7)

in Table

503

mol (prior to this work, there were no experimentjal kinetic data for the wild-type enzyme with isoleucine substrates). We t,hus screened all possible mutants for those predicted to prefer leucine over isoleucine substrates by at least 3 kcaljmol, corresponding to a factor of % 150 in relative activity. From this subset of mutants, the suhstitut,ion Metl92+Val was predicted to yield the enzyme most active for leucine. The predic%ed binding energy for the leucine substrate (AA0 = 0.85 kcal/ mol. relative to t,he Ala substrate), is significantly better than that for the isolrucine substrat,e (AAG = 7.90 kcal/mol). The Metl92-+Val mutant was made. expressed and purified using techniques described by Hone et al. (1989,). Figure 6 and Table 4 show the kinetic data obtained for the wild-type and mutant enzymes with either leucine or isoleucine US substrates. As predicted, wild-type cc-lytic protease prefers leucine slightly over isoleucinr substrates (AAm = -0.42 kcal/mol) but cleaves both poorly. The kCa,/KH value for the designed 3fetl9%-+V:tl with the leucine substrate has been mutant increased dramatically to 5320 s ’ M ‘. corrfhsponding to a binding energy of OX? kc~al/mol relative t,hr reference state to U&,Jh’m = 21 ,000 s l M l for the wild-type enzyme with alanine as substrate). This free energy dS+rence is essentially identical to the AA0 = 0.85 kcal/mol predicted by the algorithm. The mutant shows significantly less activity towards the isoleucine substrate (k,,JK, = 24.0 s- ’ Mu’. AA(:(rxp) = 4.01 kcaljmol). LVhile the enzyme is quite select.ivr for leucine over isoleucine (preferring leucine > 200-fold). the isoleucine substrate is significantly more acative than state rotamer predicted. We examined the ground structure for the mutant complex with isoleucine to understand why t,he calculated binding energy is so unfavorable. The struct,ure predicts several bad van der LVaals’ contacts between the (” methyl group of the P, side-chain and other atoms in the ac+tive site. It’ appears t’hat, these bad contacts ~~)uld bP part’ially relieved by a 45” rotation of the methyl group. Because our conformation search is limited to idealized rotamers, we do not consider this possi-

*O

000

8peci$city

1.

both structurally and chemically similar to one another. ln fact, the difference in binding energy for the two substrates, AAG,,,- AAG,,,, for the wildtype enzyme was calculated to be only -0.16 kcal/

Table 4 Activity

of a designed protease with leucillr kcatlfh

Substrate

(8-l

Ala

M-‘)

P1,000~700 4.2 k 0.05 20 + 0.04 5320 + 2 IO 24 co.7

Leu lie IRU Ill?

and isoleucine

substrates

AG(exp)

A(:( ralc)

(kcal/mol)

(kcal/mol)

0 54)5 5.47 ( + 042) 041 401 ( + 320)

0 3.67 3.83 (+@16) 0.85 7.90 ( + Twi)

(‘ornparison of kinetic data and calculated oersus experimental binding energies for wild-type and the ~let+Va1192 mutant with either leucine or isoleucine as subst.rates (sur-AAP-X-pNA. X = Leu or Ilr). Experimental data for the Ala and Leu substrates+ wild-type enzyme have been reported (Bone et (~1.. 19896).

Experimental

and theoretical

binding

energies

were calculated

relative

to the wild-type

+Ala

srlbstrate using eqn (5). The difference between leucine and isoleucine binding for each enzyme is shown 111parentheses.

M )4

(‘.

__--

Wilswz

hilty in eva,luat,ing tjhr isoleurine substrate. In addGon, the mode1 is based on a rigid backbone. which in reality would be expected to r&x to rninimizc, t,he bad c*ontact.s. These efiects could explain wh? overestimates AAG’ for had this algorithm substrates such as isoleucine.

4. Discussion We have applied an energy-based conformation search to estimating the energetic effects of mutagenesis on the formation of a macromolecular caomplex. To estimate the relative free energy, the Ponder & Richards (1987) side-chain rotamer library has been used to sample conformation space, and a combination of an AMBER molecular mechanics force field (Weiner et al., 1984) and an additional solvation term (Eisenberg & McLachlan, 1986) have been used to evaluate the energy for each conformation. This algorithm successfully models the results of a large body of experimental data for a-lytic protease. In addition, the method has been used to design a mutant protease with high activity and high selectivity for a non-natural substrate (leucine). Previous attempts at engineering the components of a macromolecular complex have been limited by bot’h the accuracy and the speed of existing computational methods for calculating free energy changes. Empirical, heuristic methods for estitnating binding energies between macromolecular complexes have been described by Naray-Szabo (1989) and Novotny et al. (1989). The most widely (*ited approach. based on free energy perturbation. c&an be used to obtain highly accurate estimates of thcb AAB value for a well-defined mutation in a cbomplrx (Warshel et al., 1988; Rao et al., 1987). For example, Bash et al. (1987) have shown that FEP (aan successfully predict the effect of replacing the amidr group of a phosphonamidate inhibitor of t.hrrmolysin with an ester (substitution -NH- for -0-). For reasons discussed below, however, FEP has a somewhat limited utility in the more general problem of designing mutations that will form a fight complex. To estitnate energy changes using FEP, a molecular dynamics simulation is carried out in which the Hatniltonian describing the system is gradually perturbed from the native state to the mutant stat,e. If the starting and final states are structurally different, the simulation should be long enough to allow the conformational transition bet,ween the two states to occur (Bash et al.. 1987). Since &he rotation of medium-sized side-chains buried in a protein has a characteristic time on the order of lo8 picoseconds (McCammon &, Harvey, 1987). it is impossible to simulate this type of transition by tnolecular dynamics accurately. This suggest’s that if mutagenesis alters the struct’ure of a well-packed complex significantly (e.g. if a sidechain rotation occurs), FEP will be unable to predict the energetic consequences of the change. (‘rystallographic~ studies comparing peptide boronic

et al.

acids hound to different a-lytic proteasr mut,anls have shown t,hat) a mutat,ion at one sit)cacan lead t () significant alteration in the surrounding side-chain conformations (Bone rt nl.. 1989a). These cshanges can havr dramatic effe&s on substrat’tb specificity. A srcond problem facing free energy perturbation methods is the amountj of computer time nc&rded 10 simulate a single mutation. Because t)hc molec~ulat dynamic-s simula,tion must be caarried out ncat equilibrium t,o ohtain accurate results. ea(*tt AAC calculation typically requires hours of superconputer time. In the case of enzyme design or drug design, thousands of diRerent mutations art* ~JO~PIItially worth evaluating. The inabilit,?; 1o sheen through a largr number of diffcarent c~~mplexcs rapidly may explain why FEP met,hods travc, tlot yet succeeded in thtk rational design of’ novel pharmaceuticals or altered enzymes. Recent> calculations on the substrate prrferenc*rs for wik-typcl cr-lytic proteasc> (Caldwell rt ~1.. 1991 ) allow a direct, c*omparison of the accuracy of frrr energy pert~urbation metShods and of t,hta algorithm described here. (laldwell rt nl. have calculated t)hci transition state binding ctnergies for glycinr. \:alinr and Ieucine substrates, relative, to t,he alaninca substrat,e. using free energy perturbation. Their es& mat,es for the glycinr and valinrb rptative hindinp energies differ from the experimental \Talues h! -0.72 kcal/mol and + 2.27 kcal/mol. respectively. Our estimates for t,hese binding tliffert~nc~rs art’ somewhat, more accurate. with iLtI (xrror of -0.08 kcal/rnol for glycinc and - 1.80 kcal/mol for valine. The free energy perturbation c*alculation drastically overestimat,es the difference in binding of IvrSUS alanine su bstrat>es (A(lcalc > leucinr 20 kcal/mol), whereas our calculation und(lrestimatrs t’hr energy differencc, by 1.38 kcal/mol. Crystallographic analysis using a ~)ept.icir-t)oroI,t~u inhibitor. revealed that although structural changes occur upon leucinc> binding. th 120”. This result unders(Lores the pra&c:al difSc*uties in ohtaining reliable values by the FEl’ method when eonformational aherations arr required. Whilr these results clearly favor the algorithm drveloprd here, it is important to not.e t,hat t,htk mrthod has been parameterized using a subset of 1.h~ cr-1yt ic* proteasr data. (Tntil the* algorithm is applicad to other experimental systems, it remains un&aar whether it will out-perform FEP consisttantly. A major goal of this work was to producse an algorit’hm that. was hot tr accurate and fksf clrrough to be useful in t,hc rational design of rr1zymc.s wit,tr altered specificit,y In t)hch rxamplc of’ a-lytic. protease, there are at least threca residues (192. 213 and 217A) which are likely to altar suhsl.r;t(tJ spv& ficit’y. 7’0 test, all possible mutations al. these sitths for a singlr substrat,e (203 = 8000 sequrrrc*rs, or Z I1 1’ = 1.4 x 106 rotamer c~ombinat,ions). one must br able to screen automaticall? t,hrougtr each mutant rapidly. The current algorithm c~alculatjes a binding energy for each rotamer corn bina,t.ion in approximately 2.2 x 10~ 4 secbonds VAX 8650 central

Engineering Enzyme Substrate Specificity processor unit time. Complete combinatorial mutagenesis of the binding pocket for a given substrate thus requires less than three hours. A remaining question is whether the scale factors obtained for the cc-lytic protease test case are, in fact, the optimal parameters to use for other systems. All protein backbone atoms are held fixed in our algorithm: in cases in which the active site actually expands to accommodate large substrates. wc therefore tend to overestimate the van der Waals’ interactions between the substrate and t.he protein. To compensate for this assumpt,ion. nonhonded terms have been down-weight,ed significantly relative to the original force field values. The amount that main-chain atoms can relax to accommodate bad contacts is presumably a charact)eristic of a given site. and thus the plasticity of the

a-lytic protease active site may require a nonhonded wright significantl,v different from that used for other proteins. If the active site of another protein is considerably more rigid than the a-lytic protease active site. for instance, a non-bonded scale factor closer to one may be needed since the assumption that t,he backbone remains fixed would he more accurate. Since t,he scale factors used to

obtain

good agreement) for the paramc*trrization good results for the testing data set, (r = 0.85). these paramet,ers appear set (r = 0.88) also yield

data to

be

well-determined

for

the

a-lytic

proteasr

system. We are in the process of “pplying the same free energy force field to the problem of predicting side-chain conformation. Quite good result’s are obtained

using the same set of scale factors

tleter-

mined for the a-lytic protrase data, suggesting that the values UWd are accept,able for other applications. The results

of this simpie

rotarner-based

confor-

mation search art‘ extremely encouraging, they suggest that, it is feasible to design enzymes with desired catalytic specificities solely on the basis of c~ornputrr calculations. A major advantage of the method described here is its ability to rapidly sample sequencti space to select combinations of mutations whicah may have altered binding properties. Even higher accuracy in predicting the effects of’ mutagenesis on substrate hinding energies might he obtained by c.otnbining this coarse conformation searching algorit,hm wit.h other computational

met hods that finely

sample

local conformations

(e.g. molecaular dynamics,

free energy

bation methods). Future algorithms apply

a “high-resolution”

more pertur-

will att,- alpha-lyt,ic protease complexes wit.h analogues of reaction intermediates. Hiochemistr~, 28. 7600~~7609. Hone. R., Sampson. N. S.. Bartlett. P. A. & Agard. 0. A. (1991a). Crystal structures of alpha-lytic protease c>omplexes with irreversibly bound phosphonate esters. Riochemistry, 30. 2263--2272. Hone. R., Fujishige, A.. Kettner. (“. A. &, ,Agard. L). A. (199lb). Structural analysis of broadly specific mutants of z-lytic protease. Hiorh~~mistry. in the press. (laIdwell. J. W.. Agard. I>. A. & Kollman. I’. A. (1991). Free energy calculations on binding and c*at,alysis hg alpha-lgtic protease: t,hr role of substrate size in the I’, pocket. Proteins, in the press. I)oro\ska-Taran. V., Momteheva. R... (~uluhova. 9. & Martinek, K. (1982). The specificit.,v in t,he elementary steps of alpha-vhymotrypsitl caittalysis: a temperature study with a series of N-acetyl-L-amino a,cid methyl esters. Riochivn. Riophys .-Ieta, 702, :17-53. Eisenberg. I). & McLachlan. A. I). (1986) Salvation rnergy in protein folding and binding ,Vatwre (London).

3 19.

199-203.

Fauchere. .I. 1,. & Pliska. V. (19%). Hytlrophobic para,meters K of amino-acid side (*hains from the partitioning of S-acetyl-amino-acid-amides. /Cur. -1. Med. C’hPm.-Chim.

Ther.

18. 369-375.

Fersht, A. R.. (1985) Enzyme Structurr and SPchnnism. 2nd edit.. W. H. Freeman & Co. New York. Frrsht. A. R., Shi. J-P.. Knill-,Jones, .I.. I,owc~. 1). M.. Wilkinson. A. fJ.. Blow, I>. M.. Brick. I’.. (‘arter. P.. Wayr. M. .M. Y. & Winter, G. (19%). Hgdrogrnbonding and biological specificity analysed bl protein engineering. LVaturP (London). 314. 235-238. ,James. M. K. C., DelBaere, I,. T. ,J. 8r Brayer, G. I>. (197X). Amino acid sequence alignment (if’ bacterial and mammalian pancreatic serine proteases based on topological equivalences. Cannd. .I. Riochum. 56. 396- 402.

Kettnrr. (1. A.. Bone. R.. Agard. 1). A &, Hac*hovchin, \V. W. (l!%R). Kinetic properties of’ thr binding of alpha-lytic protease t,o peptide horonic acids. Hiocht-midry,

27. 7682-7688.

Lee. 13. & Richards. F. M. (1971). The interpretation

of

protein struct~ures: estimation of static accrssibilit~y. ,I. Mol. Hid. 55. 379-400. Matthews. I>. A.. Alden, It. A.. Birktoft. ,l. .J. & Kraut,. .I (1975). X-ray crystallographic study of boronic acid adduct,s with suhtilisin BPiY’ (Nova). .J. Bid. (‘hen,. 250. 7120~7126. Mr(:arnmon. ,I. A. Kr Harvey, S. C. (1987). Dynamirs oj’ I’rot~ins and Nucleic Acids, p. 29. Cambridge I’niversity Press. New lTork. Saray-Szabo, (:. (1989). Quantitative est,imation of activit.ies of mutant enzymes. C’ntnl. Lrttrw. 2. I x5- I IJO. Novotny. ?J.. Bruccoleri. It. E. & Saul, F. A. (1989). On the att,rihution of binding energy in antigen-antibody complexes MrPC603. D1.3. HyHEL-5. Biochrmistry, 28. 47x-4749. Ponder. .J. (1987). PtlOPAK Program. Ponder, *J. W. & Richards. F. M. (1987). Tertiary templates for prot,eins: use of packing criteria in the rnumerat,ion of allowed sequences for different’ structural classes. .I. Mol. Hid. 193, 775791.

Rae, S. S.. Singh. I’. C’.. Bash. 1’. -2. & Kollmari. I’. .\. (1987). Free energy prrt~urbat~ioti c~al(~rrlations 011 bincling and catalysis after rnutat,ing .&I IS.‘, in suhtilisirt. Snturr (London), 328. 551 -554. Shrake. A. & Bupley. .J. .\. (1973). Knvironmriit anal exposure to solvent of protein atoms. I,ysozynir~ a11t1 insulin. ,I .Mol. Hid 79. Xl -X7 I. W’arshrl. A.. Sussman. Ii’. & Hwatig. .J. K (I!lH%). Evaluation of c.atalyt.ic free energies iii grnt+callj modified proteins. .I. X01. Niol. 201 I39 IS!) Weiner. S. ,I.. Kollman I’. .I., (‘ase I). A.. Siiytr. I’. (’ (Ohio, (‘.. Alagona. (j.. I’rofeta. S. & Il’einer I’ ( l!J84). A new. force held Ior molt~cular rnec*hanical simulation of nrrclcic~ acids and proteins. .I .4n,(’i (‘hrnt. Sot. 106. 765-781

Edited by I’. 17. Wright

A novel chimeric amine dehydrogenase shows altered substrate specificity compared to its parent enzymes.

A technique for the isolation of yeast alcohol dehydrogenase mutants with altered substrate specificity.

Directed evolution of cytochrome P450 enzymes for biocatalysis: exploiting the catalytic versatility of enzymes with relaxed substrate specificity.

Enantioselective enzymes by computational design and in silico screening.

Computational approaches for classification and prediction of P-type ATPase substrate specificity in Arabidopsis.

Continuous colorimetric screening assay for detection of d-amino acid aminotransferase mutants displaying altered substrate specificity.

A rapid method for determination of endoproteinase substrate specificity: specificity of the 3C proteinase from hepatitis A virus.

A novel member of glycoside hydrolase family 30 subfamily 8 with altered substrate specificity.

A mass spectrometry-based method for the assay of ceramide synthase substrate specificity.

Specificity constants in the context of protein engineering of two-substrate enzymes.

Computational design of the affinity and specificity of a therapeutic T cell receptor.

Computational study on substrate specificity of a novel cysteine protease 1 precursor from Zea mays.

Identification of vancomycin resistance protein VanA as a D-alanine:D-alanine ligase of altered substrate specificity.

GSHSite: exploiting an iteratively statistical method to identify s-glutathionylation sites with substrate specificity.

Computational tools for designing and engineering enzymes.

Catalytic fragment of protein kinase C exhibits altered substrate specificity toward smooth muscle myosin light chain.

Broad substrate specificity of snake venom fibrinolytic enzymes: possible role in haemorrhage.

Amino acid substrate specificity of Escherichia coli phenylalanyl-tRNA synthetase altered by distinct mutations.

Computational approaches for rational design of proteins with novel functionalities.

Potentiometric method for substrate analysis using immobilized NAD + -dependent oxidoreductase enzymes.

Glutamyl Endopeptidases: The Puzzle of Substrate Specificity.

Substrate specificity of the isoprenylated protein endoprotease.

Altered substrate specificity by substitutions at Tyr218 in bacterial aminoglycoside 3'-phosphotransferase-II.

Reprogramming the specificity of sortase enzymes.