Determining Minimum Energy Conformations of Polypeptides by Dynamic Programming SANDOR VAJDA' Ueptrtmcnt o f Biomathematical Sciences, Mount Sinai School o t Medicine, O n e ( , u s t a w L Levy Place, New York, New York 10029
and CHARLES DELlSl' l k p c ~ r t m r not t Biomedical Engineering Boston University, 44 Cummington Street, Boston, M A 0221 5
SY NOPSlS
A combinatorial optimization approach is used for solving the multipleminima problem when determining the lowenergy conformations of short polypeptides. Each residue is represented by a finite number of discrete states corresponding to single residue local minima of the energy function. These precomputed values constitute a search table and define the conformational space for discrete minimization by a generalized dynamic programming algorithm that significantly limits the number of intermediate conformations to be generated during the search. Since dynamic programming involves stagewise decisions, it results in builduptype procedures implemented in two different forms. The first procedure predicts a number of conformations by a completely discrete search and these are subsequently refined by local minimization. The second involves limited continuous local minimization within the combinatorial algorithm, generally restricted to two dihedral angles in a buildup step. Both procedures are tested on 17 short peptides previously studied by other global minimization methods but involving the same potential energy function. The discrete method is extremely fast, but proves to be successful only in 1 4 of the 17 test problems. The version with limited local minimization finds, however, conformations in all the 17 examples that are close to the ones previously presented in the literature or have lower energies. In addition, results are almost independent of the cutoff energy, the most important parameter governing the search. Although the limited local minimization increases the number of energy evaluations, the method still offers substantial advantages in speed.
INTRODUCTION Optimization problems are ubiquitous in science, and difficulties in solving them are often major stumbling blocks to progress on realistically posed questions. Predicting the equilibrium structure of peptides is no exception. As the configuration of a molecule changes by rotation about backbone and sidechain bonds, its energy varies in a complicated way, passing through a n enormous number of minima.' A solution to the problem of finding the global minimum, and perhaps some finite number of nearby minima, cannot be guaranteed by traditional local minimization methods. T h e existing generalpur
''
___ v1
1990 J o h n Wlley & Sons, Inc.
('cc 0006 ~+5%/90/14175518 $04.00 Biopolymers, Vol. 29, 17551772 (1990)
pose global minimization algorithms are restricted to problems with a few independent variable^,^^ and only the generalized Metropolis or simulated annealing m e t h ~ d ~ has " ~been adapted to conformational analysis."~'2 Further specialized procedures have been proposed for macromolecular applications, among them discrete sampling or search,'" l6 stepwise buildup of the m ~ l e c u l e , ' ~stochastic ~~' technique^,^^^^ and optimization in a space of high dimensionality with subsequent relaxation back to three Many of these approaches begin with a simplified model of the protein in which the rotational positions of the bonds are discretized. This is the socalled isomeric state approximation: the energy of a n isolated residue as a function of rotation about a bond passes through several welldefined minima, and to a first approximation these are the allowed 1755
1756
VAJDA AND DELIS1
states for rotation about the bond. Each residue can then be characterized by some finite number of states, ranging from 9 for Ala to more than 250 for long sidechain residues such as Arg. If the number of discrete states for a sequence of residues is, to a first approximation, the product of the number of states for each residue, a direct way to proceed would be to generate the configuration of every discrete state of the molecule, calculating the energy of each as the sum of all interresidue and intraresidue terms. This would yield a rigorous solution to the discrete state problem. A moment’s reflection indicates that without additional constraints 1315*22 not only are the computational demands of this approach excessive, but even if they were not, the procedure very likely would not accurately solve the real problem. Considering computational demands first, a short calculation indicates that a peptide with 20 residues and 10 states per residue on average would require of the order of several hundred hours on our fastest computers. And the problem would still not be solved because discretization is an approximation. A second stagean exhaustive continuous search about the global minimum of the discrete model, and some number of nearby minimawould be required to uncover the actual global minimum. These estimates notwithstanding, close study shows that the combinutorial portion of this problem; i.e., the first phase, is more promising than a cursory analysis would indicate. The approach developed in this paper, a generalized dynamic programming technique that is explained below, is directed at this first phase. The procedure is essentially a discrete search, although one version involves limited local minimization, generally restricted to two dihedral angles in a buildup step. We also show that the use of dynamic programming and the elimination or near elimination of locally minimizing relatively short chain fragments avoids the possibility of atomic overlaps. Energy increase due to atomic overlaps is a potentially difficult problem in buildup methods that may require the use of an auxiliary energy function based on softatom representation, but has never been encountered in conjunction with dynamic programming. In addition, the sidechain positions are adequately predicted for most peptides without a separate sidechain optimization stage. These and other factors that will be introduced below reduce the number of energy function evaluations by one to two orders of magnitude. All calculations in this paper are carried out using the ECEPP (Empirical Conformation Energy Program for Peptides”) potential, primarily because
most reports in the literature that are directly relevant to the evaluation of our method use ECEpp.2,11.1720,2327,3138 Th e minimization is performed in the space of dihedral angles with bond lengths and bond angles fixed in the standard polypeptide geometry. The use of standard geometry is not a liability here, since the main purpose of global search algorithms is likely to be in the generation of starting conformations for a further refinement stage in which bond angles and bond lengths will be permitted to change, and effects of the solvent may be taken into account.” In the next section we briefly outline the central idea of dynamic programming and show that in its original form it is not applicable to the problem of finding a peptide structure of minimum energy because the principle of optimality does not apply. The basic algorithm is then extended to circumvent the difficulty, and implemented in two forms. First we study the feasibility of a completely discrete approach, than increase its reliability by introducing limited local minimization during the buildup. We then compare our results with those of other calculations on 17 short peptides.
COMBINATORIAL MINIMIZATION IN POLYPEPTIDE FOLDING The basic problem (Figure 1) can be formulated in terms of finding the shortest from node a to node e such that the path contains exactly one node from each intermediate stage denoted by b , c , and d . For the moment we leave aside physical interpretation of stages, nodes, and paths, which will be developed below in terms of polypeptide conformation. The main point here is that if ( a , 6 , , cz, d 3 , e ) is the shortest path, then each subpath must be a shortest path itself, i.e., ( b l , c 2 ,d 3 ) is the shortest path from bl to d 3 .Indeed, if this were not the shortest, then we could replace it by a shorter path, say ( b l , c l , d3), and form the path ( a , b l , cl, d3, e l , which would be shorter than ( a , b l , c 2 , d3,e ) . This key featurethat each subpath of the shortest path is itself a shortest pathis called the principle of optimality and simplifies the solution considerably. A way of implementing dynamic programming is first to seek the shortest path from a to c1 by comparing the lengths of ( a , b l , c l ) , ( a , b’, c l ) , and ( a , b3, c l ) . Assume that ( a , b z , c l ) is the shortest path. Similarly, let ( a , b, , c z ) be the shortest path from a to c 2 .As shown in Figure 2, to find the shortest path from a to dl one has to compare only ( a , b z , cl, d,) and ( a , b l , C Z , 4 ) .
1757
DETERMINING MINIMUM ENERGY CONFORMATIONS
W Figure 1 text).
Illustration to the principle of optimality in the shortest path problem (see
If there are s k nodes in the kth stage and we have N stages, then the number of function evaluations is slsz s2s3 + + S N  ~ S Ninstead of s1s2 S N required for an exhaustive search. With s k = s for all k , the computing effort is proportional to s2. The simplest relation to the energy minimization problem is to interpret each node of the kth stage as a possible conformation of the kth residue. A somewhat more general relation would be to let each node at the kth stage be the states of a block of n residues running from k to k n  1.In either case, a path defines a conformation of the polypeptide chain. For reasons that will be explained in the next section, however, the principle of optimality does not apply to peptide energy minimization.
+


+
POLYPEPTIDE BUILDUP BY D Y N A M I C PROGRAMMING The dynamic programming concept can be generalized so that it can be used for peptide structure calculations. How this is done can best be explained by first formulating the traditional unmodified
search procedure. In analogy to the shortest path problem, form a matrix with residue names labeling columns and residue states labeling rows. The matrix is clearly not square since the number of states of a residue depends on the type of residue, but that is of little consequence. Let s k be the number of states that can be assumed by residue k , and let matrix element ( j ,k ) store the best energy E ( j , k ) of paths (configurations) up to residue k ending in state j. To obtain the best configuration k 1residues long that ends in state j ' , say, attach residue k 1 (in state j ' ) to each of the s k configurations associated with column k , compute each of the energies, and store the best one in element ( j ' , k 1) . The associated configuration should of course also be stored. The remainder of column k 1 is completed in an analogous manner. When the last column is reached, each element will have the energy of the best configuration ending in that state. In arriving at the above result, however, the best configuration obtained is not necessarily the best configuration of the molecule. There are two reasons. The first is related to difficulty in applying the principle of optimality. During the buildup, only residues
+
+
+
n
W Figure 2
+
Stagewise decisions in dynamic programming (see text).
1758
VAJDA AND DELIS1
to the left of the stage under consideration have coordinates assigned, and therefore only they can be included in the energy calculation. Because of the interactions with the remainder of the chain, however, the lowest energy structure of the first k residues might not be found in the lowest energy conformation of the entire molecule. Thus, in contrast t o the shortest path problem discussed in the previous section, the principle of optimality does not strictly apply. Second, the isomeric state model provides only a firstorder guess a t the states of each residue as they occur in a peptide, and it must be corrected by extensive searches for lowenergy structures in the vicinity of the derived minima. When this is done what is obtained as the best overall configuration may no longer remain the best. The first difficulty, omission of forward calculations, is common to buildup procedures. The error can be minimized by a number of approaches. One possibility is a perturbation method. During the buildup the calculated energy would be a sum over all atoms, both before and after the stage under consideration. T h a t can only be done using some initial guess a t the coordinates. These might be based on the crystal coordinates of a homologous molecule, or the coordinates obtained from a n initial conventional buildup as described below. Another approach would be to consider a t each stage the state of a block of residues n long rather than just the state of a single residue. Thus column k would have s: rows, these storing energies of the configurations of the chain k n  1 long, ending in the configurations of the last block. The next stage would consist of the block of residues from k 1 to k n . Not all states of this block will be compatible with states of the previous block; in particular, there will be paths from only s k of the s! rows. This approach is consistent with physical indications*,” that a protein folds by first forming local structures whose internal stability is largely independent of longrange interactions. With respect to computability, if these local regions are captured by sequences n long, the number of configurations that need to be considered is IIfn+l sk, where n needs to be large enough to fully incorporate local secondary structure. For example, n = 10 would be safe for alpharich structures. However, even for a chain of only 20 residues and taking a modest number of 10 states per residue on average, the number of configurations that would need to be considered is of the order of 10”. We expect the method to be valuable for predominantly ahelical proteins, and when the number of states per residue is reduced, for example, by nmr distance constraints.
+
+
+
A third approach, and the one that will be pursued in this paper, is to keep more than just the best path t o each state within each stage. In particular, we keep all states whose energy is within some cutoff of the best. T h e physical considerations involved in choosing a cutoff energy parameter governing the choice of paths to be retained will be discussed below. The basic assumption of the discrete approach is that each residue can take only a finite number of discrete conformations. In the present paper these conformations are defined as the local minima calculated for terminally blocked amino acid residues by Vasquez et al.31 With the configuration space consisting of singleresidue local minima, 31 the generalized discrete dynamic programming algorithm is as follows. Step 1. Add all minimum energy conformations of the first residue t o the Nterminus, and set k = 2, i.e., the index of the next residue. Step 2. Add the first minimum energy conformation of residue k to all previous fragments and calculate the conformation energies. Select the minimum energy, and retain only those conformations whose energies exceed this minimum by less than a cutoff energy C . Step 3. Repeat Step 2 for each local minimum of residue k . Step 4. If k < N , where N is the number of residues in the polypeptide, set k = k 1 and return to Step 2. Otherwise refine the resulting conformations by local energy minimization. Notice that because of the nonzero cutoff energy, each state (i.e., each single residue local minimum) of the i t h residue terminates a group of subpathsthe lowest energy path t o that residue and all other paths t o that residue having conformations within the cutoff. A central feature of the algorithm is that C applies only to subpaths within a group; the energies of paths terminating a t different states of the residue under consideration are not compared. In order to develop some intuition on a suitable value for the cutoff, consider a polypeptide divided into two segments A and I?,where A denotes the chain of the first k residues with residue k in one of its discrete states, and the remaining fragment B is in a n arbitrary fixed conformation. We intend to select a conformation for A in order to minimize the energy of the entire molecule A B . If A and B did not interact, the energy of the full segment, A B would be E = EA E B ,where EA and EHrepresent the internal energies of A and B , respectively. Let denote another conformation of the first segment with internal energy EA. The energy of the complete segment, again in the absence of interaction, would
+
+
A
DETERMINING MINIMUM ENERGY CONFORMATIONS
+
then be E = E A En. Then E < E if and only if EA < E L . Thus in the absence of interaction, for any particular conformation of B , the energy of the chain would be built by always retaining the minimum energy conformation A . This is the standard dynamic programming situation, and no generalization is required. When interactions are introduced, the conformation of A that is chosen will of course depend on the conformation of B . The central question concerns the sensitivity of this dependence. Now E = ER En EAB, where EAB represents the interaction energy between A and B . Similarly, for the conformation A of the first segment we have E = En EB EAB. Since E  E = EA  Ex + EAB  E2,R, E < E if and only if EA < EA_ EAB  E A R Thus, . EA < EA does not imply E < E , but EA < En EaB  EABdoes. This second condition is stronger than the first one if and only if EaB  EAB < 0. Introduce the notation C = max 1 EAB EABI , where the maximum is taken over all the possible conformations of A , A , and B . Then the worst case analysis shows that EL > EA C always implies E < I?. Therefore, in order to minimize the energy E , in each stage we can disregard any segment if E i > EA (', where EA is the minimum energy of the Sragment A in the current path of dynamic programming. Thus we retain many paths leading to each element in the matrix, and those we retain are decided on the basis of a cutoff energy parameter. In principle the value of C for a particular molecule and in a particular buildup step can be estimated by considering the weakest and strongest possible interactions between the two fragments A and B . As will be shown, however, the value required for finding the global energy minimum in short peptides may he significantly smaller than the upper bound calculated above.
+
+
+
+
1759
tions of terminally blocked single residues. We use all the conformations given by Vasquez et aL3' with energies within 3 kcal/mol relative to the global minimum. The list of residues with the corresponding number of states in parenthesis is as follows: Ala (9, all local minima within 5 kcal/mol relative to the global minimum are used), Asp (38 1 , Cys ( 3 8 ) , Glu( 7 4 ) , Gly( 7 ) , P h e ( 1 4 ) , H i s ( 4 3 ) , Ile( 151, Lys(178), L e u ( l 9 ) , M e t ( 5 3 ) , P r o ( 3 trans, 2 cis minimum energy conformations ) , Gln ( 76 ) , Arg(257), Ser(42), T h r ( 16), Va1(9), Trp(3O), and Tyr ( 2 6 ) . While there are many states for residues with long side chains, the number of essentially different backbone conformations is around 15.
+
+
+
+
A
IMPLEMENTATIONS OF THE ALGORITHM Two different buildup procedures will be based on the generalized dynamic programming algorithm. The first is completely discrete, the other includes limited local minimization. T h e performance and efficiency of'these procedures depend heavily on such details of implementation as the choice of the configurat.ion space, the treatment of side chains, and the strategy of local minimization. Use of SingleResidue Minima
The possible states of each residue during the discrete buildup are the minimum energy conforma
Treatment of Side Chains
Singleresidue local minima with the same backbone but different sidechain positions usually have very similar energies. Without eliminating some of these rotamers, the buildup would result in a huge number of structures for each backbone While we add all minimum energy conformations of the single residues to the chain during the buildup, the number of conformations is subsequently reduced by two different procedures. The first procedure is called path aggregation. It means that all single residue local minima with similar backbones are considered to define a single path in dynamic programming. In more precise terms, consider two conformations in the k t h buildup step with energies E and E , respectively. Let (d,, 4,) and ti,, J , ) , j  1, . . . , k , denote their backbone dihedral angles. If Iq5k  $ k l < A,,, and 11,Lk  $ k / < A,,, thus the backbone dihedral angles of the last residues in the two conformations deviate by less than an aggregating parameter A,, and E > E C, where C denotes the cutoff energy, then the second conformation is automatically dropped in dynamic programming, independently of the states of the first k  1 residues. Notice t h a t without path aggregation the energies of these two conformations are not compared unless the last residue is in the same discrete state in both. The second procedure applies to a n entire conformation a t stage k rather than just the kth residue. We scan all conformations to find the ones with similar backbones, and retain only the lowest energy conformation within each such group. Two backbones are regarded as similar if all corresponding backbone dihedral angles deviate by less than a filtering parameter A,. A conformation of energy E is dropped by this procedure called sidechain filtering
+
1760
VAJDA AND DELIS1
if and only if l $ j  &,I < A f and a l l j = 1 , .. . , k , and E > E .
I$,
 &jl
< A, for
local Minimization
In addition to the discrete buildup we study a second procedure with local minimization used when evaluating the energies of chain fragments in the steps of dynamic programming. Local minimization in the space of all dihedral angles proved, however, to be very inefficient. The reason is the tendency to generate, a t early stages of buildup, compact lowenergy structures that are not part of the minimum energy structure of the entire molecule. As emphasized by Gibson and Scheraga," concatenation of such separately optimized short fragments frequently leads even to atomic overlaps resulting in very high energies. Some changes introduced by local minimization will therefore have t o be undone when minimizing the energy of longer fragments, and sometimes the configuration may be trapped in a local minimum. On the other hand, some local minimization will often be required for a t least two reasons. First is
a
the potential existence of hydrogen bonds between residues 1 and 3, 1 and 4, or 2 and 4 in the lowenergy conformations. T h e hydrogenbonding geometry is rather restrictive and frequently cannot be satisfied by simply concatenating residues in fixed states. Deviation from the required geometry may result in underestimating the magnitude of the hydrogenbond energy term in the potential function. These potential hydrogen bonds can be located by introducing limited flexibility, adjusting the dihedral angles q52 and +,2 when adding residue i t o the chain as shown in Figure 3a. We emphasize that only two torsional angles are considered as independent variables for local minimization in stages i 2 4 of the dynamic programming, whereas all the other angles are fixed. T h e second reason for partial local minimization involves proline; specifically, accommodating the relatively rigid ring into the backbone without a sharp energy increase requires adjusting further torsional angles, including those of some peptide bonds. If proline is a t position i, then for short peptides, the required flexibility is obtained by a con
b
Figure 3 Strategy of limited local minimization ( a ) for nonproline residues and ( b ) for a peptide with proline as its ith residue.
DETERMINING MINIMUM ENERGY CONFORMATIONS
176 1
Table I Test Problems Best Result in Literature Example 1 2
3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 a
Sequence"
Conformationb
Energy, kcal/mol"
Reference
AcGlu GluTyrSerAlaMetNMe AcIleLeuAspThr AlaGlyGlnNMe HTyrGlyGlyPheMetOH AcAlaLysGlySerNMe AcAlaAlaGlySerNMe AcGlyLys AlaGlyNMe AcGly AlaAlaGlyNMe AcAlaAla Ala Ala ProNMe AcProProGlyProNMe AcAla AspGlyLysNMe AcThrAspGly LysNMe Ac GlyThrAsp Lys NMe AcAspLysThrGlyNMe AcAspLysGlyThrNMe AcLysThrGlyAspNMe HThrProArgLysOH HThrLysProArgOH
AAAAAA EAEACD*A FDC*BE AAAA AAAA AAAA AAAA AAADA FCD*A AAD'F ADC*F ACDA DACD* EEC*F CCA*E FCA*E CDCF
44.90 43.60 12.90 13.81 10.69 10.20 7.11 16.75 43.16 24.04d 27.35d 28.08d 30.1gd 25.92d 31.74d 37.86d 38.6Zd
18 18 27 32 32 32 32 33 34 35 35 36 36 36 36 37 38
Ac and NMe denote Nacetyl and Nmethyl end groups, respectively. potential en erg^.".^' Oneletter codes of Zimrnermann et aL3' for the backbone. Recomputed by local minimization with the recent ECEPP parameters."
I, ECEPP
tinuous search of the dihedral angles of the two preceeding residues (Figure 3b) and those of the proline itself. Since 4 = 75' is fixed for the proline in standard "down" puckering geometry, there are 7 independent variables in local minimization when adding Pro to the chain, and 4 variables in the following step. Although local minimization is computationally expensive, Pro has only a few local minima and hence a relatively small number of conformations is usually subjected to minimization.
In examples 19 these results are based on the recent parameterization4' of the ECEPP potential and hence can be used for direct comparison. On the other hand, the tetrapeptides in examples 1017 are from studies using an earlier parameterization.'' Therefore, the energies shown in Table I were obtained by starting local minimization from the lowest energy conformations found in the original publications. Choice of the Parameters Governing the Buildup
EVALUATION OF THE ALGORITHMS The above procedures, like all the others that attempt to find a global minimum, are approximations. As such, extensive experimentation is required in order to investigate their reliability and efficacy under various conditions. Test Problems
Table I lists results from the literature for 17 short peptides whose structures were found by minimizing their ECEPP potential energy through some global search procedure. T h e lowest energy conformations reported in the literature are shown in terms of the oneletter codes introduced by Zimmermann et al.39
Table I1 shows the parameter values used in the procedures for solving the test problems. The roles of the path aggregation parameter A, and that of the sidechain filtering parameter A, have been discussed in the previous sections. Both parameters are intervals on backbone dihedral angles given in degrees. Since in the search table the single residue local minima are specified in terms of dihedral angles truncated t o integers,31 any A, < 1 means that all conformations with identical backbones define the same path in dynamic programming, independently of the sidechain positions. The value Af = 10 follows from practical considerations, since a smaller filtering parameter leads t o a large number of conformations with essentially identical backbones.
1762 Table I1
VAJDA A N D DELIS1
Parameters for Solving the Test Problems Value in Test Problems
Parameter Path aggregation interval Chain filtering interval Cutoff energy, 1st buildup step Cutoff energy, 2nd buildup step Cutoff energy, 3rd buildup step Cutoff energy, ith buildup step, 4 Ii < N Cutoff eaergy, last buildup step Absolute cutoff energy” a
Unit
Notation
Low Set
s 1.0
Degree Degree
10.0 3.0 2.0 1.0 0.5 0.5 3.0
kcal/mol kcal/mol kcal/mol
kcal/mol kcal/mol k ca 1/ m o1
High Set 1.0 10.0 3.0 2.0 2.0 3.0 5.0 8.0
2
Used only in the last buildup step
The most important parameter of the procedure is the cutoff energy C, used in the i t h step of the buildup. We have shown that with the value C = maxl EAB  EABI the generalized dynamic programming always finds the minimum energy combination of discrete states, where E,iB and EAHdenote the largest and smallest interaction energies, respectively, between two chain fragments. This estimate is, however, not useful for selecting the cutoff energies required in calculations. First, C is a n upper bound. Second, in a discrete buildup all decisions are based on energies computed a t fixed dihedral angles, and the lowest energy combination of these discrete states is not necessarily the same as the minimum energy conformation found after local minimization. Thus the actual cutoff energy must also compensate for the errors due to the discrete approximation. This second component can, however, be eliminated by minimizing the energy of chain fragments during buildup, and hence we can separately study the first function of the cutoff energy parameter when performing limited local minimization in one version of our procedure. The deviation E i H  EAB is mainly determined by hydrogenbonding energies.18 Assuming a conformation with r hydrogen bonds between the two fragments and another conformation with no hydrogen bonds gives C ’13r kcal/mol. Thus C N 3 kcal/mol for a tetrapeptide, but this value increases up t o 9 kcal/mol for a hexapeptide. T o study the effect of the parameter C on the resulting conformations, all examples are solved a t two different sets of cutoff energies referred to as “ 1 0 ~ ’ ’and “high” sets, respectively. As shown in Table 11, in both sets C i= 3 kcal/mol. Thus for the first residue we retain all the local minima with relative energies less than 3 kcal/mol. Using the “low” set, the cutoff energy is decreased in subsequent
buildup steps, and C, = 0.5 kcal/mol for all i 2 4. In the “high” set we use C2 = C3 = 2 kcal/mol and C, = 3 kcal/mol in all further steps but the last one, where the cutoff is further increased. A cleaner procedure is simply t o use C, = 3 kcal/mol throughout. The lower cutoff in steps 2 and 3 was adopted because of the practical limitation of using a t most 4 Mbyte core memory on the IBM3090 computer on which this study was carried out. I n both sets these values are clearly much smaller than upper bounds expected by physical considerations. Nevertheless, it will be shown that using limited local minimization in the buildup even the low set is sufficient to find the global energy minimum for peptides of 47 residues long and it generally gives good results even in conjunction with the completely discrete procedure. More generally, the global minimum can be reliably found a t small values of the cutoff energy parameter provided local minimization of chain fragments is used. Until now we did not mention the parameter CA listed in Table 11. This parameter is an absolute (i.e., pathindependent) cutoff energy, used only in the last step in the buildup. The reason i t is useful is that in the dynamic programming algorithm energies are compared only within the same group of subpaths. For some groups all energies may be so high that local minimization cannot lead to any lowenergy conformation. Therefore, in the last buildup step we drop all conformations with energies E > Em,, C A , where Em,, denotes the minimum energy. In the following sections we use the number of energy evaluations to describe relative numerical efficiencies of different methods. Two remarks seem to be appropriate a t this point. First, apart from the residue currently being added, the energies of all conformations are already known from the previous
+
DETERMINING MINIMUM ENERGY CONFORMATIONS
1763
Figure 4 Stereo plot of the lowest energy Metenkephalin conformation found by discrete buildup followed by local minimization. All atoms except carbonbound hydrogen atoms are included.
huildup step. This can easily be exploited in the algorithm, and hence an energy evaluation in a discrete buildup is less expensive than in other methods t h a t vary the dihedral angles. Second, for local minimization we presently use a version of' Powell's direction set method.42Powell's method is robust but requires many function evaluations and hence it is relatively slow." T h e efficiency of the 10ca.l search can be improved by about a factor of two with minimization methods that use information on the derivatives of the potential energy. Discrete Buildup
Recall that in this version of the algorithm we use discrete stat,es, there is no local minimization during the buildup, and only completely built conformations are subsequently refined by local minimization. Table 111 shows both the lowest energy combinations of discrete st,ates found by the discrete buildup at the low set of the cutoff energy parameters, and the lowest e n r r c conformations obtained by locally minimizing the resulting structures. The lowest energy conformations reported in the literature are also listed for convenience. In example 1 we reproduce the ahelical conformation r e p ~ r t e d . As ' ~ shown in Table IV, some sidechain positions differ but this does not affect the lowest energy. For the heptapeptide in example 2 the lowest energy conformation found by Gibson and Scheraga IXis partially extended. We find, however, the cvhelix shown in Table V t h a t has significantly lower energy.
T h e pentapeptide in example 3 is the Metenkephalin t h a t has been studied using a number of global minimization procedures. The first of' these was an extensive multistart search by Isogai and S ~ h e r a g aT. h~e~sidechain dihedral angles found by t h a t method were subsequently used in a chain buildup by Vasquez et al.," resulting in the energy minimum of 8.24 kcal/mol. Lower energies have been found by adaptive importance sampling, and finally, the value 12.9 kcal/mol has been attained when using simulated annealing" or a distance geometrytype procedure." T h e lowest energy given by the discrete buildup followed by local minimization is 11.05 kcal/mol, somewhat higher than the value reported in the literature. The conformations are, however, virtually identical, the rms deviation between the two being 0.17 8, for the backbone atoms a n d 0.79 A for all atoms ( T a ble V I ) . The reason for the energy difference is the failure of our method t o find one of the three hydrogen bonds t h a t are present in the minimum energy cow formation.27The missing hydrogen bond, however, does not reflect a problem in the algorithm, but a limitation in the number of singleresidue minima allowed for Tyr. Since the sidechain angles are close t o the cis position in all singleresidue local minima for Tyr, the result of Purisima and Scheraga" cannot be completely reproduced without varying the side chains in a separate procedure following the buildup. T h e lowest energy conformations of the four tetrapeptides studied by Takahashi et al.32are repro
1764
VAJDA AND DELIS1
Table I11 Lowest Energy Conformations Obtained in Discrete Buildup at the Low Set of Cutoff Energy Parameters Discrete Buildup
Refinement"
Previous Resultb
Example
Conformation'
Enera'
Conformation
Energy
Conformation
Energy
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
AAAAAA AAAAAAA FDC*DE AAAA AAAA AAAA AAAA AAAFC FCE*C AAAA AAAA AAAA AAAA AAAA AAAA CACE EDAE
39.81 43.69 4.31 12.76 9.71 9.96 6.75 9.28 40.10 24.98 28.66 30.14 30.36 30.98 28.04 37.01 38.26
AAAAAA AAAAAAA FDC*DE AAAA AAAA AAAA AAAA AAAFC FCE*C AAAA AAAA AAAA AAAA AAAA AAAA CAAF" FDAF
44.96 50.47 11.05 13.78 10.62 10.20 7.10 13.89 42.21 26.68 30.11 31.28 31.36 32.94
AAAAAA EAEACD*A FDC*BE AAAA AAAA AAAA AAAA AAADA FCD*A AAD*F ADC*F ACDA DACD* EEC*F CCA*E FCA*E CDCF
44.90 43.60 12.90 13.81 10.69 10.20 7.11 16.75 43.16 24.04 27.35 28.08 30.19 25.92 31.74 37.86 38.62
Local minimization by Powell's method.42 See Table I for references. ' Oneletter codes of Zimmermann et aL3' for the backbone. a
duced exactly in examples 47. For the Procontaining peptides in examples 8 and 9, however, the discrete buildup does not find the conformations reported in the l i t e r a t ~ r e .This ~ ~ ?reflects ~~ the failure of a purely discrete state algorithm to accommodate the proline ring without unfavorable interactions as discussed above. Since examples 1017 are based on studies that were performed using a slightly different parameterization of the ECEPP potential, it is not surprising that we find lower energies in all but one example. According to our results, in short peptides with Ac and NMe end groups and without proline residues the recent parameterization of the ECEPP potential generally seems to favor helical conformations. This agrees with the conclusions of Takahashi et al.32In example 15, however, the turntype conformation found by Howard et al.36 has lower energy than that of the helix given by the discrete buildup. Table VII shows the number of energy evaluations and the CPU times required to solve the test problems on an IBM3090/200 computer using the discrete buildup with the low set of the energy cutoff parameters. The number of intermediate conformations in the buildup steps is also shown, as well as the number of conformations retained after ap
31.00
40.68 41.56
ECEPP potential energy,2'.*' kcal/mol. Local minimization rearranges the conformations.
plying the absolute cutoff C, in the last buildup step. This is the number of structures that have been subjected to local minimization. Since the number of energy evaluations is rarely reported in the literature, we are unable to systematically compare the efficiency of the present method to that of previously available ones. An exception is the Metenkephalin studied in example 3. As mentioned by Purisima and S ~ h e r a g ato , ~find ~ the lowest energy conformation of this pentapeptide the methods based on simulated annealing" and dimensionality relaxation, 27 respectively, require more than 1 million energy evaluations each. With 250,000300,000 energy evaluations reported, the buildup method used by Vasquez and Schreragalg is more efficient, but the sidechain positions were fixed in the calculation. From the number of the intermediate structures considered by Takahashi et al.32we can conclude that their buildup of the tetrapeptide in example 4 requires at least half million function evaluations. Comparing these results with the data of Table VII shows that with a small cutoff the discrete buildup is very efficient, and the number of energy evaluations is reduced by almost two orders of magnitude. All calculations have been repeated at the high set of the cutoff energy parameters. The only result
DETERMINING MINIMUM ENERGY CONFORMATIONS
Table IV
1765
Lowest Energy Hexapeptide Conformations in Example 1 Dihedral Angles Method
Buildup of Gibson and Scheraga'* followed by sidechain optimization
Discrete buildup only
Discrete buildup followed by local minimization
Buildup with limited local minimization followed by local minimization"
a
Glu Glu TYr Ser Ala Met Glu Glu TYr Ser Ala Met Glu Glu TYr Ser Ala Met Glu Glu TYr Ser Ala Met
* 
4
Residue
63 66 65 72 67 
11
,
 70 78 75 72  74 71 61 71  72 72 69 65 66 71 73 72  70 68
45 43 36 38 41 37 40 33 31 36 35 38 46 32 37 34 41 42 44  32 37 33 43 40
w
X'
178 180 180 179 179 178 180 180 180 180 180 180 175 180 178 179 176 179 177 179 178 179 179 179
176 80 177 63 61 72 173 76 179 56 62 172 175 83 180 46 62 174 175 a2 179 62 61 174
X2
x3
x4
Energy, kcal/mol
~
176 63 80 68
73 44 178
180 180
44.90
167 175 66 100 76
175 81 47 2
60 180 179
39.81
175 173 59 100 53
180 66 55 1
60 180 179
44.96
175 169 58 101 70
179 65 57 1
60 180 180
46.26
174
179
60
To be discussed in the next section.
that is different after the discrete dynamic programming step is the structure of Metenkephalin, but even there the final conformation obtained after completion of the entire algorithm (including local minimization ) is unchanged. Although increasing the cutoff parameter does not decrease the lowest energies, we will use the high set of the cutoff parameters for generating a relatively large number of conformations. Table VIII lists the number of energy evaluations and the CPU times for the first 8 test problems using the high cutoff values. Although the number of energy evaluations is significantly increased, the method is still efficient relative to the other known global search procedures. Notice that the more expensive part of the calculation usually is local minimization of the large number of conformations given by the buildup, and hence a more efficient local minimizer will significantly improve the overall efficiency. Thus we conclude t h a t the discrete dynamic programming finds in a matter of minutes the previously reported lowest energy structures in 13 of the 17 test cases, and fails to find the known lowest energy con
formations in example 15 and for the proline containing peptides in examples 8 and 9. For Metenkephalin the structure is essentially the same, but the calculated energy is slightly higher than reported in the literature. The results are insensitive to choice of cutoff beyond a very small value. Buildup with limited local Minimization
As described in the implementation section, this version of the algorithm involves adjusting the values of a few dihedral angles when evaluating the energies of chain fragments in stages of the dynamic programming. Calculations were performed for all the 17 test problems using the low set of the cutoff energy parameters. Table IX shows only the results that differ from those of Table 111. In example 1we find a sidechain rotamer shown in the last box of Table IV that agrees with the one previously reported," apart from the x1value for the Met. It has, however, somewhat lower energy, and it is especially significant considering that our algorithm does not include a separate sidechain optimization stage.
1766
VAJDA AND DELIS1
Table V
Lowest Energy Heptapeptide Conformations in Example 2 Dihedral Angles Method
Energy,
Residue
4
+
w
X'
X2
Y3
Y4
lrrnl/mrrl
Ile Leu
72 72 71 68 74 73 69 68 68 70 68 69 68 67
27 41 33 37 35 34 39 31 40 37 42 37 40 41
180 180 180 180 180 180 180 I78 178 179 179 180 177 179
160 179  58 55 62
172 64 76 165
74 52 180 59
70 59
43.69
177 162 178 54 60 61
62 171 63 65 98
81 73 53 180 57
3 71 59
178
62
81
.?
Discrete buildup only
ASP
Discrete buildup followed by local minimization
Thr Ala GlY Gln Ile Leu ASP Thr Ala GlY Gln
Results are also very good for the prolinecontaining peptides. In example 8 the conformation AAADC is shown to have lower energy than AAADA found by Piela et al."3 Notice, however, t h a t the goal of the latter study was determining the prolineinduced constraints on ahelices, and hence not necessarily the entire conformational space of the proline residue has been considered. The buildup returns Table VI
50.47
AAADA as the second best conformation, and shows that AAADF also has a low energy of 16.61 kcal/ mol. In example 9 our lowest energy conformation is FCGA, but it differs from the result FCD*A of Lee et al.34only in the dihedral angle & that has the value 178" instead of 179". Thus the two conformations are identical in spite of their diflerent Zimmermann codes.
Lowest Energy MetEnkephalin Conformations in Example 3 Dihedral Angles Method
Residue
4
Importance sampling, Paine and ScheragaZs
Tyr GlY GlY Phe Met TYr GlY GlY Phe Met
85 159 60 89 80 86 155 84 137 164 79 167 79 147 161  84 153 84 135  163
Distance geometry, Purisima and ScheragaZ7
Discrete buildup only
Discrete buildup followed by local minimization
Tyr
GlY GlY Phe Met TY GlY GlY Phe Met
* 155 79 91 26 137 156 84 74 19 161 147 52 73 29 162 147 58 67 20 171
X'
x4
w
X'
177 174 178 179 178 177 169 170 174 180 180 180 180 180 180 177 174 173 174 180
167
87
58 65 173
73 179 79
59 53 178
95 175 80
56 58 176
97 180 61
180 20
60
59 53
94 175
178
61
x3
148
179 166
180
Energy, kcal/mol 10.31
60 1'2.90
59
4.31
1
11.05
DETERMINING MINIMUM ENERGY CONFORMATIONS
1767
Table VII Number of Energy Evaluations and CPU Time in Discrete Buildup with the Low Set of Cutoff Energies Energy Evaluations
Number of Conformations in Buildup Steps Example 1 2 3 4 1
6 7 8 9 10 11 12 I :1 14 15 16 17
1
2
3
4
5
6
7
74 15 26 9 9 7 7 9 5 9 16 7 38 38 180 16 16
2145 151 75 639 52 694 48 52 18 184 119 100 1900 1900 1243 10 1115
1275 353 94 290 45 359 47 64 41 82 104 45 1 305 204 515 773 314
334 89 61 164 210 38 46 49 32 156 256 399 30 48 161 281 581
42 37 127
26 24
215
24
CPU Time, s
Retained"
Buildup
Refinement
Buildup
Refinement
3
62710 7857 2532 7824 1799 7824 887 1139 362 7103 9511 7305 37168 22200 13606 16372 12543
5007 1215 15703 6939 3212 3672 2361 6655 7631 3413 5370 7306 7510 3320 5195 5620 9540
910.0 141.4 40.7 74.1 12.8 51.5 4.8 10.5 4.2 84.4 112.6 119.0 449.3 212.4 147.5 981.1 763.5
187.9 55.8 295.1 70.6 23.0 46.4 34.2 102.1 93.5 52.7 109.8 119.9 124.4 54.6 85.7 205.4 386.2
1 37 12 3 32 9 6
6 9 13 6 6 12 24 15 16
NumOer o f conformations subjected to local minimization.
mation given by the discrete buildup. Indeed, the mean square deviation ior the backbone atoms is reduced to 0.11 A. Thus, although it is difficult t o find the global minimum by builduptype methods due to the nonstaggered positions of some side chains, the two lowenergy conformations"," are very well reproduced. In example 15 the minimum is attained on the conformation CCA*E that agrees with the best result of Howard et al.,36 but the energy is slightly higher, now due to deviations in the backbone angle @2 of the lysine residue. Thus in both problems we are certainly trapped in local minima that are, how
In examples 3 and 15 we continue to obtain essentially the same structures found by lengthier methods, but with slightly higher energy. As shown in Table X, the lowest energy for the Metenkephalin is now  10.86 kcal/mol, and apart from some sidechain positions of the Tyr and Phe residues the corresponding conformation agrees with the result of Paine and S ~ h e r a g a also , ~ shown in Table VI. There is, however, a second lowenergy conformation in 7'ahle X. Although its energy is 10.58 kcal/ mol, the backbone dihedral angles are closer to those of the global minimum with the energy of 12.9 kcal/mol than the dihedral angles of the confor
Table VIII Number of Energy Evaluations and CPU Time in Discrete Buildup with the High Set of Cutoff Energies for Examples 18 Number of Conformations in Buildup Steps Example
1
2
3
4
5
6
1 2 3 4 5 6
74 15 2c i
2145 151 75 639 52 694 48 52
5359 865 166
2074 366 294 1920 1970 163 387 456
699 182 662
1375

9 1
I
n
8
9
I
a
9
901 116 4474 113 192
58
Energy Evaluations
CPU Time, s
7
Retained"
Buildup
Refinement
Buildup
Refinement
1038
34 66 213
75183 11125 12321 17678 4687 29427 1518 3305
57310 42116 47286 13756 8112 78908 21303 32306
932.9 332.5 204.9 224.1 71.4 298.2 16.0 10.5
1742.5 1062.3 712.5 188.0 101.8 988.5 :106.0 486.3
36
170
Number o f conformations subjected to local minimization.
28 122 108 36
1768
VAJDA AND DELIS1
Table IX Lowest Energy Conformations Obtained in Buildup with Limited Local Minimization at the Low Set of Cutoff Energy Parameters Buildup
Refinement"
Previous Resultb
Example
Conformation'
Energy
Conformation
Energy
Conformation
Energy
1 3 8 9 15
AAAAAA FDC*AF AAADC FCGA CCA*E
43.49 8.39 14.89 41.40 29.14
AAAAAA FDC*AF AAADC FCGA CCA*E
46.26 10.86 17.71 43.20 30.50
AAAAAA FDC*BE AAADA FCD*A CCA*E
44.90 12.90 16.75 43.16 31.74
a
Complete local minimization by Powell's See Table I for references. Oneletter codes of Zimmermann et al.39for the backbone.
ever, very close to the known lowest energy conformations, and the small deviations in dihedral angles and atomic positions do not seem to have much practical significance. Therefore we can conclude that the buildup with limited local minimization gives adequate results in all the 17 test examples. Although local minimization increases the number of energy evaluations considerably, according to Table XI the procedure is still about a n order of magnitude faster than the other global search methods. In addition, our recent calculations show that using also derivatives in the local minimization further reduces the number of equivalent function evaluations a t least by a factor of two.
SENSITIVITY OF RESULTS T O THE CHOICE OF PARAMETERS Completeness of the Configuration Space
An important assumption in the discrete state approach is that the number of minima for each residue
is dense enough to assure good starting conformations for local minimization. T h e assumption is questionable for glycine and proline. The conformations of Gly are not localized t o relatively narrow regions of the 4, $ map. Furthermore, in addition to the 7 minima for Nacetyl"methylglycine amide there are 5 further minima for the terminally blocked dipeptides XGly and GlyX, where X can be any naturally occurring amino acid r e ~ i d u e . 'T~o study the completeness of the states we considered the grid points of a 20" X 20" square lattice over the (6, $) map and added each point to the list of local minima if the corresponding energy was smaller than 3 kcal/mol relative to the global minimum and there was no original point within a radius of 30". The discrete buildup was then repeated for all Glycontaining peptides in test problems. The presence of additional points did not change the lowest energy conformations. Although the dynamic programming itself may result in lower energy when increasing the number of states, after local mini
Table X Limited Local Minimization in Buildup: Lowest Energy MetEnkephalin Conformations in Example 3 Dihedral Angles Conf.
2
Residue
4
fi
w
X1
86 158 65 78 80 84 160 87 142 142
152 67 93 32 138 146 77 74 21 169
179 178 178 178 179 178 174 173 172 180
179
66
33
179 65 172
80 179 63
179 20
58 54
95 176
X2
x3
178
x4
Energy, kcal/mol 10.86
59 10.58
60
DETERMINING MINIMUM ENERGY CONFORMATIONS
Table XI
Number of Energy Evaluations and CPU Time in Buildup with Limited Local Minimization Energy Evaluations
Number of Conformations in Buildup Steps Example
1
2
3
4
5
6
1 3 8 9 15
74 36 9 5 180
2145 75 52 18 1243
1275 103 64 41 515
102 70 44 28 96
26 39 27
18
a
1769
7
CPU Time, s
Retained"
Buildup
Refinement
Buildup
Refinement
2 9 7 3 65
172379 137864 47800 12604 49307
4112 3316 7451 2192 14909
3385.3 5618.4 428.3 128.2 735.5
126.5 108.3 138.5 27.7 248.5
Number of' conformations subjected to complete local minimization.
mization we obtain the same result as with the original configuration space. Thus the sampling with the 7 local minima only appears sufficient for assessing the relative energies of different conformations. We have already shown that the singleresidue local minima are not sufficient for determining lowenergy conformations of prolinecontaining peptides. For the Pro with "down" puckering geometry, 4 = 75". The number of local minima in the ($, w ) plane is 3 for the trans and 2 for the cis conformation, where w denotes the peptide bond dihedral angle preceeding the proline. Again, we were unable to increase the reliability of the discrete method by simply adding further states, and after extensive experimentation we returned to the use of the original local minima given by Vasquez et al.31 Reducing the Number of SideChain Rotamers
In order to avoid losing important lowenergy conformations, path aggregation with a n interval A, < 1O was used. A value this small reduces the number of energy evaluations a t most by a factor of two, so the filter, although used, is only marginally effective. On the other hand, sidechain filtering is absolutely necessary, both for keeping the number of intermediate structures manageable and for avoiding the local minimization of a very large number of conformations. One can lose, however, the lowest energy rotamers if the procedure is used in the first two buildup steps, and hence we filter only after the third and all subsequent steps. In spite of reducing the number of conformations considerably, the sidechain positions are fairly good in the lowest energy conformations, although results for comparison are available only in a few examples. In example 1 we find a slightly lower energy rotamer than the one obtained by Gibson and Scheraga," although they used a separate procedure for sidechain optimization. In examples 46 the results of
Takahashi et al.32are exactly reproduced. We have shown that sidechain positions in the lowest energy conformation of Metenkephalin cannot be found without separate sidechain optimization, and the problem is not connected with filtering. On the other hand, due to the filtering there is no guarantee that the best sidechain position will be found in all cases. In fact, certain positions may be preferable in the final conformation, but i t is generally not possible to make completely safe decisions until the entire chain is built up.'8 Sensitivity to the Choice of the Cutoff Energy
As discussed, the principle of optimality2'3'" does not apply to the energy minimization problem, and the original dynamic programming algorithm does not necessarily find the lowest energy combination of states. In principle the same conclusion is valid for the generalized dynamic programming if the cutoff energy parameter is too small. In practice, however, the minimum energy conformation is sensitive to this parameter only in the first and second buildup steps. For example, only the second lowest energy conformation C*AAA is found in example 6 if C2 < 2 kcal/mol. T h e lowest energy conformation becomes completely independent of the cutoff values used in further buildup step if limited local minimization is performed, and also in the discrete buildup unless the discrete approximation is very bad for the particular molecule. T o understand why dynamic programming enables us to use small values of the cutoff energy parameter, some insight can be obtained by a closer look a t the details of the procedure. Such analysis is presented in Table XI1 for the discrete buildup of the Metenkephalin. The best structure a t stage k is defined as the one that is closest, in terms of torsional angles, to the structure of the same kmer in the lowest energy state that is eventually found for the entire peptide (after completing dynamic pro
1770
VAJDA AND DELIS1
Table XI1 Analysis of Intermediate Conformations in the Discrete Buildup of MetEnkephalin Energy DiHerence
Rank
Step
cutoff, kcal/mol
Lowest Energy, kcal/mol
Energy of t h e Best Structure"
Total"
In Path'
Total"
In Path'
1 2 3 4 5
3.0 2.0 1.0 0.5 0.5
0.76 0.46 0.18 2.75
1.18 0.91 2.2;j 0.8:) 4.31
1.94 0.48 2.05 3.60 0.0
0.0 0.09 0.0 0.0 0.0
26 5 19 44
1 2 1 1 1
a
I
4.31
1
Energy of t h e conformation closest to t h e corresponding fraginc~nti n t h e plot)al minimum o f t h e whole molecule Energy of t h e best structure minus lowest energy. Energy of t h e best structure minus lowest energy in t h e same path. R a n k of t h e best structure among all conlormations after sorting tor increasing energy. R a n k of t h e best structure within its path after sorting for increasing energy.
gramming and local minimization 1 . It is evident that the best structure a t stage h is not necessarily the one that has the lowest energy. This difference can arise in principle either because one of the higher energy structures retained at. stage k becomes dominant as the combinatorial portion of the algorithm continues ( a s explained in the introduction, this could not happen if only the hest structure were retained a t each stage), or there is a change during the local search part of the algorithm. According to 'Fable XII, the best structure for the Metenkephalin has the lowest energy only in stage 5, thus when the molecule has been completed. During the buildup, however, there exist other conformations with significantly lower energies. For example, the relative energy of the best structure is 3.6 kcal/mol in step 4,with 44 lower energy conformations. Had we used a n absolute energy cutoff CA, disregarding all conformations with energies E > E,,;, C A , where En,;,, denotes the lowest energy, the value C, = 3.6 kcal/ mol would have been necessary in order to retain the best conformation. In dynamic programming, however, comparisons are made only within each particular path, i.e., for each state of the last residue in the considered chain fragment. According to Table XII, the best structure has the lowest energy within its own path in all steps but the second, where there is a single conformation whose energy is lower by 0.09 kcal/mole. Thus, using dynamic programming the cutoff' values C2 = 0.1 kcal/mol and C, = 0 for i = 3 , 4 , and 5 are sufficient to find the lowest energy conformation for this peptide. The analysis of the buildup process for the other examples gives similar results and emphasizes the importance of using the dynamic programming decision criterion instead of simply comparing the energies of different conformations for the chain fragments. Based on numerical experiments involving more than 20 peptides of 410 residues long in addition
+
to the 17 test cases, we conclude that the low cutoff is reliable provided limited local minimization is used during the dynamic programming phase. Our calculations on mellitin, which were not reported above, provide a particularly good example because the crystal structure is available, as well as a computed structure. As noted by Pincus et al.,4' the amino terminal pentapeptide AcGlyIleGlyAlaValNMe is very flexible, but it is expected to have a predominantly iuhelical c o n f ~ r m a t i o n . ~Following ' a discrete buildup, however, the lowest energy, 0.28 kcal/ mol, corresponds to the conformation C *AC *CA. There is also an ahelix among the conformations generated by the buildup, but its energy is 1.07 kcal/mol. Both conformations are in the same path of the dynamic programming, since the last residue ValS is in the same state. Therefore, to continue the buildup of the mellitin molecule we need at least the cutoff energy C:, = 1.35 kcal/mol, otherwise the helical conformation will be lost. On the other hand, local minimization ofthe pentapeptide fragment gives 6.35 kcal/mol for the ahelix, whereas the energy of the conformation C *AC *CA is reduced only to 0.90 kcal/mol. Thus the need for the relatively high cutoff stems solely from the errors introduced by the discrete approximation, since the two hydrogen bonds in the tuhelix are formed only a t slightly adjusted backbone dihedral angles. Indeed, using the buildup with limited local minimization the cuhelix has the lowest energy in all steps. The above results suggest that we may need larger values of the cutoff energy parameter to compensate for the approximation errors in the discrete buildup, particularly for longer peptides. Although such a procedure is still numerically efficient, a large number of apparently lowenergy conformations must be retained that do not, lead to useful results when subjected to local minimization. In addition, by using
DETERMINING MINIMUM ENERGY CONFORMA'I'IOKS
an absolute cutoff t h a t is too small a t the end of the buildup one can drop the conformation t h a t would give the global minimum.
GLOBAL MINIMUM AND FURTHER LOWENERGY CONFORMATIONS Just as in the use of search procedures such as simulated annealing "J' or dimensionality relaxation, 2ti,%7our primary goal is to find the global energy minimum. Further lowenergy conformations are, however. frequently also of interest, e.g., when taking hydration into T h e buildup procedure of Gibson and Scheraga, which usually generates a large number of intermediate structures, is an excellent met.hod for obtaining a complete list of lowenerg! conformations of tetrapeptides3' For longer chains, however, the method seems to be less reliable for drtermining the best sidechain positions (see example 1 I or the lowest energy conformation itself (see example 2 ) . Del errnining nearoptimum solutions by dynamic programming in a rigorous manner, referred to as t,he Kth k t path problem, has been studied in the operation research l i t e r a t ~ r and e ~ ~it is far from easy. T o find the Kth best path, the original problem and the preceeding (IS 1)bestpath problems must be solved. Storage m d calculation for each path is somewhat more expensive than for the preceeding one. Kesults in the literature are restricted to the original dynamic programming algorithm and their extension to the generalized version introduced in this paper has not yet been completed. In the generalized dynamic programming we retain 21 number of conformations in each path, in contrast to the single best conformation retained in dynamic programming. Although this does not mean a rigorous solution to the Kth best path problem, the buildup generates a number of lowenergy conformat ions. A necessary, but obviously not sufficient condition for obtaining the local minima, say, within 3 kcal/mol, is to use cutoff' energies not less than 3 kcal i rnol in all steps. This is computationally very expensive if limited local minimization is performed, and hence we use the discrete buildup with the high set of' the cutoff energy parameters. This approach results in a nearly complete list of nearminimum conformations. For example, in examples 47 the numher of' local minima predicted with less than 3 kcal/inol relative energies is 3 , 4, 21, and 19. Although most lowenergy conformations are present, Takahashi et al.'? give a more complete list with 4, 4, 24. and 23 conformations, respectively, for the four tetrapeptides. Thus a t present we cannot claim completeness of nearby minima. T h e discrete ~
177 1
buildup wit,h the high set of the cutoff' is. however, suitable for studying the relative stability of the global minimum energy conformation. For example, the ahelix in example 1 seems to be \ w y stable. T h e 22 lowest energy conformations given by the discrete buildup differ from a helix at most in the last two residues, and the first essentially nonhelical conformation FCGDDE has the relative energy of 11.8 kcal/mol. An ahelix of similar len#,h may be considerably less st,able for other peptides. For example, the global minimum energy conformation of the heptapeptide AcPheArgLysAspIleAlaAlaNMe (residues 138144 of the sperm whale myoglobin) is also an ahelix ( E = 62.27 kcal/mol) that can be found only when using limited local minimization. T h e discrete buildup of the molecule results in 50 further conformations within 10 kcal/ mol, and the first essentially nonhelical one, ECCGAAA, has the relative energy of' 4.96 kcal/ mol.
CONCLUSIONS T h e main difficulty in using a systematic search of the discretized conformational space for predicting lowenergy conformations of polypeptides is that the number of conformations to be explored increases exponentially wit,h the number of residues. Combinatorial optimization algorithms can efficiently limit the number of intermediate structures needed for determining the global minimum. While dynamic programming is one of the most powerful combinatorial optimization methods in a large variety of application^,^^."" in its original form it does not apply to the energy minimization problem. The contribution of the present paper is a generalization of the dynamic programming algorithm that takes into account this deviation from the principle of optimality. T h e only change we have to introduce is retaining a number of paths to each state, in contrast to the classical discrete dynamic programming that keeps just the best path. The algorithm can be implemented as a completely discrete builduptype procedure. Reliability of finding the global minimum is further increased by using a very limited locai minimization within the essentially combinatorial algorithm. In particular, we have reproduced the lowest energy conformation or at least a conformation very close to it in all of the 17 considered test problems with previous results available in the literature, while the number of energy evaluations has been reduced by a n order of niagnitiide relative to other global search procedures. T h e discrete buildup is, however, still useful for generating a rel
1772
VAJDA AND DELIS1
atively large number of further lowenergy conformations. T h e numerical efficiency of any builduptype procedure heavily depends on the size of the cutoff energy parameter that determines the number of intermediate conformations. Our examples show that using dynamic programming and limited local minimization of the chain fragments, the global energy minimum can be attained a t very small values of the cutoff energy. We are most grateful to Dr. G. NBmethy and Dr. K. D. Gibson for answering our numerous questions on the ECEPP potential function and on optimization methods. Computation was performed on the IBM 3090/200 at the City University of New York University Computer Center. This work was supported by Grant 1ROI AI2747101 from the National Institute of Allergy and Infectious Diseases.
REFERENCES 1. Computer Assisted Modeling. Contributions of Com
putational Approaches to Elucidating Macromolecular Structure and Function. Walton, S., Ed. (1987) National Academy Press, Washington, DC. 2. Gibson, K. D. & Scheraga, H. A. ( 1988) in Structure and Expression: From Proteins to Ribosomes, Sarma, R. H. & Sarma, M. H., Eds., Adenine Press, Guilderland, NY, pp. 6794. 3. Reeke, G. N., Jr. (1989) A n n . Reu. Computer. Sci., in press. 4. Dixon, L. C. W. & Szego, G. P., Eds. (1975) Towards Global Optimization, NorthHolland, Amsterdam. 5. Bremermann, H. (1970) Math. Biosci. 9, 115. 6. Masri, S. F., Bekey, G. A. & Safford, F. B. (1980) Appl. Math. Comput. 7, 353375. 7. Pronzato, L., Walter, E., Venot, A. & Lebruchec, J. F. (1984) Math. Comput. Simul. 26, 412422. 8. Levy, A. V. & Montalvo, A. (1985) S I A M J. Sci. Stat. Comput. 6, 1529. 9. Kirkpatrick, S., Gelatt, C. D., Jr. & Vecchi, M. P. (1983) Science 220, 671680. 10. Bohachevsky, I. O., Johnson, M. E. & Stein, M. L. ( 1986) Technometrics 28, 209217. 11. Li, Z. & Scheraga, H. A. (1987) Proc. Natl. Acad. Sci. U S A 84, 66116615. 12. Wilson, S. R., Cui, W., Moskowitz, J. W. & Schmidt, K. E. ( 1988) Tetrahedron Lett. 29,43734376. 13. Moult, J. & James, M. N. G. (1986) Proteins Struct. Funct. Genet. 1, 146163. 14. Bruccoleri, R. E. & Karplus, M. (1987) Biopolymers 26,137168. 15. Wilkes, B. C. & Schiller, P. W. (1987) Biopolymers 26,14311444. 16. Lipton, M. & Still, W. C. (1988) J . Comput. Chem. 9,343355. 17. Vasquez, M. & Scheraga, H. A. (1985) Biopolymers 24, 14371447. 18. Gibson, K. D. & Scheraga, H. A. (1987) J. Comp. Chem. 8,826834.
19. Vasquez, M. & Scheraga, H. A. (1988) J. Biomolec. Structure & Dynamics 5,705755. 20. Mizuno, H. ( 1989) Proteins: Structure, Function, and Genetics 5,4765. 21. Schlick, T., Hingerty, B. E., Peskin, C. S., Overton, M. & Broyde, S.(1989) in Theoretical Biochemistry and Molecular Biophysics, eds. Beveridge, D. L. & Lavery, R., in press. 22. Shenkin, P. S., Yarmush, D. L., Fine, R. M., Wang, H. & Levinthal, C. (1987) Biopolymers 26,20532085. 23. Paine, G. H. & Scheraga, H. A. (1985) Biopolymers 24,13911436. 24. Paine, G. H. & Scheraga, H. A. (1986) Biopotymers 25,15471560. 25. Paine, G. H. & Scheraga, H. A. (1987) Biopolymers 26, 11251162. 26. Purisima, E. 0. & Scheraga, H. A. (1986) Proc. Natl. Acad. Sci. U S A 83, 27822786. 27. Purisima, E. 0. & Scheraga, H. A. ( 1987) J . Mol. Biol. 196,697709. 28. Momany, F. A,, McGuire, R. F., Burgess, A. W. & Scheraga, H. A. (1975) J. Phys. Chem. 79,23612381. 29. Hu, T. C. ( 1982) Combinatorial Algorithms, AddisonWesley, Reading, MA. pp. 106134. 30. Horowitz, E. & Sahni, S. (1979) Fundamentals of Computer Algorithms, Computer Science Press, Rockville, MD. pp. 198247. 31. Vasquez, M., NCmethy, G. & Scheraga, H. A. (1983) Macromolecules 16, 10431049. 32. Takahashi, K. et al. (1987) Biopolymers 26, 17811788. 33. Piela, L., NCmethy, G. & Scheraga, H. A. ( 1987) Biopolymers 26, 15871600. 34. Lee, E., N6methy, G., Scheraga, H. A. & Ananthanarayanan, V. S. (1984) Biopolymers 23, 11931206. 35. Simon, I., NCmethy, G. & Scheraga, H. A. (1978) Macromolecules 11, 797804. 36. Howard, J. C., Ali, A., Scheraga, H. A. & Momany, F. A. (1975) Macromolecules 5 , 607622. 37. Anderson, J. S. & Scheraga, H. A. (1978) Macromolecules 11,812819. 38. Fitzwater, S.,Hodes, Z. I. & Scheraga, H. A. (1978) Macromolecules 11, 805811. 39. Zimmermann, S. S., Pottle, M. S., NCmethy, G. & Scheraga, H. A. (1977) Macromolecules 10, 19. 40. NBmethy, G., Pottle, M. S. & Scheraga, H. A. (1980) J . Phys. Chem. 87, 18831887. 41. Pincus, M. R., Klausner, R. D. & Scheraga, H. A. (1982) Proc. Natl. Acad. Sci. 79, 51075110. 42. Press, W. H., Flannery, B. P., Teukolsky, S. A. & Vetterling, W. T. (1986) Numerical Recipes, Cambridge University Press, Cambridge, MA. 43. Isogai, Y., NBmethy, G. & Scheraga, H. A. (1977) Proc. Natl. Acad. Sci. U S A 74, 414418. 44. Terwilliger, T. C., Weissman, L. & Eisenberg, D. (1982) Biophys. J . 37, 353361. 45. Waterman, M. S. (1983) Proc. Natl. Acad. Sci. U S A 80, 31233124.
Received May 4, 1989 Accepted September 20, 1989