126

PROTEINS AND PEPTIDES: PRINCIPLES AND METHODS

[8]

that this approach will dramatically increase our understanding of the development of tertiary structure during folding.l-3'5'2°-23 When this information is combined with high-resolution information on the formation of secondary structure from NMR spectroscopy, 24-26 significant advances will be made in deciphering the "folding c o d e . " 21D. P. Goldenberg,Annu. Rev. Biophys. Biophys. Chem. 17, 481 (1988). 22T. Alber, Annu. Rev. Biochem. 58, 765 (1989). 23D. P. Goldenberg,R. W. Frieden, J. A. Haak, and T. B. Morrison,Nature (London) 338, 127 (1989). 24H. Roder, G. A. Elove, and S. W. Englander, Nature (London) 335, 700 (1988). 25j. B. Udgaonkar and R. L. Baldwin, Nature (London) 335, 694 (1988). 26M. Bycroft,A. Matouschek, J. T. Kellis, L. Serrano, and A. R. Fersht, Nature (London) 346, 488 (1990).

[8] C l e f t s a n d B i n d i n g S i t e s in P r o t e i n R e c e p t o r s B y RICHARD A. LEWIS

Introduction An enzyme can increase the rate of a biochemical reaction by bringing the reactants together and holding them in the correct orientation. This will increase the probability of a successful collision between the reacting groups. A membrane-bound receptor can promote interaction between its own functional groups and the ligand in a similar way. The host protein should make multiple contacts with the substrate in order to maximize efficiency and specificity. X-Ray studies of the structures of proteins have shown that most binding sites are found in clefts, holes, or in the boundaries between protein domains, where the ligand can be held firmly: studies of receptor-ligand complexes show that, after binding, the ligand is often completely surrounded by the receptor and hence immobilized for reaction. There is a price to be paid for this increase in the reaction rate. The freezing of a mobile molecule into a semicrystalline state inside the receptor binding site requires the loss of entropy; this must be balanced by the utilization of binding energy between the ligand and the receptor.~ A ligand can gain more binding energy from a complex with its host when it is surrounded in space by receptor atom groups that display complementary w. P. Jenks, in "Chemical Recognition in Biology" (F. Chapeville and A.-L. Haenni, eds.), p. 3. Springer-Vedag, Berlin, 1980. METHODS IN ENZYMOLOGY, VOL. 202

Copyright © 1991 by Academic Press, Inc. All rights of reproduction in any form reserved.

[8]

DETERMINATION OF CLEFTS AND BINDING SITES

127

patterns of molecular recognition forces. The nature of these intermolecular interactions have been well discussed elsewhere2; this chapter focuses instead on the geometric properties of a binding site. The number of possible interactions is increased in regions of the surface where the ligand can be more completely enveloped, that is, in clefts or holes. It is therefore necessary to determine first which receptor atoms are part of the binding site. These lines of reasoning lead to the hypothesis that any cleft or hole or domain boundary in a receptor is a possible binding site. In addition to the active site of the protein, there may be several allosteric sites, protein-protein interface regions that could control multimerization, antibody binding sites, solvent sites, and packing defects that affect protein motion. The determination of binding sites and clefts is a very important step toward understanding protein function and toward the design and discovery of new ligands. This chapter reviews some of the approaches that can be used to determine clefts and binding sites. Direct methods of cleft determination are aimed at the analysis of a receptor whose three-dimensional structure is known. The meaning of the term "protein surface" is discussed so that all the methods can be related to a common set of definitions. The following sections review the different classes of algorithms that have been used to define and display the surface, to look at mappings of topographic features, to quantify surface roughness and curvature, and to search for packing defects and sites. All these methods require a set of Cartesian coordinates and produce an essentially static picture of the receptor. Biomolecules are dynamic objects and undergo substantial conformational changes; each static picture should therefore be regarded as a snapshot. These stills should be combined into a moving trajectory or a partition function3; this is implied whenever a new static method is introduced. Indirect methods of receptor mapping are applicable when the receptor structure is unknown but the structures of a series of ligands that bind to the receptor at a common site are available. The binding site is deduced from the apparent common properties of the ligands, that is, the pharmacophore. The assumptions behind pharmacophore deduction are discussed. The algorithms are illustrated by case studies: the method of shape mapping is exemplified by a family of retinal isomers. Inhibitors of angiotensinconverting enzyme (dipeptidyl carboxypeptidase I) are used in the discussion of the active analog approach. 2 p. M. Dean, "Molecular Foundations of Drug-Receptor Interaction." Cambridge Univ. Press, Cambridge, 1987. 3 K. B. Lipkowitz, B. Baker, and R. Larter, J. Am. Chem. Soc 11, 7750 (1989).

128

PROTEINS AND PEPTIDES" PRINCIPLES AND METHODS

[8]

RI ~//

.

Accessible Surface

~

Contact Surface

~

"-Reentrant Surface

Y FIG. 1. Various types of surfaces generated by probe spheres of different radii. The accessible surface is mapped out by the center of a probe sphere. The contact surface is defined by the parts of the receptor atom surface that can be touched by the probe sphere. The reentrant surface is generated by the underside of the probe sphere when it is in contact with two or more receptor spheres.

Definition of Protein Surfaces T h e definition of what constitutes the surface of a molecule is essential to the search for binding sites and clefts. The a t o m s of the protein are treated as hard v a n der Waals spheres; this is an a p p r o x i m a t i o n to the anisotropic distributions o f electron clouds around atomic nuclei but is accurate within the e x p e r i m e n t a l limits of the structural models available. The definition of a molecular surface can n o w be formulated in geometric terms. Richards 4 has given the several widely used definitions of a protein surface. The van der Waals surface or envelope is the surface obtained b y fusing the v a n der Waals shells of all the a t o m s in the structure. In m a n y cases, the positions of h y d r o g e n a t o m s within a structure are not k n o w n so that larger " u n i t e d - a t o m " values 5 for the van der Waals radii are used. This gives an isotropic representation of anisotropic groups such as a methyl group. The van der Waals e n v e l o p e is only an a p p r o x i m a t i o n to the effective surface seen b y an approaching molecule. T h e r e will obviously be parts of the surface that cannot be r e a c h e d b y external a t o m s and are therefore hidden. T w o other surfaces can be defined b y considering what h a p p e n s w h e n a single p r o b e sphere is rolled o v e r the van der Waals surface of a protein (Fig. 1). The " c o n t a c t " surface is m a p p e d out b y the part o f the p r o b e sphere 4 F. M. Richards, Annu. Rev. Biophys. Bioeng. 6, 151 (1977). 5 B. R. Gelin and M. Karplus, Biochemistry 18, 1256 (1979).

[8]

DETERMINATION OF CLEFTS AND BINDING SITES

129

that touches the van der Waals surface. The "reentrant" surface is defined by the underside of the probe sphere when it is in contact with the van der Waals shells of two or more receptor atoms. The whole composite of contact and reentrant patches is often referred to as the molecular or Connolly surface. The physical interpretation of the molecular surface is that it marks the closest approach of two electron clouds before repulsion occurs. It also follows from Richards' definition that only atoms which contribute to the contact surface may directly interact with external atoms; the other atoms are buried inside the molecule. The "accessible" surface 6 is defined by the locus mapped out by the probe center as the sphere rolls over the receptor surface. This surface marks the closest approach of an external nucleus to the molecule. As Barry 7 has pointed out, the accessible surface is very helpful when vector models of molecules are being displayed inside a binding site, using a molecular graphics system. It is often easier for a drug designer to see where dots and sticks can be placed rather than solid spheres. The shape of the molecular or the accessible surface will be dependent on the size of the probe sphere used to compute them. A smaller sphere (RI in Fig. 1) will pick out clefts such as at X whereas a larger sphere (R2) will simply roll over this feature. The appearance of clefts or internal cavities is very sensitive to the size of probe selected. The smallest sphere that is physically realistic is one with a radius of 1.0/~ (a helium atom). Spheres of radius 1.4-I .5 .A are often taken as models of a water molecule. Other values, such as those based on a carbon united-atom van der Waals sphere, are also valid. The use of a small probe gives information about surface features on the atomic scale, such as the location of packing defects or solvent holes. Packing defects can give much information about the regions in the protein where there is some conformational flexibility, and they may indicate a hinge region, as in the kinase family,8 or a focus for disassembly, as in the human rhinovirus protein coat. 9 These points are discussed in more detail in later sections. Binding sites and clefts are medium-scale features with sizes on the order of 5-10/~. Connolly l° has argued cogently that it is not practical to look for these features simply by increasing the size of the probe sphere. Most molecules are not spherical and so are poorly modeled by a sphere; a long fiat ligand could easily slip into a binding site that a 6 B. Lee and F. M. Richards, J. Mol. Biol. 55, 379 (1971). 7 D. C. Barry, unpublished results, referenced in N. C. Cohen, J. M. Blaney, C. Humblet, P. Gund, and D. C. Barry, J. Med. Chem. 33, 883 (1990). 8 C. M. Anderson, F. H. Zucker, and T. A. Steitz, Science 204, 375 (1979). 9 E. Arnold and M. G. Rossmann, J. Mol. Biol. 211, 763 (1990). l0 M. L. Connolly, J. Mol. Graphics 4, 3 (1986).

130

PROTEINS AND PEPTIDES: PRINCIPLES AND METHODS

[8]

large sphere would roll over. The mouth of any cleft would probably be part of the reentrant surface, and hence the whole cleft would only be seen as a shallow depression. The curvature of the contact and the reentrant surfaces would be on a markedly different scale: the contact surface is determined by spheres with radii between 1 and 2 .~ whereas the reentrant surface would have the curvature of the larger probe sphere. The size of the probe sphere therefore affects the nature of the molecular surface. Some useful results were obtained by Rose and co-workers H when they used a 10-,~ probe to look for antibody binding sites, but generally a small probe should be used. The question of probe size is central to how surface features are examined and interpreted; it is referred to frequently throughout this article. Protein Surface Representation A physical picture of the receptor surface is needed both for visualization and for determining the clefts within the structure. This is an essential step toward finding the binding sites in the receptor, but the information contained within the picture needs to be interpreted carefully. The following sections consider some of the existing surface generation algorithms. There have been many different methods devised to construct the various types of surfaces around a protein. The early approaches determined only the accessible surface of the protein. ~2,13These were followed by bit lattice methods and by precisde numerical and analytic surface programs. The Voronoi partition is also a useful surface descriptor. Bit Lattices

Proteins are dynamic systems and are constantly in a state of conformational flux. Some of the greatest changes occur on substrate binding: it would be very useful to have fast surface algorithms to aid the modeling of this process. Changes in receptor shape could be followed as the structure is manipulated; the opening and closing of binding sites and domain boundaries could be monitored semiinteractively. Bit lattices provide such a tool. An orthogonal lattice large enough to encompass completely the object of interest is set up. It is not necessary to align the lattice along the principal axes of the object provided the lattice dimensions are large enough, as the results obtained are not dependent on the orientation of ii j. Novotny, M. Handschumacher, E. Haber, R. E. Bruccoleri, W. B. Carlson, D. W. Fanning, J. A. Smith, and G. D. Rose, Proc. Natl. Acad. Sci. U.S.A. 83, 226 (1986). 12 A. Shrake and J. A. Rupley, J. Mol. Biol. 79, 351 (1979). 13 j. Greer and B. L. Bush, Proc. Natl. Acad. Sci. U.S.A. 75, 303 0978).

[8]

DETERMINATION OF CLEFTS AND BINDING SITES

131

the object within the lattice. Each grid point on the lattice is a logical bit, being turned " o n " when the grid point is interior to the protein, " o f f " when it is exterior. Each atom is considered in turn, and its logical shadow on the grid is turned on. The overall shape of the receptor is the union of the umbrae. Changes in receptor shape can be highlighted by the comparison of two lattices. The first version of this algorithm TM performed a large number of slow distance tests to set each bit on the lattice; however, Bohacek and Guida 15have pointed out that significant time savings can be made by first storing a template of the logical shadow mapped out by each atom and then translating the template at the atom center and copying the bit settings directly onto the lattice. The main limiting factor to the usefulness of the bit lattice method is the fineness of grid spacing needed to produce an adequate representation of a molecule. Stouch and J u r s 16 computed the volume of the molecule by multiplying the number of " o n " bits by the volume on an individual cube. They compared their estimates of molecular volume to those obtained using Pearlman's methodJ 7 At a density of 1 bit per linear angstrom, the error was about 10%, improving to 1.5% at 3 bits per linear angstrom (corresponding to a grid spacing of 0.33/~). Little improvement is obtained by further reductions in grid size. The bit lattice method can be used to find contact and accessible surfaces, by defining the appropriate logical umbra for each atom. The jagged appearance of the surface can be corrected by movement of the points along a vector from the grid point to the parent atom center, but the distribution of lattice points becomes irregular and the lattices cannot be used for shape comparisons. The program of Bohacek and Guida takes 25.6 cpu (central processing unit) seconds on a MicroVax II to compute the molecular volume of crambin (327 atoms; Brookhaven TM Protein Data Bank code 1CRN). A related approach 19 combines bit lattices with fractals. Space is divided up into 2-A cubes, and each of the cubes is on or off, depending on whether the molecular surface intersects the cube. When a cube is cut by the surface, it is subdivided into eight cubes, and these cubes are retested. This process of division is repeated four or five times, producing an accu14 L. H. Pearl and A. Honegger, J. Mol. Graphics 1, 9 (1983). 15 R. S. Bohacek and W. C. Guida, J. Mol. Graphics 7, 113 (1989). 16 T. R. Stouch and P. C. Jurs, J. Chem. Inf. Comput. Sci. 26, 4 (1986). 17 R. S. Pearlman, in "Physical Chemical Properties of Drugs" (S. H. Yallowsky, A. A. Sinkula, and S. C. Valvani, eds.), p. 321. Wiley, New York, 1980. 18 F. C. Bernstein, T. F. Koetzle, G. T. B. Williams, E. F. Meyer, M. D. Brice, J. R. Brodgers, O. Kennard, T. Shimanouchi, and M. Tasumi, J. Mol. Biol. 112, 535 (1977). i9 j. Higo and N. Go, J. Comput. Chem. 10, 376 (1989).

132

PROTEINS AND PEPTIDES" PRINCIPLES AND METHODS

[8]

rate silhouette of the molecule; the errors found in the computed molecular volumes are small. However, this representation is again irregular, and it is not obvious how it could be used in molecular docking or comparisons. One disadvantage of the bit lattice methods is that the type of surface portrayed is fixed by the way in which the shadow of an atom is defined. The usual definition of the umbra includes any point located within a sphere of radius (Rprot~ + Rvdw)centered on the atomic nucleus. Therefore, if Rprobe is changed, the bit lattice must be recomputed. A solution to this difficulty is to assign some real value to each point based on the distance to the nearest atom. 2° The points within the van der Waals envelope are all marked with a high value, C; some arbitrary distance boundary is used to define the zero level. Points outside are marked with a scaled distance between C and 0. Once the distances have been computed, information about the protein surface or binding pockets can be obtained by contouring the data, usually in tandem with a graphics program such as FRODO. 21 The contour plots can be done at several levels, corresponding to several different sizes of probe sphere, all using the same set of data. A contour of the (C - 0.0) level would give the van der Waals surface, a contour at (C - Rprob e) level would give the accessible surface for the probe being used. This technique has been used to look for solvent holes and spaces available for side-chain motion in the crystal structure of human rhinovirus. 9 The pockets show up as isolated patches of contours and so are easily identified. The running time required to produce the distance file is on the order of minutes on a MicroVax II for a medium-sized protein (around 2500 atoms). However, the memory requirements are quite heavy. When compared to the molecular surface for an equivalent probe radius, contour plots introduce slight interpolation errors. These errors become negligible with a finer division of space. It is clear that this method will be a valuable tool for looking for solvent sites and also binding pockets for larger ligands.

Numerical and Analytic Surface Descriptions A commonly used algorithm for surface visualization is Connolly's MS program. 22The MS program is based on a numerical algorithm for placing dots on the contact/reentrant surface of a protein. The dots are positioned on circles of latitude around the atom sphere at appropriate angular increments. A probe sphere is "rolled" over the surface of each atom sphere, stopping at each of the grid points. If the probe sphere does not interpene2o R. Voorintholt, M. T. Kosters, G. Vegter, G. Vriend, and W. G. J. Hol, J. Mol. Graphics 7, 243 (1989). 21 T. A. Jones, J. Appl. CrystaUogr. 11, 272 (1978). 22 M. L. Connolly, Quantum Chem. Program Exchange Bull. 1, 75 (1981).

[8]

DETERMINATION OF CLEFTS AND BINDING SITES

133

trate or touch any of the other atom spheres, it is on a contact region of the surface (Fig. 1) and the point of tangency is a surface point. If the probe sphere is tangent to two atom spheres, its underside defines the reentrant surfaces as it rolls around the contact circle. If the probe is tangent to three atom spheres simultaneously, it is not free to move across this concave patch of surface. For the last two cases, the surface dots are generated from a grid on the probe sphere and are marked as reentrant. Each surface dot is associated with its nearest atom, and the surface normal pointing out from the protein surface can be computed using the vector from the surface point to the sphere that generated it. The graphical display of the dot surface is simple and attractive; however, it should be noted that it is still not easy to locate binding sites visually, owing to the subjective nature of human interpretation. Connolly has described a program for constructing the analytical molecular surface (AMS) from a series of geometric primitives. 23In the trivial case of a single atom, the analytical surface is described by a sphere. If there are two atoms, then the surface would be given by the intersection of two spheres and a toroid. The toroid is the shape mapped out by the probe sphere when it is simultaneously in contact with both atom spheres. Similarly, when there are three or more atoms in the molecules, the surface will be composed of fragments of atom spheres for the contact surface and tori and parts of the probe sphere for the reentrant regions. All that remains is to determine the surface primitives, compute their mutual intersections, and assemble the remaining surface patches to give a complete analytic description of the surface; the geometric complexity of the system means that this is not a task to be undertaken lightly. The AMS program performs this calculation in 2-3 cpu seconds per atom on a VAX 11/750. As the AMS consists of solid primitives, it is not easy to visualize without the use of a raster graphics system. Instead, the AMS program can be used to drive a triangulation program that splits each primitive up into a cage of interlocking triangular facets.24 It is beyond the scope of this article to describe in detail the complexities of this triangulation process, except that it can be performed to arbitrary precision by setting a maximum limit on the size of any triangle. The advantages of working with the finished polyhedron are obvious. It is trivial to compute the surface area and volume of the polyhedron, and whether any given point or solvent molecule site is interior to the surface. Internal cavities or packing defects can be detected very simply: these holes have their own patches of surface that are not connected to the main surface by any triangle edges. If the 23 M. L. Connolly, J. Appl. Crystallogr. 16, 548 (1983). 24 M. L. Connolly, J. Appl. Crystallogr. 18, 499 (1985).

134

PROTEINS AND PEPTIDES: PRINCIPLES AND METHODS

[8]

vertices and edges of the surface are treated as part of a graph, then looking for internal cavities corresponds to the problem of finding all the disconnected components of the graph. 25 Internal pockets in the receptor structure can be found automatically by this method. The triangulated surface is also more tractable graphically. A study into the molecular volumes produced by the AMS method shows that the numerical error in the algorithm (0.01%) is less than the error in the protein atoms coordinates, even for high-resolution structures. 26The figures are accurate enough to be used in molecular dynamics simulations and for estimating the size changes of packing defects and ligand binding sites during conformational analysis. Surface Roughness

Protein surfaces have a certain texture, or roughness, that is very difficult to assess visually, and yet the texture might be indicative of a particular functionality, such as a binding site or a protein-protein interface. Surface roughness can be measured using the concepts of fractality; this could be a useful statistic for describing predictively the protein surface and for determining the clefts in the structure. The idea behind fractal dimension can be simply put by posing the following question: how long is a piece of string? The value of length depends on how finely the measurement is taken: a simple end-to-end value will be smaller than one which follows the twists of the constituent braids, which in turn will be smaller than the value obtained by tracking across all the individual atoms. The surface of any real object is inherently fractal and will have a particular fractal dimension, D, given by Surface = A(e) ore2-D

(1)

where A is the surface area and e expresses the step length of the measurement. D will have values between 2 (for a plane) and 3 (for an infinitely corrugated surface). Lewis and Rees 27 were the first to examine the fractal dimensionality of protein surfaces. The method is straightforward: for each patch of surface, compute the surface area, A, obtained from the MS program 22 using different sizes of probe, R, and plot log(A) against log(R). The patches are defined using a spherical polar grid with the origin at the protein center. The initial results indicated that surface roughness might be associated with antibody binding sites or protein interfaces, and smooth 25 R. Sedgewick, "Algorithms." Addison-Wesley, London, 1984. 26 M. L. Connolly, J. Am. Chem. Soc. 107, 118 (1985). 27 M. Lewis and D. C. Rees, Science 230, 1163 (1985).

[8]

DETERMINATION OF CLEFTS AND BINDING SITES

135

regions with enzyme active sites. However, ,~qvist and Tapia 28 have shown this work to be slightly flawed. The assumptions made about selfsimilarity in Eq. (1) break down when the probe has a radius less than 1.5 ,~. As the probe gets smaller, dots marking internal cavities in the body of the protein become more numerous and skew the values of A obtained. This can be overcome by using larger probes and eliminating interior points from the computation: a linear plot of log(A) against log(R) is obtained even up to the largest size of probe tried (10 ,~). This study confirmed that protein interfaces have above average values of D (>2.25) but found no strong correlation between surface roughness and antibody binding sites. When retinal-binding protein 29 and preablumin (PDB code 2PAB) were rigidly docked together on the basis of matching up regions of similar roughness, the predicted match was found to have a remarkably high degree of topographical complementarity. Thus, fractal dimension is a good statistic for surface roughness and may be useful for predicting the way in which two proteins may associate, but it is not so good for finding binding sites. Pfeifer et al. 3° have looked at surface roughness to see if enzyme reaction rates were related to the fractal dimension of the surface. They showed that substrate capture is enhanced with higher values of D but diffusion across the surface is slowed down. They suggest that, in the case of lysozyme (PDB code 6LYZ), the observed value of D (2.2) in the active site might be the optimal value for catalysis. Molecular Cartography

The complexities of a three-dimensional protein surface are not readily amenable to human interpretation, and many workers have argued that a simplified two-dimensional cartographic representation would be of great value both to highlight binding sites and to aid docking studies. If the protein is assumed to be approximately spherical, it can be reasonably modeled as a globe. The surface features then appear as mountain ranges and valleys on the surface of the globe. Once a suitable datum point for "sea level" has been defined, the surface can be contoured and the resuiting map projected onto a plane using some standard cartographic formula (Fig. 2). The assumption of sphericity is not crucial to the method; most proteins are better modeled by ellipsoids. However, the better the approximation, the less distorted the extremes of the map will be. Algorithms for determining the inertial and equivalent ellipsoids for a collection of points have 28 j. ,~qvist and O. Tapia, J. Mol. Graphics 5, 30 (1987). 29 C. C. F. Blake and S. J. Oatley, Nature (London) 268, 115 (1977). 30 p. Pfeifer, U. Welz, and H. Wippermann, Chem. Phys. Lett. 113, 535 (1985).

136

PROTEINS AND PEPTIDES: PRINCIPLESAND METHODS

[8]

Overhang Region Projection Plane

II I II !"-I---I

I

I Sea Level

------~-

I

~,_ ,_ ...,_. ~

II

, (I \

II

\1

i

II

II

~11

___

II i II /

_

~3 2l .,. o

Contour Height//~

-2

/ /

, ~11 II / IIIII ,

llll IIII

Origin at Center of Mass

FIG. 2. Cartographic projection of a protein surface onto a plane. The height of the protein surface on the map is the height of the surface at the pierce point of a ray from the origin to the plane. The contours are discontinuous when the height can have multiple values, for instance, where there is an overhang. b e e n d e s c r i b e d b y T a y l o r et al.3~; the ellipsoids c a n be scaled interactively to e n c l o s e an arbitrary a m o u n t o f the protein. This is useful as the s u r f a c e o f the ellipsoid c a n be u s e d to set the sea level m a r k n e c e s s a r y f o r the c a r t o g r a p h i c projection. T h e g o o d n e s s o f fit o f the ellipse c a n be e s t i m a t e d f r o m the ratio o f ellipsoid v o l u m e to n u m b e r o f p r o t e i n residues, w h i c h s h o u l d h a v e a value o f a p p r o x i m a t e l y 140/~3/residue.32 S o m e w o r k e r s h a v e a d o p t e d the c o n v e n t i o n that the ellipsoid axes s h o u l d be half the length o f the p r o t e i n principal axes so that the sea level d a t u m lies c o m 3t W. R. Taylor, J. M. Thornton, and W. G. Turnell, J. Mol. Graphics 1, 30 (1983). 32D. C. Teller, Nature (London) 260, 729 (1976).

[8]

137

DETERMINATION OF CLEFTS AND B I N D I N G SITES

I

y FIG. 3. Mollweide projections of the molecular surface of lysozyme. The surface is contoured at intervals of 1 ~ . Only regions with heights exceeding 8/~ are displayed, to show the areas of high solvent exposure. Experimentally observed antigenic sites are outlined in bold. There is a strong correlation between these sites and the exposed residues. (Reproduced with permission from Fanning e t a/. 33)

pletely within the outer surface shell of the protein. 33 All surface points within the ellipsoid are removed, so that internal cavities are not displayed by this approach. The height of each surface point is defined as the distance between the surface point and the ellipsoid, moving along a vector from the centroid. The surface of the ellipsoid is divided up into squares of longitude and latitude, and the average value of the protein surface height in each square is computed. The grid is then contoured. The ellipsoid grid can be converted to a two-dimensional map using an equal area projection, such as the Mollweide or Hammer projections. These maps have been used to inspect surface charge distributions, 34 to look at pseudosymmetric protrusions in y-crystallin, 35 and to look for antigen binding sites. It has been hypothesized that experimentally observed antibody binding sites correlate strongly with regions of the surface with high solvent exposure. Fanning e t al. 33 noted that the mountain ranges (height >8/~) seemed to match the experimentally observed antibody sites in lysozyme (Fig. 3). In a related study, Novotny e t al. 11 considered the contact surface of a protein that is defined by a 10-,~ probe and found that the antigenic regions corresponded to the surface segments that were particularly accessible to the probe, that is, the most prominent features 33 D. W. Fanning, J. A. Smith, and G. D. Rose, Biopolymers 25, 863 (1986). 34 D. J. Barlow and J. M. Thornton, Biopolymers 25, 1717 (1986). 35 y . Chirgadze, N. Kurochkina, and S. Nikonov, Protein Eng. 3, 105 (1989).

138

PROTEINS AND PEPTIDES: PRINCIPLES AND METHODS

[8]

on the contour map. The correlation between antigenic potential and solvent exposure seems to be better than for other statistics, such as B factors. 36 This can be rationalized by noting that high solvent exposure frequently results in mobility, especially in loop regions of the protein backbone, thus giving rise to higher B factors and the apparent correlation observed. Not all exposed regions will be mobile, as local crystal packing forces may prevent motion. The cartographic projection can give some indication of the antigenic potential of regions of the protein surface, and this can be used to determine automatically the binding sites. This work has been extended to look at the profiles of antibodies themselves, 37 and again a strong correlation between experimentally observed antigenic sites and regions of high solvent exposure was observed. CH2 chains have more possible antibody-binding sites than Fab fragments, suggesting that each antibody is itself autoantigenic and that control of the immunoglobulin response may be modulated by a network of immune recognition. There are criticisms that can be made of molecular cartography. Connolly has pointed out 1° that most geographical mapping techniques work because the terrestrial topographic features are on a very much smaller scale than the earth as a whole so that the surface is approximately fiat, making the definition of sea level simple. This is not true for proteins, where some of the features can be as large as the principal axes. Thus, the definition of a zero height is, to some degree, arbitrary. It is also well known that even equal-area cartographic projections still contain distortions for latitudes above 70°. The surface height is a single-valued quantity so that features like overhangs or internal cavities are missed (Fig. 2). If these caveats are kept in mind, then the method of molecular cartography is very useful tool for identifying broad depressions and constellations of residues that often mark binding sites.

Negative Sphere Description The negative sphere method of Kuntz et al.38deserves special consideration as one of the first methods for automatically determining binding sites; it still sets the standard for other algorithms. In terms of the lock-andkey analogy of drug-receptor interactions, this method tries to produce a mold of the lock, that is, the keyhole. The MS surface 2z of the receptor is generated using a 1.4-,~ probe; the 36 E. Westhof, D. Altschuh, D. Moras, A. C. Bloomer, A. Mondragon, A. Klug, and M. H. V. Van Regensworth, Nature (London) 311, 123 (1984). t7 j. Novotny, M. Handschumacher, and E. Haber, J. Mol. Biol. 189, 715 (1986). 38 I. D. Kuntz, J. M. Blaney, S. J. Oatley, R. Langridge, and T. E. Ferrin, J. Mol. Biol. 161, 269 (1982).

[8]

DETERMINATION OF CLEFTS AND BINDING SITES

139

Negative Sphere Image

~ i ii , ~-

Molecular Surface

Receptor Atoms

FIG. 4. Negative image of a protein surface. Spheres are generated to fill the mold of a cleft in the surface. The center of each sphere lies along a normal vector to the surface, and each sphere is in contact with exactly two points on the surface.

normals associated with each dot point on the surface are also computed. A negative image of the surface is obtained by constructing a set of receptor spheres with the following properties: (a) each sphere must be in contact with two dot points and its center must lie on the surface normal to one of the points, and (b) each sphere must lie on the outside of the receptor surface, that is, it must not intersect the van der Waals envelope (Fig. 4). If a surface is described by N dots, it follows that, for each dot, it is possible to construct (N - 1) receptor spheres. However, only the sphere with the smallest radius from each set of (N - 1) spheres need be considered. As the center of each sphere in the set is along the surface normal at the dot (Fig. 4), it follows that all the larger spheres will cut the van der Waals envelope and so be disallowed. There will still be a large number of spheres left; further simplification is necessary so that this description is economic enough to be used by the combinatorial site analysis programs. A reduction in the number of receptor spheres can be achieved using two heuristics. The angle defined by the two dot points and the center of the corresponding sphere should be greater than a set value, typically 90 °. A smaller value indicates that the receptor sphere is sitting perched on top of a small invagination, such as a patch of reentrant surface. Little information about the site is lost by the omission of these spheres. Spheres with a radius greater than, say, 5 ,~, are also removed. The dot points that give rise to such a sphere must be at least 7 .~ apart and are more likely to be at the extremes of the site: graphical inspection of these large spheres showed that they generally form a "froth" at the site mouth. A final simplification is made taking the largest

140

PROTEINS AND PEPTIDES." PRINCIPLES AND METHODS

[8]

sphere from the collection associated with the contact surface of each receptor atom and, similarly, the largest sphere from each patch of reentrant surface. The final set of receptor spheres forms a negative image, or mold, of the receptor surface. The way in which the spheres are originally selected means that clefts in the receptor surface will be filled up with spheres whereas convex regions will not have associated spheres. The receptor spheres can be grouped together geographically by clustering on the basis of overlap. If two spheres intersect, then they belong in the same cluster. Each cleft in the structure should belong to a distinct cluster and so should be automatically identified by this procedure. The binding site is usually described by the largest of these clusters and independent subsites by the other groupings. There are more sophisticated ways in which the clustering of the receptor spheres could be done, in order to provide a cleaner separation and identification of the true clefts. The absolute overlap criterion described above will separate the spheres into disconnected groups, each marking a possible binding site or cleft. It might be chemically sensible to subdivide some of these principal clusters: two biochemically distinct binding sites might be joined by a single sphere, whose removal would not markedly affect the negative image of either site. There are also cases where a single binding site has a deep cleft and a shallow mouth, for instance, the methotrexate binding site of Lactobacillus casei dihydrofolate reductase (PDB code 4DFR): attention might be diverted from the cleft and focused instead on the larger number of loosely connected spheres at the mouth. It would be unwise to formulate strict a priori rules about how to subdivide the principal receptor sphere clusters; the danger of finding valid counterexamples is too high. Shoichet 39has been investigating approaches to give more physically sensible divisions, based around a single-linkage clustering algorithm. For each pair of objects in a system, some measure of similarity between them can be defined. In the single-linkage algorithm, an association is formed between the two most similar objects, forming a cluster with two members. The next association is formed between the next most similar pair. The clusters are formed by sets of linked objects. Obviously, if this process were to be continued, all the clusters would fuse into one, so that some arbitrary level of similarity between objects must be set as a cutoff. The measure that is used for receptor spheres is percentage radical overlap; if the percentage overlap between two spheres is less than, say, 20%, then they cannot be paired up. There are other variable parameters that have been devised to control cluster growth. One could 39 B. Shoichet, thesis in preparation, University of California, 1990.

[ 8]

DETERMINATION OF CLEFTS AND BINDING SITES

I ~ : ~ /e/~'

141

Probe cube

#

Probe sphere ~,~

Molecular dot surface

Solid angle to

FIG. 5. The curvature of a protein surface can be measured analytically by the solid angle to subtended by a probe sphere centered at a point on the surface. The number of surface points enclosed by the probe cube gives an approximate numerical value for the curvature. The values of curvature are dependent on the size of the probe.

limit the size of sphere allowed to f o r m associations. This is aimed at preventing a single large sphere on a small surface simply linking two adjacent clefts. Similarly, the clustering could be p e r f o r m e d only with small spheres, then incrementally larger ones could be added in, and so on. This has the effect of giving greater importance to those spheres that describe the d e e p e r invaginations. Clustering of the r e c e p t o r spheres automatically identifies and separates out the different binding sites and clefts within a r e c e p t o r structure. This a p p r o a c h has already shown its worth in the studies of K u n t z and cow o r k e r s on drug design and discovery. 4°'41 A database c o m p o s e d of the three-dimensional coordinates of a variety of chemical c o m p o u n d s can be searched to find structures that have the best steric m a t c h to the negative image of the site. The structures that do best in the sterie screen can be used as the basis for a p r o g r a m of drug discovery. Solid A n g l e s

T h e c o n c e p t of using solid angles to describe the curvature of a protein surface is intuitively simple. A sphere is centered on a point on the surface, and the a m o u n t of protein surface enclosed inside the sphere is c o n v e r t e d to a solid angle: the larger the angle, the m o r e c o n c a v e the surface and vice v e r s a (Fig. 5). Clefts will show up as large regions of negative curva4oR. L. DesJarlais, R. P. Sheridan, J. S. Dixon, I. D. Kuntz, and R. Venkataraghavan, J. Med. Chem. 29, 2149 (1986). 41 R. L. DesJarlais, R. P. Sheridan, G. L. Seibel, I. D. Kuntz, and R. Venkataraghavan, J. Med. Chem. 31, 722 (1988).

142

PROTEINS AND PEPTIDES'. PRINCIPLES AND METHODS

[8]

ture. The radius of the sphere defines the scale of the features that can be identified. A 1-A probe sphere would be sensitive to changes in curvature on the atomic scale, whereas a 6-/~ probe sphere will highlight larger features such as clefts and ligand and antibody binding sites. It is obviously more sensible to use a larger probe to locate the major features first and then to use a more detailed approach, such as a Voronoi method, to obtain information at the atomic level. The use of a large probe sphere alone may give misleading information, as not all concavities are well mapped by a sphere and most ligands are not even approximately spherical. The best solution is to examine the surface several times with probes of different sizes. The solid angle subtended by the intersection of the protein surface with the probe sphere can be computed analytically or numerically. Connolly 1° has described programs to compute solid angles based on the analytic23or MS 22descriptions of a protein surface. The probe spheres are placed on the surface at a sample spacing of about 1 ,A. The solid angle values are then smoothed over a small region to give an average statistic for the curvature of the region. The analytic computation of curvature for lysozyme took 20 cpu hours on a VAX 11/750. The numerical algorithm was slower and was used only to verify the results of the analytic algorithm. A much faster numerical method is described in the following section. The solid angle values can be color-coded to give a visually attractive and informative way of identifying binding sites. They can also serve as local shape descriptors for use in docking studies. 42

fl Splines and Surface Density Protein surfaces have also been analyzed by Colloc'h and Mornon 43 using a/3 spline and a surface density method, the former being more useful as a tool for visualizing the medium-scale features in the surface. Both techniques are worth describing in detail as they act as very efficient filters; attention is focused immediately on binding sites and clefts and away from overall shape or atomic noise. The MS s u r f a c e 22 is constructed, and the surface dots are projected onto a series of orthogonal planes that divide up the protein into 1-/~,cubes. In essence, all dots within a cube are pushed onto the three lower walls of the cubes. For the splining, each plane is taken in turn, and a closed curve is drawn to enclose all the dots within the plane. 44 When all the curves are displayed on a graphics screen, a wire mesh model of the 42 M. L. Connolly, Biopolymers 25, 1229 (1986). 43 N. CoUoc'h and J.-P. M o r n o n , J. Mol. Graphics (in press). 44 C. De Boor, J. Approx. Theory 6, 50 (1972).

IS]

DETERMINATION OF CLEFTS AND BINDING SITES

143

protein is produced. From this model, it is visually obvious where the clefts and possible binding pockets are. The surface neighborhood density is a semiquantitative measure of this visual assessment. The method is similar to the solid angle algorithm of Connolly.10 The surface is examined with a 7-_A cube, and the number of surface dots that are within the walls of the probe cube are counted to give a value of the density of the neighborhood around the center of the cube. Each plane is examined in turn, and the probe center is placed 1.0 ,~ away from each dot in the plane to build up a picture of the local variation of the surface density. There is an intuitive correlation behind the number of points enclosed (the density) and the curvature of the protein surface (Fig. 5). The regions of highest curvature can be clustered to give the automatic location of pits, pockets, and clefts. The use of a 7-/~ cube tends to smooth out features that are smaller in size than this, that is, atomic features. It will be appreciated that arguments both for and against a probe size of 7 ,~ can be made; the choice of probe size depends on the sort of features one wishes to investigate. The method takes 84 cpu minutes on a VAX 3500 to compute the plane sections and surface density values for a protein with 3192 atoms. The results are only semiquantitative, owing to the approximate numerical nature of the algorithm, and so need to be interpreted with some care. However, this new method does have the advantage of simplicity and has been shown to produce automatically binding site maps of the surface of several different protein systems.

Fourier Series Description A protein surface can be described by a Fourier series. 45 The method involves the projection of the MS surface 22 onto a sphere surrounding the object, in a similar way to the method of molecular cartography. This immediately creates difficulties when the surface is multivalued, owing to the presence of internal cavities or highly reentrant surfaces, so a large radius probe sphere should be used to create the surface. The projection gives the surface in terms of spherical polar coordinates that can be converted to a Fourier series by numerical integration. The computation of the first 50 coefficients in the series takes 9 cpu hours on a VAX 8600: about 20 terms are needed to give a reasonable approximation to the surface. Although the coefficients so obtained are dependent on the local coordinate scheme, the authors speculate that it might be possible to minimize an objective function to find the optimal superposition of two 45 S. E. Leicester, J. L. Finney, and R. P. Bywater, J. Mol. Graphics 6, 104 (I988).

144

PROTEINS AND PEPTIDES" PRINCIPLES AND METHODS

[8]

FIG. 6. Voronoi tessellation of a set of atoms. All of the space has been partitioned into domains, and each of the domains is convex.

objects (for an algorithm to do this, see the work of Dean and Chau 46) and to use the Euclidean distance between the vectors of Fourier coefficients as a measure of steric similarity. It should be noted that this would be a global measure of shape difference and would not highlight local changes such as capping-off of a ligand binding site. The Fourier description makes the visual assessment of the receptor surface easier but cannot help in the automated determination of clefts and binding sites. Voronoi Methods

A Voronoi tessellation 47 divides a given space around a set of points or atoms into a series of domains with the fundamental property that all the points within a given domain lie nearer to the domain atom than any other atom. More formally, for a set of atoms, Pi, surrounded by domains Vi, then for any other point, X, in Vi: Distance(Pi --->X) -< distance(Pj ~ X)

for all j # i

(2)

The domains are bounded by the perpendicular bisectors between P i and its neighbors: this gives rise to a frog-spawn pattern of polyhedra in three dimensions (Fig. 6). This partition is space-fitting, and all the domains are convex. Several algorithms 48 have been devised to derive the Voronoi 46 p. M. Dean and P.-L. Chau, J. Mol. Graphics 5, 152 (1987). 47 G. F. Voronoi, J. Reine Ang. Math. 134, 198 (1908). N. N. Medvedev, J. Comput. Phys. 67, 223 (1986).

[8]

DETERMINATION OF CLEFTS AND BINDING SITES

145

tessellation; the most conceptually simple is that of Brostow et al. 49 For each atom, Pi, a list of possible neighbors, Pj, is made by sorting the other atoms in order of the distance Pi ~ Pj. The bisecting plane is set up between Pi and the top atom on the list, thus dividing space into two. The next atom, Pk, is taken; if it is in the same region of space as Pi, it is accepted and another bisector face is set up. This repeated for all atoms to give a series of planes that form the faces of a convex hull, the direct polyhedron. The vertices of this hull are found from the intersection of the bisector planes. In the next cycle, the bisector planes between Pi and the remaining nonneighbors, Pl, are constructed, and, if a vertex of the direct polyhedron is cut off from Pi by a plane, P~ is added to the set of neighbors. The polyhedron formed by the final union of all these bisectors is the Voronoi domain. The properties of the Voronoi tessellation make it very suitable for studying the properties of proteins, like molecular volume and surface area (for a review of this field, see the work of RichardsS°). There is difficulty in how to ensure that all the domains are closed. If the Voronoi tessellation of a protein is constructed, it is quite likely that many of the surface atoms will have unbounded domains. Finney 51 discusses this problem of surface atoms in some detail; his solution is to place the protein in a lattice bath of solvent so that all the protein atom domains will be capped by faces formed with nearby solvent molecules. When this was done, it was found that most atoms had domains of similar size, that is, the standard deviation in the domain volume was around 12%. Domains with above average volumes might be part of packing defects or, if the deviation is larger, part of binding sites. Richards 52 describes a statistic, van der Waals volume divided by Voronoi volume, that generally has a value between 0.7 and 0.78. Again, local fluctuations in this quantity could indicate a packing defect and could be used to look automatically for solvent holes. More recently, Gregoret and Cohen 53have devised a parallel method, based on modeling residues by a series of spheres, to assess quality of packing in model-built protein structures. The Voronoi tessellation can thus be used to look for packing defects because of its space-filling properties. However, as both Richards 52 and Finney 5~ have pointed out, the partition is not chemically meaningful and misallocates the space around nonbonded and covalent contacts. Richards' method redefines the position of the bisector plane according to 49 W. Brostow, J.-P. Dussault, and B. L. Fox, J. Comput. Phys. 29, 81 (1978). 50 F. M. Richards, this series, Vol. 115, p. 440. 51 j. L. Finney, J. Mol. Biol. 96, 721 (1975). 52 F. M. Richards, J. Mol. Biol. 82, 1 (1974). 53 L. Gregoret and F. Cohen, J. Mol. Biol. 211, 959 (1990).

146

PROTEINS AND PEPTIDES" PRINCIPLES AND METHODS

[8]

the van der Waals or the covalent radii of the neighboring atom pairs. Gellatly and Finney54 use radial plane dissection to split space. The Voronoi method tends to give relatively more space to atoms that have small van der Waals shells than it should. The method of Richards gives a more physically reasonable picture but at the cost of mathematical rigor: not all space is accounted for, as the vertices from adjacent polyhedra do not coincide. Finney55 has computed the domain volume statistics for ribonuclease S (PDB code 1RNS) and has estimated that the error arising from these vertex holes may be around 4%. The radical plane partition is exact but is a nonlinear function of atom separation, so that covalently bonded neighbors are not handled well. However, if residue volumes instead of atom volumes are considered, the radical plane method does better than the other two methods in highlighting packing defects and interior solvent binding sites. The Richards and the radical plane approaches are not particularly sensitive to how the solvent/surface interface is defined, so that they give a valuable, probe-independent way of looking for cavities. This work has been used in studying tertiary packing motifs in proteins 56 and for monitoring the packing defects introduced during a molecular dynamics simulation of a tyrosine ring flip in bovine pancreatic trypsin inhibitor. 57 There has been an attempt to use Voronoi polyhedra as a probe to investigate the interactions that occur when a substrate binds to its enzyme, in order to assess the ability of the site to immobilize the substrate. David 58 used the example of a trisaccharide binding to lysozyme. When the two reacting species approach, they should present complementary faces, and so the interfacial plane area should be large and possibly maximal for an ideal orientation. However, as noted above in the case of surrounding solvent, this statistic is not sensitive to small displacements in atom positions; David found that the derivatives of area with respect to displacements of substrate atoms were very small, so that interfacial area is not a useful predictor of binding orientation. It is worth noting that the largest faces of the Voronoi polyhedra were generated by the atoms involved in the enzyme reaction pathway. This ties in well with the dictum of Finney that the area of a face is indicative of the degree of interaction. The correct side-chain groups are thus apposed and, owing to the nearby packing defects, are free to move as the biochemical reaction proceeds. The Voronoi tessellation is very useful for detecting packing defects 54 B. J. Gellatly and J. L. Finney, J. Mol. Biol. 161, 305 (1982). 55 j. L. Finney, J. Mol. Biol. 119, 415 (1978). 56 j. W. Ponder and F. M. Richards, J. Mol. Biol. 193, 775 (1987). 57 j. A. McCammon, C. Y. Lee, and S. H. Northrup, J. Am. Chem. Soc. 105, 2232 (1983). 58 C. W. David, Biopolymers 27, 399 (1988).

[8]

DETERMINATION OF CLEFTS AND BINDING SITES

147

and internal cavities. However, binding sites that are not capped off will not be located by the above methods, as they will be filled up with molecules from the solvent lattice used to close the domains of surface atoms. The closure of the surface domains can be accomplished by putting the protein in a cuboid and letting the cuboid faces close off each open domain. This is a virtual image method: a dummy atom is constructed using the cuboid face as a mirror plane. Most of the surface atoms will now have a virtual neighbor. Surface atoms will also have an abnormally large volume (compared to the average domain volume). The atoms in the interior will still have the same small domains as before. David 59hypothesized that the remaining domains with intermediate volumes will be atoms in clefts and binding sites and that, by clustering on domain volume and interatomic distance, these features could be determined automatically. It is likely that the hypothesis is correct, although we 6° have had little success in finding a measure of dissimilarity or a clustering algorithm that would give a reasonable grouping of the Voronoi domains of L. casei dihydrofolate reductase. It would seem more reasonable to impose some arbitrary heuristics on the data first. All domains that intersect the bounding cuboid must be generated by outer surface atoms, so can be ignored in the search for the binding site. Atoms that are interior to the protein would not normally have solvent neighbors and so should have the same volume statistics as those found by Finney. 54A heuristic cutoff can be based on these averages. Any domain with a volume less than, say, 30 ~3 is said to be interior. The remaining atoms are likely to be associated with clefts; their domains are larger because the solvent neighbors have been removed. The data set obtained after heuristic partition is much smaller and seems to group more clearly. Preliminary investigations indicate that this method might be a valid, probe-independent method for finding clefts and binding sites. Further study is needed on other protein systems before the value of this approach can be judged. Lewis has used the properties of a Voronoi tessellation to guide directly the search for clefts and pockets in receptor surfaces. 61 The protein is orthogonally sectioned into a series of slices, and the van der Waals shells of the receptor atoms are projected onto the slices; the radii of the resultant circles are the effective radii of the atoms in the slice. The atoms in the slice are tessellated. The surface of the protein is then found by rolling a probe sphere around the tessellation, starting at the atom with the largest x coordinate, which must be part of the surface. The faces of the local 59 C. W. David, Comput. Chem. 12, 207 (1988). 6o R. A. Lewis, unpublished results (1989). 61 R. A. Lewis, J. Comput. Aided Mol. Des. 3, 133 (1989).

148

PROTEINS AND PEPTIDES: PRINCIPLES AND METHODS

[8]

Voronoi domain are used to find the next neighbor on the search path of the probe. The condition for a surface closure between two surface atoms I and J is given by dIJ < p

(3)

where p < 2Rprobe + R I + Rj. R I and Rj are the effective sphere radii of atoms I and J after projection, and Rprobeis the probe sphere radius, usually set to a value of 1.5/~. If dij --~ p, then the probe can slip between I and J and the line IJ is not part of the surface boundary. The surface is traversed anticlockwise around the faces of each Voronoi domain (Fig. 7a), ignoring any neighbors that are dummy images. Thus, the next anticlockwise neighbor to atom A after atom O is B; for atom B, the next neighbor is F. If the surface between two neighbors cannot be closed, there are two possible explanations. If dAB > 2p, there may be some medium-scale invagination between the neighboring points, as in the case of B and F. The search must be redirected around the correct path BCEF; this is achieved by placing a traffic island, in the form of a dummy domain, midway between B and F (Fig. 7b). As the dummy tiles are ignored in the search for neighbors, the next neighbor to B is now C, as required. The dummy tile is left as a marker at the cleft; only the neighbors of the dummy atom are part of the cleft. If the cleft is a large feature, then a clump of contiguous dummy tiles is created as the search progresses around this feature. The relative size of the cleft can be assessed by the number of dummy domains that are required to fill it up; the atoms which make up the cleft can be read from the neighbor lists of the dummy domains. The other possibility is that there is a small atomic-scale bump, and there may be a common neighbor, D, between C and E such that both dcD < p and dDE < P; the surface is written CDE. A 6-fi~ slice through the pteridine binding site ofL. casei dihydrofolate reductase, containing about 500 atoms, was divided into a Voronoi tessellation in 1.2 cpu seconds, and the surface and clefts were determined in another 0.8 cpu seconds on a IBM 3084Q (Fig. 8). In addition to the active site, which was marked by three dummy tiles, several other clefts and pockets were highlighted. Similar results were obtained for a slice through the amidinophenyl pyruvate site of trypsin (PDB code 1TPP). The obvious advantage of this method is its relative speed, which allows it to be used interactively. The results are not meant to be absolutely quantitative; all medium-scale features will be located, even though not every atom that is a part of a feature is always picked up on the neighbor list. Internal cavities will not be located as they are not connected to the surface. The other advantage of the method is that medium-scale features are found using a small, solvent-sized probe, unlike most of the algorithms

[8]

149

DETERMINATION OF CLEFTS AND BINDING SITES

i.

,o Dummy A

I" \ FIG. 7. Determination of clefts with a Voronoi tessellation. (a) The surface is found by wrapping up the atoms in an anticlockwise direction. The next anticlockwise neighbor of atom A, after O, is atom B. (b) The surface between atoms B and F cannot be properly closed so a dummy tile is added. The dummy tiles are ignored in the search for neighbors; their function is to redirect the search and to mark clefts. The surface between C and E also cannot be closed: a dummy tile is not needed here as there is a common neighbor, D, such that the path CDE forms a closed surface.

d i s c u s s e d so far, w h i c h u s e large p r o b e s a n d t h e r e f o r e alter the n a t u r e o f the r e c e p t o r surface.

I n d i r e c t M e t h o d s for D e t e r m i n i n g B i n d i n g Sites T h e m o s t u s u a l c o u r s e in t h e d i s c o v e r y o f n o v e l r e c e p t o r t a r g e t s is t h a t a l i g a n d o r a n a n t i b o d y to the p r o t e i n is f o u n d first, a n d o n l y m u c h l a t e r is the s t r u c t u r e o f the p r o t e i n e l u c i d a t e d . T h e p r o p e r t i e s o f the l i g a n d s c a n

150

PROTEINS AND PEPTIDES: PRINCIPLES AND METHODS

[8]

°10~

FIG. 8. Voronoi tessellation through a 6-A slice of dihydrofolate reductase. The atom domains are shown in white, and the dummy tiles are shaded. The binding site has been automatically marked with a set of dummy tiles. Several other small pockets are also highlighted by dummy tiles. (Reproduced with permission from Lewis. 61)

be used to deduce the properties of the receptor binding site indirectly, through systematic variation of the shape and properties of the known ligands. The information obtained by this approach can be augmented by mutagenesis experiments. This type of modeling is a very important tool for deducing binding sites, especially for membrane-bound receptors, for which structural information is scarce. The concept of a pharmacophore is essential to the practice of site mapping. It is postulated that there is a unique geometric pattern of atoms within the site that can form attractive intermolecular interactions with a ligand. Only compounds that can display the complementary pattern of ligand atoms, or the pharmacophore, can be recognized by the site. If the structure of the site is known, it is a relatively straightforward procedure to postulate a pharmacophore and then perform ligand binding and mutagenesis experiments to test the hypothesis. When the site is not known, the pharmacophore has to be put together from the common accessible conformations of any known ligands, assuming that they all bind in the same way and to the same site. The existence of multiple binding modes,

[8]

DETERMINATION OF CLEFTS AND BINDING SITES

151

exemplified by ligand binding to dihydrofolate reductase,62 will make pharmacophore mapping very difficult. Recent work by Crippen 63 on Voronoi site models may provide a way of deriving the pharmacophore even when there are different possible binding modes. The pharmacophore model will give a good negative image of the binding site.

Steric Mapping If a pharmacophore for a site has been established, the steric parameters at the binding site can be estimated by volume mapping the differences between high- and low-affinity ligands by the techniques of minimum steric difference,64 excluded volume analysis, or shape analysis.65 Steric mapping has been used by Liu and c o - w o r k e r s 66'67 to construct a three-dimensional cavity that represents the boundary of the retinal binding site. The structure of retinal is a cyclohexenyl ring attached to a conjugated chain of four double bonds terminated by an aldehyde group. Mutation experiments suggest that the aldehyde is anchored by formation of a Schiff base with a lysine side chain. The ring is thought to lie in a hydrophobic pocket. All 16 possible geometric isomers of retinal have been synthesized and their biological activities assayed to give a measure of their affinity at the site. The crystal structure or, in lieu of that, the energy-minimized structure of each isomer was taken and displayed graphically. The 1 1-cis isomer has the highest affinity and so is assumed to be the best model for the pharmacophore. The aldehyde group was tethered to a dummy lysine constructed using a molecular modeling package. This constrained the site lengthwise between the Ca of the amino acid and the cyclohexenyl ring. All of the structures were then superposed on these two anchor points. The all-trans isomer could not be sensibly fitted to this model, tying' in with its experimentally observed low affinity. The other isomers all fitted the longitudinal restriction. The inactive 13-cis isomer has atoms that fall into a unique region of space that is not populated by atoms from other

6z G. C. K. Roberts, in "Quantitative Approaches to Drug Design" (J. C. Dearden, ed.), p. 91. Elsevier, Amsterdam, 1983. 63 G. M. Crippen, I. Comput. Chem. 8, 943 (1987). 64 Z. Simon, A. Chiriac, S. Holban, D. Ciubotaru, and G. I. Mihalas, "Minimum Steric Difference: The MTD Method for QSAR Studies." Research Studies Press, Wiley, Letchworth, 1984. 65 N. C. Cohen, Adv. Drug Res. 14, 42 (1985), and references therein. 66 R. S. H. Liu, A. E. Asato, M. Denny, and D. Mead, J. Am. Chem. Soc. 1116, 8298 (1984). 67 R. S. H. Liu and T. Mirzadegan, J. Am. Chem. Soc. 110, 8617 (1988).

152

PROTEINS AND PEPTIDES: PRINCIPLES AND METHODS

[8]

Fro. 9. Steric map of the retinal binding site deduced from ligand binding studies. The chicken-wire contour represents the furthest extent of space occupied by the active isomers of retinal, assuming that they bind in the pharmacophoric mode. The most potent analog, the 11-cis isomer, is enclosed within the mesh. The inactive analog, the 13-cis isomer, cannot display the correct pharmacophoric pattern without some of its toms moving outside the mesh. These atoms will occupy the same regions of space as receptor atoms and will be repelled. (Reproduced with permission from Liu et a/. 67)

isomers (Fig. 9). It is likely that these atoms experience steric repulsion by the site, thereby indicating a region of space where some receptor atoms must lie. The union of the surfaces of the active isomers indicates free space in the site, and the differences between this and the surfaces of the inactive isomers indicates solid protein. Further derivatives of retinal are being synthesized in an effort to determine the steric site boundary more exactly. As the intermolecular interactions that control the formation of the retinal complex are mainly hydrophobic, the steric model will be a fairly good approximation to all the salient features of the binding site.

Active Analog Approach In the case of the retinal-binding site, the deduction of the pharmacophore was made easier by the fact that there were very few ligand groups that could form specific interactions with the site. Unfortunately, the situation is usually more complicated than this. Marshall has devised the active analog approach to determine the geometric disposition of ligand groups consistent with experimentally observed affinities and a theoretically reasonable energy cutoff. Ifa ligand has to adopt a strained conformation in order to display the correct pharmacophoric pattern, it will obviously have a lower affinity than a less strained counterpart making

[8]

DETERMINATION OF CLEFTS AND BINDING SITES

153

approximately the same contacts with the receptor. The ligand conformational space that lies beneath the energy cutoff is explored, and common atom motifs displayed by all the ligands are searched for. The best motif will be a model for the pharmacophore. The exploration of conformational space is performed by torsion angle driving so the energy cutoff helps to limit the computational cost of the program. The cutoff cannot be set too low or some ligands may not display the common motif. There are no firm a priori rules for setting the level of this cuttoff. The power of the active analog approach can best be demonstrated by reference to a specific example of its use in drug design. The study discussed here is by no means unique; the use of this method in designing more potent ligands for many enzyme systems has been reviewed by Marshall. 68 The active analog approach has now been extended to include three-dimensional QSAR information. 69 Angiotensin-converting enzyme (ACE) is responsible for the cleavage of the endogenous peptide angiotensin I to angiotensin II. The latter is a highly active vasoconstricting agent, and the inhibition of ACE is used as a means of treating hypertension clinically. The first inhibitors were peptides isolated from snake venoms; QSAR studies showed that Ala-Pro derivatives were good substrates for ACE.7° It was argued that ACE should have a mechanism of action similar to that of carboxypeptidase; this conclusion was strengthened by the discovery that, like carboxypeptidase, ACE contained a zinc atom in its active site. A thiol group was therefore introduced into the next generation of ligands to anchor the Ala-Pro-like structures in the active site. The best compound designed by this group was captopril. The complete structure of the receptor or the binding site has yet to be elucidated The approach adopted by Hassall e t al. 71 w a s based on the assumption that the bound conformation of captopril is fairly similar to its conformation in solution. The relative orientations of the carbonyl of the amide group and the terminal acid group in captopril are fixed by conjugation in the amide and the semirigid five-membered ring. It was thought that these groups bind to the ACE site via a hydrogen bond and a salt bridge, respectively. A knowledge of the geometry and nature of these types of interactions permits limits to be placed on the lication of these receptor groups. The thiol group can rotate freely about two bonds to describe a 68 G. R. Marshall, Annu. Rev. Pharmacol. Toxicol. 27, 193 (1987). 69 G. R. Marshall and R. Cramer, Trends Pharmacol. Sei. 9, 285 (1988), and references therein. 7o M. A. Ondetti, B. Rubin, and D. W. Cushman, Science 196, 441 (1977). 71 C. H. Hassall, A. Krohn, C. J. Moody, and W. A. Thomas, J. Chem. Soc., Perkin Trans. 1, 155 (1984).

154

PROTEINS AND PEPTIDES" PRINCIPLES AND METHODS

[8]

Bridge

HS

0

COOH

H - bond promoter Zn FIG. 10. Proposed pharmacophore and map of binding site groups for angiotensin-converting enzyme and the inhibitor captopril. The carboxylate group is thought to form a salt bridge, the carbonyl group a hydrogen bond, and the thiol a link to a zinc atom. (Reproduced with permission from Hassall e t a/. 71)

bowl-shaped locus. After the high-energy configurations (>200 kJ/mol) have been stripped off this locus, the remaining region can be used to limit the position of the zinc atom in the receptor. These results can be displayed graphically (Fig. 10). This pharmacophore was used to determine the receptor site at an elementary level. The model was refined through the synthesis and in vitro assay of several rigid bicyclic systems that further restricted the movement of the thiol group. The precise geometry of each compound was determined, and the regions accessible to the thiol group in low-energy conformations were calculated by molecular mechanics. Molecular modeling was used to superpose these new structures against the structure of original

[8]

155

DETERMINATION OF CLEFTS AND BINDING SITES

•'

"

:

: .;.'.::~ i~7,::.":.. . - ' . . " ~ " ~ . ;

~..":;~

Compound

I

[

IC 50

r6

(10)

F5

(9) (11)

0.038 0.1

r4

(5)

0.48

r3

(4)

0.7

F2

(6) (7)

1-.-6

I F1

(8)

43

FIG. 11. The map of the angiotensin-converting enzyme binding site has been refined by assaying rigid bicyclic analogs of captopril for activity (IC50) against the enzyme. The carboxylate and the carbonyl groups were held fast; the thiol was free to move. The locus of the thiol to the black regions of the locus showed the highest activity. It is therefore likely that the zinc atom is located near these regions of space. (Reproduced with permission from Hassali e t a / . 71)

pharmacophore so that the loci of the carbonyl, acid, and thiol groups were well matched up. The intersection of the thiol loci was used to define the position of the zinc atom more precisely (Fig. l 1). The intersecting regions can be coded according to the biological activity of the ligands. This use of computer graphics allows the drug designer to interpret and visualize the pharmacophore model very easily. The final locus of the zinc atom, determined in this manner, is now very small; the pharmacophoric mapping of these receptor binding groups has allowed the partial determination of the binding site. KARMA

KARMA (knowledge-assisted receptor mapping analysis) 72 is a modeling tool that can incorporate information from QSAR and conformational analysis studies into a three-dimensional graphic map of a binding site. A pharmacophoric map of the site can be obtained by looking for a common motif among a possibly diverse set of ligands; this forms the basis for the model. A KARMA surface does not represent a real molecular surface, 72 T. E. Klein, C. C. Huang, E. F. Pettersen, G. S. Couch, T. E. Ferrin, and R. Langridge, J . M o l . G r a p h i c s 8, 16 (1990).

156

PROTEINS AND PEPTIDES: PRINCIPLES AND METHODS

[9]

but an artificial model of what the receptor surface might look like. The surface is triangulated and dotted with control points that can be "grabbed" interactively and moved to a new position in space. The surface is malleable, so that when one control point is moved, the local surface deforms to take account of the motion: thus bulges or dents can be added to the KARMA surface. This allows the user to change the model of the site interactively. The KARMA surface can also be colored with hydrophobic or electrostatic parameters. QSAR studies on families of derivatives provide detailed information about local regions of the site based on local variations within a family of ligands. KARMA aids the interpretation of these data by providing an interactive way of visualizing the model of the deduced binding site and changing the model in response to new information. Finally, as it is in many ways similar to the surface models derived by direct methods, it can also be used to create a negative sphere image for database searches or other forms of rational drug design. Conclusion There are many methods available for directly determining clefts and binding sites in receptors, of which the negative sphere method appears to give the most reliable results. Indirect methods for site mapping can give valuable qualitative pictures of the binding site, but the models need to be verified experimentally. Acknowledgments R.A.L. wouldlike to thankthe RoyalCommissionof 1851for a ResearchFellowshipand the FulbrightCommissionfor a Senior Scholarship. This work was partially supported by National Institutes of Health Grants GM-31497 and GM-39552 (George Kenyon,Principal Investigator). Elain Mengand Natalie Colloc'h are thankedfor helpfuldiscussions.

[9] M o d e l i n g o f S i d e C h a i n s , L o o p s , a n d I n s e r t i o n s in P r o t e i n s

By

NEENA

L. SUMMERS and

MARTIN KARPLUS

Introduction Detailed knowledge of the three-dimensional structure of a globular protein is essential for an understanding of its biological function. Experimental data at atomic level resolution can be obtained by using X-ray METHODS IN ENZYMOLOGY, VOL. 202

Copyright © 1991by AcademicPress, Inc. All fights of reproduction in any form reserved.

Clefts and binding sites in protein receptors.

126 PROTEINS AND PEPTIDES: PRINCIPLES AND METHODS [8] that this approach will dramatically increase our understanding of the development of tertiar...
2MB Sizes 0 Downloads 0 Views