SOFTWARE NEWS AND UPDATES

WWW.C-CHEM.ORG

SPILLO-PBSS: Detecting Hidden Binding Sites within Protein 3D-Structures Through a Flexible Structure-Based Approach

Alessandro Di Domizio,*[a,b]† Alessandro Vitriolo,[a] Giulio Vistoli,[b] and Alessandro Pedretti[b]

The study reports a flexible structure-based approach aimed at identifying binding sites within target proteins starting from a well-defined reference binding site. The method, named SPILLO potential binding sites searcher (SPILLO-PBSS), includes a suitably designed tolerance which allows an efficient recognition of the potential binding sites regardless of both involved residues and protein conformation. Hence, the proposed method overcomes the rigidity which affects the available approaches and which prevents a proper analysis of distorted binding sites. We apply SPILLO-PBSS to several test cases, including the search for the guanosine diphosphate binding site in distorted H-Ras proteins and the identification of acetylcholine binding proteins from among a library of heterogeneous resolved proteins. Tests are also performed to compare SPILLO-PBSS with other related and available methods. The encouraging results confirm the notable potentialities of this approach and lay the foundation for its use to analyze and predict target proteins on a proteome-wide scale. © 2014 Wiley Periodicals, Inc.

Introduction

The general concepts of drug-likeness and protein druggability show a rather common philosophy.[1] Indeed, the former is based on the idea that all drug molecules, while showing a huge structural heterogeneity, possess some well-defined physicochemical properties which render them particularly suited to be used as efficacious drugs, especially with regard to their pharmacokinetic profile. Such a concept has enjoyed significant success over the last few years and finds various applications in the early screening of potential drug candidates, especially using purposely developed specific, disease-focused property rules instead of general drug-likeness filters, which revealed limited efficacy as shown in several recent studies.[2–4] Similarly, the concept of protein druggability relies on the idea that all protein binding sites, while showing a huge structural heterogeneity, share some key properties which render them well suited to recognizing ligands and to forming stable complexes.[5] Nevertheless, current understanding of the major features characterizing the druggable binding sites is still in its infancy, as demonstrated by the lack of a commonly accepted definition of what constitutes a protein pocket. While recognizing their intrinsic diversity, the capacity to identify conserved properties of the binding sites across different target classes might have incredible applications in the drug discovery processes, ranging from the design of target-independent but pocket-dependent chemical libraries to the analysis of off-targets or antitargets for a given compound or to deorphanization of protein cavities and functions. When considering these impressive potentialities, the number of in silico approaches developed in the last few years to analyze protein binding sites comes as no surprise.[6] As discussed in some excellent reviews, these structure-based approaches can be roughly subdivided into geometrical, energy-based, or based on the analysis of the key interacting residues.[7,8] The geometrical approaches identify the binding sites by searching the void volumes within the protein structures, thus characterizing the pockets in terms of size and shape. The energy-based approaches characterize the protein pockets by computing their interaction energies with some representative probes, thus profiling the binding sites based on their interaction capacities. In both cases, the search can be supported by genomic analyses based on the degree of conservation of the surface residues (see e.g., Ref. [9]). Although differing in terms of implemented algorithms and obtained performances, all these approaches require that the analyzed proteins possess a preformed cavity having all geometrical properties optimized for the binding event. If the pocket undergoes structural distortions resulting in a closed or wide-open binding site, the methods are unable to detect it even though it still includes all key residues required for the binding.

DOI: 10.1002/jcc.23714

[a] A. Di Domizio, A. Vitriolo
Department of Biotechnology and Biosciences, University of Milano-Bicocca, Piazza della Scienza, 2, 20126 Milan, Italy
E-mail: [email protected]
[b] A. Di Domizio, G. Vistoli, A. Pedretti
Department of Pharmaceutical Sciences, University of Milan, Via Mangiagalli, 25, 20133, Milan, Italy.
Further information about the SPILLO-PBSS availability can be requested from the corresponding author.
The authors declare no competing financial interest.
† This author conceived and implemented SPILLO-PBSS.
© 2014 Wiley Periodicals, Inc.

Journal of Computational Chemistry 2014, 35, 2005–2017


The last group includes the approaches which characterize the binding cavities in terms of key residues involved in the binding process.[10] These methods are usually utilized in the comparison of binding sites rather than in pocket detection. They start from a well-defined reference complex from which they extract the key residues and search for similar binding sites in the target proteins through geometrical comparisons often supported also by sequence alignments. These methods exploit different geometrical elements to describe the mutual arrangement of the key residues (e.g., points, vectors, nodes, grids) and utilize different algorithms to efficiently compare them;[11] however, even these methods require that the target proteins possess binding sites where the key residues show an optimized arrangement for the binding process. Distorted binding sites cannot be efficiently analyzed. By considering the major limitations of the hitherto reported methods, we here propose a novel approach, named SPILLO potential binding sites searcher (SPILLO-PBSS), to detect protein binding sites. It conceptually belongs to the last group as it starts from a reference binding site (RBS) based on which it searches for potential binding sites (PBSs) in the target proteins. However, this method is innovative compared to those already described because it includes a purposely designed tolerance which implicitly accounts for the possible conformational distortions affecting the target binding sites as well as for the conservative differences in the key residues between the analyzed pockets. In this way, the here proposed method is able to include in its binding site analysis the effects of both protein flexibility and sequence homology which usually hamper the efficacy of the other methods. 
The efficacy of the tolerance so included is assessed here by detecting the guanosine diphosphate (GDP) binding site in Ras proteins whose pocket has been deliberately distorted or partially occupied by other interacting proteins. The performance of SPILLO-PBSS is also compared with that of other available related methods, using a set of apo and holo proteins taken from Ref. [12]. Finally, the reliability of the method is assessed by identifying the acetylcholine binding proteins from a library of heterogeneous resolved proteins.

Methods

As schematized in Figure 1, the approach utilized by SPILLO-PBSS to detect potential protein binding sites starts by generating the RBS based on the known 3D structure of a given complex between a ligand and its target protein. To minimize the required computational costs, the second step generates a simplified representation of both the RBS and the target protein able to account for the key residues involved in the interaction. This representation is applied in the subsequent steps, which involve a systematic searching and scoring of all the PBSs on the target protein, followed by a final ranking of the PBSs so detected.

Figure 1. Workflow reporting the main logical units for the PBSs detection.

RBS generation

The optimized three-dimensional structure of a given protein (experimentally resolved or suitably modeled, hereinafter defined as the reference protein) bound to its ligand represents the basic information required by the method presented here. The analysis of the physicochemical and geometrical properties of the binding site allows the definition of the RBS, that is, the reference framework through which the PBSs are detected within the analyzed protein (hereinafter defined as the target protein) and the corresponding scores calculated. The RBS is defined by those R residues of the reference protein whose distance from the ligand is smaller than a user-defined threshold. Moreover, the RBS generation involves a preliminary calculation of the interaction energy for the reference complex (ETOT) using the MMFF94 force field and considering electrostatic energy and Lennard-Jones potential.[13] The nonbonded energy so computed is then decomposed into the contribution of each residue (Ear), and the specific relevance of each ar residue belonging to the RBS is parameterized by an appropriate weight War corresponding to the ratio between Ear and the total interaction energy ERBS elicited by all RBS residues, eq. (1):

$$W_{a_r} = \frac{E_{a_r}}{E_{RBS}} \qquad (1)$$
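As an illustration of eq. (1), the following minimal Python sketch computes the weights from a set of per-residue interaction energies. The function name `residue_weights`, the residue labels, and the energy values are hypothetical and are not taken from the SPILLO-PBSS program or from the tables of this work.

```python
def residue_weights(residue_energies):
    """Return W_ar = E_ar / E_RBS for each RBS residue, as in eq. (1).

    residue_energies maps a residue label to its nonbonded interaction
    energy with the ligand (hypothetical values, arbitrary energy units).
    """
    e_rbs = sum(residue_energies.values())  # E_RBS: total over all RBS residues
    if e_rbs == 0:
        raise ValueError("total RBS interaction energy is zero")
    return {res: e / e_rbs for res, e in residue_energies.items()}


# Hypothetical energies, chosen only to make the arithmetic easy to follow.
energies = {"Lys 80": -12.0, "Arg 217": -5.0, "Ser 81": -3.0}
weights = residue_weights(energies)

# Rank the residues by their relative stabilizing contribution.
ranked = sorted(weights, key=weights.get, reverse=True)
```

With these illustrative numbers the weights are 0.60, 0.25, and 0.15, and they sum to 1 by construction. Under this definition, a residue with a positive (repulsive) energy in an overall stabilizing RBS would receive a negative weight.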

Moreover, the weights can be assigned by the user when it is known that some residues play key roles regardless of their interaction energy. Using the War values so computed, the RBS residues can be ranked according to their stabilizing contribution to the ligand binding. In this way, the scores computed for the subsequently identified PBSs will be based on the relevance of their stabilizing key residues. As discussed in the following sections, the score calculation also takes into account the ligand, which undergoes the same roto-translational transformations as computed for the RBS residues. Clearly, the reliability of these weight values heavily depends on the capacity of the force field used to suitably parameterize all stabilizing interactions. Table 4 and Supporting Information Table S3 compile the residues included in four computed RBSs with the corresponding interaction energies and allow for some relevant considerations. The first consideration involves the apolar residues, which are satisfactorily ranked in all computed RBSs, thus suggesting that the MMFF94 force field is able to properly account for hydrophobic interactions. The second consideration concerns the residues with positive interaction energies, which are absent in three out of the four RBSs and which correspond, in the remaining RBS, to negatively charged residues which should elicit repulsive interactions with the ligand. The SPILLO-PBSS program can also handle the water molecules included in the RBS. To this end, the crucial water molecules can be considered as an extension of the ligand so as to properly include in the RBS the residues interacting with the ligand by direct or water-mediated contacts.[14,15]

Reduced representation of the system

The presented approach exploits a reduced representation of both the reference and target proteins to speed up the required geometric transformations during the searching phases, thus allowing fast structural comparisons between the RBS and the PBSs. All amino acid residues are represented by a "spheres and vectors" simplified hybrid model (see Fig. 2), which suitably parameterizes their main geometrical features, namely the orientation and the steric hindrance. As depicted in Figure 2, each residue is defined by a vector which originates at its alpha carbon atom (Cα) and ends at the center of mass (CMSC) of its side chain, as well as by a sphere centered on the center of mass of the entire amino acid (CM, as computed by neglecting hydrogen atoms) and having a radius corresponding to the residue's mean radius (the mean radii used for the amino acids are compiled in Supporting Information Table S1). This simplified geometrical representation is consistent with the concept of geometric tolerance on which the method is based. Indeed, more accurate representations might overload the calculations with too much information that would then be lost due to the tolerances included in the following steps.
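The "spheres and vectors" model described above can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the side chain is taken to be every heavy atom other than the backbone atoms N, CA, C, and O, the atomic masses are approximate, the coordinates are invented, and the mean radius of 2.5 Å is a placeholder rather than a value from Supporting Information Table S1. How residues without a side chain (glycine) are treated is not described in the text, so the sketch simply raises an error for them.

```python
import numpy as np

# Approximate atomic masses for heavy atoms (hydrogens are neglected).
MASS = {"C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}
BACKBONE = {"N", "CA", "C", "O"}  # assumption: everything else is side chain


def center_of_mass(atoms):
    """Mass-weighted center of a list of (atom_name, element, xyz) tuples."""
    coords = np.array([xyz for _, _, xyz in atoms], dtype=float)
    masses = np.array([MASS[element] for _, element, _ in atoms])
    return (coords * masses[:, None]).sum(axis=0) / masses.sum()


def reduced_residue(atoms, mean_radius):
    """Build the vector (CA -> side-chain CM) and sphere (residue CM, radius)."""
    ca = np.array(next(xyz for name, _, xyz in atoms if name == "CA"), dtype=float)
    side_chain = [atom for atom in atoms if atom[0] not in BACKBONE]
    if not side_chain:
        raise ValueError("no side-chain heavy atoms; handling is not specified")
    return {
        "vector": center_of_mass(side_chain) - ca,
        "center": center_of_mass(atoms),
        "radius": mean_radius,
    }


# Invented coordinates (in angstroms) for an alanine-like residue.
atoms = [
    ("N", "N", (-1.0, 0.0, 0.0)),
    ("CA", "C", (0.0, 0.0, 0.0)),
    ("C", "C", (1.0, 0.0, 0.0)),
    ("O", "O", (1.0, 1.0, 0.0)),
    ("CB", "C", (0.0, 1.0, 0.0)),
]
model = reduced_residue(atoms, mean_radius=2.5)  # placeholder radius
```

Each residue is thus reduced to seven numbers (a 3D vector, a 3D center, and a radius), which is what makes the repeated roto-translations of the search phase inexpensive.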
PBS detection

To perform a comprehensive search for the PBSs while minimizing the required computational costs, a three-dimensional cubic grid of L points with a user-defined grid spacing is generated to fill the entire space occupied by the target protein. The RBS is then iteratively translated to all grid nodes and, at each of them, rotated around the x, y, and z axes according to a user-defined step (see below). For each grid node i and each orientation j, the residues of the target protein found within a specified distance, named tolerance, from the center of mass of each ar RBS residue are identified and analyzed (see Supporting Information Fig. S1). In this way, a set of potentially relevant target residues is collected and defined using the notation ar,s, where the indices r and s identify the target residue s included within the region surrounding the RBS residue r. Each ar,s residue is a candidate to be chosen as a PBS residue, depending on its capacity to approximate the corresponding RBS residue r. The tolerance plays a crucial role in the method as it allows the target residues to be analyzed within a region large enough to implicitly account for the possible conformational differences between reference and target proteins. Furthermore, the tolerance is also expected to compensate for the inaccuracies unavoidably introduced by the discretization of the 3D space during the RBS roto-translations.

Figure 2. Example of the "spheres and vectors" hybrid model for arginine. The vector originates at the Cα (in green) of Arg and ends at the center of mass of its side chain (CMSC, green arrowhead). The sphere is centered on the center of mass (CM, in yellow) of the whole residue. Hydrogen atoms are neglected in the calculation. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

PBS scoring and ranking

For each node i and each orientation j, the selection of the target PBS residues is performed by choosing only one ar,s target residue within each surrounding region r (see Supporting Information Fig. S1). To this end, two parameters are calculated for each ar,s residue, namely (i) the free interaction probability, Far,s, and (ii) the residue score, Sar,s, whose product parameterizes the capacity of the considered ar,s target residue to approximate the corresponding ar RBS residue. To account for the relevance of each considered residue in stabilizing the putative complex, this product is also weighted by the War coefficient described above. According to eq. (2), and for each node i and each orientation j, the PBS corresponds to the combination of those target residues that maximizes the PBS total score, SCOREPBS, simultaneously considering all the R surrounding regions.

$$\mathrm{SCORE}_{PBS}\Big|_{i,j} = \max\left(\sum_{r=1}^{R} W_{a_r}\, F_{a_{r,s}}\, S_{a_{r,s}}\!\left(\gamma_{a_r,a_{r,s}}\right)\right)\Bigg|_{i,j} \qquad (2)$$
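A minimal Python sketch of the selection expressed by eq. (2), for a single grid node and a single orientation, is given below. It rests on assumptions that are not stated in the text: the choice made in one surrounding region is treated as independent of the choices made in the others (so that the maximum of the sum is the sum of the per-region maxima), the weights are positive and sum to 1, and an empty region contributes nothing. The function name `pbs_score` and all numerical values are hypothetical.

```python
def pbs_score(weights, candidates):
    """Evaluate eq. (2) for one grid node and one orientation.

    weights:    {rbs_residue: W} with positive weights summing to 1
    candidates: {rbs_residue: [(target_residue, F, S), ...]}
                with F and S in the range [0, 1]
    Returns (score in percent, {rbs_residue: chosen target residue}).
    """
    total, chosen = 0.0, {}
    for rbs_res, w in weights.items():
        options = candidates.get(rbs_res, [])
        if not options:
            continue  # assumption: an empty surrounding region adds nothing
        # Keep only the candidate with the largest F * S product.
        best = max(options, key=lambda cand: cand[1] * cand[2])
        total += w * best[1] * best[2]
        chosen[rbs_res] = best[0]
    return 100.0 * total, chosen


# Hypothetical weights and candidates for a two-residue RBS.
weights = {"a1": 0.6, "a2": 0.4}
candidates = {
    "a1": [("Lys 16", 0.9, 1.0), ("Asn 85", 1.0, 0.5)],
    "a2": [("Ser 17", 0.5, 0.8)],
}
score, chosen = pbs_score(weights, candidates)
```

In this example the first region selects the candidate with product 0.9 over the one with product 0.5, the second region has a single candidate with product 0.4, and the total is 0.6 × 0.9 + 0.4 × 0.4 = 0.70, that is, 70%.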

The SCOREPBS is expressed as a percentage and the maximum value of 100% corresponds to the ideal case of complete agreement between the PBS and the RBS.

Free interaction probability. To approximate an ar reference residue, a given ar,s target residue must firstly possess an optimal arrangement for binding. Indeed, the inclusion of a significant geometric tolerance can allow the selection of residues which, despite having a high degree of similarity to the corresponding ar RBS residue, are partially or even totally shielded by other surrounding target residues. In these cases, a local rearrangement of such shielding residues would be necessary to allow the ar,s target residue to freely interact with the ligand. Therefore, the free interaction probability, Far,s, is defined as the probability of such a favorable conformational event, which is assumed to be related to the number of shielding residues and to their shielding contributions. By exploiting the "spheres and vectors" representation, Far,s is calculated by an oversimplified model which represents the interaction between the ar,s residue and the ligand (and the possible interferences of the shielding residues) by an "interaction cone" (see Fig. 3), the base of which is coincident with the section of the sphere corresponding to the ar,s residue, while the vertex is located at its nearest RBS ligand atom. The shielding effect of a given at target residue, where t refers to any target residue other than ar,s, on the ar,s residue is parameterized by the overlap between the sphere representing the at residue and the interaction cone described above. As shown in Figure 3, and for each at target residue, the SOVERLAP surface is defined by the intersection between the cone section, SCONE, and the sphere section, SSPHERE. In turn, these two sections are defined by the plane perpendicular to the axis of the cone and containing the center of the at sphere. Such a plane intersects both the interaction cone (defining SCONE) and the sphere (defining SSPHERE). The shielding effect of the at target residue on the ar,s target residue can then be calculated as the ratio between SOVERLAP and SCONE, according to eq. (3):

$$\mathrm{Shielding}_{a_{r,s},a_t} = \frac{S_{\mathrm{OVERLAP},\,a_{r,s},a_t}}{S_{\mathrm{CONE},\,a_{r,s}}} \qquad (3)$$
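Because both sections lie in the same plane and are circular, eq. (3) reduces to the overlap area of two coplanar circles divided by the area of the cone section. The Python sketch below illustrates this reading using the standard circle-circle intersection formula. The inputs `cone_radius` (radius of the cone section in that plane), `sphere_radius`, and `axis_distance` (distance of the sphere center from the cone axis) are hypothetical names and are assumed to be computed elsewhere; spheres lying beyond the vertex or the base of the cone are not treated here.

```python
import math


def circle_overlap_area(r1, r2, d):
    """Area shared by two coplanar circles of radii r1 and r2, centers d apart."""
    if d >= r1 + r2:
        return 0.0  # the circles do not touch
    if d <= abs(r1 - r2):
        return math.pi * min(r1, r2) ** 2  # one circle lies inside the other
    # Half-angles subtended by the common chord at each circle center.
    a1 = math.acos((d * d + r1 * r1 - r2 * r2) / (2 * d * r1))
    a2 = math.acos((d * d + r2 * r2 - r1 * r1) / (2 * d * r2))
    # Sum of the two circular segments that form the lens.
    return (r1 * r1 * (a1 - math.sin(2 * a1) / 2)
            + r2 * r2 * (a2 - math.sin(2 * a2) / 2))


def shielding(cone_radius, sphere_radius, axis_distance):
    """Eq. (3): S_OVERLAP / S_CONE in the plane through the sphere center."""
    s_cone = math.pi * cone_radius ** 2
    s_overlap = circle_overlap_area(cone_radius, sphere_radius, axis_distance)
    return s_overlap / s_cone


inside = shielding(2.0, 1.0, 0.0)    # small sphere centered on the cone axis
outside = shielding(1.0, 1.0, 2.0)   # sphere section just outside the cone
covering = shielding(1.0, 3.0, 0.0)  # sphere section covers the whole cone section
```

The three calls give 0.25, 0.0, and 1.0, respectively, so the value ranges from no shielding to complete shielding as the sphere section covers more of the cone section.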

Thus, the probability that the at target residue undergoes favorable rearrangements able to leave the ar,s target residue free to interact with the ligand is given by the difference "1 − shielding". As the required rearrangements of the shielding residues are considered as independent events, the free interaction probability is calculated by eq. (4), which takes into account the possible shielding effects of all the M target residues other than the considered ar,s residue:

$$F_{a_{r,s}}\Big|_{i,j} = \prod_{\substack{t=1 \\ a_t \neq a_{r,s}}}^{M} \left(1 - \mathrm{Shielding}_{a_{r,s},a_t}\right)\Bigg|_{i,j} \qquad (4)$$
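As a small worked example of eq. (4), the Python sketch below multiplies the complements of a list of shielding values. The function name and the shielding values are hypothetical; each value is assumed to have been obtained from eq. (3) for one shielding residue.

```python
import math


def free_interaction_probability(shieldings):
    """Eq. (4): product of (1 - shielding) over all shielding residues."""
    return math.prod(1.0 - s for s in shieldings)


# Three hypothetical neighbors: partial, half, and no shielding.
f_partial = free_interaction_probability([0.25, 0.5, 0.0])

# A single fully shielding residue makes the interaction impossible.
f_blocked = free_interaction_probability([0.25, 1.0])

# With no shielding residue at all, the residue is fully accessible.
f_free = free_interaction_probability([])
```

The first case gives 0.75 × 0.5 × 1.0 = 0.375. The second case gives 0, which shows that under the independence assumption a single completely shielding residue is enough to exclude a candidate. The third case gives 1.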

Residue score. With a view to representing the ability of each ar,s target residue to approximate the ar RBS residue in terms of physicochemical properties, the residue score, Sar,s, is calculated according to eq. (5):

$$S_{a_{r,s}} = M_{a_r,a_{r,s}} \cos^2\!\left(\frac{\gamma_{a_r,a_{r,s}}}{2}\right) \qquad (5)$$

where Mar,ar,s is the pairwise physicochemical similarity coefficient between the ar RBS residue and the corresponding ar,s target residue, as obtained from the precalculated similarity matrix (see Supporting Information Table S2). A decreasing function of the γ angle, which encodes the orientation difference between the vectors associated with the two compared residues (ar and ar,s), is then used to weight the Mar,ar,s similarity coefficient. Hence, to be seen as a good approximant of the ar RBS residue, a given ar,s target residue also has to show an orientation similar to that of the ar RBS residue. The function used includes a certain degree of tolerance, thus permitting moderate differences in orientation between the two residues. Such a function is consistent with the overall tolerant approach and avoids an excessive penalization of those PBSs which, although not exactly corresponding to the RBS, may still undergo local rearrangements which can minimize the differences between them and the RBS. Once the systematic calculation of all the SCOREPBS values has been completed using eq. (2), the PBSs so obtained are ranked according to their scores, which substantially rank the PBSs based on their similarity with the starting RBS.

Figure 3. Schematic representation of the interaction cone of a generic ar,s target protein residue as partially shielded by the presence of two other protein residues, labeled as a1 and a2. In detail, the SOVERLAP areas (dark gray) are the intersections between the cone sections, SCONE (dashed black area), and the sphere sections, SSPHERE (light gray), as obtained by the intersection between the interaction cone and the spheres, respectively.

Computational details

Calibration of the included parameters. Before starting with the validation of the SPILLO-PBSS software, a first very simple test was utilized to calibrate the geometric parameters. The test involved the search for the GDP binding site in the suitably open H-Ras protein using the same H-Ras protein as RBS (see Supporting Information Table S4) and gradually varying the grid spacing from 1.0 to 5.0 Å and the rotation step from 5° to 60°. The calculations so performed were evaluated in terms of computed scores, root-mean-square deviation (RMSD) values as computed by comparing RBS and PBS residues (described by their centers of mass), and computation times. As shown in Supporting Information Table S4, the reliability of the performed calculations increases when decreasing grid spacing and/or rotation step, even though more accurate calculations understandably require longer times. Therefore, the choice of the best parameters should correspond to a satisfactory compromise between these two contrasting factors. Supporting Information Table S4 shows that the best combinations of these two parameters may correspond to a grid spacing of 2.0 Å and a rotation step of 30° as well as a grid spacing of 3.0 Å and a rotation step of 20°. In both cases, the computation time is equal to about 5 min, even though it is possible to speed up the calculations by setting a grid spacing of 4.0 Å and a rotation step of 20° with minimal worsening of the obtained results, and this choice could be of significant relevance when analyzing huge protein databases. Thus, all reported calculations were performed through a three-dimensional cubic grid with a grid spacing of 2.0 Å and a rotation step of 30°. The tolerance value was set to 5.5 Å by default, as this value should account for almost all possible structural shifts between apo and holo proteins, as suggested by comparative studies which revealed that the largest structural deviations involve a shift of about 5.0 Å in the arrangement of their binding residues.[16] However, the tolerance can be reduced, with a clear saving in computation time, when the search does not involve distorted binding sites.

Identification of GDP binding sites. The 3D structure of the protein utilized for the RBS generation was retrieved from the PDB (PDB code: 3qnu) and, after removing water and crystallization additives, the hydrogen atoms were added using VEGAZZ.[17] To remain compatible with physiological pH, lysine, arginine, glutamate, and aspartate residues were considered ionized, while histidine and cysteine were kept neutral by default. The protein structure so completed was then refined to optimize the starting ligand-protein complex. In detail, the minimization was performed in vacuo by the Polak-Ribière conjugate gradient algorithm using the MMFF94 force field with the distance-dependent electrostatic treatment and performing a maximum of 500 cycles. By default, the R stabilizing residues comprised in a 3.5 Å layer surrounding the ligand were included in the RBS (see Supporting Information Table S3). Similarly, the target proteins utilized to search for the PBSs were retrieved from the PDB (PDB Id: 4q21 for the first two tests and 1bkd for the last two cases) and were prepared by deleting water molecules, ligands, cofactors, and crystallization additives without adding hydrogen atoms and without any further refinement (apart from the closed conformation used in the second test, which was generated by a restrained molecular dynamics (MD) run, see Supporting Information). The search for the PBSs on the target protein was always extended to the whole protein structure.
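The trade-off between grid spacing and rotation step discussed in the calibration above can be given a rough arithmetic reading. The Python sketch below counts the roto-translations per unit of searched volume, assuming (this is not stated in the text) that each of the three axes is sampled over a full 360° turn at the given step and that the cost of a run scales with the number of poses. The function name and the reference volume of 1000 Å³ are hypothetical.

```python
def poses_per_volume(grid_spacing, rotation_step, volume=1000.0):
    """Estimated number of RBS poses needed to scan a given volume.

    grid_spacing is in angstroms, rotation_step in degrees, and volume in
    cubic angstroms. Assumes a full 360-degree sampling around x, y, and z.
    """
    nodes = volume / grid_spacing ** 3            # grid nodes filling the volume
    orientations = (360.0 / rotation_step) ** 3   # rotation triplets per node
    return nodes * orientations


fine_grid = poses_per_volume(2.0, 30.0)    # 2.0 A spacing, 30 degree step
fine_angle = poses_per_volume(3.0, 20.0)   # 3.0 A spacing, 20 degree step
fast = poses_per_volume(4.0, 20.0)         # 4.0 A spacing, 20 degree step
```

Under these assumptions the first two settings require the same number of poses (216,000 per 1000 Å³), which is consistent with their similar computation times, while the third setting requires about 42% as many (91,125).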

Recognition of ACh binding proteins. To maximize recognition of the acetylcholine binding proteins, two significantly different RBSs were generated. The first RBS is based on the complex between acetylcholine and a soluble acetylcholine binding protein (AChBP, PDB code: 2xz5, see Table 4) from Aplysia californica, while the second RBS derives from the complex of acetylcholine with acetylcholinesterase from Torpedo californica (PDB code: 2ace, see Table 4). As the first protein includes two acetylcholine molecules for each binding site, the RBS was generated by considering both ligands so as to consider the largest number of interacting residues. Moreover, the cysteine methanethiolsulfonate included in the exploited binding site was manually mutated into the natural tyrosine residue (Tyr53). Then, the two RBSs were generated using the standard protocol described above. A database of 52 proteins retrieved from the Protein Data Bank was collected, including 21 known ACh binding proteins (namely proteins for which an experimental affinity value with ACh has been reported) and 31 proteins which are supposed to act as ACh non-binding proteins (see Table 5). All proteins were chosen from several classes with heterogeneous biological functions and belonging to different organisms so as to cover a significant part of the protein structural space. With regard to the ACh binding proteins, the dataset includes both holo proteins, in which however the cocrystallized ligand was often different from ACh, and apo proteins. The latter showed distorted binding sites and were included in the database to evaluate the ability of SPILLO-PBSS to identify the binding sites when they are far from an optimal conformation for the ligand recognition.

As compiled in Table 5, the ACh binding proteins include five acetylcholinesterase structures (2ace; 1ea5; 4b85; 4m0e; 1qo9), four muscarinic receptors (mAChR2: 3uon; 4mqs; 4mqt and mAChR3: 4daj), seven soluble AChBPs (from Aplysia californica: 2wn9 and 2xz5; from Bulinus truncatus: 2bj0; from Biomphalaria glabrata: 4aod and 4aoe; from Lymnaea stagnalis: 3zdh; from Capitella teleta: 4afh), two nicotinic receptors (4bor and 2bg9), a putative choline ABC transporter (2rin), a multidrug-efflux transporter 1 regulator (3q5s), and a ligand-gated ion channel ELIC (3rqw). As described above, the target proteins were prepared by deleting water molecules, ligands, cofactors, and crystallization additives without any further refinement. For the proteins showing a multimeric structure, the biological assemblies were generated using the MakeMultimer program according to the transformation matrices included in the PDB files.[18] The reference proteins (PDB codes: 2xz5 and 2ace) were also included in the database as positive controls. The detection and scoring of the PBSs so obtained were performed according to the default procedure as described above.

Results and Discussion

Identification of GDP binding sites on different H-Ras conformations

To test the ability of the approach presented here to correctly detect the ligand binding site within a target protein regardless of its amino acid composition and of its conformational state, we chose the H-Ras protein in complex with GDP, a ligand involved in the activation/inactivation of the Ras proteins.[19] Besides its remarkable biological role, this complex was chosen for the large number of available resolved structures, which allow different representative situations of increasing difficulty to be investigated. Specifically, the tests were aimed at detecting the binding site of GDP on the whole target protein structure by analyzing different protein conformations, where the ligand binding site was: (i) suitably open, (ii) completely closed, (iii) wide-open, and (iv) wide-open and partially occupied. To obtain unbiased results, the RBS was generated using a GDP binding protein which does not belong to the same family as H-Ras. In particular, the RBS was based on the cytosolic domain of human atlastin-1 in complex with GDP.[20] It is a hydrolase which shares only a restricted similarity with H-Ras regarding both the GDP binding site and its whole structure (identity equal to 8.9%, as computed by EMBOSS Needle).[21] The selected RBS residues along with their relative contribution in stabilizing the ligand–protein complex are listed in Table 1. Similarly, the scores of the top-ranked PBSs and the lists of PBS residues resulting from the calculations for the four different tests are reported in Table 1.

Table 1. Comparison between the residues in the RBS and in the best PBSs as identified by SPILLO-PBSS in the four examined GDP cases. The experimentally observed interacting residues are also reported for easy analysis.

Reference binding site [a]: relative stabilizing contribution (%) and RBS residues [Atlastin-1/GDP (3qnu)].
Best potential binding sites [b]: (I) Suitably open [H-Ras (4q21)]; (II) Completely closed [H-Ras (4q21)]; (III) Wide-open without Sos [H-Ras (1bkd)]; (IV) Wide-open in complex with Sos [H-Ras/Sos (1bkd)].

Contribution (%) | RBS residue | (I) | (II) | (III) | (IV)
28.48 | Lys 80 | Lys 16 | Lys 16 | Lys 16 | Lys 16
10.50 | Arg 217 | Asn 116 | Asn 116 | Asn 85 | Asn 116
10.28 | Arg 77 | Gly 13 | Gly 13 | Gly 12 | Lys 811[c]
9.23 | Lys 78 | Val 14 | Ala 11 | Val 14 | Val 14
8.84 | Asp 218 | Asn 85 | Asn 85 | Asn 86 | Asn 85
8.57 | Arg 113 | Tyr 32 | Ile 21 | Thr 20 | Glu 942[c]
7.61 | Ser 81 | Ser 17 | Ser 17 | Ser 17 | Ser 17
4.61 | Phe 82 | Phe 28 | Ala 18 | Ala 18 | Ala 18
4.58 | Gly 79 | Gly 15 | Glu 31 | Gly 15 | Gly 15
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
2.46 | Phe 76 | Gly 12 | Tyr 32 | Val 9 | Trp 809[c]
0.92 | Asn 279 | Asp 30
0.85 | Asn 281 | Glu 31 | His 27 | Tyr 32 | Pro 945[c]
0.67 | Pro 280 | Val 29, Glu 946[c]
0.60 | Val 276 | Asp 119
0.42 | Phe 282 | Gln 22
0.39 | His 271 | Asp 119, Lys 117, Ala 121
0.39 / 0.31 | Pro 272 / Ala 277 | Leu 120, Ala 146, Cys 118, Cys 118, Lys 147, Gly 13, Leu 120
0.29 | Phe 293 | Lys 147, Leu 120, Lys 117
SCOREPBS (%) | | 74.09 | 71.35 | 70.56 | 72.50

[Below the dashed line, several rows contain unfilled cells whose column positions are not recoverable in this text version; for those rows the PBS entries are listed in their original left-to-right order.]

Experimentally known GDP interacting residues of H-Ras:[d] Ala11, Gly12, Gly13, Val14, Gly15, Lys16, Ser17, Ala18, Phe28, Val29, Asp30, Tyr32, Asn116, Lys117, Asp119, Leu120, Ser145, Ala146, Lys147

[a] Residues listed in descending order according to their relative stabilizing contribution to the RBS complex. The dashed line separates residues responsible for more than 90% of stabilizing contribution in the RBS. [b] Residues defining the best PBSs as identified by SPILLO-PBSS. Unfilled table cells are found in correspondence of those RBS residues without corresponding residues in the PBS. Underlined residues correspond to the experimentally known GDP binding site residues reported in the last line. [c] Residues belonging to the Sos protein at the interface between H-Ras and Sos. [d] Residues of the H-Ras GDP binding site with at least one heavy atom within 4.5 Å of the ligand in the 4q21 complex.

Suitably open binding site conformation. The crystal structure of the H-Ras/GDP complex was used in this first test of moderate difficulty. As the H-Ras protein structure was derived from a cocrystallized complex, the GDP binding site is in a conveniently open and undistorted conformational state. Even in this simplest case, detecting the correct binding site is no small task, as its amino acid composition is significantly different from that of the reference protein and the search is performed on the whole target protein, assuming the binding site to be completely unknown. The resulting score of the best PBS was equal to 74.09%, representing the percentage of similarity referred to the RBS. The gap between this score and the ideal value of 100% can be explained by considering the previously mentioned differences between the target and the reference protein in terms of binding sites and overall folding. As shown in Figs. 4A and 4B, SPILLO-PBSS was able to find the correct GDP binding site on the analyzed H-Ras protein conformation. This is qualitatively supported by the agreement between the experimental and the predicted GDP ligand poses, as evaluated in terms of relative position and orientation on the 3D protein structure. As the search for the PBSs mainly relies on similarity between protein residues, a more precise evaluation of the results should be based on the agreement of the corresponding residues between the top-ranked PBS and the experimentally observed binding site. As reported in Table 1, the reliability of the obtained result is confirmed by the marked agreement between the predicted

WWW.C-CHEM.ORG

SOFTWARE NEWS AND UPDATES

measure of the opening of the GDP binding site and which ˚ in the original resolved structure to 5.69 A ˚ drops from 10.77 A in the so distorted H-Ras conformation (compare the accessible binding cavity between Figs. 4A and 4C). Also in this case, SPILLO-PBSS was able to detect the right GDP binding site. As shown in Table 1, there is a significant agreement between the identified residues and the experimentally known residues as the approach was able to recognize 11 out of 19 experimentally known interacting residues. A graphic confirmation of such an agreement is provided by the comparison between Figures 4D and 4B which reveals the very similar computed poses of the GDP ligand. The intrinsic difficulty in detecting such an inaccessible binding pocket is also confirmed by the score of the top-ranked PBS, 71.35%, which is slightly lower than that of the previous case (74.09%) and this is reflected by a lower similarity between RBS and PBS caused by the distortion of the latter. The third challenging case by which SPILLO-PBSS was tested can be seen as the opposite of the previous one, as here the recognition of the binding site is hampered by the difference in primary sequence as well as by a vast distortion of the binding site, which assumes a wide-open conformation. The H-Ras protein conformation utilized for this test was derived from the crystal structure of human H-Ras in complex with the Ras guaninenucleotide-exchange-factor region of the Son of sevenless (Sos) protein.[22] The GDP binding site is here extremely widened due to displacement of the Switch 1 region induced by the insertion of Sos into H-Ras. After removing the Sos protein, the search for the PBSs was performed on the whole H-Ras protein alone. 
The dramatic distortion affecting the binding site can be appreciated by taking into consideration the previously defined reference distance between Gly13 and Glu31 Ca ˚ , approximately twice the atoms, which here measures 21.33 A corresponding distance of the undistorted protein. Figures 5A and 5B and Table 1 show that also here SPILLOPBSS proved successful in finding the correct binding site. With a score of 70.56%, the top-ranked PBS of this third test was found to be slightly worse than the top-ranked PBS detected for the previous test, in which the binding site was purposely closed (71.35%). This result can be explained by considering the flatness of the here used H-Ras structure which prevented the program from precisely recognizing the target residues corresponding to the RBS residues, and indeed the top-ranked PBS includes only 9 out of 19 residues as compiled in Table 1. Nevertheless, it should be underlined that also in this particularly challenging situation the top-ranked PBS suitably corresponds to the correct GDP binding site thus confirming that the here proposed approach is able to successfully detect the right binding site even when vast structural distortions affect (narrowing or widening) the fine architecture of the analyzed pockets. Wide-open binding site conformation.

Figure 4. The best PBSs for GDP ligand (in yellow) detected on two different H-Ras protein conformations, where the binding site is either suitably arranged A) and B) or completely closed C) and D). Both surface and cartoons representations are provided for the two cases. The distance between Gly13 and Glu31 is utilized as a measure of the opening of the GDP binding site. The inlets report the key pairs of the corresponding residues, shown as vectors, obtained by the superimposition of the RBS to the best PBS.

residues of the top-ranked PBS and the known GDP binding residues in the H-Ras/GDP complex crystal structure. In detail, the top-ranked PBS includes 13 out of the 19 residues which characterize the experimentally resolved GDP binding site and only two unmatched residues (Asn85 and Glu31). Overall, the reported result emphasizes that the here proposed approach proved successful in detecting the correct binding site within a suitably folded GDP-binding H-Ras protein and such an achievement represents a necessary prerequisite to deal with the successive and more challenging tests. This is a more challenging test for SPILLO-PBSS, in which difficulties arise not only, as in the previous case, from the amino acid differences between the reference and the target proteins, but also from a strong geometrical distortion induced into the H-Ras binding site that renders it completely inaccessible to the ligand. To this end, an ad hoc H-Ras conformation was generated by performing a short (10.0 ns) MD simulation in explicit water and in the absence of GDP (as detailed in Supporting Information). Starting from the previously used crystal structure of the HRas protein extracted from the H-Ras/GDP complex and applying a distance restraint between the Gly13 and Glu31 Ca atoms, an unnatural arrangement of the H-Ras Switch 1 region (residues 30–40) was induced. This H-Ras conformation, albeit far from a realistic state, was in any case helpful in testing the ability of the program to recognize completely closed (and vastly distorted) binding sites. The induced protein distortion is well documented by the distance between the Ca atoms of the residues Gly13 and Glu31, which can be seen as a simple Completely closed binding site conformation.
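The Gly13–Glu31 Cα distance used above as a measure of binding site opening can be obtained directly from the atomic coordinates of a structure. The following is a minimal sketch and not the authors' procedure: the function names `ca_coordinates`, `opening_distance`, and `atom_line` are hypothetical, the parser assumes standard fixed-column PDB `ATOM` records, and the demo coordinates are synthetic rather than taken from any H-Ras structure.

```python
import math

def ca_coordinates(pdb_lines, chain_id="A"):
    """Return {residue number: (x, y, z)} for the C-alpha atoms of one chain."""
    coords = {}
    for line in pdb_lines:
        # Fixed-column PDB ATOM record: atom name in columns 13-16,
        # chain ID in column 22, residue number in columns 23-26,
        # x/y/z in columns 31-38, 39-46 and 47-54.
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA" or line[21] != chain_id:
            continue
        coords[int(line[22:26])] = (
            float(line[30:38]), float(line[38:46]), float(line[46:54])
        )
    return coords

def opening_distance(pdb_lines, res_a=13, res_b=31, chain_id="A"):
    """Distance (in angstroms) between the C-alpha atoms of two residues."""
    ca = ca_coordinates(pdb_lines, chain_id)
    return math.dist(ca[res_a], ca[res_b])

def atom_line(serial, name, res_name, chain_id, res_num, x, y, z):
    """Build a minimal ATOM record for the demo (not a full PDB writer)."""
    return (f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain_id}{res_num:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")

# Synthetic coordinates chosen so that the expected distance is 5.0 angstroms;
# they are NOT taken from any H-Ras structure.
demo = [
    atom_line(1, " N", "GLY", "A", 13, 1.0, 1.0, 1.0),
    atom_line(2, " CA", "GLY", "A", 13, 0.0, 0.0, 0.0),
    atom_line(3, " CA", "GLU", "A", 31, 3.0, 4.0, 0.0),
]
print(f"{opening_distance(demo):.2f}")  # prints 5.00
```

Applied to the lines of a real coordinate file, the same function would return the kind of value quoted in the text (for instance 10.77 Å for the undistorted structure), provided that the chain identifier and residue numbering match those of the file.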

Wide-open and sterically hindered binding site conformation.

The entire H-Ras/Sos complex was used in the fourth and most challenging test. The GDP binding site is located at the interface between the two proteins and is inaccessible to the ligand due to the steric hindrance caused by the Sos residues inserted into the H-Ras pocket. In detail, the search for the GDP binding site was performed by utilizing the same target structure used in the previous case, but here without removing the Sos protein. The difference in primary sequence between the reference and target proteins as well as the widening distortion of the H-Ras binding region are thus the same as in the previous case. Here, the test is further complicated by the presence of the Sos protein, which is partially inserted into the GDP binding site, rendering it inaccessible to the ligand. In detail, the region of the H-Ras/Sos complex which is sterically most hindered and shows the most distorted architecture is the one bearing the key residues interacting with the GDP phosphate groups, thus preventing any effective interaction between H-Ras and GDP. As can be seen by comparing Figures 5D and 5B, the right position and orientation of the binding site were successfully identified also in this difficult case. Similarly to what was observed in the previous case, Table 1 confirms a good agreement between predicted and actual interacting residues, as the top-ranked PBS includes 9 out of 19 residues. Notably, the correctly identified residues include those involved in ionic contacts with the GDP phosphate groups, although their accessibility was markedly affected by the Sos protein as mentioned above. This result emphasizes that the approach reported here is able to detect the correct binding site even when it is occupied by interacting (and interfering) proteins. Although understandably lower than that of the first, undistorted case, the score obtained in this last test (72.50%) is slightly higher than those obtained for the previously considered distorted H-Ras target proteins. As shown in Table 1, this may be due to the inclusion in the computed PBSs of some Sos residues, which artifactually increases the number of interacting residues that the software can consider to generate the PBS. Nevertheless, the good result obtained in this highly difficult test case suggests that SPILLO-PBSS could also be utilized to identify PBSs at the interface between two interacting proteins.

Figure 5. The best PBSs for the GDP ligand (in yellow) detected on the H-Ras protein (in blue), when the binding site is either wide-open A) and B) or wide-open and sterically hindered C) and D) by the Sos protein (in green). The insets report the key pairs of corresponding residues, shown as vectors, obtained by the superimposition of the RBS on the best PBS.

Comparison of SPILLO-PBSS with other related available methods

Once SPILLO-PBSS had proven successful in recognizing binding sites even within heavily distorted proteins, the same calculations on GDP binding proteins were repeated using different (but related) methods to compare the performance of SPILLO-PBSS with that of other available tools. As summarized in Table 2, the comparison involved three pocket search methods which are representative of the three groups described in the introduction, namely pure geometric methods (FPocket, see also Supporting Information Table S5), energy-based approaches (SITEHOUND-web, see also Supporting Information Table S6), and methods based on binding site comparison (eF-seek, see also Supporting Information Table S7).[23–25] Moreover, the comparison included blind docking simulations as performed by a well-known docking software (Glide, see also Supporting Information Table S8), and a binding site search based on Monte Carlo simulations (PELE, see also Supporting Information Fig. S2).[26–28] In this way, the considered methods cover totally different approaches, and the last method should be particularly effective as it accounts for protein flexibility. Moreover, the utilized methods differ in the required computation time: FPocket and SITEHOUND require a time comparable with (if not slightly shorter than) that of SPILLO-PBSS, Glide requires a slightly greater time than SPILLO-PBSS, and PELE and eF-seek require vastly longer computation times. Table 2 shows that all tested methods are able to correctly detect the GDP binding site when it is suitably folded, whereas no method (apart from SPILLO-PBSS) is able to recognize the GDP binding site when it is completely closed. Finally, most tested methods are able to detect the correct binding site when it is wide-open without the Sos protein, while only SPILLO-PBSS and FPocket find the correct pocket in the presence of the Sos protein.

Table 2. Ability of some representative methods currently available for searching binding sites within 3D protein structures to correctly detect the GDP binding site on the four already tested H-Ras conformations. [Color table can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

Method name | Search method
FPocket | Geometric pocket search
SITEHOUND-web | Chemical pocket search
eF-seek | Binding site comparison
Glide | Blind docking
PELE | Monte Carlo search
SPILLO-PBSS | RBS based search

GDP binding site conformations on H-Ras (table columns):
(I) Suitably open [H-Ras (4q21)]
(II) Completely closed [H-Ras (4q21)]
(III) Wide-open without Sos [H-Ras (1bkd)]
(IV) Wide-open in complex with Sos [H-Ras/Sos (1bkd)]

Altogether, the obtained results confirm the already mentioned limitations of the available pocket search methods, which are unable to find the correct binding sites when they are markedly distorted. Simulations which treat protein flexibility afford slightly better results, but they are still unable to generate satisfactory results when the binding site is closed or partially occupied. With regard to these first four cases, SPILLO-PBSS confirms its clear advantage over the other available methods. With a view to further substantiating the performance of SPILLO-PBSS, a second part of this comparison takes its cue from a recent study which compared the ability
to detect functional binding sites of known blind docking and pocket search algorithms, as assessed by analyzing a set of 16 protein complexes including both apo and holo proteins.[12] Hence, SPILLO-PBSS was similarly utilized to detect the binding sites within the same set of 16 protein complexes. For each tested case, the RBS required by SPILLO-PBSS was generated using a different resolved complex composed of the same protein bound to a different ligand. Table 3 collects the obtained results, which are evaluated in terms of the ranking of the correct binding site, its score, and the distance between the computed and the actual ligand pose. Table 3 shows the remarkable performance of SPILLO-PBSS: in 24 cases out of 28 the top-ranked pocket corresponds to the functional binding site. This is confirmed both by the very high scores, which in 21 cases are greater than 97.0 and thus indicate an almost exact match of all residues characterizing the detected binding pockets, and by the distance between computed and actual ligand pose, which in 18 cases is less than 2.5 Å. It should be remembered that SPILLO-PBSS cannot be seen as a docking program; thus, the coincidence between predicted and experimental binding sites was evaluated here by measuring the distance between the centers of mass of the corresponding ligands rather than the RMSD values, which would also depend on the exact orientation of the computed ligand pose. Table 3 also confirms that SPILLO-PBSS is not influenced by the conformation of the binding pocket, as it performs equally well for apo and holo proteins. Similarly, the four unsatisfactory cases correspond to two target proteins (i.e., thymidylate synthase and Fab fragment), the binding site of which was undetected in both apo and holo conformations. The comparison of SPILLO-PBSS with the other methods considered in the aforementioned comparative study reveals that SPILLO-PBSS provides markedly better results, even when analyzing well-known problematic proteins such as cellobiohydrolase in complex with (S)-propranolol or ovine COX-1 complexed with ibuprofen.

Table 3. Results obtained by SPILLO-PBSS when applied to the apo and holo protein complexes taken from Ref. [12]. Entries marked // are not available.

Holo form PDB ID | Ligand name | Apo form PDB ID | RBS PDB ID[a] | RBS ligand name | Holo, first pose: d/[Å][b] | Holo, first pose: Score | Apo, first pose: d/[Å][b] | Apo, first pose: Score
1b70 | Phenylalanine | 1pys | 3teg | L-Dopa | 1.60 | 98.86 | 1.41 | 98.34
1cea | 6-Aminocaproic acid | 1pkr | 1b2i | Tranexamic acid | 2.63 | 90.35 | 2.38 | 90.19
1dy4 | s-Propranolol | 1cel | 2v3i | (R)-dihydroxy-phenanthrenolol | 2.88 | 97.07 | 1.99 | 97.55
1e7a | Propofol | 1ao6 | 2bxh | 3-sulfooxy-1H-indole | 2.92 | 99.79 | 2.50 | 99.11
1eqg | Ibuprofen | 1prh | 4o1z | Meloxicam | 1.80 | 99.48 | 1.64 | 98.01
1h61 | Prednisone | 1h50 | 1vys | Picric acid | 2.45 | 92.92 | 2.37 | 97.63
1hvy | Raltitrexed (Tomudex) | 1hw3 | 4e28 | 2-{(2z,5s)-4-Hydroxy-2-[(2e)-(2-Hydroxybenzylidene)hydrazinylidene]-2,5-Dihydro-1,3-Thiazol-5-yl}-N-[3-(Trifluoromethyl)phenyl]acetamide | 7.19 | 82.30 | 4.68 | 73.95
1hz4 | Benzoic acid | // | 1hz4 | Benzoic acid | 1.05 | 99.44 | // | //
1ivb | 4-(acetylamino)-3-hydroxy-5-nitrobenzoic acid | // | 1vcj | 4-[(2R)-2-(aminomethyl)-2-(hydroxymethyl)-5-oxopyrrolidin-1-yl]-3-[(1-ethylpropyl)amino]benzoic acid | 1.18 | 98.70 | // | //
1ju4 | Benzoic acid | 3i2j | 3i2f | (4S,5S)-4,5-bis(sulfanylmethyl)-1,3-dioxolan-2-ol | 1.74 | 97.86 | 1.33 | 99.59
1lna | Val-Lys | 1l3f | 1zdp | Thiorphan | 2.89 | 99.74 | 2.38 | 98.53
1m2z | Dexamethasone | // | 4csj | N-[(2S)-1-[[1-(4-fluorophenyl)indazol-4-yl]amino]propan-2-yl]-2,4,6-trimethyl-benzenesulfonamide | 2.49 | 98.14 | // | //
1ngp | 2-(4-hydroxy-3-nitrophenyl)acetic acid | 1ngq | 1etz | N-(4-Cyanophenyl)-N'-(diphenylmethyl)guanidineacetic acid | 3.84 | 79.24 | 5.37 | 80.76
1pth | Salicylic acid | 1prh | 3kk6 | Celecoxib | 1.85 | 98.27 | 3.69 | 98.75
3pcn | Dopacetic acid | 2pcd | 3pch | 3-chloro-4-hydroxybenzoic acid | 2.67 | 99.13 | 2.24 | 99.16
3tpi | Ile-Val | // | 4tpi | Val-Val | 2.13 | 97.26 | // | //

[a] The reference binding sites are generated by using different complexes containing the same target protein in complex with a different ligand, apart from the 1hz4 case, for which there is no other resolved complex. [b] Distance between the computed ligand pose and the experimentally resolved one (as defined by their centers of mass).

Recognition of ACh binding proteins

The last test case involved the search for acetylcholine binding proteins within a suitably collected database of 52 structurally diverse proteins including 21 known and heterogeneous acetylcholine binders. Besides their remarkable medicinal role, acetylcholine binding proteins were chosen for this second test because of their noteworthy diversity, as they include transmembrane receptors, transporters, ion channels, and soluble enzymes (see Table 5). Thus, there are no well-defined and specific sequence signatures which exhaustively characterize all acetylcholine binding sites, and therefore the acetylcholine binding proteins cannot be recognized by simple sequence analysis and pattern characterization.

Although the intrinsic tolerance of the SPILLO-PBSS algorithm should allow it to recognize significantly different binding sites, the search for the acetylcholine binding proteins involved two different reference proteins, chosen to be representative of the two major groups into which the acetylcholine binding proteins can be subdivided, namely the transmembrane receptors and channels (as described by Ac-AChBP) and the soluble enzymes (as described by acetylcholinesterase from Torpedo californica).[29,30] The use of two RBSs was due to the aforementioned diversity among the acetylcholine binding proteins and had the primary objective of assessing the limits and strengths of each RBS as well as of evaluating the synergistic combination of both RBSs. Moreover, the mutated Ac-AChBP protein utilized to generate the first RBS posed relevant challenges useful to further assess the reliability of the SPILLO-PBSS algorithm. These challenges can be summarized as follows: (a) the binding site is here located at the interface between two monomers and the RBS was generated using the site between chains A and C; (b) each binding site includes two ACh molecules and the RBS was generated by considering both ligands so as to maximize the number of interacting residues included in the RBS; and (c) the resolved structure is a mutated form which was manually transformed into the wild-type protein before generating the RBS, thus emphasizing the possibility of modifying the protein structure to restore, optimize, or mutate the reference binding pocket. Table 4 lists the selected RBS residues with their relative contributions in stabilizing the ligand–protein interactions for the two RBSs. The data confirm the remarkable difference between the two RBSs, which show a low similarity of 55.15% as computed using SPILLO-PBSS. This difference is clearly documented by the key interacting residues, as the acetylcholinesterase complex is largely stabilized by a single ion pair, while Ac-AChBP reveals a set of stabilizing aromatic residues among which a tryptophan residue plays a pivotal role.

Table 4. Residues of the two RBSs with their interaction energy (Eint) with ACh and their relative stabilizing contribution to the interaction.

First RBS: AChBP/ACh complex (2xz5)
RBS residue | Eint/[kcal/mol] | Relative stabilizing contribution (%)
Trp C 145 | -13.250 | 22.33
Tyr C 91 | -8.018 | 13.51
Thr A 34 | -7.651 | 12.89
Tyr C 186 | -4.931 | 8.31
Ile A 116 | -4.189 | 7.06
Tyr A 53 | -4.023 | 6.78
Ser C 144 | -3.905 | 6.58
Ser A 165 | -3.759 | 6.33
Val C 146 | -2.813 | 4.74
Cys C 188 | -2.446 | 4.12
Tyr C 193 | -1.691 | 2.85
Cys C 189 | -1.652 | 2.78
Gln A 36 | -0.638 | 1.07
Val A 106 | -0.386 | 0.65

Second RBS: AChE/ACh complex (2ace)
RBS residue | Eint/[kcal/mol] | Relative stabilizing contribution (%)
Glu A 199 | -18.679 | 38.93
Gly A 118 | -5.999 | 12.50
His A 440 | -5.001 | 10.42
Trp A 84 | -4.720 | 9.84
Gly A 119 | -4.331 | 9.03
Phe A 330 | -2.282 | 4.76
Ala A 201 | -2.164 | 4.51
Gly A 117 | -1.728 | 3.60
Trp A 233 | -0.976 | 2.04
Phe A 331 | -0.891 | 1.86
Phe A 290 | -0.422 | 0.88
Gly A 441 | -0.414 | 0.86
Phe A 288 | -0.355 | 0.73
Ser A 200 | -0.019 | 0.04
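The relative stabilizing contributions in Table 4 appear consistent with each residue's share of the summed interaction energy. The sketch below works under that assumption, which is an inference from the tabulated numbers rather than a definition given in this section; the function name `relative_contributions` and the dictionary `first_rbs_eint` are hypothetical, and the energies are those listed for the first RBS (2xz5).

```python
def relative_contributions(eint_by_residue):
    """Percentage share of each residue in the summed interaction energy.

    Assumption of this sketch: contribution_i = 100 * Eint_i / sum(Eint).
    """
    total = sum(eint_by_residue.values())
    return {res: 100.0 * e / total for res, e in eint_by_residue.items()}

# Eint values (kcal/mol) of the first RBS, AChBP/ACh complex (2xz5), Table 4.
first_rbs_eint = {
    "Trp C 145": -13.250, "Tyr C 91": -8.018, "Thr A 34": -7.651,
    "Tyr C 186": -4.931, "Ile A 116": -4.189, "Tyr A 53": -4.023,
    "Ser C 144": -3.905, "Ser A 165": -3.759, "Val C 146": -2.813,
    "Cys C 188": -2.446, "Tyr C 193": -1.691, "Cys C 189": -1.652,
    "Gln A 36": -0.638, "Val A 106": -0.386,
}

contributions = relative_contributions(first_rbs_eint)
for residue, share in contributions.items():
    print(f"{residue:<10} {share:6.2f}")
```

With the rounded energies of Table 4, the computed shares agree with the tabulated percentages to within about 0.01 (for example, roughly 22.32 against the reported 22.33 for Trp C 145), which is the difference expected from rounding of the input values.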
The results obtained by screening the collected protein database are compiled in Table 5, which reports both the rankings obtained by considering the two RBSs separately and the combined ranking obtained using the following consensus function:

P_i = \min(P_{i,1}, P_{i,2})    (6)

where P_{i,1} and P_{i,2} are the ranking positions of the i-th protein in the first and second ranking, respectively. Using eq. (6), each protein is ranked according to its capacity to approximate at least one of the two utilized RBSs. With regard to the single rankings, Table 5 and Supporting Information Table S9 show that the RBS generated using the Ac-AChBP provides rather satisfactory results, as the best ranked proteins include AChBPs, nicotinic and muscarinic receptors as well as several acetylcholinesterases. Indeed, the top-ranked 25 proteins (i.e., the top half of the dataset) include 20 out of 21 ACh binders and only five false positives. In detail, only the multidrug-efflux transporter is not suitably recognized. In contrast, the RBS generated using the acetylcholinesterase (see also Supporting Information Table S10) affords markedly worse results, as it allows a suitable recognition of the acetylcholinesterases but proves unsuccessful in recognizing some AChBPs as well as muscarinic and nicotinic receptors. Thus, the top-ranked 25 proteins include only 14 out of 21 ACh binders and 11 false positives. Interestingly, this second RBS allows a proper recognition of the multidrug-efflux transporter, thus suggesting the benefit of combining both rankings. The combined ranking reported in Table 5 is able to locate all 21 ACh binders among the top 25 proteins with only four false positives. Understandably, the two top-ranked proteins are the positive controls, namely the proteins from which the RBSs were generated, followed by close AChBP and cholinesterase homologues. Even though the identification of new potential acetylcholine binding proteins goes beyond the primary objectives of this second test, one may note that the four false positive results include some proteins which show clear relations with acetylcholine, as exemplified by histamine N-methyltransferase, which binds known acetylcholinesterase inhibitors (e.g., tacrine).[31]

Table 5. Protein database ranked according to the consensus score for the ACh potential binding sites (PBSs) as detected by SPILLO-PBSS. The two screening columns report rank and SCOREPBS (%).

Consensus rank | ACh binding protein[a] | First screening (2xz5)[b,c] | Second screening (2ace)[b,c] | PDB ID | apo/holo form | Protein name | Organism
1 | * | 1 (97.60) | 22 (74.64) | 2xz5 | holo (ACh) | Acetylcholine-binding protein | Aplysia californica
2 | * | 22 (70.08) | 1 (98.97) | 2ace | holo (ACh) | Acetylcholinesterase | Torpedo californica
3 | * | 2 (96.35) | 45 (71.27) | 2wn9 | holo (other ligand) | Acetylcholine-binding protein | Aplysia californica
4 | * | 13 (72.75) | 2 (98.47) | 1ea5 | apo | Acetylcholinesterase | Torpedo californica
5 | * | 3 (88.62) | 21 (74.95) | 2bj0 | holo (other ligand) | Acetylcholine-binding protein | Bulinus truncatus
6 | * | 19 (71.45) | 3 (98.04) | 4b85 | holo (other ligand) | Acetylcholinesterase | Mus musculus
7 | * | 4 (87.47) | 28 (73.33) | 4aoe | apo | Acetylcholine-binding protein type 2 | Biomphalaria glabrata
8 | * | 14 (72.55) | 4 (97.25) | 4m0e | holo (other ligand) | Acetylcholinesterase | Homo sapiens
9 | * | 5 (87.28) | 34 (72.46) | 3zdh | holo (other ligand) | Acetylcholine-binding protein | Lymnaea stagnalis
10 | * | 20 (70.62) | 5 (97.06) | 1qo9 | apo | Acetylcholinesterase | Drosophila melanogaster
11 | * | 6 (86.75) | 8 (78.14) | 4afh | holo (other ligand) | Acetylcholine-binding protein | Capitella teleta
12 | * | 18 (71.62) | 6 (79.82) | 2rin | holo (ACh) | Choline ABC transporter | Sinorhizobium meliloti 1021
13 | * | 7 (86.67) | 15 (75.82) | 4aod | apo | Acetylcholine-binding protein type 1 | Biomphalaria glabrata
14 | * | 10 (75.22) | 7 (78.57) | 3rqw | holo (ACh) | ELIC Pentameric Ligand Gated Ion Channel | Dickeya dadantii
15 | * | 8 (83.90) | 17 (75.19) | 4bor | apo | Nicotinic acetylcholine receptor | Torpedo marmorata
16 | * | 9 (82.41) | 11 (76.51) | 2bg9 | apo | Nicotinic acetylcholine receptor | Torpedo marmorata
17 |  | 21 (70.51) | 9 (78.00) | 2aow |  | Histamine N-methyltransferase | Homo sapiens
18 |  | 36 (66.35) | 10 (76.70) | 1d2e |  | Elongation factor Tu, mitochondrial | Bos taurus
19 | * | 11 (73.84) | 37 (72.28) | 3uon | holo (other ligand) | M2 muscarinic acetylcholine receptor | Homo sapiens
20 | * | 12 (72.76) | 32 (72.55) | 4mqt | holo (other ligand) | M2 muscarinic acetylcholine receptor | Homo sapiens
21 |  | 41 (65.38) | 12 (76.50) | 3erf |  | Glutathione S-transferase 2 | Saccharomyces cerevisiae
22 | * | 30 (68.66) | 13 (76.16) | 3q5s | holo (ACh) | Multidrug-efflux transporter 1 regulator | Bacillus subtilis
23 |  | 25 (69.29) | 14 (75.95) | 4anv |  | PI3-kinase subunit gamma | Homo sapiens
24 | * | 15 (72.28) | 33 (72.48) | 4mqs | holo (other ligand) | M2 muscarinic acetylcholine receptor | Homo sapiens
25 | * | 16 (72.08) | 29 (73.21) | 4daj | holo (other ligand) | M3 muscarinic acetylcholine receptor | Rattus norvegicus
26 |  | 23 (69.91) | 16 (75.46) | 2c2n |  | Mitochondrial malonyltransferase | Homo sapiens
27 |  | 17 (71.93) | 18 (75.16) | 2ceo |  | Thyroxine-binding globulin | Homo sapiens
28 |  | 37 (66.22) | 19 (75.13) | 4fs2 |  | DNA polymerase iota | Homo sapiens
29 |  | 39 (65.80) | 20 (75.09) | 3s7s |  | Aromatase | Homo sapiens
30 |  | 44 (63.83) | 23 (74.18) | 4irk |  | DNA polymerase IV | Escherichia coli
31 |  | 24 (69.42) | 27 (73.53) | 3tlr |  | Beta-2-microglobulin | Homo sapiens
32 |  | 46 (63.77) | 24 (73.87) | 1fts |  | Signal recognition particle receptor FtsY | Escherichia coli
33 |  | 32 (67.77) | 25 (73.85) | 2wjw |  | Glutamate receptor 2 | Homo sapiens
34 |  | 26 (69.02) | 35 (72.34) | 4avm |  | Bridging integrator 2 | Homo sapiens
35 |  | 38 (65.90) | 26 (73.56) | 4a1n |  | Nuclease EXOG, mitochondrial | Homo sapiens
36 |  | 27 (68.95) | 42 (71.58) | 1hkf |  | Natural cytotoxicity triggering receptor 2 | Homo sapiens
37 |  | 28 (68.87) | 31 (72.87) | 1agr |  | Adenylate cyclase-inhibiting G alpha protein | Rattus norvegicus
38 |  | 29 (68.77) | 41 (71.87) | 1tup |  | Cellular tumor antigen p53 | Homo sapiens
39 |  | 33 (67.01) | 30 (72.94) | 3qnu |  | Atlastin-1 | Homo sapiens
40 |  | 31 (68.41) | 36 (72.34) | 4f14 |  | Nebulette/Cardiomyopathy-associated protein 3 | Homo sapiens
41 |  | 34 (66.64) | 50 (68.05) | 3upa |  | Ig kappa chain V-I region Walker | Homo sapiens
42 |  | 35 (66.58) | 38 (72.07) | 1t46 |  | Mast/stem cell growth factor receptor Kit | Homo sapiens
43 |  | 43 (64.04) | 39 (72.04) | 4m0l |  | Translation initiation factor 2 subunit gamma | Sulfolobus solfataricus P2
44 |  | 40 (65.39) | 44 (71.29) | 3zxf |  | Galectin-7 | Homo sapiens
45 |  | 47 (63.37) | 40 (71.97) | 3fau |  | NEDD4-binding protein 2 | Homo sapiens
46 |  | 42 (64.61) | 43 (71.49) | 4gt7 |  | Ig epsilon chain C region | Homo sapiens
47 |  | 45 (63.80) | 52 (64.98) | 1k59 |  | Angiogenin | Homo sapiens
48 |  | 49 (62.32) | 46 (71.21) | 4b3x |  | Translation initiation factor IF-2 | Thermus thermophilus HB8
49 |  | 48 (63.15) | 47 (70.73) | 4q21 |  | GTPase HRas | Homo sapiens
50 |  | 50 (61.61) | 48 (70.21) | 1ody |  | Gag-Pol polyprotein | HIV-1 M:B
51 |  | 51 (57.80) | 49 (70.02) | 2eyx |  | Adapter molecule crk | Homo sapiens
52 |  | 52 (56.10) | 51 (67.80) | 1ejg |  | Crambin | Crambe abyssinica

[a] ACh binding proteins are indicated by asterisks. [b] As indicated, the first screening uses the RBS based on the Ac-AChBP (PDB code: 2xz5) and the second screening uses the RBS based on acetylcholinesterase (PDB code: 2ace). [c] The single ranking positions used to define the consensus rank are indicated in bold.
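The consensus function of eq. (6) can be illustrated with a short sketch. The function name `consensus_ranking` is hypothetical, and breaking ties by the first-screening position is an assumption of this example (it happens to reproduce the order of the rows used here, but the source does not state a tie-breaking rule). The rank pairs are those reported in Table 5 for the six top entries.

```python
def consensus_ranking(rank_1, rank_2):
    """Order proteins by P_i = min(P_i1, P_i2), as in eq. (6).

    rank_1, rank_2: dicts mapping a protein identifier to its position
    in the first and second ranking. Ties in P_i are broken by the
    first-ranking position (an assumption of this sketch).
    """
    consensus = {p: min(rank_1[p], rank_2[p]) for p in rank_1}
    ordered = sorted(consensus, key=lambda p: (consensus[p], rank_1[p]))
    return [(p, consensus[p]) for p in ordered]

# Ranking positions from Table 5 (first screening: 2xz5 RBS; second: 2ace RBS).
first_screening = {"4b85": 19, "2bj0": 3, "1ea5": 13, "2wn9": 2, "2ace": 22, "2xz5": 1}
second_screening = {"4b85": 3, "2bj0": 21, "1ea5": 2, "2wn9": 45, "2ace": 1, "2xz5": 22}

for pdb_id, p_i in consensus_ranking(first_screening, second_screening):
    print(pdb_id, p_i)
```

Taking the minimum means that a protein is promoted as soon as it resembles either reference site, which is why the acetylcholinesterases, poorly ranked by the Ac-AChBP RBS, still reach the top of the combined list.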

Conclusions

The study presents an innovative methodology for the prediction of PBSs on target proteins which largely overcomes the limitations of the structure-based approaches reported so far, as it includes a suitably designed flexibility which allows the recognition of the ligand binding sites regardless of their variable amino acid composition and independently of the conformational state of the analyzed target proteins.[32] Indeed, the major limitation of all reported approaches is ascribable to their rigidity, as they require that the target proteins include suitably arranged binding pockets in order for these to be successfully recognized, so that distorted binding pockets cannot be reliably detected. Conversely, the targeted tolerance included in all steps of the method reported here allows a proper identification of the binding sites even within dramatically distorted protein structures. As shown for the H-Ras/GDP case study, and without any preliminary refinement of the protein structures, SPILLO-PBSS was able to find the correct binding site even within highly distorted conformations which were far from an optimal conformation for the binding event. Moreover, the results on the acetylcholine test case suggest that SPILLO-PBSS may be utilized to correctly extract from a database those proteins which show a propensity to recognize the considered ligand.

On these bases, the present method can be applied to a variety of medicinal problems in which molecular recognition plays a pivotal role, among which one may cite the analysis and characterization of binding sites, the deorphanization of protein cavities and functions, and the identification of off-target proteins on a proteome-wide scale. In this way, SPILLO-PBSS may represent a promising tool to reduce the attrition rate typical of the drug development process by speeding up various phases of drug discovery and development, including polypharmacology analysis, side-effect prediction and clarification, and drug repositioning. Finally, while remembering that SPILLO-PBSS cannot be considered a docking program, the information generated by the program, including the precise arrangement of the PBS within the target protein, the ligand pose, the key residues involved in the complex stabilization and their relative weight, can be seen as a starting point for a deeper understanding of molecular recognition at an atomic level, which can find fruitful applications in rational drug design studies.

Keywords: binding site detection · molecular recognition · off-target proteins · protein druggability · drug repositioning

How to cite this article: Di Domizio, A., Vitriolo, A., Vistoli, G., Pedretti, A. J. Comput. Chem. 2014, 35, 2005–2017. DOI: 10.1002/jcc.23714


Additional Supporting Information may be found in the online version of this article.

[1] Y. Landry, J. P. Gies, Fundam. Clin. Pharmacol. 2008, 22, 1.
[2] I. Yusof, M. D. Segall, Drug Discov. Today 2013, 18, 659.
[3] M. Segall, Expert Opin. Drug Discov. 2014, 9, 803.
[4] A. T. Garcia-Sosa, U. Maran, C. Hetenyi, Curr. Med. Chem. 2012, 19, 1646.
[5] Y. Yuan, J. Pei, L. Lai, Curr. Pharm. Des. 2013, 19, 2326.
[6] S. Henrich, O. M. Salo-Ahen, B. Huang, F. F. Rippmann, G. Cruciani, R. C. Wade, J. Mol. Recognit. 2010, 23, 209.
[7] S. Pérot, O. Sperandio, M. A. Miteva, A. C. Camproux, B. O. Villoutreix, Drug Discov. Today 2010, 15, 656.
[8] E. B. Fauman, B. K. Rai, E. S. Huang, Curr. Opin. Chem. Biol. 2011, 15, 463.
[9] B. Huang, M. Schroeder, BMC Struct. Biol. 2006, 6, 19.
[10] V. J. Haupt, S. Daminelli, M. Schroeder, PLoS One 2013, 8, e65894.
[11] L. Xie, P. Bourne, BMC Bioinformatics 2007, 8 Suppl 4, S9.
[12] C. Hetényi, D. Van Der Spoel, Protein Sci. 2011, 20, 880.
[13] T. A. Halgren, J. Comput. Chem. 1996, 17, 490.
[14] C. Bissantz, B. Kuhn, M. Stahl, J. Med. Chem. 2010, 53, 5061.
[15] A. T. García-Sosa, J. Chem. Inf. Model. 2013, 53, 1388.
[16] D. Seeliger, B. L. de Groot, PLoS Comput. Biol. 2010, 6, e1000634.
[17] A. Pedretti, L. Villa, G. Vistoli, J. Comput. Aided Mol. Des. 2004, 18, 167.
[18] Available at: http://watcut.uwaterloo.ca/cgi-bin/makemultimer. Last accessed on March 2014.
[19] A. Prior, J. F. Hancock, Semin. Cell Dev. Biol. 2012, 23, 145.
[20] X. Bian, R. W. Klemm, T. Y. Liu, M. Zhang, S. Sun, X. Sui, X. Liu, T. A. Rapoport, J. Hu, Proc. Natl. Acad. Sci. USA 2011, 108, 3976.
[21] P. Rice, I. Longden, A. Bleasby, Trends Genet. 2000, 16, 276.
[22] P. A. Boriack-Sjodin, S. M. Margarit, D. Bar-Sagi, J. Kuriyan, Nature 1998, 394, 337.
[23] V. Le Guilloux, P. Schmidtke, P. Tuffery, BMC Bioinformatics 2009, 10, 168.
[24] M. Hernandez, D. Ghersi, R. Sanchez, Nucleic Acids Res. 2009, 37, W413.
[25] K. Kinoshita, H. Nakamura, Protein Sci. 2005, 14, 711.
[26] R. Friesner, R. B. Murphy, M. P. Repasky, L. L. Frye, J. R. Greenwood, T. Halgren, P. C. Sanschagrin, D. T. Mainz, J. Med. Chem. 2006, 49, 6177.
[27] K. W. Borrelli, A. Vitalis, R. Alcantara, V. Guallar, J. Chem. Theory Comput. 2005, 1, 1304.
[28] A. Madadkar-Sobhani, V. Guallar, Nucleic Acids Res. 2013, 41, W322.
[29] M. Brams, E. A. Gay, J. C. Saez, A. Guskov, R. van Elk, R. C. van der Schors, S. Peigneur, J. Tytgat, S. V. Strelkov, A. B. Smit, J. L. Yakel, C. Ulens, J. Biol. Chem. 2011, 286, 4420.
[30] M. L. Raves, M. Harel, Y. P. Pang, I. Silman, A. P. Kozikowski, J. L. Sussman, Nat. Struct. Biol. 1997, 4, 57.
[31] J. R. Horton, K. Sawada, M. Nishibori, X. Cheng, J. Mol. Biol. 2005, 353, 334.
[32] S. J. Teague, Nat. Rev. Drug Discov. 2003, 2, 527.

Received: 30 April 2014 Revised: 30 July 2014 Accepted: 3 August 2014 Published online on 1 September 2014
