Arch. Pharm. Chem. Life Sci. 2014, 347, 825–833

825

Full Paper In Silico Study Combining Docking and QSAR Methods on a Series of Matrix Metalloproteinase 13 Inhibitors Lili Xi1, Shuyan Li2, Xiaojun Yao2, Yuhui Wei1, Jiazhong Li3, Huanxiang Liu3, and Xin’an Wu1 1 2 3

Department of Pharmacy, First Hospital of Lanzhou University, Lanzhou, China Department of Chemistry, Lanzhou University, Lanzhou, China School of Pharmacy, Lanzhou University, Lanzhou, China

Matrix metalloproteinase 13 (MMP-13) plays an important role in the degradation of articular cartilage and has been considered as an attractive target for the treatment of osteoarthritis; hence, the development of efficient inhibitors of MMP-13 has become a hot study field. Taking a series of carboxylic acid-based MMP-13 inhibitors as research object, this work utilized an extended QSAR method to analyze the structure–activity relationships. We focused on two important topics in QSAR: bioactive conformation and descriptors. Firstly, molecular docking was carried out to dock all molecules into the MMP-13 active site in order to obtain the bioactive conformation. Secondly, based on the docked complex, descriptors characterizing receptor–ligand interactions and the ligand structure were calculated. Thirdly, a genetic algorithm (GA) and multiple linear regression (MLR) were employed to select important descriptors related to inhibitory activities, simultaneously, to build the predictive model. The built model gave satisfactory results with highly accurate fitting and strong external predictive abilities for chemicals not used in model development. Furthermore, the selected descriptors were explored to elucidate important factors influencing the inhibition activities. This study demonstrates that the selection strategy of the docking-guided bioactive conformation is rational and useful in predicting MMP-13 inhibitor activities, and receptor–ligand complex descriptors have an advantage over directly reflecting receptor–ligand interactions. Keywords: Genetic algorithm / Matrix metalloproteinase 13 inhibitors / Molecular docking / Multiple linear regression Received: May 18, 2014; Revised: July 1, 2014; Accepted: July 3, 2014 DOI 10.1002/ardp.201400200

:

Additional supporting information may be found in the online version of this article at the publisher’s web-site.

Introduction Matrix metalloproteinases (MMPs) are a great family of the zinc-dependent endopeptidases that are mainly involved in degrading structural components of the extracellular matrix (ECM) and tissue remodeling [1]. It has been identified that MMPs play critical roles in a number of normal physiological processes, such as embryonic development, wound healing, angiogenesis, etc. [2–4]. However, over-expression of MMPs could accelerate matrix degradation and then result in several

Correspondence: Prof. Xin’an Wu, Department of Pharmacy, First Hospital of Lanzhou University, Lanzhou 730000, China. E-mail: [email protected] Fax: þ86 931 8616392

ß 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

pathological disorders, including arthritis, cancer, cardiovascular disease, etc. [5–7]. Therefore, MMPs have been regarded as attractive targets for drug discovery [8, 9]. Worldwide, osteoarthritis (OA) is the most common joint disorder in middle-aged and elderly patients. It is characterized by progressive loss of articular cartilage that eventually leads to chronic pain and disability. The main structural component of articular cartilage is the type II collagen [10, 11]. The excessive cleavage of type II collagen in OA is assumed to be caused by the up-regulation of the synthesis and activities of collagenases [12]. Among the collagenase subfamilies of MMPs, MMP-13 (collagenase-3) is the most efficient to degrade type II collagen [13]. It has been reported that MMP-13 is expressed in articular cartilage and joints of OA and rheumatoid arthritis (RA) patients, respectively, but not in normal adult tissues [14, 15].

826

L. Xi et al.

Accordingly, MMP-13 has been believed to be the most important for degradation of articular cartilage [16], and considered as a target for the inhibition of the progression of OA. Thus, the tasks to design highly efficient inhibitors of MMP13 and to assess their corresponding inhibition activities are still of importance and gain great interest. In recent years, computational methods have become an important and essential part of the process of lead identification and optimization. These methods have already been used to study MMPs and their inhibitors [17–22]. Quantitative structure–activity relationship (QSAR) is one of the most popular ways of these methods. In QSAR studies, there are two important and basic issues: (i) the acquisition of molecular bioactive conformation and (ii) the descriptors characterizing molecular structure. In early stage of QSAR study, it is generally believed that the lowest-energy conformation is the active one. But now, this view has been considered to be incorrect. In most cases, especially in the field of medicinal chemistry, the experimentally determined bioactive conformation of drugs bound to their target, typically obtained from X-ray crystallography, is not always the lowest-energy one [23]. In the case of receptor 3D structure is known, molecular docking method can be applied to get the possible bioactive conformations [24, 25]. On the basis of the precise docking results, the docking-guided molecular conformation selection strategy is very reasonable, and QSAR models based on these docked conformations have more practical significance. In the second issue, descriptors are important intermediates correlating the molecular structures with their target activities. In traditional QSAR studies, molecular descriptors are characterized by constitution, topology, charge, and physicochemical properties and so on. In the presence of receptor–ligand complex conformation, descriptors that characterize the interaction, such as binding free energy, can be available. Then, QSAR model using these descriptors could better reflect the effect of molecular structures on target activities, and these descriptors are probably easy to be interpreted. In this work, therefore, we focus on the above two questions and study a series of carboxylic acid based inhibitors of MMP13 reported by Monovich et al. [26] (the coordinates and X-ray data for the crystal structure of MMP-13 complexes with compound 24f were also reported). The research purpose is to analyze the binding mode between this series of inhibitors and MMP-13, to provide a predictive model and to further study the structure–activity relationships of these inhibitors. Thus, the research is carried out from the following aspects: (i) The docking method is employed to dock all molecules into the active sites of MMP-13, and the binding mode is explored, then the bioactive conformation for each molecule is determined. (ii) Two kinds of descriptors are calculated: the docked receptor–ligand complexes descriptors and the ligand descripß 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

Arch. Pharm. Chem. Life Sci. 2014, 347, 825–833

tors. (iii) Genetic algorithm (GA) and multiple linear regression (MLR) methods are employed to select the important descriptors related to the MMP-13 inhibitory activity and to build the predictive model. Furthermore, the selected descriptors are analyzed to elucidate critical factors influencing MMP-13 inhibition activity, which might be very useful for the design of new compounds with improved inhibition activities.

Results and discussion Docking analysis In order to validate the docking reliability, firstly, the cocrystallized compound 24f (i.e. compound 25 in this study) was re-docked into the active sites of MMP-13 by Glide [27] program. The re-docking result is shown in Supplementary Fig. S1. Compared the docked pose with the co-crystallized pose, the lower RMSD of 0.432 Å was obtained, which demonstrated that Glide program could well reproduce the experimentally observed binding mode, and it was reliable. Then, the rest of 60 molecules were docked into the active sites by the same way. The automated molecular docking produced multiple binding conformations. The most possible binding conformations were selected according to both GlideScore and the interaction between zinc-binding group (ZBG) and Zn2þ. The poses of all 61 molecules docked into the binding pocket of MMP-13 are shown in Supplementary Fig. S2, indicating that all molecules were positioned in the same way. The binding mode analysis can further recognize the important interaction and residues that are responsible for binding affinity. The binding mode of the most active compound 1 in the MMP-13 active sites is displayed in Fig. 1. In the docked complex, carboxyl (i.e., ZBG) together with three residues His222, His226, and His232 in catalytic domain are coordinated to Zn2þ; the carbonyl oxygen atom of ZBG formed one hydrogen bond with Glu223; H2O1202 formed hydrogen bond separately with NH group of compound 1 and Pro242, which shaped a water bridge for stabilizing the interaction; the oxygen atom of sulfonamide group is positioned to serve as hydrogen bond acceptor for Ala186 and Leu185. Moreover, substituent linked to sulfonamide group occupies the S0 1 pocket of MMP-13 and is surrounded by residues Ala219, Leu218, Ala221, and Leu239, mainly through the hydrophobic interaction. All other inhibitors shared a similar binding mode with compound 1.

Results of the built MLR model and internal validation In GA process, when adding another descriptor does not improve the statistics of a model to any significant degree, the optimum subset size is achieved. To avoid over-parameterization, the increase of Q 2LOO value less than 0.02 was considered as the breakpoint criterion. The best model was chosen based on the following criterion [28, 29]: higher Q 2LOO , Q 2LMO , and CCCcv; www.archpharm.com

Arch. Pharm. Chem. Life Sci. 2014, 347, 825–833

In Silico Study of Inhibitory Activity by Extended QSAR

827

Fitting criteria: R2training ¼ 0:910; RMSEtraining ¼ 0:305; CCCtraining ¼ 0:953; AARDtraining ¼ 3:016% Internal validation: Q 2LOO

Figure 1. The binding mode of the most active compound.

least difference between R2 and Q 2LOO ; lower RMSVcv and R2Yscrambling as low as possible. Finally, the model with seven descriptors was considered as the best one. The linear equation and the statistical items were described as follows: pIC50 ¼ 0:0025 ETEwithout constrains þ 4:2819 MATS2v þ 2:0794 EEig03d  8:4883 GGI7 þ 0:0944 RDF130u þ 0:1555 RDF075v  2:1046 Mor18v  2:1641 ð1Þ

¼ 0:874; RMSELOO ¼ 0:361; CCCLOO ¼ 0:934; Q 2LMO ¼ 0:856; R2Yscrambling ¼ 0:160

The resulted correlation coefficients (R2 and Q2) from LMO procedures (repeating 5000 times), together with those from the built model versus Kxy (the inter-correlation among descriptors and response), were plotted in Fig. 2A. In this figure, the majority of parameter values of LMO are around the model parameters, which indicates that the built model is robust and stable. In addition, Fig. 2B shows the distribution of correlation coefficients (R2 and Q2) from Y-scrambling procedure (repeating 5000 times) and of those from the built model. It is evident that the correlation coefficients of the built model are much higher than those after endpoint scrambling because the relationship between the structure and response is broken. All of above results indicate that the built MLR model is not obtained by chance and has good fitting ability and good stability. Furthermore, these results also demonstrate that the relationship between the structure of compounds and corresponding inhibitory activity does exist. The predicted values of training set are listed in Supplementary Table S1. The scatter plot of the predicted

Figure 2. (A) The LMO scatter plot of Kxy versus R2 and Q2 from the LMO procedure and the built model. (B) Y-scrambling plot of Kxy versus the R2 and Q2 from the Y-scrambling procedure and the built model.

ß 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

www.archpharm.com

828

L. Xi et al.

Arch. Pharm. Chem. Life Sci. 2014, 347, 825–833

versus experimental inhibitory activities of training set is shown in Fig. 3.

Analysis of the applicability domain and external validation Analyzing the applicability domain (AD) of the built model can provide information about reliability of the prediction. Only predictions for samples that fall into the domain can be considered as reliable. From Fig. 4 (Williams plot), it can be found that all compounds in training set were within the domain of the built model, that is to say, there is neither X outlier nor Y outlier. The predictive ability of the built model for new compounds is most important for its practical usage [30, 31]. It is very necessary to use external test set that is never used in model development to evaluate the predictive ability of the built model. The predicted results for external test set are listed in Supplementary Table S1. From Fig. 4, first, it can be seen that all compounds in external test set were located in the domain, therefore, the predicted values were considered to be reliable; and second, the compounds of external test set were well predicted, with their standardized residuals in the range from þ3.0s to 3.0s. The performance of external validation is listed below:

Figure 3. The plots of the predicted versus experimental inhibitory activities by the built model.

R2ext ¼ 0:837; Q 2F1 ¼ 0:832; Q 2F2 ¼ 0:831; Q 2F3 ¼ 0:878; CCCext ¼ 0:910; r 2m ¼ 0:765 RMSEext ¼ 0:355; AARDext ¼ 3:480% Here, the values of Q 2F1 , Q 2F2 , and Q 2F3 for the external validation were quite similar and all very high. The parameters CCCext and r 2m were both higher than their corresponding criteria thresholds [32–34], respectively. The value of AARDext was less than 5%. All the above results suggested that the built model had the high capability for assessing external test set. The scatter plot of the predicted versus experimental inhibitory activities of external test set is shown in Fig. 3.

The explanation of important descriptors Among seven selected descriptors, one belongs to receptor– ligand complex descriptors, and the rest are ligand structural descriptors. The relative importance of descriptors is explained by their standardized regression coefficients. The detailed information is listed in Table 1. The inter-correlations between the selected descriptors are presented in Table 2. It is noted that the correlation coefficient (r) of GGI7 and EEig03d, as well as RDF130u and RDF075v, was a little high. However, this result is also acceptable, mainly because the resulting model demonstrates its ability in prediction for external compounds. When we deleted one of these two descriptors, the model performance decreased substantially. Therefore, we kept them in the final model. ß 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

Figure 4. Williams plot for the model with seven descriptors. The values of the training set and the external test set were labeled differently. The two short dashed horizontal lines are the 3.0s limit and the one short dashed vertical line is the threshold value of leverage (h ¼ 0.533). www.archpharm.com

Arch. Pharm. Chem. Life Sci. 2014, 347, 825–833

In Silico Study of Inhibitory Activity by Extended QSAR

829

Table 1. The detail information of the selected descriptors. Symbol ETEwithout MATS2v EEig03d GGI7 RDF130u RDF075v Mor18v

constraints

Meaning

Std. reg. coeff.

Ligand binding energy, DGbind Moran autocorrelation-lag2/weighted by atomic van der Waals volumes Eigenvalue 03 from edge adj. matrix weighted by dipole moments Topological charge index of order 7 Radial distribution function-lag13.0/unweighted Radial distribution function-7.5/weighted by atomic van der Waals volumes 3D-MoRSE-signal 18/weighted by atomic van der Waals volumes

0.177 0.305 0.422 0.609 0.376 0.234 0.397

Table 2. The inter-correlation coefficients of the selected descriptors in the built model. ETEwithout ETEwithout MATS2v EEig03d GGI7 RDF130u RDF075v Mor18v

constraints

constraints

1.000 0.217 0.185 0.193 0.219 0.228 0.063

MATS2v

EEig03d

GGI7

RDF130u

RDF075v

Mor18v

1.000 0.161 0.104 0.129 0.240 0.147

1.000 0.550 0.056 0.191 0.002

1.000 0.009 0.007 0.244

1.000 0.613 0.127

1.000 0.138

1.000

The most important descriptor GGI7 belongs to the Gálvez topological charge indices [35, 36], which are proposed to evaluate the charge transfer between pairs of atoms, and therefore the global charge transfer in the molecule based on the corrected adjacency matrix. It is worthwhile to note that the molecular conformation can affect the distribution of charges, consequently changes the value of this kind of descriptors. Therefore, reasonable conformation is very important. GGI7 represents the seventh eigenvalue of the matrix of a molecule, and it is unfavorable to inhibitory activity in this study. In addition, it is interesting that GGI7 reflects charge information and could be related to the electrostatic interaction between ligand and receptor; Zn2þ is in the active sites of MMP-13, so electrostatic interaction for the binding of ligand and receptor is very critical. The second important one EEig03d belongs to edge adjacency index that derived from the H-depleted molecular graph and encodes the connectivity between graph edges. EEig03d is weighted by dipole moments. The information of dipole moment of atoms may describe the ability in the molecules to be an H-bond acceptor. The inhibitory activity increases with the increase of this descriptor. Both MATS2v and Mor18v are weighted by atomic van der Waals volumes. The former belongs to 2D-autocorrelation descriptors based on Moran autocorrelation of topological structure [37]; the latter belongs to 3D-MoRSE descriptors that are directly related with the molecular stereo-structure [38]. Two other descriptors RDF130u and RDF075v belong to radial distribution ß 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

function (RDF) descriptors, based on a radial distribution function that can be interpreted as the probability distribution of finding an atom in a spherical volume of radius R [39]. RDF130u is unweighted, while RDF075v is weighted by atomic van der Waals volumes. These three descriptors, MATS2v, Mor18v, and RDF075v, all reflect the size of the molecule. Seen from Eq. (1), the coefficients of RDF075v and MATS2v in the built model are positive, indicating that increasing molecular size could improve inhibitory activity to some extent. Although the coefficient of Mor18v in the model is negative, values of this descriptor for compounds are negative; therefore, the contribution of these three descriptors to inhibitory activities is consistent in this study. Descriptor ETEwithout constrains (Embrace_Total_Energy_without_constraints) represents ligand binding energy (DGbind). Although it is the least important in the model with the minimum absolute value of standardized regression coefficients, it can directly reflect the receptor–ligand interaction information. In the process of ligand binding with receptor, the higher the interaction energy needed, the more difficult the binding process is. Thus, the interaction between ligand and receptor could be weak, which is unfavorable for inhibitory activity.

Conclusions The present work studied the structure–activity relationships of a series of carboxylic acid based inhibitors of MMPwww.archpharm.com

830

L. Xi et al.

13, by using GA-MLR method. In the whole process of QSAR study, two important and basic issues, including the selection of molecular bioactive conformation and the descriptors characterizing molecular structure, were focused on. Docking-guided molecular conformation selection strategy should be a good choice to obtain the bioactive conformation for each molecule; moreover, the results from docking can be useful to understand the receptor–ligand interaction. Besides, descriptors representing interaction of receptor–ligand are able to obtain, based on the docked complex, if 3D structure of receptor is available. This kind of descriptors can directly reflect the receptor–ligand interaction, with explicit meaning. The MLR model constructed, based on training set only, was proved to be stable and had good predictive ability, according to the results of the internal and external validations, respectively. Furthermore, analysis of the AD showed that all compounds both in training and external test set were within the domain of the built model, indicating that the predicted activities were considered to be reliable. However, there are limitations with the built model here; frankly speaking, it was only suitable for the new compounds whose descriptors can be projected into the AD but useless for the compounds that cannot, because the predicted activities for these compounds could be considered less reliable. Finally, the seven selected descriptors were analyzed. The ligand structural descriptors indicated the effect of electrostatic interaction and molecular size is of importance for inhibitory activity, while ETEwithout constrains descriptor can directly reflect the receptor–ligand interaction information. These descriptors had specific physical-chemical meaning. Overall, this study could be expected to help facilitate the design of new derivatives with potential MMP-13 inhibition activity.

Materials and methods Dataset A dataset containing 61 potent carboxylic acid based inhibitors of MMP-13 was collected from the literature [26]. The molecular structures and their inhibitory activities are listed in Supplementary Table S1. The inhibitory activity IC50 values against MMP-13 were converted to the corresponding pIC50 (log IC50) and used as dependent variables in QSAR study. Their inhibitory activities covered over 4 log units (pIC50 ranging from 5.046 to 9.523). By fully considering model validation, all molecules were randomly divided into a 45-molecule training set and a 16-molecule external test set. The training set was used to construct prediction model, while the external test set never used in model development was only applied to evaluate the predictive ability of the built model. ß 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

Arch. Pharm. Chem. Life Sci. 2014, 347, 825–833

Active conformation generation – molecular docking study Protein preparation The three-dimensional complex structure of MMP-13 with compound 24f was obtained from the Protein Data Bank (PDB code: 3ELM). The crystal structure of the receptor was prepared with Protein Preparation Wizard workflow of Maestro9.0 [40], including adding hydrogens, removing water molecules (out of the 5 Å of active sites), treating metals, assigning partial charges using OPLS_2005 force field and protonation states, and carrying out energy minimization with a small number of steps to relax amino acid residue side chains. Following this step, the structure underwent restrained minimization in vacuum. The minimization was carried out with Impact Refinement module [41] using OPLS_2005 force field and terminated when the root-meansquare deviation (RMSD) reached a maximum cutoff of 0.30 Å. This step allows hydrogen atoms to be freely minimized, while allowing for heavy-atom movement to relax strained bonds, angles, and clashes. The ligand in the crystal structure was used to determine the location of a docking grid box and was then removed prior to grid generation in next step.

Ligand preparation Maestro9.0 package was used to construct molecular structures. Energy minimization was performed by Impact module, using OPLS_2005 force field with a distancedependent dielectric and conjugate gradient algorithm, and others were default values. All the optimized molecules were prepared with LigPrep tool [42] before docking, and all species existing at pH 7  2, including tautomers and enantiomers, were generated. Using OPLS_2005 force field in vacuum, a short conformational search was performed to relax the structures.

Grid generation and docking The scaling factors for protein van der Waals radii was set as 1.0 (default). The ligand in the active site was used as the centroid to generate the grid files for the following docking process. The default grid size was adopted from Glide program. As we consider that MMPs are zinc-dependent and most of MMPs inhibitors have a common group binding to the catalytic zinc, Zn2þ (Zn300 in the complex) was set as constraint in this study. The prepared all molecules were automatically docked into the binding sites of MMP-13 by using Glide SP to obtain the active conformations. Docking results were displayed by Open Source PyMOL program [43].

Descriptor generation Descriptors based on receptor–ligand complexes Based on the docked receptor–ligand complex conformations, a series of structure-based descriptors were calculated by www.archpharm.com

Arch. Pharm. Chem. Life Sci. 2014, 347, 825–833

Schrödinger program, using Liaison, Prime MM-GBSA, and eMBrAcE of MacroModel module. Liaison program calculates the receptor–ligand binding affinities using a linear interaction approximation. OPLS_2005 force field was used for charge assignment, and simulation methods were employed to do energy minimization. The truncated Newton method was used for the minimization with a residue-based cutoff distance of 15 Å and a maximum step of 1000. eMBrAcE program was used to calculate the receptor–ligand binding energies by energy minimization of the complex, receptor, and ligand, respectively, without continuum solvation. The PRCG method was used for the minimization with a maximum of 5000 steps. Finally, 20 descriptors were obtained.

Ligand descriptors based on molecular structures All the docked structures were imported to DRAGON5.4 package to generate ligand descriptors. A total of 1664 descriptors were computed for each molecule. Constant and near-constant descriptors were excluded for the sake of containing useless information, and one of the two descriptors whose pairwise correlation coefficient was greater than 0.95 were also removed to reduce redundant information. After then, 601 descriptors remained. Combining receptor–ligand complex descriptors and ligand descriptors, 621 descriptors remained totally.

Descriptor selection and model construction Many applications have proved GA to be a very effective tool in solving descriptors selection problems [29, 44–48]. Therefore, in our work, GA was also employed to select the important descriptors relevant to the inhibitory activities. Leave-one-out (LOO) cross validation was used as the fitness function. The important parameters of GA were as follows: population size was 1000, GA iterations were 10,000 and mutation rate was 10%. The initial population (i.e., a set of models) with the minimum number of allowed descriptors is developed by allsubset-method procedure to explore all the low dimension combination. Then these models are evaluated by fitness function, and are ranked according to the fitness score (Q 2LOO here). If this population meets the conditions of convergence, it can be considered as potential solution; otherwise, individual selection, crossover, and mutation are operated to produce new population. Then the fitness function is used again to evaluate the new population. This process continues until the new population matches the conditions of convergence. When increasing the model size does not increase the Q 2LOO value to any significant degree, the best solution is obtained. The final model is built by using MLR method based on the selected descriptors, called GA-MLR, implemented in the ß 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

In Silico Study of Inhibitory Activity by Extended QSAR

831

software QSARINS [49, 50], which is free for academia and research groups.

Model validation and performance The internal predictive ability and robustness of the models are evaluated by LOO and LMO (leave-many-out) crossvalidation approaches. The LOO procedure involves removing one sample from the original training set and reconstructing model only based on the remaining samples and then testing on the removed one. In this form, all the samples in training set are tested, and Q 2LOO is calculated. In the LMO procedure, 35% of the objects were left out randomly from the training set each time. The model obtained on the rest samples is used to predict values for the excluded samples. This procedure is repeated 5000 times, and then the mean value of Q 2LMO is calculated. In addition, to avoid chance correlation, Yscrambling technique is used. In Y-scrambling procedure, the response vector Y is randomly reordered, and new model is recalculated based on randomized responses [51]. That procedure is repeated 5000 times and then the mean value of R2Yscrambling is reported. It is worthwhile to point out that all the validation methods mentioned above just assess the internal predictive ability of model. However, it is necessary to externally validate the built model, based on the fact that external validation can provide more rigorous evaluation of the model’s predictive capability [28, 30, 31]. In this study, therefore, 16 molecules never used in model development are employed as an external test set. In addition to Q 2LOO , Q 2LMO , and R2Yscrambling mentioned above, model performance was also evaluated by the following statistic terms: the root mean squared error (RMSE) and the absolute average relative deviation (AARD)   n  Y iexp  Y ipred  100 X AARD ¼ ð2Þ n i¼1 Y iexp where i represents the i-th sample, Y iexp is the desired output, Y ipred is the actual output of the model, and n is the number of samples in the dataset. The software QSARINS [49, 50] employs various validation criteria for a rigorous external validation, to assess the predictive ability of the built model. Therefore, the following criteria are used: R2ext , Q 2F1 [52], Q 2F2 [53], Q 2F3 [54, 55], CCCext [32–34], and r 2m [56].

Applicability domain In this work, the AD was verified by the leverage approach to verify prediction reliability [31]. The leverage (h) is defined as: hi ¼ xi ðXT XÞ1 xTi

ði ¼ 1; 2; …; nÞ

ð3Þ

where xi is the descriptor row-vector of the query sample i; n is the number of query samples; X is the n  p matrix of dataset www.archpharm.com

832

L. Xi et al.

(p is the number of selected descriptors). The threshold value of warning leverage h is defined as 3(p þ 1)/n. In fact, leverage can be used as a quantitative measure of the model AD suitable for evaluating the degree of extrapolation. It represents a sort of compound distance from the model experimental space. Willams plot [30, 57] (leverage values vs. standardized residuals) is used to visualize the model AD. In Willams plot, the two horizontal lines indicate the limit of normal values for Y outliers (i.e., samples with standardized residuals greater than 3.0 standard deviation units, 3.0s); the vertical straight line indicates the limits of normal values for X outliers (i.e., samples with leverage values greater than the threshold value, h > h ). For a sample in external test set whose leverage value is greater than h , its prediction is considered less reliable, because the prediction is the result of substantial extrapolation of the model. The authors are especially grateful to Prof. P. Gramatica for providing QSARINS software for free. This work was supported by the National Natural Science Foundation of China (NSFC-21305057) and the Fundamental Research Funds for the Central Universities (lzujbky2013-153 and lzujbky-2012-77). The authors wish to express their gratitude to the anonymous reviewers for their constructive suggestion in improving the paper. The authors have declared no conflict of interest.

References [1] J. Huxley-Jones, S. M. Foord, M. R. Barnes, Drug Discov. Today 2008, 13, 685–694. [2] A. Page-McCaw, A. J. Ewald, Z. Werb, Nat. Rev. Mol. Cell Biol. 2007, 8, 221–233. [3] T. L. Haas, M. Milkiewicz, S. J. Davis, A. L. Zhou, S. Egginton, M. D. Brown, J. A. Madri, O. Hudlicka, Am. J. Physiol. Heart Circ. Physiol. 2000, 279, H1540–H1547. [4] J. E. Fata, A. T. V. Ho, K. J. Leco, R. A. Moorehead, R. Khokha, Cell Mol. Life Sci. 2000, 57, 77–95. [5] P. S. Burrage, K. S. Mix, C. E. Brinckerhoff, Front. Biosci. 2006, 11, 529–543. [6] P. Liu, M. Sun, S. Sader, Can. J. Cardiol. 2006, 22, 25B–30B. [7] E. Deryugina, J. Quigley, Cancer Metastasis Rev. 2006, 25, 9–34. [8] A. D. Rowan, G. J. Litherland, W. Hui, J. M. Milner, Expert Opin. Ther. Targets 2008, 12, 1–18. [9] J. L. Hu, P. E. Van den Steen, Q. X. A. Sang, G. Opdenakker, Nat. Rev. Drug Discov. 2007, 6, 480–498. [10] E. V. Tchetina, Arthritis 2011, 2011, 683970. [11] J. Wu, T. S. Rush Iii, R. Hotchandani, X. Du, M. Geck, E. Collins, Z. B. Xu, J. Skotnicki, J. I. Levin, F. E. Lovering, Bioorg. Med. Chem. Lett. 2005, 15, 4105–4109. [12] H. Takaishi, T. Kimura, S. Dalal, Y. Okada, J. D’Armiento, Curr. Pharm. Biotechnol. 2008, 9, 47–54. ß 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

Arch. Pharm. Chem. Life Sci. 2014, 347, 825–833

[13] R. C. Billinghurst, L. Dahlberg, M. Ionescu, A. Reiner, R. Bourne, C. Rorabeck, P. Mitchell, J. Hambor, O. Diekmann, H. Tschesche, J. Chen, H. Van Wart, A. R. Poole, J. Clin. Invest. 1997, 99, 1534–1545. [14] M. Vincenti, C. Brinckerhoff, Arthritis Res. 2002, 4, 157–164. [15] A. R. Johnson, A. G. Pavlovsky, D. F. Ortwine, F. Prior, C. F. Man, D. A. Bornemeier, C. A. Banotai, W. T. Mueller, P. McConnell, C. Yan, V. Baragi, C. Lesch, W. H. Roark, M. Wilson, K. Datta, R. Guzman, H.-K. Han, R. D. Dyer, J. Biol. Chem. 2007, 282, 27781–27791. [16] L. Dahlberg, R. C. Billinghurst, P. Manner, F. Nelson, G. Webb, M. Ionescu, A. Reiner, M. Tanzer, D. Zukor, J. Chen, H. E. Van Wart, A. R. Poole, Arthritis Rheum. 2000, 43, 673–682. [17] O. Nicolotti, T. F. Miscioscia, F. Leonetti, G. Muncipinto, A. Carotti, J. Chem. Inf. Model. 2007, 47, 2439–2448. [18] C. A. Kontogiorgis, Curr. Med. Chem. 2005, 12, 339–355. [19] S. P. Gupta, Chem. Rev. 2007, 107, 3042–3087. [20] M. Fernández, J. Caballero, A. Tundidor-Camba, Bioorg. Med. Chem. 2006, 14, 4137–4150. [21] M. Fernández, J. Caballero, Bioorg. Med. Chem. 2007, 15, 6298– 6310. [22] K. C. Tsai, T. H. Lin, J. Chem. Inf. Comput. Sci. 2004, 44, 1857–1871. [23] I. J. Chen, N. Foloppe, J. Chem. Inf. Model. 2008, 48, 1773–1791. [24] K. R. Webster, Expert Opin. Invest. Drugs 1998, 7, 865–887. [25] B. L. Lei, J. Z. Li, J. Lu, J. Du, H. X. Liu, X. J. Yao, J. Agric. Food Chem. 2009, 57, 9593–9598. [26] L. G. Monovich, R. A. Tommasi, R. A. Fujimoto, V. Blancuzzi, K. Clark, W. D. Cornell, R. Doti, J. Doughty, J. Fang, D. Farley, J. Fitt, V. Ganu, R. Goldberg, R. Goldstein, S. Lavoie, R. Kulathila, W. Macchia, D. T. Parker, R. Melton, E. O’yrne, G. Pastor, T. Pellas, E. Quadros, N. Reel, D. M. Roland, Y. Sakane, H. Singh, J. Skiles, J. Somers, K. Toscano, A. Wigg, S. Zhou, L. Zhu, W. C. Shieh, S. Xue, L. W. McQuire, J. Med. Chem. 2009, 52, 3523–3538. [27] Glide, version 5.5, Schrödinger, LLC, New York, NY 2009. [28] P. Gramatica, E. Giani, E. Papa, J. Mol. Graph. Model. 2007, 25, 755–766. [29] J. Z. Li, B. L. Lei, H. X. Liu, S. Y. Li, X. J. Yao, M. C. Liu, P. Gramatica, J. Comput. Chem. 2008, 29, 2636–2647. [30] P. Gramatica, QSAR Comb. Sci. 2007, 26, 694–701. [31] A. Tropsha, P. Gramatica, V. K. Gombar, QSAR Comb. Sci. 2003, 22, 69–77. [32] L. I. Lin, Biometrics 1989, 45, 255–268. [33] N. Chirico, P. Gramatica, J. Chem. Inf. Model. 2011, 51, 2320– 2335. [34] N. Chirico, P. Gramatica, J. Chem. Inf. Model. 2012, 52, 2044– 2058. [35] J. Galvez, R. Garcia, M. T. Salabert, R. Soler, J. Chem. Inf. Comput. Sci. 1994, 34, 520–525. [36] J. Galvez, R. Garcia-Domenech, J. V. de Julian-Ortiz, R. Soler, J. Chem. Inf. Comput. Sci. 1995, 35, 272–284. [37] P. A. P. Moran, Biometrika 1950, 37, 17–23. [38] J. Gasteiger, J. Sadowski, J. Schuur, P. Selzer, L. Steinhauer, V. Steinhauer, J. Chem. Inf. Comput. Sci. 1996, 36, 1030–1037. [39] M. C. Hemmer, V. Steinhauer, J. Gasteiger, Vib. Spectrosc. 1999, 19, 151–164. www.archpharm.com

Arch. Pharm. Chem. Life Sci. 2014, 347, 825–833

[40] [41] [42] [43] [44] [45] [46] [47]

[48] [49]

Maestro, version 9.0, Schrödinger, LLC, New York, NY 2009. Impact, version 5.5, Schrödinger, LLC, New York, NY 2005. LigPrep, version 2.3, Schroöinger, LLC, New York, NY 2009. W. L. DeLano, The PyMOL Molecular Graphics System Delano Scientific, Palo Alto, CA, USA 2006. B. L. Lei, S. Y. Li, L. L. Xi, J. Z. Li, H. X. Liu, X. J. Yao, J. Chromatogr. A 2009, 1216, 4434–4439. B. Hemmateenejad, A. Mohajeri, J. Comput. Chem. 2008, 29, 266–274. L. L. Xi, S. Y. Li, H. X. Liu, J. Z. Li, B. L. Lei, X. J. Yao, J. Theor. Biol. 2010, 264, 1159–1168. J. H. Holland, Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence, MIT Press, Cambridge, MA 1992. Y. Wei, L. Xi, X. Yao, J. Li, X. Wu, Arch. Pharm. 2012, 345, 759– 766. P. Gramatica, N. Chirico, E. Papa, S. Cassani, S. Kovarich, J. Comput. Chem. 2013, 34, 2121–2132.

ß 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

In Silico Study of Inhibitory Activity by Extended QSAR

833

[50] P. Gramatica, N. Chirico, E. Papa, S. Cassani, S. Kovarich, QSAR Res Unit in EnvironChem and Ecotox, University of Insubria, Varese, Italy. http://www.qsar.it, 2013. [51] F. Lindgren, B. Hansen, W. Karcher, M. Sjöström, L. Eriksson, J. Chemometr. 1996, 10, 521–532. [52] L. M. Shi, H. Fang, W. Tong, J. Wu, R. Perkins, R. M. Blair, W. S. Branham, S. L. Dial, C. L. Moland, D. M. Sheehan, J. Chem. Inf. Comput. Sci. 2000, 41, 186–195. [53] G. Schüürmann, R. U. Ebert, J. Chen, B. Wang, R. Kühne, J. Chem. Inf. Model. 2008, 48, 2140–2145. [54] V. Consonni, D. Ballabio, R. Todeschini, J. Chem. Inf. Model. 2009, 49, 1669–1678. [55] V. Consonni, D. Ballabio, R. Todeschini, J. Chemometr. 2010, 24, 194–201. [56] P. K. Ojha, I. Mitra, R. N. Das, K. Roy, Chemometr. Intell. Lab. Syst. 2011, 107, 194–205. [57] P. Gramatica, S. Cassani, P. P. Roy, S. Kovarich, C. W. Yap, E. Papa, Mol. Inform. 2012, 31, 817–835.

www.archpharm.com

In silico study combining docking and QSAR methods on a series of matrix metalloproteinase 13 inhibitors.

Matrix metalloproteinase 13 (MMP-13) plays an important role in the degradation of articular cartilage and has been considered as an attractive target...
2MB Sizes 0 Downloads 7 Views