Ecotoxicology and Environmental Safety 112 (2015) 39–45


Ecotoxicology and Environmental Safety journal homepage: www.elsevier.com/locate/ecoenv

Optimal descriptor as a translator of eclectic data into prediction of cytotoxicity for metal oxide nanoparticles under different conditions

Alla P. Toropova a, Andrey A. Toropov a,*, Robert Rallo b, Danuta Leszczynska c, Jerzy Leszczynski d

a IRCCS, Istituto di Ricerche Farmacologiche Mario Negri, Via La Masa 19, 20156 Milano, Italy
b Departament d'Enginyeria Informatica i Matematiques, Universitat Rovira i Virgili, Av. Països Catalans, 26, 43007 Tarragona, Catalunya, Spain
c Interdisciplinary Nanotoxicity Center, Department of Civil and Environmental Engineering, Jackson State University, 1325 Lynch Street, Jackson, MS 39217-0510, USA
d Interdisciplinary Nanotoxicity Center, Department of Chemistry and Biochemistry, Jackson State University, 1400 JR Lynch Street, P.O. Box 17910, Jackson, MS 39217, USA

Article info

Article history: Received 14 July 2014; Received in revised form 1 October 2014; Accepted 3 October 2014

Keywords: QSAR; Quasi-SMILES; Quasi-QSAR; Nano-QSAR; Monte Carlo method; Cytotoxicity; Metal oxide nanoparticle

Abstract

The Monte Carlo technique has been used to build quantitative structure–activity relationships (QSARs) for prediction of dark cytotoxicity and photo-induced cytotoxicity of metal oxide nanoparticles to the bacterium Escherichia coli (the endpoint is pLC50, the negative logarithm of the concentration lethal to 50% of the bacteria, with LC50 in mol/L). The representation of nanoparticles includes (i) in the case of dark cytotoxicity, a simplified molecular input-line entry system (SMILES) string, and (ii) in the case of photo-induced cytotoxicity, a SMILES string plus the symbol ‘^’. The predictability of the approach is checked with six random distributions of the available data into the visible training and calibration sets and the invisible validation set. The statistical characteristics of these models are a correlation coefficient of 0.90–0.94 (training set) and 0.73–0.98 (validation set). © Elsevier Inc. All rights reserved.

* Corresponding author. E-mail address: [email protected] (A.A. Toropov).

http://dx.doi.org/10.1016/j.ecoenv.2014.10.003
0147-6513/© Elsevier Inc. All rights reserved.

1. Introduction

Nanomaterials have become important components of modern everyday life. This requires studies that reveal their characteristics and provide guidelines for their safe application. Predictive models for nanomaterials can be useful for theoretical and practical reasons (Randic, 1991; Cosentino et al., 2000; Balaban et al., 2005; Ivanciuc et al., 2006; Tetko et al., 2008; Bhhatarai et al., 2010; Das and Trinajstic, 2010; Mitra et al., 2010; Duchowicz et al., 2011; Furtula and Gutman, 2011; Afantitis et al., 2011; Toropov et al., 2012b,c; Liu et al., 2013; Cohen et al., 2013; Toropova and Toropov, 2014) to the same extent as models for “classic” substances (organic, inorganic, organometallic). Many of the suggested approaches for building quantitative structure–property/activity relationships (QSPRs/QSARs) for nanomaterials rely on “classic” descriptors (Fourches et al., 2010; Petrova et al., 2011) that were tested on “classic” substances. However, owing to the uncertainty of the molecular architecture of nanomaterials, the development of fresh “nanodescriptors” (Leszczynski, 2010; Toropova and Toropov, 2013) becomes a necessary task of modern computational approaches focusing on the problem.

An attractive and innovative alternative to “classic” descriptors is the optimal descriptor calculated using available eclectic data (Toropova et al., 2013; Toropova and Toropov, 2013). Optimal descriptors (Toropova et al., 2010, 2011, 2012a; Toropov et al., 2010a,b, 2013a,b) can be considered a transitional step between “classic” descriptors and “nanodescriptors”. On the one hand, these descriptors can be calculated from data on the molecular structure (i.e. just as “classic” descriptors); on the other hand, they can be computed using eclectic information about a substance, even without detailed data on its molecular structure (Toropov et al., 2007; Toropova and Toropov, 2013). In addition, data on various nanoparticles can be represented by special strings that encode the physicochemical and biochemical conditions under which the nanoparticles act. These SMILES-like strings can be named “quasi-SMILES”, since they represent conditions, in contrast to traditional SMILES, which represent solely the molecular structure. The paradigm for traditional QSPR/QSAR analyses can be expressed as:


A.P. Toropova et al. / Ecotoxicology and Environmental Safety 112 (2015) 39–45

Endpoint = F(Molecular Structure)

In the case of nanomaterials, the paradigm can be modified as follows:

Endpoint = F(Available Eclectic Data)

The available eclectic data can be (i) the molecular structure of the substances involved in the phenomenon under consideration; (ii) the presence or absence of photo-inducing; and (iii) any other circumstances that can influence the phenomenon under consideration (Toropova and Toropov, 2013; Toropov and Toropova, 2014). Consequently, one can define the following hybrid paradigm:

Endpoint = F(Molecular Structure and Available Eclectic Data)

Since the above-mentioned quasi-SMILES are the basis for establishing correlations between the endpoint and the impacts that define the behavior of metal oxide nanoparticles (these are not only data on the molecular structure, but any available eclectic data with influence upon the nanoparticles), these correlations can be named “quasi-QSARs” or “nano-QSARs”. In the present work, the only eclectic factor is the presence or absence of photo-inducing; however, the number of eclectic components for a quasi-QSAR or nano-QSAR can be larger (Toropova and Toropov, 2013; Toropov and Toropova, 2014).

The aim of the present study is to build a united QSAR model for dark cytotoxicity and photo-induced cytotoxicity of metal oxide nanoparticles to the bacterium Escherichia coli, using optimal descriptors which are a mathematical function of atomic composition and the conditions (i.e. dark or photo-inducing).
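The hybrid paradigm can be illustrated with a minimal sketch, assuming the single eclectic factor used in this work (photo-inducing) and the ‘^’ suffix convention of the quasi-SMILES described in the Method section. The function name `to_quasi_smiles` is hypothetical and is not part of any software named in the paper.

```python
def to_quasi_smiles(smiles: str, photo_induced: bool) -> str:
    """Return the quasi-SMILES: the SMILES itself for the dark condition,
    or the SMILES followed by '^' for the photo-induced condition."""
    return smiles + "^" if photo_induced else smiles


# The same nanoparticle gives two different strings, one per condition.
print(to_quasi_smiles("O=[Zn]", photo_induced=False))
print(to_quasi_smiles("O=[Zn]", photo_induced=True))
```

With more eclectic factors, further symbols could presumably be appended in the same way, which is why the molecular structure and the conditions can be handled by one model.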

Table 1
Upper triangle of percentages of identity for random splits.

Split    Set          Split 1  Split 2  Split 3  Split 4  Split 5  Split 6
Split 1  Training     100.0a   72.3     72.7     65.1     57.8     69.8
         Calibration  100.0    16.7     0.0      33.3     0.0      16.7
         Validation   100.0    16.7     33.3     15.4     16.7     15.4
Split 2  Training              100.0    76.2     58.5     69.8     58.5
         Calibration           100.0    42.9     0.0      30.8     28.6
         Validation            100.0    16.7     30.8     33.3     30.8
Split 3  Training                       100.0    53.7     65.1     68.3
         Calibration                    100.0    0.0      30.8     28.6
         Validation                     100.0    30.8     0.0      15.4
Split 4  Training                                100.0    52.4     70.0
         Calibration                             100.0    15.4     0.0
         Validation                              100.0    30.8     42.9
Split 5  Training                                         100.0    61.9
         Calibration                                      100.0    15.4
         Validation                                       100.0    30.8
Split 6  Training                                                  100.0
         Calibration                                               100.0
         Validation                                                100.0

a The percentage of identity is calculated as

$$\mathrm{Identity}\,(\%) = \frac{N_{i,j}}{0.5\,(N_i + N_j)} \times 100$$

where N_{i,j} is the number of substances distributed into the same set for both the i-th and the j-th splits (set = training, calibration, or validation); N_i is the number of substances distributed into the set for the i-th split; N_j is the number of substances distributed into the set for the j-th split.
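The identity measure of Table 1 can be sketched in a few lines. The example below is an illustration only: it takes the set assignments of splits 1 and 3 as they are listed in Table 2 (t = training, c = calibration, v = validation, one letter per quasi-SMILES), and the function name `identity_percent` is hypothetical.

```python
def identity_percent(split_i, split_j, label):
    """Identity (%) = N_ij / (0.5 * (N_i + N_j)) * 100 for one set label."""
    n_i = split_i.count(label)
    n_j = split_j.count(label)
    # Substances placed into the same set by both splits.
    n_ij = sum(1 for a, b in zip(split_i, split_j) if a == label and b == label)
    return 100.0 * n_ij / (0.5 * (n_i + n_j))


# Set assignments of the 34 quasi-SMILES, copied from Table 2.
split_1 = "v t t c t t t t t c v t t t t v t t t c t c t v t t c v t t t v t t".split()
split_3 = "c v c t c t t c t v v t t t t c t t t t c t v t t v t c t t t v t t".split()

for label, name in (("t", "training"), ("c", "calibration"), ("v", "validation")):
    print(name, round(identity_percent(split_1, split_3, label), 1))
```

A value of 100 means that two splits place exactly the same substances into the set, so low off-diagonal values indicate that the splits are indeed different.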

2. Method

2.1. Data

The numerical data on cytotoxicity of metal oxide nanoparticles to the bacterium E. coli (LC50, the concentration of the nanoparticles that proved to be fatal to 50% of the bacteria, in mol/L) have been taken from the literature (Pathakoti et al., 2014). The negative decimal logarithm of the LC50 (pLC50) has been examined as the endpoint. Six random distributions of the available data into the training and calibration sets (these metal oxide nanoparticles are used to build the model) and the validation set (these metal oxide nanoparticles are not involved in building the model; they are used to check its predictability) are examined. All these splits are prepared according to the following principles: (i) they are random; (ii) the range of the endpoint in each sub-set is similar to the ranges for the other sub-sets; and (iii) the splits are not identical (Table 1). The dark cytotoxicity and photo-induced cytotoxicity are examined as a united endpoint, owing to the application of a model which is a mathematical function of atomic composition and conditions (presence or absence of photo-inducing).

2.2. Optimal descriptors

In order to take into account the photo-induction, the symbol ‘^’ is used. Thus, the SMILES used in this work are not equivalent to the traditionally used ones (Weininger, 1988, 1990; Weininger et al., 1989). Under such circumstances, the term ‘quasi-SMILES’ is used for the representation of metal oxide nanoparticles, because the quasi-SMILES represents data on the molecular structure together with the condition: presence or absence of photo-inducing. The presence of photo-inducing is indicated by the symbol ‘^’ that is added at the end of the traditional SMILES (Table 2). Thus the optimal descriptors have been calculated as follows:

$$\mathrm{DCW}(T, N) = \sum_{k} \mathrm{CW}(A_k) \qquad (1)$$

where A_k is an attribute of the quasi-SMILES that comprises one symbol (e.g. ‘O’, ‘V’, etc.) or two symbols which should be examined as one (e.g. ‘Cu’, ‘Al’, etc.). In the case of dark cytotoxicity, nanoparticles are represented by the SMILES of the ACD/ChemSketch software (ACD/I-LAB, 2014); in the case of photo-induced cytotoxicity, nanoparticles are represented by the SMILES of the ACD/ChemSketch software (ACD/I-LAB, 2014) plus the symbol ‘^’ (Table 2). CW(x) is the correlation weight for an attribute x that is extracted from a quasi-SMILES; T is the threshold that divides attributes into two categories, rare (noise) and not rare; N is the number of epochs of the Monte Carlo optimization. Correlation weights are calculated for the not rare attributes by the Monte Carlo optimization that gives the maximum of the determination coefficient between DCW(T,N) and pLC50 for the calibration set. The preferable values T* and N*, which provide the best statistics for the calibration set, should be defined at the preliminary phase of the QSAR analysis (Toropova et al., 2011). Having T*, N*, and the CW(x) which give the maximum of the determination coefficient for the calibration set, one can define (using data from the training set) the following model:

$$\mathrm{pLC50} = C_0 + C_1 \cdot \mathrm{DCW}(T^*, N^*) \qquad (2)$$

The predictability of the model should be checked with an external validation set. Table 3 contains the numerical data on the correlation weights of the different attributes involved in the modeling process. These are the chemical elements, represented traditionally by one symbol (e.g. ‘O’, ‘V’) or by two symbols (e.g. ‘La’, ‘Ni’). The symbol ‘=’ represents double bonds. The symbol ‘^’ represents the photo-inducing. The symbols ‘[’ and ‘]’ are used in the classic SMILES for encoding a special group or a metal (Weininger, 1988, 1990; Weininger et al., 1989). Thus, all attributes have transparent interpretation. The correlation

weights of blocked attributes are equal to 0.0, i.e. these have no influence on the model. Table 4 contains an example of the calculation of DCW(T*,N*) and pLC50.

Table 2
The quasi-SMILES of metal oxide nanoparticles, distribution of available data into the “visible” training (t) and calibration (c) sets and “invisible” validation set (v); experimental and calculated pLC50 values. Experimental pLC50 (LC50 in mol/L) is taken from Pathakoti et al. (2014).

Quasi-SMILES      Distribution in splits   pLC50
                  1 2 3 4 5 6              Exp.   Eq. (3)  Eq. (4)  Eq. (5)  Eq. (6)  Eq. (7)  Eq. (8)
O=[Zn]            v t c t t c              5.80   4.8787   5.6397   5.3001   5.8000   5.5619   5.4862
[Cu]=O            t c v t t t              4.24   4.5261   4.9409   4.7720   4.5033   4.5950   4.5975
O=[V]O[V]=O       t t c t c t              3.48   3.4218   3.4806   2.8323   3.1451   3.0063   3.2528
O=[Y]O[Y]=O       c t t c t t              5.79   4.8413   5.7780   5.8049   4.8582   5.6924   5.4403
O=[Bi]O[Bi]=O     t c c t t t              3.55   3.5020   3.2568   3.0963   3.3190   3.3510   3.5654
O=[In]O[In]=O     t t t t v c              2.83   2.7830   2.7641   2.8287   2.8271   2.7086   2.1465
O=[Sb]O[Sb]=O     t c t t c t              3.12   3.0677   2.9087   2.9420   3.1404   2.8854   3.0264
O=[Al]O[Al]=O     t v c v v v              2.42   2.0447   1.9188   1.8091   1.8644   2.0027   2.0101
O=[Fe]O[Fe]=O     t t t c t v              2.40   1.9340   2.1031   2.4093   1.5757   2.0115   2.1465
O=[Si]=O          c t v t t t              2.54   1.9472   2.5408   1.9869   2.5464   2.2699   2.5329
O=[Zr]=O          v c v v t c              2.58   1.9472   2.1160   1.8349   2.0905   2.4823   2.3042
O=[Sn]=O          t t t v v c              2.53   2.3730   2.4975   2.4194   2.3363   2.4409   2.3042
O=[Ti]=O          t t t t t t              2.14   2.9359   3.0362   2.9432   2.9262   2.9419   3.0468
[Co]=O            t t t c t t              3.13   2.7212   2.8531   2.7777   2.3693   2.8427   2.8603
[Ni]=O            t v t t c t              3.79   4.0297   2.3187   3.8110   3.7899   3.6350   3.4581
O=[Cr]O[Cr]=O     v c c t c c              2.06   1.2274   1.2908   1.1302   1.5730   1.2519   1.2961
O=[La]O[La]=O     t t t v t v              4.96   4.7716   4.8804   4.7911   4.6073   4.9361   4.8107
O=[Zn]^           t t t c c t              6.23   5.8662   6.4002   6.2253   6.7572   6.3998   6.2302
[Cu]=O^           t t t t t t              5.71   5.5136   5.7014   5.6972   5.4606   5.4329   5.3415
O=[V]O[V]=O^      c c t t t t              3.78   4.4093   4.2411   3.7575   4.1023   3.8443   3.9968
O=[Y]O[Y]=O^      t v c t v t              5.84   5.8288   6.5385   6.7301   5.8155   6.5304   6.1844
O=[Bi]O[Bi]=O^    c t t t t c              4.02   4.4895   4.0172   4.0215   4.2762   4.1890   4.3094
O=[In]O[In]=O^    t t v v t c              3.48   3.7705   3.5246   3.7539   3.7843   3.5465   2.8906
O=[Sb]O[Sb]=O^    v t t c t t              3.66   4.0552   3.6692   3.8672   4.0976   3.7233   3.7704
O=[Al]O[Al]=O^    t v t v t t              2.75   3.0322   2.6793   2.7343   2.8217   2.8407   2.7541
O=[Fe]O[Fe]=O^    t t v t t v              2.54   2.9215   2.8636   3.3345   2.5329   2.8494   2.8906
O=[Si]=O^         c v t c t v              2.92   2.9347   3.3013   2.9121   3.5036   3.1079   3.2770
O=[Zr]=O^         v c c t v v              3.04   2.9347   2.8765   2.7601   3.0477   3.3202   3.0483
O=[Sn]=O^         t t t v t v              3.24   3.3605   3.2579   3.3446   3.2935   3.2789   3.0483
O=[Ti]=O^         t t t t t t              4.68   3.9234   3.7966   3.8684   3.8834   3.7798   3.7908
[Co]=O^           t t t t v t              3.33   3.7087   3.6136   3.7029   3.3265   3.6807   3.6043
[Ni]=O^           v v v c t t              3.87   5.0172   3.0792   4.7362   4.7471   4.4729   4.2021
O=[Cr]O[Cr]=O^    t t t t t t              2.06   2.2149   2.0513   2.0554   2.5302   2.0898   2.0401
O=[La]O[La]=O^    t t t t c t              5.56   5.7591   5.6409   5.7163   5.5645   5.7741   5.5547

3. Results and discussion

The search for T* and N* has been carried out in the ranges (i) T from 1 to 3; and (ii) N from 1 to 20 (Toropova et al., 2011). The developed models are the following:

Split 1

$$\mathrm{pLC50} = 2.4451(\pm 0.0246) + 0.7074(\pm 0.0089) \cdot \mathrm{DCW}(1, 7) \qquad (3)$$

Split 2

$$\mathrm{pLC50} = 2.5185(\pm 0.0230) + 0.8969(\pm 0.0071) \cdot \mathrm{DCW}(1, 12) \qquad (4)$$

Split 3

$$\mathrm{pLC50} = 2.4390(\pm 0.0204) + 0.7537(\pm 0.0050) \cdot \mathrm{DCW}(1, 12) \qquad (5)$$

Split 4

$$\mathrm{pLC50} = 3.0378(\pm 0.0246) + 0.7113(\pm 0.0101) \cdot \mathrm{DCW}(1, 11) \qquad (6)$$

Split 5

$$\mathrm{pLC50} = 1.5185(\pm 0.0334) + 0.8370(\pm 0.0110) \cdot \mathrm{DCW}(1, 9) \qquad (7)$$

Split 6

$$\mathrm{pLC50} = 2.9489(\pm 0.0252) + 0.7092(\pm 0.0095) \cdot \mathrm{DCW}(1, 11) \qquad (8)$$
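The calculation behind Eqs. (1)–(3) can be sketched as follows. This is a minimal illustration, not the CORAL implementation: the correlation weights are those listed in Table 3 for Eq. (3), the tokenizer regular expression and the function names are hypothetical, and the treatment of ‘]’ as a second occurrence of the attribute ‘[’ is an assumption inferred from Table 4, where ‘[’ is listed twice for [Cu]=O.

```python
import re

# Correlation weights CW(Ak) for Eq. (3), copied from Table 3 (split 1).
CW_EQ3 = {
    "=": -0.14674, "Al": 0.17345, "Bi": 1.20345, "Co": 0.74595, "Cr": -0.40424,
    "Cu": 3.29735, "Fe": 0.09521, "O": -0.20138, "In": 0.69530, "La": 2.10083,
    "Ni": 2.59570, "V": 1.14679, "Sb": 0.89652, "Si": 0.0, "Y": 2.15007,
    "Sn": 0.60191, "Ti": 1.39757, "[": -0.00385, "^": 1.39594, "Zn": 3.79580,
}
C0, C1 = 2.4451, 0.7074  # intercept and slope of Eq. (3)

# One- or two-letter element symbols, brackets, double bond, photo-inducing.
TOKEN = re.compile(r"[A-Z][a-z]?|[\[\]=^]")


def attributes(quasi_smiles):
    """Split a quasi-SMILES into attributes; ']' is counted as '[' (assumption)."""
    return ["[" if tok == "]" else tok for tok in TOKEN.findall(quasi_smiles)]


def dcw(quasi_smiles, weights):
    """Eq. (1): sum of correlation weights; unknown attributes contribute 0."""
    return sum(weights.get(a, 0.0) for a in attributes(quasi_smiles))


def predict_plc50(quasi_smiles):
    """Eq. (3): pLC50 = C0 + C1 * DCW(1, 7)."""
    return C0 + C1 * dcw(quasi_smiles, CW_EQ3)


for qs in ("[Cu]=O", "O=[Zn]", "O=[Zn]^"):
    print(qs, round(dcw(qs, CW_EQ3), 5), round(predict_plc50(qs), 4))
```

Under these assumptions the sketch agrees with the tabulated Eq. (3) predictions for these three quasi-SMILES to within rounding of the published coefficients.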

Table 5 contains the preferable values of N together with the statistical quality of the models for six random splits. The preferable threshold for all models is T* = 1. For the six splits, the statistical quality of the models calculated using the described approach is the following: the standard error of estimation for the training set ranges from 0.293 to 0.370, and the standard error of estimation for the validation set (n = 6 or 7; these metal oxide nanoparticles are not involved in building the model) ranges from 0.367 to 0.858. The models suggested in the literature (Pathakoti et al., 2014), separately for dark cytotoxicity and photo-induced cytotoxicity, are characterized for the validation set (n = 4) by the following parameters: s = 0.52 and s = 0.88 for the two forms of cytotoxicity, respectively. Thus, the statistical quality of predictions with Eqs. (3)–(8) is comparable with the above-mentioned models, which are based on quantum mechanics descriptors (Pathakoti et al., 2014). Table 3 contains the correlation weights for calculations with Eqs. (3)–(8). Fig. 1 graphically represents the models calculated with Eqs. (3)–(8). Having data on the correlation weights obtained in several runs of the Monte Carlo optimization with the preliminarily defined T* and N*, one can select four classes of attributes: (i) stable promoters of pLC50 rise, i.e. all runs give positive correlation weights; (ii) stable promoters of pLC50 decay, i.e. all runs result in

negative correlation weights; (iii) attributes with unclear role, i.e. these have both positive and negative correlation weights; and (iv) blocked attributes.

Table 3
Correlation weights for calculation of DCW(T*,N*).

Ak    CW(Ak)     Frequency in training set   Frequency in calibration set

Eq. (3)
=     -0.14674   23    5
Al     0.17345    2    0
Bi     1.20345    1    1
Co     0.74595    2    0
Cr    -0.40424    1    0
Cu     3.29735    2    0
Fe     0.09521    2    0
O     -0.20138   23    5
In     0.69530    2    0
La     2.10083    2    0
Ni     2.59570    1    0
V      1.14679    1    1
Sb     0.89652    1    0
Si     0.0        0    2
Y      2.15007    1    1
Sn     0.60191    2    0
Ti     1.39757    2    0
[     -0.00385   23    5
^      1.39594   11    3
Zn     3.79580    1    0

Eq. (4)
=     -0.00295   21    7
Bi     0.74583    1    1
Co     0.59579    2    0
Cr    -0.35010    1    1
Cu     2.92349    1    1
Fe     0.10270    2    0
O     -0.22305   21    7
In     0.47120    2    0
La     1.65091    2    0
V      0.87061    1    1
Sb     0.55181    1    1
Si     0.47366    1    0
Y      2.15128    1    0
Sn     0.42528    2    0
Ti     1.02591    2    0
[      0.00163   21    7
^      0.84785   11    2
Zn     3.70262    2    0
Zr     0.0        0    2

Eq. (5)
=     -0.14512   21    7
Al     0.12461    1    1
Bi     0.97850    1    1
Co     0.87786    2    0
Cr    -0.32574    1    1
Cu     3.52377    1    0
Fe     0.52276    1    0
O     -0.22784   21    7
In     0.80095    1    0
La     2.10276    2    0
Ni     2.24871    1    0
V      0.80333    1    1
Sb     0.87613    2    0
Si     0.20168    1    0
Y      2.77526    1    1
Sn     0.77543    2    0
Ti     1.47037    2    0
[     -0.02781   21    7
^      1.22745   12    2
Zn     4.22435    1    1
Zr     0.0        0    1

Eq. (6)
=     -0.07510   20    7
Bi     1.02239    2    0
Co    -0.20142    1    1
Cr    -0.20489    2    0
Cu     2.79865    2    0
Fe    -0.20298    1    1
O     -0.17273   20    7
In     0.67663    1    0
La     1.92798    1    0
Ni     1.79562    1    1
V      0.90015    2    0
Sb     0.89686    1    1
Si     0.29539    1    1
Y      2.10436    1    1
Ti     0.82928    2    0
[     -0.24530   20    7
^      1.34570   10    4
Zn     4.62152    1    1
Zr    -0.34559    1    0

Eq. (7)
=      0.10158   22    6
Al     0.19940    1    0
Bi     1.00478    2    0
Co     1.35252    1    0
Cr    -0.24908    1    1
Cu     3.44583    2    0
Fe     0.20463    2    0
O     -0.27911   22    6
In     0.62103    1    0
La     1.95157    1    1
Ni     2.29898    1    1
V      0.79887    1    1
Sb     0.72663    1    1
Si     0.84577    2    0
Y      2.40333    1    0
Sn     1.05005    1    0
Ti     1.64851    2    0
[      0.20343   22    6
^      1.00105   12    2
Zn     4.60092    1    1
Zr     1.09949    1    0

Eq. (8)
=     -0.27984   20    7
Al    -0.09617    1    0
Bi     1.00027    1    1
Co     0.37719    2    0
Cr    -0.59955    1    1
Cu     2.82661    2    0
O     -0.12700   20    7
In     0.0        0    2
La     1.87818    1    0
Ni     1.22003    2    0
V      0.77992    2    0
Sb     0.62029    2    0
Si     0.32248    1    0
Y      2.32210    2    0
Sn     0.0        0    1
Ti     1.04695    2    0
[     -0.04768   20    7
^      1.04907   11    2
Zn     4.07959    1    1
Zr     0.0        0    1

Table 4
Example of calculation of DCW(1,7) for Eq. (3). The representation of the metal oxide nanoparticle is [Cu]=O; DCW(1,7) = ΣCW(Ak) = 2.94152; pLC50 = 2.4451 + 0.7074 × 2.94152 = 4.5261.

Ak    CW(Ak)
[     -0.0039
Cu     3.2973
[     -0.0039
=     -0.1467
O     -0.2014
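The four attribute classes described above can be sketched with a small helper. This is an illustration under stated assumptions: the paper bases the classification on several runs of the Monte Carlo optimization, whereas the weights below are the Table 3 values for Eqs. (3)–(8), used here as stand-ins for repeated runs; the function name, the class labels, and the all-zero example for a blocked attribute are hypothetical.

```python
def classify(weights, eps=1e-12):
    """Assign an attribute to one of four classes from its correlation weights."""
    active = [w for w in weights if abs(w) > eps]  # zero weight = blocked in that run
    if not active:
        return "blocked"
    if all(w > 0 for w in active):
        return "stable promoter of pLC50 increase"
    if all(w < 0 for w in active):
        return "stable promoter of pLC50 decrease"
    return "unclear role"


# Correlation weights from Table 3; Fe has no entry for Eq. (8).
weights_by_attribute = {
    "^": [1.39594, 0.84785, 1.22745, 1.34570, 1.00105, 1.04907],
    "V": [1.14679, 0.87061, 0.80333, 0.90015, 0.79887, 0.77992],
    "O": [-0.20138, -0.22305, -0.22784, -0.17273, -0.27911, -0.12700],
    "Fe": [0.09521, 0.10270, 0.52276, -0.20298, 0.20463],
    "rare attribute (hypothetical)": [0.0, 0.0, 0.0],
}

for attribute, weights in weights_by_attribute.items():
    print(attribute, "->", classify(weights))
```

Ignoring zero weights before checking the signs is a design choice of this sketch; a stricter variant could treat any run with a zero weight as evidence of an unclear role.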


Table 5
Statistical quality of models for pLC50 calculated with various distributions (1–6) of available data into the training, calibration, and validation sets. The threshold T* = 1 for all models.

            Training set                        Calibration set                           Validation set
No.   N*    na    r2      q2      s      F      n    r2      cRp2   k     k′    s       n    r2      s
1     7     23    0.9250  0.9115  0.347  259    5    0.7279  0.62   0.96  1.01  0.683   6    0.7332  0.828
2     12    21    0.9464  0.9384  0.317  335    7    0.9737  0.89   0.99  0.98  0.527   6    0.7905  0.858
3     12    21    0.9469  0.9396  0.293  338    7    0.9527  0.84   0.95  1.03  0.705   6    0.8078  0.721
4     11    20    0.9276  0.9127  0.339  231    7    0.7921  0.74   0.97  1.00  0.786   7    0.8965  0.367
5b    9     22    0.9081  0.8925  0.354  198    6    0.9943  0.91   0.98  1.01  0.454   6    0.9835  0.418
6     11    20    0.9160  0.9006  0.370  196    7    0.9473  0.87   0.91  1.08  0.533   7    0.8961  0.300

a n is the number of metal oxide nanoparticles in the set (i.e. training, calibration, or validation); r2 is the determination coefficient; q2 is the cross-validated r2; s is the standard error of estimation; and F is the Fisher F-ratio. cRp2 is the criterion of Y-randomization (Ojha and Roy, 2011; Veselinović et al., 2013a,b): a model is not a chance correlation if cRp2 is larger than 0.5; k and k′ are criteria of predictability of a model: both should be close to 1 (Golbraikh and Tropsha, 2002; Melagraki and Afantitis, 2013; Veselinović et al., 2013a,b).
b The best model is marked by bold.
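The statistics in Table 5 can be related to the tabulated predictions with a short sketch. It is an illustration under assumptions, not the authors' procedure: r2 is taken as the squared Pearson correlation, the F-ratio as r2/(1 − r2) × (n − 2), and the standard error with an n − 1 denominator, a choice that happens to reproduce the validation-set value of split 5. The data are the six validation-set nanoparticles of split 5 from Table 2 (experimental pLC50 and Eq. (7) predictions); the function names are hypothetical.

```python
import math


def r_squared(obs, pred):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(obs)
    mean_o, mean_p = sum(obs) / n, sum(pred) / n
    s_op = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    s_oo = sum((o - mean_o) ** 2 for o in obs)
    s_pp = sum((p - mean_p) ** 2 for p in pred)
    return s_op ** 2 / (s_oo * s_pp)


def standard_error(obs, pred, ddof=1):
    """Root of the residual sum of squares over (n - ddof); ddof=1 is an assumption."""
    rss = sum((o - p) ** 2 for o, p in zip(obs, pred))
    return math.sqrt(rss / (len(obs) - ddof))


def f_ratio(r2, n):
    """F-ratio of a one-descriptor linear model with n data points."""
    return r2 / (1.0 - r2) * (n - 2)


# Validation set of split 5 (Table 2): In2O3, Al2O3, SnO2 (dark); Y2O3, ZrO2, CoO (photo).
observed = [2.83, 2.42, 2.53, 5.84, 3.04, 3.33]
predicted = [2.7086, 2.0027, 2.4409, 6.5304, 3.3202, 3.6807]

print("r2 =", round(r_squared(observed, predicted), 4))
print("s  =", round(standard_error(observed, predicted), 3))
print("F (training, split 1) =", round(f_ratio(0.9250, 23)))
```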

Fig. 1. Graphical representation of models calculated with Eqs. (3)–(8).


The analysis of these data for six splits has shown that the stable promoters of pLC50 increase are photo-inducing (^), vanadium (V), and yttrium (Y). The stable promoters of pLC50 decrease are oxygen (O) and the presence of a double bond (=). A more detailed assessment of the activities of the studied metals is impossible, since each split leads to a specific role of each metal that depends on the distribution of the other metals. In other words, in order to estimate the role of a metal for the endpoint, the metal should be present in a group of various nanoparticles. Hence, the data examined in this work do not allow such an estimation. However, the attributes which are common to all six splits allow a mechanistic interpretation of the results of the described approach. The statistical quality of the attributes which are involved in building the model can be estimated as follows:

$$\mathrm{defect}(A_k) = \begin{cases} \dfrac{|P_{TRN}(A_k) - P_{CLB}(A_k)|}{N_{TRN}(A_k) + N_{CLB}(A_k)}, & \text{if } N_{CLB}(A_k) > 0 \\ 1, & \text{otherwise} \end{cases} \qquad (9)$$

where P_{TRN}(A_k) is the probability of presence of A_k in the SMILES of the training set, i.e.

$$P_{TRN}(A_k) = N_{TRN}(A_k) / N_{TRN}$$

and P_{CLB}(A_k) is the probability of presence of A_k in the SMILES of the calibration set, i.e.

$$P_{CLB}(A_k) = N_{CLB}(A_k) / N_{CLB}$$

N_{TRN}(A_k) is the number (frequency) of SMILES which contain A_k in the training set; N_{TRN} is the total number of SMILES in the training set; N_{CLB}(A_k) is the number (frequency) of SMILES which contain A_k in the calibration set (Table 3); N_{CLB} is the total number of SMILES in the calibration set.
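Eq. (9) can be sketched directly from the frequencies in Table 3. The example below is a minimal illustration for split 1 (23 quasi-SMILES in the training set and 5 in the calibration set, as given in Table 5); the function name `defect` and its argument names are hypothetical.

```python
def defect(n_trn_a, n_clb_a, n_trn, n_clb):
    """Eq. (9): defect of an attribute from its frequencies in the two sets.

    n_trn_a, n_clb_a: number of quasi-SMILES containing the attribute
    n_trn, n_clb:     total number of quasi-SMILES in each set
    """
    if n_clb_a == 0:
        return 1.0  # attribute absent from the calibration set: maximal defect
    p_trn = n_trn_a / n_trn
    p_clb = n_clb_a / n_clb
    return abs(p_trn - p_clb) / (n_trn_a + n_clb_a)


N_TRN, N_CLB = 23, 5  # split 1, Table 5

# (training frequency, calibration frequency) for Eq. (3), from Table 3.
frequencies = {"=": (23, 5), "^": (11, 3), "Si": (0, 2), "Al": (2, 0)}

for attribute, (f_trn, f_clb) in frequencies.items():
    print(attribute, round(defect(f_trn, f_clb, N_TRN, N_CLB), 5))
```

The double bond, present in every quasi-SMILES of both sets, has a zero defect, while aluminium, absent from the calibration set, receives the maximal value of 1.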

3.1. The logic

If the probability of an attribute in the training set is equal to the probability of the attribute in the calibration set, the situation is ideal and the defect is zero. However, this situation is not typical, i.e. the difference between the probability of an attribute in the training set and the probability of the attribute in the calibration set is usually not zero. Under such circumstances, the frequency of the attribute in the training set and in the calibration set should also be taken into account: if these are small, then the defect of the attribute must be larger. Finally, if A_k is absent in the calibration set, defect(A_k) is maximal. Thus, the measure calculated with Eq. (9) can be used for estimation of the statistical significance of the A_k (Table 3) involved in building the model.

3.2. The criterion definition of domain of applicability for a quasi-SMILES

Having the numerical data on defect(A_k), one can estimate the reliability of the model for a metal oxide nanoparticle represented by a quasi-SMILES (Table 2). The basic hypothesis is that “the probability of the quasi-SMILES to be in the domain of applicability is inversely proportional to the sum of the A_k-defects”:

$$\text{Defect-quasi-SMILES} = \sum_{k} \mathrm{defect}(A_k) \qquad (10)$$

If the Defect-quasi-SMILES calculated with Eq. (10) is equal to zero, this is an ideal situation. However, in practice the ideal situation is rare. Consequently, one should define some limit for the Defect-quasi-SMILES value. A possible selection for the limit is the following:

$$\text{Defect-quasi-SMILES} < 2 \cdot \overline{\text{Defect-quasi-SMILES}} \qquad (11)$$

where \overline{\text{Defect-quasi-SMILES}} is the average of the Defect-quasi-SMILES over the training set. The inequality (11) should be classified as a semi-qualitative criterion, because a large value of the Defect-quasi-SMILES does not guarantee that the prediction for the substance represented by the quasi-SMILES will be poor and, vice versa, a small value of the Defect-quasi-SMILES does not guarantee that the prediction will be good. However, the “probabilistic” meaning of this criterion is quite transparent. The calculations with Eq. (11) were carried out with the CORAL software (CORAL, 2014). The percentage of the domain of applicability, according to the analysis carried out with this software, is 100%, 76%, 76%, 71%, 71%, and 71% for splits 1, 2, 3, 4, 5, and 6, respectively. It is traditional logic to define 50% as a threshold for estimation of some quality able to be 100%. Consequently, a split that is characterized by a domain of applicability of more than 50% should be considered satisfactory. Thus, the six examined splits can be estimated as “satisfactory splits”.

This work is a theoretical one: we try to answer the question “is it possible to prepare this kind of model or not?”. Table 5 indicates that all models are more or less satisfactory. The next question is “how can the most reliable prediction be extracted for an external unknown metal oxide nanoparticle?” On the one hand, we believe that the statistical characteristics for the external validation set are a more important criterion for estimating a model than the statistical characteristics for the “visible” training set. This conception leads to selection of the model calculated with Eq. (7) for split 5 (Table 5). On the other hand, we believe that for the practical prediction of the endpoint for an external unknown metal oxide nanoparticle the preferable estimation can be defined as the average over all six predictions obtained with Eqs. (3)–(8).
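The domain-of-applicability criterion of Eqs. (9)–(11) can be sketched end to end for split 1. This is an illustration and not the CORAL implementation: the quasi-SMILES and the split 1 assignments are taken from Table 2, while the tokenizer, the counting of ‘]’ as ‘[’, and the summation of defects over every attribute occurrence in a quasi-SMILES are assumptions. All function names are hypothetical.

```python
import re

SMILES = ["O=[Zn]", "[Cu]=O", "O=[V]O[V]=O", "O=[Y]O[Y]=O", "O=[Bi]O[Bi]=O",
          "O=[In]O[In]=O", "O=[Sb]O[Sb]=O", "O=[Al]O[Al]=O", "O=[Fe]O[Fe]=O",
          "O=[Si]=O", "O=[Zr]=O", "O=[Sn]=O", "O=[Ti]=O", "[Co]=O", "[Ni]=O",
          "O=[Cr]O[Cr]=O", "O=[La]O[La]=O"]
QUASI = SMILES + [s + "^" for s in SMILES]  # dark, then photo-induced (Table 2 order)
SPLIT_1 = "v t t c t t t t t c v t t t t v t t t c t c t v t t c v t t t v t t".split()

TOKEN = re.compile(r"[A-Z][a-z]?|[\[\]=^]")


def attributes(quasi_smiles):
    return ["[" if tok == "]" else tok for tok in TOKEN.findall(quasi_smiles)]


def count_containing(strings):
    """Number of quasi-SMILES that contain each attribute at least once."""
    counts = {}
    for s in strings:
        for a in set(attributes(s)):
            counts[a] = counts.get(a, 0) + 1
    return counts


train = [q for q, lab in zip(QUASI, SPLIT_1) if lab == "t"]
calib = [q for q, lab in zip(QUASI, SPLIT_1) if lab == "c"]
n_trn, n_clb = count_containing(train), count_containing(calib)


def defect(a):
    """Eq. (9)."""
    nt, nc = n_trn.get(a, 0), n_clb.get(a, 0)
    if nc == 0:
        return 1.0
    return abs(nt / len(train) - nc / len(calib)) / (nt + nc)


def defect_quasi_smiles(q):
    """Eq. (10), summed over all attribute occurrences (assumption)."""
    return sum(defect(a) for a in attributes(q))


# Eq. (11): the limit is twice the training-set average.
limit = 2 * sum(defect_quasi_smiles(q) for q in train) / len(train)


def in_domain(q):
    return defect_quasi_smiles(q) < limit


inside = sum(in_domain(q) for q in QUASI)
print(f"limit = {limit:.3f}; {inside} of {len(QUASI)} quasi-SMILES inside the domain")
```

The attribute frequencies computed here from Table 2 can be compared with those listed for Eq. (3) in Table 3; the percentage inside the domain depends on the summation convention assumed above and need not coincide with the value reported by the software.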

4. Conclusions

The quasi-QSAR approach was used in this work. Since these quasi-QSARs are oriented to metal oxide nanoparticles, they can be named nano-QSARs. Optimal descriptors calculated with eclectic data represented by the quasi-SMILES (i.e. atomic composition and presence/absence of photo-inducing) give a statistically robust model for the cytotoxicity of metal oxide nanoparticles (Table 5). However, the distribution of data into the training, calibration, and validation sets has a significant influence upon the predictive potential of these models. Development of the described models was carried out in accordance with the OECD principles (OECD, 2007). A probabilistic approach to define the domain of applicability in accordance with the distribution of available data into the “visible” training and external “invisible” validation sets for the nano-QSAR is suggested.

Acknowledgments

We thank the EC project PreNanoTox (Contract 309666), the EC project NanoPUZZLES (Project Reference: 309837), the EU project PROSIL funded under the LIFE program (project LIFE12 ENV/IT/000154), the National Science Foundation (NSF/CREST HRD0833178), and EPSCoR (Award #: 362492-190200-01/NSFEPS090378) for financial support. We also express our gratitude to Dr. L. Cappellini, Dr. G. Bianchi and Dr. R. Bagnati for valuable consultations on computer science.

References

ACD/I-LAB, 〈http://www.acdlabs.com〉, 2014.
Afantitis, A., Melagraki, G., Koutentis, P.A., Srimveis, H., Kollias, G., 2011. Ligand-based virtual screening procedure for the prediction and the identification of novel β-amyloid aggregation inhibitors using Kohonen maps and Counterpropagation Artificial Networks. Eur. J. Med. Chem. 46, 497–508.
Balaban, A.T., Khadikar, P.V., Supuran, G.T., Thakur, A., Thakur, M., 2005. Study on supramolecular complexing ability vis-à-vis estimation of pKa of substituted sulfonamides: dominating role of Balaban index. Bioorg. Med. Chem. 15, 3966–3973.
Bhhatarai, B., Gang, R., Gramatica, P., 2010. Are mechanistic and statistical QSAR approaches really different? MLR studies on 158 cycloalkyl-pyranones. Mol. Inf. 29, 511–522.
Cohen, Y., Rallo, R., Liu, R., Liu, H.H., 2013. In silico analysis of nanomaterials hazard and risk. Acc. Chem. Res. 46, 802–812.
CORAL, 〈http://www.insilico.eu/coral〉, 2014.
Cosentino, U., Moro, G., Bonalumi, D., Bonati, L., Lasagni, M., Todeschini, R., Pitea, D., 2000. A combined use of global and local approaches in 3D-QSAR. Chemom. Intell. Lab. 52, 183–194.
Das, K.Ch., Trinajstic, N., 2010. Comparison between first geometric-arithmetic index and atom-bond connectivity index. Chem. Phys. Lett. 497, 140–151.
Duchowicz, P.R., Mirifico, M.V., Rozas, M.F., Caram, J.A., Fernandes, F.M., Castro, E.A., 2011. Quantitative structure–spectral property relationships for functional groups of novel 1,2,5-thiadiazole compounds. Chemom. Intell. Lab. 105, 27–37.
Fourches, D., Pu, D., Tassa, C., Weissleder, R., Shaw, S.Y., Mumper, R.J., Tropsha, A., 2010. Quantitative nanostructure–activity relationship modelling. ACS Nano 4, 5703–5712.
Furtula, B., Gutman, I., 2011. Relation between second and third geometric-arithmetic indices of trees. J. Chemom. 25, 87–91.
Golbraikh, A., Tropsha, A., 2002. Beware of q2! J. Mol. Graph. Model. 20, 269–276.
Ivanciuc, T., Ivanciuc, O., Klein, D.J., 2006. Modeling the bioconcentration factors and bioaccumulation factors of polychlorinated biphenyls with posetic quantitative super-structure/activity relationships (QSSAR). Mol. Divers. 10, 133–145.
Leszczynski, J., 2010. Nano meets bio at the interface. Nat. Nanotechnol. 5, 633–634.
Liu, R., Rallo, R., Weissleder, R., Tassa, C., Shaw, S., Cohen, Y., 2013. Nano-SAR development for bioactivity of nanoparticles with considerations of decision boundaries. Small 9, 1842–1852.
Melagraki, G., Afantitis, A., 2013. Enalos KNIME nodes: exploring corrosion inhibition of steel in acidic medium. Chemom. Intell. Lab. Syst. 123, 9–14.
Mitra, I., Saha, A., Roy, K., 2010. Exploring quantitative structure–activity relationship studies of antioxidant phenolic compounds obtained from traditional Chinese medicinal plants. Mol. Simul. 13, 1067–1079.
OECD, 2007. Guidance Document on the Validation of (Quantitative) Structure–Activity Relationships [(Q)SARs] Models, ENV/JM/MONO(2007)2. 〈http://www.oecd.org/dataoecd/55/35/38130292.pdf〉.
Ojha, P.K., Roy, K., 2011. Comparative QSARs for antimalarial endochins: Importance of descriptor-thinning and noise reduction prior to feature selection. Chemom. Intell. Lab. Syst. 109, 146–161.
Pathakoti, K., Huang, M.-J., Watts, J.D., He, X., Hwang, H.-M., 2014. Using experimental data of Escherichia coli to develop a QSAR model for predicting the photo-induced cytotoxicity of metal oxide nanoparticles. J. Photochem. Photobiol. A 130, 234–240.
Petrova, T., Rasulev, B.F., Toropov, A.A., Leszczynska, D., Leszczynski, J., 2011. Improved model for fullerene C60 solubility in organic solvents based on quantum-chemical and topological descriptors. J. Nanopart. Res. 13, 3235–3247.
Randic, M., 1991. Novel graph theoretical approach to heteroatoms in quantitative structure–activity relationships. Chemom. Intell. Lab. 10, 213–227.
Tetko, I.V., Jaroszewicz, I., Platts, J.A., Kuduk-Jaworska, J., 2008. Calculation of lipophilicity for Pt(II) complexes: experimental comparison of several methods. J. Inorg. Biochem. 102, 1224–1237.
Toropov, A.A., Leszczynska, D., Leszczynski, J., 2007. Predicting thermal conductivity of nanomaterials by correlation weighting technological attributes codes. Mater. Lett. 61, 4777–4780.
Toropov, A.A., Toropova, A.P., Benfenati, E., Leszczynska, D., Leszczynski, J., 2010a. SMILES-based optimal descriptors: QSAR analysis of fullerene-based HIV-1PR inhibitors by means of balance of correlations. J. Comput. Chem. 31, 381–392.
Toropov, A.A., Toropova, A.P., Benfenati, E., Leszczynska, D., Leszczynski, J., 2010b. InChI-based optimal descriptors: QSAR analysis of fullerene[C60]-based HIV-1PR inhibitors by correlation balance. Eur. J. Med. Chem. 45, 1387–1394.
Toropov, A.A., Toropova, A.P., Benfenati, E., Gini, G., Puzyn, T., Leszczynska, D., Leszczynski, J., 2012c. Novel application of the CORAL software to model cytotoxicity of metal oxide nanoparticles to bacteria Escherichia coli. Chemosphere 89, 1098–1102.
Toropov, A.A., Toropova, A.P., Puzyn, T., Benfenati, E., Gini, G., Leszczynska, D., Leszczynski, J., 2013a. QSAR as a random event: modeling of nanoparticles uptake in PaCa2 cancer cells. Chemosphere 92, 31–37.
Toropov, A.A., Toropova, A.P., Benfenati, E., Gini, G., Leszczynska, D., Leszczynski, J., 2013b. CORAL: QSPR model of water solubility based on local and global SMILES attributes. Chemosphere 90, 877–880.
Toropov, A.A., Toropova, A.P., 2014. Optimal descriptor as a translator of eclectic data into endpoint prediction: mutagenicity of fullerene as a mathematical function of conditions. Chemosphere 104, 262–264.
Toropova, A.P., Toropov, A.A., Benfenati, E., Leszczynska, D., Leszczynski, J., 2010. QSAR modeling of measured binding affinity for fullerene-based HIV-1PR inhibitors by CORAL. J. Math. Chem. 47, 959–987.
Toropova, A.P., Toropov, A.A., Benfenati, E., Gini, G., Leszczynska, D., Leszczynski, J., 2011. CORAL: quantitative structure–activity relationship models for estimating toxicity of organic compounds in rats. J. Comput. Chem. 32, 2727–2733.
Toropova, A.P., Toropov, A.A., Rasulev, B.F., Benfenati, E., Gini, G., Leszczynska, D., Leszczynski, J., 2012a. QSAR models for ACE-inhibitor activity of tri-peptides based on representation of the molecular structure by graph of atomic orbitals and SMILES. Struct. Chem. 23, 1873–1878.
Toropova, A.P., Toropov, A.A., Martyanov, S.E., Benfenati, E., Gini, G., Leszczynska, D., Leszczynski, J., 2012b. CORAL: QSAR modeling of toxicity of organic chemicals towards Daphnia magna. Chemom. Intell. Lab. Syst. 110, 177–181.
Toropova, A.P., Toropov, A.A., Puzyn, T., Benfenati, E., Leszczynska, D., Leszczynski, J., 2013. Optimal descriptor as a translator of eclectic information into the prediction of thermal conductivity of micro-electro-mechanical systems. J. Math. Chem. 51, 2230–2237.
Toropova, A.P., Toropov, A.A., 2013. Optimal descriptor as a translator of eclectic information into the prediction of membrane damage by means of various TiO2 nanoparticles. Chemosphere 93, 2650–2655.
Toropova, A.P., Toropov, A.A., 2014. CORAL software: prediction of carcinogenicity of drugs by means of the Monte Carlo method. Eur. J. Pharm. Sci. 52, 21–25.
Veselinović, A.M., Milosavljević, J.B., Toropov, A.A., Nikolić, G.M., 2013a. SMILES-based QSAR model for arylpiperazines as high-affinity 5-HT1A receptor ligands using CORAL. Eur. J. Pharm. Sci. 48, 532–541.
Veselinović, A.M., Milosavljević, J.B., Toropov, A.A., Nikolić, G.M., 2013b. SMILES-based QSAR models for the calcium channel-antagonistic effect of 1,4-dihydropyridines. Arch. Pharm. 346, 134–139.
Weininger, D., 1988. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 28, 31–36.
Weininger, D., Weininger, A., Weininger, J.L., 1989. SMILES. 2. Algorithm for generation of unique SMILES notation. J. Chem. Inf. Comput. Sci. 29, 97–101.
Weininger, D., 1990. SMILES. 3. Depict. Graphical depiction of chemical structures. J. Chem. Inf. Comput. Sci. 30, 237–243.
