Risk Analysis, Vol. 11, No. 3, 1991

Estimation of Maximum Tolerated Dose for Long-Term Bioassays from Acute Lethal Dose and Structure by QSAR

Vijay K. Gombar,¹ Kurt Enslein,¹ Jeffrey B. Hart,¹ Benjamin W. Blake,¹ and Harold H. Borgstedt¹

Received May 17, 1990; revised September 10, 1990

A quantitative structure-activity relationship (QSAR) model has been developed to estimate maximum tolerated doses (MTD) from structural features of chemicals and the corresponding oral acute lethal doses (LD50) as determined in male rats. The model is based on a set of 269 diverse chemicals which have been tested under the National Cancer Institute/National Toxicology Program (NCI/NTP) protocols. The rat oral LD50 value was the strongest predictor. Additionally, 22 structural descriptors comprising nine substructural MOLSTAC keys, three molecular connectivity indices, and sigma charges on 10 molecular fragments were identified as endpoint predictors. The model explains 76% of the variance and is significant (F = 35.7) at p < 0.0001 with a standard error of the estimate of 0.40 in the log(1/mol) units used in Hansch-type equations. Cross-validation showed that the difference between the average deleted residual square (0.179) and the model residual square (0.160) was not significant (f = 0.98).

KEY WORDS: Maximum tolerated dose; MTD; LD50; QSAR; chronic toxicity; acute toxicity.

1. INTRODUCTION

To ensure adequate survival of experimental animals to their nominal lifetime on the one hand, and development of a sufficient number of tumors at acceptable cost on the other, oncogenicity studies generally employ the maximum tolerated dose (MTD), the dose which can be administered to the animals without causing an unacceptable number of premature deaths or serious adverse effects which would jeopardize the conduct or evaluation of the studies. Until recently, tumor formation after chronic exposure to a chemical and death after administration of acute lethal doses of the same chemical have generally been considered to be unrelated because a formal relationship between carcinogenic potency and acute toxicity is not expected a priori.(1) It has recently been suggested,(2-7) however, that a significant relationship may, in fact, exist, and possible explanations have been offered.(2-8) Two findings have become evident in the NCI/NTP bioassays: (a) [...] of the tested [...] induced tu[...] and (b) in all of the treated chemicals only rarely increased tumor incidence.(9) Zeise et al. and Bernstein et al. have [...] that the maximum dose and potency are evidently related.

The present study is an attempt to develop a quantitative relationship between the MTD of a compound and its acute toxicity as expressed in its oral LD50. The usefulness of such a correlative model would lie in the ability to use acute toxicity data to predict MTDs, which are more difficult and costly to determine experimentally, and to predict carcinogenic potency therefrom.

¹ HDi, 183 East Main Street, Rochester, New York 14604.

© 1991 Society for Risk Analysis

Gombar et al.

2. METHODS

2.1. Data Sets

2.1.1. Maximum Tolerated Doses

A recent NCI/NTP report(10) was the most important source of data for MTDs. Since extrapolations from inhaled concentrations to oral dosages are uncertain, only oral MTD values for gavage and incorporation in feed or drinking water were selected. Only results on Fischer 344 and Osborne-Mendel male rats were included to minimize the potential influence of strain and sex differences. Since it is not possible to compute structure descriptors for chemicals with uncertain structures or for mixtures, only MTDs of single, unequivocal chemical entities were included.

2.1.2. Acute Lethal Doses

Experimentally determined LD50s are influenced by age, sex, strain, diluents, and the health of the animals; they are also subject to substantial interlaboratory variability. Even the statistical methods used for computing the LD50s influence the values. It is, therefore, not uncommon for several LD50 values reported for the same compound to differ by a factor of five or even more. For the purposes of this study, preference was given to the elaborate compilation by Zeise,(11) since the Registry of Toxic Effects of Chemical Substances (RTECS) lists only the lowest reported LD50, irrespective of reliability of the assay, statistical significance, protocol, etc. Oral LD50 data in adult rats were found for 124 of the 358 chemicals (coded "Z" in Table I). LD50 values for 123 additional compounds were identified in RTECS (coded "R"), two were taken from Marhold(12) (coded "M"), and three were retrieved using the CIS (Chemical Information Service, Baltimore, Maryland) on-line information service (coded "C"). Two more LD50 values were identified in IARC Monographs 39(13) and 41.(14)

Enslein et al.(15) had previously developed a model for the estimation of the rat oral LD50. A new quantitative structure-activity model has now been developed (unpublished data) to estimate the acute LD50 based on a training set of over 2000 diverse structures. This model was used to estimate the LD50 values of the compounds for which no experimental LD50 data were available. Estimates could be generated with a high degree of confidence for 36 compounds; these were included in the

Table I. Database Compounds LD50 and MTD Estimates and Values

CAS no.    Name    MTD estimate (mg/kg)    MTD bioassay(a) (mg/kg)    LD50 (mg/kg)(b)

24 28.89 50-29-3 DDT 50-33-9 Phenylbutazone 179 100 50-55-5 Reserpine 3 0.45 1114 2250 50-81-7 GAscorbic acid 51-03-6 Piperonyl Butoxide 537 450 52-68-6 Dimethyl-l-OH-2,2,250 23 Trichloroethylphosphonate 54-31-9 Furosemide 420 31.5 55-38-9 Fenthion 4 0.90 56-23-5 Carbon tetrachloride 50 2 56-38-2 Parathion 2 2.83 56-72-4 Coumaphos 3 0.90 57-06-7 Ally1 isothiocyanate 4 25 57-41-0 5,5-Diphenylhydantoin 111 108 57-66-9 Probenecid 226 400 13 21.24 58-89-9 Lindane 59-87-0 Nitrofurazone 131 27.9 9 13.95 60-51-5 Dimethoate 60-57-1 Dieldrin 1 2.25 110 56.25 61-76-7 Phenylephrie.HC1 62-73-7 Diochlorvos 13 8 1113 1125 64-75-5 Tetracycline 64-77-7 Tolbutamide 611 1080 67-20-9 Nitrofurantoin 63 112.5 67-66-3 Chloroform 181 80 67-72-1 Hexachloroethane 300 20 69-65-8 D-Mannitol 975 2250 70-30-4 Hexachlorophene 29 6.75 71-43-2 Benzene 258 200 71-55-6 l,l,l-Trichloroethane 964 750 72-20-8 Endrin 1 0.22 72-43-5 Methoxychlor 47 38.03 72-54-8 TDE 84 148.23 72-55-9 DDE 54 37.75 72-56-0 p-p’-Ethyl-DDD 117 315 73-22-3 LTryptophan 429 2250 75-09-2 Dichloromethane 220 1000 75-25-2 Bromofom 149 200 75-27-4 Bromodichloromethane 150 100 75-34-3 1,l-Dichloroethane 212 764 75-35-4 Vinylidene chloride 30 5 75-47-8 Iodofom 61 141 75-65-0 tert-Butanol 480 900 75-69-4 Trichlorofluoromethane 918 977 76-01-7 Pentachloroethane 266 150 76-06-2 Chloropicrin 63 26 76-44-8 Heptachlor 8 3.51 77-65-6 Carbromal 174 112.5 77-79-2 3-Sulfolene 334 372 78-34-2 Dioxathion 4 8.1 78-42-2 t r i s- (2- E t hy I h exy I) 170 4000 phosphate 78-59-1 Isophorone 523 500 78-87-5 1,2-Dichloropropane 254 125

113 245 420 11900 7500 150

Z R R R Z R

R Z Z Z Z R C R Z R R Z R Z R R R Z Z R Z R Z Z Z Z 880 2 8170 Z 1634 Z 2136 C 1147 R 916 R 1308 Z 200 C 355 R 3500 R 993 E 4000 E 250 z 100 Z 316 Z 2830 Z 118 Z 37000 R

2600 215 2920 13 41 148 2195 1600 88 590 152 69 350 80 6443 2490 604 1186 6000 13500 66 4894 12300 17 5000 880

2330 2196

R R

MTD from LD50 by QSAR


79-00-5 1,1,2-Trichloroethane 79-01-6 Trichloroethylene 79-11-8 Chloroacetic acid 79-34-5 1,1,2,2-Tetrachloroethane 80-05-7 Bisphenol A 80-08-0 Dapsone 81-11-8 4,4’-Diamino-2,2’-Stilbenedisulfonic Acid 82-28-0 1-Amino-2-methyl-anthraquinone 82-68-8 Pentachloronitrobenzene 83-79-4 Rotenone 85-44-9 Phthalic anhydride 85-68-7 Butyl benzyl phthalate 86-30-6 N-Nitrosodiphenylamine 86-50-0 Azinphosmethyl 86-57-7 1-Nitronaphthalene 87-29-6 Cinnamylanthranilate 87-62-7 2,dXylidene 88-06-2 2,4,6-Trichlorophenol 88-96-0 Phthalamide 89-25-8 I-Phenyl-3-methyl-5pyrazolone 90-94-8 Michler’s ketone 91-23-6 o-Nitroanisole 91-64-5 Coumarin 91-84-9 Mepyramine 91-93-0 3,3’-Dimethoxybenzidine-4,4’-diisocyanate 94-20-2 Chlorpropamide 94-52-0 6-Nitrobenzimadole 95-06-7 Sulfallate 95-14-7 1H-Benzotriazole 95-50-1 1,2-Dichlorobenzene 95-74-9 3-Chloro-p-toluidine 95-79-4 5-Chloro-o-toluidine 95-80-7 2,CDiamino toluene 95-83-0 4-Chloro-o-phenylenediamine 96-09-3 Styrene oxide 96-12-8 1,2-Dibromo-3-chloropropane 96-18-4 1,2,3-Trichloropropane 96-45-7 2-Thioimidazolidone 96-48-0 y-Butyrolactone 96-69-5 4,4-Thio-bis-(6-tert-buty1)-m-cresol 97-53-0 Eugenol 97-77-8 Tetraethylthiuram d i d tide 98-01-1 Furfural

Z

119 92 521 lo00 103 30 50 108

835 7159 250

Z R Z

247 90 48 54 320 1125

4040 630 5200

Z Z E

162

7700

E

90

580

161

393.75

1650

R

2 371 67 100

3.37 675 540 180

60 4020 2330 1650

C Z Z Z

6 7.02 57 81 813 1350 86 135 170 450 465 1350 70 225

26 120 5000 840 820 1800 3500

50 22.5 104 90 98 100 36 135 346 1980

1600 740 293 318 2000

877 147 4 198 92 195 96 56 150

270 2390 225 500 18.45 850 618.75 1000 120 500 147.1 1500 225 464 7.92 260 450 916

115 36

550 219

2000 300

C

60 20 304 40

30 11.25 225 112.5

320 265 1800 752

R R R E

273 80 41

270 27 60

2680 4950 65

Z

R Z R


98-85-1 a-Methyl-benzyl alcohol 99-55-8 5-Nitro-o-toluidine 99-56-9 4-Nitro-o-phenylenediamine 99-59-2 5-Nitro-o-anisidine 100-40-3 4-Vinylcyclohexene 100-42-5 Styrene 100-44-7 a-Chlorotoluene 100-51-6 Benzyl alcohol 100-52-7 Benzaldehyde 101-05-3 Anilazine 101-54-2 N-Phenylp-phenylenediarnine 101-61-1 4,4‘-Methylene-bis(N,N-dimethyl)-benzenamine 101-80-4 4,4‘-Oxydianiline 101-90-6 Diglycidyl resorcinol 102-50-1m-Cresidine 102-96-5 b-Nitrostyrene 103-23-1Di-(2-ethylhexyl)adipate 103-33-3Azobenzene 103-85-5 1-Phenyl-2-thiourea 103-90-2p-Hydroxyacetanide 105-11-3 p-Quinonedioxime 105-55-5N,N’-Diethylthiourea 106-60-2 Caprolactam 105-87-3 Geranyl acetate 106-46-7 1,4-Dichlorobenzene 106-47-8 p-Chloroaniline 106-93-4 1,2-Dibromoethane 107-05-1Ally1 chloride 107-06-2 1,2-Dichloroethane 108-30-5 Succinic anhydride 108-46-3m-Dihydroxybenzene 108-60-1bis-(2-Chloro-l-methylethyl) ether 108-78-1Melamine 108-90-7 Chlorobenzene 108-94-1 Cyclohexanone 108-95-2 Phenol 109-69-3 n-Butyl chloride 110-80-5 Ethoxyethanol 110-86-1 Pyridine 111-42-2 Diethanolamine 115-28-6 Chlorendic acid 115-29-7 Endosulfan 115-32-2Diwfol 115-96-8 2-Chloro-ethanol phosphate 116-06-3Aldicarb 117-39-5 Quercitin 117-79-32-Aminoanthraquinone

338 122 143 97 400 1752 124 671 130 21 42 26

750 4.5 33.75 360 400 2000 30 400 400 45 54

33.75

400

R

574 681

Z Z

704 2563

5000 1231 1230 1300 2700 464 500

R

42 22.5 725 13 2.25 2570 79 0.16 1100 232 300 1400 33290 590 1125

18 89 1000 8 20 5.4 2400 438 270 464 87 33.75 29 11.25 316 1650 518 337.5 6330 3661 2000 500 121 300 310 76 22.5 125 27 41 142 77 700 770 113 95 1510 356 100 301 81 225 240 56 200 565 275 313 397 298 261 195 103 80 11 100 49

202.5 120 315 225 120 2000 40 50 56.25 42.84 42.39 88

1 0.27 57 1800 100 310.5

3200 2910 1535 480 2670 3000 891 710 1770 43 1100 1230 1 161 7800

2

Z R Z 2

Z R R X Z Z Z R R Z R R R Z R R R R R 2 Z

R 2

C E


117-81-7 Di- (2-ethyl hexy I) phthalate 118-92-3 0-Anthranilic acid 119-34-6 4-Amino-2-nitrophenol 119-53-9 Benzoin 119-84-6 3,4-Dihydrocoumarin 119-90-4 3,3 ’-Dimethoxybenzidine 119-93-7 4,4’-Diamino-3,3‘-dimethylbiphenyl 120-32-1 Chlorophene 120-61-6 Dimethyl terephthalate 120-62-7 Piperonyl sulfoxide 120-71-8 p-Cresidine 120-83-2 2,4-Dichlorophenol 121-14-2 2,4-Dinitrotoluene 121-66-4 2-Amino-5-nitrothiazole 121-69-7 N,N-Dimethylaniline 121-75-5 Malathion 121-79-9 Propyl gallate 121-88-0 2-Amino-5-nitrophenol 122-66-7 Hydrazobenzene 123-31-9 p-Dihydroxybenzene 123-91-1 1,4-Dioxane 124-48-1 Chlorodibromoethane 126-72-7 tris-(2,3-Dibromopropyl) phosphate 126-92-1 Na(2-ethylhexyl) Alcohol sulfate 127-18-4 Tetrachloroethylene 127-69-5 Sulfisoxazole 128-37-0 Butylated hydroxytoluene 129-15-7 2-Methyl-1-nitroanthraquinone 131-17-9 Diallyl phthalate 132-98-9 Penicillin VK 133-06-2 Captan 133-90-4 Chloramben 134-72-5 Ephedrine sulfate 135-20-6 Cupfermn 135-88-6 n-Phenyl-2-naphthylamine 136-40-3 Phenazopyridine.HC1 136-77-6 4-Hexylresorcinol 137-17-7 2,4,5-Trimethylaniline 139-13-9 Nitrilotriacetic acid 139-65-1 4,4’-Thiodianiline 139-94-6 Nithiazide 140-11-4 Benzyl acetate 140-49-84’-(Chloroacetyl)-acetanilide 140-56-7 Fenanosulf 140-88-5 Ethyl acrylate 142-04-1Aniline.HCI

889

30600

R

346 1350 4600 284 112.5 1470 96 11.25 1600 1460 158 600 42 14.85 1920

Z Z E R R

30

540


6.75

404

R

127 146 277 90 135 108 221

120 225 135 450 225 9 27

1700 4390 2000 1450 580 270 1100

C Z Z

630 79 496 224 33 85 457 136 21

30 180 540 200 13.5 50 450 80 4.5

1410 1375 2600 1100 301 320 4200 848 5240

R Z R E Z R

580

900

4000

R

703 561 173

750 400 270

12982 loo00 1670

Z R Z

232

54

7400

E

88 100 770 892 1000 1040 870 272.25 1200 581 900 5620 11.25 404 404 304 180 250 385 225 8730

R R Z Z R Z R

84 273 102 356 52 42 6229 312 32 47 552

Z R Z E

Z

R Z

337.5 403 550 125 1250 36 1470 675 135 1100 56.25 2150 500 2490 90 2150 45 200 270

60 1020 1070

Z R Z


147-24-0 Diphenhydramine 54 148-24-38-Hydroxyquinoline 227 149-30-4 2-Mercaptobenzothia- 182 zole 150-38-9 Trisodium EDTA 517 Z 13 150-68-5 Monuron 428 151-21-3Dodecylsulfate.Na 156-10-5 p-Nitrosodiphenylam- 136 ine 1 298-00-0 Methyl parathion 298-59-9 a-Phenyl-2-piperidine- 28 acetic acid methyl ester.HC1 298-81-7 9-Methoxy-7H-furo- 134 (3,2-g)-(l)-benzopyran-7-one 303-34-4 Lasiocarpine 5 110 303-47-9 Ochratoxin-A 3 309-00-2 Aldrin 14 315-18-4 Mexacarbate 22 333-41-5 Diazinon 389-08-2 Nalidixic acid 311 69 396-01-0 2,4,7-Triamino-6phenyl-pteridine 434-13-9 Lithocholic acid 101 469-21-6 Doxylamine 88 33 486-12-4 Triprolidine 504-88-1 3-Nitropropionic acid 231 83 510-15-6 Chlorobenzilate 146 512-56-1 Trimethyl phosphate 32 513-37-1 Dimethylvinylchloride 351 518-47-8 Fluorescein.Na 359 536-33-4 Ethionamide 41 542-75-6 1,3-Dichloropropene 28 556-52-5 Glycidol 563-47-3 3-Chloro-2-methylpro- 120 pene 506 597-25-1 Dimethylmorpholinophosphoramidate 609-20-1 2,6-Dichloro-p-phenyl- 135 enediamine 619-17-0 4Nitroanthranilicacid 153 24 624-18-0 p-Phenylenediamine.DiHC1 630-20-6 l,l,1,2-Tetrachloroe- 101 thane 103 636-21-5 o-Toluidine.HC1 828-00-2 Dimethoxane 544 30 834-28-6 1-Phenethyl-biguanide.HC1 136 842-07-9 CI Solvent yellow 14 34 924-42-5 n-Methyloacrylamide 1031 961-11-5 Tetrachlorvinphos 445 968-81-0 Acetohexamide 1116-54-7 N-Nitrosoimino-di593 ethanol 205 1156-19-0 Tolazamide

28.17 135 750

500 1200 1680

R R R

350 67.5 54 225

2150 1480 1288 2140

R R R Z

1.8 45

14 367

Z R

75

791

R

110 20 39 37 250 1160 400

Z R

1.35 210 5.4 18.81 36 180 27

z Z

z R R

500 3900 z 90 357 E 90 153 E 0.85 1700 E 134.78 1130 Z 100 3437 Z 200 150 M 225 6721 R 135 1320 R 50 250 R 75 420 R 150 580 M 600

5910

R

90

700

Z

675 56.25

640

80

Z Z

250

670

I1

270 125 36

900 1930 938

Z

22.5 12 382.5 900 1125

1100 474 4000 5000 7500

E R R R R

450

1600

E

R R


1212-29-9N,N‘-Dicyclohexylthiourea 1582-09-8Trifluralin 1596-84-5 Daminozide 1634-78-2Malaoxon 1777-84-0 3-Nitrop-acetophenetidine 1825-21-4 Pentachloroanisole 1836-75-5 Nitrofen 1897-45-6Chlorothalonil 1918-02-1Picloram 1955-45-9 Pivalolactone 2164-17-2 Fluometuron 2243-62-1 1,5-Naphthalenediamine 2244-16-8 (S)-( +)p-Mentha-6,8dien-Zone 2425-85-6 Cl pigment red 3 2432-99-7 11-Aminoundecanoic acid 2438-88-2 2,3,5,6-Tetrachloro-4nitroanisole 2475-45-8 Disperse blue 1 2489-77-2 Trimethylthiourea 2784-94-3 HC blue no. 1 2832-40-8 Cl Disperse yellow 3 2835-39-4 Ally1 i s d e r a t e 2871-01-4 HC red no. 3 3567-69-6 CI acid red 14 5131-60-2 4-(310x1-m-phenylenediamine 5307-14-2 2-Nitrop-phenylenediamine 5989-27-5 d-Limonene 6109-97-3 3-Amino-9-ethylcarbazole 6369-59-1 2,5-Toluenediamine sulfate 6471-49-4 CI pigment red 23 6959-47-3 2-(Chloromethyl)-pyridine.HC1 6959-48-4 3-(Chloromethyl)-pyridine.HC1 13171-21-6Phosphamidon 13552-44-84,4’-Methylenedianiline.DiHC1 15356-70-4dl-Menthol 17026-81-23-Amino-4-ethoxyacetanilide 20265-97-8 Arochlor 1254 22628-22-8 Sodium azide 33229-34-4 HC blue no. 2

33 2250

500

R

115 663 47 203

360 450 45 171

1400 8230 158 664

E Z Z Z

40 64 659 263 257 45 171

40 437 270 640 455.67 loo00 669.37 6000 300 1470 11.25 8910 450 921

E Z Z Z Z Z E

375

4

R

174 1125 342 675

1000 4200

E E

database and are coded "E" in Table I. (See Ref. 15 for a detailed description of the estimation process and its limitations, including the assignment of degrees of confidence.) Two hundred seventy-one compounds were identified for which both the MTD and an LD50 value were available. 2,3,7,8-Tetrachloro-dibenzo-p-dioxin (CAS no. 1746-01-6) and 1,2,3,4,7,8-hexachloro-dibenzo-p-dioxin are extremely toxic; these compounds were therefore not considered for model development to avoid distributional problems. The names and CAS numbers of the remaining 269 compounds which were used in model development, along with their MTDs and experimental or estimated LD50 values in mg/kg and the sources of the data, are listed in Table I.

2.2. Structure Descriptors

13

Complete representations of molecular structures entail the quantitative description of the spatial arrangement of all atoms in the minimum energy conformation and the corresponding electron distribution. This can be achieved by means of full-scale rigorous quantum mechanical methods, but this is not practically feasible for such a large number of molecules as in the present study. Recourse was, therefore, taken to descriptors which can be derived from knowledge of patterns of connectedness and some intrinsic properties of the constituent atoms. The following topology-based structure quantifiers were computed.

48

5.4

260

Z

90 32 549 232 387 438 742 136

225 22.5 135 450 62 500 562.2 180

6000 316 4000 1700 230 2300 10000 915

E Z E E R E Z E

380

49.5

3080

Z

624 87

150 80 Z

4400 234

R R

2.2.1. Shape Indices

29

90

98

Z

156 2250 113 150

1300 316

E Z

316

Z

Molecular shape is known to influence various physical and biological properties. Numerical descriptions of shape using the geometry of a molecule have been attempted by several investigators, and Kier(17-19) has recently defined kappa indices, which encode the shapes of molecules at a simpler topological level. In a study on various physicochemical properties of alkylbenzenes, Gombar and Jain(20) have extended this approach to the seventh order and have found that higher order indices are useful in distinguishing among isomeric molecules. In this context, order refers to the number of bonds in a path, that is, the number of atoms forming the path minus one. In a recent study, Gombar and Enslein(21) have shown a correlation between shape indices and the logarithms of the water/n-octanol partition coefficient (log P), which is frequently used as a parameter in QSAR studies.

112

150

2 59

7.2 13.5

24 830

IZ

Z

402 126

337.5 675

3180 631

Z Z

93 10 864

270 27 450

1400 14 7300

Z C

E

(a) The source for MTD bioassay values is the NCI/NTP Report.(10) Values marked Z are from Zeise.(11)
(b) The sources are identified as: C = CIS (Chemical Information Service, 1988, Baltimore, Maryland); E = estimated value; I1 = IARC (20); I2 = IARC (39); M = Marhold(12); Z = Zeise.(11)


In the present study, 14 kappa indices, including simple and atom-specific ones, have been calculated using custom-designed computer programs.
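As an illustration of how such shape indices are obtained from connectivity alone, the following minimal sketch computes Kier's simple first- and second-order kappa indices from a hydrogen-suppressed graph, using the published definitions 1κ = A(A-1)²/(1P)² and 2κ = (A-1)(A-2)²/(2P)², where A is the number of non-hydrogen atoms and mP is the number of paths of order m. It is not the custom program used in this study; the function name, the adjacency-dictionary input format, and the two example molecules are assumptions made for illustration, and the atom-specific and higher order indices are not covered.

```python
def kappa_1_2(adjacency):
    """Simple first- and second-order kappa indices (no atom-specific terms).

    adjacency: dict mapping each non-hydrogen atom to the set of its
    bonded non-hydrogen neighbours (hypothetical input format).
    Requires at least three atoms so that two-bond paths exist.
    """
    n_atoms = len(adjacency)
    # Paths of order 1 are the bonds; each bond is seen from both ends.
    p1 = sum(len(nbrs) for nbrs in adjacency.values()) // 2
    # Paths of order 2 are pairs of bonds sharing a central atom.
    p2 = sum(len(nbrs) * (len(nbrs) - 1) // 2 for nbrs in adjacency.values())
    kappa1 = n_atoms * (n_atoms - 1) ** 2 / p1 ** 2
    kappa2 = (n_atoms - 1) * (n_atoms - 2) ** 2 / p2 ** 2
    return kappa1, kappa2


# Hydrogen-suppressed graphs of two C4H10 isomers (carbon atoms only).
N_BUTANE = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
ISOBUTANE = {0: {1}, 1: {0, 2, 3}, 2: {1}, 3: {1}}

if __name__ == "__main__":
    print(kappa_1_2(N_BUTANE))
    print(kappa_1_2(ISOBUTANE))
```

The two butane isomers have the same first-order index but different second-order indices, which is the sense in which higher order indices help to distinguish isomeric molecules.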

2.2.2. Molecular Connectivity Indices (MCIs)

MCIs, according to Kier and Hall,(22) are unitless descriptors which quantify topological features such as type and number of atoms, extent and position of branching, and multiplicity of bonds. These indices, which can be calculated from hydrogen-suppressed graphs (HSG) representing molecular connections, have been found to be useful for modeling a variety of endpoints, toxicological,(23) physicochemical,(24) and biological.(25) It has recently been shown that descriptors derived from the sums and differences of simple and valence MCIs of a given order implicitly encode some electronic characteristics of molecules; see Kier and Hall(26) for a detailed discussion. Seventy-six MCIs were computed for every compound. They included path MCIs of orders 0-6, cluster MCIs of orders 3-6, path/cluster type MCIs of orders 4-6, and chain MCIs of orders 3-7. For every type of MCI, simple and valence-type indices and their sums and differences were also computed. The computer program CFUNC, designed by Hall(27) for this purpose, was used to calculate them.

2.2.3. σ-Charges

The quantitative assignment of partial atomic charges can be realized with quantum-mechanical methods, but the result depends upon the wave function selected, and such methods cannot be used routinely for economic reasons. Some procedures which require only topological information rather than the geometry of a molecule are available. They are fast and produce atomic charge values which are highly correlated with those obtained from ab initio calculations. For the present work, the method due to Gasteiger and Marsili(28) was used. Computations were restricted to the σ-level. The σ-charges on atoms of certain well-defined molecular fragments were accumulated and used as structure descriptors. A descriptor which includes the charges on hydrogen atoms (CHHI) and another without the charges on hydrogen atoms (CHHO) were generated for the selected molecular fragments. Fragments had to occur in at least three compounds to be included; for the set of 269 compounds, 74 CHHI and CHHO descriptors passed this frequency check.

2.2.4. Substructure Descriptors

Every compound was scanned for the presence of substructures defined in the MOLSTAC system.(29) This system defines over 3000 substructures representing chemically and/or biologically important functional groups, fused and heterocyclic aliphatic and aromatic ring systems, electron-donating and -withdrawing groups and their environments, carbon chain and ring fragments, etc. As in the case of the σ-charge descriptors, only substructures which occurred in three or more compounds were included. In all, 113 different substructures were found to be represented with this frequency, and their presence or absence was coded in a binary fashion. For substructures occurring six or more times with three or more different occurrences per compound, continuous descriptors equal to the number of occurrences of the substructure in each compound were generated. The structures of the database compounds were coded in the computer notation SMILES,(16) a linear sequential code for the entry of chemical structures by means of a standard computer keyboard. In the case of alkali metal salts of weak organic acids and mineral acid salts of weak organic bases, structure descriptors corresponding to the parent acids or bases were generated.

2.3. Model Development

As required by Hansch-type equations, the MTD and LD50 values of the 269 compounds were converted to a log(1/mol) scale before use in modeling as pMTD and pLD50, respectively. QSAR model development followed the general schema shown in Fig. 1. After collection of the database and generation of the structure quantifiers, potential descriptor variables were selected from the pool of computed data to minimize the risk of chance correlations and a biased model. The best descriptors from each of the classes were identified by performing individual stepwise forward and backward regression analyses on the variable pMTD, using pLD50 and descriptors from each class as predictors. A set of 40 potential descriptors was selected, comprising 5 MCIs, 21 MOLSTAC(29) keys, 10 CHHI σ-charges, 3 CHHO σ-charges, and the pLD50 value.

Fig. 1. Schema of model development.

Steps 3 and 4 (see Fig. 1) were performed repeatedly until all of the grossly misfitted observations (as defined by exclusion from the range of three standard deviations), influential observations (as defined by an unduly large Cook's distance), and poorly behaved descriptors (observation-sensitive regression coefficient, low contribution to variance of endpoint, high probability > t) had been removed. Cook's distance is a statistical measure of the influence of a single observation on the fitted function. It is calculated by comparison of estimates for a compound generated by the resubstitution method and the jackknife method; a large difference between the estimates indicates that a substantial change in the coefficients would occur if the observation were dropped from the training set. Compounds were not removed in order to achieve higher correlation coefficients; they were excluded only if they contained structural features of rare occurrence in the database. During nine cycles of iteration through steps 3 and 4, 14 compounds were excluded as influential observations and/or serious outliers; their names and known special features are given in Table II. For a set of 255 chemicals of diverse structure, a robust equation with 24 predictors, including the intercept, was obtained.

Table II. Compounds Dropped During Modeling

CAS no.      Reasons excluded(a)    Characteristic features
54-31-9      [...]                  Infrequent structure
67-72-1      [...]                  No specific features
78-42-2      [...]                  No specific features
99-55-8      [...]                  No specific features
102-50-1     [...]                  Estimated LD50
117-39-5     [...]                  MTD > LD50
121-14-2     [...]                  No specific features
121-69-7     [...]                  No specific features
134-72-5     [...]                  Salt considered as base
140-11-4     [...]                  No specific features
504-88-1     [...]                  Estimated LD50
1212-29-9    [...]                  MTD > LD50
2244-16-8    [...]                  MTD > LD50
6471-49-4    [...]                  MTD > LD50, estimated LD50

(a) I, influential observation; O, outlier.

2.4. Validation

The model was validated by means of cross-validation (the jackknife procedure; see Ref. 30 for details). Table I provides the predicted MTD values for each compound for comparison with the experimental values. The predictions were made from the model when the compound under consideration was not included in the database.

3. RESULTS

The statistical characteristics of the predictor variables in the final equation correlating pMTD with pLD50 and molecular structure descriptors are detailed in Table III. The final equation has the following statistical parameters: R2 = 0.781; EV = 75.9%; F (p < 0.0001) = 35.7; DF = 23 and 231. R is the linear multiple correlation coefficient, EV is the amount of explained variance, and F is the F-ratio at degrees of freedom = DF. The correlation is highly significant and explains about 76% of the variance.

4. DISCUSSION
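As a consistency check on the statistics reported in Section 3, the F-ratio and the adjusted R2 can be recomputed from R2, the number of predictors, and the residual degrees of freedom. The sketch below uses the standard regression formulas; the function names are illustrative, and reading EV as an adjusted R2 is an assumption that the arithmetic supports but the text does not state.

```python
def f_ratio(r2, n_predictors, df_residual):
    """Overall regression F-ratio from R2 and the two degrees of freedom."""
    return (r2 / n_predictors) / ((1.0 - r2) / df_residual)


def adjusted_r2(r2, n_obs, n_predictors):
    """R2 corrected for the number of predictors (intercept not counted)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Values reported in the text: 255 compounds, 23 predictors plus intercept.
R2 = 0.781
N_OBS = 255
N_PREDICTORS = 23
DF_RESIDUAL = N_OBS - N_PREDICTORS - 1  # 231, as reported

if __name__ == "__main__":
    print(round(f_ratio(R2, N_PREDICTORS, DF_RESIDUAL), 1))
    print(round(adjusted_r2(R2, N_OBS, N_PREDICTORS), 3))
```

With the rounded R2 of 0.781 this gives an F-ratio of about 35.8, close to the reported 35.7, and an adjusted R2 of about 0.759, which matches the reported EV of 75.9%.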

The strongest predictor of the MTD was the LD50, with a correlation coefficient of 0.70. Since the acute oral lethal dose and the maximum tolerated dose are measures of short-term and intermediate-term toxicity, respectively, it appears reasonable that they are also formally related. This conclusion is in keeping with the thoughts expressed by Bernstein et al.,(2,3) Metzger et al.,(4) Parodi et al.,(5) and Zeise et al.(6,7) The fact that a descriptor is strongly correlated with the endpoint should not be taken as implying causality.

Table III. Statistical Characteristics of the Predictor Variables(a)

Description
Intercept
Anthraquinone ring
Aryl halide fragment (CHHI)
Phosphoric acid ester, phosphamide fgmt (CHHI)
Aliphatic C in sat alkyl fluoride fgmt (CHHI)
S or N in urea, thiourea fragment (CHHI)
Aliphatic C bound to ether O (CHHI)
Aromatic C bound to ether O (CHHI)
C=N (CHHO)
Summed chain MCI order 3
Rat oral LD50; log(1/mol) units
Longest atom chain in nonring molecule(b)
Sat prim or sec aliph ester (non-B phenyl)
Longest aliphatic C chain in molecule(b)
Carbonyl fragment bound to ethene
Ring carbonyl(b)
0 El W/dr Grps and 1 El Rel Gp on benzene ring
Unfused imidazole ring or derivative
Benzene ring
Aliphatic C in unsat alkyl halide fgmt (CHHI)
Benzene ring (CHHI)
Aliphatic C bound to single-bonded N (CHHI)
Summed path MCI order 3
Difference path MCI order 1

Regression coeff.

Standard error

Prob.

1.441 1.067 2.499 2.249 -5.251 -4.660 1.373 -1.861 -5.442 1.386 6.179 6.595 -9.233 -8.034 9.534 -2.003 -6.617 7.369 2.477 -1.271 -5.439 -9.179 6.118 -1.300

0.109 0.259 0.872 0.414 2.367 0.929 0.369 0.300 0.983 0.339 0.424 1.318 2.062 1.940 2.225 0.820 1.260 2.640

0.000

0.446

0.000

0.405 2.160 3.060 0.769 0.303

0.002 0.012 0.003

0.000 0.005 0.000 0.028 0.000 0.000 0.000 0.000 0.000

0.000 0.000 0.000 0.000 0.000

0.015 0.000 0.006

0.000 0.000

Contrib. to R2

0.0160 0.0078 0.0279 0.0046 0.0238 0.0131 0.0365 0.0290 0.0158 0.2017 0.0237 0.0190 0.0162 0.0174 0.0056 0.0259 0.0073 0.0293 0.0093 0.0060 0.0085 0.0600 0.0174

(a) Abbreviations: aliph, aliphatic; CHHI, charges (charges on H included); CHHO, charges (charges on H excluded); El, electron; fgmt, fragment; rel, releasing; prim, primary; sat, saturated; sec, secondary; W/dr, withdrawing.
(b) Count of descriptor used as variable.

Only considerations outside the intent and scope of this study, including knowledge about mechanism of action, would serve to provide answers about causality. The correlation coefficients of 0.45, -0.36, 0.36, and 0.27, respectively, of pMTD with the descriptors (see Table III) summed path MCI order 3, CHHI on aliphatic C bound to ether O, CHHI on phosphoric acid ester/phosphamide fragment, and CHHI on aromatic C bound to ether O indicate that these substructural features are also strong predictors of the maximum tolerated dose. The probability values shown in Table III demonstrate that each of the descriptors has a significant role in the description of the endpoint pMTD. The coefficients of intercorrelation among the descriptor variables are small. No pair of variables has a correlation greater than 0.63; all are sufficiently orthogonal and evidently quantify different structural features. The observations-to-variables ratio is greater than 10 (255 compounds for 23 descriptors), which makes a chance correlation unlikely.

The correlation was cross-validated for robustness by means of the jackknife procedure. Each compound was held out, in turn, from the training sample and its pMTD was predicted from a model computed on the basis of all of the remaining observations. The difference between the residuals from the model and the cross-validation procedure results in a pooled standard error of 0.070 with a value for t of 0.056. Such small differences were found to be insignificant at p > 0.955. See Table IV for a detailed comparison of cross-validation results with model predictions and actual bioassay data. The correlation is robust and explains about 76% of the variance.

The estimates are to be evaluated with caution, however. Although validation by means of the jackknife procedure evaluates the performance of a model on the basis of compounds not included in the training set, predictions should be restricted to molecules which have structural features that are adequately represented in the model database. It is safer to interpolate within a given database than to extrapolate into areas of chemistry which are not adequately covered by the substructural features represented by database molecules.
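The hold-one-out procedure described above can be sketched as follows for the simplest case of a single predictor. This is a minimal illustration, not the software used in the study: the function names and the five data points are invented for the example, and the actual model uses 23 predictors rather than one.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def model_residuals(xs, ys):
    """Residuals when every observation is used to fit the model."""
    a, b = fit_line(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def deleted_residuals(xs, ys):
    """Residual for each point, predicted by a model fitted without it."""
    out = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        out.append(ys[i] - (a + b * xs[i]))
    return out


def mean_square(values):
    return sum(v * v for v in values) / len(values)


# Hypothetical pLD50 (x) and pMTD (y) values, for illustration only.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [1.1, 1.9, 3.2, 3.9, 5.1]

if __name__ == "__main__":
    print(mean_square(model_residuals(X, Y)))
    print(mean_square(deleted_residuals(X, Y)))
```

Because each deleted residual comes from a model that never saw the held-out point, the average deleted residual square is never smaller than the average residual square of the full least-squares fit over the same compounds. A small gap between the two, as reported for the present model, indicates that no single compound dominates the fit.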

Table IV. Bioassay Data, Predictions, and Cross-Validation

Parameter            Bioassay data    Predicted    Hold-1-out
Highest pMTD         6.23             5.74         5.63
Lowest pMTD          1.65             1.77         1.73
Mean pMTD            3.25             3.25         3.25
SD                   0.82             0.72         0.72
Median               3.13             3.15         3.14
99th percentile      5.97             5.56         5.43
90th percentile      4.24             4.18         4.18
10th percentile      2.36             2.50         2.50
1st percentile       1.81             1.99         1.98
Mean square error                     0.400        0.445

It must also be noted that the experimental values for both the LD50s and the MTDs which underlie this model are known to be imprecise. They introduce a good deal of uncertainty into the database, and this uncertainty is carried forward into the results of the modeling process, both in the estimation of the missing LD50 values and in the estimation of the MTD values.
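Because both endpoints are modeled on the log(1/mol) scale (Section 2.3), uncertainty in a dose translates directly into log units. The following minimal sketch assumes that scale to be log10 of the reciprocal dose in mol/kg body weight; the function names and the molecular weight of endrin (380.91 g/mol) are supplied here for illustration and are not taken from the data sources of this study.

```python
from math import log10


def p_dose(dose_mg_per_kg, mol_weight):
    """Dose on the log(1/mol) scale: log10 of 1 / (dose in mol/kg).

    Assumes the Hansch-type convention log(1/C) with C in mol/kg.
    """
    dose_mol_per_kg = dose_mg_per_kg / 1000.0 / mol_weight
    return -log10(dose_mol_per_kg)


def log_units_for_fold(fold_difference):
    """Shift on the log scale caused by a given fold difference in dose."""
    return log10(fold_difference)


ENDRIN_MW = 380.91   # g/mol; supplied for this example, not from the paper
ENDRIN_MTD = 0.22    # mg/kg; bioassay MTD for endrin as listed in Table I

if __name__ == "__main__":
    print(round(p_dose(ENDRIN_MTD, ENDRIN_MW), 2))
    print(round(log_units_for_fold(5.0), 2))
```

For the endrin bioassay MTD of 0.22 mg/kg listed in Table I this gives a pMTD of about 6.24, close to the highest bioassay pMTD of 6.23 in Table IV, which is consistent with this reading of the scale. A fivefold disagreement between two reported LD50 values (Section 2.1.2) corresponds to about 0.70 log units, which is larger than the standard error of the estimate of 0.40.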

