Mutation Research, 242 (1990) 285-303 Elsevier


MUTGEN 01605

A CASE-SAR analysis of polycyclic aromatic hydrocarbon carcinogenicity *

Ann M. Richard a and Yin-tak Woo b

a Carcinogenesis and Metabolism Branch (MD-68), Genetic Toxicology Division, U.S. Environmental Protection Agency, Research Triangle Park, NC 27711 and b Oncology Branch (TS-796), Health and Environmental Review Division, U.S. Environmental Protection Agency, Washington, DC 20460 (U.S.A.)

(Received 6 March 1990) (Revision received 25 May 1990) (Accepted 7 June 1990)

Keywords: Structure-activity; SAR; CASE; PAH carcinogenicity; Fragment analysis

Summary

A CASE SAR analysis was performed on a selected database of PAHs to investigate the possible use of the CASE method as an aid for preliminary assessment of carcinogenic potential of untested environmental PAHs. A data set, denoted LEARN, consisting of 78 PAHs and their experimental carcinogenicities was used to 'train' the CASE method and derive the CASE fragments. 8 activating fragments and 4 inactivating fragments were identified. These fragments predicted the activities of 94% of the LEARN set correctly. The biological significance of several of these fragments is rationalized in light of the current theories of PAH carcinogenesis. Using these fragments, the potential activities of a database of 106 mostly untested PAHs, denoted TEST, were predicted. These were compared to 'expert judgement' predictions based on mechanistic considerations in order to evaluate the extent of concordance between these two methods and their respective strengths and weaknesses. Initial poor agreement (64%) was attributed to limitations of the LEARN database involving inadequate representation of 2- and 3-ring PAH subclasses. When these subclasses were excluded from the TEST database, the concordance improved to 90%. The CASE fragments were also used to predict the activities of a database of 24 PAHs, denoted VALIDATE (not included in the LEARN set), for which carcinogenicity data were available. The total prediction accuracy of 75% (89% of the actives correctly identified), despite the structural diversity of the VALIDATE set, provided independent evidence of the utility of the present CASE results. A close examination of the incorrect CASE predictions was conducted to delineate inadequacies of these CASE results in order to provide cautionary guidance for future application of the method. Finally, the present results were compared to the results of a previous CASE analysis based on a more limited PAH data set, and were found to be of greater general utility. It is concluded that the CASE fragments derived in the current study should provide a useful tool for assisting and complementing 'expert judgement' in the preliminary screening of PAHs for carcinogenic activity.

* The research described in this manuscript has been reviewed by the Health Effects Research Laboratory, U.S. Environmental Protection Agency and approved for publication. Approval does not signify that the contents necessarily reflect the views and policies of the Agency, nor does mention of trade names or commercial products constitute endorsement or recommendation for use.

Correspondence: Dr. A.M. Richard, MD-68, U.S. Environmental Protection Agency, Research Triangle Park, NC 27711 (U.S.A.).

0165-1218/90/$03.50 © 1990 Elsevier Science Publishers B.V. (Biomedical Division)

Polycyclic aromatic hydrocarbons (PAHs) constitute a large class of ubiquitous environmental pollutants formed from the incomplete combustion of fossil fuels, tobacco products, food and virtually any organic matter (Crittenden and Long, 1976; Grimmer, 1983; Woo and Arcos, 1981; IARC, 1983). From a health risk standpoint, PAHs are of priority concern because of significant human exposure, because many members of this class have been shown to be potent animal carcinogens, and because many PAHs are constituents of complex mixtures such as coke emissions which are associated with human neoplasia (IARC, 1987). From the standpoint of structure-activity analysis, PAHs represent an intriguing class of compounds for experimental investigation because seemingly small changes in the chemical structure can markedly affect carcinogenic activity (Arcos and Argus, 1974; Yang and Silverman, 1988; Dipple, 1984). Despite a relative wealth of carcinogenicity data on PAHs, however, there are still a substantial number of environmental PAHs for which little or no information is available. To provide initial hazard screening and priority setting, there has been considerable interest in utilizing structure-activity relationship (SAR) techniques to predict the carcinogenic potential of these untested compounds. Essentially, there are two basic approaches to SAR analysis: mechanistic and correlative (Richard et al., 1989). The mechanistic SAR approach is based on the sum of knowledge concerning presumed or proposed mechanisms for biological activity. In contrast, the correlative approach attempts to determine relevant molecular descriptors from a statistical analysis of available activity data on chemical analogs. The ultimate utility of the correlative SAR descriptors, however, will depend on their causal significance and relevance to the underlying mechanistic determinants for activity.
A number of published studies have applied various correlative SAR techniques to the study of PAH carcinogenicity (see e.g. Nordén et al., 1978; Yuan and Jurs, 1980; Enslein and Craig, 1982; Klopman, 1984; Rosenkranz et al., 1985; Mitchell et al., 1986; Rosenkranz and Klopman, 1989). The current study will focus on application of a particular correlative SAR method, the CASE (Computer Automated Structure Evaluation) system, to the study of a larger group of PAHs than had been previously considered by CASE. Our goals are 2-fold: (1) to describe a careful, objective CASE analysis pointing out the limitations and potential pitfalls, as well as the utility of the CASE approach for aiding in preliminary assessment; and (2) to model an important class of environmental chemicals for which mechanistic and experimental data are available, and for which criteria for aiding and improving preliminary assessment of untested compounds are needed. The present study will develop a CASE model based on a large group of PAHs for which sufficient experimental data are available. This model will then be used to predict the carcinogenic potential of a group of inadequately tested PAHs. SAR prediction for this latter group of PAHs had been previously carried out using what is referred to here as 'expert judgement', i.e. judgement involving consideration of mechanisms, structural features known to be associated with PAH carcinogenicity, and prior experience (Y.-T. Woo, unpublished results; see Appendix V in Woo et al., 1988). Predictions based on these two different approaches will be compared in order to assess the level of agreement, and to highlight their respective strengths and limitations. In addition, the general validity and predictive capability of the present CASE model, and the mechanistic significance of the CASE fragments, which may provide structural features useful as predictors of PAH carcinogenicity, will be explored.
Finally, the present CASE model will be compared to previous CASE models for PAH carcinogenicity, based on smaller PAH databases, to evaluate the influence of the size and composition of the training database on the CASE results.


PAH databases

Three PAH databases were used in the present study: a 'LEARN' set of 78 PAHs with carcinogenicity data; a 'TEST' set of 106 untested PAHs; and a 'VALIDATE' set of 24 PAHs with available carcinogenicity data which were not included in the original LEARN set. The LEARN set and the TEST set originated from a larger set of 223 PAHs initially compiled for a purpose other than the present SAR study, namely to help U.S. EPA regional offices identify carcinogenic or potentially carcinogenic PAHs at hazardous waste sites. These 223 were all the PAHs in the SANSS * database satisfying the search criteria: (a) mass spectral data available; (b) homocyclic (i.e. containing only C and H atoms); and (c) composed of at least 2 aromatic rings. Of these 223 PAHs, 39 were excluded from consideration since they contained intercyclic bonds, were of uncertain structure, or were a deuterated form of a PAH already represented within the database. Of the 184 chemicals remaining, 78 had experimental carcinogenicity data (Arcos and Argus, 1974; USPHS, 1951-1985; IARC, 1983; Dipple, 1984; Yang and Silverman, 1988; Nesnow et al., 1986). This set of 78 PAH structures and their corresponding activities are referred to as the 'LEARN' database in this study. Activity assignments were based on evaluation of available experimental data and ranged from L (low or no activity) to LM (low moderate), M (moderate), HM (high moderate) and H (high); a greater degree of uncertainty warranted a split rating such as L/LM (low to low moderate). The LEARN database represents the total knowledge of PAH carcinogenicity for the purposes of the present CASE analysis, and is used to train the CASE SAR method and derive the present CASE model and results.

The 'TEST' set consists of the remaining 106 PAHs, for which inadequate or no carcinogenicity data are available. Carcinogenic potentials for these PAHs were predicted by 'expert judgement'. The 'expert judgement' represents the judgement of one of the authors of the present study who has been extensively involved in mechanism-based SAR prediction of the carcinogenic potential of chemicals, including PAHs (see e.g.: Woo et al., 1985, 1988; Woo and Arcos, 1989). His judgement is based on the bulk of his experience and knowledge in this field, on established rules and mechanisms of PAH carcinogenicity, and, in some instances, on the necessity of erring towards caution when regulations concerning human health are at issue. This judgement involves mechanistic considerations which include evaluation of the presence/absence of a variety of structural features that are known to affect the carcinogenic potential of PAHs by a combination of metabolic, stereochemical, conformational and/or electronic factors (Arcos and Argus, 1974; Dipple, 1984; Yang, 1988; Hecht et al., 1988; Woo et al., 1988).

* SANSS, Structure and Nomenclature Search System, is a computerized search system developed jointly by U.S. National Institutes of Health and Environmental Protection Agency as a component of NIH/EPA's Chemical Information System. SANSS allows searches of chemicals based on names, chemical structures/substructures, or molecular formula, and provides information on availability of physicochemical/toxicity/exposure/use data in other specialized CIS data files. SANSS currently contains over 350000 chemicals from over 100 files or databases.
Pertinent features used in evaluation of the TEST set included: (1) favorable molecular size and shape (e.g., number of condensed rings, coplanarity, molecular encumbrance area, aspect ratio); (2) unsubstituted bay-region benzo ring and/or unsubstituted pseudo bay-region benzo ring; (3) unoccupied peri position adjacent to the bay-region benzo ring (e.g., 5-position of benz[a]anthracene), referred to as the '-P effect' if occupied; (4) substitution at the peri position (inside the bay-region) of inner naphtho moiety of the bay-region with methyl or small alkyl group (e.g., 5-position of chrysene), referred to as the '+B effect'; (5) presence of proelectrophilic substituent (e.g., alkenyl group with terminal double bond); (6) substitution at the L-region with methyl or small alkyl groups; and (7) lack of bulky alkyl substituents. These SAR considerations were complemented by available short-term test results, physicochemical and/or metabolism data and experience-based intuition to arrive at a level of concern for carcinogenic potential. Based on the 'expert judgement' confidence and the perceived potency in comparison to similar compounds, each PAH was assigned a qualitative activity rating ranging from L to H, as in LEARN. It must be cautioned that these ratings represent scientific judgement and consensus would not necessarily be expected if other qualified 'experts' were consulted. However, this is not the primary focus in the present manuscript, nor is it the intent that these 'expert judgements' be used outside this study. Preliminary assessment based, in part, on the guidance of 'expert judgement' represents an important component of regulatory considerations within EPA for predicting potential carcinogenicity of inadequately tested chemicals and for setting priority. Since proponents of programs such as CASE argue that these methods provide automated tools for potentially augmenting or replacing such judgement, they should be evaluated on this basis. It is of interest in this study to evaluate the level of agreement between the 'expert judgement' predictions for the TEST set and the CASE model predictions. While good agreement between these two methods would bolster confidence in both sets of predictions, poor agreement would not necessarily discredit either method and should serve as a basis for method refinement.

The 'VALIDATE' database consists of 24 PAHs for which experimental carcinogenicity data exist (Dipple, 1984), but which were not included in the LEARN database due to the lack of mass spectral data for these compounds. As the name implies, this set of PAHs is used to evaluate the predictive value of the fragments identified in the present CASE-SAR study for extrapolating beyond the LEARN database used in development. This is an extremely important component of the present study, and in our view, should be a necessary component of any correlative SAR model development.

[Figure 1: "Structural formulas of parent PAHs"; structure drawings P1-P63 with occurrence counts under columns L, T and V.]

Fig. 1. Listing of the parent PAH structures and the number of occurrences of compounds within each category for each of the three databases considered (L = LEARN, T = TEST and V = VALIDATE).
It should be noted that while it is a straightforward matter to evaluate the predictive utility of the CASE model using a data set excluded from the original LEARN 'training' set, there is no attempt made here to similarly evaluate the 'expert judgements' in TEST. The 'expert judgement' is based on the sum total of prior knowledge and experience and, unlike a computer database, it is not possible to selectively exclude portions of that knowledge from human memory once it has been incorporated. Hence, the prediction accuracy of the 'expert judgement' is unknown and can only be determined retrospectively.

The number of representatives in each parent PAH structure category which comprise each of the three databases, LEARN(L), TEST(T), and VALIDATE(V) is listed in Fig. 1. A complete listing of the chemicals is provided in the Appendix accompanied by the associated parent PAH structure label. There are a total of 63 parent PAH structures represented in the three PAH databases, 34 of these represented within LEARN. Within individual databases, 41/78 compounds in the LEARN set, 88/106 in the TEST set and 0/24 in the VALIDATE set are alkyl substituted PAHs. The structural diversity within the LEARN and TEST PAH databases is also illustrated in the bar graph of Fig. 2 according to 'subclass' based on the number of rings in the molecules. As can be noted from this figure, the 2- and 3-ring structural subclasses are very unequally represented within these two databases. The implications of this in the present study will become clear.

[Figure 2: bar chart titled "PAH structural diversity"; x-axis: number of rings; series: TEST and LEARN.]

Fig. 2. Plot of the number of PAHs within subclasses (based on the number of rings) which indicates the structural diversity of the LEARN and TEST databases.

[Figure 3: bar chart titled "PAH activity distribution for CASE analysis"; x-axis: carcinogenic activity (- INACTIVE, mar MARGINAL, + ACTIVE); series: TEST and LEARN.]

Fig. 3. Plot of the number of PAHs within the compressed potency categories utilized by the CASE program which indicates the activity distribution within the LEARN and TEST databases.

The present analysis required the PAH structures to be coded and input into the CASE program. For the purposes of a qualitative CASE analysis, the 'L' category was designated INACTIVE(-), the 'L/LM' category was designated MARGINAL (mar), and the 'LM' through 'H' categories were condensed and designated as ACTIVES(+). The distribution of condensed activities within the LEARN and TEST databases is presented graphically in Fig. 3. LEARN contains approximately equal numbers of actives and inactives which implies statistical suitability for purposes of a CASE analysis.
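The condensation of potency ratings into the three CASE categories can be sketched in a few lines of Python. This is an illustration only: the function name `condense` is hypothetical, and the treatment of split ratings other than L/LM (counted as ACTIVE when every part lies in the LM to H range) is an assumption that the text does not spell out.

```python
# Illustrative sketch; names are hypothetical and not taken from the CASE program.
ACTIVE_RATINGS = {"LM", "M", "HM", "H"}


def condense(rating: str) -> str:
    """Map a potency rating (e.g. 'L', 'L/LM', 'HM') to '-', 'mar' or '+'."""
    parts = [p.strip().upper() for p in rating.split("/")]
    if parts == ["L"]:
        return "-"    # INACTIVE
    if parts == ["L", "LM"]:
        return "mar"  # MARGINAL
    if all(p in ACTIVE_RATINGS for p in parts):
        return "+"    # ACTIVE (assumed to cover split ratings such as 'M/HM')
    raise ValueError(f"unrecognised rating: {rating!r}")


if __name__ == "__main__":
    for r in ["L", "L/LM", "LM", "M", "HM", "H"]:
        print(r, "->", condense(r))
```

Ratings outside the scheme raise an error rather than being silently assigned, so an unexpected label in a data file would be noticed before any analysis.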

CASE method

The CASE program is an automated, statistically-based correlative SAR program which utilizes a single type of molecular descriptor: computer generated structural fragments (Klopman, 1984). The program represents a general algorithm which incorporates chemical structural and activity data, and identifies a set of significant molecular fragments which constitutes the CASE-SAR model for activity prediction. The specific CASE model will depend on the structural composition of the database analyzed, the activity endpoint and values considered, and the designated cutoffs for activity categories (active, marginal and inactive).

The general procedure is as follows: A database, consisting of one or more groups of similar chemicals and their associated activities, constitutes the input to the program. The chemical structures are represented by a linear text notation particular to CASE, or entered graphically. In addition, an activity and inactivity cutoff must be specified (if these cutoffs are not the same, a marginal category results). The CASE program then proceeds to calculate all possible fragments (of length 3-12 non-hydrogen atoms) for each molecule in the database. Each fragment is labeled ACTIVE(+) or INACTIVE(-), according to the activity of the parent molecule from which it originated, and the distribution of occurrence of each unique fragment within the database is determined. Only if the fragment distribution is significantly skewed from random is it labeled as activating or inactivating. The final result of the CASE analysis is a small set of significant fragments.

The CASE program can operate in a QSAR mode (in which significant fragments and their associated weights are used to make quantitative predictions of potency) or in a qualitative prediction mode (in which activity predictions for molecules with unknown activity are based on the presence or absence of one or more of these fragments in the molecule). Only the qualitative CASE mode was used in the present study. A molecule is labeled ACTIVE if it contains more occurrences of activating fragments than inactivating fragments, and INACTIVE otherwise. Only in equivocal cases where an equal number of activating and inactivating fragments is present are multiple occurrences of the same fragment within a molecule considered.
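The qualitative decision rule just described can be written down as a short Python sketch. It is one possible reading of the text, not the CASE program's code: the function name `predict`, the use of `Counter` objects keyed by fragment label, and the choice to resolve a tie that persists after counting multiple occurrences as INACTIVE are all assumptions made for illustration.

```python
from collections import Counter


def predict(activating: Counter, inactivating: Counter) -> str:
    """Qualitative call from per-molecule fragment counts.

    Each Counter maps a fragment label (e.g. 'AF1') to the number of
    times that fragment occurs in the molecule.
    """
    # First pass: compare how many distinct fragments of each kind are present.
    n_act = sum(1 for n in activating.values() if n > 0)
    n_inact = sum(1 for n in inactivating.values() if n > 0)
    if n_act != n_inact:
        return "ACTIVE" if n_act > n_inact else "INACTIVE"
    # Equivocal case: only now are multiple occurrences taken into account.
    occ_act = sum(activating.values())
    occ_inact = sum(inactivating.values())
    # A remaining tie, including a molecule with no fragments at all,
    # falls through to INACTIVE (assumed default).
    return "ACTIVE" if occ_act > occ_inact else "INACTIVE"


if __name__ == "__main__":
    print(predict(Counter({"AF1": 1}), Counter()))            # one activating fragment
    print(predict(Counter({"AF1": 2}), Counter({"IF3": 1})))  # tie broken by occurrences
    print(predict(Counter(), Counter()))                      # no fragments present
```

The fragment counts in the usage lines are invented; they do not correspond to particular compounds in the LEARN or TEST sets.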

CASE fragment results

The results of the CASE analysis of the LEARN database are summarized in Figs. 4 and 5. Fig. 4 lists the significant activating fragments (AF) and their incidence within the LEARN database. Fig. 5 lists the same for the significant inactivating fragments (IF). The '+', '-' and 'mar' columns indicate the number of ACTIVE, INACTIVE and MARGINAL molecules, respectively, in which the fragment occurs. The CASE fragments in Figs. 4 and 5 represent the present CASE-SAR model for activity prediction in the discussion that follows.

[Figure 4: "CASE active fragments"; drawings of AF1-AF8 with incidence columns +, -, mar and total.]

Fig. 4. Listing of the activating fragments determined to be significant by the CASE program: CASE fragment represented by solid lines, with non-hydrogen substitution position indicated explicitly (see e.g. AF4); dashed lines indicate an implied framework with no information concerning non-hydrogen substitution (e.g. position 2 in AF8 may or may not be substituted). The incidence of each fragment within the LEARN database is listed according to activity category (+ = active, - = inactive, mar = marginal).

[Figure 5: "CASE inactive fragments"; drawings of IF1-IF4 with incidence columns +, -, mar and total.]

Fig. 5. Listing of the inactivating fragments determined to be significant by the CASE program (see Fig. 4 caption).

By far, AF1 is the most significant fragment in terms of representation within LEARN, occurring in 23/31 or 74% of the LEARN ACTIVE PAHs and only 1 of the INACTIVE PAHs. This fragment indicates ring positions C1-C5 (see Fig. 4) must be unsubstituted for activity, whereas positions C6 and C7 are only specified to be part of a ring. AF1 clearly represents an unsubstituted bay-region benzo ring with an unoccupied peri position at C5. This is consistent with the current 'bay-region' theory of PAH carcinogenicity which stresses the importance of metabolic availability of this region, resonance stabilization of the carbonium ion formed by metabolic activation, and the stereochemical and conformational accessibility of the carbonium ion for critical binding to DNA (Jerina et al., 1977; Jerina and Lehr, 1978). The requirement of an unsubstituted peri position indirectly supports the '-P effect' hypothesis which suggests that substitution at this position causes conformational changes that tend to inhibit PAH carcinogenicity (Hecht et al., 1988). AF1 alone, however, cannot account for the inactivity of the phenanthrenes, and the low-moderate (LM) activities of molecules with modified bay regions (such as LEARN PAHs Nos. 51 and 63) or no apparent bay regions (such as active naphthalenes, anthracenes and LEARN PAHs Nos. 60 and 78).

The remaining activating fragments in Fig. 4 are much less prevalent within the LEARN database. AF2 and AF3 are basically modifications or extensions of the 'bay region' fragment, and AF4-AF8 are derived from the ACTIVE PAHs with no apparent bay regions. The significant inactivating fragments in Fig. 5 serve two functions: to identify portions of molecules likely to be associated with inactivity; and to account for the inactivity of compounds, such as phenanthrene, which might otherwise be presumed to be active by CASE. The relatively low incidence of the IF1, IF3 and IF4 fragments in the LEARN database leads one to doubt their significance. An important consideration, however, is that small structural differences between molecules which translate into large activity differences can yield a fragment with a relatively high confidence level. An example is IF3, which is present in only 4 molecules: LEARN PAHs Nos. 11, 25, 32 and 42. LEARN PAH No. 10, which is ACTIVE, differs from No. 11, which is INACTIVE, only in the methyl substitution associated with IF3; similarly for LEARN PAHs Nos. 30 and 32.

It is interesting that 41/78 of the LEARN compounds and 88/106 of the TEST compounds are methyl or alkyl substituted PAHs, yet only fragments AF4 and IF3 explicitly specify a substitution. In the case of AF4, the substitution is activating. This is consistent with the requirement of an unreactive meso-anthracenic region or 'L-region' in Pullman's electronic 'K,L-region' theory (see Arcos and Argus, 1974) and Flesher's biomethylation hypothesis (Flesher et al., 1986, 1988). In the case of IF3, the methyl substituent is deactivating which is consistent with the 'bay-region' theory. All the remaining fragments in Figs. 4 and 5 refer to unsubstituted portions of parent PAH structures. Hence, in most instances the effect of substituents on PAH activity for this particular CASE model will be only indirectly accounted for by interference with the CASE fragments.

CASE self-test: LEARN

An evaluation of the predictive ability of the current CASE model has two components: 'self-test' and 'validation'. The former gives an indication of the self-consistency of the CASE results, i.e. how well the significant fragments in Figs. 4 and 5 predict the activities of the LEARN database from which they were derived, while the latter evaluates the validity and predictive utility of the CASE results beyond the database used in development. The 'self-test' results are summarized in Table 1 and tabulated for the individual chemicals in the Appendix. Using all 12 of the fragments in Figs. 4 and 5, the qualitative (+/-) prediction accuracy within the LEARN database was excellent, as might be expected. All but 1 ACTIVE (No. 73) and 2 INACTIVES (Nos. 74, 77) were correctly predicted for a total prediction accuracy of 94%. [Note that MARGINALS are not included in these totals.]

As alluded to earlier, the fragments in Figs. 4 and 5 differ greatly with respect to their relative incidence and discriminating ability. For example, AF3 contains the fragment AF1, and, hence, provides no further discrimination of ACTIVES from INACTIVES than the use of AF1 alone, except perhaps in estimating the degree of carcinogenic potency. The second row of Table 1 indicates that as few as 6 of the CASE fragment descriptors can account for the qualitative (+/-) activities of 54/56 or 96% of the LEARN PAHs, 31 of which are ACTIVE. To highlight the importance of the AF1 fragment, the third row of Table 1 indicates the prediction accuracy of the LEARN database with this single fragment which correctly accounts for the activities of 47/56, or 84%, of the non-marginals in the database. [Recall that a molecule which contains no CASE fragments is labeled INACTIVE by default in this prediction mode.]

The self-test indicates that a group of relatively few parameters are capable of quite accurately discriminating ACTIVES from INACTIVES within the LEARN database. However, this does not necessarily reflect the predictive capability and applicability of these results outside the LEARN database. Independent evaluation of the CASE model must be performed on compounds not included in the original LEARN database. Although the TEST database does not fulfill these requirements, since experimental activity assignments are unavailable, it is of interest to gauge the level of agreement between the CASE model and 'expert judgement' predictions for the TEST database.

TABLE 1
CASE SELF-TEST FOR LEARN PAH DATABASE

Actives a   Inactives b   F+ c   F- d   Total e   % f    Frag g
30/31       23/25         2      1      53/56     94%    AF1-AF8, IF1-IF4 h
30/31       24/25         1      1      54/56     96%    AF1, 4, 5, 6, 8; IF1
23/31       24/25         1      8      47/56     84%    AF1

a Fraction of data base Actives correctly predicted by CASE.
b Fraction of data base Inactives correctly predicted by CASE.
c Number of false '+'s.
d Number of false '-'s.
e Total number of correct CASE predictions as fraction of total data.
f Total expressed as % correct CASE predictions.
g CASE fragments used in prediction.
h Refer to CASE fragment labels in Figs. 4 and 5.
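The derived columns of Table 1 (false positives, false negatives, total and per cent correct) follow directly from the Actives and Inactives fractions, and a short Python sketch makes the bookkeeping explicit. The function name `self_test_row` and the dictionary layout are hypothetical. Note that 53/56 evaluates to 94.6%, which the table reports as 94%.

```python
def self_test_row(actives_ok, n_actives, inactives_ok, n_inactives):
    """Return (F+, F-, correct, total, percent correct) for one table row."""
    false_pos = n_inactives - inactives_ok   # inactives called ACTIVE
    false_neg = n_actives - actives_ok       # actives called INACTIVE
    correct = actives_ok + inactives_ok
    total = n_actives + n_inactives          # marginals are excluded
    return false_pos, false_neg, correct, total, 100.0 * correct / total


# Rows of Table 1, keyed by the fragment set used for prediction.
TABLE_1 = {
    "AF1-AF8, IF1-IF4": (30, 31, 23, 25),
    "AF1, 4, 5, 6, 8; IF1": (30, 31, 24, 25),
    "AF1": (23, 31, 24, 25),
}

if __name__ == "__main__":
    for frags, counts in TABLE_1.items():
        fp, fn, ok, n, pct = self_test_row(*counts)
        print(f"{frags:22s} F+={fp} F-={fn} {ok}/{n} = {pct:.1f}%")
```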

The CASE predictions: TEST

Results of the CASE model predictions for the TEST database are summarized in Table 2. CASE predictions utilizing the 12 fragments in Figs. 4 and 5 for the entire TEST database of 106 compounds are summarized in the first row in Table 2. In contrast with the excellent prediction accuracy for the LEARN database, the agreement with the 'expert judgement' in this case is poor: only 64% concordance. If all the 106 chemicals were simply assumed to be INACTIVE, the agreement would be 61/91 or 67%! Even worse, only 9/30 or 30% of the compounds designated by 'expert judgement' as ACTIVE are assigned as such by CASE. These disturbingly disparate predictions between CASE and 'expert judgement' motivated a closer examination of the data.

Upon inspection, it was apparent that the majority of predictions in disagreement between CASE and 'expert judgement' were for substituted naphthalenes and phenanthrenes, i.e. 2- and 3-ring compounds: 20/21 of the ACTIVES and 2/10 of the INACTIVES (designated as such by 'expert judgement') for which CASE and the 'expert judgement' prediction disagreed were substituted naphthalenes, and 5/10 of the 'misidentified' INACTIVES were phenanthrenes (2 were anthracenes). Recall that these subclasses were very unequally represented within the LEARN and TEST databases, as indicated in Figs. 1 and 2. These are also subclasses for which the 'expert judgement' predictions are based on very little data and should be considered highly speculative.

TABLE 2
CASE PREDICTIONS FOR TEST PAH DATABASE

Data       Actives a   Inactives b   F+ c   F- d   Total e   % f    Frag g
Total      9/30        49/61         12     21     58/91     64%    AF1-4, 7, 8; IF1-4 h

* * * Breakdown into subclasses * * *
2 Rings    4/24        12/14         2      20     16/38     42%    AF8
3 Rings    1/2         15/22         7      1      16/24     67%    AF1, 3, 4; IF1-4
> 3 Rings  4/4         22/25         3      0      26/29     90%    AF1, 2, 7; IF2, 3

a-h Refer to footnotes in Table 1.

The LEARN database, in effect, represents the knowledge base which trains CASE to recognize and distinguish an ACTIVE compound from an INACTIVE one. However, it contains only 2 naphthalenes (both ACTIVE) and 2 phenanthrenes (both INACTIVE). With this very limited information, CASE is attempting to extrapolate predictions to the TEST database which contains 43 substituted naphthalenes (24 designated ACTIVE and 14 designated INACTIVE by 'expert judgement') and 20 substituted phenanthrenes (all presumed INACTIVE by 'expert judgement'). The above observation stresses the importance of ensuring adequate representation of structural subclasses within the LEARN database used to train CASE.

While it might seem obvious that CASE had not been adequately trained to predict the activities of the 2- and 3-ring PAH subclasses, this should not necessarily have been presumed prior to analysis. If the mechanism of action of all the ACTIVE PAHs were the same, then fragments significantly associated with activity likely would not be exclusive to a particular subclass. Figs. 4 and 5 indicate that 11/12 of the significant fragments require 3 or more rings. AF8 is the only fragment which could be present in the naphthalenes, yet it has only 2 occurrences (both ACTIVE) within LEARN. Hence, AF8 has no applicability to the rest of the PAH database and is of little use for extrapolation purposes to a diverse set of substituted naphthalenes.

The subclass distinction is not so clear in the case of the 3-ring compounds. The parent phenanthrene structure, LEARN No. 7, contains the activating 'bay region' fragment AF1. Hence, in order to correctly predict the inactivity of this compound, CASE must rely on the presence of one of the INACTIVE fragments (IF1, IF3 or IF4) in Fig. 5. Hence, phenanthrenes by default are presumed by CASE to be ACTIVE unless an INACTIVE fragment is present. In contrast, 'expert judgement' has predicted all phenanthrenes within TEST as INACTIVE based on knowledge of effective metabolic detoxification by oxidation of the 'K-region', or C-9,10 double bond (Lavoie and Rice, 1988).
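The headline figures for the full TEST comparison (first row of Table 2) can be re-derived with a small Python sketch that also computes the 'assume everything is INACTIVE' reference point quoted in the text. The function name `agreement` and its dictionary keys are hypothetical and chosen only for this illustration.

```python
def agreement(actives_ok, n_actives, inactives_ok, n_inactives):
    """Agreement between CASE and 'expert judgement' for one set of compounds."""
    total = n_actives + n_inactives  # marginals are not counted
    return {
        "concordance": (actives_ok + inactives_ok) / total,
        # agreement obtained by calling every compound INACTIVE
        "all_inactive_baseline": n_inactives / total,
        # share of expert-designated ACTIVES that CASE also calls ACTIVE
        "actives_recovered": actives_ok / n_actives,
    }


# First row of Table 2: 9/30 actives and 49/61 inactives in agreement.
TEST_TOTAL = agreement(9, 30, 49, 61)

if __name__ == "__main__":
    for name, value in TEST_TOTAL.items():
        print(f"{name}: {100 * value:.0f}%")
```

Putting the concordance next to the trivial baseline shows why the 64% figure is read as poor agreement: it falls below what a constant INACTIVE call would achieve.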
The possible structural requirement of both an unsubstituted bay-region benzo ring and substitution at or near the 'K-region' (to prevent the competing detoxification) for carcinogenicity of phenanthrene is not recognized by CASE in this study. Note that the 3-ring subclass also includes 5 substituted anthracenes and 1H-phenalene. CASE has limited knowledge of anthracenes from 4 structures in LEARN and agrees with 'expert judgement' that TEST No. 50 is ACTIVE. However, 'expert judgement' disagrees with CASE in predicting structures TEST Nos. 51 and 52 to be INACTIVE due to the presence of a bulky hydrocarbon chain. Whereas the 'expert judgement' is based on prior experience, there are no examples of substituents of this type in LEARN and, hence, CASE has no previous knowledge upon which to assign a significance to such fragments. Finally, 1H-phenalene (TEST No. 74) probably most closely resembles the naphthalenes chemically and, hence, should be grouped within that subclass. To summarize these observations, rows 2 and 3 in Table 2 present a breakdown of the CASE prediction results for the 2- and 3-ring subclasses. These data clearly indicate that these subclasses account for the bulk of the disagreement between the CASE and 'expert judgement' predictions for the TEST database. Thus, it can be concluded that a meaningful evaluation of the CASE vs. 'expert judgement' predictions for the TEST database should exclude predictions for these subclasses, for which LEARN has inadequate data and CASE predictions are suspect. Henceforth, the CASE results will be presumed to be applicable only to PAH structures consisting of more than 3 rings. Since the significance of fragments AF8, IF1 and IF4 is directly attributed to their representation within the 2- and 3-ring subclasses, these fragments will be excluded from further application of the CASE model. Finally, the last row in Table 2 presents the prediction results of CASE vs. 'expert judgement' for the remaining TEST structures with more than 3 rings. This reduces the size of TEST dramatically to 29 PAHs, only 4 of which are ACTIVE. The CASE and 'expert judgement' agreement appears to be greatly improved, although the 90% agreement is somewhat misleading since 86% of these data would be correctly predicted INACTIVE by CASE by default.
In particular, however, there is agreement in the identification of all 4 ACTIVES. These final TEST results indicate that when CASE has adequate information on structural subclasses, the agreement of CASE model predictions with 'expert judgement' provides some independent evidence for the validity of both predictive methods.
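The caveat about default predictions can be checked with a few lines of arithmetic. The sketch below uses only the counts given for the last row of Table 2 (4/4 ACTIVES and 22/25 INACTIVES in agreement, 29 structures in total); the variable names are illustrative and are not part of the CASE program.

```python
# Counts for TEST structures with more than 3 rings (last row of Table 2).
actives_agree, actives_total = 4, 4        # CASE and 'expert judgement' both ACTIVE
inactives_agree, inactives_total = 22, 25  # CASE and 'expert judgement' both INACTIVE

n_total = actives_total + inactives_total  # 29 PAHs

# Agreement actually obtained between CASE and 'expert judgement'.
agreement = (actives_agree + inactives_agree) / n_total

# Agreement that a trivial "always INACTIVE" rule would reach on the same set.
default_agreement = inactives_total / n_total

print(f"CASE vs. expert agreement: {agreement:.0%}")
print(f"All-INACTIVE default:      {default_agreement:.0%}")
```

The gap between the two figures is small, so the meaningful part of the result is the agreement on all 4 ACTIVES rather than the headline percentage.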


CASE predictions: VALIDATE

Assessment of the general utility and applicability of the present CASE model requires additional PAH structures, not included in the original SAR model development, for which experimental carcinogenicity data are available. The VALIDATE database, which consists of 24 PAH structures with more than 3 rings that are not included within LEARN, is suitable for this purpose. These PAH structures and their experimental activities and CASE predictions are listed in the Appendix, and the CASE results are summarized in Table 3. Using the LEARN fragments AF1-7 and IF2-3, CASE correctly predicts 75% of the PAH activities within VALIDATE. At first glance, these results seem disappointing, particularly when compared to the 94% agreement obtained with the 'self-test' prediction in LEARN. When Fig. 1 is consulted, however, it is found that only 3/24 of the VALIDATE structures have parent PAH structures represented within LEARN. Thus, analysis of the VALIDATE database involves almost a total extrapolation of the CASE predictive ability beyond the strict boundaries of the parent PAH structural subclasses represented within LEARN. In this sense, the VALIDATE database comparison provides a severe test of the CASE predictions and could be considered a worst-case scenario. When viewed as a probable lower bound, 75% seems an acceptable level of agreement. In particular, note that there is only a single false negative for these data, i.e. 8/9 or 89% of the ACTIVES are correctly identified as such by CASE. This high sensitivity level of the CASE model is an important consideration for the application of this model to preliminary hazard assessment. In any case, the overall validity of the present CASE model cannot be definitively answered 'yes or no' since the prediction accuracy of the present model will necessarily depend on the degree of extrapolation beyond the LEARN database required.

Analysis of CASE 'incorrect' predictions

In order to utilize the present CASE model for any type of predictive application, it is important to delineate inadequacies of the model during the testing phase to provide cautionary guidance for future application of the model. This requires a closer examination of the CASE incorrect predictions within the databases. In the LEARN set, all 3 incorrect predictions (LEARN PAHs Nos. 73, 74 and 77) are for cyclopenta-PAHs, of which there are 7 total. This indicates a possible inadequacy of the CASE fragments for application to this subclass, due perhaps to the different topology of cyclopenta-PAHs giving rise to structural features (e.g. 'pseudo' bay-regions) not found in hexacyclic PAHs (Roussel et al., 1988; Gold et al., 1988). The ability of CASE to recognize the importance of partial ring saturation from the LEARN set is also questionable. Arguments have already been provided against application of the current CASE model to the 2- and 3-ring PAH subclasses. For the PAHs with more than 3 rings within TEST, there are only 3 CASE predictions which disagree with 'expert judgement'. The 'expert judgement' predicts the compounds to be INACTIVE, in spite of the presence of CASE ACTIVE fragments, due to the presence of a bulky hydrocarbon chain (TEST No. 78), partial ring saturation yielding a cycloalkyl-substituted phenanthrene derivative (TEST No. 95), and steric interference with the bay-region and distortion from coplanarity (TEST No. 99), all features which CASE has been unable to model adequately. Finally, incorrect CASE predictions within VALIDATE are for a very close analog of TEST No. 99 (VALIDATE No. 8) and two close analogs of TEST No. 78 (VALIDATE Nos. 16, 20). Hence,

TABLE 3
CASE PREDICTIONS FOR VALIDATE PAH DATABASE

Actives a   Inactives b   F+ c   F- d   Total e   % f   Frag g
8/9         10/15         5      1      18/24     75%   AF1-3, 5, 6 h; IF2

a-h Refer to footnotes in Table 1.


their incorrect CASE prediction is consistent with the disagreement found within TEST. In addition, there are 2 naphthopyrenes (VALIDATE Nos. 13, 14) and picene (VALIDATE No. 7) for which the CASE prediction is incorrect. These inadequacies should be considered in any future application of the present CASE model. Finally, we make the general observation that the sensitivity of the present CASE model (% ACTIVES correctly identified) exceeds the overall prediction accuracy for all three databases considered, which, as we have stated, is an important consideration for preliminary regulatory assessments. It is generally true in SAR modeling studies that active compounds represent a much more well defined group than inactive compounds and, as a result, conditions for inactivity are rarely adequately modeled. In particular, the present CASE model predicts the bulk of INACTIVES correctly by default, i.e. due to absence of CASE fragments. The implication is that in using the present CASE model for extrapolation, one would expect conditions for PAH carcinogenic activity to be generally more adequately modeled than conditions for inactivity, i.e. a high sensitivity, yet low specificity may result.
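The observation that sensitivity exceeds overall accuracy in all three databases can be reproduced from the tabulated counts. The following is a minimal sketch, assuming the counts from Table 4b (LEARN self-test, marginals excluded), the last row of Table 2 (TEST, more than 3 rings, scored against 'expert judgement' rather than experiment) and Table 3 (VALIDATE); the function and dictionary names are illustrative.

```python
def summarize(actives_ok, actives_n, inactives_ok, inactives_n):
    """Return sensitivity, specificity and overall accuracy from correct/total counts."""
    return {
        "sensitivity": actives_ok / actives_n,        # fraction of ACTIVES identified
        "specificity": inactives_ok / inactives_n,    # fraction of INACTIVES identified
        "accuracy": (actives_ok + inactives_ok) / (actives_n + inactives_n),
    }


# (actives correct, actives total, inactives correct, inactives total)
COUNTS = {
    "LEARN": (30, 31, 23, 25),            # Table 4b, LEARN row
    "TEST (>3 rings)": (4, 4, 22, 25),    # Table 2, last row (vs. 'expert judgement')
    "VALIDATE": (8, 9, 10, 15),           # Table 3
}

results = {name: summarize(*counts) for name, counts in COUNTS.items()}

for name, r in results.items():
    print(f"{name:16s} sensitivity={r['sensitivity']:.0%} "
          f"specificity={r['specificity']:.0%} accuracy={r['accuracy']:.0%}")
```

In each row the sensitivity is the largest of the three figures and the specificity the smallest, which is the pattern the text describes.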

Comparison to previous CASE models

In a previously published study (Klopman, 1984), a CASE analysis was performed on a database of 43 PAH compounds for which carcinogenicity data were available (Dipple et al., 1984). All but one of these 43 PAHs are included in the present study: 20 are contained in the LEARN set and 22 in the VALIDATE set. Since the earlier CASE model was derived from a more limited PAH database than the current LEARN set, and since the CASE fragments obtained in that study differ from those found in the current study, it was of interest to compare the two models. In the earlier CASE study, 2 ACTIVE fragments and 1 INACTIVE fragment were considered highly significant. One of the ACTIVE fragments contained a bay-region benzo ring and resembled IF1 with a third ring at positions C1, C2 specified (see Fig. 5). The second ACTIVE fragment was identical to AF3 in Fig. 4, and the INACTIVE fragment resembled IF2 with position C1 not specified. In order to evaluate the relative utility of the two sets of CASE fragments, their respective abilities to predict the activities for the previous and current PAH data sets were considered. A few PAHs from the earlier CASE study were excluded from this comparison: benzene, and 6 other PAHs (naphthalene, anthracene, naphthacene, benzo[e]pyrene, anthanthrene and dibenz[a,j]anthracene) whose activity assignments have been more recently revised. This left 36 PAHs from the earlier CASE study (16 ACTIVES, 20 INACTIVES and no marginals) which are, henceforth, referred to as the KL84 database. Tables 4a and 4b present the CASE results obtained when the two sets of CASE fragments were applied to both the previous KL84 data and the current LEARN database. Note that the first row of Table 4a and the second row of Table 4b represent the 'self-test' of the respective fragment set. Whereas the LEARN self-test gave excellent agreement of 94%, the KL84 self-test was only 78% accurate, with only 9/16 or 56% of the ACTIVES correctly identified. In addition, the KL84 fragments were unable to adequately predict the LEARN PAH activities, as evidenced by the poor 55% accuracy in Table 4a. In striking contrast, the CASE fragments derived from LEARN predicted 83% of the KL84 PAHs correctly, and most importantly, correctly identified

TABLE 4a
CASE PREDICTIONS USING KL84 FRAGMENTS (See text)

Data    Actives a   Inactives b   F+ c   F- d   Total e   % f
KL84    9/16        19/20         1      7      28/36     78%
LEARN   6/31        25/25         0      23     31/56     55%

a-f Refer to footnotes in Table 1.

TABLE 4b
CASE PREDICTIONS USING LEARN FRAGMENTS (Figs. 4 and 5)

Data    Actives a   Inactives b   F+ c   F- d   Total e   % f
KL84    15/16       15/20         5      1      30/36     83%
LEARN   30/31       23/25         2      1      53/56     94%

a-f Refer to footnotes in Table 1.

94% of the ACTIVES. Another interesting observation is that the AF1 fragment, which was found to be highly significant in the LEARN database, would not have been considered significant in the KL84 study (its distribution was not sufficiently skewed towards ACTIVES, i.e. of the molecules which contained this fragment, 62% were ACTIVE and 38% were INACTIVE). These findings lead one to conclude that the current CASE SAR model, whose fragment descriptors are listed in Figs. 4 and 5 and which was derived from a more extensive PAH database, is of more general utility than the earlier KL84 results. This demonstrates the need for adequate data representation within the CASE training database, and the importance of subsequent testing and independent assessment of the validity of the CASE model. Finally, we briefly consider how the current CASE-SAR model relates to a third published CASE-PAH model, denoted ROS85 (Rosenkranz et al., 1985; Mitchell et al., 1986; Rosenkranz and Klopman, 1989). The ROS85 CASE model (fragments and fragment-based activity predictions) is derived from a different PAH database with a different profile of structures than the current CASE model or the KL84 model. The ROS85 database consists of 48 methyl or fluoride derivatives of only 3 parent PAH structures: chrysene, benzo[a]pyrene and benz[a]anthracene. The activity assignments are based on ED50 values from skin tumor induction in mice, and of the 9 PAHs common to both LEARN and ROS85, 7 are assigned different activities (+/M/-) in the two databases. Hence, it is difficult to meaningfully compare the present CASE model, which was derived from the LEARN database, with the ROS85 model; the ROS85 model has knowledge of a much wider range of substitution patterns than the present model, yet for very few parent compounds, and the activity measures do not appear to be comparable. As a result, the ROS85 model has little applicability to the LEARN database. While the present CASE model correctly identifies all of the ROS85 active compounds, it is ineffective in identifying the ROS85 inactive compounds because it is not sufficiently trained to recognize inactivity features represented by the large variety of substitution patterns within ROS85. This supports the view that the current CASE model is fairly sensitive to conditions for PAH carcinogenic activity for a wide variety of parent PAH structures, yet may inadequately model conditions for inactivity in extrapolations beyond LEARN.

Conclusion

In our view, the results of a correlative SAR program such as CASE should not be applied to prediction in isolation, but rather should be used to complement current knowledge and 'expert judgement'. The present analysis supports this view and has indicated that when a degree of insight and caution is applied in the interpretation and application of the CASE results, they can be potentially useful. The present study has evaluated a CASE model based on a more extensive PAH database than previous CASE PAH model studies, and has attempted to provide a careful strategy for interpreting the CASE results, a discussion of the limitations of the present CASE model and cautionary guidance in the future use of such results. In particular, application of the present CASE model should be restricted to new PAH structures which fall within the structural subclasses adequately represented within the 'training' database LEARN, i.e. PAHs with more than 3 rings, and should take into account the history


of CASE error within the databases evaluated. For example, there are inadequacies in CASE predictions for cyclopenta PAHs, and in modeling the effect of steric interference from bulky hydrocarbon or aromatic groups and the effect of partial hydrogenation. The CASE results have been derived from a LEARN PAH database which was not expressly compiled for the purpose of the current SAR investigation. Hence, data gaps within LEARN which restrict the knowledge base of the CASE analysis do not represent fundamental limitations of CASE, and could perhaps be overcome by expanding the LEARN database in future applications. Subject to data gap constraints, CASE predictions are in good agreement with 'expert judgement' predictions of PAH carcinogenicity within TEST, and in good agreement with experimental activities within LEARN and VALIDATE, demonstrating a very low false-negative rate in each case. Also, the CASE fragments obtained in the present analysis are consistent with current mechanistic theories of PAH carcinogenesis involving the bay-region, ' - P effect', and L-region methylation. In conclusion, the CASE fragments derived in the current study should provide a useful tool for assisting and complementing 'expert judgement' in the preliminary screening of PAHs for carcinogenic activity.
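The cautionary guidance above can be written down as a simple pre-screening check to run before a CASE prediction is trusted. The sketch below is illustrative only: `PAHQuery` and `applicability_warnings` are hypothetical names, the structural flags are assumed to be supplied by the user (they are not CASE output), and the rules encode only the limitations stated in this section.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PAHQuery:
    """Hypothetical, user-supplied description of a query PAH."""
    ring_count: int
    cyclopenta_fused: bool = False        # contains a fused five-membered ring
    bulky_substituent: bool = False       # bulky hydrocarbon or aromatic group
    partially_hydrogenated: bool = False  # partial ring saturation


def applicability_warnings(q: PAHQuery) -> List[str]:
    """List the reasons a CASE prediction for this structure should be treated with caution."""
    warnings = []
    if q.ring_count <= 3:
        warnings.append("outside training domain: LEARN covers only PAHs with more than 3 rings adequately")
    if q.cyclopenta_fused:
        warnings.append("cyclopenta-PAH: CASE errors observed for this subclass")
    if q.bulky_substituent:
        warnings.append("bulky substituent: steric interference is not modeled adequately")
    if q.partially_hydrogenated:
        warnings.append("partial hydrogenation: effect is not modeled adequately")
    return warnings


# Example: a substituted naphthalene versus an unsubstituted five-ring PAH.
print(applicability_warnings(PAHQuery(ring_count=2)))
print(applicability_warnings(PAHQuery(ring_count=5)))
```

An empty list does not mean the prediction is correct; it only means that none of the known weaknesses listed here applies.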

Acknowledgements

We gratefully acknowledge Professor Gilles Klopman for graciously allowing one of us (A.M.R.) to visit his laboratory and run the CASE program, and for helpful suggestions. We also appreciate the helpful comments and suggestions of Drs. Stephen Nesnow, Joseph Arcos, James Rabinowitz, and John Ashby in review of the manuscript.

References

Arcos, J.C., and M.F. Argus (1974) Chemical Induction of Cancer: Structural Bases and Biologic Mechanisms, Polynuclear Compounds, Academic Press, New York, Vol. IIA, 387 pp.
Crittenden, B.D. and R. Long (1976) The mechanisms of formation of polynuclear aromatic compounds in combustion systems, in: Freudenthal and P.W. Jones (Eds.), Carcinogenesis: A Comprehensive Survey, Polynuclear Aromatic Hydrocarbons: Chemistry, Metabolism and Carcinogenesis, Vol. 1, Raven, New York, pp. 209-223.
Dipple, A., R.C. Moschel and C.A.H. Bigger (1984) Polynuclear Aromatic Carcinogens, in: C.E. Searle (Ed.), Chemical Carcinogens, Vol. 1, 2nd edn., ACS Monograph 182, American Chemical Society, Washington, DC, pp. 41-163.
Enslein, K., and P. Craig (1982) Carcinogenesis: A predictive structure-activity model, J. Toxicol. Environ. Health, 10, 521-530.
Flesher, J.W., S.R. Myers, C.H. Bergo and J.W. Blake (1986) Bioalkylation of dibenz[a,h]anthracene in rat liver cytosol, Chem.-Biol. Interact., 57, 223-234.
Flesher, J.W., S.R. Myers and J.W. Blake (1988) Bioalkylation of polynuclear aromatic hydrocarbons in vivo, A prediction of carcinogenic activity in polynuclear aromatic hydrocarbons: A decade of progress, in: M. Cooke and A.J. Dennis (Eds.), The 10th International Symposium on Polynuclear Aromatic Hydrocarbons, Battelle Press, Columbus, OH, pp. 261-276.
Gold, A., R. Sangaiah and S. Nesnow (1988) Structure-activity relationships in the metabolism and biological activity of cyclopenta-fused polycyclic aromatic systems, in: S.K. Yang and B.D. Silverman (Eds.), Polycyclic Aromatic Hydrocarbon Carcinogenesis: Structure-Activity Relationships, Vol. I, CRC Press, Boca Raton, FL, pp. 177-207.
Grimmer, G. (1983) Environmental Carcinogens: Polycyclic Aromatic Hydrocarbons, CRC Press, Boca Raton, FL, 261 pp.
Hecht, S.S., A.A. Melikian and S. Amin (1988) Effects of methyl substitution on the tumorigenicity and metabolic activation of polycyclic aromatic hydrocarbons, in: S.K. Yang and B.D. Silverman (Eds.), Polycyclic Aromatic Hydrocarbon Carcinogenesis: Structure-Activity Relationships, Vol. I, CRC Press, Boca Raton, FL, pp. 97-128.
International Agency for Research on Cancer (1983) Polynuclear Aromatic Compounds, Part 1, Chemical, Environmental and Experimental Data, IARC Monograph on the Evaluation of the Carcinogenic Risk of Chemicals to Humans, Vol. 32, 477 pp.
International Agency for Research on Cancer (1987) Overall Evaluations of Carcinogenicity, Evaluation of Carcinogenic Risks to Humans, Suppl. 7, An Updating of IARC Monographs Vols. 1-42, 440 pp.
Jerina, D.M., and R.E. Lehr (1978) The bay-region theory, A quantum mechanical approach to aromatic hydrocarbon induced carcinogenicity, in: V. Ullrich, I. Roots, A. Hilderbrandt and R.W. Estabrook (Eds.), Microsomes and Drug Oxidation, Pergamon, Oxford, pp. 709-720.
Jerina, D.M., R.E. Lehr, M. Schaefer-Ridder, H. Yagi, J.M. Karle, D.R. Thakker, A.W. Wood, A.Y.H. Lu, D. Ryan, S. West, W. Levin and A.H. Conney (1977) Bay-region epoxides of dihydrodiols: A concept which may explain the mutagenic and carcinogenic activity of benzo[a]pyrene and benzo[a]anthracene, in: H.H. Hiatt, J.D. Watson and J.A. Winsten (Eds.), Origins of Human Cancer, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, pp. 639-658.
Klopman, G. (1984) Artificial intelligence approach to structure-activity studies, Computer Automated Structure Evaluation of biological activity of organic molecules, J. Am. Chem. Soc., 106, 7315-7321.
Lavoie, E.J., and J.E. Rice (1988) Structure-activity relationships among tricyclic polynuclear aromatic hydrocarbons, in: S.K. Yang and B.D. Silverman (Eds.), Polycyclic Aromatic Hydrocarbon Carcinogenesis: Structure-Activity Relationships, Vol. I, CRC Press, Boca Raton, FL, pp. 152-175.
Mitchell, C.S., G. Klopman and H.S. Rosenkranz (1986) Computer automated evaluation of mutagenicity and carcinogenicity of selected polycyclic aromatic hydrocarbons, in: M. Cooke and A.J. Dennis (Eds.), Polynuclear Aromatic Hydrocarbons: Chemistry, Characterization and Carcinogenesis (9th Int. Symp.), Battelle Press, Columbus, OH, pp. 611-624.
Nesnow, S., M. Argus, H. Bergman, K. Chu, C. Frith, T. Helmes, R. McGaughy, V. Ray, T.J. Slaga, R. Tennant and E. Weisburger (1986) Chemical carcinogens, A review and analysis of the literature of selected chemicals and the establishment of the Gene-Tox program, Mutation Res., 185, 1-195.
Nordén, B., U. Edlund and S. Wold (1978) Carcinogenicity of polycyclic aromatic hydrocarbons studied by SIMCA pattern recognition, Acta Chem. Scand., B32, 602-608.
Richard, A.M., J.R. Rabinowitz and M.D. Waters (1989) Strategies for the use of computational SAR methods in assessing genotoxicity, Mutation Res., 221, 181-196.
Rosenkranz, H.S. and G. Klopman (1989) Mechanistic insights gained from an analysis of carcinogenic polycyclic aromatic hydrocarbons with the Computer Automated Structure Evaluation system, J. Am. Coll. Toxicol., 8, 1091-1101.
Rosenkranz, H.S., C.S. Mitchell and G. Klopman (1985) Artificial intelligence and Bayesian decision theory in the prediction of chemical carcinogens, Mutation Res., 150, 1-11.
Roussel, O.P., O. Chalvet, B. Ekert, J.M. Lhoste, J. Mispelter, S. Saguem and F. Zajdela (1988) Their biological activities, structure-activity relationships and metabolic activation, in: S.K. Yang and B.D. Silverman (Eds.), Polycyclic Aromatic Hydrocarbon Carcinogenesis: Structure-Activity Relationships, Vol. I, CRC Press, Boca Raton, FL, pp. 67-88.
USPHS (1951-1985) Survey of compounds which have been tested for carcinogenic activity, U.S. Public Health Service Publication No. 149 and subsequent updating volumes.
Woo, Y.-T. and J.C. Arcos (1981) Environmental chemicals, in: J.M. Sontag (Ed.), Carcinogens in Industry and the Environment, Marcel Dekker, New York, pp. 167-281.
Woo, Y.-T., and J.C. Arcos (1989) Role of structure-activity relationship analysis in evaluation of pesticides for potential carcinogenicity, in: N.M. Ragsdale and R.E. Menzer (Eds.), Carcinogenicity and Pesticides: Principles, Issues and Relationships, ACS Symposium Series 414, American Chemical Society, Washington, DC, pp. 175-200.
Woo, Y.-T., D.Y. Lai, J.C. Arcos and M.F. Argus (1985) Chemical Induction of Cancer: Structural Basis and Biologic Mechanisms, Vol. IIIB (Aliphatic and Polyhalogenated Carcinogens), Academic Press, New York, 598 pp.
Woo, Y.-T., D.Y. Lai, J.C. Arcos and M.F. Argus (1988) Chemical Induction of Cancer: Structural Basis and Biologic Mechanisms, Vol. IIIC (Natural, Metal, Fiber and Macromolecular Carcinogens), Academic Press, New York, 869 pp.
Yang, S.K., and B.D. Silverman (Eds.) (1988) Polycyclic Aromatic Hydrocarbon Carcinogenesis: Structure-Activity Relationships, CRC Press, Boca Raton, FL, Vol. I, 213 pp. and Vol. II, 210 pp.
Yuan, M., and P.C. Jurs (1980) Computer-assisted structure-activity studies of chemical carcinogens: a polycyclic aromatic hydrocarbon data set, Toxicol. Appl. Pharmacol., 52, 294-312.


APPENDIX

Listings of PAH compounds, experimental carcinogenic activities and CASE predictions for 3 databases: LEARN, TEST and VALIDATE

QUANTITATIVE ACTIVITY SCALE:
L = Non-carcinogen
LM, M, HM, H = Known or suspect carcinogen with low moderate (LM), moderate (M), high moderate (HM), or high (H) potency

CONDENSED ACTIVITY SCALE:
INACTIVE "-"
MARGINAL "mar"
ACTIVE "+"

HEADINGS:
P# = PARENT PAH label in Fig. 1
EXPERIMENTAL ACTIVITIES = quantitative and condensed experimental carcinogenic activities
EXPERT JUDGEMENT = "expert judgement" qualitative and condensed carcinogenic activity prediction
CASE = CASE qualitative prediction
AF# = ACTIVE fragment # from Fig. 4
IF# = INACTIVE fragment # from Fig. 5

I. "LEARN" PAH DATABASE (78 TOTAL)

#    P#     EXPERIMENTAL ACTIVITY (quantitative)   NAME
1.   (P1)   LM/M    NAPHTHALENE
2.   (P1)   LM      NAPHTHALENE, 2-METHYL
3.   (P2)   L/LM    ANTHRACENE
4.   (P2)   L/LM    ANTHRACENE, 2-METHYL
5.   (P2)   L/LM    ANTHRACENE, 9-METHYL
6.   (P2)   M       ANTHRACENE, 9,10-DIMETHYL
7.   (P3)   L       PHENANTHRENE
8.   (P3)   L       PHENANTHRENE, 1-METHYL-7-(1-METHYLETHYL)
9.   (P5)   L/LM    NAPHTHACENE
10.  (P6)   LM      BENZ[A]ANTHRACENE
11.  (P6)   L       BENZ[A]ANTHRACENE, 1-METHYL
12.  (P6)   L       BENZ[A]ANTHRACENE, 2-METHYL
13.  (P6)   L       BENZ[A]ANTHRACENE, 3-METHYL
14.  (P6)   L       BENZ[A]ANTHRACENE, 4-METHYL
15.  (P6)   L       BENZ[A]ANTHRACENE, 5-METHYL
16.  (P6)   LM/M    BENZ[A]ANTHRACENE, 6-METHYL
17.  (P6)   HM      BENZ[A]ANTHRACENE, 7-METHYL
18.  (P6)   M       BENZ[A]ANTHRACENE, 8-METHYL
19.  (P6)   L/LM    BENZ[A]ANTHRACENE, 9-METHYL
20.  (P6)   L/LM    BENZ[A]ANTHRACENE, 10-METHYL
21.  (P6)   L/LM    BENZ[A]ANTHRACENE, 11-METHYL
22.  (P6)   HM      BENZ[A]ANTHRACENE, 12-METHYL
23.  (P6)   M/HM    BENZ[A]ANTHRACENE, 7-ETHYL
24.  (P6)   L/LM    BENZ[A]ANTHRACENE, 8-PROPYL
25.  (P6)   L       BENZ[A]ANTHRACENE, 1,12-DIMETHYL
26.  (P6)   H       BENZ[A]ANTHRACENE, 7,12-DIMETHYL
27.  (P6)   L/LM    BENZ[A]ANTHRACENE, 5,6-DIHYDRO-7,12-DIMETHYL
28.  (P6)   L/LM    BENZ[A]ANTHRACENE, 7,12-DIHYDRO-7,12-DIMETHYL
29.  (P6)   LM/M    BENZ[A]ANTHRACENE, 8,9,10,11-TETRAHYDRO-7,12-DIMETHYL
30.  (P7)   LM      CHRYSENE
31.  (P7)   L/LM    CHRYSENE, 1-METHYL
32.  (P7)   L/LM    CHRYSENE, 4-METHYL
33.  (P7)   M/HM    CHRYSENE, 5-METHYL
34.  (P7)   L/LM    CHRYSENE, 6-METHYL
35.  (P8)   LM/M    BENZO[C]PHENANTHRENE
36.  (P8)   L/LM    BENZO[C]PHENANTHRENE, 2-METHYL
37.  (P8)   LM/M    BENZO[C]PHENANTHRENE, 3-METHYL
38.  (P8)   LM/M    BENZO[C]PHENANTHRENE, 4-METHYL
39.  (P8)   M       BENZO[C]PHENANTHRENE, 5-METHYL
40.  (P8)   M       BENZO[C]PHENANTHRENE, 6-METHYL
41.  (P8)   L       BENZO[C]PHENANTHRENE, 5,8-DIMETHYL
42.  (P9)   L       TRIPHENYLENE, 1-METHYL

+ a + b mar mar mar +

CASE + +

AF#

IF#

8 8 2 2

+ +

4 4 1

mar b +

+

1,2

+ + + mar mar mar + + mar

+ + + + + + + + +

2 1,2 1,2,4 1,2 1,2 1,2 1 1 1,2,4 1,2

+ mar mar + + mar mar + mar + mar + + + +

+

1,4

+ + + +

1 1 1 1 1 1 1 1 1 1 1 1 1

1,4 2 2 2,3 2 2 2 2

3

+ + + + + + + +

3

1.3


#    P#      EXPERIMENTAL ACTIVITY (quantitative)   NAME
43.  (P10)   L       PYRENE
44.  (P10)   L       PYRENE, 1-METHYL
45.  (P10)   L       PYRENE, 2-METHYL
46.  (P10)   L       PYRENE, 4-METHYL
47.  (P10)   L       PYRENE, 1,2,3,3a,4,5-HEXAHYDRO
48.  (P11)   L       PENTACENE
49.  (P12)   L/LM    BENZO[B]CHRYSENE
50.  (P13)   H       BENZO[A]PYRENE
51.  (P14)   LM      BENZO[E]PYRENE
52.  (P15)   L       PERYLENE
53.  (P16)   L       BENZO[B]TRIPHENYLENE, 10,11,12,13-TETRAHYDRO
54.  (P17)   M/HM    DIBENZ[A,H]ANTHRACENE
55.  (P18)   L/LM    DIBENZ[A,J]ANTHRACENE
56.  (P18)   L/LM    DIBENZ[A,J]ANTHRACENE, 1,2,3,4-TETRAHYDRO
57.  (P18)   L/LM    DIBENZ[A,J]ANTHRACENE, 1,2,3,4,8,9-HEXAHYDRO
58.  (P26)   H       DIBENZO[DEF,P]CHRYSENE
59.  (P27)   L       DIBENZO[FG,OP]NAPHTHACENE
60.  (P28)   M       DIBENZO[DEF,MNO]CHRYSENE
61.  (P28)   L/LM    DIBENZO[DEF,MNO]CHRYSENE, 1,2,3,7,8,9-HEXAHYDRO
62.  (P29)   HM/H    NAPHTHO[1,2,3,4-DEF]CHRYSENE
63.  (P30)   LM      BENZO[GHI]PERYLENE
64.  (P44)   L       CORONENE
65.  (P47)   L/LM    ACENAPHTHYLENE
66.  (P47)   L       ACENAPHTHYLENE, 1,2-DIHYDRO
67.  (P48)   L       4-H-CYCLOPENTA[DEF]PHENANTHRENE
68.  (P49)   L/LM    11-H-BENZO[A]FLUORENE
69.  (P50)   L/LM    7H-BENZO[C]FLUORENE
70.  (P51)   L       FLUORANTHENE
71.  (P54)   HM/H    BENZO[B]FLUORANTHENE
72.  (P55)   HM/H    BENZO[J]FLUORANTHENE
73.  (P56)   M       BENZO[K]FLUORANTHENE
74.  (P57)   L       BENZO[GHI]FLUORANTHENE
75.  (P58)   H       3-METHYLCHOLANTHRENE
76.  (P58)   H       3-METHYLBENZ[J]ACEANTHRYLENE
77.  (P58)   L       3-METHYLCHOLANTHRENE, 11,12-DIHYDRO
78.  (P60)   M       INDENO[1,2,3-CD]PYRENE

4

2 mar + +

+ +

1 1,3,6 5,5

+ mar mar mar +

+ + +

1 1 1

+

1,3,5,6

+ mar + +

+

6,7

+ +

1,2,3,5 5

+

1

+ + -° +" + + +" +

1,2,3 1

2 4 1

mar mar mar + + + + + +

5 1,2 1,2 2 2,7

a Based on unaudited preliminary NTP (National Toxicology Program) data showing weak to moderately active positive data.
b Data adequacy may be marginal; conservative predictions err towards greater activity.
* Incorrect CASE prediction.

II.

#

"TEST"

5. 6.

7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. lB. 19. 20.

DATABASE

(106

NAME

2 RING PAH's

1. 2. 3. 4.

PAH

TOTAL)

P#

EXPERT JUDGEMENT

CASE

AF#

a

NAPHTHALENE, NAPHTHALENE, NAPHTHALENE, NAPHTHALENE, NAPHTHALENE, NAPHTHALENE, NAPHTHALENE, NAPHTHALENE, NAPHTHALENE, NAPHTHALENE, NAPHTHALENE, NAPHTHALENE, NAPHTHALENE, NAPHTHALENE, NAPHTHALENE, NAPHTHALENE, NAPHTHALENE, NAPHTHALENE, NAPHTHALENE, NAPHTHALENE,

1-METHYL 1-ETHYL 2-ETHYL 2-ETHENYL 1-(2-PROPENYL) 2-(1-METHYLETHYL) 1-BUTYL 2-BUTYL 1-(2-METHYLPROPYL) 1-(1,1-DIMETHYLETHYL) 2-(1,1-DIMETHYLETHYL) 1-(1-CYCLOPENTEN-1-YL) 1-(1-CYCLOHEXEN-1-YL) 2-HEXYL 2-(1-CYCLOPENTEN-I-¥L) 2-(1-CYCLOHEXEN-1-YL) 1-UNDECYL 1-PENTADECYL 1-(1-DECYL-I-UNDECENYL) 1,2-DIMETHYL

(P1) (P1) (P1) (P1) (P1) (P1) (P1) (P1) (P1) (P1) (P1) (P1) (P1) (P1) (P1) (P1) (P1) (P1) (P1) (P1)

LM LM LM M / HM M/HM L / LM L / LM L/LM L L L LM LM L LM LM L L L LM

+ + + + + mar mar mar

-* -° + + - ° +

6 8

+

8 6

+ +

+* -* -* + + +

+



+ +

8

8 8 8

IF._##


2;. 22. 23. 24. 25. 26. 27. 28. 29. 30. 31. 32. 33. 34. 35. 36. 37. 38. 39. 40. 41. 42. 43. 44. 45. 46. 47.

NAPHTHALENE NAPHTHALENE NAPHTHALENE NAPHTHALENE NAPHTHALENE NAPHTHALENE NAPHTHALENE NAPHTHALENE NAPHTHALENE NAPHTHALENE NAPHTHALENE NAPHTHALENE NAPHTHALENE NAPHTHALENE NAPHTHALENE NAPHTHALENE NAPHTHALENE NAPHTHALENE NAPHTHALENE NAPHTHALENE NAPHTHALENE NAPHTHALENE NAPHTHALENE NAPHTHALENE NAPHTHALENE NAPHTHALENE NAPHTHALENE

3 RING PAH's

48. 49. 50. 51. 52. 53. 54. 55. 56. 57. 58. 59. 60. 61. 62. 63. 64. 65. 66. 67. 68 69. 70. 71. 72. 73. 74.

,3-DIMETHYL ,4-DIMETHYL ,5-DIMETHYL ,6-DIMETHYL ,7-DIMETHYL ,8-DIMETHYL 2,3-DIMETHYL 2,6-DIMETHYL 2,7-DIMETHYL 1,2-DIETHYL 1-METHYL-7-(1-METHYLETHYL) 2-METHYL-I-PROPYL 1,8-DI-1-PROPYNYL 2,6-BIS(1,1-DIMETHYLETHYL) 2,7-BIS(I, I-DIMETHYLETHYL) 2-BUTYL-3-HEXYL 7-BUTYL- 1-HEXYL 2,3-DIHEXYL 1,3,6-TRIMETHYL 1,4,5-TRIMETHYL 1,4,6-TRIMETHYL 1,6,7-TRIMETHYL 2,3,6-TRIMETHYL 1,6-DIMETHYL-4-(1-METHYLETHYL) 1,4-DIMETHYL-5-OCTYL 2,6-DIMETHYL-3-OCTYL 1,2,3-TRIM ETHYL-4-PROPENYL

(P1) (m) (P1) (P1)

(m) (m) (P1) (PI) (P1) (m) (m) (P1) (Pl) (m) (m) (P1) (Pl) (P1) (P1) (P1) (P1) (P1) (P1) (P1)

LM LM LM LM LM LM LM LM LM LM LM LM LM/M L L L L L L/LM L/LM L/LM L/LM L/LM L/LM L L LM

(P2) (P2) (P2) {P2) (P2) (P3) (P3) (P3) (P3) (P3) (P3) (P3) (P3) (P3) (P3) (P3) (P3) (P3) (P3) (P3) (P3) (P3) (P3) (P3) (P3) (P3) (P4)

L/LM L/LM HM L L L L L L/LM L L L L L L L L L L L L L L L L L LM

(P6) (P7) (PT) (P7) (P7) (PS) (PS) (Pg) (Pg) (P9) (P10) (P10) (m0)

L LM

(PT)

(Pl)

+ + 4+ 44-

t

44+ + +

m

44-

mar mar mar mar mar mar

+

a

ANTHRACENE, I-METHYL ANTHRACENE, 2-ETHYL ANTHRACENE, 9-ETHENYL ANTHRACENE, 9-BUTYL ANTHRACENE, 9-DODECYL PHENANTHRENE, 1-METHYL PHENANTHRENE, 2-METHYL PHENANTHRENE, 3-METHYL PHENANTHRENE, 4-METHYL PHENANTHRENE, 9-METHYL PHENANTHRENE, 2-ETHYL PHENANTHRENE, 9-ETHYL PHENANTHRENE, 9-BUFYL PHENANTHRENE, 9-NONYL PHENANTHRENE, 2-DODECYL PHENANTHRENE, 9-DODECYL PHENANTHRENE, 2,3-DIMETHYL PHENANTHRENE, 2,5-DIMETHYL PHENANTHRENE, 2,7-DIMETHYL PHENANTHRENE, 3,6-DIMETHYL PHENANTHRENE, 4,5-DIMETHYL PHENANTHRENE, 9,10-DIMETHYL PHENANTHRENE, 3,9-81S(1,1-DIMETHYLETHYL) PHENANTHRENE, 2,3,5-TRIMETHYL PHENANTHRENE, 2,4,5,7-TETRAMETHYL PHENANTHRENE, 2,4,5,6-TETRAMETHYL 1H-PHENALENE

mar mar +

+ +* +° +* +*

mar +*

+ +

BENZ[A]ANTHRACENE,7,12-DIHYDRO CHRYSENE, 3-METHYL CHRYSENE, 5-ETHYL CHRYSENE, 6-OCTYL CHRYSENE, 11-BUTYL-1,2,3,4-TETRAHYDRO BENZO[C]PHENANTHRENE, 1-METHYL BENZO[C]PHENANTHRENE, 1,12-DIMETHYL TRIPHENYLENE TRIPHENYLENE, 2-METHYL TRIPHENYLENE, 1,2,3,4-TETRAHYDRO PYRENE, 1-DECYL PYRENE, f,3-DIMETHYL PYRENE, 1,9-DIMETHYL

+

+ + +"

L L/LM L L L L L L L

3,4 1 1 1 1 1

1

M L/LM

1

3,4

>3 RING PAH's

75. 76. 77. 78. 79. 80. 81. 82. 83. 84. 85. 86. 87.

1,3,4 4 4 1 1 1 1 1 1 1 1 1 1 1 1

mar mar

+


88. 89. 90. 91. 92. 93. 94. 95. 96. 97. 98. 99. 100. 101. 102. 103. 104. 105. 106.

PYRENE, 1,6-BIS(1,1-DIMETHYLETHYL) PYRENE, 4,5-DIHYDRO PYRENE, 4-DECYL-1,2,3,6,7,8-HEXAHYDRO PERYLENE, 3-HEXYL BENZO[B]TRIPHENYLENE, 9,14-DIHYDRO BENZO[B]TRIPHENYLENE, 1,2,3,4,10,11,12,13-OCTAHYDRO DIBENZ[A,J]ANTHRACENE, 7,14-DIHYDRO DIBENZ[A,J]ANTHRACENE, 1,2,3,4,4A,5,6,14B-OCTAHYDRO BENZO[G,H,I]PERYLENE, 3,4-DIHYDRO BENZO[G,H,I]PERYLENE, 3,4,11,12-TETRAHYDRO BENZO[G,H,I]PERYLENE, 5,6,7,8,9,10-HEXAHYDRO PHENANTH RO[3,4-C]PHENANTH RE NE OVALENE ACENAPHTHYLENE, 1,2-DIHYDRO-5-PENTADECYCL 15H-CYCLOPENTA[A]PHENANTHRENE, 16,17-DIHYDRO-3 -( 1-M ETHYLETHYL) 11-H-BENZO[B]FLUORENE CYCLOPENTA[CD]PYRENE ANTHRA[1,2-E]ACEPHENANTHRYLENE BENZ[A]IN DENO[ 1,2,3-D E]NAPHTHACEN E

(P10) (P10) (P10) (P15) (P16) (P16) (P18) (P18) (P30) (P30) (P30) (P31) (P46) (P47)

L L ! L L L L/LM L L L L L L L

mar

(P52) (P53) (P59) (P62) (P63)

L L HM L LM

d

÷*



1

+ b

+

+

+

7 1 1,2

2 2

a The "expert judgement" ratings of the 2- and 3- ring compounds should be considered speculative and used with caution; the CASE predictions for these compounds were derived from very limited data (refer to discussion in text). b Limited positive experimental carcinogenicity data available (see e.g. IARC, 1983; Nesnow et al., 1986). * CASE prediction in disagreement with "expert judgement" prediction.

III. "VALIDATE" PAH DATABASE (24 TOTAL)

#    P#      EXPERIMENTAL ACTIVITY (quantitative)   NAME
1.   (P9)    L       TRIPHENYLENE (TEST #82)
2.   (P16)   LM      BENZO[B]TRIPHENYLENE
3.   (P16)   L       BENZO[B]TRIPHENYLENE, 9,14-DIHYDRO (TEST #92)
4.   (P19)   L       BENZO[A]NAPHTHACENE
5.   (P20)   M       BENZO[C]CHRYSENE
6.   (P21)   M       BENZO[G]CHRYSENE
7.   (P22)   L       PICENE
8.   (P23)   L       DIBENZO[C,G]PHENANTHRENE
9.   (P24)   L       DIBENZO[B,G]PHENANTHRENE
10.  (P25)   L       PENTAPHENE
11.  (P32)   H       DIBENZO[A,H]PYRENE
12.  (P33)   LM      DIBENZO[A,I]PYRENE
13.  (P34)   M       NAPHTHO[2,3-A]PYRENE
14.  (P35)   L       NAPHTHO[2,3-E]PYRENE
15.  (P36)   L       DIBENZO[B,K]CHRYSENE
16.  (P37)   L       DIBENZO[A,J]NAPHTHACENE
17.  (P38)   M       DIBENZO[A,C]NAPHTHACENE
18.  (P39)   L       ANTH[1,2-A]ANTHRACENE
19.  (P40)   L       BENZO[C]PENTAPHENE
20.  (P41)   L       NAPHTHO[1,2-B]TRIPHENYLENE
21.  (P42)   L       HEXACENE
22.  (P43)   L       BENZO[B]PENTAPHENE
23.  (P45)   M       TRIBENZO[A,E,I]PYRENE
24.  (P61)   HM/H    DIBENZO[A,E]FLUORANTHENE

* Incorrect CASE prediction

CASE

AF#

+

+

2,2

+ +

+ + +° +*

1,2 1 1 1 1 1

+ + +

+ + +*

1,3 1,3 6 5

+" +

1,2 2,2

+*

1 1

IF.__##

2,2

2 2 2 2

+

2 2 2 2 2

+ +

+ +

1,2,3 1,2,3
