Mutation Research, 266 (1992) 1-6 © 1992 Elsevier Science Publishers B.V. All rights reserved 0027-5107/92/$05.00

MUTREV ~310

INTERNATIONAL COMMISSION FOR PROTECTION AGAINST ENVIRONMENTAL MUTAGENS AND CARCINOGENS

A method for combining and comparing short-term genotoxicity test data: Preface

A Report from ICPEMC Committee 1

Members of Committee 1: D.J. Brusick, J. Ashby, F.J. de Serres, P.H.M. Lohman, T. Matsushima, B.E. Matter, M.L. Mendelsohn, D.H. Moore II, S. Nesnow and M.D. Waters

Additional assistance with computer programming was provided by Walter Lohman

(Accepted 7 October 1991)

Background

Assessing the genetic activity of a chemical has typically been approached by conducting a series of tests (battery) measuring genotoxicity. Genetic toxicology assays included in batteries encompass a large number of mechanisms and target organisms under the premise that species and mechanism diversity is important for a thorough evaluation of test agents. The basis of the premise lies with the knowledge that human genetic disease can result from an array of genotoxicity including gene mutation, chromosomal aberrations and changes in chromosome number. Consequently, a single assay would not provide sufficient breadth of detection to protect the gene pool from agents acting through mechanisms it does not detect.

The application of assays detecting genotoxicity to chemical assessment has gone through several phases. Initially, the health consequence of concern was germ cell mutation, and data from rodent and Drosophila assays for heritable genetic damage dominated testing strategies. Following several key publications by Ames (1973) and other investigators suggesting that carcinogens are mutagens, the emphasis of genetic toxicology shifted to mechanistic research and prediction of carcinogenic potential. The technological legacy of these two orientations was the development and use of a broad range of short-term mammalian and submammalian assays measuring mutation and other manifestations of genotoxicity or cell transformation. Most of the assays were developed during the past 20 years and were designed to be rapid and comparatively inexpensive. Many were proposed as surrogates for rodent germ cell or carcinogenicity bioassays.

Correspondence: D.J. Brusick, Ph.D., Hazleton Washington, 9200 Leesburg Turnpike, Leesburg, VA 22182 (U.S.A.). ICPEMC is affiliated with the International Association of Environmental Mutagen Societies (IAEMS) and the Institut de la Vie.

ICPEMC Committees 1 and 2 were established in 1979 to review the performance of short-term tests used to predict germ cell and carcinogenic effects in mammals, respectively (ICPEMC Committee 1 Report, 1983; ICPEMC Committee 2 Report, 1982). The conclusion expressed in both Committee reports was that the endpoints and target cells used in the short-term assays appeared to be sufficiently dissimilar to the phenomena of germ-cell mutation and carcinogenesis so as to preclude accurate prediction using short-term assays.

Aims and objectives

During the course of the initial Committee 1 activities, testing strategies in genetic toxicology evolved from the reliance on single tests to the use of sets of short-term assays, and in 1983 the task of Committee 1 was restructured to recommend methods for evaluating cumulative, heterogeneous genotoxicity data sets and to determine whether information derived from complex data sets would predict more closely germ cell and/or carcinogenic phenomena. The objectives of the Committee were directed toward three broad aims:

(1) To establish a formalized data compilation, integration and interpretation scheme which drew upon the intuitive processes used by genetic toxicologists, and was complemented by objective numerical analysis and machine learning. This scheme would be required to cope with redundant data, test disagreement, and sporadically filled data sets. It would involve, at least at the outset, minimal a priori knowledge about the involved chemicals and tests.

(2) To try to express the results of the assessment in a single composite number describing the strength of a positive or negative response, and to do this in such a way that comparisons could be made at all levels of the analysis from the single test to the combination of all tests.

(3) To evaluate test and chemical performance through statistical analysis of accumulated data, and to use the resultant information to rank and cluster chemicals, to define the accuracy of the rankings, and eventually to predict genetic-based toxicity such as cancer and heritable mutation.

The Committee's starting point was a semiquantitative, weight-of-evidence method published by Brusick (1981) which was designed to combine toxicologic results from heterogeneous assays into a single score by using dose information and a series of factors applied to predetermined test weights. While the broad strategy of that design survives in the Committee's product, almost all of the details have been replaced through a slowly evolving, trial-and-error process. Thus, the notion of weighted tests has been replaced by an optional weighting system that can be invoked as desired. The method of handling dose information is similar to the original scheme but has been substantially elaborated to account for differences in test sensitivity. Modifying factors have been carefully honed, and the method of coalescence has been structured to produce a single consensus score.

Brief system description

Shown in Fig. 1 is the flow of data in the Committee 1 system, from individual literature entries through the reduction process to a single numerical agent score (Sa) for mutagenicity. Each literature entry on a chemical-test combination is processed into a replicate score by multiplying together the definitive sign of response, a defining dose, an entry for metabolic activation, and an entry for target localization. For a positive sign of response, the defining dose is the logarithm of the smallest reported dose giving a positive outcome; for a negative sign of response, it is the logarithm of the largest dose tested giving a negative outcome. All the values for a given test-chemical combination (the replicates) are averaged together into a composite test score (St) for that chemical. Tests are then grouped by genetic endpoint and phylogenetic criteria into classes, and the classes into two families, one for in vitro tests and one for in vivo tests. Finally, the two families are grouped into the agent score for the chemical. Averaging is done at each step by a method that preserves the weights given to the number of replicates, number of tests per class, and, optionally, the a priori importance of the test or class.
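The reduction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Committee's program: the `Entry` layout, the factor values, the scale of `dose_term`, and the rule of carrying the replicate count upward as the averaging weight are all hypothetical choices made for the example. The preface does not specify how the log dose is mapped onto a score magnitude, nor the exact weighting formula.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Entry:
    """One literature entry for a chemical-test combination (hypothetical layout)."""
    family: str                # "in vitro" or "in vivo"
    test_class: str            # endpoint/phylogenetic class label (illustrative)
    test: str                  # test identifier (illustrative)
    sign: int                  # +1 for a positive response, -1 for a negative one
    dose_term: float           # magnitude derived from the defining dose (scale assumed)
    activation: float = 1.0    # metabolic-activation entry (value assumed)
    localization: float = 1.0  # target-localization entry (value assumed)


def defining_log_dose(positive, doses):
    """Log10 of the defining dose.

    For a positive result, pass the doses that gave a positive outcome (the
    smallest is used); for a negative result, pass the doses tested (the
    largest is used).
    """
    return math.log10(min(doses) if positive else max(doses))


def replicate_score(e):
    """Product of sign, dose term, activation entry and localization entry."""
    return e.sign * e.dose_term * e.activation * e.localization


def combine(items):
    """Weighted average per group.

    items yields (group_key, score, weight); returns {group_key: (score, weight)}
    where the returned weight is the sum of the member weights, so that counts
    are carried up to the next level (an assumed reading of the text).
    """
    groups = defaultdict(list)
    for key, score, weight in items:
        groups[key].append((score, weight))
    out = {}
    for key, pairs in groups.items():
        total = sum(w for _, w in pairs)
        out[key] = (sum(s * w for s, w in pairs) / total, total)
    return out


def agent_score(entries):
    """Reduce replicates -> tests -> classes -> families -> one agent score."""
    tests = combine(((e.family, e.test_class, e.test), replicate_score(e), 1.0)
                    for e in entries)
    classes = combine(((fam, cls), s, w) for (fam, cls, _), (s, w) in tests.items())
    families = combine((fam, s, w) for (fam, _), (s, w) in classes.items())
    agent = combine(("agent", s, w) for s, w in families.values())
    return agent["agent"][0], families, classes, tests
```

Calling `agent_score` on a list of `Entry` objects returns the agent score together with the intermediate family, class and test scores, so the subcomponents remain inspectable at every level. With this particular weighting the agent score reduces to the plain mean of all replicate scores; an a priori test or class weight could be introduced by multiplying the carried weight at the relevant level.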

[Fig. 1. Hierarchical data reduction. Recoverable labels: replicate data set for a specific test; classes combined into in vitro family score and in vivo family score; families combined into agent score for the chemical.]

For purposes of development and evaluation of the system, the following minimal criteria were invoked: that the tests and chemical-specific results pass IARC peer review; that a minimum of 3 in vitro and 2 in vivo tests be carried out conventionally for each chemical; and that each test have been used on at least 5 chemicals. In the data to be presented, 85 tests and 113 chemicals meet these criteria. The calculations are done fully automatically on a general-purpose digital computer with a program that includes the database and graphical routines. A usable but less complete version is available for personal computer.

Other approaches to short-term test evaluation

Other methods have been proposed to evaluate and interpret results from batteries of short-term tests. During the mid-1980s, Waters and co-workers developed a profile display of mutagenicity data which illustrated graphically the positive and negative results for all tests conducted on a chemical. The profiles, known as genetic activity profiles (GAPs), have undergone several modifications and enhancements since their introduction and are currently available on PC-based software (Waters et al., 1988). The Committee 1 project has relied on assay identification, data formatting and the dose-scoring procedures developed by Waters and colleagues, and has also used their data-basing system. The primary difference between the two approaches is that while the GAPs process the individual test-chemical result and display this in parallel for all tests, the Committee 1 emphasis is on composite scoring, statistical analysis and reduction of a chemical to a single agent score representing its overall genotoxicity.

Some investigators have attempted to develop quantitative methods to assess short-term test results using computer modeling and statistical approaches to cluster tests and test data or to rank order the potency of genotoxic agents as potential animal carcinogens. Most of these models have been tried on small data sets (Benigni and Giuliani, 1988) or with only a single short-term assay (Parodi et al., 1990). Rosenkranz and colleagues (Chankong et al., 1985) developed a Bayesian approach to the contingency relationships between short-term tests and relevant endpoints such as carcinogenesis. A fundamental aspect of this system as well as other SAR-modelling methods is the reliance on a large primary data-base of short-term test results. Using structure-activity relationships (Ashby and Tennant, 1988) alone or in combination with computer programs loaded with short-term test results (Rosenkranz and Klopman, 1988; Enslein, 1988), models have been developed to predict biological activity for new and as yet untested entities, and to construct optimal test batteries. The results have been variable, with good success for some chemical classes and poor predictive performance for many others.

In anticipation of future applications of the Committee 1 scoring system, Nesnow (1990) has published a Committee 1 working paper describing the rank ordering of rodent carcinogens by a weighted averaging method. This method is similar to the mutagenicity methods to be described here, except that it does not coalesce positive and negative results. His method relies heavily on the data-basing of animal carcinogenicity by Gold and colleagues (1984).

Assumptions and risks inherent in this approach

The primary assumptions on which this approach is based are:

(1) That the mutagenicity of a chemical can be relevantly defined by the ensemble of the chemical's effects in the universe of short-term in vitro and in vivo tests for genotoxicity. This proposition is essentially self-evident, and becomes debatable only at later stages of the argument when specific strategies are suggested for defining the ensemble.

(2) That "defining doses" adequately capture the dose information. Alternate approaches could involve the strength of the mutational response or the entire dose-response relationship, rather than the simple threshold-like method of defining doses.

(3) That averaging within and among tests is a reasonable way to deal with discordant data and differing signs of outcome. The carcinogenesis community adamantly refuses to accept any notion of combining positive and negative responses; the mutagenesis community seems less concerned but has generally ignored the possibility in spite of the common occurrence of events of mixed sign. Averaging is used in almost all other biological or statistical methodology.

(4) That a consensus-driven system is the best way to estimate primary effects. This will work provided there is sufficient community of response among the tests to minimize the risk that narrowly-based mechanisms will be swamped out by majority positions. While our experience with the system strongly suggests that differences across classes are major, we also find that a narrowly-expressed mechanism is an unusual event. Further study of this proposition is in order.

(5) That a one-dimensional score is sufficient to compare test or chemical outcomes. This is a working hypothesis and needs more experience before it can be evaluated.

(6) That background and other technical information on tests or chemicals is not necessary for a reasonable analytical outcome. This is an assumption born of expediency, since Committee 1 was not prepared to design an expert system that could handle the complex and non-numerical detail that would be required.

(7) That inconsistencies of application, errors and other sources of variability for literature data are sufficiently small to allow this type of aggregation to be successful. Again, at the beginning this was an assumption of expediency. The data were all screened for acceptability, and there were no obvious, general rules for reaching beyond that to the a priori selection of ideal data sets. The method creates the potential for an a posteriori method to eliminate highly variable or discordant tests, or outlying individual results. In retrospect, the system has adequate statistical resolution to allow good discrimination of tests, classes and chemicals. Tighter data would be better, and the needs of the system may eventually motivate the generation of such data; but for the present, much can be done with what we have.

Some risks inherent in using the approach are:

(1) That the process of merging data by weighted averaging carries with it a risk of obscuring important minority data, such as a chemical that responds to only one or a few tests, or a test that responds to only one or a few chemicals. While this is clearly a risk of using the agent score in isolation when evaluating chemicals, the system includes detailed tabular and graphical output from all levels of the analysis such that the subcomponents of the response are preserved and highly visible. Similar controls are available when evaluating tests.

(2) That a single score for classifying chemicals for toxicity has potential for misuse or misinterpretation. One must always be sensitive to the potential misuses of a single value when describing or summarizing a complex process. However, useful examples of this process already exist in general toxicology (e.g., the index values for eye and skin irritancy, and values such as the EPA's "Reference Dose"). Such scores attempt to summarize a complex process or situation for purposes of making comparisons. Without these simplifiers, it would be much more difficult to compare the relative biological activity of two agents, a process that is fundamental to the progress of safety testing. Any such simplification inevitably loses detail and subtlety by converting descriptive and multidimensional measures into one number. To be justified, such scores must be carefully designed and tested, well understood by the users, and preferably paralleled with more elaborate analyses.

(3) That the carcinogenic process involves more than genotoxic mechanisms, yet the present system has little or no provision for acquiring or using non-genotoxic information. This is clearly a limitation of using mutational information as a single descriptor for any broad aspect of carcinogenicity. We anticipate that the ICPEMC system will share in the inefficacy of using genotoxicity to predict carcinogens. However, there remains an important need to define mutagenicity, for its own sake, as well as for the definition of what is or is not genotoxic among carcinogens.

(4) That all published data may not be of equal quality. The statistical aspects of the system provide a built-in and, for genotoxicity, novel measure of the noise in the data. This allows use of open literature entries within the conventional statistical safeguards of confidence limits and measures of significance, and is one of the most important of its contributions. Given such tools, one can then search for erroneous data or conversely for highly conforming data sets. Also, one can use any of a variety of methods to eliminate outliers, or the outliers can be looked at specifically to be certain that a hidden effect has not been obscured by the analysis.
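As a rough illustration of the kind of noise measure and outlier screen mentioned in risk (4), the sketch below summarizes a set of replicate scores with a mean, a standard error and an approximate confidence interval, and flags replicates that lie far from the median. The preface does not say which statistics the system actually uses: the normal-approximation multiplier `z`, the median-absolute-deviation rule and its threshold `k`, and both function names are assumptions chosen for the example.

```python
import math
import statistics


def score_summary(replicates, z=1.96):
    """Mean, standard error and approximate confidence interval of replicate scores.

    z=1.96 gives a roughly 95% interval under a normal approximation
    (an assumption; small samples would call for a t multiplier instead).
    """
    n = len(replicates)
    mean = statistics.mean(replicates)
    if n < 2:
        # A single replicate carries no information about its own noise.
        return {"n": n, "mean": mean, "sem": None, "ci": None}
    sem = statistics.stdev(replicates) / math.sqrt(n)
    return {"n": n, "mean": mean, "sem": sem, "ci": (mean - z * sem, mean + z * sem)}


def flag_outliers(replicates, k=3.0):
    """Indices of replicates more than k median absolute deviations from the median.

    The MAD rule is one of many possible outlier screens and is used here only
    because a single extreme value does not inflate it (assumed choice).
    """
    med = statistics.median(replicates)
    mad = statistics.median(abs(x - med) for x in replicates)
    if mad == 0:
        return []  # no spread to measure against, so nothing is flagged
    return [i for i, x in enumerate(replicates) if abs(x - med) > k * mad]
```

A flagged replicate, such as a lone negative result among concordant positives, is a candidate for closer inspection rather than automatic removal, in keeping with the point that outliers may reflect a hidden effect.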

Summary of Committee 1 progress

Following this overview are three Committee 1 working papers which describe the method and discuss its application to the field of genetic toxicology. The first technical paper in the series describes the system and its tabular and graphical outputs. The second paper approaches the dose problem in detail, including the merging of positive and negative results, and correcting for differences in the amounts of chemical used from test to test. It applies methods for sparse matrices and, using a series of statistical yardsticks, introduces major improvements in resolution of the system. The third paper discusses the general attributes of the results, identifies the three primary properties of the test-chemical interactions that the system is responding to, and tries to interpret what has been learned. It also lists the agent scores for the chemicals in the current data-base. This final paper is the summary of the findings and their interpretation.

The result of the Committee's activities, as will be described in the following papers, is a complex, fully automated set of procedures for evaluation, combination and interpretation of short-term test data-bases. It is an analytical tool capable of measuring predictive performance and internal consistency of individual tests and groups of tests. It is applicable, retrospectively, to large data sets, but is also useful for monitoring an unfolding series of tests in progress, much like the ongoing decision rules for a clinical trial. It is also well suited to agglomerating data by chemical to assess the mutagenic potential of the chemical in the context of all other chemicals that have been processed by the system. Even with a brief encounter with the system one can readily see a variety of ways it can be used. The power of this tool should increase greatly as the data-base is expanded, extraneous or unstable material is excluded, and the self-learning capabilities of the system are invoked. To promote this evolution, we will develop the sets of factors needed for the differing endpoints of general mutagenicity, heritable mutagenicity and carcinogenicity, and will examine their respective performance.

References

Ashby, J., and R.W. Tennant (1988) Chemical structure, Salmonella mutagenicity and extent of carcinogenicity as indicators of genotoxic carcinogenesis among 222 chemicals tested in rodents by the U.S. NCI/NTP, Mutation Res., 204, 17-115.

Benigni, R., and A. Giuliani (1988) Predicting carcinogenicity with short-term tests: biological models and operational approaches, Mutation Res., 205, 227-236.

Brusick, D. (1981) Unified scoring system and activity definitions for results from in vitro and sub-mammalian mutagenesis test batteries, in: E.D. Copenhaver, C.R. Richmond and P.J. Walsh (Eds.), Health Risk Analysis, Franklin Institute Press, Philadelphia, pp. 273-286.

Chankong, V., Y.Y. Haimes, H.S. Rosenkranz and J. Pet-Edwards (1985) The carcinogenicity prediction and battery selection (CPBS) method: a Bayesian approach, Mutation Res., 153, 135-166.

Enslein, K. (1988) An overview of structure-activity relationships as an alternative for carcinogenicity, mutagenicity, dermal and eye irritation, and acute oral toxicity, Toxicol. Ind. Health, 4, 479-498.

Gold, L.S., C.B. Sawyer, R. Magaw, G.M. Backman, M. de Veciana, R. Levinson, N.K. Hooper, W.R. Havender, L. Bernstein, R. Peto, M.C. Pike and B.N. Ames (1984) A carcinogenic potency database of the standardized results of animal bioassays, Environ. Health Perspect., 58, 9-319.

IARC (1987) Monographs on the Evaluation of Carcinogenic Risks to Humans, Suppl. 6, Genetic and Related Effects: An Updating of Selected IARC Monographs from Vols. 1-42, Lyon, France.

ICPEMC: Committee 1 Final Report (1983) Screening strategy for chemicals that are potential germ-cell mutagens in mammals, Mutation Res., 114, 117-177.

ICPEMC: Committee 2 Final Report (1982) Mutagenesis testing as an approach to carcinogenesis, Mutation Res., 99, 73-91.

Nesnow, S. (1990) A multi-factor carcinogen potency ranking scale for comparing the activity of chemicals, Mutation Res., 239, 83-115.

Parodi, S., M. Taningher, P. Romano, S. Grilli and L. Santi (1990) Mutagenic and carcinogenic potency indices and their correlation, Teratogenesis, Carcinogenesis and Mutagenesis, 10, 177-197.

Rosenkranz, H.S., and G. Klopman (1988) CASE, The Computer Automated Structure Evaluation Method, correctly predicts the low mutagenicity for Salmonella of nitrated cyclopenta-fused polycyclic aromatic hydrocarbons, Mutation Res., 199, 95-101.

Waters, M.D., H.F. Stack, A.L. Brady, P.H.M. Lohman, L. Haroun and H. Vainio (1988) Use of computerized data listings and activity profiles of genetic and related effects in the review of 195 compounds, Mutation Res., 205, 295-312.
