The Science of the Total Environment, 127 (1992) 187-200 Elsevier Science Publishers B.V., Amsterdam


Clusters galore: insights about environmental clusters from probability theory

Raymond Neutra^a, Shanna Swan^a and Thomas Mack^b

^a California Department of Health Services, 5900 Hollis St., Emeryville, CA 94608, USA
^b University of Southern California Medical School, University Park, Los Angeles, CA 90089, USA

ABSTRACT

The posterior probability of a causal explanation, given that an environmental cancer cluster is statistically significant, depends on the prior probability of an environmentally caused cluster, the sensitivity of the statistical test and its specificity. The prior probability is low, because it is rare to have enough carcinogen in the general environment to cause a relative risk of cancer high enough to achieve statistical significance in a small geographic area. The sensitivity and specificity are not great. The likelihood that a census tract escapes statistically significant elevations in all 80 types of cancer can be calculated. Many of the thousands of census tracts will, by chance alone, have at least one type of cancer whose elevation is statistically significant. Actual observation from a large cancer registry confirms this probabilistic prediction. Applying the principles of Bayes' Theorem would suggest that most statistically significant environmental cancer clusters are not due to environmental carcinogens. One would have to investigate hundreds of environmental cancer clusters to find one with a true environmental cause.

Key words: cancer cluster; screening; Bayes' Theorem
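The dependence of the posterior probability on the prior probability, sensitivity and specificity described in the abstract can be sketched with Bayes' Theorem. The following is a minimal illustration only; the function name and all three input values are hypothetical assumptions chosen for the example and are not estimates taken from the article.

```python
def posterior_probability(prior, sensitivity, specificity):
    """Bayes' Theorem for a 'positive' (statistically significant) cluster.

    prior       -- probability that a reported cluster has an environmental cause
    sensitivity -- probability that a truly caused cluster tests significant
    specificity -- probability that a chance cluster does NOT test significant
    """
    true_positive = sensitivity * prior
    false_positive = (1.0 - specificity) * (1.0 - prior)
    return true_positive / (true_positive + false_positive)


# Hypothetical inputs, assumed purely for illustration.
prior = 0.001        # assumed: 1 in 1000 reported clusters is environmentally caused
sensitivity = 0.5    # assumed
specificity = 0.9    # assumed

post = posterior_probability(prior, sensitivity, specificity)
print(f"posterior probability of an environmental cause: {post:.4f}")
print(f"significant clusters investigated per true finding: about {1 / post:.0f}")
```

Under these assumed inputs the posterior remains well below 1%, which is the qualitative point of the argument: a low prior dominates unless specificity is very close to one.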

INTRODUCTION

The media and the public have been paying increasing attention to the occurrence of time-space clusters of cancer, particularly those that are attributed to environmental chemical and radiological hazards. Although we will focus on cancer, much of what we will say could be applied to birth defects and other rare diseases. Clusters may come to the attention of public health officials in three ways. First, a concerned citizen or health professional may notice more than the expected number of cancers and report them to the public health department. These 'clusters' are often reported in conjunction with a suspected hazard. A second mode of presentation occurs when there is a hazard of concern to the public and they request a review of death certificate or cancer registry incidence data to see if any type of cancer occurs
at an excessive rate. Finally, an epidemiologist or statistician may have access to death records or cancer registry data and may decide to explore geographical data, looking for 'statistically significant clusters' in an exploratory mode. The first mode is the most frequent, and its probability pitfalls are more subtle and pernicious, because the factor of 'multiple comparisons' is less obvious than in the other two modes. For these reasons we shall focus on the cluster reported to the health department by concerned citizens.

When the public notices an unusual number of cases of cancer in a particular residential area and time period, there are four generic explanations for the excess. The first is chance; the second is migration to that place of persons with an age, race, gender or other 'host-factor' which puts them at inherently high risk; the third is the adoption of some risky lifestyle; and the fourth is the appearance of a physico-chemical factor which subsequently caused cancer. When clusters occur, the public and the media usually attribute them to the last of these causes, since potential sources of carcinogens are found near most neighbourhoods. Yet residential and school cancer cluster investigations have a notoriously poor record for finding physico-chemical explanations. A number of reasons have been given for this poor batting average [1,2]. Some of the less sophisticated studies have gerrymandered space and time to create a cluster which did not in fact exist. Others have lumped heterogeneous diagnostic categories which could not have a common cause, or considered cases whose disease arose before moving to the suspect area. The small number of cases available for study and the bias from the usual publicity have been mentioned as obstacles to successful study.
Finally, establishing exposure status is uniquely difficult for cancer clusters for several reasons: the exposure event may have been many years in the past and hard to determine through questionnaire or environmental and physiological testing. The miasmatic routes of exposure (air, water and dust) are particularly difficult to study. All these factors do indeed decrease the sensitivity of our efforts to successfully investigate a cancer cluster, but in this article we will stress three additional obstacles which derive from the probabilistic nature of the decision to label the number of cancers in a cluster as 'significantly elevated' and worthy of further investigation. In what follows we will discuss the traditional test of significance as if it were an epidemiological screening test with a sensitivity, specificity, prior probability of physico-chemical causation, and resultant posterior probability of finding a physico-chemical cause. We will begin with the prior probability.

PRIOR PROBABILITY

In order for a residential locality to have sufficient excess cancer cases to
be noticed by the general public, there must be a substantial increase over expected cases. To reach levels which are 'statistically significant' implies substantial relative risks for those cancer types displaying annual rates below one per thousand (the majority). Table 1 shows, as an example, the number of cancer cases expected over a decade in a town or census tract with a population of 5000. Notice that the relative risk required to achieve 'significance' is high among the many types of cancer with low incidence rates. We must ask what kind of carcinogenic dose would need to be present in air, water or dust to produce relative risks higher than those seen in most occupational studies, and how likely it is, a priori, that such high levels of exposure occur in residential or school settings.

A 'back of the envelope' calculation illustrates this point. Suppose that a carcinogen in a rodent bioassay causes a 50% excess risk of cancer at the maximum tolerated dose. If humans received the same (properly scaled) dose for a lifetime, assume that they experience this excess equally in each of the 70 years of their lifetime. This comes to about seven per thousand excess risk per year. If the cancer type affected had a baseline risk of 1/100 000, this would represent an annual relative risk of 700. Lowering the daily lifetime dose to 1% of the maximum tolerated dose would lower this relative risk to seven, assuming linearity. One could also compress the dose into a 1-year period so as to produce a time-space cluster with a sevenfold increase an incubation period later. But this would require 70% of the maximum tolerated dose (MTD) during that year (0.01 MTD x 70 years = 0.7 MTD x 1 year).

TABLE 1
Expected and 'significant' numbers of cancer cases per decade in a town of 5000: cancers with varying annual incidence rates

Annual rate   Expected no./decade   No. needed to give P-value < 0.01   RR (P)
10^-3         50                    70                                  1.4 (0.004)
10^-4         5                     12                                  2.4 (0.005)
10^-5         0.5                   4                                   8.0 (0.002)
10^-6         0.05                  2                                   40 (0.001)
10^-7         0.005                 1                                   200 (0.005)
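The 'back of the envelope' arithmetic above can be retraced in a few lines. This is only a sketch of the calculation as stated in the text: the variable names are ours, and, following the text, the ratio of excess risk to baseline risk is treated as the relative risk and the dose-response is assumed to be linear.

```python
# Inputs taken from the text.
lifetime_excess_risk = 0.50          # 50% excess risk at the maximum tolerated dose (MTD)
lifetime_years = 70                  # excess assumed to be spread evenly over 70 years
baseline_annual_risk = 1 / 100_000   # baseline annual risk of the affected cancer type

# Excess risk per year at the full MTD (the text rounds this to 7 per thousand).
annual_excess_at_mtd = lifetime_excess_risk / lifetime_years

# Ratio of excess to baseline, which the text treats as the relative risk (~700).
ratio_at_mtd = annual_excess_at_mtd / baseline_annual_risk

# Lowering the daily lifetime dose to 1% of the MTD, assuming linearity (~7).
dose_fraction = 0.01
ratio_at_reduced_dose = ratio_at_mtd * dose_fraction

# Compressing that lifetime dose into one year: 0.01 MTD x 70 years = 0.7 MTD x 1 year.
one_year_dose_in_mtd = dose_fraction * lifetime_years

print(f"annual excess risk at MTD: {annual_excess_at_mtd:.4f}")
print(f"risk ratio at MTD:         {ratio_at_mtd:.0f}")
print(f"risk ratio at 1% of MTD:   {ratio_at_reduced_dose:.1f}")
print(f"one-year dose required:    {one_year_dose_in_mtd:.2f} MTD")
```

The unrounded ratio is slightly above 700 because the text rounds the annual excess to seven per thousand before dividing by the baseline.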


There are rare workplace scenarios and medical uses of mutagens in which such high doses are delivered, but they are often accompanied by other easily documented effects such as hair loss or confusion. This degree of contamination of the neighbourhood or actual environment has rarely been documented and must be rare a priori. It is appropriate, therefore, that we should ask how often the study of an unusual number of cancers in a residential location or school setting has led to the discovery of a physico-chemical environmental carcinogen. To address this question, we reviewed the substances classified as having sufficient evidence for carcinogenicity in humans by the International Agency for Research on Cancer [3]. We determined how these substances had been originally identified as suspect carcinogens. The results are shown in Table 2. It is clear that a number of carcinogens have been detected by clusters in the work setting, and another group detected by clusters among patients treated with anti-neoplastic agents. Another major group has been identified through systematic epidemiological study of large groups of workers exposed to animal carcinogens or to substances structurally similar to animal carcinogens. We did find one instance from the recent literature in which an alert clinician discovered enormously high levels of endemic mesothelioma in a rural residential area of Turkey. However, in some cases a community excess was the lead to the identification of an occupational excess.

TABLE 2
Substances with sufficient evidence for carcinogenicity in humans (IARC, 1987)

Columns: Substance or broad exposure; Discovered by (Residential or school cluster, Medical cluster, Occupational cluster, Other)

Substances or broad exposures listed:
Aflatoxins
Aluminum production
4-Aminobiphenyl
Mixtures with phenacetin
Arsenic and compounds
Asbestos
Benzene
Benzidine
Betel quid with tobacco
Chlornaphazine
Bischloromethyl ether
Boot manufacture
Myleran
Chlorambucil
Methyl CCNU
Chromium (Cr 6)
Coal gasification
Coal tar pitches
Coal tars
Coke
Cyclophosphamide
Diethylstilbestrol
Erionite
Furniture making
Hematite mining (radon)
Iron foundry
Isopropyl alcohol (acid process)
Manufacture of magenta
Melphalan
Methoxsalen with U.V.
Mineral oils
MOPP (Cancer Rx)
Mustard gas
2-Naphthylamine
Nickel and compounds
Estrogen replacement Rx
Estrogen non-steroidal
Oral contraceptives combined
Oral contraceptives sequential
Rubber industry
Shale oils
Soots
Talc with fibers
Tobacco, smokeless
Tobacco smoke
Treosulphan
Vinyl chloride


Baris [4], a pulmonary specialist, treated two patients from the same town of 400 people who were initially diagnosed as having refractory tuberculosis but who, on pathological examination after operation, were found to have mesothelioma. With the assistance of the International Agency for Research on Cancer [3], a review of tissues revealed the presence of an unusual fibrous mineral, and a review of death certificates revealed 23 deaths from mesothelioma over a 4-year period in a village with only 400 occupants. This constituted a relative risk of approximately 9000! Lung cancer rates were also elevated. Air sampling revealed detectable levels of the mineral erionite, which was also found in the quarried stone used to build many of the houses [5]. This is the only example that we could find in the literature in which a residential cancer cluster detected by the public or an alert clinician led to a definite cause, in this case a totally new carcinogen. Note that there was no statistical impediment to demonstrating the cause of this cluster. With a relative risk of 9000 the population attributable risk is virtually one, and twenty-three cases provide more than enough statistical power. If there is a physico-chemical cause and one can document its presence with a question or a laboratory test, there will probably be the power to detect it. This one successful environmental cluster has certain characteristics which distinguish it from the usual cluster presented to public health authorities. First, it was really an unrecognised endemic which had been going on for a long time in a remote rural area. It was not due to a transitory exposure. The particular rare cancer (mesothelioma) had previously been known to have its origin in another fibrous mineral. This prompted the pathologist to look for body burdens of fibrous mineral, which were present both in the tumor and in the normal tissue of the lung.
The agent was persistent in the environment and could be measured there as well. Indeed, the reason it was so ubiquitous was that it occurred naturally in the area. It was not put there by humans. This is an easier situation to deal with than that of the transient cluster in a given location, because the transience suggests that an agent appeared and probably disappeared some time in the past. Note that the usual agents of concern, chemicals and radiation, usually do not leave body burdens or permanent traces in the environment. Of the hundreds of residential and school clusters which have been published [2,6-11], this is unique in its identification of a causal environmental agent. In most residential situations, unlike the remote Turkish village, the population is transient, the dose of exposure to any environmental carcinogens is likely to be extremely low, and the expected burden of additional cases is so low as to be epidemiologically undetectable. Thus, our best guess as to the true incidence of physico-chemically caused environmental clusters
(prior probability) is extremely low. In California it might well be less than one in a decade. Thus, if we were to think of the choice of a P-value as something like the selection of a cutoff point in a screening test, and we wanted to know the posterior probability of environmental causality for an observed cluster, we would have two rather discouraging parameters for the Bayes' Theorem equation. The prior probability of environmental clusters is very low, and the sensitivity of our efforts to detect them (as discussed in the introduction) is low as well. But we still need to know something else before we get a sense for the posterior probability of environmental causation in cancer clusters. This is the 'specificity' or its complement. To obtain this we need to know the fraction of clusters due to chance which, under standard procedures, would be considered 'statistically significant.'

SPECIFICITY

Suppose that we defined a cancer cluster as any excess whose attendant P-value was ≤ 0.01; then the probability of escaping just one of these clusters would equal the complement of 0.01, or 0.99. The probability of escaping clusters in all 80 types of cancer which are recorded by cancer registries would be 0.99^80. This equals 0.45. This means that a given jurisdiction would have a 55% chance of one or more clusters. But clearly a cluster of mesothelioma, a cancer with a rate of less than one in a million, does not have a 1% probability of occurring by chance in any given time period. For a given location one would need to calculate the expected number of mesothelioma cases, compare it to the observed number, and use the Poisson distribution to calculate the probability of the excess occurrence of that rare cancer. If one were interested in calculating the probability of escaping all eighty types of cancer, one would calculate the probability of escaping each of those cancers based on its expected frequency under the Poisson distribution, and then multiply those probabilities to get the probability of escaping all 80 types of cancer. As an illustration of the general procedure, we consider a census tract or town with a population of 5000 followed for a time period of a decade. Table 1 shows the expected number of cases for cancers occurring at a variety of incidence rates [12]. As the type of cancer becomes more rare, the expected number of cases decreases, as does the number of cases which constitutes a 'significant' cluster. At the same time, the relative risk that this represents becomes much greater. There are two points to be made, however. First, because of the discontinuous nature of the Poisson distribution, the exact significance level of 0.01 is rarely achieved, and the first possible P-value less than 0.01 can often range from 0.001 to 0.005. This can be seen in Table 3, which shows the actual probabilities for observing zero, one, two, three,


TABLE 3
Poisson probability when only 0.5 cases are expected per decade

Observed cases   Probability
0                0.607
1                0.303
2                0.076
3                0.013
4                0.001
5                0.0001
6                0.00001
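The procedure described in this section can be sketched with the standard library alone. The sketch below computes 0.99^80, the Poisson probabilities for an expected count of 0.5 (as in Table 3), and the smallest observed count whose upper-tail P-value falls below 0.01. The function names are ours, and the final example, in which all 80 cancer types share an expected count of 0.5, is a hypothetical assumption used only to show how the per-cancer escape probabilities are multiplied.

```python
import math


def poisson_pmf(k, mean):
    """Probability of observing exactly k cases when `mean` cases are expected."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def upper_tail(k, mean):
    """P(X >= k): probability of observing k or more cases by chance."""
    return 1.0 - sum(poisson_pmf(i, mean) for i in range(k))


def smallest_significant_count(mean, alpha=0.01):
    """Smallest count k whose upper-tail probability is below alpha."""
    k = 0
    while upper_tail(k, mean) >= alpha:
        k += 1
    return k


# Naive calculation: each of 80 cancer types has exactly a 1% chance of a 'cluster'.
escape_all_80 = 0.99 ** 80
print(f"0.99^80 = {escape_all_80:.2f}")

# Poisson probabilities when 0.5 cases are expected per decade.
table3 = [poisson_pmf(k, 0.5) for k in range(7)]
for k, p in enumerate(table3):
    print(f"{k} observed cases: {p:.5f}")

# Because the distribution is discrete, the attained P-value is below 0.01.
k_needed = smallest_significant_count(0.5)
attained_p = upper_tail(k_needed, 0.5)
print(f"cases needed: {k_needed}, attained P-value: {attained_p:.4f}")

# Hypothetical: 80 cancer types, each with 0.5 expected cases per decade.
expected_counts = [0.5] * 80
escape_probability = math.prod(
    1.0 - upper_tail(smallest_significant_count(m), m) for m in expected_counts
)
print(f"probability of escaping all 80 (hypothetical mix): {escape_probability:.2f}")
```

Replacing `expected_counts` with the expected number of cases for each registry cancer type in a given tract would give the tract-specific escape probability described in the text.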

TABLE 4
Neoplasms by prior etiologic plausibility and age-adjusted incidence rate (1972-82)

                                     LA County: AAIR/100 000
Neoplasm                             Males     Females

High prior plausibility
Lip                                  4.2       0.6
Hepatoma                             3.0       1.2
Mesothelioma                         0.9       0.2
Connective tissues                   4.0       4.3
Vasculature                          0.4       0.4
Ureter and pelvis                    2.2       1.0
Kidney parenchyma                    7.0       3.0
Glioma                               5.1       3.6
Neuroblastoma                        0.4       0.3
Meningioma                           1.9       3.0
CNS Conn. Tiss. and Periph. NS       0.9       0.9
Microglioma                          < 0.1     < 0.1
Clin/Other CNS tumors                0.7       0.5
Hodgkin's/Nodular scleros            1.1       0.9
Hodgkin's, Other                     1.7       0.8
Hodgkin's/Lymphoma deplet.           0.2       0.2
Nodular/Follicular lymphoma          2.1       2.0
Diffuse/Nos lymphoma                 5.2       3.6
Immunoblastic sarcoma                0.3       0.2
Reticulosarcoma                      2.5       2.1
Burkitt's lymphoma                   0.1       < 0.1
Mycosis fungoides                    0.2       0.1
Multiple myeloma                     4.4       3.1


TABLE 4 (Continued)

                                     LA County: AAIR/100 000
Neoplasm                             Males     Females

High prior plausibility (continued)
Polycythemia Rubra Vera              0.5       0.3
Acute lymphoid leukemia              1.6       1.1
Acute non-lymphoid leukemia          4.2       2.8
Hairy cell leukemia                  0.2       0.1
Chronic lymphoid leukemia            3.1       1.7
Chronic myeloid leukemia             1.9       1.1
Leukemia, other                      0.5       0.3

Medium prior plausibility
Tongue                               2.9
Gum and mouth                        4.3
Nasopharynx                          0.9
Oro/Hypopharynx                      4.2
Nose, ears and sinus                 0.7
Larynx                               8.4
Esophagus, upper                     3.0
Cardio-esoph./gastri., squamous      2.4
Esophagus, adenocarcinoma            0.8
Gastric, upper, adenocarcinoma       9.3
Pylorus, adenocarcinoma              2.6
Small intestine                      0.6
Colon, upper                         11.1
Colon, middle                        9.0
Colon, sigmoid                       15.5
Rectum                               16.1
Hepatoblastoma
Biliary tract
Gallbladder
Pancreas
Lung, adenoca
Lung, squamous/other CA
Osteosarcoma
Ewing's sarcoma
Bone, other
Melanoma, arm/hand
Bladder
