American Journal of EPIDEMIOLOGY

Volume 133, Number 7
April 1, 1991

Copyright © 1991 by The Johns Hopkins University School of Hygiene and Public Health

Sponsored by the Society for Epidemiologic Research

REVIEWS AND COMMENTARY

What is a Cause and How Do We Know One? A Grammar for Pragmatic Epidemiology

Mervyn Susser

In this paper, criteria used by many epidemiologists as aids in causal inference are reviewed and revised. The revised scheme emphasizes the distinction between essential properties of a cause and criteria useful for deciding on the presence of these properties in a given case. A systematic procedure for causal inference tests each essential causal property in turn against appropriate criteria. For a pragmatic epidemiology in which all determinants serve as causes, their essential properties are held to be association, time order, and direction, in an ascending hierarchy. Criteria for association are probabilistic and can be enhanced by strength and consistency. Given association, criteria for time order of the relevant variables follow from access to observation, which is dependent on design. Given association and time order, causal direction (or consequential change) calls on an array of criteria, namely, consistency and survivability, strength, specificity in cause and in effect, predictive performance, and coherence in all its forms (e.g., theoretical, factual, biologic, and statistical). The evolution of such criteria is traced through the epidemiologic literature in the light of historical context. Although Popper's philosophy cannot directly serve an inherently inductive judgmental process, his notion of survivability has here been added, alongside replicability, as a subclass of consistency. This criterion is proposed to bridge the gap between the particularity of designs and the generality required of causal relations. Designs are ordered and described in the framework of testing survivability. Finally, definitions are offered for the list of criteria deployed.

Am J Epidemiol 1991;133:635-48.

association; causal criteria; causality; epidemiologic methods; epidemiology; logic; philosophy; research design

Received for publication March 19, 1990, and in final form November 13, 1990.
From the Gertrude H. Sergievsky Center, Columbia University, 630 West 168th Street, New York, NY 10032. (Reprint requests to Dr. Mervyn Susser at this address.)
The author thanks Drs. Ruth Ottman, Stephen Shea, Ezra Susser, and especially Zena Stein for useful comments.

Since the 1950s, epidemiologists have developed widely shared ideas on how to go about causal inference and arrive at causal judgments. The task of development is not complete. Moreover, these received ideas have recently faced healthy challenge. The challenge has been both direct and indirect. Direct challenge comes from followers of Karl Popper, who have disputed the utility of any such inductive procedures; indeed, some would entirely nullify this aspect of the history of epidemiology. Indirect challenge comes from the wonderful facility conferred on quantification by computers; the accompanying advances in statistical technique have at times obscured issues of cause. In some hands, theoretical epidemiology too, intent on creating an unencumbered analytical framework at the highest level of abstraction, has tended to bypass the needs of epidemiologists who address matters of substance.

This paper offers one response to these trends. My aim is to describe, clarify, and carry forward a pragmatic line of thought in epidemiology. Today, as in the past, epidemiologists have drawn on philosophies adapted to current knowledge and concepts of disease. From such philosophy as I am acquainted with, I select what seems most usable as a foundation for recognizing causes. The historical review that follows is meant to yield insight into how emergent epidemiologic practice in causal inference can be aligned with this philosophic foundation. Gaps and elisions that emerge in the course of that review receive fresh attention. The description of historical context is thus deliberately complemented by prescription intended to improve existing analytical tools.

To summarize this endeavor, the distinction between essential properties of a cause and criteria by which to judge the existence of those properties is emphasized; a reprise is given of the historical evolution of judgmental criteria in the epidemiologic literature; and the place of design in the judgmental process is considered, along with a classification of designs in terms of their salience for essential causal properties.

WHAT IS A CAUSE?

For the most part, epidemiologists have skirted the topic of the inherent properties of a cause. They have preferred to speak of determinants, exposures, and risk factors without facing the treacherous issues of the definition of a cause. Or they have categorized kinds of causes (necessary or sufficient, single or multiple, direct or indirect) in a descriptive fashion without addressing properties as such.

Diffidence on this question among practicing scientists is understandable. Philosophers disagree profoundly about definitions of causality and the definitive recognition of causation, and the literature grows to the extent that they do disagree. The 18th-century empiricist David Hume (1) held that all knowledge is rooted in the subjective sense data to which experience gives rise; he challenged the notion that causality could be proved at all because of the subjectivity of knowledge and the fallibility of inductive reasoning. Other modern empiricists went further; George Berkeley (2) doubted that material reality could be shown to exist outside the subjective perception of an individual. Bertrand Russell (3) refuted Hume's subjective view of the world by the demonstration, which he attributed to Immanuel Kant, that a priori knowledge exists independent of experience. Russell himself showed that relationships too can exist independent of experience (3). Many epidemiologists will be content to follow him in this, and so to avoid the barriers that Hume's abstractions will place before them in pursuing their chosen field. This discussion aims to avoid the extremes of both classical idealism and empiricism.

As to what can be termed a cause, specificity of singular causes has been required in some models, multicausality in others. According to Russell (4), past notions of cause have been bound up with volition: historically, causes are active, effects passive. Hence, it is unsurprising that sometimes causes have been restricted solely to active agents of change. This idea has been held even by statistical exponents of multiple causality (5). If true, then a large part of the epidemiologic pursuit of causes is lost or reduced to insignificance. Philosophic as well as pragmatic grounds exist for disagreeing with the idea. Thus, for Russell, cause in a scientific sense is not tied to an analogy with acts and volition.

Some multicausal models, while less restrictive, distinguish between causes and determinants. In Mario Bunge's (6) view, for example, although causes are confined to active agents, determinants are seen more broadly. In this view, a determinant is any factor that affects an outcome: not only the agent of change but all contributors to outcome, including the interactions between causal factors and the reciprocal feedback from an effect to its initial cause (6).

From these debates about meaning and battles with definitions the practitioners of health sciences can learn, but they may be hard put to apply such various and conflicting ideas in their work. In a pragmatic perspective, a cause is something that makes a difference. Insofar as epidemiology is a science which by definition aims to discover the causes of health states, the search includes all determinants of a health outcome. These may be both active agents, such as are embodied by interventions and similar activity, and static conditions, such as the attributes of persons and places. This concept of determinants enjoins a model of multiple causes.

By contrast, Galileo Galilei's concept of necessary and sufficient cause (6) does not serve the elucidation of causality and/or prevention in situations where specific agents are difficult to imagine, let alone to find. The following chart (reprinted from ref. 7) shows what can obtain for causal studies under the Galilean concept. X, an independent variable, may relate to the effect Y under four conditions:

      X is necessary    X is sufficient
1.          +                  +
2.          +                  -
3.          -                  +
4.          -                  -

1. X is both necessary and sufficient to cause Y. (Both X and Y are always present together, and nothing but X is needed to cause Y.)
2. X is necessary but not sufficient to cause Y. (X must be present when Y is present, but Y is not always present when X is. Some additional factor must also be present, i.e., X + Z → Y.)
3. X is not necessary but is sufficient to cause Y. (X may or may not be present when Y is present, since, if X is not necessary, other factors, singly or together, must be sufficient, i.e., X → Y; Z → Y.)
4. X is neither necessary nor sufficient to cause Y. (X may or may not be present when Y is present; if it is, an additional factor must be present to achieve sufficiency. Thus, X is a contributory cause, i.e., X + Z → Y; W + Z → Y; or X + Z + W ... → Y.)
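The four conditions in the chart can be read as a simple decision rule on paired observations of X and Y. The following Python sketch is offered only as an illustration and is not part of the original paper: the function name `galilean_condition` and the toy observations are hypothetical, and the rule treats the observed pairs as exact, whereas real epidemiologic data would call for probabilistic criteria.

```python
from typing import Iterable, Tuple


def galilean_condition(pairs: Iterable[Tuple[bool, bool]]) -> int:
    """Return the chart row (1-4) that fits observed (x_present, y_present) pairs.

    Hypothetical helper for illustration; it treats the observations as exact.
    """
    pairs = list(pairs)
    # X is necessary if Y is never observed without X.
    necessary = all(x for x, y in pairs if y)
    # X is sufficient if X is never observed without Y.
    sufficient = all(y for x, y in pairs if x)
    if necessary and sufficient:
        return 1
    if necessary:
        return 2
    if sufficient:
        return 3
    return 4


# Toy data (invented): Y occurs with X, and also occurs without X.
# X is therefore sufficient but not necessary, which is condition 3.
observations = [(True, True), (False, True), (False, False)]
print(galilean_condition(observations))
```

A contributory cause (condition 4) shows up in this sketch as data in which X sometimes occurs without Y and Y sometimes occurs without X.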

In modern epidemiology, it is a fair guess that the most common causal condition under which X occurs with Y is that of a contributory cause, as in point 4 above. For this reason, it seemed to me wise to abandon the traditional terminology of causes for that of the broad and nonspecific category of determinants (7). Much later, I discovered that more than a decade previously, in the course of a philosophic disquisition on causality, Bunge (6) had reached a similar position. Rothman's model of "sufficient component causes" has followed the same principle (8, 9).

Active and passive determinants fall under the same rubric in this scheme. If the determinant is an active agent, it produces change; the determinant is an intended or unintended intervention, or a natural force or an accident, or the removal or absence of something needed, like vitamins. If the determinant is a static condition, it is an unchanging antecedent in a given set of circumstances; outcomes differ as the nature or quality of the condition differs. Usually, conditions are fixed attributes or circumstances, like sex, heritage, or geography. Sometimes they may be changeable attributes, like poverty or rural isolation, but unchanging in all the given circumstances open to study.

PROPERTIES OF CAUSES

One must still ask how to know a cause upon seeing one, and how not to confuse the real thing with an impostor. Hume (1) recognized as essential attributes of a cause both association (if we freely translate "contiguity" and "constant conjunction" as special forms of association) and priority in time. Beyond this, he took a skeptical position. In his words, "to define a cause, by saying it is something productive of another, 'tis evident he would say nothing" (1, p. 190). Hume required a "necessary connection" that, he and his followers have held, is not to be found in strict reason.

In the material world inhabited by pragmatic epidemiology, on the contrary, a cause can be found and does produce something. The necessary connection is enfolded in the attribute here described as direction. In a multicausal framework, "connection" remains necessary, but not in the Galilean sense that a particular cause must always be present to produce a given effect. A complex of sufficient causes (in Hobbes's modification (10) of Galileo) can meet the case. So too, as indicated above, can a complex of contributory causes none of which is individually either necessary or sufficient. Thus, three attributes of a cause (association, time order, and direction) are here taken as sine qua non.

Association

A causal factor (X) must occur together with the putative effect (Y). Association is judged by the criterion of probability in relation to preset expectations of normal variation or so-called chance occurrences. The probabilities derive from one or more of the available statistical approaches, whether conventional hypothesis testing for significance, confidence limits, Bayesian, or any other. If no grounds for an association can be shown to exist, causality has been rejected, and we proceed no further. The certain or even uncertain presence of association allows testing to continue. Association can be inferred with greater assurance when it is of substantial strength as indicated by risk ratios or other measures. Assurance comes also from consistency upon replication, as indicated by association persisting under numerous and various conditions.
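As a concrete reading of the paragraph above, strength can be expressed as a risk ratio and the play of chance as confidence limits around it. The Python sketch below is an illustration under stated assumptions, not the author's procedure: the function name `risk_ratio`, the 2x2 counts, and the choice of a large-sample interval on the log scale are all assumptions made for the example, and any of the other statistical approaches the text mentions could be substituted.

```python
import math
from typing import Tuple


def risk_ratio(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Risk ratio with approximate confidence limits from a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    z = 1.96 gives approximate 95% limits (normal approximation on the log scale).
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    rr = risk_exposed / risk_unexposed
    # Standard error of log(RR) under the usual large-sample approximation.
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Invented counts: 30 of 100 exposed and 10 of 100 unexposed develop the outcome.
rr, lower, upper = risk_ratio(30, 70, 10, 90)
print(f"RR = {rr:.2f}, approximate 95% limits {lower:.2f} to {upper:.2f}")
```

In this invented table the lower limit lies above 1, so chance alone is an unlikely explanation of the association; whether the association is causal still depends on time order and direction.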

Time order

If association is present, then a suspect causal factor (X) must precede the effect (Y). If the reverse can be shown to hold, again causality has been refuted, and we proceed no further. Failure to refute (that is, finding that X precedes Y, or may possibly do so) allows testing to continue. (Bunge (6) points out that strict antecedence in time is not a requirement for causation in all philosophic systems, but all at the least require "existential priority": the cause must be there when the effect occurs. This permits simultaneity as well as time sequences. Epidemiologists hardly need revise their conventional ideas on ordering causal sequences to meet this objection.)

The criteria for time order follow solely from access to observation of the relevant variables at the opportune moments. Access to observation depends on research design, and the decision about a particular observation will be more or less secure according to design. Recent discussion in the epidemiologic literature about so-called "directionality" in design indicates that confusion exists about these seemingly simple matters (11-13). Directionality is governed by design and refers to the operational perspective on the relations between causes and effects in a given study. It should not be confused with "direction," a general property of causes unrelated to particular operations. Epidemiologic operations begin with the assembly of effects (cases in backward-looking case-control studies) or causes (exposures in forward-looking cohort studies) or both at once (cross-sectional studies). While directionality has little import for the statistical aspects of analysis, it has great import as a criterion for time order and hence for causality. Timing is a general property of causes for which evidence needs to be advanced, not a particularity of design that can be taken as a given.

Direction

Given association and time order, we must apply ourselves to affirming or rejecting the existence of direction between cause and effect. Inference about direction hinges on the demonstration that change in an outcome is a consequence of change in an antecedent factor. With either active or static determinants, direction is indicated by the presence of consequential change (the adjective "consequential" is chosen to imply effect, that is, more than consequent in the sense of time order). An active agent itself changes and, in turn, is shown to change the outcome; alternatively, change recognized in the outcome is shown to be the consequence of change in the agent. With a static determinant, the effect changes in consequence of a change or shift in a prior condition, say from male to female sex.

Direction is the crux of the difficulties in making a valid causal inference. We often reach firm decisions about both association and time order, but decisiveness about direction is another matter. In an observed association between X and Y, time order can rule out Y → X, but in X → Y, direction remains to be established. Even though both association and time order are appropriate for causality, the relation between two variables may yet be symmetric and without direction (as with the sun preceding the moon in the heavens, or with two persons responding in mutual communication, or with any association produced by a common cause or third variable). Direction requires the demonstration, between cause and effect, of asymmetry and consequential change.

Asymmetry is perhaps most easily comprehended by its antonym. That is, it is a state with properties opposite those of a symmetric association in which the relevant associated variables are in balance, with neither affecting the other. These two types of relation can be diagrammatically represented in a simple way as follows: between two variables,

Asymmetry: X → Y

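Symmetry: X ↔ Y

and between three variables,

Asymmetry: Z → X [remainder of diagram not recoverable from the source]

Symmetry: Z [diverging arrows; remainder of diagram not recoverable from the source]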
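The remark above that an association produced by a common cause or third variable is symmetric and without direction can be illustrated numerically. The Python sketch below uses invented counts, and the names `counts`, `risk_of_y`, and `stratum_risk_ratio` are hypothetical. It assumes a third variable Z that raises the frequency of both X and Y, with X and Y unrelated once Z is held fixed.

```python
from typing import Dict, Tuple

# Invented counts indexed by (z, x, y). Within each level of Z,
# X and Y are independent by construction.
counts: Dict[Tuple[int, int, int], int] = {
    (1, 1, 1): 640, (1, 1, 0): 160, (1, 0, 1): 160, (1, 0, 0): 40,
    (0, 1, 1): 40, (0, 1, 0): 160, (0, 0, 1): 160, (0, 0, 0): 640,
}


def risk_of_y(x: int, z_levels: Tuple[int, ...]) -> float:
    """Proportion with Y = 1 among those with the given X, pooled over z_levels."""
    with_y = sum(counts[(z, x, 1)] for z in z_levels)
    total = sum(counts[(z, x, y)] for z in z_levels for y in (0, 1))
    return with_y / total


def stratum_risk_ratio(z_levels: Tuple[int, ...]) -> float:
    """Risk of Y for X = 1 relative to X = 0 within the chosen levels of Z."""
    return risk_of_y(1, z_levels) / risk_of_y(0, z_levels)


crude = stratum_risk_ratio((0, 1))    # ignoring Z
within_z1 = stratum_risk_ratio((1,))  # holding Z fixed at 1
within_z0 = stratum_risk_ratio((0,))  # holding Z fixed at 0
print(crude, within_z1, within_z0)
```

With these counts the crude risk ratio is well above 1, yet it equals 1 within each level of Z. X and Y are associated, but neither changes in consequence of the other, so the association has no direction.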

Explicatory statements about properties such as direction remain as several descriptive tautologies unless independent grounds for decision are found. In epidemiology, a number of criteria have been assembled to aid causal inference. They can fairly be seen as an effort to provide the independent grounds for assessing causal properties. It must be allowed that these criteria tend to have been conflated with the triad of essential causal properties (that is, association, time order, and direction) whose presence they help ascertain. Conflation is to be expected, since the distinction between criteria and properties has not been sharply etched in the epidemiologic literature. We now trace the emergence of these criteria.

EVOLUTION OF CAUSAL CRITERIA

In the pragmatic inductive/deductive approach to causal inference in epidemiology, the development and testing of criteria to guide the evaluation of evidence about cause has been a central theme. This development can be better understood in historical context. As we shall see, the construction of the etiologic search has followed the perceptions of the times as to what causality entails. These perceptions, in turn, were constrained by the possibilities inherent in each historical stage of knowledge and technology.

For much of the 19th century, at the height of the Sanitary Movement, the idea of miasma was prevalent. Foul emanations were taken to be the common cause of many disorders. Specific agents were ignored in the predominant concepts of disease. Appropriate to the concept of miasma, the predominant theories of causation involved multiple factors relating to multiple outcomes. To causality so conceived, the approach exemplified at the time by John Stuart Mill (14) is well-suited: his canons, especially the methods of "difference," "agreement," "concomitant variation," and "residues," are readily applied, both then and now, to the multiple and unknown agents that epidemiologists face (7).

The work of Semmelweis on puerperal fever (15), Snow on cholera (7, 16), and Budd on typhoid (16) presaged a revision in concepts of causality. All implied the existence of specific transmissible pathogens, and Snow and Budd implied further that these were unknown living organisms. When the search for pathogenic microorganisms began in earnest, after Louis Pasteur had placed the germ theory of disease on a firm footing, the need for a sharper focus on inference and causal criteria in epidemiology at once became apparent.

A feature of Robert Koch's momentous paper of 1882 on the role of the tubercle bacillus in phthisis (17) was the famous set of postulates to be met for establishing a pathogen. The extent to which the postulates were filtered through Jacob Henle and Edwin Klebs (18, 19) is now in dispute. Thus, K. Codell Carter shows how, in the course of Koch's own research on anthrax, wound infection, and tuberculosis, the postulates evolved and were refined over several years to reach their culminating form (20). In summary,

1. an "alien structure" [the microorganism] must always be found with the disease;
2. the "alien structure" must be shown by isolation and culture to be a living organism and distinct from any others that might be found with the disease;
3. the organism must be distributed in accord with the lesions and clinical phenomena of the disease, and hence must be capable of explaining the manifestations of the disease;
4. the organism, cultured through several generations, must produce the disease in [susceptible] experimental animals.

These postulates can be restated in the present-day language of epidemiology (and in a manner consonant with the definitions of criteria of judgment given at the end of this paper). Koch leads us to a requirement for complete specificity in a unique and unconfounded bacterial cause (postulates 1 and 2) which must be rigorously tested by both the criterion of biologic coherence (postulate 3) and the further criterion of predictive performance (postulate 4).
Above all, Koch insisted on the invariable presence of the organism (postulate 1). In doing so, he returned to the Galilean tradition of necessary cause and turned away from the then-current idea of multiple cause (20).

Koch's postulates served as a standard for the next three quarters of a century. Such a standard could be met only in the laboratory (21) and only by a fierce restriction of vision to pathogen and pathogenesis. This restriction was congenial to an era in which pathogenic microorganisms could be discovered and eponymously named in rapid succession.

The need for a new set of postulates followed the advent of a new paradigm for the epidemiology of chronic diseases of entirely unknown origin (22). After the discovery of the association of smoking with lung cancer in the early 1950s (23-27), the demand for a construct of causality more usable than the Koch model could no longer be ignored. Contention about whether or not the association was causal arose from the outset (28-30). The debate was quickly generalized to the broader question of how a cause of chronic disease might be recognized.

For example, E. Cuyler Hammond, in 1955 at the forefront of epidemiologic research on smoking, outlined his thinking on the nature of causation (31). Hammond drew on the ideas of R. A. Fisher (32) (later to emerge as a major enemy of the smoking hypothesis) for notions of the "indeterminacy" of natural causes in biology and of the probabilities on which they rest. In essence, Fisher was proposing a probabilistic rather than a strictly determined theory of causes and effects. For Hammond, a multiple-cause model clearly followed from the probabilistic perspective. For adherents of Koch's postulates, by contrast, this was a heresy against the demand for specificity and necessary causes. Joseph Berkson in particular attacked the smoking hypothesis for the lack of specificity in the putative effects of smoking (30, 33).
Hammond's paper seeded several criteria for attributing causality that remain in use, namely, association, time sequence, strength of relation, and consistency.

In 1959, Jacob Yerushalmy and Carroll Palmer provoked fresh debate (34). In keeping with the advent of the new paradigm, they too placed the idea of multiple causation at the center of the controversy. Taking smoking and lung cancer as the key example, they compared the properties of such an association with the standards set by Koch's postulates. From these considerations, the postulates were transposed into "two essential types of evidence" from which to infer a causal association. In fact, the two types constitute three elements, that is, association, time order, and specificity. Consistency too was given passing mention. The addition to Hammond made in this paper is thus the criterion of specificity, in line with and modified from Koch, and the main burden of the discussion bears on it.

Abraham Lilienfeld made some advances in a commentary on Yerushalmy and Palmer in the same issue of the Journal of Chronic Disease (35). Lilienfeld distinguished between the specificity of causes and the specificity of effects. Beginning with an effect, one could demand a specific cause; beginning with a cause, one could demand a specific effect. Lilienfeld denied the necessity for specificity of the cause, arguing cogently that the degree of specificity required was governed by the frame of reference of the observer. The criterion of specificity of effect, Lilienfeld agreed, was usable but needed qualification. The qualifications amounted to two criteria, namely, strength of association and biologic plausibility, the latter a new addition.

In further commentary on Yerushalmy and Palmer, Philip Sartwell (36) argued for much the same position and criteria as Lilienfeld. Sartwell rejected Lilienfeld's refinement of specificity of effect as a causal criterion, however, and he also pointed to the pitfalls of "biological reasonableness." What is viewed as reasonable falls within the limits of current knowledge and may not remain reasonable as knowledge advances. Time order received more attention from him than from the previous writers.
Sartwell's term for specificity was "strength of association." This diverse terminology emphasizes the fluidity of the concepts in the evolving argument.

The 1964 Report of the Advisory Committee to the US Surgeon General, "Smoking and Health" (37), was a public health landmark for the present era of chronic disease epidemiology. In addressing the evidence on smoking, the report listed and described (if not very adequately and without citing the literature) five criteria for judging causality in a given association. These were time order, strength, specificity, consistency, and coherence; each had been arrived at in one or more of the papers discussed above. This codification gave rise to two independent elaborations, one by Hill (38) and the other by myself (7).

In the course of his much-cited paper, Hill (38) expanded the number of judgmental criteria for transmuting association into causality. To the Advisory Committee list of five (excluding association as given), Hill added analogy, experiment, and biologic gradient (or dose-response curve), and he separated biologic plausibility from coherence. The list thus adds up to nine criteria, several buttressed with illuminating examples.

By the criterion of analogy, Hill meant that when one of a class of causal agents is known to have produced an effect, the standards for evidence that another agent of that class produces a similar effect can be reduced. Analogy in this sense facilitates hypothesis formation rather than causal inference. If accepted nonetheless, analogy might best be subsumed under the criterion of coherence. The same holds good for biologic plausibility and for dose-response relations (7, 39). William of Occam's Razor, non sunt multiplicanda entia praeter necessitatem, requires parsimony: The assumptions (entities) used to explain a thing must not be multiplied beyond necessity.

As to experiment, when belatedly I came to comment on Hill's list, I rejected this criterion as a classification or category error: A research design is particular to evaluating a given result and not general to a persisting association as required by the property of causality (39).
Causality is a relation, or in Bertrand Russell's (4) terms a "universal," which "has being" independently of such a given circumstance as design. In other words, the qualities of data or of design do nothing to alter the realities of a cause-and-effect relationship, which exists regardless of them.

My recollection of Hill's paper, I have since discovered upon rereading, was mistaken in detail. By experiment, Hill referred not to research design so much as to intervention and active change; his examples were preventive actions. The criticism holds nonetheless, and it forces a gap between causal inference on the one side and design, execution, and analysis on the other. Yet one shares with other epidemiologists the intuition that design matters in the process of causal inference. Popper's criterion of survivability, we shall see below, can be made to bridge the gap between inference and design, and at the same time reconcile the particularism of design with the generality required of a causal relationship.

In ignorance of Hill's paper, I developed my own discussion of causality in order to meet the burgeoning tasks set by the multivariate age of epidemiology then emerging (7). In a first step, taking association (under the head of probability) as a sine qua non, I followed Blalock (40) in proposing, as above, two other essential properties of a causal association, namely, time order and direction. The explanation of direction, an insufficient one, emphasized asymmetry between cause and effect. The list of criteria offered in my account for the judgment that an observed association was causal (time sequence, strength, specificity, consistency, and coherence) was the same as that of the Advisory Committee to the Surgeon General. This account dilated on the Report of the Advisory Committee in expounding the definitions, cogency, and application of the criteria.

Over the next decade and more, a consensus grew around the use of criteria of judgment. Evans (21) aligned them historically in relation to Koch's postulates. A range of authors illustrated their application to inference in a variety of problems.
Thus, they were used to illumine the flaws and strengths of the positions of contenders in major historical controversies (41). Later, they were freely applied in a renewed rally about the effects of smoking. The smoking debate was taken up by Burch (42, 43) and Lilienfeld (44) around an updated report of the US Surgeon General on smoking and cancer (45). A partial list of other applications includes problems of teratogenesis in general (46, 47) and those of teratogens such as Agent Orange (48) and spermicides (49) in particular; the causes of acquired immunodeficiency syndrome before the virus was isolated (50); and the relation of oral contraceptives to breast cancer (51).

The advent of frankly Popperian approaches in epidemiology unraveled the apparent consensus. Beginning in 1975 with the advocacy of Carol Buck (52), Popper's philosophy led naturally to the dismissal of criteria of judgment as logically flawed, useless, or inapplicable (9, 53, 54) or to the substitution of other criteria (55, 56) (although these criteria were in fact not substitutes, since they pertained not to judgments about causality but to the different issue of the quality and usefulness of hypotheses). These radical critiques follow once induction is banished, as properly they should, since induction from particular instances to establish causality underlies the entire process of judgment. Falsification is Popper's sole truth, to be arrived at only by deductions that test a priori hypotheses.

Popper abandoned verification on the grounds of a profound philosophic skepticism. The rise among epidemiologists of an undiluted disbelief in proof, however, is historically surprising at a time when science generally and epidemiology in particular have never been stronger; to a degree, such skepticism must reflect an implicit lack of confidence in the capabilities of epidemiology.
The explosive development and exquisite refinement of molecular biology and genetics might have led one rather to anticipate a resurrection of Koch's narrow model of necessary and sufficient cause; molecular science is comfortable with the requirement for specificity. This turning away from verification, even when founded on the most refined levels of specificity, thus appears to turn away also from the technical advances of biologic science.

It seems possible, therefore, that the Popperian trend reflects modern societal forces other than technology that press more directly upon epidemiologists. In an ever more public role, epidemiologists face insistent demands for assertions of proof in great public matters (22). There is no avoiding the heat of the klieg lights in etiologic research. Dietary findings have major economic implications for agriculture; the effects of nuclear radiation have political as well as economic implications for the energy industry; discoveries of adverse effects of medication destroy pharmaceutical companies; and reproductive epidemiology stokes intense moral and political fires around sexuality, contraception, and induced abortion.

Causal inference properly contributes to decisions in these matters. Nonetheless, epidemiologists have come to understand that the data and assumptions used in sound causal inference and those used in sound decisionmaking are not the same (41, 53, 57). The rejection of verification and of the demand for proof underscores this gap: However weak a causal inference may be held to be in theory, public decision in the form of action or inaction will proceed. The noncommittal Popperian stance on causality accords with the argument that epidemiologists should keep a respectful professional distance between themselves and public health decisions.

More usable in reaching judgments about causality is a Popperian reformulation of existing criteria that attempted to avoid the logical impasse of verification; criteria restated and redirected toward falsification (58) could still provide the occasion for dialogue.
Nudged into reconsideration upon reviewing McClure's paper (58), I then weighed the contribution of each criterion to falsification and to verification separately, added the criterion of predictive performance, and subclassified coherence as theoretical, factual, biologic, and statistical (39). Predictive performance, borrowed from the debate about Popper (59), requires that from an observed association successful predictions can be made that when realized add new knowledge. Although the criteria considered were given different weights, by definition none is essential to a causal judgment. They address the evidence, in a hypothesized causal relation, about the presence of the essential properties of causes. Depending chiefly on the nature of the available evidence, different criteria prove more or less useful in the judgmental process of establishing essential causal properties (7, 39).

DESIGN, SURVIVABILITY, AND CONSISTENCY

Here I take the opportunity to resolve the issue of "experiment" alluded to above in the discussion of Hill's review of criteria. How does one subsume epidemiologic intuition about the contribution of design to the general criteria for causal inference? Design, we have already noted, determines access to observation of the sequence of independent and dependent variables, and to this extent serves as a criterion for the general property of time order. Now we consider the place of design as, in addition, a criterion for direction.

To establish the attributes of a putative cause, the venerable scientific strategy is to simplify the conditions of observation by design (7). In epidemiologic and clinical science, this is done by two familiar general approaches: The scientist observes the relations between the cause (X) and the effect (Y) under circumstances chosen to be revealing; or the scientist experiments, which is to produce change in the effect (Y) by introducing or removing a cause (X) in a field of observation created by the scientist. The research designs which implement these approaches are particular to each study, as noted above, whereas causal associations are general and exist across studies.

One link between particular research designs and the general property of causality is that, while every study challenges the survival of a hypothesis, design governs the severity of the challenge. The more rigorous the tests withstood by the hypothesis, the greater its survivability and the more likely it is to be causal. What is intended here about survivability differs in some respects from what Popper intends. Survivability can be classed with consistency. Like consistency, survivability is ultimately an affirmative criterion based on induction. Both are qualities acquired upon repeated testing and dependent on assembling a number of tests. Consistency, as derived from Mill (14) (and Bacon (60) before him), stresses the number and variety of tests of a hypothesis.

Mill's First Canon: If two or more instances of the phenomenon under investigation have only one circumstance in common, the circumstance in which alone all the instances agree is the cause (or effect) of the given phenomenon (14, p. 223).

That is, the historical emphasis has been on replicability: Replications of a result under otherwise widely varying conditions enhance credibility. Survivability stresses the number and severity of tests: Persistence of a result under increasingly severe conditions enhances credibility. The intent of increasing severity in a given test, according to Popper, is refutation. Yet as a criterion applied to successive tests in the cumulative process of judgment, ultimately survivability contributes most to verification. While Popper has not allowed verification of a hypothesis into his canons of scientific logic, he does allow repeated tests to confer increments in survivability by means of what he calls "corroboration." As far as I can see, survivability conferred by corroboration can only be understood in terms of induction (39).

RESEARCH DESIGN

Design is a means of eliciting the valid relations between cause and effect in a given population. Thus, particular designs serve as tactics of different kinds for the same broad objective. All aim to penetrate the dense structure of birth cohorts created by the ever-changing balance between fertility and mortality in a population. Any design cuts a slice of that demographic structure in order to isolate and display the effects and determinants under study. The emergent data describing the relations between variables are therefore not, in principle, different from one another (7, 11, 12). Different designs achieve their objectives more or less well, however, and differ in the quality of the data they provide bearing on association, time order, and direction.

In the light of the criterion of survivability, the rigor of designs can be roughly ranked by the confidence they generate about the presence of the definitive attributes of causes. With regard to the determinant under study, this confidence is a function of two elements of design. One element is the amount and definition of change in the determinant mobilized by the design; a second element is the degree of isolation of the determinant achieved by the design (7). Below, the properties attributed to different designs describe ideal types, and not the many variations that occur within each type.

Controlled experiment refers to intervention studies that include preselected experimental and control groups. Such experiments yield the strongest assurances about association, time order, and direction. They allow the maximum mobilization and definition of change or activity in the determinant, since the exposure is wholly a creation of the design. Controlled experiments also allow the maximum isolation of the determinant, since direct control over conditions simplifies observation. A critical factor for control is that experimental and comparison groups are selected before the intervention in order to neutralize the heavy bias that selection after the event can produce. Then, given a priori hypotheses, an effect seen to follow an intervention can be inferred to be causal if it is not seen in a comparable group undisturbed by the same intervention.

Quasi-experiment describes studies of intervention effects when comparison groups unexposed to the intervention have not been preselected by the systematic design of the researcher. (This class, in theory, also includes the unusual case of a comparison group that is preselected but an intervention group that is not.) The most usual circumstance arises when, intervention having been introduced into a situation, evaluation of its effect is undertaken after the event. Another is where, a controlled experiment having been conducted, experimental design is abandoned and analysis examines outcomes in relation to levels of exposure regardless of initial experimental assignment (some examples have been cited elsewhere (22, 61)). Quasi-experiments can give knowledge of the timing of the intervention and the supposed outcome as secure as in the classic controlled experiment. What is less secure is the knowledge both of unconfounded association and of direction. Because initial entry into observation in either study group or control group is not according to the neutral plan of the experimenter, it is open to the selective choices of the subjects and the selective vagaries of the social situation.

Natural experiment is a term best reserved, in my view, for the observation of the effects of nonroutine, well-defined changes in environment. Such changes are events that are major, sharp, and out of the ordinary, for instance, the London "smog" of 1953, the A-bombs of 1945, and the Dutch Famine of 1944/1945. ("Experiments of nature," in contrast, refers to biologic anomalies such as plural births and the congenital absence of an organ or a metabolite.) Natural experiments yield strong inference about time order but also problems of bias and confounding in comparisons with unexposed groups.

Observation of events in sequence obviously helps to fix time order, but different designs capture timing more or less well. Timing is most readily secured when the sequence characterizes the routine of natural or developmental events, for example, the sequence of development in the fetus and in the child postnatally. Difficulties are greatest when no a priori grounds exist for inferring chronology.
In such instances, a great deal depends on design if putative exposures and outcomes are to be correctly assigned and not reversed. The observations can be assembled either in longitudinal cohort designs which follow forward from exposure to effect or in case-control designs which look backward from the effect to the exposure. It is elementary that for establishing time order, the forward-looking perspective is nearly always superior to the backward-looking one. Thus, the soundest case-control designs will collect incident cases with onset reasonably well specified. The onset and duration of exposure, on the contrary, are seldom equally well specified. Data on exposure collected simultaneously with those on outcome must be historical, and the relation of the moment of initiation to the advent of a supposed independent causal variable will often be in doubt. Further, because of unknown attrition among cases in a given population before observation can begin, factors that generate a disorder cannot always be distinguished from factors that cause a disorder to persist.

It must be stressed that even a secure observation about the time sequence of the manifestations of supposed cause and outcome does not lead automatically to a secure inference about the time sequence of the origins of the outcome and a supposed cause, which is to say, about direction. A symmetric association, we noted, may have been produced by some third factor which is their common cause. Thus, with fetal growth retardation and postnatal developmental delay, both phenomena occur in association; fetal growth retardation precedes the postnatal delay but does not necessarily cause it. In the absence of additional evidence, the question of direction and causality is still open to challenge.

Cross-sectional studies, like case-control studies, collect data on determinants and outcomes simultaneously, but determine beforehand only the size of a study population and not the number of exposures or the number of cases to be deployed.
Thus, they collect data in selected populations without foreknowledge of the frequency of either outcomes or suspected causal factors, and they do so giving no special regard to timing. Such data can reveal association, but they barely speak to the question of time order, and can speak not at all to the issue of direction. The associations observed are founded on what is prevalent at a moment in time and not on what is incident over time. Hence, the cross-sectional design is especially prone to failure to distinguish between associations resulting from the genesis of a disorder and those resulting from its persistence, a problem noted above with case-control designs. As to time order, firm inference is possible only when collateral evidence is brought to bear (for instance, the causal factor is a fixed attribute that was certainly present before the onset of the disorder).

DEFINITIONS

To complete this exegesis of the emergence, development, and elaboration of criteria that aid inference about causal association, the criteria that seem most useful and least tautologic may bear brief definition and description.

1. Strength is defined by the size of estimated risk within the constraints of probability levels, confidence intervals, or other measures of likelihood.
2. Specificity is defined by the precision with which one variable, to the exclusion of others, will predict the occurrence of another, again to the exclusion of others. Specificity of cause and specificity of effect are subclasses of specificity.
2.1 Specificity in the cause implies, in the ideal, that a given effect has a unique cause.
2.2 Specificity in the effect implies, in the ideal, that a given cause has a unique effect.
3. Consistency is defined (inductively) by the persistence of an association upon repeated test (any of which may be deductive). Survivability and replication are subclasses of consistency.
3.1 Survivability is defined by the number and, specifically, the rigor and severity of tests of association.
3.2 Replicability is defined by the number and, specifically, the diversity of tests of association.
4. Predictive performance is defined deductively by the ability of a causal hypothesis drawn from an observed association to predict an unknown fact that is consequent on the initial association.
5. Coherence is defined by the extent to which a hypothesized causal association is compatible with preexisting theory and knowledge. Coherence can be considered in terms of many subclasses.
5.1 Theoretical coherence requires compatibility with preexisting theory.
5.2 Factual coherence requires compatibility with preexisting knowledge.
5.3 Biologic coherence requires compatibility with current biologic knowledge that is drawn from species other than human or, in humans, from levels of organization other than the unit of observation, especially those less complex than the person.
5.4 Statistical coherence requires compatibility with a comprehensible or, at the least, conceivable model of the distribution of cause and effect (it is enhanced by simple distributions readily comprehended, for instance, a dose-response relation, and is obscured by those that are nonlinear and complex).

This concludes an eclectic excursion that has drawn, however lightly, on epidemiologists of two centuries and philosophers of four. In this historical perspective, it is hardly disputable that scientific thinking about causes reflects the tasks that face the scientist; epidemiologists have modified their causal concepts as the nature of their tasks has changed (7). The approach described here is meant to serve the contemporary needs of an epidemiology committed to matters of substance and scientific discovery. We should cleave to the approach only as long as its governing concepts prove useful. Old concepts may be improved upon by fresh ones, which we should surely expect. Indeed, the current set of criteria may well be displaced as the tasks of the discipline change, which they are bound to do. That time has not yet arrived.
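
As a minimal worked sketch of criterion 1 (strength), the size of an estimated risk and its confidence interval can be computed from a simple two-group cohort table. The function name, the counts, and the choice of a Wald interval on the log scale are assumptions made for illustration; none of them comes from the paper.

```python
import math


def risk_ratio_with_ci(exposed_cases, exposed_total,
                       unexposed_cases, unexposed_total, z=1.96):
    """Return the risk ratio and a Wald confidence interval on the log scale.

    Illustrative only: assumes cumulative-incidence data from two groups
    with nonzero case counts in each group.
    """
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    rr = risk_exposed / risk_unexposed
    # Standard error of log(RR) for two independent binomial proportions.
    se_log_rr = math.sqrt(
        1 / exposed_cases - 1 / exposed_total
        + 1 / unexposed_cases - 1 / unexposed_total
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical cohort: 30 of 100 exposed and 10 of 100 unexposed
# persons develop the outcome.
rr, lower, upper = risk_ratio_with_ci(30, 100, 10, 100)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

In this reading, strength is judged by the point estimate together with the interval around it: an interval that excludes 1 supports association, while a wide interval tempers the weight given to a large estimate.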

REFERENCES

1. Hume D. A treatise of human nature. Book 1. Of the understanding. Section 2. First published 1739. Reprinted by: LaSalle, IL: Open Court Publishing Company, 1945.
2. Berkeley G. Treatise concerning the principles of human knowledge. First published 1710. Reprinted by: Indianapolis, IN: The Bobbs-Merrill Company, Inc, 1957.
3. Russell B. The problems of philosophy. Oxford, England: Oxford University Press, 1912.
4. Russell B. Our knowledge of the external world as a field for scientific method in philosophy. Chicago: Open Court Publishing Company, 1915:211 ff.
5. Holland PW. Statistics and causal inference. J Am Stat Assoc 1986;81:945-60.
6. Bunge M. Causality in modern science. 3rd rev. ed. New York: Dover Publications, Inc, 1979.
7. Susser M. Causal thinking in the health sciences: concepts and strategies in epidemiology. New York: Oxford University Press, 1973.
8. Rothman KJ. Causes. Am J Epidemiol 1976;104:587-92.
9. Rothman KJ. Modern epidemiology. Boston: Little, Brown and Company, 1986.
10. Hobbes T. Elements of philosophy. First published 1655. Cited by Bunge M in: Causality in modern science. 3rd rev. ed. New York: Dover Publications, Inc, 1979:33.
11. Kramer MS, Boivin J-F. Toward an "unconfounded" classification of epidemiologic research design. J Chronic Dis 1987;40:683-8.
12. Miettinen OS. Variance and dissent striving to deconfound the fundamentals of epidemiologic study design. J Clin Epidemiol 1988;41:709-13.
13. Greenland S, Morgenstern H. Classification schemes for epidemiologic research designs. J Clin Epidemiol 1988;41:715-16.
14. Mill JS. A system of logic: ratiocinative and inductive. First published 1856. Reprinted by: London: George Routledge and Sons, 1892.
15. Carter KC. Ignaz Semmelweis, Carl Mayrhofer, and the rise of germ theory. Med Hist 1985;29:33-53.
16. Frost WH. Introduction. In: Snow on cholera: being a reprint of two papers by John Snow, M.D. New York: The Commonwealth Fund; London: Humphrey Milford, Oxford University Press, 1936:ix-xxi.
17. Koch R. Die aetiologie der Tuberkulose. First published 1882. Reprinted in: Schwalbe J, ed. Gesammelte Werke von Robert Koch. (In German). Leipzig, Germany: Georg Thieme, Verlag, 1912;1:428-55.
18. Baumgartner L. Edwin Klebs: a centennial note. N Engl J Med 1935;213:60-3.
19. Rosen G. Jacob Henle and William Farr. Bull Hist Med 1941;9:585-9.
20. Carter KC. Koch's postulates in relation to the work of Jacob Henle and Edwin Klebs. Med Hist 1985;29:353-74.
21. Evans AS. Causation and disease: the Henle-Koch postulates revisited. Yale J Biol Med 1976;49:175-95.
22. Susser M. Epidemiology in the United States after World War II: the evolution of technique. Epidemiol Rev 1985;7:147-77.
23. Wynder EL, Graham EA. Tobacco smoking as a possible etiological factor in bronchogenic carcinoma. JAMA 1950;143:329-36.
24. Schreck R, Baker LA, Ballard GP, et al. Tobacco smoking as an etiological factor of cancer. Cancer Res 1950;10:49-58.
25. Levin ML, Goldstein H, Gerhardt BR. Cancer and tobacco smoking. JAMA 1950;143:336-8.
26. Doll R, Hill AB. Smoking and carcinoma of the lung. Br Med J 1950;2:740-8.
27. Doll R, Hill AB. The mortality of doctors in relation to their smoking habits. Br Med J 1954;1:1451-5.
28. Fisher RA. Lung cancer and cigarettes? (Letter). Nature 1958;182:108.
29. Cornfield J, Haenszel W, Hammond EC, et al. Smoking and lung cancer: recent evidence and a discussion of some questions. J Natl Cancer Inst 1959;22:173-203.
30. Berkson J. Smoking and cancer of the lung. Proc Staff Meet Mayo Clin 1960;35:367-85.
31. Hammond EC. Cause and effect. In: Wynder ES, ed. The biologic effects of tobacco. Boston: Little, Brown and Company, 1955:171-96.
32. Fisher RA. Indeterminism and natural selection. Philos Sci 1934;1:99-117.
33. Berkson J. Mortality and marital status: reflections on the derivation of etiology from statistics. Am J Public Health 1962;52:1318-29.
34. Yerushalmy J, Palmer CE. On the methodology of investigations of etiologic factors in chronic diseases. J Chronic Dis 1959;10:27-40.
35. Lilienfeld AM. "On the methodology of investigations of etiologic factors in chronic diseases": some comments. J Chronic Dis 1959;10:41-6.
36. Sartwell PE. "On the methodology of investigations of etiologic factors in chronic diseases": further comments. J Chronic Dis 1960;11:61-3.
37. Smoking and health. Report of the Advisory Committee to the Surgeon General. Washington, DC: US Department of Health, Education, and Welfare, 1964.
38. Hill AB. Environment and disease: association or causation? Proc R Soc Med 1965;58:295-300.
39. Susser M. Falsification, verification, and causal inference in epidemiology: reconsideration in the light of Sir Karl Popper's philosophy. In: Susser M. Epidemiology, health, and society: selected papers. New York: Oxford University Press, 1987:82-93. (Reprinted in: Rothman KJ, ed. Causal inference. Chestnut Hill, MA: Epidemiology Resources, 1988:33-57.)
40. Blalock HM. Causal inference in non-experimental research. Chapel Hill, NC: University of North Carolina Press, 1964.
41. Susser M. Judgment and causal inference: criteria in epidemiological studies. Am J Epidemiol 1977;105:1-15. (Reprinted in: Greenland S, ed. Evolution of epidemiologic ideas: annotated readings on concepts and methods. Chestnut Hill, MA: Epidemiology Resources, Inc, 1987.)
42. Burch PRJ. The Surgeon General's "epidemiologic criteria for causality": a critique. J Chronic Dis 1983;36:821-35.
43. Burch PRJ. The Surgeon General's "epidemiologic criteria for causality": reply to Lilienfeld. J Chronic Dis 1984;37:148-56.
44. Lilienfeld AM. The Surgeon General's "epidemiologic criteria for causality": a criticism of Burch's critique. J Chronic Dis 1983;36:837-45.
45. US Surgeon General. The health consequences of smoking (cancer). Rockville, MD: US Department of Health and Human Services, 1982:16-20.
46. Stein ZA, Kline J, Kharrazi M. What is a teratogen? Epidemiological criteria. In: Kalter H, ed. Issues and reviews on teratology. Vol 2. New York: Plenum Publishing Corporation, 1984:23-60.
47. Kline J, Stein Z, Susser M. Conception to birth: the epidemiology of prenatal development. New York: Oxford University Press, 1989.
48. Hatch MC, Stein ZA. Agent Orange and risks to reproduction: the limits of epidemiology. Teratogenesis Carcinog Mutagen 1986;6:185-202.
49. Schlesselman JJ. "Proof" of cause and effect in epidemiologic studies: criteria for judgment. Prev Med 1987;16:195-210.
50. Flam R, Stein ZA. Behavior, infection, and immune response: an epidemiological approach. In: Feldman DA, Johnson TM, eds. The social dimension of AIDS: methods and theory. New York: Praeger Publishers, Inc, 1986:61-76.
51. Schlesselman JJ, Stadel BV, Murray P, et al. Consistency and plausibility in epidemiologic analysis: application to breast cancer in relation to use of oral contraceptives. J Chronic Dis 1987;40:1033-9.
52. Buck C. Popper's philosophy for epidemiologists. Int J Epidemiol 1975;4:159-68.
53. Lanes SF. The logic of causal inference. In: Rothman KJ, ed. Causal inference. Chestnut Hill, MA: Epidemiology Resources, 1988:59-75.
54. Poole C. Induction does not exist in epidemiology. In: Rothman KJ, ed. Causal inference. Chestnut Hill, MA: Epidemiology Resources, 1988:153-64.
55. Weed DL. On the logic of causal inference. Am J Epidemiol 1986;123:965-79.
56. Weed DL. Causal criteria and Popperian refutation in causal inference. In: Rothman KJ, ed. Causal inference. Chestnut Hill, MA: Epidemiology Resources, 1988:15-32.
57. Rothman KJ, Poole C. Science and policy-making. Am J Public Health 1985;75:340-1.
58. McClure M. Popperian refutation in epidemiology. Am J Epidemiol 1985;121:343-50.
59. Lakatos I. Falsification and the methodology of scientific research programs. In: Lakatos I, Musgrave A, eds. Criticism and the growth of knowledge. Rev. ed. Cambridge, England: Cambridge University Press, 1974:91-196.
60. Bacon F. The new organon and related writings. First published 1599. Reprinted by: New York: The Bobbs-Merrill Company, Inc, 1960. (Anderson FH, ed.)
61. Susser M. The challenge of causality: human nutrition, brain development, and mental performance. Bull N Y Acad Med 1989;65:1032-49.
