Journal of Consulting and Clinical Psychology 1978, Vol. 46, No. 4, 643-647

Stimulus Sampling in Clinical Research: Representative Design Reviewed

Brendan A. Maher
Harvard University

Brunswik's concept of representative design is reviewed with special reference to studies of clinical bias. The limitations of single-stimulus, actor-script, and serial replications are discussed. No satisfactory alternatives exist to adequate sampling of stimulus persons.

More than 30 years ago, Egon Brunswik (1947) pointed out that if we wish to generalize the results of a psychological experiment to populations of subjects and to populations of stimuli, we must sample from both populations. This argument was elaborated by him in other articles and was summarized cogently in a short article by Hammond (1948). The purpose of this article is to review the issues that Brunswik raised and to examine some of their implications for contemporary research in clinical psychology.

Brunswik's thesis is very simple. When we conduct an experiment intended to investigate the effect of different values of an independent variable on a population, we always take care to draw a sample of subjects that is representative of the population in question. We do so, naturally, because we recognize the range of variation that exists in populations of individuals. We wish to make sure that deviant individual values do not distort our estimate of the parameters of the population. If the stimuli that we use are defined in physical units, we are (or should be) careful to confine our generalizations to the range of values actually included in the study. When physical units are involved, we have relative confidence that the stimulus can be replicated by another investigator, provided that the detailed description of the stimulus is followed carefully. Should a subsequent investigator change one or more of these attributes, we are not surprised if there is a concomitant change in the responses that are made to the stimulus.

When the stimuli to which the subjects respond cannot be defined in physical units and are likely to vary within a population, a different situation arises. Outstanding examples are to be seen in research directed to the investigation of the effects of human beings as stimuli that elicit behavior from other human beings. Consider some instances drawn from recent volumes of this journal. Acosta and Sheehan (1976) reported that Mexican American subjects viewed an Anglo American professional therapist as more competent than a Mexican American professional when all other variables were matched. Babad, Mann, and Mar-Hayim (1975) reported that trainee clinicians who were told that a testee was a high-achieving upper-middle class child assigned higher scores to Wechsler Intelligence Scale for Children (WISC) responses than did another sample of clinicians who were led to believe that the same responses had been made by an underachieving deprived child.

Research of this kind is generally cast in terms of a hypothesis that members of a specified population respond in discriminatory fashion to members of certain other populations. Thus, for example, we encounter such questions as, Do physicians give less adequate medical care to ex-mental patients than they do to normal medical patients? (Farina, Hagelauer, & Holzberg, 1976) and Are therapists with a behavioral orientation less affected by the label patient when evaluating observed behavior than are therapists of psychodynamic persuasion? (Langer & Abelson, 1974).

My appreciation is due to Winifred Barbara Maher for her critical reading of early drafts of this article. Requests for reprints should be sent to Brendan A. Maher, Department of Psychology and Social Relations, Harvard University, Cambridge, Massachusetts 02138.

Copyright 1978 by the American Psychological Association, Inc. 0022-006X/78/4604-0643$00.75

Single-Stimulus Design

Human attributes are generally distributed in such a fashion that any one of them is likely to be found in conjunction with a wide variety of others. Let us consider an investigation of bias toward ex-mental patients. To belabor the obvious a little, we can note that the attribute ex-mental patient can be associated with any measure of intelligence, age, sex, education, socioeconomic status, physical attractiveness, and so forth. It is true that some of these attributes may have significant correlations with each other; a patient of upper socioeconomic status is quite likely to have had substantial education, for example. Nonetheless, even the largest of these correlations is quite modest, and the population of ex-mental patients to which we wish to generalize will have a wide range of values on these attributes.

When we employ only one person as a stimulus, we are faced with the fact that the specific values of some of the other attributes possessed by this person will also have stimulus value that will be unknown and uncontrolled. Responses made by a sample of the normal population to an ex-mental patient who is female, young, attractive, articulate, and intelligent may well be different from those made to a normal control who is male, old, ugly, incoherent, and dull. These differences cannot be assigned to the patient/nonpatient status of the two stimulus persons, as many other unidentified differences were uncontrolled.

At first sight it may appear that this problem is solved by the simple expedient of matching the patient and the control on all variables other than that of patient status. Unfortunately, this can only be achieved at the cost of further difficulties.
We do not know the full range of variables that should be matched, and hence this solution necessarily involves resort to an actor and a script, barring the unlikely availability of discordant monozygotic twins for research purposes! Scripts bring with them some special problems, of which more will be said later. The main point to note here is that the use of a single human stimulus acting as his or her own control fails to deal with the problem of the interaction of the attribute under investigation with those that have been controlled by matching.

Pursuing for a moment the example of responses to the label ex-mental patient, let us consider a hypothetical study using a male actor with athletic physique and vigorous movements. The willingness of a normal subject to accept this individual as a fellow worker, neighbor, or friend may well be influenced by the perception that the ex-patient, if violent, could be dangerous. Had the actor been older and visibly frail, the reaction might well be different. Under the first set of circumstances, the bias hypothesis would probably be confirmed, and under the second set, the null hypothesis might fail to be rejected.

An additional difficulty is incurred by the single-stimulus own-control strategy. We cannot determine whether a finding of no difference between group means is due to the weakness of the hypothesis, errors of method, or the inadvertent selection of an atypical stimulus person to represent one or both conditions.

An example of the complexities of interpretation with this design can be found in Farina et al. (1976). These investigators hypothesized that physicians would provide less adequate medical care to former mental patients than to normal medical patients. To test this hypothesis

    one stimulus person, a 23-year-old male graduate student, approached 32 medical practitioners. In each case he entered the doctor's office carrying a motorcycle helmet and a small knapsack. . . . The same symptoms were reported to all doctors. Stomach pains suggestive of ulcers were selected to be neither clearly psychiatric nor unrelated to the mind. . . . Every other practitioner was told the pains had first occurred 9 months earlier while the patient was traveling around the country. The remaining 16 doctors were also informed that the pains had appeared 9 months earlier, but at that time the patient reported being in a mental hospital. (Farina et al., 1976, p. 499)

No significant difference of any relevance was found in the kind of medical care given by the practitioners under either condition. In conclusion, the authors stated that "a former mental patient seems to receive the same medical treatment as anyone else" (p. 499).

Logically, several conclusions are compatible with this finding. One obviously valid conclusion is that a young male motorcyclist with the symptoms of ulcers receives a certain class of treatment whether or not he describes himself as a former mental patient. We cannot tell whether this treatment is the same, better, or worse than that typically given to a random sample of the normal population of patients who seek treatment for stomach pains, as no such sample was obtained. A substantial number of physicians may have had opinions about motorcyclists as unfavorable as those that they were hypothesized to have about former mental patients, and hence both conditions produced equally inadequate medical care. Alternatively, the physicians may have felt the necessity to be unusually careful in providing care to individuals who might be assumed to be irresponsible (such as motorcyclists and mental patients), and hence they provided better than average care. Finally, medical practice may be sufficiently precise about the adequate procedures to follow with patients who complain of stomach pains that no real room for bias exists, the treatment provided being the same as would be given to any sample of patients.

We can summarize the limitations of the single-stimulus design as follows:

1. Obtained differences may be due to the validity of the tested hypothesis or to the effect of uncontrolled stimulus variables in critical interaction with the intended independent variable. No method of distinguishing between these two explanations is possible.

2. Lack of difference may be due to the invalidity of the hypothesis, undiscovered methodological factors such as subject sampling error, or the presence of an uncontrolled stimulus variable operating either to counteract the effect of the intended independent variable or to raise this effect to a ceiling value in both experimental and control situations.

It is readily apparent that the problem of uncontrolled attributes occurring in a single stimulus person can only be solved by the provision of an adequate sample of stimulus persons, since the uncontrolled attributes will then tend to cancel each other out. No satisfactory solution is possible within the single-stimulus design.

Scripts and Manticores

Some investigators have attempted to solve the problems of the single-subject stimulus by fabricating scripts without the use of a human actor to present them. Case histories, dossiers, vignettes, audiotapes, or other devices have been used to reduce the effects of the uncontrolled aspects of a human stimulus. Thus, in a study by Babad et al. (1975), the trainee clinicians were given only the WISC protocol and did not see the child who was alleged to have been tested. These manufactured materials may be termed scripts.

Scripts may be taken from existing sources of genuine material, such as clinical files; they may be created de novo in accordance with prior theoretical guidelines or in an attempt to present an ideal "typical" case. When the script is drawn from original clinical files, the investigator is assured that at least one such case exists in nature. The limitations on the results obtained from such scripts are, in principle, the same as those that plague any single-stimulus design. Some minor advantage accrues to the method, however, in that the number of uncontrolled accidental attributes has been reduced by the elimination of those attributes associated with physical appearance, dress, and so forth.

When the script is fabricated for research purposes, a new problem develops, namely, that in devising material according to theoretical guidelines, a case is created that, like the manticore, may never have existed in nature. We can imagine a hypothetical investigation of the attitudes of males toward females of varying degrees of power. Varying naval ranks with male and female gender of the occupant of each rank, we create the dossier of an imaginary female Fleet Admiral. Whatever our male subject's response to this dossier may be, we have no way of knowing whether it is due to the theoretically important combination of high rank with female gender or to the singularity of a combination that is, as yet, unknown to human experience.

For a recent illustration of this problem, we can turn to Acosta and Sheehan (1976). They presented groups of Mexican American and Anglo American undergraduates with a videotaped excerpt of enacted psychotherapy. Each group saw an identical tape, except that in one version the therapist spoke English with a slight Spanish accent and in the other version the accent was standard American English. Some subjects were told that the therapist was a highly trained professional; the others were told that the therapist was a paraprofessional of limited experience. There were thus four experimental conditions and two kinds of subjects. The Spanish-accent tape of a trained professional was introduced with a background vignette describing the therapist as American born of Mexican parentage and as having a Harvard doctorate in his field and a distinguished professional record. For the American-English-accent tape, the therapist was introduced with the same vignette but with an Anglo-Saxon name and parentage identified as Northern European. Anglo American ratings of the therapist's competence were uninfluenced by the ethnic identification, whereas Mexican Americans rated the Mexican American therapist less favorably than the Anglo American therapist.

In their discussion of this somewhat surprising result, the authors noted that the number of Mexican American therapists actually in practice in the United States shortly before the study was done was 48 (28 psychologists and 20 psychiatrists). We do not know what characteristics would be typical of this population, and no attempt seems to have been made to ascertain them before preparing the script. There is, therefore, no way to be sure that the therapeutic style, choice of words, gesture, and so forth, were authentically typical of actual Mexican American therapists. Given that essentially the same script was used for both ethnic conditions, we must conclude that either one or the other version of the script was ethnically inaccurate or, less likely, that the only actual difference that would be seen in the comparative behaviors of Mexican American and Anglo American therapists would be their accent. In brief, we cannot ignore the possibility that the Mexican American subjects disapproved of the Mexican American therapist not because he was Mexican American but because his behavior was not representative of that of actual Mexican Americans. Like the woman admiral, he may have presented a combination of characteristics that is theoretically possible but unknown in the experience of the subjects responding to it.

The only guarantee that a script is free from impossible or improbable combinations of variables is when it is directly drawn from an actual clinical case or other human transaction. We cannot produce a fictional script of a psychotherapeutic session with any confidence that it is as representative as a transcript of an actual session. The ideal or typical therapeutic interview may be as rare as the perfect textbook case of conversion hysteria or as a stereotypical Mexican American. This rarity or implausibility may well determine a subject's response far more than the attributes that were planned to make it appear typical. Our hesitation in generalizing from a single stimulus case to a population of cases is increased substantially by the prospect of generalizing from a case that is not known to have existed at all.

Representative Design

The moral to the foregoing review is simple. If we wish to generalize to populations of stimuli, we must sample from them. Only in this way can we be confident that the various attributes that are found in the population will be properly represented in the sample. Those attributes that are significantly correlated with membership in the population will appear in appropriate and better-than-chance proportions; those attributes that are uncorrelated with population membership will appear in chance proportions but will not affect the outcomes. If we intend to draw conclusions about the way in which physicians treat former mental patients, we must sample physicians and former mental patients. If we wish to know what Mexican American students think of Mexican American therapists, we must sample students and therapists. This is the essence of Brunswik's (1947) concept of representative design. There is no satisfactory alternative to it.
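
The claim that uncontrolled stimulus attributes cancel out only when stimulus persons are sampled can be illustrated with a small simulation. What follows is a minimal sketch under stated assumptions, not an analysis reported in any of the studies discussed. It assumes that each stimulus person contributes an additive, normally distributed offset to every response made to him or her, and that the label under study has a fixed true effect. The function names and all parameter values (true_effect, stimulus_sd, subject_sd, 100 responses per condition) are hypothetical and chosen only for illustration.

```python
import random
import statistics


def estimate_label_effect(n_stimuli, rng, total_subjects=100,
                          true_effect=0.5, stimulus_sd=1.0, subject_sd=1.0):
    """Simulate one study and return its estimated effect of the label.

    Each condition uses its own n_stimuli stimulus persons. The number of
    responses per condition is held fixed at total_subjects, so only the
    number of stimulus persons changes between designs (assumed toy model).
    """
    subjects_per_stimulus = total_subjects // n_stimuli
    condition_means = []
    for label_effect in (0.0, true_effect):  # control, then labeled condition
        responses = []
        for _ in range(n_stimuli):
            # Idiosyncratic offset of this stimulus person: the uncontrolled
            # attributes (age, appearance, manner) shared by all responses.
            offset = rng.gauss(0.0, stimulus_sd)
            for _ in range(subjects_per_stimulus):
                noise = rng.gauss(0.0, subject_sd)  # subject-to-subject variation
                responses.append(label_effect + offset + noise)
        condition_means.append(statistics.fmean(responses))
    return condition_means[1] - condition_means[0]


def spread_of_estimates(n_stimuli, n_studies=500, seed=0):
    """Standard deviation of the estimated effect across simulated studies."""
    rng = random.Random(seed)
    estimates = [estimate_label_effect(n_stimuli, rng) for _ in range(n_studies)]
    return statistics.stdev(estimates)


if __name__ == "__main__":
    for k in (1, 5, 25):
        spread = spread_of_estimates(k)
        print(f"{k:>2} stimulus persons per condition: spread of estimate = {spread:.2f}")
```

Under these assumptions, the estimate from a design with one stimulus person per condition varies widely from study to study even with 100 responses per condition, because the offsets of the two stimulus persons are added to the effect of the label and cannot be separated from it. Adding subjects does not help; only adding stimulus persons reduces the spread.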
Nonetheless, the use of representative design is rarely, if ever, seen in reported research. There are, in my opinion, three reasons for this. First, many clinical psychologists are unaware of Brunswik's work. The remedy for this is obvious and easy to apply.

Second, there is a common failure to understand that the replication of single-stimulus studies with additional single-stimulus studies cannot create accumulated representative design unless the selection of single-stimulus persons was achieved by sampling. Let us consider a hypothetical series of studies of the effect of examiner gender on children's test responses. In the population of examiners, there are likely to be attributes that distinguish males from females in addition to those that are inseparable from gender. Thus the proportions of married and single persons, prior experience with children, knowledge of various hobbies, mean age, prior locale of undergraduate education, and so forth, may differ between the two groups. In the first study we use one male examiner and one female examiner, each with 1 year of experience. Using samples of male and female children, we find differences in test responses attributable to examiner gender. Conscious of the fact that we included inexperienced examiners, we replicate the study with one male examiner and one female examiner each with 3 years of experience. Now we find no difference. Our series ends when we have made gender comparisons for examiners with 1, 3, 5, 7, 9, 11, 13, 15, 17, and 19 years of experience. We found significant examiner effects at every level of experience except 3 and 5 years. As 8 of our 10 studies have found significant differences due to gender, we conclude that there is a generalizable finding. We might even treat the entire series as a single experiment comparing the group of 10 male examiners with the group of 10 female examiners and find a statistically significant difference between the mean test responses elicited by one group versus the other.

To accept this conclusion it is first necessary to know what the true proportion of the total population of examiners at each level of experience is. If the experience range of 3-5 years includes 65% of all examiners, our best conclusion is that gender differences have not been established. The reason is, of course, that the "sample" of examiners was not representative of the population to which it is intended to generalize, being underrepresented in the 3- to 5-year experience range. Note that we cannot handle this by some proportional weighting of the data obtained from the examiners with 3-5 years of experience, as the results obtained from those comparisons suffer from the limitations of single-stimulus design and might well be due to the effects of uncontrolled differences between examiners other than gender.

A third reason for the failure to use representative design is that it is laborious and expensive. Providing an adequate sample of stimulus persons, each of whom is to be observed by an adequate sample of subjects, necessarily involves large numbers and long hours. For some investigators it is, as one of my correspondents put it, "too hard to do it right."

There is, however, no satisfactory alternative to doing it right. Clinical psychology is concerned with real people and not with hypothetical collections of attributes. Our research into the behavior of patients, therapists, diagnosticians, normal persons, and the like, must produce generalizations that are valid for actual populations of these people. Conclusions based on inadequate sampling may be worse than no conclusions at all if we decide to base our clinical decisions on them. If the patience and time that it takes to do it right create better science, our gratitude should not be diminished by the probability that fewer publications will be produced.

References

Acosta, F. X., & Sheehan, J. G. Preferences toward Mexican American and Anglo American psychotherapists. Journal of Consulting and Clinical Psychology, 1976, 44, 272-279.

Babad, E. Y., Mann, M., & Mar-Hayim, M. Bias in scoring WISC subtests. Journal of Consulting and Clinical Psychology, 1975, 43, 268.

Brunswik, E. Systematic and representative design of psychological experiments. Berkeley: University of California Press, 1947.

Farina, A., Hagelauer, H. D., & Holzberg, J. D. Influence of psychiatric history on physician's response to a new patient. Journal of Consulting and Clinical Psychology, 1976, 44, 499.

Hammond, K. Subject and object sampling—A note. Psychological Bulletin, 1948, 45, 530-533.

Langer, E. J., & Abelson, R. P. A patient by any other name . . . : Clinician group difference in labeling bias. Journal of Consulting and Clinical Psychology, 1974, 42, 4-9.

Received April 5, 1978
