The American Journal of Medicine
March 1975, Volume 58, Number 3

EDITORIAL

Patient Bias, Investigator Bias and the Double-Masked Procedure in Clinical Trials

For what a man had rather were true, he more readily believes.
FRANCIS BACON

FRED EDERER, M.A.
Bethesda, Maryland

From the Section on Clinical Trials and Natural History Studies, National Eye Institute, National Institutes of Health, Bethesda, Maryland 20014. Requests for reprints should be addressed to Mr. Fred Ederer. Manuscript accepted May 21, 1974.

Cornfield thought it possible to summarize the principles of research in two words: be careful [1]. It is doubtful that fields of investigation exist in which this dictum is more pertinent than in clinical research. Clinical trials abound with potential biases, and various ingenious devices have been developed to control them [2,3], the most prominent of which is the randomized control group [4,5]. Randomization protects against bias in the selection of patients for treatment. The “double-masked” procedure protects against patient and investigator bias in measuring the results of treatment. It has often been called the “double-blind” procedure, but this designation is less specific and also awkward, particularly in ophthalmology.

I am concerned here with bias in assessing the results of clinical trials. Such bias can result from the enthusiasm of the patient or of the investigator. Louis Lasagna said: “Most patients with disease want to get better, and most investigators are anxious to come up with successful results. Both patients and investigators, therefore, may be tempted (on a conscious or unconscious level) to record improvement in symptoms under treatment” [6].

In medical trials, double-masking is usually accomplished by giving the control group a placebo which is made to look, taste and smell like the active medication, so that both investigator and patient are masked from knowledge of what the treatment is; assessing the results of treatment can thus be unbiased. In surgical trials, in which the surgeon cannot be masked, sham procedures have been employed [7-9] to mask the patient. Whether or not a sham procedure is used, assessment of the outcome of treatment can be unbiased if it is made by a masked observer.
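The two safeguards described above, random assignment and masking of the assignment, can be illustrated with a short sketch. It is not a procedure taken from any of the trials cited here; the function name `masked_allocation`, the `KIT-` label format and the patient identifiers are hypothetical, and equal-sized arms are assumed. The clinic sees only opaque kit labels, while the key linking each label to a treatment is kept by someone removed from the scene, such as the pharmacist.

```python
import random


def masked_allocation(patient_ids, seed=None):
    """Randomly assign patients to 'active' or 'placebo' and mask the result.

    Returns (kit_labels, key):
      kit_labels maps patient id -> opaque kit label (seen by patient and investigator)
      key        maps kit label  -> treatment arm   (held only by the pharmacist)
    Hypothetical illustration; assumes two arms of (nearly) equal size.
    """
    rng = random.Random(seed)
    n = len(patient_ids)

    # Randomization: shuffle a balanced list of arms so that neither the
    # patient nor the investigator chooses who gets which treatment.
    arms = ["active"] * (n // 2) + ["placebo"] * (n - n // 2)
    rng.shuffle(arms)

    # Masking: draw distinct random codes so a label says nothing about the arm.
    codes = rng.sample(range(1000, 10000), n)
    kit_labels = {pid: f"KIT-{code}" for pid, code in zip(patient_ids, codes)}
    key = {kit_labels[pid]: arm for pid, arm in zip(patient_ids, arms)}
    return kit_labels, key


if __name__ == "__main__":
    labels, pharmacist_key = masked_allocation(["P01", "P02", "P03", "P04"], seed=7)
    print(labels)  # what the clinic sees; pharmacist_key stays sealed until analysis
```

Keeping the key separate from the labels is the point of the design: the assessment can be recorded against the kit label alone, and the arms are revealed only when the results are analyzed.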


PATIENT BIAS

A prominent discovery of patient bias was made during the 1920’s when the Western Electric Company carried out a series of experiments at its Hawthorne plant in Chicago to determine the effect of illumination on production [10]. The control groups worked under constant illumination whereas illumination in the experimental groups was varied (increased and decreased). The result was that production increased not only in the test groups, but also, at comparable rates, in the control groups; this held true not only when illumination was increased, but also when it was decreased. It became clear that the increase in production was not caused by the changes in illumination, but apparently by the increased attention (interviews) the workers received from management. This unplanned effect on the “untreated” control group is called the “Hawthorne effect.”

The “placebo effect,” the clinical counterpart of the Hawthorne effect, was obtained in psychopharmacology experiments in the 1950’s. Neglected patients in mental hospitals suddenly found themselves attended to by doctors and nurses who issued medications and asked questions. The placebo effect, however, need not result from a change in the social situation; it can also result from suggestion [11].

“The desire for successful outcome is felt so strongly in both patient and investigator that objectivity cannot be guaranteed. Both have an emotional stake, overt or occult, in the result. Further, the giving of any treatment, especially by needle injection, is a strong psychotherapeutic stimulus in itself [12].”

Beecher [13,14] found that the average effectiveness of placebos was 35.2 per cent on symptoms like pain, cough, seasickness and anxiety, that the magnitude of the effect increased with the patient’s stress and also that surgery acts as a placebo.

“While one must be cautious, there is evidence in surgery, as in other fields, that the enthusiast actually gets results which are better than those of the sceptic [15].”

In a study of internal mammary artery ligation and a sham procedure in angina pectoris, “a marked improvement in the degree of angina occurred in 10 of the 13 (patients) actually ligated.” All five patients who had only a skin incision “emphatically described marked improvement” [9].

Shapiro [16] contended that the placebo effect is ever-present and universal: “We are led to the conclusion that the history of medical treatment can be characterized as the history of the placebo effect, since almost all medications until relatively recently were placebos . . . Placebo effects are so omnipresent that it is unlikely for a controlled study to be reported without some measure of placebo reactions. If no placebo reactions are reported, it is likely that the study is unreliable and that bias has inadvertently influenced the results.”

OBSERVER BIAS

Harvard Professor of Chemistry E. B. Wilson, whose “Introduction to Scientific Research” should be required reading for all clinical investigators, said: “Randomization can prevent human bias from entering in the selection of the sample and again in making assignment of subjects and controls, but after this stage the experiment is still in grave peril from the ever-present danger of psychological distortion . . . It is not merely the subjects who are influenced by psychological factors: the experimenter himself can easily be deceived in interpreting the results by his personal interest in the outcome. In the best experimental designs, the person making comparisons, measurements, or records is kept ignorant of the identity of the subjects and controls. Even in such routine matters as recording long lists of numbers or other simple data, it has been demonstrated that the mistakes which are made are usually more numerous in the direction personally favored by the recorder. No human being is even approximately free from these subjective influences: the honest and enlightened investigator devises the experiment so that his own prejudices cannot influence the result. Only the naive or dishonest claim that their own objectivity is a sufficient safeguard . . . ([17], pp 43-45)”

Investigator bias caused one of the greatest scientific delusions of this century, the “n-ray” phenomenon [18]. N-rays were discovered in 1902 by the eminent French physicist Blondlot, who in Comptes rendus, the leading French scientific journal, reported properties of these rays which far transcended those of x-rays. According to Blondlot, n-rays were given off spontaneously by many metals, such as copper, zinc, lead and aluminum, and when they fell upon the eye, the n-rays, most remarkably, increased the eye’s ability to see objects in a nearly dark room. The existence of n-rays was soon confirmed in laboratories in various parts of France, and a number of noted French scientists soon applied n-rays to research in chemistry, botany, physiology and neurology. In 1904, Science Abstracts listed 77 n-ray papers. The French Academy awarded Blondlot the Lalande prize of 20,000 francs and its gold medal “for the discovery of the n-rays.” That same year, however, reports began appearing of investigations in laboratories outside of France which failed to confirm the existence of n-rays. In the late summer of that year, the American physicist R. W. Wood visited Blondlot in his laboratory to test the experiments. Here is part of Wood’s own account:

“He [Blondlot] first showed me a card on which some circles had been painted in luminous paint. He turned down the gas light and called my attention to their increased luminosity when the n-ray was turned on. I said that I saw no change. He said that was because my eyes were not sensitive enough, so that proved nothing. I asked him if I could move an opaque lead screen in and out of the path of the rays while he called out the fluctuations of the screen. He was almost 100 per cent wrong and called out fluctuations when I made no movement at all, and that proved a lot, but I held my tongue. He then showed me the dimly lighted clock, and tried to convince me that he could see the hands when he held a large flat file just above his eyes. I asked if I could hold the file, for I had noticed a flat wooden ruler on his desk, and remembered that wood was one of the few substances that never emitted n-rays. He agreed to this, and I felt around in the dark for the ruler and held it in front of his face. Oh, yes, he could see the hands perfectly. This also proved something [19].”

After Wood published his account, n-ray publications diminished in number. Science Abstracts listed only eight n-ray papers in 1905, and none after 1909. The French Academy changed its announced reason for the award to Blondlot “for his life work, taken as a whole.” According to Seabrook [19], the exposure of the blunder led to Blondlot’s madness and death.

A masked experiment by the Austrian physicist Pozdena contributed to the disproof of the existence of n-rays. At haphazard intervals Pozdena’s assistant soundlessly operated a shutter which in its closed position blocked the transmission of the hypothetical n-rays. During a pretest the assistant wrote “o” for offen and “g” for geschlossen while Pozdena indicated when he could detect increased luminosity. Whereas the shutter’s movements were silent, the assistant’s pencil scratches were not. Pozdena was able to hear the difference between an “o” and a “g.” In the definitive experiment the assistant switched to a coded notation, and in 150 trials Pozdena reported increased luminosity about as often when the shutter was closed as when it was open [20].

The wish to succeed can influence not only the investigator but also the technician. The finding by Kahn and co-workers [21] that the standard deviation of masked duplicate laboratory tests was two to two and a half times as large as that of known duplicate tests emphasizes the importance of masking technicians.

Research workers in pharmacology [6], chemistry [17], medicine [22-24], surgery [12], and medical statistics [3,25,26] have warned of the dangers of observer bias.

“Both in the initial assessment of the patients and the subsequent assessment of their progress the tests should be applied by observers who remain unaware of which patient is undergoing treatment and which is a control. If this is not done, the subjective judgments which are inseparable from nearly all tests in clinical medicine may prejudice the results [23].”

“The desire for a successful outcome is felt so strongly by both patient and investigators that objectivity cannot be guaranteed. Both have an emotional stake, overt or occult, in the result . . . Highly qualified judges are not always unanimous or unbiased. Where possible, the assessment of those various parameters, which will be compounded into a decision for success or failure, should be made by some different assessor [12].”

Evidence to support the existence of investigator bias has been found in reviews of uncontrolled versus controlled clinical studies. Foulds [27] surveyed 72 clinical trials reported in psychiatric journals and found that the experimental treatment succeeded in 43 of 52 (83 per cent) uncontrolled trials, but only in 5 of 20 (25 per cent) controlled trials. Chalmers [28] reported similar findings in a review of studies of stilbesterol to prevent accidents of pregnancy. These results are caricatured in Muench’s Second Law [29]: “Results can always be improved by omitting controls.”

Observer bias need not favor the experimental treatment: it could be in the opposite direction. This can occur when the investigator’s bias is against the treatment or when he overcompensates for his known bias in favor of the treatment. Thurber’s moral: “You might as well fall flat on your face as lean over too far backward.”

“If an experiment is not blindfold the rule is that objective methods of assessment must be used . . . Functional tests . . . are readily influenced by the attitude of the examiner. In questioning a patient, even if we use a set form of words, our tone or attitude can easily be affected by our knowledge . . . A number of men, half of whom had received treatment . . . were lined up for inspection . . . by a medical officer who was skeptical of the treatment. After the inspection was over, an attendant remarked: ‘Did you notice, sir, that you spent much more time examining the treated cases than the controls?’ [25].”

“When everyone is in the dark, subjective measures can be used with confidence as there can be no bias introduced by patient or observer. If an observer had to diagnose a slight paroxysmal cough in a child known to have been vaccinated against whooping cough he might, because of bias in favor of the vaccine, decide it could not be pertussis. What is probably more likely than such cheating is that the clinician would attempt to compensate for his known bias and label the case one of whooping cough against his better judgment [30].”

No one has stated the case for masking better than Sir Austin Bradford Hill, whose development of the randomized, controlled clinical trial, according to an authority not less than the President of the Royal College of Physicians, was a contribution to medicine “as important and valuable as the discovery of penicillin” [31].

“If it [the clinical assessment of the patient’s progress and of the severity of the illness] is to be used effectively, without fear and without reproach, the judgments must be made without any possibility of bias, without any overcompensation for a possible bias, and without any possibility of accusation of bias [3].”

MASKING PROCEDURES

Carrying out masking successfully is usually more easily said than done. It often takes ingenuity, resourcefulness and a good deal of planning to succeed. (One ophthalmologist measured patients’ visual acuity with their bodies draped with a cloth and their heads covered with a hood [32].) It also takes commitment and dedication by each member of the investigative team to the principle that violations of masking could destroy the integrity of the experiment. Pretests of masking procedures are often desirable.

The perils of breakdowns in masking are particularly great in single-masked studies. In double-masked studies, the pharmacist is often the only one to know the treatment of a given patient, and he is usually removed from the scene. In single-masked studies, someone is always around, either the physician or the patient, who can give away the critical information, directly or indirectly. It takes careful planning, rehearsals and discipline to prevent breakdowns. In both single- and double-masked studies, objectivity of the assessment can be assured if the posttreatment observations are made not on the patient directly, but on laboratory results, x-rays, photographs, electric tracings or other documents which do not reveal the identity of the treatment.

It is insufficient to proclaim a study as “masked,” whether singly or doubly, and let it go at that, as if the mere label insured the integrity of the masking procedures. A placebo should not only look, smell and taste like the active drug, but should also have the same side effects. Rarely is this ideal achieved, and to the extent that it is not, it is incumbent upon the investigators to determine the effect of this lack on the objectivity of the results [33].

Near the conclusion of the National Diet-Heart Study, the participating subjects and investigators were each interviewed to determine his or her best impression as to which of several double-masked diets individual subjects were on. The survey produced no evidence that the integrity of the masking had been violated [34]. A more recent study of vitamin C and the common cold, however, produced contrary evidence. The investigators reported that “vitamin C had a definite but small influence on the frequency, duration or severity of colds.” The investigators warned, though, of a possible bias from a breakdown of masking. A “significant number” of volunteer subjects had guessed their medication. Moreover, those receiving the placebo who thought they were receiving vitamin C had fewer colds than those receiving vitamin C who thought they were receiving the placebo [35].

Some clinical investigators, perhaps most, firmly believe they are unbiased. But science is founded on documentary evidence, not on belief. The scientific public wants to know what procedures were used to prevent bias, what evidence was produced on the success of these procedures, and if the procedures were unsuccessful, what effect this had on the results.

Clinical investigators can learn a lesson here from another field, that of interview surveys. The question was asked how a reader of reports on interview surveys can tell whether or not the interviewing was well done.

“A few rules of thumb provide some guidance . . . the greater the detail in which the report describes the methods used, and the fewer the vague claims, such as ‘highly trained interviewers,’ the better the work is likely to be. The description of method may tell something about the selection, training and supervision of interviewers, and this may yield clues to the quality of the interviewing. If 2 or 3 days were devoted to training interviewers, if carefully written instructions were prepared, etc., the results are likely to be better than if the interviewers were simply given questionnaires and told to collect interviews [36].”

To this we can add that the scientifically oriented clinical investigator soon learns to judge the quality of clinical research from the description of the procedures to control bias. Medicine will always remain an art. But if it is to progress as a science, then the quality of evidence produced in clinical research will have to improve. Increased awareness by clinical investigators of the omnipresent patient and observer biases, and increased and vigilant use of methods to control and assess them will be major steps in this direction.
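One way to assess whether masking held, in the spirit of the end-of-study surveys described for the National Diet-Heart Study and the vitamin C trial, is to compare each subject’s guess with the treatment actually received. The sketch below is only an illustration under stated assumptions: two arms of equal size, so that pure guessing is right half the time, and invented data. The function names `binomial_upper_tail` and `masking_check` are hypothetical and are not taken from either study.

```python
from math import comb


def binomial_upper_tail(k, n, p=0.5):
    """Exact probability of k or more successes in n trials, each with chance p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def masking_check(actual, guessed, chance=0.5):
    """Count correct guesses and compute how surprising that count is under
    pure guessing (assumed chance of a correct guess: 0.5 for two equal arms).

    Returns (number correct, number of subjects, one-sided exact p-value).
    """
    if len(actual) != len(guessed):
        raise ValueError("actual and guessed must have the same length")
    n = len(actual)
    correct = sum(a == g for a, g in zip(actual, guessed))
    return correct, n, binomial_upper_tail(correct, n, chance)


if __name__ == "__main__":
    # Invented example: 20 subjects, 10 per arm, 16 of whom guess correctly.
    actual = ["vitamin C"] * 10 + ["placebo"] * 10
    guessed = (["vitamin C"] * 8 + ["placebo"] * 2) + (["placebo"] * 8 + ["vitamin C"] * 2)
    correct, n, p_value = masking_check(actual, guessed)
    print(f"{correct} of {n} guessed correctly; one-sided p = {p_value:.4f}")
```

A small p-value here suggests that subjects guessed better than chance, which is a reason to examine, as the vitamin C investigators did, whether the outcome differed according to what subjects believed they were receiving.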

REFERENCES

1. Cornfield J: Principles of research. Am J Ment Defic 64: 240, 1959.
2. Hill AB: Clinical trials, chap 20. Principles of Medical Statistics, London, Oxford Press, 1971, p 244.
3. Hill AB: Statistical Methods in Clinical and Preventive Medicine, chap 1, 2, 3, Edinburgh, E. & S. Livingstone, 1962.
4. Ingelfinger FJ: The randomized clinical trial. N Engl J Med 287: 100, 1972.
5. Spodick DH: The surgical mystique and the double standard. Am Heart J 85: 580, 1973.
6. Lasagna L: The controlled clinical trial: theory and practice. J Chronic Dis 1: 353, 1955.
7. Cobb LA, Thomas GI, Dillard DH, Merendino KA, Bruce RA: An evaluation of internal mammary-artery ligation by double-blind technic. N Engl J Med 260: 1115, 1959.
8. O’Rourke DA, O’Rourke HM: Removal of the carotid body for asthma: an appraisal of results. Med J Aust 2: 869, 1964.
9. Dimond EG, Kittle CF, Crockett JE: Evaluation of internal mammary artery ligation and sham procedure in angina pectoris (abstract). Circulation 18: 712, 1958.
10. Roethlisberger FG, Dickson WJ: Management and the Worker. An Account of a Research Program Conducted by the Western Electric Company, Hawthorne Works, Chicago. Cambridge, Harvard University Press, 1946.
11. Wolf S: Effects of suggestion and conditioning on action of chemical agents in human subjects-pharmacology of placebos. J Clin Invest 29: 100, 1950.
12. Cox KR: Planning Clinical Experiments. Springfield, Charles C Thomas, 1968.
13. Beecher HK: The powerful placebo. JAMA 159: 1602, 1955.
14. Beecher HK: Evidence for increased effectiveness of placebo with increased stress. Am J Physiol 187: 163, 1956.
15. Beecher HK: Surgery as placebo. A quantitative study of bias. JAMA 176: 1102, 1961.
16. Shapiro AK: Factors contributing to the placebo effect. Am J Psychother 18: 73, 1964.
17. Wilson EB: An Introduction to Scientific Research. New York, McGraw-Hill, 1952.
18. Vogt EZ, Hyman R: Water Witching, U.S.A. Chicago, University of Chicago Press, 1959, p 50.
19. Seabrook W: Doctor Wood. New York, Harcourt, Brace and Co., 1941, p 234.
20. Pozdena RF: Versuche über Blondlots “Emission Pesante.” Ann Physik 17: 104, 1905.
21. Kahn HA, Medalie JH, Neufeld HN, Ress E, Balogh M, Green JJ: Serum cholesterol: its distribution and association with dietary and other variables in a survey of 10,000 men. Isr J Med Sci 5: 1117, 1969.
22. Crofton J: Controlled trials in tuberculosis, Controlled Clinical Trials (Hill AB, ed), Springfield, Charles C Thomas, 1960.
23. Fletcher CM: Criteria for diagnosis and assessment in clinical trials, Controlled Clinical Trials (Hill AB, ed), Springfield, Charles C Thomas, 1960.
24. Truelove SC: Therapeutic trials, Medical Surveys and Clinical Trials (Witts LJ, ed), London, Oxford University Press, 1959.
25. Mainland D: Elementary Medical Statistics, 2nd ed, Philadelphia, W. B. Saunders, 1963.
26. Mainland D: Clinical Trials and Health Care Administrators, (note 42). Notes on Biometry in Medical Research. Washington, Veterans Administration Monograph 10-1 (suppl 6), 1969.
27. Foulds GA: Clinical research in psychiatry. J Ment Sci 104: 259, 1958.
28. Chalmers TC: The impact of clinical trials on medical practice. Presented at the Symposium on Clinical Pharmacological Methods, Tulane University Medical Center, New Orleans, March 24, 1973.
29. Bearman JE, Loewenson RB, Gullen WH: Muench’s Postulates, Laws, and Corollaries. Biometrics Note No. 4, Office of Biometry and Epidemiology, National Eye Institute, Bethesda, Md., 1974.
30. Knowelden J: Prophylactic trials, Medical Surveys and Clinical Trials (Witts LJ, ed), London, Oxford University Press, 1959.
31. Atkins H: Conduct of a controlled clinical trial. Br Med J 2: 377, 1966.
32. Salus R: A contribution to the diagnosis of arteriosclerosis and hypertension. Am J Ophthalmol 45: 81, 1958.
33. Beatty WW: How blind is blind? Psychol Bull 78: 70, 1972.
34. National Diet-Heart Study Research Group: The National Diet-Heart Study. Circulation 37 (suppl I) no. 3, 1968.
35. U.S. Department of Health, Education, and Welfare, National Institutes of Health, Cold study reveals some vitamin C influence; more research needed, Bethesda, NIH Record, vol 25, 1973, p 4.
36. Wallis WA, Roberts HV: Statistics, A New Approach, Glencoe, The Free Press, 1956, p 151.
