J Clin Epidemiol Vol. 45, No. 6, pp. 567-580, 1992


Printed in Great Britain. All rights reserved. 0895-4356/92 $5.00+0.00. Copyright 1992 Pergamon





Yale University School of Medicine, P.O. Box 3333, New Haven, CT 06510-8025, U.S.A. (Received for publication 8 January 1992)

*From the Clinical Epidemiology Unit, Yale University School of Medicine, New Haven, CT, U.S.A.
†Robert Wood Johnson Clinical Scholar, Yale University School of Medicine, New Haven, CT.
‡Sterling Professor of Medicine and Epidemiology; Co-Director, Robert Wood Johnson Clinical Scholars Program, Yale University School of Medicine, New Haven, CT.
§Author for correspondence.

In 1985, the Journal published a bibliography of publications on observer variability [1], thus updating the two main prior collections published one and two decades earlier [2,3]. Since 1985, the literature on this topic has expanded substantially, but papers published since 1990 can now be easily accessed through computerized literature search systems, such as MEDLINE. The improved accessibility occurred because the National Library of Medicine in 1990 introduced “observer variation” as a new medical subject heading, with four cross-reference “key words”: (a) inter-observer variation, (b) intraobserver variation, (c) observer bias and (d) bias, observer. If pertinent new studies are cited with at least one of the appropriate terms, the publications will easily be found when sought. The many papers that were published before 1990, however, will not necessarily be listed in the new computerized collection because “observer variability” was not previously cited as a reference term. The current report, therefore, is intended to give the pre-1990 literature a chance to survive by updating the 1985 bibliography, thus easing future searches in this field.

For the purpose of the current bibliography, an observer was defined as either a physician, a health professional, or a patient. Observer variability can arise during description, when the observed entity is converted into data, or during classification, when the data are converted into diagnostic or other stipulated categories. In two or more observations of the same entity, intraobserver variability arises when the same person gets different results, and inter-observer variability arises when two or more people disagree.

The bibliography is confined to these two types of variability and excludes phenomena in which the main variability occurs before the evidence gets to “the eye of the beholder”. For example, sequential measurements of blood pressure in the same patient may differ not because of observer variability, but because the blood pressure was altered over time. We also excluded studies in which the variability arises during the operation of devices or systems, such as automated laboratory measurements of plasma electrolyte concentration.

For the many phenomena in clinical medicine that are not easily measured and that lack a convenient or definitive “gold standard”, the consequences of observer variability may be dramatic. A radiologist’s interpretation of a mass lesion may lead to an expensive and invasive diagnostic work-up; a pathologist’s reading of a tissue slide may determine whether a woman keeps her breast or loses it; a research technician’s decision about primary clinical data may affect the final results of a research project.

The criteria used for selecting articles for inclusion in this bibliography were that observer variation be a prime topic in the study, that the observed entities refer to human phenomena, and that the text be in English. While the current bibliography includes references published before 1985, we excluded all citations listed in the previous bibliography [1]. By examining the previous and current lists, a reader should be able (we hope) to find almost all the cogent publications that appeared before 1990.

The bibliography that follows is not claimed or intended to be exhaustive. It represents articles we have encountered since the publication of the prior bibliography [1], as well as pertinent references inadvertently omitted therein. Additional articles that emerged from a computerized MEDLINE literature search for 1983-1989 were also included. The search was aimed at articles in which the key words observer, inter-observer, or intra-observer were found in the same sentence as any of the following terms: variation, bias, validity, variability, or reliability.

The references have been arranged using the same categories as the prior bibliography [1], with four main rubrics containing 29 sections, followed by a final miscellaneous section. Within each sectional category, the references are arranged according to year of publication, and then alphabetically by first author. The titles of journals are abbreviated according to the style used in Index Medicus. The classification scheme is as follows:


INFORMATION FROM PATIENTS: 1. Methods of acquiring data; 2. Demography; 3. Diet and nutrition; 4. Epidemiologic risk factors; 5. General health status; 6. Symptoms and other clinical events; 7. Psychiatric events; 8. Antecedent therapy.

CLINICAL EXAMINATION: 9. Auscultation; 10. Blood pressure; 11. Other physical signs; 12. Clinical procedures; 13. Laboratory tests; 14. Medical record analysis.

CLINICAL DECISIONS: 15. General diagnosis; 16. Diagnoses for specific-organ diseases; 17. Psychiatric diagnoses; 18. Management decisions; 19. Appraisal of outcome; 20. Adverse drug reactions.

PARACLINICAL TESTS AND PROCEDURES: 21. Electrocardiography; 22. Exercise stress tests; 23. Endoscopy; 24. Other paraclinical tests; 25. Conventional radiography; 26. Angiography; 27. New imaging techniques; 28. Cytology; 29. Histopathology.
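For readers who want to apply the same rules to their own reference files, the search criterion and the within-section ordering described in the introduction can be sketched roughly as follows. This is only an illustrative sketch, not the procedure actually used for the MEDLINE search: the function names, the dictionary fields `year` and `first_author`, and the naive sentence splitter are all assumptions introduced here.

```python
import re

# Key words and companion terms as listed in the introduction.
OBSERVER_TERMS = ("observer", "inter-observer", "intra-observer")
COMPANION_TERMS = ("variation", "bias", "validity", "variability", "reliability")


def split_sentences(text):
    """Naive splitter (an assumption): break after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def matches_search(text):
    """True if one sentence holds both an observer key word and a companion term.

    Matching is by lowercase substring, so 'observer' also catches
    'interobserver' and 'intraobserver'.
    """
    for sentence in split_sentences(text):
        low = sentence.lower()
        has_observer = any(term in low for term in OBSERVER_TERMS)
        has_companion = any(term in low for term in COMPANION_TERMS)
        if has_observer and has_companion:
            return True
    return False


def arrange(references):
    """Order references by year of publication, then alphabetically by first author."""
    return sorted(references, key=lambda r: (r["year"], r["first_author"].lower()))


# Hypothetical records built from four first authors in the Demography section.
demo = [
    {"first_author": "Rona", "year": 1989},
    {"first_author": "Nicoll", "year": 1986},
    {"first_author": "Bond", "year": 1988},
    {"first_author": "Little", "year": 1986},
]
ordered = [r["first_author"] for r in arrange(demo)]
```

Substring matching is deliberately loose; a stricter version would match whole words and would need a better sentence splitter for abstracts containing abbreviations.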

The citations are as follows:

1. Methods of Acquiring Data

1. Enterline PE, Capt KG. A validation of information provided by household respondents in health surveys. Am J Public Health 1959; 49:205-212.
2. O’Muircheartaigh CA, Wiggins RD. The impact of interviewer variability in an epidemiological survey. Psychol Med 1981; 11:817-824.
3. Herrmann N. Retrospective information from questionnaires. I. Comparability of primary respondents and their next-of-kin. Am J Epidemiol 1985; 121:937-953.
4. Horwitz RI, Yu EC. Problems and proposals for interview data in epidemiological research. Am J Epidemiol 1985; 14:463-467.
5. Lerchen ML, Samet JM. An assessment of the validity of questionnaire responses provided by a surviving spouse. Am J Epidemiol 1986; 123:481-489.
6. O’Toole BI, Battistutta D, Long A, Crouch K. A comparison of costs and data quality of three health survey methods: mail, telephone and personal home interview. Am J Epidemiol 1986; 124:317-328.
7. Quaak MJ, Westerman RF, Schouten JA, Hasman A, van Bemmel JH. Appraisal of computerized medical histories: comparisons between computerized and conventional records. Comput Biomed Res 1986; 19:551-564.
8. Quaak MJ, Westerman RF, Schouten JA, Hasman A, van Bemmel JH. Computerization of patient history: patient answers compared with medical records. Meth Inform Med 1986; 25:222-228.
9. Ramsdell JW. Concordance of the ambulatory medical record and patients’ recollections of aspects of an ambulatory new-patient visit. J Gen Intern Med 1986; 1:159-162.
10. Shalat SL, Christiani DC, Baker EL Jr. Accuracy of work history obtained from a spouse. Scand J Work Environ Health 1987; 13:67-69.
11. Gerbert B, Stone G, Stulbarg M, Gullion DS, Greenfield S. Agreement among physician assessment methods. Med Care 1988; 26:519-535.
12. O’Brien J, Francis A. The use of next-of-kin to estimate pain in cancer patients. Pain 1988; 35:171-178.
13. Walter SD, Clarke EA, Hatcher J, Stitt LW. A comparison of physician and patient reports of pap smear histories. J Clin Epidemiol 1988; 41:401-410.
14. Kemper KJ, Fink HD, McCarthy PL. The reliability and validity of the pediatric appropriateness evaluation protocol. QRB 1989; 77-80.

2. Demography

1. Little RE. Birthweight and gestational age: mother’s estimates compared with state and hospital records. Am J Public Health 1986; 76:1350-1351.
2. Nicoll A, Bassett K, Ulijaszek SJ. What’s in a name? Accuracy of using surnames and forenames in ascribing Asian ethnic identity in English populations. J Epidemiol Community Health 1986; 40:364-368.
3. Schumacher MC. Comparison of occupation and industry information from death certificates and interviews. Am J Public Health 1986; 76:635-637.
4. Silverman JM, Breitner JCS, Mohs RC, Davis KL. Reliability of the family history method in genetic studies of Alzheimer’s disease and related dementias. Am J Psychiatry 1986; 143:1279-1282.
5. Seidman DS, Slater PE, Ever-Hadani P, Gale R. Accuracy of mothers’ recall of birthweight and gestational age. Br J Obstet Gynaecol 1987; 94:731-735.
6. Bond GG, Bodner KM, Sobel W, Shellenberger RJ, Flores GH. Validation of work histories obtained from interviews. Am J Epidemiol 1988; 128:343-351.
7. Rona RJ, Mosbech J. Validity and repeatability of self-reported occupational and industrial history from patients in EEC countries. Int J Epidemiol 1989; 18:674-679.

3. Diet and Nutrition

1. Baker JP, Detsky AS, Wesson DE, Wolman SL, Stewart S, Whitewell J, et al. Nutritional assessment: a comparison of clinical judgment and objective measurements. N Engl J Med 1982; 306:969-972.
2. Watson CC, Tilleskjor C, Hoodecheck-Schow EA, Pucel J, Jacobs L. Do alcoholics give valid self-reports? J Stud Alcohol 1984; 45:344-348.
3. Mittelmark MB, Sternberg B. Assessment of salt use at the table: comparison of observed and reported behavior. Am J Public Health 1985; 75:1215-1216.
4. Sempos CT, Johnson NE, Smith EL, Gilligan C. Effects of intraindividual and interindividual variation in repeated dietary records. Am J Epidemiol 1985; 121:120-130.
5. Willett WC, Sampson L, Stampfer MJ, Rosner B, Bain C, Witschi J et al. Reproducibility and validity of a semiquantitative food frequency questionnaire. Am J Epidemiol 1985; 122:51-65.
6. McKeown-Eyssen GE, Yeung KAS, Bright-See E. Assessment of past diet in epidemiologic studies. Am J Epidemiol 1986; 124:94-103.
7. Wu ML, Whittemore AS, Jung DL. Errors in reported dietary intakes. Am J Epidemiol 1986; 124:826-835.
8. Cummings SR, Block G, McHenry K, Baron RB. Evaluation of two food frequency methods of measuring dietary calcium intake. Am J Epidemiol 1987; 126:796-802.
9. Thompson FE, Lamphiear DE, Metzner HL, Hawthorne VM, Oh MS. Reproducibility of reports of frequency of food use in the Tecumseh diet methodology study. Am J Epidemiol 1987; 125:658-671.
10. Hunter DJ, Sampson L, Stampfer MJ, Colditz GA, Rosner B, Willett WC. Variability in portion sizes of commonly consumed foods among a population of women in the United States. Am J Epidemiol 1988; 127:1240-1249.
11. Pietinen P, Hartman AM, Haapa E, Rasanen L, Haapakoski J, Palmgren J et al. Reproducibility and validity of dietary assessment instruments. I. A self-administered food use questionnaire with a portion size picture booklet. Am J Epidemiol 1988; 128:655-666.
12. Pietinen P, Hartman AM, Haapa E, Rasanen L, Haapakoski J, Palmgren J et al. Reproducibility and validity of dietary assessment instruments. II. A qualitative food frequency questionnaire. Am J Epidemiol 1988; 128:667-676.
13. Wu ML, Whittemore AS, Jung DL. Errors in reported dietary intakes. II. Long-term recall. Am J Epidemiol 1988; 128:1137-1145.
14. Bloemberg BPM, Kromhout D, Obermann-De Boer CL, van Kampen-Donker M. The reproducibility of dietary intake data assessed with the cross-check dietary data history method. Am J Epidemiol 1989; 130:1047-1056.
15. Jain M, Howe GR, Harrison L, Miller AB. A study of repeatability of dietary data over a seven-year period. Am J Epidemiol 1989; 129:422-429.
















4. Epidemiologic Risk Factors

1. Jonathan G, Moore F, Roberts L. A discussion of technique and an analysis of errors in taking industrial histories in coal-miners. Br J Ind Med 1957; 14:135-136.
2. Ashford JR, Forwell CD, Routledge R. A study of the repeatability of ventilatory tests, anthropometric measurements, and answers to a respiratory symptoms questionnaire in working coal-miners. Br J Ind Med 1960; 17:114-121.
3. The Multiple Risk Factor Intervention Trial Group. The MRFIT behavior pattern study-I. Study design, procedures, and reproducibility of behavior pattern judgments. J Chron Dis 1979; 32:293-305.
4. Shackleton S, Piney MD. A comparison of two methods of measuring personal noise exposure. Ann Occup Hyg 1984; 28:373-390.
5. Coggon D, Pippard EC, Acheson ED. Accuracy of occupational histories obtained from wives. Br J Ind Med 1985; 42:563-564.
6. Roht LH, Vernon SW, Weir FW, Pier SM, Sullivan P, Reed LJ. Community exposure to hazardous waste disposal sites: assessing reporting bias. Am J Epidemiol 1985; 122:418-433.
7. Peach H, Shah D, Morris RW. Validity of smokers’ information about past and present cigarette brands: implications for studies of the effects of falling tar yields of cigarettes on health. Thorax 1986; 41:203-207.
8. Sandler DP, Shore DL. Quality of data on parents’ smoking and drinking provided by adult offspring. Am J Epidemiol 1986; 124:768-778.
9. Brunekreef B, Noy D, Clausing P. Variability of exposure measurements in environmental epidemiology. Am J Epidemiol 1987; 125:892-898.
10. Jarvholm B, Sanden A. Estimating asbestos exposure: a comparison of methods. J Occup Med 1987; 29:361-363.
11. McLaughlin JK, Dietz MS, Mehl ES, Blot WJ. Reliability of surrogate information on cigarette smoking by type of informant. Am J Epidemiol 1987; 126:144-146.
12. Britten N. Validity of claims to lifelong non-smoking at age 36 in a longitudinal study. Int J Epidemiol 1988; 17:525-529.
13. Coates RA, Calzavara LM, Soskolne CL, Read SE, Fanning MM, Shepherd FA et al. Validity of sexual histories in a prospective study of male sexual contacts of men with AIDS or an AIDS-related condition. Am J Epidemiol 1988; 128:719-728.
14. Office of Driver and Pedestrian Research (CDC). Comparison of observed and self-reported seat belt use rates-United States. MMWR 1988; 37:549-551.
15. Brownson RC, Davis JR, Chang JC, DiLorenzo TM, Keefe TJ, Bagby JR Jr. A study of the accuracy of cancer risk factor information reported to a central registry compared with that obtained by interview. Am J Epidemiol 1989; 129:616-624.
16. Chong JP, Turpie I, Haines T, Muir G, Farnworth H, Cruttenden K et al. Concordance of occupational and environmental exposure information elicited from patients with Alzheimer’s disease and surrogate respondents. Am J Ind Med 1989; 15:73-89.
17. Krall EA, Valadian I, Dwyer JT, Gardner J. Accuracy of recalled smoking data. Am J Public Health 1989; 79:200-206.
18. Luepker RV, Pallonen UE, Murray DM, Pirie PL. Validity of telephone surveys in assessing cigarette smoking in young adults. Am J Public Health 1989; 79:202-204.
19. Siemiatycki J, Dewar R, Richardson L. Costs and statistical power associated with five methods of collecting occupation exposure information for population-based case-control studies. Am J Epidemiol 1989; 130:1236-1246.
20. Smith KW, McKinlay SM, McKinlay JB. The reliability of health risk appraisals: a field trial of four instruments. Am J Public Health 1989; 79:1603-1607.

5. General Health Status

1. Guze SB, Tuason VB, Stewart MA, Picken B. The drinking history: a comparison of reports by subjects and their relatives. Q J Stud Alcohol 1963; 24:249-260.
2. Kaku K, Gilbert FI, Sachs RR. Comparison of health appraisals by nurses and physicians. Public Health Rep 1970; 85:1042-1046.
3. Lee P, Jasani MK, Dick WC, Buchanan WW. Evaluation of a functional index in rheumatoid arthritis. Scand J Rheumatol 1973; 2:71-77.
4. Mulford HA, Fitzgerald JL. On the reliability of the Iowa Alcoholic Stages Index. J Stud Alcohol 1977;
5. Hansen TM, Keiding S, Lauritzen SL, Manthorpe R, Sorensen SF, Wiik A. Clinical assessment of disease activity in rheumatoid arthritis. Scand J Rheumatol 1979; 8:101-105.
6. Hutchinson TA, Boyd NF, Feinstein AR, Gonda A, Hollomby D, Rowat B. Scientific problems in clinical scales, as demonstrated in the Karnofsky Index of Performance Status. J Chron Dis 1979; 32:661-666.
7. Chick J. Alcohol dependence: methodological issues in its measurement; reliability of the criteria. Br J Addict 1980; 75:175-186.
8. Schag CC, Heinrich RL, Ganz PA. Karnofsky Performance Status revisited: reliability, validity, and guidelines. J Clin Oncol 1984; 2:187-193.
9. Stewart AW, Jackson RT, Ford MA, Beaglehole R. Underestimation of relative weight by use of self-reported height and weight. Am J Epidemiol 1987; 125:122-126.
10. Ford AB, Folmar SJ, Salmon RB, Medalie JH, Roy AW, Galazka SS. Health and function in the old and very old. J Am Geriatr Soc 1988; 36:187-197.

6. Symptoms and Other Clinical Events

1. Cobb S, Rosenbaum J. A comparison of specific symptom data obtained by nonmedical interviewers and by physicians. J Chron Dis 1956; 4:245-252.
2. Ritchie DM, Boyle JA, McInnes JM, Jasani MK, Dalakos TG, Grieveson P et al. Clinical studies with an articular index for the assessment of joint tenderness in patients with rheumatoid arthritis. Q J Med 1968; XXXVII:393-406.
3. Milne JS, Williamson J. Comparison of teaching machine with an observer in detection of angina pectoris by questionnaire. Br J Prev Soc Med 1971; 25:105-108.
4. Delilkan AE. Comparison of subjective estimates by surgeons and anaesthetists of operative blood loss. Br Med J 1972; 2:619-621.
5. Horrocks JC, de Dombal FT. Diagnosis of dyspepsia from data collected by a physician’s assistant. Br Med J 1975; 3:421-423.
6. Bernstein RA, Giefer EE, Rimm AA. Gallbladder disease-I. Assessment of validity and reliability of data derived from a questionnaire. J Chron Dis 1976; 29:51-58.
7. Chisholm EM, de Dombal FT, Giles GR. Validation of a self-administered questionnaire to elicit gastrointestinal symptoms. Br Med J 1985; 290:1795-1796.
8. Hickam DH, Sox HC Jr, Sox CH. Systematic bias in recording the history in patients with chest pain. J Chron Dis 1985; 38:91-100.
9. McLellan AT, Luborsky L, Cacciola J, Griffith J, Evans F, Barr HL et al. New data from the Addiction Severity Index: reliability and validity in three centers. J Nerv Ment Dis 1985; 173:412-423.
10. Tilley BC, Barnes AB, Bergstralh E, Labarthe D, Noller KL, Colton T et al. A comparison of pregnancy history recall and medical records. Am J Epidemiol 1985; 121:269-281.
11. Becklake MR, Freeman S, Goldsmith C, Hessel PA, Mkhwelo R, Mokoetle K et al. Respiratory questionnaires in occupational studies: their use in multilingual workforces on the Witwatersrand. Int J Epidemiol 1987; 16:606-611.
12. Colditz GA, Stampfer MJ, Willett WC, Stason WB, Rosner B, Hennekens CH et al. Reproducibility and validity of self-reported menopausal status in a prospective cohort study. Am J Epidemiol 1987; 126:319-325.
13. Linet MS, Stewart WF, Celentano DD, van Natta ML, Zeigler DK, Sprecher MA. Reproducibility of reported symptoms in a population-based prevalence survey of headache. Am J Epidemiol 1987; 126:738-739.
14. Stanton B, Clemens J, Aziz KMA, Khatun K, Ahmed S, Khatun J. Comparability of results obtained by two-week home maintained diarrhoeal calendar with two-week diarrhoeal recall. Int J Epidemiol 1987; 16:595-601.
15. Svedlund J, Sjodin I, Dotevall G. GSRS-A clinical rating scale for gastrointestinal symptoms in patients with irritable bowel syndrome and peptic ulcer disease. Dig Dis Sci 1988; 33:129-134.
16. Villar J, Dorgan J, Menendez R, Bolanos L, Pareja G, Kestler E. Perinatal data reliability in a large teaching obstetric unit. Br J Obstet Gynaecol 1988; 95:841-848.
17. Alam N, Henry FJ, Rahaman MM. Reporting errors in one-week diarrhoea recall surveys: experience from a prospective study in rural Bangladesh. Int J Epidemiol 1989; 18:697-700.
18. Macintyre S, Pritchard C. Comparisons between the self-assessed and observer-assessed presence and severity of colds. Soc Sci Med 1989; 29:1243-1248.

7. Psychiatric Events

1. Mendlewicz J, Fleiss JL, Cataldo M, Rainer JD. Accuracy of the family history method in affective illness. Arch Gen Psychiatry 1975; 32:309-314.
2. Irvin LK, Halpern AS. Reliability and validity of the Social and Prevocational Information Battery for mildly retarded individuals. Am J Ment Defic 1977; 81:603-605.
3. Zimmerman M, Coryell W, Wilson S, Corenthal C. Evaluation of symptoms of major depressive disorder: self-report vs. clinician ratings. J Nerv Ment Dis 1986; 174:150-153.
4. Wilson HS. Field trials of the phenomena of concern for psychiatric/mental health nursing: proposed methodology. Arch Psychiatr Nurs 1989; 3:305-308.

8. Antecedent Therapy

1. Hulka BS, Kupper LL, Cassel JC, Efird RL. Medication use and misuse: physician-patient discrepancies. J Chron Dis 1975; 28:7-21.
2. Jackson JE, Ramsdell JW, Renvall M, Swart J, Ward H. Reliability of drug histories in a specialized geriatric outpatient clinic. J Gen Intern Med 1989; 4:39-43.

9. Auscultation

1. Rectra EH, Khan AH, Pigott VM, Spodick DH. Audibility of the fourth heart sound: a prospective, “blind” auscultatory and polygraphic investigation. JAMA 1972; 221:36-41.
2. Shirai F, Kudoh S, Shibuya A, Sada K, Mikami R. Crackles in asbestos workers: auscultation and lung sound analysis. Br J Dis Chest 1981; 75:386-396.
3. Workum P, DelBono EA, Holford SK, Murphy RLH Jr. Observer agreement, chest auscultation, and crackles in asbestos-exposed workers. Chest 1986; 89:27-29.
4. Ishmail AA, Wing S, Ferguson J, Hutchinson TA, Magder S, Flegel KM. Interobserver agreement by auscultation in the presence of a third heart sound in patients with congestive heart failure. Chest 1987; 91:870-873.
5. Jordan MD, Taylor CR, Nyhuis AW, Tavel ME. Audibility of the fourth heart sound: relationship to presence of disease and examiner experience. Arch Intern Med 1987; 147:721-726.
6. Murphy RLH Jr, DelBono EA, Davidson F. Validation of an automatic crackle (rale) counter. Am Rev Respir Dis 1989; 140:1017-1020.



10. Blood Pressure

1. Wright IS, Schneider RF, Ungerleider HE. Factors of error in blood pressure readings: a survey of methods of teaching and interpretation. Am Heart J 1938; 16:469-476.
2. Anderson WF, Cowan NR. Observer error in recording arterial blood pressure. Br Heart J 1961; 25:169-172.
3. Wilcox J. Observer factors in the measurement of blood pressure. Nurs Res 1961; 10:4-17.
4. Armitage P, Rose GA. The variability of measurements of casual blood pressure: I. A laboratory study. Clin Sci 1966; 30:325-335.
5. Patterson HR. Sources of error in recording the blood pressure of patients with hypertension in general practice. Br Med J 1984; 289:1661-1664.
6. DeGaudemaris R, Folsom AR, Prineas RJ, Luepker RV. The random-zero versus the standard mercury sphygmomanometer: a systematic blood pressure difference. Am J Epidemiol 1985; 121:282-290.
7. Gould BA, Hornung RS, Kieso H, Cashman PMM, Raftery EB. An evaluation of self-recorded blood pressure during drug trials. Hypertension 1986; 8:267-271.
8. Hense HW, Stieber J, Chambless L. Factors associated with measured differences between fourth and fifth phase diastolic blood pressure. Int J Epidemiol 1986; 15:513-518.
9. Hessel PA. Terminal digit preference in blood pressure measurements: effects on epidemiological associations. Int J Epidemiol 1986; 15:122-125.
10. Neufeld PD, Johnson DL. Observer error in blood pressure measurement. Can Med Assoc J 1986; 135:633-637.
11. Burke GL, Webber LS, Shear CL, Zinkgraf SA, Smoak CG, Berenson GS. Sources of error in measurement of children’s blood pressure in a large epidemiologic study: Bogalusa Heart Study. J Chron Dis 1987; 40:83-89.
12. Fowkes FGR, Housley E, MacIntyre CCA, Prescott RJ, Ruckley CV. Variability of ankle and brachial systolic pressures in the measurement of atherosclerotic peripheral arterial disease. J Epidemiol Community Health 1988; 42:128-133.
13. James GD, Pickering TG, Yee LS, Harshfield GA, Riva S, Laragh JH. The reproducibility of average ambulatory, home, and clinic pressures. Hypertension 1988; 11:545-549.
14. Llabre MM, Ironson GH, Spitzer SB, Gellman MD, Weidler DJ, Schneiderman N. How many blood pressure measurements are good enough? An application of generalizability theory to the study of blood pressure reliability. Psychophysiology 1988; 25:97-106.
15. Parker D, Liu K, Dyer AR, Giumetti D, Liao Y, Stamler J. A comparison of the random-zero and standard mercury sphygmomanometers. Hypertension 1988; 11:269-272.
16. Villar J, Repke J, Markush L, Calvert W, Rhoads G. The measuring of blood pressure during pregnancy. Am J Obstet Gynecol 1989; 161:1019-1024.

11. Other Physical Signs

1. Sisk C, Ziegler DK, Zileli T. Discrepancies in recorded results from duplicate neurological history and examination in patients studied for prognosis in cerebrovascular disease. Stroke 1970; 1:14-18.
2. Chamberlain J, Rogers P, Price JL, Ginks S, Nathan BE, Burn I. Validity of clinical examination and mammography as screening tests for breast cancer. Lancet 1975; 2:1026-1030.
3. Fischman S, Picozzi A, Juliano D, Slakter M, English J. Examiner standardization for caries studies. J Dent Res 1976; 55:926-929.
4. Low JL. The reliability of joint measurement. Physiotherapy 1976; 62:227-229.
5. Moertel CG, Hanley JA. The effect of measuring error on the results of therapeutic trials in advanced cancer. Cancer 1976; 38:388-394.
6. Nicholas JJ, Taylor FH, Buckingham RB, Ottonello D. Measurement of circumference of the knee with ordinary tape measure. Ann Rheum Dis 1976; 35:282-284.
7. Marks JS, Palmer MK, Burke MJ, Smith P. Observer variation in the examination of knee joints. Ann Rheum Dis 1978; 37:376-377.
8. Teasdale G, Knill-Jones R, van der Sande J. Observer variability in assessing impaired consciousness and coma. J Neurol Neurosurg Psychiatry 1978; 41:603-610.
9. Jacobs HD, Farndell PR, Grobbelaar PS, Smith DJ, Bromfield ME. Observer bias and error in the integumentary clinical diagnosis of chronic anaemia. S Afr Med J 1979; 55:1031-1034.
10. Lincoln N, Leadbitter D. Assessment of motor function in stroke patients. Physiotherapy 1979; 65:48-51.
11. Meyhoff HH, Roder O, Andersen B. Palpatory estimation of liver size: within- and between-observer variation. Acta Chir Scand 1979; 145:479-481.
12. van den Berge JH, Schouten HJA, Boomstra S, van Drunen Littel S, Braakman R. Interobserver agreement in assessment of ocular signs in coma. J Neurol Neurosurg Psychiatry 1979; 42:1163-1168.
13. Gibson RA, Sanderson HF. Observer variation in ophthalmology. Br J Ophthalmol 1980; 64:457-460.
14. Lavin PT, Flowerdew G. Studies in variation associated with the measurement of solid tumors. Cancer 1980; 46:1286-1290.
15. Lawson IR, Ingman SR, Masih Y, Freeman B. Reliability of palpation of pedal pulses as ascertained by the kappa statistic. J Am Geriatr Soc 1980; 28:300-303.
16. Theodossi A, Knill-Jones RP, Skene A, Lindberg G, Bjerregaard B, Holst-Christensen J et al. Inter-observer variation of symptoms and signs in jaundice. Liver 1981; 1:21-32.
17. Thomas DC, Spitzer WO, MacFarlane JK. Inter-observer error among surgeons and nurses in presymptomatic detection of breast disease. J Chron Dis 1981; 34:617-626.
18. Priebe WM, DaCosta LR, Beck IT. Is epigastric tenderness a sign of peptic ulcer disease? Gastroenterology 1982; 82:16-19.
19. Spodick DH, Sugiura T, Doi Y, Paladino D, Haffty BG. Rate of rise of the carotid pulse: an investigation of observer error in a common clinical measurement. Am J Cardiol 1982; 49:159-162.
20. Stubbing DG, Mathur PN, Roberts RS, Moran Campbell EJ. Some physical signs in patients with chronic airflow obstruction. Am Rev Respir Dis 1982; 125:549-552.
21. Thyssen HH, Brynskov J, Jansen EC, Munster-Swendsen J. Normal ranges and reproducibility for the quantitative Romberg’s test. Acta Neurol Scand 1982; 66:100-104.
22. Tomasello F, Mariani F, Fieschi C, Argentino C, Bono G, De Zanche L et al. Assessment of inter-observer differences in the Italian multicenter study on reversible cerebral ischemia. Stroke 1982; 13:32-35.




23. Lindsay KW, Teasdale GM, Knill-Jones RP. Observer variability in assessing the clinical features of subarachnoid hemorrhage. J Neurosurg 1983; 58:57-62.
24. Ralphs DNL, Venn G, Khan O, Palmer JG, Cameron DE, Hobsley M. Is the undeniably palpable liver ever ‘normal’? Ann R Coll Surg Engl 1983; 65:159-160.
25. Gjorup T, Bugge PM, Jensen AM. Interobserver variation in assessment of respiratory signs: physicians’ guesses as to interobserver variation. Acta Med Scand 1984; 216:61-66.








26. Malchow-Moller A, Rasmussen SN, Jensen AM, Keiding N, Skovgaard LT, Juhl E. Clinical estimation of liver size. Dan Med Bull 1984; 31:63-67.
27. Mann M, Glasheen-Wray M, Nyberg R. Therapist agreement for palpation and observation of iliac crest heights. Phys Ther 1984; 64:334-338.
28. Fletcher SW, O’Malley MS, Bunce LA. Physicians’ abilities to detect lumps in silicone breast models. JAMA 1985; 253:2224-2228.
29. Gajdosik R, Simpson R, Smith R, DonTigny RL. Intratester reliability of measuring the standing position and range of motion. Phys Ther 1985; 65:169-174.
30. Krebs DE, Edelstein JE, Fishman S. Reliability of observational kinematic gait analysis. Phys Ther 1985; 65:1027-1033.
31. Potter NA, Rothstein JM. Intertester reliability for selected clinical tests of the sacroiliac joint. Phys Ther 1985; 65:1671-1675.
32. Shinar D, Gross CR, Mohr JP, Caplan LR, Price TR, Wolf PA et al. Interobserver variability in the assessment of neurologic history and examination in the Stroke Data Bank. Arch Neurol 1985; 42:557-565.



33. Cote R, Hachinski VC, Shurvell BL, Norris JW, Wolfson C. The Canadian Neurological Scale: a preliminary study in acute stroke. Stroke 1986; 17:731-737.
34. Gjorup T, Bugge PM, Hendriksen C, Jensen AM. A critical evaluation of the clinical diagnosis of anemia. Am J Epidemiol 1986; 124:657-665.

35. Merritt JL, McLean TJ, Erickson RP. Measurement of trunk flexibility in normal subjects: reproducibility of three clinical methods. Mayo Clin Proc 1986; 61:192-197.
36. Mulrow CD, Dolmatch BL, Delong ER, Feussner JR, Benyunes MC, Dietz JL et al. Observer variability in the pulmonary examination. J Gen Intern Med 1986; 1:364-367.
37. Stokman CJ, Shafer SQ, Shaffer D, Ng SK, O’Connor PA, Wolff RR. Assessment of neurological ‘soft signs’ in adolescents: reliability studies. Dev Med Child Neurol 1986; 28:428-439.

38. Bohannon RW, Smith MB. Interrater reliability of a modified Ashworth scale of muscle spasticity. Phys Ther 1987; 67:206-207.
39. Bursch SG. Interrater reliability of diastasis recti abdominis measurement. Phys Ther 1987; 67:1077-1079.
40. Espinoza P, Ducot B, Pelletier G, Attali P, Buffet C, David B et al. Interobserver agreement in the physical diagnosis of alcoholic liver disease. Dig Dis Sci 1987; 32:244-247.
41. Hertzman C, Walter SD, From L, Alison A. Observer perception of skin color in a study of malignant melanoma. Am J Epidemiol 1987; 126:901-911.
42. Naylor CD, McCormack DG, Sullivan SN. The midclavicular line: a wandering landmark. Can Med Assoc J 1987; 136:48-50.
43. van den Berge JH, Braakman R, Schouten HJA. Interobserver agreement in assessment of vestibuloocular response. J Neurol Neurosurg Psychiatry 1987; 50:1045-1047.

44. Wit JM, Delemarre van der Waal HA, Faber JA, van den Brande JL. Intra- and inter-observer variability in the assessment of testicular descent. Andrologia 1987; 19:585-590.
45. Alviso DJ, Dong GT, Lentell GL. Intertester reliability for measuring pelvic tilt in standing. Phys Ther 1988; 68:1347-1351.
46. Coates RA, Fanning MM, Johnson JK, Calzavara L. Assessment of generalized lymphadenopathy in AIDS research: the degree of clinical agreement. J Clin Epidemiol 1988; 41:267-273.
47. Klinkhoff AV, Bellamy N, Bombardier C, Carette S, Chalmers A, Esdaile JM et al. An experiment in reducing interobserver variability of the examination for joint tenderness. J Rheumatol 1988; 15:492-494.
48. Spiteri MA, Cook DG, Clarke SW. Reliability of eliciting physical signs in examination of the chest. Lancet 1988; 1:873-875.
49. van Dillen LR, Roach KE. Interrater reliability of a clinical scale of rigidity. Phys Ther 1988; 68:1679-1681.
50. Bailey SM, Sarmandal P, Grant JM. A comparison of three methods of assessing inter-observer variation applied to measurement of the symphysis-fundal height. Br J Obstet Gynaecol 1989; 96:1266-1271.
51. Evans RA, Harries ML, Baguley DM, Moffat DA. Reliability of the House and Brackmann grading system for facial palsy. J Laryngol Otol 1989; 103:1045-1046.
52. Gadsbøll N, Høilund-Carlsen PF, Nielsen GG, Berning J, Brunn NE, Stage P et al. Symptoms and signs of heart failure in patients with myocardial infarction: reproducibility and relationship to chest X-ray, radionuclide ventriculography and right heart catheterization. Eur Heart J 1989; 10:1017-1028.
53. Lord S, Sawyer B, Pond D, O’Connell D, Eyland A, Mant A et al. Interrater reliability of computer-assisted scoring of breathing during sleep. Sleep 1989; 12:550-558.
54. Mootz RD, Keating JC Jr, Kontz HP, Milus TB, Jacobs GE. Intra- and interobserver reliability of passive motion palpation of the lumbar spine. J Manipulative Physiol Ther 1989; 12:440-445.
55. O’Neill TW, Smith M, Barry M, Graham IM. Diagnostic value of the apex beat. Lancet 1989; 1:410-411.
56. Olsen LH. Inter-observer variation in assessment of undescended testis; analysis of kappa statistics as a coefficient of reliability. Br J Urol 1989; 64:644-648.
57. Pattinson RC, Theron GB. Inter-observer variation in symphysis-fundus measurements. A plea for individualized antenatal care. S Afr Med J 1989; 76:621-622.
58. Stockstill JW, Gross AJ, McCall WD Jr. Interrater reliability in masticatory muscle palpation. J Craniomandib Disord 1989; 3:143-146.
59. Tuffnell DJ, Bryce F, Johnson N, Lilford RJ. Simulation of cervical changes in labour: reproducibility of expert assessment. Lancet 1989; 2:1089-1090.


Clinical Procedures

1. Baron JH, Connell AM, Lennard-Jones JE. Variation between observers in describing mucosal appearances in proctocolitis. Br Med J 1964; 1: 89-92.
2. Dixon RA, Johnston SM. Sources of variation in clinical observations: problems of teaching and some results. Methods Inf Med 1972; 11: 177-182.
3. Kafer ER, Donnelly P. Reproducibility of data on steady-state gas exchange and indices of maldistribution of ventilation and blood flow. Chest 1977; 71: 758-761.
4. Whitaker CJ, Chinn DJ, Lee WR. The statistical reliability of indices derived from the closing volume and flow volume traces. Bull Eur Physiopathol Respir 1978; 14: 237-247.
5. MacDonald JB, Cole TJ. The flow-volume loop: reproducibility of air and helium-based test in normal subjects. Thorax 1980; 35: 64-69.
6. Miller MR, Pincock AC. Repeatability of the moments of the truncated forced expiratory spirogram. Thorax 1982; 37: 205-211.
7. Rozas CJ, Goldman AL. Daily spirometric variability: normal subjects and subjects with chronic bronchitis with and without airflow obstruction. Arch Intern Med 1982; 142: 1287-1291.
8. Klein R, Klein BEK, Moss SE, DeMets D. Inter-observer variation in refraction and visual acuity measurement using a standardized protocol. Ophthalmology 1983; 90: 1357-1359.
9. McCuaig KE, Vessal S, Coppin K, Wiggs BJR, Dahlby R, Pare PD. Variability in measurements of pressure-volume curves in normal subjects. Am Rev Respir Dis 1985; 131: 656-658.
10. Usherwood TP, Barber JH. Discrepancy between standard and low range mini Wright peak flow meters. Br Med J 1986; 292: 523-524.
11. Howard TP, Solomon DA. Reading the tuberculin skin test. Who, when, and how? Arch Intern Med 1988; 148: 2457-2459.
12. Housh TJ, Johnson GO, Thorland WG, Cisar CJ, Hughes RA, Kenney KB et al. Validity and intertester error of anthropometric estimations of body density. J Sports Med Phys Fitness 1989; 29: 149-156.
13. Mains BT, Toner JG. Pneumatic otoscopy: study of inter-observer variability. J Laryngol Otol 1989; 103: 1134-1135.

13. Laboratory





1. Jen P, Woo B, Rosenthal PE, Bunn HF, Loscalzo A, Goldman L. The value of the peripheral blood smear in anemic inpatients. Arch Intern Med 1983; 143: 1120-1125.
2. Kirk CR, Burke H, Savage DCL. Accuracy of home blood glucose monitoring by children. Br Med J 1986; 293: 17.
3. Murtomaa H, Meurman JH, Rantama A, Levo S. Interexaminer variability in common ratings in reading Streptococcus mutans dip-slides with or without a microscope. Scand J Dent Res 1987; 95: 144-150.
4. Levine RJ, Mathew RM, Brown MH, Hurtt ME, Bentley KS, Mohr KL et al. Computer-assisted semen analysis: results vary across technicians who prepare videotapes. Fertil Steril 1989; 52: 673-677.
5. Sanchez-Carrillo CI, Ramirez-Sanchez TJ, Zambrana-Castaneda M, Selwyn BJ. Test of noninvasive instrument for measuring hemoglobin concentration. Int J Technol Assess Health Care 1989; 5: 659-667.

14. Medical Record Analysis

1. Madow WG. Net differences in interview data on chronic conditions and information derived from medical records. Vital Health Stat [2] 1973; 57: 1-5.
2. Cherkin DC, Phillips WR, Gillanders WR. Assessing the reliability of data from patient medical records. J Fam Prac 1984; 18: 937-940.
3. Dick FR, van Lier SF, McKeen K, Everett GD, Blair A. Nonconcurrence in abstracted diagnoses of non-Hodgkin’s lymphoma. J Natl Cancer Inst 1987; 78: 675-678.
4. Quaak MJ, Westerman RF, van Bemmel JH. Comparisons between written and computerised patient histories. Br Med J 1987; 295: 184-190.
5. Beard CM, Bergstralh EJ, Klee GG. Interobserver variability in collecting data from medical records. Arch Pathol Lab Med 1988; 112: 594-596.
6. Harlow SD, Linet MS. Agreement between questionnaire data and medical records. Am J Epidemiol 1989; 129: 233-248.

15. General Diagnosis

1. Gill PW, Leaper DJ, Guillou PJ, Staniland JR, Horrocks JC, de Dombal FT. Observer variation in clinical diagnosis-A computer-aided assessment of its magnitude and importance in 552 patients with abdominal pain. Meth Inform Med 1973; 12: 108-113.
2. Matarazzo RG, Wiens AN, Matarazzo JD, Manaugh TS. Test-retest reliability of the WAIS in a normal population. J Clin Psychol 1973; 29: 194-197.
3. Kahn HA, Leibowitz H, Ganley JP, Kini M, Colton T, Nickerson R et al. Standardizing diagnostic procedures. Am J Ophthalmol 1975; 79: 768-775.
4. Foley WJ, Schneider DP. A comparison of the level of care predictions of six long-term care patient assessment systems. Am J Public Health 1980; 70: 1152-1161.
5. Spitzer WO, Dobson AJ, Hall J, Chesterman E, Levi J, Shepherd R et al. Measuring the quality of life of cancer patients. J Chron Dis 1981; 34: 585-597.
6. Gjorup T, Hamberg O, Knudsen J, Rosenfalck AM, Bugge PM, Hendriksen C et al. Does the patient appear acutely or chronically ill? Acta Med Scand 1982; 212: 325-328.
7. Weinrott MR, Jones RR. Overt versus covert assessment of observer reliability. Child Dev 1984; 55: 1125-1137.
8. MacKenzie EJ, Shapiro S, Eastham JN. The abbreviated injury scale and injury severity score: levels of inter- and intrarater reliability. Med Care 1985; 23: 823-835.
9. Morgan DL. Nurses’ perceptions of mental confusion in the elderly: influence of resident and setting characteristics. J Health Soc Behav 1985; 26: 102-112.
10. Moulopoulos SD, Stamatelopoulos S, Nanas S, Economides K. Medical education and experience affecting intra-observer variability. Med Educ 1986; 20: 133-135.
11. Shinar D, Gross CR, Price TR, Banko M, Bolduc PL, Robinson RG. Screening for depression in stroke patients: the reliability and validity of the Center for Epidemiologic Studies Depression Scale. Stroke 1986; 17: 241-245.
12. Turk DC, Rudy TE. IASP taxonomy of chronic pain syndromes: preliminary assessment of reliability. Pain 1987; 30: 177-189.
13. Asberg KH, Sonn U. The cumulative structure of personal and instrumental ADL. A study of elderly people in a health service district. Scand J Rehabil Med 1988; 21: 171-177.
14. Uden A, Astrom M, Bergenudd H. Pain drawings in chronic back pain. Spine 1988; 13: 389-392.
15. van Swieten JC, Koudstaal PJ, Visser MC, Schouten HJA, van Gijn J. Interobserver agreement for the assessment of handicap in stroke patients. Stroke 1988; 19: 604-607.

16. Diagnoses for Specific-Organ Diseases

1. Westlund KB, Kurland LT. Studies on multiple sclerosis in Winnipeg, Manitoba, and New Orleans, Louisiana. I. Prevalence. Comparison between the patient groups in Winnipeg and New Orleans. Am J Hyg 1953; 57: 380-396.
2. Poser CM. Clinical diagnostic criteria in epidemiological studies of multiple sclerosis. Ann NY Acad Sci 1965;
3. The Cooperating Clinics Committee of the American Rheumatism Association. A seven-day variability study of 499 patients with peripheral rheumatoid arthritis. Arth Rheum 1965; 8: 302-335.
4. Assaad FA, Maxwell-Lyons F. Systematic observer variation in trachoma studies. Bull Org mond Sante/Bull Wld Hlth Org 1967; 36: 885-900.
5. Kupka K, Nizetic B, Reinhards J. Sampling studies on the epidemiology and control of trachoma in southern Morocco. Bull Org mond Sante/Bull Wld Hlth Org 1968; 39: 547-566.
6. de Dombal FT, Horrocks JC, Staniland JR, Guillou PJ. Production of artificial “case histories” by using a small computer. Br Med J 1971; 2: 578-581.
7. Lefebvre P, Milet J, Luyckx A. Control of diabetes: an attempt to formulate policy guidelines in a department of medicine. Diabetologia 1974; 10: 201-204.
8. Calanchini PR, Swanson PD, Gotshall RA, Haerer AF, Poskanzer DC, Price TR et al. Cooperative study of hospital frequency and character of transient ischemic attacks. JAMA 1977; 238: 2029-2033.
9. Koerner F, Koerner U, Eichenseher N. Diabetic retinopathy study: data acquisition and its reliability. Albrecht Von Graefes Arch Klin Exp Ophthalmol 1977; 202: 163-173.
10. Milton RC, Ganley JP, Lynk RH. Variability in grading diabetic retinopathy from stereo fundus photographs: Comparison of physician and lay readers. Br J Ophthalmol 1977; 61: 192-201.
11. Shaw L, Murray JJ. Diagnostic reproducibility of periodontal indices. J Periodont Res 1977; 12: 141-147.
12. Margolis CZ, Porter B, Barnoon S, Pilpel D. Reliability of the middle ear examination. Israel J Med Sci 1979; 15: 23-28.
13. Nelson MA, Allen P, Clamp SE, de Dombal FT. Reliability and reproducibility of clinical findings in low-back pain. Spine 1979; 4: 97-101.
14. Chapel TA. Physician recognition of the signs and symptoms of secondary syphilis. JAMA 1981; 246: 250-251.
15. Goldman L, Hashimoto B, Cook EF, Loscalzo A. Comparative reproducibility and validity of systems for assessing cardiovascular functional class: advantages of a new Specific Activity Scale. Circulation 1981; 64: 1227-1234.
16. Lindberg G. Effects of observer variation on performance in probabilistic diagnosis of jaundice. Meth Inform Med 1981; 20: 163-168.
17. Boyd NF, Cummings BJ, Harwood AR, Rider WD, Thomas GM. Observer variation in the assessment of patients with rectal cancer. Dis Colon Rectum 1982; 25: 664-668.
18. Cicchetti DV, Sharma Y, Cotlier E. Assessment of observer variability in the classification of human cataracts. Yale J Biol Med 1982; 55: 81-88.
19. Lindsay KW, Teasdale G, Knill-Jones RP, Murray L. Observer variability in grading patients with subarachnoid hemorrhage. J Neurosurg 1982; 56: 628-633.
20. Sussman EJ, Tsiaras WG, Soper KA. Diagnosis of diabetic eye disease. JAMA 1982; 247: 3231-3234.
21. Waddell G, Main CJ, Morris EW, Venner RM, Rae PS, Sharmy SH et al. Normality and reliability in the clinical assessment of backache. Br Med J 1982; 284: 1519-1523.
22. Bjerregaard B, Brynitz S, Holst-Christensen J, Jess P, Kalaja E, Lund-Kristensen J et al. The reliability of medical history and physical examination in patients with acute abdominal pain. Meth Inform Med 1983; 22: 15-18.
23. Curb JD, Babcock C, Pressel S, Tung B, Remington RD, Hawkins CM. Nosological coding of cause of death. Am J Epidemiol 1983; 118: 122-128.
24. Kirwan JR, Chaput DeSaintonge DM, Joyce CRB, Currey HLF. Clinical judgement in rheumatoid arthritis. II. Judging ‘current disease activity’ in clinical practice. Ann Rheum Dis 1983; 42: 648-651.
25. Kraaijeveld CL, van Gijn J, Schouten HJA, Staal A. Interobserver agreement for the diagnosis of transient ischemic attacks. Stroke 1984; 15: 723-725.
26. Forssell G, Jonasson R, Orinius E. Identifying severe aortic valvular stenosis by bedside examination. Acta Med Scand 1985; 218: 397-400.
27. Klein BE, Magli YL, Richie KA, Moss SE, Meuer SM, Klein R. Quantitation of optic disc cupping. Ophthalmology 1985; 92: 1654-1656.
28. Montgomery GK, Reynolds NC Jr, Warren RM. Qualitative assessment of Parkinson’s disease: study of reliability and data reduction with an abbreviated Columbia Scale. Clin Neuropharmacol 1985; 8: 83-92.


29. Gross CR, Shinar D, Mohr JP, Hier DB, Caplan LR, Price TR et al. Interobserver agreement in the diagnosis of stroke type. Arch Neurol 1986; 43: 893-898.
30. Klein R, Klein BEK, Magli YL, Brothers RJ, Meuer SM, Moss SE et al. An alternative method of grading diabetic retinopathy. Ophthalmology 1986; 93: 1183-1187.
31. Koudstaal PJ, van Gijn J, Staal A, Duivenvoorden HJ, Gerritsma JGM, Kraaijeveld CL. Diagnosis of transient ischemic attacks: improvement of interobserver agreement by a check-list in ordinary language. Stroke 1986; 17: 723-728.
32. Schmitt BP, Kushner MS, Wiener SL. The diagnostic usefulness of the history of the patient with dyspnea. J Gen Intern Med 1986; 1: 386-393.
33. Sperduto RD, Hiller R, Podgor MJ, Palmberg P, Ferris FL, Wentworth D et al. Comparability of ophthalmic diagnoses by clinical and reading center examiners in the Visual Acuity Impairment Survey Pilot Study. Am J Epidemiol 1986; 124: 994-1003.
34. Amato MP, Groppi C, Siracusa GF, Fratiglioni L. Inter- and intra-observer reliability in Kurtzke scoring systems in multiple sclerosis. Ital J Neurol Sci 1987; Suppl. 6: 129-131.
35. de Dombal FT, Softley A. IOIBD report no. 1: Observer variation in calculating indices of severity and activity in Crohn’s disease. Gut 1987; 28: 474-481.
36. Heliovaara M, Impivaara O, Sievers K, Melkas T, Knekt P, Korpi J et al. Lumbar disc syndrome in Finland. J Epidemiol Community Health 1987; 41: 251-258.
37. Lewiston N, Moss R, Hindi R, Rubinstein S, Sullivan M. Interobserver variance in clinical scoring for cystic fibrosis. Chest 1987; 91: 878-882.
38. Alexopoulos GS, Abrams RC, Young RC, Shamoian CA. Cornell Scale for Depression in Dementia. Biol Psychiatry 1988; 23: 271-284.
39. Amato GP, Fratiglioni L, Groppi C, Siracusa G, Amaducci L. Interrater reliability in assessing functional systems and disability on the Kurtzke Scale in multiple sclerosis. Arch Neurol 1988; 45: 746-748.
40. Bodensteiner JB, Brownsworth RD, Knapik JR, Kanter MC, Cowan LD, Leviton A. Interobserver variability in the ILAE classification of seizures in childhood. Epilepsia 1988; 29: 123-128.
41. Clark WL, Haldeman S, Johnson P, Morris J, Schulenberger C, Trauner D et al. Back impairment and disability determination: another attempt at objective, reliable rating. Spine 1988; 13: 332-341.
42. Gelmers HJ, Gorter K, de Weerdt CJ, Wiezer HJA. Assessment of interobserver variability in a Dutch multicenter study on acute ischemic stroke. Stroke 1988; 19: 709-711.
43. Mehra V, Minassian DC. A rapid method of grading cataract in epidemiological studies and eye surveys. Br J Ophthalmol 1988; 72: 801-803.
44. Poole JL, Whitney SL. Motor Assessment Scale for stroke patients: concurrent validity and interrater reliability. Arch Phys Med Rehabil 1988; 69: 195-197.
45. Sparrow JM, Ayliffe W, Bron AJ, Brown NP, Hill AR. Inter-observer and intra-observer variability of the Oxford clinical cataract classification and grading system. Int Ophthalmol 1988; 11: 151-157.
46. Tielsch JM, Katz J, Quigley HA, Miller NR, Sommer A. Intraobserver and interobserver agreement in measurement of optic disc characteristics. Ophthalmology 1988; 95: 350-356.
47. Curley RK, Cook MG, Fallowfield ME, Marsden RA. Accuracy in clinically evaluating pigmented lesions. Br Med J 1989; 299: 16-18.
48. Gauthier M, Guay J, Lacroix J, Lortie A. Reye’s syndrome: a reappraisal of diagnosis in 49 presumptive cases. Am J Dis Child 1989; 143: 1181-1185.
49. Hopkins A, Menken M, DeFriese GH, Feldman RG. Differences in strategies for the diagnosis and treatment of neurologic disease among British and American neurologists. Arch Neurol 1989; 46: 1142-1148.
50. Karkow WS, Cranley JJ. Variations in interpretation of arterial stenosis. J Cardiovasc Surg (Torino) 1989; 30: 826-832.
51. Lienert RT. Inter-observer comparisons of ophthalmoscopic assessment of diabetic retinopathy. Aust N Z J Ophthalmol 1989; 17: 363-368.
52. Maser RE, Nielsen VK, Bass EB, Manjoo Q, Dorman JS, Kelsey SF et al. Measuring diabetic neuropathy. Assessment and comparison of clinical examination and quantitative sensory testing. Diabetes Care 1989; 12: 270-275.
53. Nansel DD, Peneff AL, Jansen RD, Cooperstein R. Interexaminer concordance in detecting joint-play asymmetries in the cervical spines of otherwise asymptomatic subjects. J Manipulative Physiol Ther 1989; 12: 428-433.
54. Sharma YR, Vajpayee RB, Bhatnagar R, Mohan M, Azad RV, Kumar M et al. A simple accurate method of cataract classification. Indian J Ophthalmol 1989; 37: 112-117.
55. Solari A, Filippini G, Gagliardi L, Bevilacqua L, Amantini A, Giuliani G et al. Interobserver agreement in the diagnosis of multiple sclerosis. Arch Neurol 1989;


17. Psychiatric

1. Goldberg DP. The reliability of a standardized psychiatric interview suitable for use in community surveys. In: Hare ED, Wing JK, Eds. Psychiatric Epidemiology. Oxford University Press; 1970: 283-290.
2. Sartorius N, Brooke EM, Lin T. Reliability of psychiatric assessment in international research from the WHO international pilot study of schizophrenia. Ibid.
3. Langer EJ, Abelson RP. A patient by any other name...: clinician group difference in labelling bias. J Consult Clin Psychol 1974; 42: 4-9.
4. Folstein MF, Folstein SE, McHugh PR. “Mini-mental state”: a practical method for grading the cognitive state of patients for the clinician. J Psychiat Res 1975; 12: 189-198.
5. Fisch HU, Hammond KR, Joyce CRB, O’Reilly M. An experimental study of the clinical judgment of general physicians in evaluating and prescribing for depression. Br J Psychiatry 1981; 138: 100-109.
6. Gjerris A, Bech P, Bojholm S, Bolwig TG, Kramp P, Clemmesen L et al. The Hamilton Anxiety Scale. Evaluation of homogeneity and inter-observer reliability in patients with depressive disorders. J Affective Disord 1983; 5: 163-170.
7. Tyrer P, Cicchetti DV, Casey PR, Fitzpatrick K, Oliver R, Baiter A et al. Cross-national reliability study of a schedule for assessing personality disorders. J Nerv Ment Dis 1984; 172: 718-721.
8. Helzer JE, Robins LN, McEvoy LT, Spitznagel EL, Stoltzman RK, Farmer A et al. A comparison of clinical and Diagnostic Interview Schedule diagnoses. Arch Gen Psychiatry 1985; 42: 657-666.
9. Smith MD, Hong BA, Robson AM. Diagnosis of depression in patients with end-stage renal disease. Am J Med 1985; 79: 160-166.
10. Fernando T, Mellsop G, Nelson K, Peace K, Wilson J. The reliability of Axis V of DSM-III. Am J Psychiatry 1986; 143: 752-755.
11. Fuhrer R, Rouillon F, Lellouch J. Diagnostic reliability among French Psychiatrists using DSM-III criteria. Acta Psychiatr Scand 1986; 73: 12-16.
12. Larsen F, Vaglum S. Interrater reliability of the DSM-III diagnoses in two Norwegian studies on psychiatric and super obese patients. Acta Psychiatr Scand 1986; 73: 18-21.
13. Maier W, Philipp M, Schlegel S, Heuser I, Buller R, Frommberger U et al. Operational diagnoses for schizophrenic and schizoaffective disorders. I. Interrater reliability. Pharmacopsychiat 1986; 19: 178-179.
14. Malt UF. Teaching DSM-III to clinicians. Acta Psychiatr Scand 1986; 73: 68-75.
15. Zimmerman M, Coryell W. Reliability of follow-up assessments of depressed inpatients. Arch Gen Psychiatry 1986; 43: 468-470.
16. Bornstein RA, Baker GB, Douglass AB. Short-term retest reliability of the Halstead-Reitan Battery in a normal sample. J Nerv Ment Dis 1987; 175: 229-232.
17. Cameron DJ, Thomas RI, Mulvihill M, Bronheim H. Delirium: a test of the diagnostic and statistical manual III criteria on medical inpatients. J Am Geriatr Soc 1987; 35: 1007-1010.
18. Riskind JH, Beck AT, Berchick RJ, Brown G, Steer RA. Reliability of DSM-III diagnoses for major depression and generalized anxiety disorder using the structured clinical interview for DSM-III. Arch Gen Psychiatry 1987; 44: 817-820.
19. Weissman MM, Wickramaratne P, Warner V, John K, Prusoff BA, Merikangas KR et al. Assessing psychiatric disorders in children: discrepancies between mothers’ and children’s reports. Arch Gen Psychiatry 1987; 44: 747-753.
20. Hendrie HC, Hall KS, Brittain HM, Austrom MG, Farlow M, Parker J et al. The CAMDEX: a standardized instrument for the diagnosis of mental disorder in the elderly: a replication with a US sample. J Am Geriatr Soc 1988; 36: 402-408.
21. Romanoski AJ, Nestadt G, Chahal R, Merchant A, Folstein MF, Gruenberg EM et al. Interobserver reliability of a “Standardized Psychiatric Examination” (SPE) for case ascertainment (DSM-III). J Nerv Ment Dis 1988; 176: 63-71.
22. Ross CA, Leichner P. Residents performance on the mental status examination. Can J Psychiatry 1988; 33: 108-111.
23. Hjortso S, Butler B, Clemmesen L, Jepsen PW, Kastrup M, Vilmar T et al. The use of case vignettes in studies of interrater reliability of psychiatric target syndromes and diagnoses. A comparison of ICD-8, ICD-10 and DSM-III. Acta Psychiatr Scand 1989; 80: 632-638.
24. Kitamura T, Shima S, Sakio E, Kato M. Psychiatric diagnosis in Japan. Reliability of conventional diagnosis and discrepancies with Research Diagnostic Criteria diagnosis. Psychopathology 1989; 22: 250-259.
25. O’Connor DW, Pollitt PA, Hyde JB, Fellows JL, Miller ND, Brook CPB et al. The reliability and validity of the Mini-Mental State in a British community survey. J Psychiatr Res 1989; 23: 87-96.
26. Wilson HS. Field trials of the phenomena of concern for psychiatric/mental health nursing: proposed methodology. Arch Psychiatr Nurs 1989; 3: 305-308.


18.

1. King DJ, Manegold RF. Consistency of cardiologists’ prognostic judgments in cases of myocardial infarction. Am J Cardiol 1965; 15: 27-32.
2. Rutkow IM, Gittelsohn AM, Zuidema GD. Surgical decision making: the reliability of clinical judgment. Ann Surg 1979; 190: 409-419.
3. Pearlman RA, Inui TS, Carter WB. Variability in physician bioethical decision-making. Ann Intern Med 1982; 97: 420-425.
4. Rutkow IM. Surgical decision making: the reproducibility of clinical judgment. Arch Surg 1982; 117: 337-340.
5. Rutkow IM. The reliability and reproducibility of the surgical decision-making process. Surg Clin North Am 1982; 62: 721-735.
6. Dunn JT. Choice of therapy in young adults with hyperthyroidism of Graves’ Disease. Ann Intern Med 1984; 100: 891-893.
7. Holzman GB, Ravitch MM, Metheny W, Rothert ML, Holmes M, Hoppe RB. Physicians’ judgments about estrogen replacement therapy for menopausal women. Obstet Gynecol 1984; 63: 303-311.
8. Kissick WL, Engstrom PF, Soper KA, Peterson OL. Comparison of internist and oncologist evaluations of cancer patients’ need for hospitalization. Med Care 1984; 22: 447-452.
9. Jewell D, Bain J. Common childhood problems: variation in management. Br Med J 1985; 291: 941-944.
10. Lawrence VA, Clark GM. Cancer and resuscitation: does the diagnosis affect the decision? Arch Intern Med 1987; 147: 1637-1640.
11. Lomas J, Anderson G, Enkin M, Vayda E, Roberts R, MacKinnon B. The role of evidence in the consensus process. JAMA 1988; 259: 3001-3005.
12. Uhlmann RF, Pearlman RA, Cain KC. Physicians’ and spouses’ predictions of elderly patients’ resuscitation preferences. J Gerontology 1988; 43: M115-M121.

19. Appraisal of Outcome

1. Smythe HA, Helewa A, Goldsmith CH. “Independent assessor” and “pooled index” as techniques for measuring treatment effects in rheumatoid arthritis. J Rheumatol 1977; 4: 144-152.
2. Maas AIR, Braakman R, Schouten HJA, Minderhoud JM, van Zomeren AH. Agreement between physicians on assessment of outcome following severe head injury. J Neurosurg 1983; 58: 321-325.
3. Kirwan JR, Chaput DeSaintonge DM, Joyce CRB, Currey HLF. Clinical judgment in rheumatoid arthritis. III. British rheumatologists’ judgments of ‘change in response to therapy’. Ann Rheum Dis 1984; 43: 686-694.
4. Kirwan JR, Currey HLF. Clinical judgment in rheumatoid arthritis. IV. Rheumatologists’ assessments of disease remain stable over long periods. Ann Rheum Dis 1984; 43: 695-697.
5. Kirwan JR, Currey HLF, Brooks PM. Measuring physicians’ judgment: the use of clinical data by Australian rheumatologists. Aust NZ J Med 1985; 15: 738-744.
6. Asberg KH. Physicians’ outcome predictions for elderly patients. Scand J Soc Med 1986; 14: 127-132.
7. Saxby PJ, Palmer JH. The use of an independent panel to assess the long-term results of cleft lip repair. Br J Plast Surg 1986; 39: 373-378.

20. Adverse Drug Reactions

1. Blanc S, Leuenberger P, Berger JP, Brooke EM, Schelling JL. Judgments of trained observers on adverse drug reactions. Clin Pharmacol Ther 1979; 25: 493-498.
2. Hutchinson TA, Flegel KM, HoPingKong H, Bloom WS, Kramer MS, Trummer EG. Reasons for disagreement in the standardized assessment of suspected adverse drug reactions. Clin Pharmacol Ther 1983; 34: 421-426.
3. Pere JC, Begaud B, Haramburu F, Albin H. Computerized comparison of six adverse drug reaction assessment procedures. Clin Pharmacol Ther 1986; 40: 451-461.
4. Mitchell AS, Henry DA, Sanson-Fisher R, O’Connell DL. Patients as a direct source of information on adverse drug reactions. Br Med J 1988; 297: 891-893.

21. Electrocardiography

1. Khurmi NS, Raftery EB. Reproducibility and validity of ambulatory ST segment monitoring in patients with chronic stable angina pectoris. Am Heart J 1987; 113: 1091-1096.

22. Exercise Stress

No references.

23. Endoscopy

1. Cantor D, Midgley R, Imperato T, Yassinger S. Observer variability in upper gastrointestinal endoscopy. Acta Gastroenterol Latinoam 1982; 12: 225-230.
2. Kling PA, Edin K, Domellof L. Observer variability in upper gastrointestinal fiber endoscopy. Scand J Gastroenterol 1985; 20: 462-465.

24. Other Paraclinical



1. Edwards DAW, Hammond WH, Healy MJR, Tanner JM, Whitehouse RH. Design and accuracy of calipers for measuring subcutaneous tissue thickness. Br J Nutr 1955; 9: 133-143.
2. Burkinshaw L, Jones PRM, Krupowicz DW. Observer error in skinfold thickness measurements. Human Biology 1973; 45: 273-279.
3. Uhlmann RF, Rees TS, Psaty BM, Duckert LG. Validity and reliability of auditory screening tests in demented and non-demented older adults. J Gen Intern Med 1980; 4: 90-96.
4. Kelman AW, Sumner DJ, Whiting B. Systolic time interval vs. heart rate regression equations using atropine: reproducibility studies. Br J Clin Pharmacol 1981; 12: 15-20.
5. Spiro SG, Bierman CW, Petheram IS. Reproducibility of flow rates measured with low density gas mixtures in exercise-induced bronchospasm. Thorax 1981; 36: 852-857.
6. Lotgering FK, Wallenburg HCS, Schouten HJA. Interobserver and intraobserver variation in the assessment of antepartum cardiotocograms. Am J Obstet Gynecol 1982; 144: 701-705.
7. Venables KM, Burge PS, Davison AG, Taylor AJN. Peak flow rate records in surveys: reproducibility of observers’ reports. Thorax 1984; 39: 828-832.
8. Rossman RN, Cashman MZ. Inter-interpreter agreement for Auditory Brainstem Response (ABR) tracings. Scand Audiol 1985; 14: 9-11.
9. Williams GW, Lüders HO, Brickner A, Goormastic M, Klass DW. Interobserver variability in EEG interpretation. Neurology 1985; 35: 1714-1719.
10. Johnston KW, Hosang MY, Andrews DF. Reproducibility of noninvasive vascular laboratory measurements of the peripheral circulation. J Vasc Surg 1987; 6: 147-151.
11. Nielsen PV, Stigsby B, Nickelsen C, Nim J. Intra- and inter-observer variability in the assessment of intrapartum cardiotocograms. Acta Obstet Gynecol Scand 1987; 66: 421-424.
12. Simel DL, DeLong ER, Feussner JR, Weinberg JB, Crawford J. Erythrocyte anisocytosis: visual inspection of blood films vs. automated analysis of red blood cell distribution width. Arch Intern Med 1988; 148: 822-824.
13. Varma R, Steinmann WC, Spaeth GL, Wilson RP. Variability in digital analysis of optic topography. Graefes Arch Clin Exp Ophthalmol 1988; 226: 435-442.
14. Werner EB, Bishop KI, Koelle J, Douglas GR, LeBlanc RP, Mills RP et al. A comparison of experienced clinical observers and statistical tests in detection of progressive visual field loss in glaucoma using automated perimetry. Arch Ophthalmol 1988; 106: 619-623.
15. West SK, Rosenthal R, Newland HS, Taylor HR. Use of photographic techniques to grade nuclear cataracts. Invest Ophthalmol Vis Sci 1988; 29: 73-77.
16. Denham D, Mandelbaum S, Parel JM, Holland S, Pflugfelder S, Parel JM. Shadow photogrammetric apparatus for the quantitative evaluation of corneal buttons. Ophthalmic Surg 1989; 20: 794-799.

25. Conventional

1. Edwards WM, Cox RS, Garland LH. The solitary nodule (coin lesion) of the lung: an analysis of 52 consecutive cases treated by thoracotomy and a study of preoperative diagnostic accuracy. Am J Roentgen 1962; 88: 1020-1042.
2. Chamberlain J, Rogers P, Price JL, Ginks S, Nathan BE, Burn I. Validity of clinical examination and mammography as screening tests for breast cancer. Lancet 1975; 2: 1026-1030.
3. Laurin S, Mortensson W. Variations in vesicoureteral reflux. Acta Radiol [Diagn] (Stockh) 1980; 21: 269-273.
4. Boyd NF, Wolfson C, Moskowitz M, Carlile T, Petitclerc M, Ferri HA et al. Observer variation in the interpretation of xeromammograms. J Natl Cancer Inst 1982; 68: 357-363.
5. The Pituitary Adenoma Study Group. Variation in assessing sella turcica tomograms for pituitary microadenomas. Obstet Gynecol 1982; 60: 700-704.
6. Carlile T, Thompson DJ, Kopecky KJ, Gilbert FI, Krook PM, Present AJ et al. Reproducibility and consistency in classification of breast parenchymal patterns. Am J Roentgen 1983; 140: 1-7.
7. Reit C, Hollender L. Radiographic evaluation of endodontic therapy and the influence of observer variation. Scand J Dent Res 1983; 91: 205-212.
8. Deyo RA, McNiesh LM, Cone RO III. Observer variability in the interpretation of lumbar spine radiographs. Arthritis Rheum 1985; 28: 1066-1070.
9. Gjorup T, Nielsen H, Bording Jensen L, Morup Jensen A. Interobserver variation in the radiographic diagnosis of gastric ulcer. Acta Radiol [Diagn] (Stockh) 1985; 26: 289-292.
10. Fries JF, Bloch DA, Sharp JT, McShane DJ, Spitz P, Bluhm GB et al. Assessment of radiologic progression in rheumatoid arthritis: a randomized, controlled trial. Arthritis Rheum 1986; 29: 1-9.
11. Snow DA. Clinical significance of discrepancies in roentgenographic film interpretation in an acute walk-in area. J Gen Intern Med 1986; 1: 295-299.
12. Zoloth S, Michaels D, Lather M, Nagin D, Drucker E. Asbestos disease screening by non-specialists: results of an evaluation. Am J Public Health 1986; 76: 1392-1395.
13. Mann J, Pettigrew JC, Beideman RW, Green P, Ship I. Inter- and intra-examiner variability in interpretation of early periodontal disease by radiographs. Isr J Dent Sci 1988; 2: 104-108.
14. Vineis P, Sinistrero G, Temporelli A, Azzoni L, Bigo A, Burke P et al. Inter-observer variability in the interpretation of mammograms. Tumori 1988; 74: 275-279.
15. Kalla AA, Meyers OL, Parkyn ND, Kotze TJW. Osteoporosis screening-radiogrammetry revisited. Br J Rheumatol 1989; 28: 511-517.
16. Kallman DA, Wigley FM, Scott WW Jr, Hochberg MC, Tobin JD. New radiographic grading scales for osteoarthritis of the hand: reliability for determining prevalence and progression. Arthritis Rheum 1989; 32: 1584-1591.
17. O’Leary MR, Smith MS, O’Leary DS, Olmsted WW, Curtis DJ, Groleau G et al. Application of clinical indicators in the emergency department. JAMA 1989; 262: 3444-3447.
18. Parker DL, Bender AP, Hankinson S, Aeppli D. Public health implications of the variability in the interpretation of ‘B’ readings for pleural changes. J Occup Med 1989; 31: 775-780.
19. Smith SR, Matteson SR, Phillips C, Tyndall DA. Quantitative and subjective analysis of temporomandibular joint radiographs. J Prosthet Dent 1989; 62: 456-463.

26. Angiography

1. Galbraith JE, Murphy ML, deSoyza N. Coronary angiogram interpretation: interobserver variability. JAMA 1978; 240: 2053-2056.
2. Kennett JD, Rust PF, Martin RH, Parker BM, Watson LE. Observer variation in the angiocardiographic diagnosis of mitral valve prolapse. Chest 1981; 79: 146-150.
3. Cameron A, Kemp HG, Fisher LD, Gosselin A, Judkins MP, Kennedy JW et al. Left main coronary artery stenosis: angiographic determination. Circulation 1983; 68: 484-489.
4. Chikos PM, Fisher LD, Hirsch JH, Harley JD, Thiele BL, Strandness DE Jr. Observer variability in evaluating extracranial carotid artery stenosis. Stroke 1983; 14: 885-892.
5. Meier B, Gruentzig AR, Goebel N, Pyle R, Von Gosslar W, Schlumpf M. Assessment of stenoses in coronary angioplasty. Inter- and intraobserver variability. Int J Cardiol 1983; 3: 159-169.
6. Sheehan FH, Stewart DK, Dodge HT, Mitten S, Bolson EL, Brown BG. Variability in the measurement of regional left ventricular wall motion from contrast angiograms. Circulation 1983; 68: 550-559.
7. Trask N, Califf RM, Conley MJ, Kong Y, Peter R, Lee KL et al. Accuracy and interobserver variability of coronary cineangiography: a comparison with postmortem evaluation. J Am Coll Cardiol 1984; 3: 1145-1154.

JOANN G. ELMORE and ALVAN R. FEINSTEIN

8. Jeremy R, Tokuyasu Y, Choong CYP, Bautovich G, Hutton BF, Shen WF et al. The reproducibility of nongeometric analysis of cardiac output and left ventricular volume by radionuclide angiography. Am Heart J 1985; 110: 1020-1026.
9. Eskesen V, Karle A, Kruse A, Kruse-Larsen C, Praestholm J, Schmidt K. Observer variability in assessment of angiographic vasospasm after aneurysmal subarachnoid haemorrhage. Acta Neurochir (Wien) 1987; 87: 54-57.
10. Fisher M, Ahmadi J, Zee CS, Weiner JM. Variability in arteriographic assessment of the carotid bifurcation. Angiology 1987; 38: 116-120.
11. Key H, Jackson PC, Thomas EA, Jeans WD, Davies Ed. The accuracy of digital subtraction angiography for the quantification of atherosclerosis. Br J Radiol 1987; 60: 1083-1088.
12. Mall FL, Baker JD, Gomes AS. Observer variability with conventional and digital subtraction carotid angiograms. Eur J Vasc Surg 1987; 1: 297-303.
13. Sanz ML, Mancini J, LeFree MT, Mickelson JK, Starling MR, Vogel RA et al. Variability of quantitative digital subtraction coronary angiography before and after percutaneous transluminal coronary angioplasty. Am J Cardiol 1987; 60: 554.

27. New Imaging Techniques
1. Nishiyama H, Lewis JT, Ashare AB, Saenger EL. Interpretation of radionuclide liver images: do training and experience make a difference? J Nucl Med 1974; 16: 11-16.
2. Turner DA. Observer variability: what to do until perfect diagnostic tests are invented. J Nucl Med 1978; 19: 435-437.
3. Dymond DS, Elliott A, Stone D, Hendrix G, Spurrell R. Factors that affect the reproducibility of measurements of left ventricular function from first-pass radionuclide ventriculograms. Circulation 1982; 65: 311-322.





4. Voyles WF, Greene ER, Miranda IP, Reilly PA, Caprihan A. Observer variability in serial noninvasive measurements of stroke index using pulsed Doppler flowmetry. Biomed Sci Instrum 1982; 18: 67-75.
5. Heidendal GA, Bezemer PD, Koopman PA, den Hollander W, Teule GJ, van der Wall EE et al. Reproducibility of ejection fraction measurements by gated equilibrium blood pool scintigraphy. Eur J Nucl Med 1983; 8: 467-470.
6. Herman S, DeBoer G, Rideout DF, Majesky IF. Observer variation in abdominal CT. Invest Radiol 1984; 19: 597-598.
7. Lally M, Johnston KW, Cobbold RS. Limitations in the accuracy of peak frequency measurements in the diagnosis of carotid disease. JCU 1984; 12: 403-409.
8. Allen-Mersh TG, Motson RW, Hately W. Does it matter who does ultrasound examination of the gall bladder? Br Med J 1985; 291: 389-390.
9. Fischer M, Alexander K. Reproducibility of carotid artery Doppler frequency measurements. Stroke 1985; 16: 973-976.
10. Fisher ML, Kelemen MH, Collins D, Holder L, Winzelberg G, Plotnick GD et al. Technetium-99m-pyrophosphate scintigraphy in patients with suspected acute myocardial infarction: impact of interobserver variability. Am Heart J 1985; 110: 347-352.
11. Halperin ME, Fong KW, Zalev AH, Goldsmith CH. Reliability of amniotic fluid volume estimation from ultrasonograms: intraobserver and interobserver variation before and after the establishment of criteria. Am J Obstet Gynecol 1985; 153: 264-267.

12. Jackson PC, Jones M, Brimble CE, Hart J. The reduction of inter- and intra-observer variability for defining regions of interest in nuclear medicine. Eur J Nucl Med 1985; 11: 186-189.
13. Wann LS, Gross CM, Wakefield RJ, Kalbfleisch JH. Diagnostic precision of echocardiography in mitral valve prolapse. Am Heart J 1985; 109: 803-808.
14. Alpert MA, Carney RJ, Munuswamy K, Ruder MA, Kapoor AS, Webel RR et al. Observer variation in the echocardiographic diagnosis of mitral valve prolapse. Am Heart J 1986; 111: 1123-1129.
15. Christiansen EL, Thompson JR, Kopp S. Intra- and inter-observer variability and accuracy in the determination of linear and angular measurements in computed tomography. Acta Odontol Scand 1986; 44: 221-229.
16. Gjorup T, Brahm M, Fogh J, Munck O, Morup Jensen A. Interobserver variation in the detection of metastases on liver scans. Gastroenterology 1986; 90: 166-172.
17. Carlsen O. Evaluation of end-diastolic left ventricular volume in equilibrium gated radionuclide cardiography. Int J Biomed Comput 1987; 21: 55-65.
18. Horwitz RI, Bordley DR. Understanding and limiting observer variability: lessons from liver scans. Hepatology 1987; 7: 194-196.
19. Upadhyay SS, O’Neil T, Burwell RG, Moulton A. A new method using medical ultrasound for measuring femoral anteversion (torsion): technique and reliability. An intra-observer and inter-observer study on dried bones from human adults. J Anat 1987; 155: 119-132.










20. Wallerson DC, Devereux RB. Reproducibility of echocardiographic left ventricular measurements. Hypertension 1987; 9 (Suppl. II): II-6-II-18.
21. Geiser EA, Oliver LH, Gardin JM, Kerber RE, Parisi AF, Reichek N et al. Clinical validation of an edge detection algorithm for two-dimensional echocardiographic short-axis images. J Am Soc Echocardiogr 1988; 1: 410-421.
22. Iscoe NA, McKee JD, Arenson AM, Szalai JP. Observer error in the characterization of hepatic lesions in patients with neoplastic liver disease. JCU 1988; 16: 577-579.
23. Shapiro E, Slovis TL, Perlmutter AD, Kuhns LR. Optimal use of 99m-technetium-glucoheptonate scintigraphy in the detection of pyelonephritic scarring in children: a preliminary report. J Urol 1988; 140: 1175-1177.
24. Smith MD, Grayburn PA, Spain MG, DeMaria AN. Observer variability in the quantitation of Doppler color flow jet areas for mitral and aortic regurgitation. J Am Coll Cardiol 1988; 11: 579-584.
25. Moorthy MB, Rees DG, Hope PL. Reproducibility of measurement of pulsatility index by Doppler ultrasound of the anterior cerebral artery of preterm infants. J R Army Med Corps 1989; 135: 131-134.
26. Sarmandal P, Bailey SM, Grant JM. A comparison of three methods of assessing inter-observer variation applied to ultrasonic fetal measurement in the third trimester. Br J Obstet Gynaecol 1989; 96: 1261-1265.
27. Wong DH, Onishi R, Tremper KK, Reeves C, Zaccari J, Wong AB et al. Thoracic bioimpedance and Doppler cardiac output measurement: learning curve and interobserver reproducibility. Crit Care Med 1989; 17: 1194-1198.


1. Ederer F, Goldblatt SA, Nadel EM. Analysis of the color micrograph study of the Circulating Cancer Cell Cooperative (CCCC). Acta Cytol 1965; 9: 5&57.

2. Gilchrist KW, Kalish L, Gould VE, Hirschl S, Imbriglia JE, Levy WM et al. Immunostaining for carcinoembryonic antigen does not discriminate for early recurrence in breast cancer. Cancer 1985; 56: 351-355.
3. Curling M, Broome G, Hendry WF. How accurate is urine cytology? J R Soc Med 1986; 79: 336-338.
4. Chan KW, Chiu KY, Fu KH, Ling JM. Observer variability in microcomputer-assisted morphometric study of nuclear parameters. Pathology 1987; 19: 407-409.
5. Peters BR, Schnadig VJ, Quinn FB Jr. Interobserver variability in the interpretation of fine-needle aspiration biopsy of head and neck masses. Arch Otolaryngol Head Neck Surg 1989; 115: 1438-1442.






29. Histopathology
1. Gilchrist KW, Gould VE, Hirschl S, Imbriglia JE, Patchefsky AS, Penner DW et al. Interobserver variation in the identification of breast carcinoma in intramammary lymphatics. Hum Pathol 1982; 13: 170-172.
2. Green FHY, Attfield M. Pathology standards for asbestosis. Scand J Work Environ Health 1983; 9: 162-168.
3. Holman CDJ, Matz LR, Finlay-Jones LR, Waters ED, Blackwell JB, Joyce PR et al. Inter-observer variation in the histopathological reporting of Hodgkin’s disease: an analysis of diagnostic subcomponents using kappa statistics. Histopathology 1983; 7: 399-407.
4. Whitcomb CC, Crissman JD, Flint A, Cousar JB, Collins RD, Byrne GE Jr. Reproducibility in morphologic classification of non-Hodgkin’s lymphomas using the Lukes-Collins System. Am J Clin Pathol 1984; 82: 383-388.
5. Winkler B, Alvarez S, Richart RM, Crum CP. Pitfalls in the diagnosis of endometrial neoplasia. Obstet Gynecol 1984; 64: 185-193.
6. Beck JS. Observer variability in reporting of breast lesions. J Clin Pathol 1985; 38: 1358-1365.
7. Gilchrist KW, Kalish L, Gould VE, Hirschl S, Imbriglia JE, Levy WM et al. Interobserver reproducibility of histopathological features in stage II breast cancer. Breast Cancer Res Treat 1985; 5: 3-10.
8. NCI Non-Hodgkin’s Classification Project Writing Committee. Classification of non-Hodgkin’s lymphomas: reproducibility of major classification systems. Cancer 1985; 55: 91-95.
9. Coindre JM, Trojani M, Contesso G, David M, Rouesse J, Bui NB et al. Reproducibility of a histopathologic grading system for adult soft tissue sarcoma. Cancer 1986; 58: 306-309.
10. Maharaj B, Leary WP, Naran AD, Maharaj RJ, Cooppan RM, Pirie D et al. Sampling variability and its influence on the diagnostic yield of percutaneous needle biopsy of the liver. Lancet 1986; 1: 523-525.
11. Cramer SF, Roth LM, Ulbright TM, Mazur MT, Nunez CA, Gersell DJ et al. Evaluation of the reproducibility of the World Health Organization classification of common ovarian cancers: with emphasis on methodology. Arch Pathol Lab Med 1987; 111: 819-829.
12. Dick F, van Lier S, Banks P, Frizzera G, Witrak G, Gibson R et al. Use of the working formulation for non-Hodgkin’s lymphoma in epidemiologic studies: agreement between reported diagnoses and a panel of experienced pathologists. J Natl Cancer Inst 1987; 78: 1137-1144.
13. Rogers C, Klatt EC, Chandrasoma P. Accuracy of frozen-section diagnosis in a teaching hospital. Arch Pathol Lab Med 1987; 111: 514-517.











14. Shanes JG, Ghali J, Billingham ME, Ferrans VJ, Fenoglio JJ, Edwards WD et al. Interobserver variability in the pathologic interpretation of endomyocardial biopsy results. Circulation 1987; 75: 401-405.
15. Wilson JF, Kjeldsberg CR, Sposto R, Jenkin RDT, Chilcote RR, Coccia P et al. The pathology of non-Hodgkin’s lymphoma of childhood. Hum Pathol 1987; 18: 1008-1014.
16. Sawady J, Berner JJ, Siegler EE. Accuracy of and reasons for frozen sections: a correlative, retrospective study. Hum Pathol 1988; 19: 1019-1023.
17. Argyle JC, Benjamin DR, Lampkin B, Hammond D. Acute nonlymphocytic leukemias of childhood: interobserver variability and problems in the use of the FAB classification. Cancer 1989; 63: 295-301.
18. Childhood Brain Tumor Consortium. Intraobserver reproducibility in assigning brain tumors to classes in the World Health Organization diagnostic scheme. J Neurooncol 1989; 7: 211-224.
19. Hauck AJ, Kearney DL, Edwards WD. Evaluation of postmortem endomyocardial biopsy specimens from 38 patients with lymphocytic myocarditis: implications for role of sampling error. Mayo Clin Proc 1989; 64: 1235-1245.
20. Ismail SM, Colclough AB, Dinnen JS, Eakins D, Evans DMD, Gradwell E et al. Observer variation in histopathological diagnosis and grading of cervical intraepithelial neoplasia. Br Med J 1989; 298: 707-710.
21. Lyon JL, Robison LM, Moser R Jr. Uncertainty in the diagnosis of histologically confirmed pancreatic cancer cases. Int J Epidemiol 1989; 18: 305-308.
22. Montironi R, Scarpelli M, Sisti S, Mariuzzi GM, Collan Y, Pesonen E. Sources and nature of variation in DNA analysis of follicular thyroid adenoma. Pathol Res Pract 1989; 185: 579-583.
23. Pedersen L, Holck S, Schiødt T, Zedeler K, Mouridsen HT. Inter- and intraobserver variability in the histopathological diagnosis of medullary carcinoma of the breast, and its prognostic implications. Breast Cancer Res Treat 1989; 14: 91-99.
24. Peters BR, Schnadig VJ, Quinn FB Jr, Hokanson JA, Zaharopoulos P, McCracken MM et al. Interobserver variability in the interpretation of fine-needle aspiration biopsy of head and neck masses. Arch Otolaryngol Head Neck Surg 1989; 115: 1438-1442.
25. Robertson AJ, Anderson JM, Swanson Beck J, Burnett RA, Howatson SR, Lee FD et al. Observer variability in histopathological reporting of cervical biopsy specimens. J Clin Pathol 1989; 42: 231-238.
26. Schultz HB, Ersboll J, Nissen NI, Hou-Jensen K. A simplified working formulation of non-Hodgkin’s lymphomas based on quantifiable histologic criteria. Cancer 1989; 64: 2532-2540.

30. Miscellaneous
1. Richardson WP, Higgins AC, Ames RG. Interviewer response bias in a survey of handicapping conditions among children. Am J Public Health 1964; 54: 1092-1099.
2. Bailar BA. The effects of rotation group bias on estimates from panel surveys. J Am Stat Assoc 1975; 70: 23-30.
3. Kazdin AE. Artifact, bias, and complexity of assessment: the ABC’s of reliability. J Appl Behav Anal 1977; 10: 141-150.
4. Kalton G, Stowell R. A study of coder variability. Appl Statist 1979; 28: 276-289.
5. Hall PA, Lemoine NR. Comparison of manual data coding errors in two hospitals. J Clin Pathol 1986; 39: 622-626.

6. Richards P, McManus IC, Maitlis SA. Reliability of interviewing in medical student selection. Br Med J 1988; 296: 1520-1521.
7. Uebersax JS. Validity inferences from interobserver agreement. Psychol Bull 1988; 104: 405-416.
8. Quinn MF. Relation of observer agreement to accuracy according to a two-receiver signal detection model of diagnosis. Med Decis Making 1989; 9: 196-206.

Acknowledgements-We thank the following persons for calling some of these references to our attention: F. Ederer, K. W. Gilchrist, G. E. King, M. Lichtenstein, J. Nishikawa, and H. P. Vogel. We also thank the Council for Tobacco Research for a special-project grant that helped, in part, to support this work. We are indebted to N. Britton, B. Pesapane and C. Van Vranken for assistance in the preparation of the manuscript.

REFERENCES
1. Feinstein AR. A bibliography of publications on observer variability. J Chron Dis 1985; 38: 619-632.
2. Koran LM. The reliability of clinical methods, data, and judgments. N Engl J Med 1975; 293: 642-646, 695-701.
3. Fletcher CM, Oldham PD. Bibliography on observer error and variation. In: Witts LJ, Ed. Medical Surveys and Clinical Trials, 2nd edn. New York: Oxford University Press; 1964.

A bibliography of publications on observer variability (final installment).
