Interobserver Agreement in the Examination of Acute Ankle Injury Patients

IAN G. STIELL, MD, R. DOUGLAS McKNIGHT, MD, GARY H. GREENBERG, MD,* RAMA C. NAIR, MSTAT, PHD,† IAN MCDOWELL, PHD,† GORDON J. WALLACE, MD

The authors' objective was to describe a method for measuring interobserver agreement and to determine the reliability of physical findings used by emergency physicians to assess ankle injury patients. A 3-month prospective survey was designed for use in the emergency departments of two university hospitals. Participants were a convenience sample of 100 adult blunt ankle injury patients. Pairs of emergency staff physicians assessed 22 standardized physical findings in each patient without knowledge of the other assessment. Agreement for each variable was measured by the kappa coefficient, the ratio of actual agreement to potential agreement beyond chance. The variables with the highest interobserver agreement and their kappa values were ability to bear weight (.83); bone tenderness at the base of the fifth metatarsal (.78), at the posterior edge of lateral malleolus (.75), and at the tip of the medial malleolus (.66); and combinations of bone tenderness (.76). Less reliable variables included soft tissue tenderness (.41) or degree of swelling (.18) of the anterior talofibular ligament, ecchymosis (.39), range of motion (.33), bone tenderness at the proximal fibula (-.01), and the anterior drawer sign (-.03). High kappa values indicate that several physical findings, including ability to bear weight and selected sites of bone tenderness, may be reliably assessed in ankle injury patients. This knowledge may give physicians more confidence in their physical examination and allow development of reliable clinical guidelines to diminish the reliance on radiography in ankle injuries. (Am J Emerg Med 1992;10:14-17. Copyright © 1992 by W.B. Saunders Company)

From the Departments of Emergency Medicine, Ottawa Civic Hospital and *Ottawa General Hospital, and the †Department of Epidemiology and Community Medicine, University of Ottawa, Ottawa, Ontario, Canada. Manuscript received May 6, 1991; revision accepted July 9, 1991. This study was supported by a grant from the Ministry of Health of Ontario, Ontario, Canada. Address reprint requests to Dr. Stiell, Department of Emergency Medicine, Ottawa Civic Hospital, 1053 Carling Ave, Ottawa, Ontario, Canada K1Y 4E9. Key Words: Observer variation, reliability, ankle injury, diagnosis. Copyright © 1992 by W.B. Saunders Company. 0735-6757/92/1001-0004$5.00/0

Acute ankle trauma is one of the most common forms of injury assessed in emergency departments. Physicians traditionally rely heavily on the use of radiography to exclude the presence of fracture, even though the yield of ankle x-rays for fractures rarely exceeds 15%.1-5 Reasons for this heavy dependence on radiography include patient demands and fear of lawsuit.6,7 Many physicians lack confidence in their clinical assessment of ankle injury patients and feel insecure about managing such cases without resorting to radiography. Several studies have attempted to determine which physical findings may be used as predictors of ankle fracture, but results have been contradictory and no guidelines have been widely adopted.2-5,8-12 None of these studies has effectively addressed the issue of the reliability of physical findings.

Reliability refers to the reproducibility or lack of variation of findings by one physician at different times (intraobserver agreement) or by two or more physicians (interobserver agreement).13-16 Findings that may be highly correlated with fractures are not useful clinically if they cannot be consistently assessed by physicians.17 Clinical guidelines or decision rules for the use of radiography cannot be successfully developed without knowledge of the reliability of the physical examination.

This report will describe the procedures for measuring interobserver agreement of physical findings, and will determine the reliability of findings used by physicians to assess ankle injury patients. This information may help the development of clinical guidelines for the use of radiography.

METHODS

This study was performed in the emergency departments of two adult university hospitals, the Ottawa Civic Hospital and the Ottawa General Hospital (Ottawa, Ontario, Canada). Patients were eligible if they had suffered acute, blunt trauma to the ankle from any mechanism. Not included in the study were patients who were under age 18, were pregnant, had isolated superficial skin injuries, had been injured more than 10 days previously, or had returned for reassessment of the same injury.

Patients were entered into the study, on a convenience basis, whenever two of 21 designated assessor physicians were available. These physicians were full-time, certified emergency staff physicians who examined each patient for 22 physical findings. The physicians were instructed on a consistent examination technique through individual sessions and group lectures, which averaged 1 hour per physician. The physical findings were recorded by each physician on a standardized data collection sheet that included diagrams for 10 points of bone tenderness and four areas of soft tissue tenderness (Figure 1).
Other findings included ecchymosis, range of motion, degree of swelling in four locations, anterior drawer sign, and ability to bear weight for at least four steps in the emergency department. Swelling and restriction of range of motion were assessed using an ordinal scale of "none, minimal, moderate, marked." Each of the two assessor physicians examined the patients and recorded his findings without knowledge of the other physician's examination and before the results of radiography were known. Physicians were paired, for each assessment, according to their availability.

Data analysis used SPSS-X statistical software (SPSS, Inc, Chicago, IL) on the University of Ottawa mainframe computing facilities. The kappa coefficient, the measure of agreement beyond chance, was calculated for pairs of examiners for each variable.18-20 Analysis of swelling and range of motion was done twice: first as ordinal data and second as data that had been dichotomized as "none-minimal" or "moderate-marked." The unweighted kappa calculations, with 95% confidence intervals, are presented for simplicity. Weighted kappa values may be used for ordinal data, but the weighted and unweighted values are approximately the same when the number of cases discrepant by more than one category is less than 5% of the total. An alternate, more complex method for calculating kappa when there are multiple raters has been described by Fleiss, but was not used in this study.19

Figure 2 illustrates the calculation of kappa applied to the interobserver agreement on the presence or absence of moderate to marked swelling over the medial malleolus.13 Whereas the observed agreement between physicians appears to be 89%, this is inflated by the amount of agreement that could occur by chance, given the two physicians' overall observations. In the example, the physicians might be expected to agree, based purely on chance, that 86% of 83 (71.4) patients would have no or minimal swelling and 14% of 17 (2.4) patients would have moderate or marked swelling. Therefore, the overall agreement expected on the basis of chance is calculated to be 74% ([71.4 + 2.4]/100). This means that the actual agreement beyond chance is 89 - 74, or 15%. The kappa value (.58) represents the ratio of actual agreement beyond chance (15%) to the potential maximum agreement beyond chance (100 - 74, or 26%). A kappa of 1.0 represents perfect agreement, and values greater than 0 represent observed agreement greater than that due to chance. Guidelines for interpreting the strength of agreement from kappa values have been suggested: less than .40 poor to fair, .41 to .60 moderate, .61 to .80 substantial, and greater than .80 almost perfect.18

FIGURE 1. Data collection sheet used to assess sites of bone tenderness (B0-B9) and soft tissue tenderness (S1-S4) in ankle injury patients. (Panels: lateral view and medial view.)

FIGURE 2. Sample calculation of kappa value for interobserver agreement of moderate-marked swelling over medial malleolus in 100 ankle injury patients (after Sackett et al13).

RESULTS

One hundred eligible patients were entered into the study, and each was assessed by two physicians. The demographic data, type of injury, and incidence of fracture are given in Table 1 and suggest that a broad assortment of ankle injury patients was represented in the study.

TABLE 1. Characteristics of Patients in the Study

Characteristic                    n = 100
Mean age in years (±SD)           37.3 ± 15.1
  Range                           18-83
Male                              51
Mechanism of injury
  Twisting                        91
  Direct blow                     4
  Fall from a height              2
  Vehicle accident                2
  Other                           1
Fractures                         18
  Malleolar region                17
  Midfoot                         1

General physical findings, with the exception of ability to bear weight, did not show good agreement (Figure 3). The reliability of judging restriction of range of motion was improved by dichotomizing the four-point scale but offered, at best, only fair agreement. Neither ecchymosis nor soft tissue tenderness at any of four areas was found to have substantial agreement, and the anterior drawer sign fared poorly. In contrast, ability to bear weight in the emergency department proved to be the most reliable variable, with a kappa value of .83, reflecting nearly perfect agreement.

Judgment of the degree of swelling at four locations, on an ordinal scale, generally demonstrated only slight to moderate agreement (Figure 4). Dichotomizing the scale improved the results for agreement concerning swelling over the lateral and medial malleolus.

Other than ability to bear weight, the best kappa values were found for localized bone tenderness (Figure 5). Substantial interobserver agreement was obtained for tenderness at the base of the fifth metatarsal, the anterior and posterior edges of the lateral malleolus, and the inferior tip of the medial malleolus. Furthermore, the highest values were seen when bone areas were combined, as for the posterior edge and inferior tip of each of the malleoli.
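The kappa calculation described in the Methods for swelling over the medial malleolus can be sketched in a few lines of Python. This is an illustration only; the original analysis used SPSS-X. The 2 x 2 cell counts below (79, 7, 4, 10) are not printed in the text: they are back-calculated from the stated marginals (86/14 and 83/17) and the 89% observed agreement, so they should be read as an assumption. The function names `cohen_kappa` and `landis_koch_label` are hypothetical, and the treatment of a kappa of exactly .40 as "poor to fair" is a choice made here because the quoted guideline leaves that boundary open.

```python
from typing import Sequence


def cohen_kappa(table: Sequence[Sequence[float]]) -> float:
    """Unweighted kappa for a square agreement table.

    table[i][j] is the number of patients placed in category i by the
    first physician and in category j by the second physician.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: proportion of patients on the diagonal.
    observed = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: product of the two physicians' marginal proportions.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    # Ratio of actual to potential agreement beyond chance.
    return (observed - expected) / (1.0 - expected)


def landis_koch_label(kappa: float) -> str:
    """Strength-of-agreement label using the cut points quoted in the text."""
    if kappa <= 0.40:  # assumption: .40 itself is grouped with "poor to fair"
        return "poor to fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Assumed cell counts for moderate-marked swelling over the medial malleolus:
# rows = first physician (none-minimal, moderate-marked),
# columns = second physician (none-minimal, moderate-marked).
medial_swelling = [[79, 7],
                   [4, 10]]

kappa = cohen_kappa(medial_swelling)
print(round(kappa, 2), landis_koch_label(kappa))  # 0.58 moderate
```

With these counts the observed agreement is 89%, the chance agreement is about 74%, and the ratio 15/26 gives the kappa of .58 quoted in the worked example.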

DISCUSSION

These data suggest that there is a wide variation in interobserver agreement for the various physical findings of ankle injury patients. Findings for which there is high agreement should be favored in the routine physical assessment and in developing clinical guidelines for the use of radiography. Findings that cannot be reliably assessed are unlikely to be useful predictors of the severity of injury. The best agreement was found for judging ability to bear weight in the emergency department. Patients were classified as able to bear weight if they could take at least four steps, unassisted and with minimal persuasion from the physician.

FIGURE 3. Interobserver agreement (kappa values) of general findings in 100 ankle injury patients.

FIGURE 4. Interobserver agreement (kappa values) of swelling in 100 ankle injury patients.

FIGURE 5. Interobserver agreement (kappa values) of bone tenderness in 100 ankle injury patients.

Agreement for most areas of bone tenderness was good and suggests that physicians can rely on these findings in their assessments. The major exception was the proximal fibula, which may reflect the very few positives found in the study. Combinations of areas of bone tenderness tended to have higher kappa values than some of the individual points; improved reliability can be achieved by grouping points of tenderness.

Our results indicate that the following findings are not dependable: ecchymosis, range of motion, soft tissue tenderness, and the anterior drawer sign. Perhaps this poor agreement reflects a lack of attention by physicians towards variables not deemed clinically useful. The drawer sign would appear to be extremely unreliable, but this may, in part, be an artefact of the very few positives seen. In the study patients, the drawer sign was judged to be positive on only six occasions, and never by the same two physicians. Kappa values are known to be dependent on the prevalence of the characteristics being assessed, and this can complicate their interpretation.21 As the prevalence of a characteristic approaches 0 or 1, the value of kappa approaches 0. Further assessment of the drawer sign is required before concluding that it is completely unreliable.

Reasonable interobserver agreement on swelling could only be achieved when the ordinal scale was dichotomized, and then only for areas over the lateral and medial malleoli. This suggests that physician agreement is not good for physical findings rated by an ordinal scale and that a simple binary scale (eg, "present" or "absent") may be more clinically useful.

This study could have been statistically strengthened, with narrower confidence intervals, by having more patients assessed by fewer physicians, but the use of 21 experienced clinicians may lend more generalizability to our findings. Obtaining double assessments by the same two clinicians on more than 100 acutely injured patients in a reasonable period of time would be logistically formidable.

Determining the interobserver agreement of physical findings is only one facet of establishing their clinical usefulness. Whether the presence of certain findings is strongly associated with fractures is another component of the process of developing a clinical decision rule for the use of radiography in ankle injury patients. In other words, physical findings can only be helpful to physicians in determining the need for x-rays if they can both accurately predict fractures and be reliably assessed.

CONCLUSION

This study demonstrated the process for determining the interobserver agreement of physical findings in ankle injury patients. Found to be most reliable, as measured by the kappa coefficient, were ability to bear weight; swelling of the lateral malleolus; and localized bone tenderness of the base of the fifth metatarsal, the anterior and posterior edges of the lateral malleolus, and the inferior tip of the medial malleolus. This should reassure clinicians that these findings are dependable in the assessment of ankle injury patients and help the development of a clinical decision rule for radiography. This, in turn, may allow physicians to rely more on the physical examination and less on radiography in acute ankle injuries.
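The prevalence effect raised in the Discussion for the anterior drawer sign can be illustrated with a short sketch. The text reports six positive judgments, never by both physicians on the same patient. Two further assumptions are made here and are not given in the source: that all 100 patients were assessed for this sign, and that the six discordant ratings were split evenly (3 and 3) between the two examiners. The helper `kappa_2x2` is a hypothetical name used only for this illustration.

```python
def kappa_2x2(both_neg, a_only_pos, b_only_pos, both_pos):
    """Return (raw agreement, kappa) for a dichotomous finding.

    both_neg   : patients judged negative by both physicians
    a_only_pos : judged positive by physician A only
    b_only_pos : judged positive by physician B only
    both_pos   : patients judged positive by both physicians
    """
    n = both_neg + a_only_pos + b_only_pos + both_pos
    observed = (both_neg + both_pos) / n
    a_pos = (a_only_pos + both_pos) / n  # physician A's rate of positives
    b_pos = (b_only_pos + both_pos) / n  # physician B's rate of positives
    # Agreement expected by chance from the two marginal rates.
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    return observed, (observed - expected) / (1 - expected)


# Anterior drawer sign: six positives, none shared (3/3 split is assumed).
obs, kappa = kappa_2x2(both_neg=94, a_only_pos=3, b_only_pos=3, both_pos=0)
print(f"drawer sign: raw agreement = {obs:.2f}, kappa = {kappa:.2f}")

# Medial malleolus swelling from the Methods example, with cell counts
# back-calculated from the stated marginals and observed agreement.
obs, kappa = kappa_2x2(both_neg=79, a_only_pos=4, b_only_pos=7, both_pos=10)
print(f"medial swelling: raw agreement = {obs:.2f}, kappa = {kappa:.2f}")
```

Under these assumptions the drawer sign shows 94% raw agreement but a kappa of about -.03, the value reported for it, whereas medial malleolus swelling shows lower raw agreement (89%) and a higher kappa (.58). When positives are rare, nearly all of the raw agreement is already expected by chance, so very little room is left for agreement beyond chance.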
The authors thank the following emergency physicians for their patience and cooperation in conducting the study: J. Ahuja, R. Aubin, W. Beilby, A. Cwinn, G. Dickinson, M. Dolan, A. Henry, C. Johns, P. Johns, A. Malawski, J. Maloney, J. Nuth, M. Reardon, N. Smith, P. Stewart, B. Weitzman, and J. Worthington. We also thank T. Cacciotti, RN, and P. Sheehan, RN, for their help with data collection, and I. Harris for assistance with the manuscript.


REFERENCES

1. Vargish T, Clarke WR, Young RA, et al: The ankle injury - Indications for the selective use of x-rays. Injury 1983;14:507-512
2. Montague AP, McQuillan RF: Clinical assessment of apparently sprained ankle and detection of fracture. Injury 1985;16:545-546
3. Dunlop MG, Beattie TF, White GK, et al: Guidelines for selective radiological assessment of inversion ankle injuries. Br Med J 1986;293:603-605
4. Sujitkumar P, Hadfield JM, Yates DW: Sprain or fracture? An analysis of 2000 ankle injuries. Arch Emerg Med 1986;3:101-106
5. Diehr P, Highley R, Dehkordi F, et al: Prediction of fracture in patients with acute musculoskeletal ankle trauma. Med Decis Making 1988;8:40-47
6. Lloyd S: Selective radiographic assessment of acute ankle injuries in the emergency department: Barriers to implementation. Can Med Assoc J 1986;135:973-974 (editorial)
7. Long A: Radiographic decision-making by the emergency physician. Emerg Med Clin North Am 1985;3:437-446
8. Garfield JS: Is radiological examination of the twisted ankle necessary? Lancet 1960;2:1167-1169
9. Lettin AWF: Diagnosis and treatment of sprained ankle. Br Med J 1963;1:1056-1060
10. Stother IG: Incidence of minor fractures in twisting injuries of the ankle. Injury 1974;5:213-214
11. Brooks SC, Potter BT, Rainey JB: Inversion injuries of the ankle: Clinical assessment and radiographic review. Br Med J 1981;282:607-608
12. Brand DA, Frazier WH, Kohlhepp WC, et al: A protocol for selecting patients with injured extremities who need x-rays. N Engl J Med 1982;306:333-339
13. Sackett DL, Haynes RB, Tugwell P: Clinical Epidemiology. A Basic Science for Clinical Medicine. Toronto, Ontario, Canada, Little, Brown, 1985, pp 17-45
14. Feinstein AR: Clinical Epidemiology. The Architecture of Clinical Research. Philadelphia, PA, Saunders, 1985
15. Mulrow CD, Dolmatch BL, Delong ER, et al: Observer variability in the pulmonary examination. J Gen Intern Med 1986;1:364-367
16. Fletcher RH, Fletcher SW, Wagner EH: Clinical Epidemiology. The Essentials (ed 2). Baltimore, MD, Williams & Wilkins, 1988, pp 22-29
17. Feinstein AR: Clinimetrics. New Haven, CT, Yale University Press, 1987
18. Landis JR, Koch GG: The measurement of observer agreement for categorical data. Biometrics 1977;33:159-174
19. Fleiss JL: Statistical Methods for Rates and Proportions (ed 2). New York, NY, Wiley, 1981, pp 212-237
20. Kramer MS, Feinstein AR: Clinical biostatistics. LIV. The biostatistics of concordance. Clin Pharmacol Ther 1982;29:111-123
21. Thompson WD, Walter SD: A reappraisal of the kappa coefficient. J Clin Epidemiol 1988;41:949-958
