Chest Radiography: A Tool for the Audit of Report Quality(1)

Peter J. Haug, MD • Paul D. Clayton, PhD(2) • Irena Tocino, MD • James W. Morrison, MD • Philip R. Frederick, MD • C. Gregory Elliot, MD • David V. Collins, MD(3) • Susan K. Harada, RN

In a radiology department, clinical audit implies multiple readings of selected images to identify those findings that should be recognized and to document any departure from this standard for each radiologist. The authors developed an alternate approach for an audit on the basis of clinical outcomes collected in a medical computing facility. Techniques borrowed from information theory were used to measure the clinical information contributed by radiologists as they interpreted chest radiographs. The reported findings were evaluated in light of the discharge diagnosis. The scores generated quantified the information contributed to the final diagnosis by the radiologist's description. This audit approach was tested in a group of 100 chest radiographs. Significant differences were found in the mean scores for information contributed by five different readers. These differences were similar to differences demonstrated in audits by means of multiple readings of chest radiographs. These results support use of a form of audit that is substantially less expensive and time consuming than that typically used in radiology departments.

Index terms: Diagnostic radiology, observer performance, 60.11 • Images, analysis, 60.11 • Model, mathematical

Radiology 1991; 180:271-276

An important technique for assuring quality in any medical enterprise is the implementation of regular reviews of the services rendered. Unfortunately, routine audit has proved problematic in many areas of health care because of the difficulty and expense involved in the measurement of quality. Herein, we describe a prototype automated auditing system designed to measure the quality of radiology reports. This approach involves use of clinical information systems, expert systems technology, and information theory. This automated system will allow continuous review of the information generated as a part of a radiology report.

The medical literature contains numerous examples of audits of the ability of radiologists to detect and report abnormalities. Most of these audits take the form of investigations of typical levels of accuracy and ways to increase them (1-6). These audits are typically performed with use of multiple readings of images. Redundant reading can take several forms. As part of the training of radiology residents, each report and associated image are reviewed by a senior radiologist who corrects any errors (7). This review provides the feedback necessary to improve the resident's future interpretations. The expense of this type of audit is considered to be part of the cost of training new radiologists.

(1) From the Departments of Radiology (I.T., J.W.M., P.R.F.), Medical Informatics (P.J.H., S.K.H.), and Pulmonary Medicine (C.G.E.), LDS Hospital, 325 Eighth Ave, Salt Lake City, UT 84143, and Salt Lake City (D.V.C.). From the 1989 RSNA scientific assembly. Received October 4, 1990; revision requested November 21; revision received February 12, 1991; accepted March 5. Supported in part by grant R01 LM04932 from the National Library of Medicine. Address reprint requests to P.J.H. (2) Current address: Center for Medical Informatics, Columbia University, New York. (3) Current address: Price, Utah. © RSNA, 1991

When the radiologic interpretations of practicing radiologists are reviewed, one or more additional radiologists typically reexamine a set of images and attempt to confirm the impressions of the original reader (8). An alternative method is to encourage self-review by each radiologist by implementing a program of rereading films in a blinded manner and comparing the two interpretations. Each of these review techniques can be expensive because of the time required for the radiologist to reexamine images (9). However, multiple readings are necessary to provide a standard against which the individual interpretations can be compared.

Clinical computing has made a different standard possible. Among the data collected in the modern medical computing facility are a variety of outcome measures that can be linked to images, such as biopsy reports, discharge diagnoses, and posttherapy laboratory test values. A review on the basis of correlation of these data with the data in the reports can reduce or eliminate the need for rereading of films and can offer an outcome-based assessment of quality.

Herein, we describe a study of techniques that link diagnostic outcome data maintained in a medical information system to data obtained from descriptive reports of chest radiographs. Reports produced by five radiograph readers (three radiologists and two internists) were compared by using two different measures of quality. One measure, similar to typical forms of review based on multiple readings, used accuracy criteria (the true-positive results [TP], false-positive results [FP], and false-negative results [FN]) measured for a group of 100 chest radiographs. A set of abnormalities used as a standard of reference was defined in terms of agreement among a majority of the readers.

Abbreviations: FN = false-negative result, FP = false-positive result, ODIC = outcome-directed information content, TP = true-positive result

The second measure represented an effort to test the usefulness of information theory coupled with medical expert systems for evaluation of the quality of radiographic reports. This approach presupposed that the physician creating the report was presenting information that would help establish a final diagnosis. By examining the radiographic report in light of the patient's ultimate discharge diagnosis, we evaluated how well the reporting physician's goal was achieved. Our premise was that the reports of more effective chest radiograph readers would contain more information to support the ultimate discharge diagnosis.

The method suggested by this model involved calculation of a measure of report quality based on the information in each report, designed to measure consistency of the radiologist's report with the discharge diagnoses. These scores, generated for each reader, were combined, and their mean was seen as a measure of the average useful contribution of each reader to the diagnostic process.

To implement this second technique, we developed a set of mathematical tools to assess the diagnostic information inherent in each report and combined these in a set of data tools with an expert system for medical diagnosis to produce a computerized retrospective audit of the quality of reports of chest radiographs.

MATERIALS AND METHODS

Our study involved use of the HELP System (3M Health Information Systems, Salt Lake City, Utah) (described in detail elsewhere [10]), which combines an integrated clinical data base with an on-line medical expert system. This data base contained most of the clinical information generated for each patient in the hospital setting. These data were stored in a form that was immediately accessible not only to clinicians and nurses caring for the patient but also to the expert system, which functioned routinely to provide decision support to all health care personnel.

The HELP expert system was designed to provide computerized decision support by means of a set of tools for creating, editing, and implementing medical knowledge. This knowledge was written with use of a special-purpose decision syntax that was based on the evolving Arden medical knowledge syntax (11). The result is an easily read, modular representation of medical logic. Each module, called a medical logic module, can be used to make decisions concerning diagnosis, prognosis, or management.

[Figure 1. An abridged diagnostic medical logic module for pneumonia, written in the general purpose HELP decision language. References to two history findings (cough, fever) and a radiographic finding (infiltrate) are shown. A Bayesian inference mechanism is used in the medical logic module to generate a probability of pneumonia.]


We have used medical logic modules to study the diagnostic process by applying a Bayesian model to revise estimates of disease probabilities (12-14). For this diagnostic model, a group of medical logic modules were created to estimate the probabilities of diseases on the basis of patient data entered into the clinical data base. A subset of these medical logic modules formed the basis for the experimental approach to audit described herein. Figure 1 demonstrates a much abridged medical logic module that was designed to estimate the likelihood of pneumonia by using data from the report of the chest radiograph. The variable "pneumonic_infiltrate" and the associated statistics illustrate the information that was used to link abnormalities from the report of the chest radiograph to the diagnosis.

Included in the HELP data base were a number of indicators of patient outcome that were appropriate standards of reference to be used in a retrospective audit. Perhaps the most useful indicators were the discharge diagnoses, which included both the primary discharge diagnosis and any secondary and complicating diagnoses. The discharge diagnoses, entered by employees in the medical records department by means of international classification of disease codes (15) and reviewed for accuracy by the attending physicians, were the principal source of outcome information used in the prototype of this audit system.
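As an illustration of the Bayesian mechanism sketched in Figure 1, the following minimal Python example shows how a single reported finding can revise a disease probability. It is our sketch, not the HELP implementation; the function name is hypothetical, while the prior (0.067) and the "true rate" and "false rate" for a pneumonic infiltrate (0.64 and 0.05) are taken from the abridged module in Figure 1.

```python
def bayes_update(prior: float, p_finding_given_disease: float,
                 p_finding_given_no_disease: float) -> float:
    """Revise a disease probability after a finding is reported.

    One step of Bayes' rule using a medical logic module's 'true rate'
    P(finding | disease) and 'false rate' P(finding | no disease).
    """
    joint_present = prior * p_finding_given_disease
    joint_absent = (1.0 - prior) * p_finding_given_no_disease
    return joint_present / (joint_present + joint_absent)

# Figures taken from the abridged pneumonia module (Fig 1):
# prior P(bacterial pneumonia) = 0.067;
# pneumonic infiltrate: true rate = 0.64, false rate = 0.05.
prob = 0.067
prob = bayes_update(prob, 0.64, 0.05)   # infiltrate reported
print(f"P(pneumonia | infiltrate) = {prob:.3f}")  # approximately 0.479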

Information-based Audit

The formulation of "information" used in this project was based on the work of Shannon and Weaver, who described the conceptual underpinnings of information theory in 1949 (16). These mathematical tools have found widespread application in a variety of scientific and engineering fields but have been used infrequently in medicine.

The underlying principle is that in a medical setting there is frequently uncertainty associated with the presence or absence of a disease or set of diseases. This uncertainty can be expressed in terms of the probabilities of those diseases. Information is associated with findings or groups of findings that reduce the uncertainty of these diseases by helping either to rule them in or to rule them out. For example, the uncertainty associated with a group of diseases can be expressed mathematically as

H(D) = -P(D_1)\log P(D_1) - P(D_2)\log P(D_2) - \cdots - P(D_n)\log P(D_n) = -\sum_{i=1}^{n} P(D_i)\log P(D_i),   (1)

where H represents uncertainty, D is the set of diseases D_i, and P(D_i) is the probability of the ith disease. When clinical data such as a radiologic report are added to the patient record, the probabilities change to P(D_i|F), which is the probability of disease i when the findings F from the radiologic report are known. In this equation, F represents the set of findings F_1, F_2, F_3, \ldots, F_m that are recorded in the report. The uncertainty changes to

H(D|F) = -P(D_1|F)\log P(D_1|F) - P(D_2|F)\log P(D_2|F) - \cdots - P(D_n|F)\log P(D_n|F) = -\sum_{i=1}^{n} P(D_i|F)\log P(D_i|F).   (2)

The information provided by F is the change in uncertainty: I(F,D) = H(D) - H(D|F). As this notation suggests, it is dependent both on the findings noted in the chest radiograph and on the group of diseases whose likelihood is affected by these findings. In this equation, the contribution from each disease is

I(F,D_i) = H(D_i) - H(D_i|F) = P(D_i|F)\log P(D_i|F) - P(D_i)\log P(D_i).   (3)

This is the portion of the total information supplied by the findings F that is associated with disease i. Thus,

I(F,D) = I(F,D_1) + I(F,D_2) + I(F,D_3) + \cdots + I(F,D_n) = [H(D_1) - H(D_1|F)] + [H(D_2) - H(D_2|F)] + [H(D_3) - H(D_3|F)] + \cdots + [H(D_n) - H(D_n|F)].   (4)

Knowledge of disease probabilities, both before and after acquisition of the report of the chest radiograph, is needed to perform these calculations. To generate these probabilities, we used a group of Bayesian medical logic modules. These modules were designed to estimate the probabilities of 29 diagnostic conditions.(4) The modules use the reported radiographic findings to produce an estimate of the likelihood of a certain disease when those findings are present. The probabilities are used in a modified version of the information-content equations to enable calculation of a value that represents the information contained in the report that specifically supports the discharge diagnoses. Since our analysis was done retrospectively, we used known outcomes recorded as discharge diagnoses to help interpret the information content. We began by evaluating the disease probabilities before and after the radiologic findings were added to the patient record. Then Equation (3) was used to estimate the information associated with each of the 29 conditions.

(4) The following is a list of conditions for which medical logic modules were created: acute bronchitis, asbestosis, aspiration pneumonia, asthma, bacterial pneumonia, bronchiectasis, chronic bronchitis, coal worker's pneumoconiosis, coccidioidomycosis, congestive heart failure, diffuse pneumonitis, drug-related pneumonitis, emphysema, Goodpasture syndrome, histiocytosis, idiopathic pulmonary fibrosis, influenza, lung abscess, metastatic neoplasm, "no pulmonary disease," non-Hodgkin lymphoma, primary pulmonary neoplasm, primary pulmonary hypertension, pulmonary embolus, sarcoidosis, silicosis, spontaneous pneumothorax, tuberculosis, and Wegener granulomatosis. "No pulmonary disease" is included in this list to help assess the effect of the report of the chest radiograph in patients with no disease involving the lungs.

At this point, we used the record of the final discharge diagnoses to interpret the values for the individual diseases. To do this, we developed a measure of information content in which the disease-specific components (the I[F,D_i] terms) were weighted according to the support given their respective diseases. We called this the outcome-directed information content (ODIC).

The principal change in the measure of information from Equation (4) was inclusion of a weighting factor that reflected whether the information contributed by the report of the chest radiograph was consistent with the patient's real illnesses. The result was considered to be consistent if the findings described increased the probability of those diseases that the patient had and decreased the probability of diseases that the patient did not have. Specifically, for diseases included in the discharge diagnoses, we anticipated that the report of the chest radiograph would raise the probabilities. If this was true, we added the absolute value of their I(F,D_i) terms to the ODIC. If this was false, we subtracted the absolute value of I(F,D_i). This portion of the ODIC reflected the ability of the report of the chest radiograph to support recognition of patient diseases. Conversely, for the diseases that were absent from the discharge diagnoses, we expected a decrease in probability. We subtracted the absolute value of I(F,D_i) if the report increased the likelihood of the disease and added it if the report decreased the likelihood. The result was a value that represented the degree to which the information in the report of the chest radiograph enabled a disease a patient did not have to be ruled out. The complete equation used was

ODIC = K_{d1}|I(F,D_1)| + K_{d2}|I(F,D_2)| + K_{d3}|I(F,D_3)| + \cdots + K_{dn}|I(F,D_n)|,   (5)

where K_{di} was 1 if (a) disease i was present and its probability was increased by the report of the findings of the chest radiograph or (b) disease i was absent and its probability was decreased by the report findings. K_{di} was -1 if (a) disease i was present and its probability was decreased by the report of the findings or (b) disease i was absent and its probability was increased by the report of the findings.

It should be emphasized that this equation differs from Equation (4) primarily in that the contribution of the report of the chest radiograph to each of the diseases was weighted according to whether the information increased or decreased the disease probability appropriately in light of the patient's final diagnoses. This algorithm produced one value for each report, and the mean of these values was used as a measure of the average information contributed by the reader in the report of each of those chest radiographs.
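A sketch of Equation (5), again illustrative rather than the original implementation, builds on the hypothetical disease_information helper above; the K_di weighting is realized by comparing the direction of the probability change with the discharge diagnoses.

```python
def odic(priors, posteriors, present):
    """Outcome-directed information content (Eq 5).

    priors/posteriors: disease probabilities before and after the
    reported findings are applied; present: booleans derived from the
    discharge diagnoses. K_di is +1 when the probability moved in the
    direction consistent with the outcome (up for diseases the patient
    had, down for diseases the patient did not have), otherwise -1.
    """
    score = 0.0
    for p0, p1, has_disease in zip(priors, posteriors, present):
        weight = abs(disease_information(p0, p1))
        consistent = (p1 > p0) if has_disease else (p1 < p0)
        score += weight if consistent else -weight
    return score
```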

Analysis

To assess the potential value of the ODIC, we designed an experiment to compare a typical multireader technique with the information-based approach for assessment of the ability of radiograph readers. We began by selecting a group of 100 standard, two-view chest radiographs obtained for approximately 700 patients. All of the radiographs had been originally collected during studies of the expert systems tools described herein. To be included, a radiograph had to have been obtained within the first 48 hours of a patient's admission and a discharge diagnosis had to have been included in the patient's records. Discharge diagnoses were obtained from the discharge summary that was recorded in the computerized clinical record by employees in the medical records department. The radiographs of an approximately equal mix of patients with and without pulmonary disease diagnoses were included. A patient was defined as having a pulmonary disease if his discharge diagnosis was included in the list of 29 diseases listed in footnote 4.

A group of five physicians volunteered to interpret the chest radiographs, including three radiologists (one of whom was not an author of this article) and two internists who had undergone subspecialty training in pulmonary medicine. Two of the radiologists were trained as generalists and one had undergone subspecialty training in chest radiology. This mixture of readers was chosen in an attempt to assure measurable variability in reading behavior. The five readers were each given the 100 chest radiographs, in groups of 10. Blinded to all other clinical data, the readers were asked to interpret each of the radiographs and to indicate any abnormalities on a form that contained a list of 60 radiographic findings and associated information. The data collected were subsequently entered into the clinical data base by using a computerized branching questionnaire. Because of errors in data collection, 11 of 500 readings could not be used.

An additional data set was collected by using the same radiographs. This set of 100 readings was designed to represent a "maximally useful" data set by ensuring that all pertinent abnormalities visible on each radiograph, no matter how subtle, were reported. "Pertinent" was operationally defined as anything relevant to the patient's diagnostic workup.


In those cases in which there was complete agreement among the five readers, we accepted the original interpretation and entered their findings into this data set. For all cases in which they disagreed, the radiographs were submitted to a panel of three physicians (P.J.H., I.T., P.R.F.) to produce an additional new report. The goal of this panel was to generate a list of the findings on each radiograph that was consistent with information available in the patient record. The panel, who worked together to formulate these "authoritative" reports, included an internist, a general radiologist, and the radiologist who specialized in chest disease. Since the individual panel members had participated in the study readings, an interval of several months was allowed to pass before the review was undertaken. The medical records of the patients were examined; the internist abstracted from the records of each patient any information bearing on the patient's pulmonary status during the hospitalization, and this summary of data was presented at the time of the review. The chest radiographs, as well as any related images, were then examined by the three physicians, and all pertinent findings were recorded.

The five original readers disagreed about findings in 89 of the 100 chest radiographs. The panel read those radiographs and added their reports to the data base. The five readers agreed completely about findings in the remaining 11 radiographs; it was thought that the panel could add little to these examinations. This additional reading, by a panel with access to data that was not available to the five original readers, was not intended to represent an independent determination of the findings on the chest radiographs. Instead, it was designed to be an additional test of the discriminating ability of the ODIC audit.

(5) The following is a summary of abnormalities found on chest radiographs that were used in the comparative audit of the performance of the five original radiograph readers: cardiac abnormalities, hilar abnormalities, pulmonary infiltrates, hypoaeration and/or atelectasis, aortic abnormalities, bone changes, abnormalities of the pleura, pleural effusions, hyperinflation, pulmonary vascular abnormalities, other pulmonary parenchymal abnormalities, mediastinal abnormalities, pulmonary parenchymal masses, soft-tissue abnormalities, prosthesis, and pneumothorax.


involved

use

racy

readings

based

on

multiple

by a

Audit

In the development dure for reports based

of an audit on multiple

ings,

the

step

was

Rcader

2

Reader

illustrates

Rcader4

3

mean

Comparison-based Audit

In the development of an audit procedure for reports based on multiple readings, the first step was the definition of a standard of reference for the findings that were present on the chest radiographs used in this study. We chose to accept any finding about which three or more of the five observers agreed. This yielded from zero to five standard findings per radiograph.

To compare the reported findings, we first had to accommodate variability in reporting styles. To do this, we compared the findings only after they had been sorted into categories. Thus, if one physician described cardiac enlargement and another described enlargement of the cardia-pericardial silhouette, these were considered to be equivalent and were grouped under "cardia-pericardial enlargement." The effect of this analysis was to compare the radiologists on their ability to recognize findings in 16 categories.(5)

Once a basis for comparison was identified, a standard set of findings could be defined for each examination.
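The majority-agreement standard of reference described above can be sketched as follows; the three-of-five threshold mirrors the text, but the code itself is our illustration.

```python
from collections import Counter

def standard_findings(reader_reports, min_agreement=3):
    """Build the standard of reference for one radiograph.

    reader_reports: a list of sets of finding categories, one set per
    reader. A category enters the standard when at least min_agreement
    of the five readers reported it.
    """
    votes = Counter(f for report in reader_reports for f in report)
    return {f for f, n in votes.items() if n >= min_agreement}
```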

By using this standard, a group of statistics could be generated for each reader and each chest examination. These statistics consisted of a count of the TPs, FPs, and FNs. TPs were the number of standard findings recognized by the individual reader. FNs were the number of standard findings missed by the reader. FPs were the number of findings recorded by the reader that were absent from the standard set.

These results were analyzed by using analysis of variance and regression techniques. Two-way analysis of variance (reader and patient) was used to examine differences in the TPs, FPs, FNs, and ODICs of the five radiograph readers. In addition, the ODICs of the five readers were compared with the ODICs of the three-member panel. Finally, we examined, in part, the relation between the ODIC and the standard measures by testing the correlation between the ODIC and the simple accuracy statistics.
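Given the standard set, the per-reader accuracy statistics described above reduce to simple set operations. Again a sketch, with hypothetical category names:

```python
def accuracy_counts(report, standard):
    """TP/FP/FN counts for one reader's report of one radiograph,
    judged against the majority standard."""
    tp = len(report & standard)   # standard findings recognized
    fn = len(standard - report)   # standard findings missed
    fp = len(report - standard)   # reported findings not in the standard
    return tp, fp, fn

standard = {"cardia-pericardial enlargement", "pleural effusion"}
report = {"cardia-pericardial enlargement", "pneumothorax"}
print(accuracy_counts(report, standard))  # (1, 1, 1)
```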


RESULTS

The methods described herein yielded two types of data: (a) standard measures of the quality of radiographic reporting in the form of TP, FP, and FN rates for each reader, and (b) measures of the information generated during the reporting process, the ODICs. We examined the effectiveness of these two measurement techniques.

Comparative Audit

The Table shows the means of the TPs, FPs, and FNs for the five radiograph readers. These values vary greatly. The differences for the TPs and FNs were found to be significant after two-way analysis of variance (reader by patient; P < .05 in both cases). FPs showed a trend toward discrimination (P < .10).

Information Content Audit

A group of data points were obtained after data collection and analysis for each reader and each radiograph. These data points included TPs, FPs, FNs, and the ODIC. In addition, an ODIC was calculated for each patient by using the reports generated by the three-physician panel. If the ODIC, in fact, measured the information contribution, then this value was expected to represent a maximum value for each radiograph.

Figure 2 graphically demonstrates the mean ODIC values for the 100 reports for each of the five readers. In addition, the mean for the review group is shown on the left. The values range from a low of 0.62 to a high of 0.88 for the readers. The reports generated by the review panel had an ODIC of 0.92.

[Figure 2. Bar graph illustrates the mean ODIC values for the five radiologists (shaded bars) and the review panel (solid bar).]

Two-way analysis of variance (radiograph reader by patient) was applied to the ODIC to determine if it discriminated among the readers. The results were highly significant (F = 4.445, 4 degrees of freedom; P < .002). Thus, the ODIC provided a strong discriminator among these radiograph readers, at least equal in power to the FP, TP, and FN rates.

However, this may not be the correct analysis to apply to this data set. Our goal was to use this tool to discriminate among readers interpreting radiographs from different patients (ie, different examinations) after one reading. Otherwise, multiple readings of each radiograph would still be required. Since we knew the discharge diagnoses of each patient, we categorized the patients according to whether or not they had pulmonary diseases. Therefore, the result that we were more interested in was the two-way analysis of variance that compared the scores of different readers, categorized according to whether one or more pulmonary diseases was present. This analysis was performed and showed a significant difference (F = 2.818, 4 degrees of freedom; P < .03). Two-way analysis of variance was also performed, including the findings of the review panel (reader by presence or absence of pulmonary disease). The difference in the means (Fig 2) was significant (F = 3.325, 5 degrees of freedom; P < .01). To avoid attributing the significance of the differences in the original five readers to this additional group of reports, a post hoc analysis was performed, which demonstrated that the reports generated by the panel were independently distinguishable from that of reader 5 (Student-Newman-Keuls test; P < .05).
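A modern equivalent of the two-way analysis of variance described here (not the original analysis software) could be run with statsmodels; the data frame layout, with one ODIC value per reader-patient pair, is an assumption of this sketch.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

def odic_anova(df: pd.DataFrame):
    """Two-way ANOVA of the ODIC by reader and patient (no interaction
    term, since each reader interprets each radiograph once).

    df is assumed to hold one row per reading, with columns
    'reader', 'patient', and 'odic'.
    """
    model = smf.ols("odic ~ C(reader) + C(patient)", data=df).fit()
    return sm.stats.anova_lm(model, typ=2)
```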

Correlations

Since both the standard measures and the ODIC discriminated independently among the five readers, we tested for correlations among them by using simple linear regression techniques. Both the number of TPs and the number of FPs correlated with the ODIC (r = .388, P = .0001 and r = .470, P = .0001, respectively). Values of FNs failed to demonstrate a significant correlation (r = .031, P = .498).

Subgroup analysis was also performed. We were able to divide the ODIC into the portion contributed by the diseases in the patient's list of discharge diagnoses and the portion associated with diseases the patient did not have. The portion of the ODIC derived from the diseases that were present in each patient correlated with the TPs (r = .281, P < .0001). The portion of the ODIC that was associated with diseases that were absent from the patients' discharge summaries was correlated with both TPs and FPs (r = .427, P < .001 and r = -.437, P < .001, respectively).
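The reported correlations are plain Pearson coefficients; the corresponding test is a one-liner today. A sketch with hypothetical data:

```python
from scipy.stats import pearsonr

# Hypothetical parallel sequences, one value per reading.
tp_counts = [3, 1, 2, 0, 4, 2]
odic_scores = [0.91, 0.55, 0.78, 0.30, 0.91, 0.66]

r, p_value = pearsonr(tp_counts, odic_scores)
print(f"r = {r:.3f}, P = {p_value:.3f}")
```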


DISCUSSION

In our analysis of the data, we addressed three questions: (a) Do either the standard measures or the ODIC distinguish among the reports generated by the different readers? (b) Do the values indicate that the ODICs of the panel's reports are higher than the ODICs of the reports of the individual readers (as would be expected)? (c) Do the results from the two techniques correlate?

The standard measures did enable discrimination among the different readers; both the TP and the FN rates demonstrated significant variation. This result allows some confidence that the sample size and the differences in level of expertise among the readers were large enough to allow a valid test of the ODIC. Note, however, that examination of these results tells us little about the overall clinical significance of the different rates. Some abnormalities (atelectasis, calcified granulomas) may contribute less to the diagnostic process than do others (infiltrates, masses), but the standard accuracy statistics rate all abnormalities equally.

Some of the differences in reader performance were substantial. For example, the mean TP rate for reader 5 was 18% less than that of reader 3 and 12% less than that of reader 1. Reader 5 also had a 42% higher FP rate than reader 2 and an FN rate more than twice that of reader 3. Reader 3 achieved the highest TP rate, but this was accompanied by a fairly high FP rate. In fact, reader 3 reported more findings than any other reader, which suggests an overall tendency to overread the radiographs. Again, the clinical value of those reports may not be clearly defined by these contradictory measures of accuracy.

The ODIC allowed discrimination among the readers that was at least as robust as that of the traditional measures. The meaning of differences in the ODIC may be somewhat easier to interpret than that of the standard measures for many reasons. The ODIC was designed as a single measure whose value is in accordance with the importance of the findings to the diagnostic process, and it does not include contradictory evidence from different statistics. The ODIC attempts to include in one score the countervailing effects on the diagnostic process of TPs, FPs, and FNs. Recognition of findings that support the presence of those diseases that are present increases the ODIC. Information that reduces the likelihood of diseases that the patient does not have also adds to the ODIC. Erroneously reported information that increases the probability of diseases that prove to be absent reduces the ODIC accordingly. The result is a value that appropriately reflects the presence or absence of diagnostic support for both diseases that are present and those that are absent. If we assume that the readers were all attempting to find abnormalities that would aid in the diagnosis of disease states in these patients, then the differences in the ODIC can reasonably be interpreted as differences in the ability of the individual readers to accomplish this goal.

Again, the ODICs generated for the reports of the review panel were higher than the ODICs for any individual reader. The panel's reports were designed, in part, to offer maximum diagnostic usefulness; therefore, this result is not surprising. The ability of the ODIC to recognize the reports that were the richest in diagnostic information suggests that the ODIC does in fact reflect the diagnostic usefulness of the panel's reports.

Finally, the correlation between the ODIC and the standard measures is low; the number of TPs and the number of FPs correlated poorly with the ODIC. The explanation may be in the fact that the abnormalities in footnote 5 do not each have the same clinical importance. The standard measures count each finding as a single element that adds the value "1" to the TP, FN, or FP. Atelectasis has the same value as a hilar mass. The ODIC compensates for clinical importance and therefore may not correlate well with the standard measures.

A number of techniques have been used to evaluate the efficiency of radiologists as they attempt to recognize and record radiographic abnormalities. The simple quantification of rates of accuracy and inaccuracy in terms of TPs, FPs, and FNs is frequently used (17). Receiver operating characteristic curves are also popular, and their use allows a more careful analysis of the trade-off between TP and FP observations (18). However, both of these techniques have two significant disadvantages.

The first disadvantage is the need for additional readings of a subset of radiographs to determine the findings that are reproducibly associated with each examination. This additional effort is necessary whether a radiograph is reviewed by multiple readers, by two readers, or by each individual radiologist in a program that involves a blinded rereading of examinations for which he or she created the original report.

The second disadvantage of review on the basis of simple accuracy statistics is the difficulty of translating the results of these approaches into clinically meaningful terms. If a radiologist fails to detect an infiltrate, that failure does not, in general, have the same implications as failure to detect a mass or a patch of atelectasis. A more meaningful approach for review is to forge a link between the findings that are recognized or overlooked and the diagnostic process of which the examination is a part.

Our goal in this study was to demonstrate that the ODIC allowed discrimination among the readers of chest radiographs in a way that was similar to the discrimination produced by a representative type of audit on the basis of multiple readings. Several aspects of our findings are encouraging. First, the ODIC did enable discrimination among the readers at least as well as did simple accuracy statistics. As a single measure, the ODIC negated the need for three different measures of accuracy (TPs, FPs, FNs); the use of multiple measures can make a direct comparison among readers (or an assessment of the skill of an individual reader) difficult. In addition, the reports generated by the review panel demonstrated the largest mean ODIC. Since these reports were specifically designed to maximize the information contributed, the agreement of the ODIC with our expectations lends support to the usefulness of this tool. These observations encourage us to believe that the ODIC can provide an effective measure of the quality of radiographic reporting.

Several problems are associated with the routine use of the ODIC. First, it is highly dependent on the accuracy of the discharge summary. Fortunately, the use of diagnosis-related groups has resulted in increasing care in coding the discharge diagnoses. Second, there is a danger that the content of the radiographic report will so affect the patient's workup that, right or wrong, the radiographic report will define the discharge diagnosis. In that case, the approach described here will not work in the routine clinical setting. We believe that, at least in the hospital setting, enough additional information is collected to correct any initial misdiagnosis and that the final diagnosis tends to be correct. Another remedy for the danger of relying too heavily on the discharge diagnosis is to accept a diagnosis only if it is supported by other data in the computerized medical record. For instance, with minor modifications, the expert system could check patients discharged with the diagnosis of pneumonia for records of positive sputum cultures or other supporting clinical data. The system would then generate ODICs only for "confirmed" cases of pneumonia. Third, the numerical value of the ODIC may not be intuitively meaningful. What should a reader of radiographs infer if his ODIC is significantly lower than that of his peers? Fortunately, there is information in the expert system that can be used to explain the ODIC. Specific links in the medical logic modules associate each relevant finding with the diseases that can cause it. We intend to make use of these links to develop an explanatory capability to go with the ODIC (19). Fourth, the accuracy of the ODIC can be expected to reflect the accuracy of the medical logic modules as they generate the required probabilities. Fortunately, data from the clinical information systems can be used to enhance the accuracy of these modules. The conditional probabilities upon which their accuracy is based can be derived from data stored in clinical data bases. Our experience with this process indicates that deriving the contents of the medical logic modules from the data base can significantly improve the accuracy of the modules (14). Finally, this technique is dependent on the availability of the reported findings in a coded form. This is the substrate from which the required probabilities are calculated. Currently, only a small number of radiology departments are in a position to capture the clinical data recorded in radiographic reports in a coded form. However, techniques for accomplishing this goal continue to be a subject of much research. Efforts range from terminal-based approaches (20,21), through systems that use bar-code entry (22), to attempts to extract coded information directly from free text (23). The advantages of successfully recording radiographic data in a form that computers can manipulate include use for patient care (24) and other forms of clinical audit (25), as well as increased accuracy of billing for radiologic procedures.

We are currently developing a system that will automatically evaluate the report of each patient's first chest radiograph at the time of discharge from the hospital. From this effort, we hope to determine the possibility of using the ODIC in a continuous, automated audit of the quality of radiographic reports.

References
1. Herman PG, Gerson DE, Hessel SJ, et al. Disagreements in chest roentgen interpretation. Chest 1975; 68:278-282.
2. Doubilet P, Herman PG. Interpretation of radiographs: effects of a clinical history. AJR 1981; 137:1055-1058.
3. Schreiber MH. The clinical history as a factor in roentgenogram interpretation. JAMA 1963; 185:137-139.
4. Potchen EJ, Gard JW, Lazar P, Lahaie P, Andary M. The effect of clinical history on chest film interpretation: direction or distraction (abstr). Invest Radiol 1979; 14:404.
5. Yerushalmy J. Reliability of chest radiology in the diagnosis of pulmonary lesions. Am J Surg 1955; 89:231-240.
6. Koran LM. The reliability of clinical methods, data, and judgments (second of two parts). N Engl J Med 1975; 293:695-701.
7. Rhea JT, Potsaid MS, DeLuca SA. Errors of interpretation as elicited by a quality audit of an emergency radiology facility. Radiology 1979; 132:277-280.
8. Baines CJ, McFarlane DV, Wall C. Audit procedures in the national breast screening study: mammography interpretation. Can Assoc Radiol J 1986; 37:256-260.
9. Hessel SJ, Herman PG, Swensson RG. Improving performance by multiple interpretations of chest radiographs: effectiveness and cost. Radiology 1978; 127:589-594.
10. Pryor TA, Gardner RM, Clayton PD, Warner HR. The HELP system. J Med Syst 1983; 7:87-102.
11. Clayton PD, Pryor TA, Wigertz OB, Hripcsak G. Issues and structures for sharing medical knowledge among decision-making systems: the 1989 Arden Homestead Retreat. In: Kingsland LC, ed. Thirteenth annual symposium on computer applications in medical care. Los Angeles: IEEE Computer Society Press, 1989; 116-121.
12. Haug PJ, Warner HR, Clayton PD, et al. A decision-driven system to collect the patient history. Comput Biomed Res 1987; 20:193-207.
13. Haug PJ, Rowe KG, Rich T, et al. A comparison of computer-administered histories. In: Hammond WE, ed. Proceedings AAMSI Congress. Washington, DC: American Association for Medical Systems and Informatics, 1988; 21-25.
14. Haug PJ, Clayton PD, Shelton P, et al. Revision of diagnostic logic using a clinical data base. Med Decis Making 1989; 9:84-90.
15. International classification of diseases, ninth revision. 2nd ed. U.S. Department of Health and Human Services publication no. (PHS) 80-1260. Washington, DC: Government Printing Office, 1980.
16. Shannon CE, Weaver W. The mathematical theory of communication. Urbana, Ill: University of Illinois Press, 1949.
17. Yerushalmy J. The statistical assessment of the variability in observer perception and description of roentgenographic pulmonary shadows. Radiol Clin North Am 1969; 7:381-392.
18. Metz CE. Basic principles of ROC analysis. Semin Nucl Med 1978; 8:283-298.
19. Haug PJ, Frederick PR. A proposal for computerized medical audit in radiology. In: Salamon R, Protti D, Jochen M, Grover B, eds. Medical Informatics and Education International Symposium. Victoria, BC, Canada: University of Victoria, 1989; 613-617.
20. Leeming BW, Simon M, Jackson JD, Horowitz GL, Bleich HL. Advances in radiology reporting with computerized language information processing (CLIP). Radiology 1979; 133:349-353.
21. Haug PJ, Tocino IM, Clayton PD, Bair TL. Automated management of screening and diagnostic mammography. Radiology 1987; 164.
22. Adams HG, Campbell AF. Automated radiographic report generation using bar-code technology. AJR 1985; 145:177-180.
23. Haug PJ, Ranum DL, Frederick PR. Computerized extraction of coded findings from free-text radiology reports. Radiology 1990; 174:543-548.
24. Evans RS, Larsen RA, Burke JP, et al. Computer surveillance of hospital-acquired infections and antibiotic use. JAMA 1986; 256:1007-1011.
25. Sickles EA, Ominsky SH, Sollitto RA, Galvin HB, Monticciolo DL. Medical audit of a rapid-throughput mammography screening practice: methodology and results of 27,114 examinations. Radiology 1990; 175:323-327.

Dis-

July

1991

Chest radiography: a tool for the audit of report quality.

In a radiology department, clinical audit implies multiple readings of selected images to identify those findings that should be recognized and to doc...
1MB Sizes 0 Downloads 0 Views