Diagnostic Radiology
Visual Detection and Localization of Radiographic Images¹
Stuart J. Starr, B.S., Charles E. Metz, Ph.D., Lee B. Lusted, M.D., and David J. Goodenough, Ph.D.²
In conventional receiver-operating-characteristic (ROC) curve analysis of visual detection performance, the observer is credited with a true-positive response if a visual signal is present somewhere in a radiograph called "positive" by the observer; however, the measured true-positive rate for a given false-positive rate can differ if the observer is required to identify the correct location of the visual signal in order to receive credit for a true-positive response. The authors describe, and have confirmed experimentally, a model which can be used to predict observer performance in an experiment requiring both detection and localization on the basis of the conventional ROC curve determined in a detection experiment. Implications for the use of signal detection theory in the assessment of radiographic image quality are discussed.
INDEX TERMS: Diagnostic Radiology, observer performance. Radiographs, image analysis. Radiographs, interpretation. Radiology and Radiologists
Radiology 116:533-538, September 1975
EVALUATION OF radiological imaging systems must take into account the perceptual capabilities of human observers. In several studies reported recently, signal detection theory has been applied to measurement of observer performance in radiography (1, 3-5, 12); this was done by having the subject search for an image of a simple physical object in a series of sample radiographs containing background mottle (14), any one of which might or might not have contained an image. Observer performance was assessed by generating a receiver-operating-characteristic (ROC) curve (6, 15), which is a plot of the conditional probability of true-positive responses vs. the conditional probability of false-positive responses and permits the attribution of a measure of detectability to the given visual task (3, 9, 12, 15). A common aspect of these studies is that the image could appear at any location on the sample radiographs; however, the observers were required to specify only their degree of confidence that the image was present somewhere on the radiograph. Consequently, no provision was made for the possibility that some true-positive responses may actually have been the result of the observer failing to detect the image while at the same time falsely identifying an area of radiographic mottle as the image. Clearly, in order to apply signal detection theory properly to the assessment of observer performance in clinical radiology, where the location of a lesion is ordinarily unknown, it is necessary to understand the effect that requiring the observer to locate the image has upon ROC curve analysis.
We wished to formulate a prediction of human observer performance for the task of simultaneously detecting and properly localizing a lesion to within a subregion of a radiograph, given the situation that both the location and the presence of the lesion were uncertain. This prediction of an ROC-type curve for the task of detection and localization (LROC) was expressed in terms of the conventional detection-type ROC curve. In order to test our theoretical approach, we measured ROC and quadrant LROC curves generated by human observers viewing low-contrast radiographic images of circular Plexiglas disks.
THEORY
In this study, the coordinates of the LROC curve, namely the conditional "true-positive, correct location" and "false-positive" probabilities for the various decision variable thresholds $x_c$ and a given visual field size $A$, $P_{DL}(S, CL\mid s; x_c, A)$ and $P_{DL}(S\mid n; x_c, A)$, respectively, are expressed in terms of the corresponding coordinates of the conventional detection-type ROC curve for the same thresholds and field size, $P_D(S\mid s; x_c, A)$ and $P_D(S\mid n; x_c, A)$. The subscripts DL and D indicate probabilities associated with "Detection and Localization" or "Detection only" experiments, respectively; S denotes a "Signal present" (i.e., positive) response; CL denotes "Correct Localization" of the signal; the symbol | is to be read "given that"; and s and n indicate that the image actually contained "a signal plus noise" (i.e., was in fact positive) or "noise only" (negative), respectively. The decision variable threshold $x_c$ can be thought of as the minimum subjective "degree of positiveness" apparent in a subregion of the image which the observer requires if he is to call that subregion "positive," and it is this threshold that the observer varies in order to operate at different points on the ROC (or LROC) curve.
¹ From the Center for Radiologic Image Research and the Department of Radiology, University of Chicago (S. J. S., C. E. M., L. B. L., D. J. G.), and the Franklin McLean Memorial Research Institute (operated by the University of Chicago for the U.S. Energy Research and Development Administration) (C. E. M.), Chicago, Ill. Accepted for publication in April 1975. This work was supported in part by grant GM18940 from the USPHS.
² Johns Hopkins Medical Institutions, Baltimore, Md.
STUART J. STARR AND OTHERS
We therefore wished to determine the relationship between each point on the ROC curve (obtained by adopting a particular value of $x_c$) and the corresponding point on the LROC curve (obtained by using the same value of $x_c$). In the appendix it is shown that the relationships between the two kinds of coordinates are expressed by the equations
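$$P_{DL}(S\mid n; x_c, A) = P_D(S\mid n; x_c, A) \qquad (1)$$
and
$$P_{DL}(S, CL\mid s; x_c, A) = P_D(S\mid s; x_c, A) - \frac{M-1}{M}\int_0^{P_D(S\mid n; x_c, A)} \frac{1 - P_D(S\mid s; t, A)}{1 - P_D(S\mid n; t, A)}\, dP_D(S\mid n; t, A) \qquad (2)$$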
The conditional probabilities on the left side of each equation, which are the coordinates of the LROC curve, apply to the situation in which the observer must state whether or not he believes an image is present on a given radiograph and, if so, must also specify in which one of M adjacent, congruent subregions spanning the visual field he believes it is located. Equation 1 indicates that for any choice of decision variable threshold $x_c$, the conditional false-positive probabilities in the two experiments should be equal. Equation 2 indicates that for any conditional false-positive probability, one can predict the corresponding vertical coordinate of the LROC curve, $P_{DL}(S, CL\mid s; x_c, A)$, from the relationship between the vertical and horizontal coordinates of the ROC curve, $P_D(S\mid s; t, A)$ and $P_D(S\mid n; t, A)$, for all conditional false-positive probabilities less than or equal to that in question (t is a dummy variable of integration ranging over decision variable thresholds). By repeating this process for each false-positive probability between 0 and 1, one can predict the full LROC curve corresponding to a given ROC curve. Thus if equations 1 and 2 are valid, one should be able to measure an ROC curve in a conventional detection experiment and predict the LROC curve which would result from a detection-plus-localization experiment performed under similar conditions. The theoretical model may be interpreted operationally as follows. One can see from equations 1 and 2 that the second term on the right in equation 2 describes the decrease in the conditional probability of "completely correct" positive responses when the requirement of correct localization is introduced. This decrease arises because some positive responses scored as correct in the detection-only experiment are the result of the observer failing to identify the signal and instead identifying a cluster of noise as the signal in images which are actually positive.
When localization is required, some of these previously "correct" responses will be classified as "true-positive but incorrect location" responses, with the fraction of such situations increasing as progressively more precise localization is required (i.e., as the number of subregions M increases). Since the term (M - 1)/M rapidly approaches 1 as M becomes large, one can predict from equation 2 that the true-positive, correct-location probability approaches a limiting value which will differ from zero if some signals are actually visible to the observer. Equation 2 cannot be used in the limit as M approaches infinity for two reasons: (a) formally, the use of progressively smaller subregions eventually violates the assumption that the subregion area is much larger than the signal area, an assumption used in deriving the equation, and (b) practically, human observers cannot be expected to locate signals with microscopic accuracy even under ideal conditions. Nevertheless, this predicted behavior is qualitatively consistent with the intuitive notion that detection-localization performance should stabilize once the required localization accuracy has become strict, since the joint event of missing the signal and identifying a noise cluster as the signal within a small surrounding region should become negligible. Equation 2 implicitly specifies the conditional probability of "true-positive but incorrect location" responses, $P_{DL}(S, IL\mid s; x_c, A)$, given the conventional full-field ROC curve. Since the set of true-positive responses in the detection experiment is composed of the mutually exclusive subsets of "true-positive, correct location" and "true-positive, incorrect location" responses in the detection-localization experiment, we have
$$P_{DL}(S, CL\mid s; x_c, A) + P_{DL}(S, IL\mid s; x_c, A) = P_D(S\mid s; x_c, A) \qquad (3)$$
Hence $P_{DL}(S, IL\mid s; x_c, A)$ equals the second term on the right side of equation 2, taken without its minus sign. We designed a signal detection experiment to test equations 1 and 2 for the case of image localization to a specific quadrant of a radiograph, i.e., M = 4.
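Equations 1 to 3 can be applied numerically to any tabulated ROC curve. The sketch below is an illustration under stated assumptions, not the authors' numerical procedure (the text says only that "appropriate techniques of numerical integration" were used): it integrates equation 2 with the trapezoid rule, and the helper `binormal_roc`, which generates a hypothetical equal-variance binormal ROC curve, stands in for the freehand curves drawn in this study.

```python
from statistics import NormalDist


def predict_lroc(fpf, tpf, m=4):
    """Predict LROC coordinates from a detection-only ROC curve.

    fpf, tpf: matched lists of P_D(S|n; t, A) and P_D(S|s; t, A), ordered by
    increasing fpf, starting at (0, 0), with every fpf < 1.
    m: number of subregions M (4 for quadrant localization).
    Returns (fpf, p_cl, p_il) tuples: p_cl is P_DL(S, CL | s) from
    equation 2 and p_il = tpf - p_cl is P_DL(S, IL | s) from equation 3.
    """
    if len(fpf) != len(tpf) or fpf[0] != 0.0 or tpf[0] != 0.0:
        raise ValueError("ROC must start at (0, 0) with matched lengths")
    factor = (m - 1) / m
    integral = 0.0                      # running integral in equation 2
    out = [(0.0, 0.0, 0.0)]
    for k in range(1, len(fpf)):
        # Integrand (1 - P_D(S|s)) / (1 - P_D(S|n)) at both ends of the step.
        g0 = (1.0 - tpf[k - 1]) / (1.0 - fpf[k - 1])
        g1 = (1.0 - tpf[k]) / (1.0 - fpf[k])
        integral += 0.5 * (g0 + g1) * (fpf[k] - fpf[k - 1])   # trapezoid
        p_cl = tpf[k] - factor * integral
        # Equation 1: the false-positive coordinate carries over unchanged.
        out.append((fpf[k], p_cl, tpf[k] - p_cl))
    return out


def binormal_roc(d_prime, thresholds):
    """Hypothetical equal-variance binormal ROC, for illustration only."""
    nd = NormalDist()
    fpf = [0.0] + [1.0 - nd.cdf(x) for x in thresholds]
    tpf = [0.0] + [1.0 - nd.cdf(x - d_prime) for x in thresholds]
    return fpf, tpf


if __name__ == "__main__":
    xs = [6.0 - 0.01 * k for k in range(800)]   # thresholds, strictest first
    fpf, tpf = binormal_roc(1.5, xs)             # hypothetical d' = 1.5
    for m in (4, 16, 64):
        curve = predict_lroc(fpf, tpf, m)
        k = min(range(len(curve)), key=lambda i: abs(curve[i][0] - 0.2))
        u, p_cl, p_il = curve[k]
        print(f"M={m:2d}  FPF={u:.3f}  ROC TPF={tpf[k]:.3f}  "
              f"LROC TPF={p_cl:.3f}  incorrect-location={p_il:.3f}")
```

Two sanity checks follow directly from equation 2: for an observer performing at chance (ROC TPF equal to FPF) the predicted correct-location rate is FPF/M, i.e., random guessing among the subregions; and for any ROC curve lying above the chance line, the prediction stays between FPF/M and the ROC TPF. Increasing M in the script should show the correct-location probability at a fixed false-positive rate settling toward a limiting value, although the large-subregion assumption limits how large M may realistically be.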
MATERIALS AND METHODS
Radiographs: A diagnostic radiographic unit was used to obtain images of Plexiglas disks 0.89 mm thick and 13.2 mm in diameter. Since the visual field size was chosen to be a 7.6-cm square, we were able to radiograph nine fields on each 25.4 × 30.5-cm sheet of RP/R film, using a Par Speed screen in a vacuum cassette. The exposure factors were 80 kVp, 21 mA, 1/30 second, a nominal 1.2-mm focal spot, 3 mm Al plus 0.5 mm Cu added filtration at the tube, and a 152.4-cm focal-spot-to-film distance. The disks were in contact with the cassette without added scattering material, and the films were processed in an M6 X-Omat. The radiographs used in the study were free of obvious artifacts and had a background density in the range 0.70 ± 0.03, measured at the center of the film with a calibrated MacBeth densitometer. A total of 120 fields were made, evenly divided between those containing an image of the circular disk on a background of radiographic mottle and those containing mottle alone. The images were
evenly distributed with respect to quadrant and position within quadrant.
Equipment: The apparatus used to display the radiographs (Fig. 1) is mounted on a heavy slate-top platform. The beam from a 450-W xenon arc lamp is filtered through a water tank to remove the ultraviolet components and is then passed through a diffusing screen which produces uniform intensity over a suitable area. The beam illuminates a plastic screen of the sort used in a conventional radiograph viewbox, and the position of the arc lamp is adjusted so that the luminance of the screen is the same as that of a conventional viewbox in our laboratory, namely 2.7 log foot-lamberts as measured with a visual photometer (Salford Electrical Instruments, Inc.). A custom-built device holds a 25.4 × 30.5-cm radiograph in front of the plastic screen. A field mask limits the observer's view to one of the nine 7.6-cm square sample areas of the radiograph, and the holder can be moved so as to display each field individually. A shutter covers the opening of the mask while a new sample is positioned. Crosshairs placed horizontally and vertically across the aperture divide the visual field into quadrants. The subject sits on an adjustable chair, using a chin rest to maintain a constant position during viewing. A canopy covers the viewing area to shield the observer from stray light from the arc lamp housing, and a light-tight window shade makes the room otherwise completely dark.
Subjects: Data were collected from 5 observers: 4 physicists and 1 radiologist. The subjects wore their customary prescription eyeglasses.
Procedure: The subjects were instructed that the experiment was designed to test their ability to detect a low-contrast image amid a background of radiographic mottle and that there would be no more than one image in any given sample.
They were directed to use a five-category scale (1, 4, 7) to rate each sample according to their degree of confidence that the image was present, ranging from "almost definitely not present" to "almost definitely present," and to specify which quadrant most likely contained the image. The observers were initially given feedback on the correctness of their responses in order to familiarize them with the appearance of both the image and the mottle, but no feedback was given during the collection of data. The order in which the samples were presented was determined by lot for each session. The a priori probability that a sample contained an image was chosen to be 0.60. There was no time limit on viewing an individual sample. A session lasted approximately 75 to 90 minutes, including rest breaks. The number of samples which the observers completed per session varied from 70 to 100. At least two sessions were conducted with all but one subject.
Method of Analysis: The ROC data points were calculated from the rating-scale responses, using the methods described previously (1, 4, 7); the location responses were ignored for this calculation. The quadrant LROC data points were generated in a similar manner, with the additional requirement that a true-positive response also include specification of the image-containing quadrant.
[Figure 1: diagram of the display apparatus, showing the xenon arc lamp and housing, water tank, diffusing screen, plastic screen, radiograph holder, field mask, and research subject.]
Fig. 1. Diagrammatic representation of the apparatus used to display the sample radiographs to the research subjects.
RESULTS
Typical results are shown in Figure 2. Each graph is a plot of the customary ROC curve and the corresponding quadrant LROC curve generated by one of the observers. The ROC data points are shown as solid symbols, with different symbols representing separate observation sessions. The solid curve was drawn through these points freehand; the dashed curve, representing the LROC curve which should correspond to the ROC curve according to our theory, was not "fit" to the LROC data points but was predicted point by point from the ROC curve before the LROC data points were plotted, using equations 1 and 2 and appropriate techniques of numerical integration. The open symbols are the empirical quadrant LROC data points.
DISCUSSION
The solid curves in Figure 2 were drawn freehand so that they lay within one standard deviation of the ROC data points, assuming binomial variance (8), i.e., the variance associated with the statistical nature of the finite set of samples used as visual stimuli; instrumental and biological variance were not included. Variance of the LROC data points is related to that of the ROC data points in a rather complex manner, as can be seen by examining equation 2. Intuitively, one might expect the variance of the LROC data points to be greater than that of the corresponding ROC data points. The statistical uncertainty of the data points depends in part upon the number of samples completed per session. Ideally, each data point on the ROC or LROC graph should be computed from responses to a large number of samples, but observer fatigue sets a practical limit upon the number of samples which can be studied in one session. Most of our subjects were able to view at least 80 samples per session; one subject completed only 70, which we considered to be barely acceptable. Until a more rigorous method of statistical analysis becomes available, we feel that the approach which we applied to our results permits sufficient assessment of errors for the present application. Thus on the basis of examination of the graphs from all observers in light of the above considerations, we feel that the theoretical model expressed by equations 1 and 2 adequately predicts observer behavior for the task of localizing an image to within a subregion of a radiograph.
[Figure 2, panels A to E: ROC and quadrant LROC curves for five observers; horizontal axis, conditional false-positive probability $P_D(S\mid n; x_c, A)$ or $P_{DL}(S\mid n; x_c, A)$; vertical axis, conditional true-positive probability; both axes from 0 to 1.0.]
Fig. 2. Prediction of observer performance in the detection-plus-quadrant-localization task from performance in the simple detection task. Each graph is a plot of two curves for a single observer. In each graph, the solid curve is the classical ROC curve, which describes the relationship between conditional true-positive (vertical axis) and false-positive (horizontal axis) decision frequencies in the simple detection task, in which credit is given for a true-positive response without requiring that the observer identify the position of the visual signal. These solid curves were fitted by eye to the detection-task data points, denoted by solid symbols. The dashed LROC curve in each graph represents the predicted relationship between conditional true-positive (vertical axis) and false-positive (horizontal axis) detection frequencies for the same observer in the detection-plus-quadrant-localization task, in which credit is given for a true-positive response only if both the presence and the correct quadrant of the visual signal are indicated by the observer in responding to a truly positive image. In each graph, the dashed curve was predicted from the measured solid curve by using equations 1 and 2 (see text). The measured detection-localization-task data points are indicated by open symbols on each graph and appear to confirm the theoretical predictions. A. Subject FBA, viewing distance (VD) = 116 cm, total number of samples (N) = 70 (●), 70 (▲). B. Subject LBL, VD = 100 cm, N = 90 (●), 80 (▲). C. Subject CEM, VD = 100 cm, N = 100 (●), 100 (▲). D. Subject KR, VD = 100 cm, N = 80 (●). E. Subject SJS, VD = 100 cm, N = 80 (●), 100 (▲), 100 (■).
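The Method of Analysis and the binomial error criterion can be sketched in a few lines. This is a hypothetical illustration, not the authors' program: it assumes ratings coded 1 to 5 with 5 meaning "almost definitely present," that a sample is called positive at criterion c when its rating is at least c, and that quadrants are coded 1 to 4. The function names and the small demonstration data set are invented.

```python
from math import sqrt


def roc_lroc_points(responses, n_categories=5):
    """Compute (FPF, ROC TPF, LROC TPF) at each rating criterion.

    responses: iterable of (rating, true_quadrant, reported_quadrant);
    true_quadrant is None for noise-only samples. Ratings run from 1
    ("almost definitely not present") to n_categories ("almost definitely
    present"); this coding is an assumption for illustration.
    """
    responses = list(responses)
    signal = [r for r in responses if r[1] is not None]
    noise = [r for r in responses if r[1] is None]
    points = []
    for cut in range(n_categories, 1, -1):      # strictest criterion first
        fp = sum(1 for rating, _, _ in noise if rating >= cut)
        tp = sum(1 for rating, _, _ in signal if rating >= cut)
        # LROC: a true positive counts only if the reported quadrant is right.
        tp_cl = sum(1 for rating, true_q, rep_q in signal
                    if rating >= cut and rep_q == true_q)
        points.append((fp / len(noise), tp / len(signal), tp_cl / len(signal)))
    return points


def binomial_sd(p, n):
    """Binomial standard deviation of a proportion p estimated from n samples
    (sampling variance only; instrumental and observer variance ignored)."""
    return sqrt(p * (1.0 - p) / n)


if __name__ == "__main__":
    demo = [(5, 1, 1), (4, 2, 3), (3, 3, 3), (1, 4, 2),          # signal
            (2, None, 1), (4, None, 2), (1, None, 3), (1, None, 4)]  # noise
    n_signal = sum(1 for r in demo if r[1] is not None)
    for fpf, tpf, tpf_cl in roc_lroc_points(demo):
        print(f"FPF={fpf:.2f}  TPF={tpf:.2f} +/- {binomial_sd(tpf, n_signal):.2f}"
              f"  LROC TPF={tpf_cl:.2f}")
```

Because each LROC count is a subset of the corresponding ROC true-positive count, the LROC true-positive fraction can never exceed the ROC value at the same criterion; with only 70 to 100 samples per session, the binomial standard deviation of a proportion near 0.5 is roughly 0.05 to 0.06 for an equal split of positive and negative samples, and somewhat larger for the class with fewer samples.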
SUMMARY AND CONCLUSIONS
Evaluation of radiological imaging systems requires an understanding of factors which affect image quality, both physical and visual/psychophysical. The radiologist generally does not have a priori knowledge of the presence, location, or form of a given lesion. If signal detection theory is to be applied successfully to assessing observer performance in clinical radiology, one must ascertain how these factors affect conventional ROC analysis. The theoretical model derived here can be used to predict observer behavior during detection and localization of an image to within a subregion of a radiograph from ROC curves measured for a task in which only detection is required, and predictions based on this theory have been verified for the special case of localization to within a quadrant of the visual field. This relationship is of considerable practical importance, since both the design and execution of experiments for measurement of detection performance are considerably less difficult than for simultaneous measurement of detection and localization performance, and since accurate localization as well as detection is vital in many clinical applications of radiography. Establishment of this quantitative relationship between the classical ROC curve and observer performance in a more complex task represents a first step toward extending the concepts of signal detection theory to take into account the complex visual tasks demanded of the radiologist.
ACKNOWLEDGMENTS: We wish to thank Drs. Joel Pokorny and Vivianne C. Smith of the Department of Ophthalmology, University of Chicago, for their many helpful conversations regarding visual psychophysics. Edward Kucinsky provided technical assistance and collected most of the data. We also thank the persons who volunteered to be subjects for this experiment.
Box 420, Department of Radiology, University of Chicago, 950 E. 59th St., Chicago, Ill. 60637
REFERENCES
1. Goodenough DJ: Radiographic applications of signal detection theory. Ph.D. dissertation, University of Chicago, 1972
2. Goodenough DJ, Metz CE: Effect of listening interval on auditory detection performance. J Acoust Soc Am 55:111-116, Jan 1974
3. Goodenough DJ, Metz CE, Lusted LB: Caveat on use of the parameter d' for evaluation of observer performance. Radiology 106:565-566, Mar 1973
4. Goodenough DJ, Rossmann K, Lusted LB: Radiographic applications of receiver operating characteristic (ROC) curves. Radiology 110:89-95, Jan 1974
5. Goodenough DJ, Rossmann K, Lusted LB: Radiographic applications of signal detection theory. Radiology 105:199-200, Oct 1972
6. Green DM, Swets JA: Signal Detection Theory and Psychophysics. New York, Wiley, 1966, pp 30-52
7. Ibid, pp 99-106
8. Ibid, pp 401-404
9. Ibid, pp 404-408
10. Hershman RL, Lichtenstein M: Detection and localization: an extension of the theory of signal detectability. J Acoust Soc Am 42:446-452, Aug 1967
11. Metz CE, Goodenough DJ: On failure to improve observer performance with scan smoothing: a rebuttal. Letter to the editor. J Nucl Med 14:873-876, Nov 1973
12. Metz CE, Goodenough DJ, Rossmann K: Evaluation of receiver operating characteristic curve data in terms of information theory, with applications in radiography. Radiology 109:297-303, Nov 1973
13. Revesz G, Kundel HL, Graber MA: The influence of structured noise on the detection of radiologic abnormalities. Invest Radiol 9:479-486, Nov-Dec 1974
14. Rossmann K: Spatial fluctuations of x-ray quanta and the recording of radiographic mottle. Am J Roentgenol 90:863-869, Oct 1963
15. Swets JA: The relative operating characteristic in psychology. A technique for isolating effects of response bias finds wide use in the study of perception and cognition. Science 182:990-1000, 7 Dec 1973
APPENDIX
Consider a series of radiographs produced under similar conditions, in each of which one or no visual signal may be present in the visual field in addition to a uniform noisy background. If a signal is present, its location is randomly distributed throughout the field. The observer must state for each radiograph whether he believes the signal is present ("positive") or not ("negative"). If he believes that a signal is present, he must also indicate which one of M adjacent, congruent subregions spanning the visual field contains the signal. If the entire area of the field is A, the area of each subregion will be A/M; if the signal must be located within a quadrant of the visual field, M = 4. The problem is to derive expressions for the coordinates of an ROC-type curve for the task of detection and localization (LROC curve), that is, the conditional "true-positive, correct localization" and "false-positive" probabilities for the various decision variable thresholds $x_c$ and a given field size A, designated $P_{DL}(S, CL\mid s; x_c, A)$ and $P_{DL}(S\mid n; x_c, A)$, in terms of the coordinates of the conventional detection-experiment ROC curve for the same thresholds and field size, $P_D(S\mid s; x_c, A)$ and $P_D(S\mid n; x_c, A)$.
The conditional probability of a false-positive response in the detection-localization experiment using decision variable threshold $x_c$, $P_{DL}(S\mid n; x_c, A)$, should be the same as that for a similar experiment in which no localization is required if the threshold is the same, i.e., $P_D(S\mid n; x_c, A)$, since the question of location is irrelevant for images which are actually negative. Hence
$$P_{DL}(S\mid n; x_c, A) = P_D(S\mid n; x_c, A) \qquad (A1)$$
The conditional probability of a true-positive response with correct localization can be expressed in terms of the coordinates of the ROC curve for a detection experiment in which the observer views and responds to subregions one at a time (11), $P_D(S\mid s; x_c, A/M)$ vs. $P_D(S\mid n; x_c, A/M)$. For a detection-localization experiment using the full visual field, assume that the statistical outcomes of the decision variables are independent from one subregion to another; this will be approximately true if the subregion area A/M is much larger than both the signal area and the "effective area" of the noise autocovariance function. Assume also that (a) the observer responds "negative" if the outcomes of the subregion decision variables are less than some threshold $x_c$ in all subregions, and (b) if at least one subregion outcome exceeds $x_c$, so that the observer responds "positive," the quadrant associated with the largest decision variable outcome is identified as containing the signal. From these assumptions one can infer that the conditional probability of a true-positive, correct-localization response using threshold $x_c$ is given (11) by³
$$P_{DL}(S, CL\mid s; x_c, A) = \int_0^{P_D(S\mid s; x_c, A/M)} \left[1 - P_D(S\mid n; t, A/M)\right]^{M-1} dP_D(S\mid s; t, A/M) \qquad (A2)$$
in which t is a dummy variable representing the various decision variable thresholds associated with the different points on the subregion ROC curve. Equation A2 expresses the conditional probability of a true-positive, correct-location response in the detection-localization experiment, using a threshold $x_c$ and a visual field of area A, in terms of the coordinates of an ROC curve obtained in a detection experiment involving a field size equal to the localization subregion area A/M. Previous theory and experimentation (1, 2, 5) have shown that the relationship between two detection-experiment ROC curves for two different field sizes can be described rather simply if the area of the smaller field is much larger than both the signal area and the effective area of the noise autocovariance function. Hence if the subregion in the localization experiment is much larger than both the signal and the noise autocorrelation distance (1, 2, 5),
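$$P_D(S\mid n; x_c, A) = 1 - \left[1 - P_D(S\mid n; x_c, A/M)\right]^{M} \qquad (A3)$$
and
$$P_D(S\mid s; x_c, A) = 1 - \left\{\left[1 - P_D(S\mid s; x_c, A/M)\right]\left[1 - P_D(S\mid n; x_c, A/M)\right]^{M-1}\right\} \qquad (A4)$$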
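The change of variables described in the following paragraph can be written out explicitly. The shorthand in this sketch is ours, not the authors': $p_n(t) = P_D(S\mid n; t, A/M)$, $p_s(t) = P_D(S\mid s; t, A/M)$, $u(t) = P_D(S\mid n; t, A)$, and $v(t) = P_D(S\mid s; t, A)$; it assumes all four probabilities vanish as t grows large.

```latex
\begin{align}
1 - u &= (1 - p_n)^{M}
  &&\Rightarrow\; du = M(1 - p_n)^{M-1}\,dp_n \\
1 - v &= (1 - p_s)(1 - p_n)^{M-1}
  &&\Rightarrow\; dv = (1 - p_n)^{M-1}\,dp_s + (M-1)(1 - p_s)(1 - p_n)^{M-2}\,dp_n \\
\frac{1 - p_s}{1 - p_n} &= \frac{1 - v}{(1 - p_n)^{M}} = \frac{1 - v}{1 - u}
  &&\Rightarrow\; (1 - p_n)^{M-1}\,dp_s = dv - \frac{M-1}{M}\,\frac{1 - v}{1 - u}\,du \\
P_{DL}(S, CL \mid s; x_c, A) &= \int_0^{p_s(x_c)} (1 - p_n)^{M-1}\,dp_s
  = v(x_c) - \frac{M-1}{M}\int_0^{u(x_c)} \frac{1 - v}{1 - u}\,du
\end{align}
```

The first two lines are equations A3 and A4 and their differentials; the third eliminates $dp_n$; the last integrates from large t (where $p_s = u = v = 0$) down to $t = x_c$, so that the left side is equation A2 and the right side is equation 2 of the text.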
Using equations A3 and A4 to change the variable of integration, the integrand, and the limits of the integral in equation A2, one obtains equation 2 (see text). It may be of considerable practical importance to note that by making one additional assumption, one can dispense with the uniform noise field required here. One can show (though it is not proved here) that if (a) the "noise only" and "signal-plus-noise" decision variable probability distributions are different in each subregion, but (b) the observer adjusts his decision variable threshold in each subregion to keep the conditional false-positive probability the same in each subregion, then equations 1 and 2 of the text still hold. The derivation is analogous to that given here, except that powers of probabilities conditional on "noise only" in equations A2 through A4 must be replaced by products of probabilities appropriate to each subregion, and probabilities conditional on s must be written as sums of probabilities conditional on signals in the various subregions, each multiplied by the probability of a signal in that subregion given that a signal is present somewhere in the field. This result implies that equations 1 and 2 of the text might hold for detection and localization of lesions in chest radiographs, for example, if the observer were to be more cautious about calling suspicious structures positive in those regions where normal structures similar to lesions or "structured noise" (13) are known to exist, which seems reasonable. We have not yet tested this possible extended application of the theory.
³ A less general form of this relationship, requiring the assumption that the decision variables for each subregion are sampled from gaussian "noise only" or "signal-plus-noise" distributions of equal variance, has been published previously by Hershman and Lichtenstein (ref. 10, equations 21 and 22).
Derivation of our equation A2 is analogous to that given by Hershman and Lichtenstein; however, it is a generalization of the formula given by these authors, who did not extend their results to derive a relationship analogous to our equation 2.
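The appendix model can be checked numerically. The sketch below assumes, purely for illustration, that each subregion decision variable is Gaussian with unit variance and that the signal shifts the mean of its own subregion by d' (the equal-variance case mentioned in footnote 3); all function names are hypothetical. It computes the correct-localization probability three ways: from equation 2 applied to the full-field ROC given by equations A3 and A4, directly from equation A2 written as an integral over the signal-subregion outcome, and by Monte Carlo simulation of decision rules (a) and (b).

```python
import random
from statistics import NormalDist

ND = NormalDist()


def subregion_rates(x, d_prime):
    """Hypothetical Gaussian subregion model: P_D(S|n; x, A/M), P_D(S|s; x, A/M)."""
    return 1.0 - ND.cdf(x), 1.0 - ND.cdf(x - d_prime)


def full_field_rates(x, d_prime, m):
    """Equations A3 and A4: full-field ROC coordinates (u, v) at threshold x."""
    p_n, p_s = subregion_rates(x, d_prime)
    u = 1.0 - (1.0 - p_n) ** m
    v = 1.0 - (1.0 - p_s) * (1.0 - p_n) ** (m - 1)
    return u, v


def lroc_from_equation_2(x_c, d_prime, m, x_max=8.0, steps=4000):
    """Equation 2 by the trapezoid rule, thresholds from x_max (u, v ~ 0) to x_c."""
    h = (x_max - x_c) / steps
    pts = [full_field_rates(x_max - k * h, d_prime, m) for k in range(steps + 1)]
    integral = 0.0
    for (u0, v0), (u1, v1) in zip(pts, pts[1:]):
        g0 = (1.0 - v0) / (1.0 - u0)
        g1 = (1.0 - v1) / (1.0 - u1)
        integral += 0.5 * (g0 + g1) * (u1 - u0)
    _, v_c = pts[-1]
    return v_c - (m - 1) / m * integral


def lroc_from_equation_a2(x_c, d_prime, m, x_max=8.0, steps=4000):
    """Equation A2 rewritten as the integral over y from x_c to infinity of
    phi(y - d') * Phi(y)**(M - 1), evaluated by the trapezoid rule."""
    h = (x_max - x_c) / steps
    f = [ND.pdf(x_c + k * h - d_prime) * ND.cdf(x_c + k * h) ** (m - 1)
         for k in range(steps + 1)]
    return h * (sum(f) - 0.5 * (f[0] + f[-1]))


def simulate_lroc(x_c, d_prime, m, trials, rng):
    """Monte Carlo of rules (a) and (b): 'positive' if the largest subregion
    outcome reaches x_c, and that subregion is named as the location."""
    hits = 0
    for _ in range(trials):
        target = rng.randrange(m)                   # random signal location
        outcomes = [rng.gauss(0.0, 1.0) for _ in range(m)]
        outcomes[target] += d_prime                 # signal shifts its mean
        best = max(range(m), key=outcomes.__getitem__)
        if outcomes[best] >= x_c and best == target:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    rng = random.Random(1975)
    d, m = 1.5, 4
    for x_c in (0.5, 1.0, 1.5, 2.0):
        u, v = full_field_rates(x_c, d, m)
        print(f"x_c={x_c:.1f}  FPF={u:.3f}  TPF={v:.3f}  "
              f"eq.2={lroc_from_equation_2(x_c, d, m):.4f}  "
              f"eq.A2={lroc_from_equation_a2(x_c, d, m):.4f}  "
              f"simulated={simulate_lroc(x_c, d, m, 20_000, rng):.4f}")
```

Agreement between the equation 2 and equation A2 values checks only the algebra linking them; agreement with the simulation additionally checks that A2, A3, and A4 follow from independent subregions and decision rules (a) and (b). It says nothing about whether human observers behave this way, which is what the quadrant experiment tested.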