Anesth Prog 37:161-165

Diagnostic Decision Making

Alexia Antczak-Bouckoms, DMD, DSc, MPH,*† J.F.C. Tulloch, FDS, RCS,‡ Anthony J. Bouckoms, MD,¶ David Keith, BDS, FDS, RCS, DMD,‖ and Phillip Lavori, PhD**

*Department of Health Policy and Management, Harvard School of Public Health; †Department of Dental Care Administration, Harvard School of Dental Medicine; ‡Department of Orthodontics, School of Dentistry, University of North Carolina; ¶Department of Psychiatry, Massachusetts General Hospital; ‖Department of Oral and Maxillofacial Surgery, Massachusetts General Hospital; **Department of Biostatistics, Massachusetts General Hospital

Diagnostic or screening tests are used to help determine whether or not a patient has a certain condition or disease. The ability of a diagnostic test to correctly classify subjects is expressed by the four test characteristics: sensitivity, specificity, predictive value positive, and predictive value negative. This paper describes these characteristics and discusses methods for choosing optimal tests or cutoff points to maximize expected value considering the consequences of incorrect diagnoses. Data drawn from ongoing studies of facial pain are used to illustrate some of these concepts.

Table 1. Diagnostic Test Characteristics

                   Disease state
Test result     Present     Absent      Total
Positive        a (TP)      b (FP)      a + b
Negative        c (FN)      d (TN)      c + d
Total           a + c       b + d       N

Sensitivity, a/(a + c); Specificity, d/(b + d); Predictive Value Positive, a/(a + b); Predictive Value Negative, d/(c + d). TP = true positive; TN = true negative; FP = false positive; FN = false negative.
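The four indices defined under Table 1 can be computed directly from the cell counts. The following is a minimal sketch, not part of the original paper: the class name `TwoByTwo`, its method names, and the example counts are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cell counts laid out as in Table 1 (hypothetical helper class)."""
    a: int  # true positives: test positive, disease present
    b: int  # false positives: test positive, disease absent
    c: int  # false negatives: test negative, disease present
    d: int  # true negatives: test negative, disease absent

    def sensitivity(self) -> float:
        # If disease is present, how likely is a positive test?
        return self.a / (self.a + self.c)

    def specificity(self) -> float:
        # If disease is absent, how likely is a negative test?
        return self.d / (self.b + self.d)

    def predictive_value_positive(self) -> float:
        # If the test is positive, how likely is disease?
        return self.a / (self.a + self.b)

    def predictive_value_negative(self) -> float:
        # If the test is negative, how likely is the patient disease free?
        return self.d / (self.c + self.d)


# Hypothetical counts, not data from the Facial Pain Program Project.
example = TwoByTwo(a=45, b=15, c=5, d=135)
print(example.sensitivity(), example.specificity())
print(example.predictive_value_positive(), example.predictive_value_negative())
```

Keeping the four counts together in one object mirrors the table layout, so each index reads exactly like its formula in the table footnote.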

Accurate diagnosis and treatment of chronic orofacial pain is complex due to the multifactorial causes of these disorders, uncertainty of the diagnostic criteria used, and imprecision in the literature. All of these factors contribute to difficulty in providing treatment on a rational basis and, therefore, in predicting therapeutic response. In recent years the development of analytic methods in the fields of clinical epidemiology and decision analysis has provoked a renewed interest in the evaluation and interpretation of diagnostic tests. The purpose of this paper is to review these methods and illustrate the characteristics of diagnostic tests using data gathered from patients seen through the Massachusetts General Hospital Facial Pain Program Project.

Address correspondence to Dr. Alexia Antczak-Bouckoms, Health Policy and Management, Harvard School of Public Health, 677 Huntington Avenue, Boston, MA 02115.

© 1990 by the American Dental Society of Anesthesiology. ISSN 0003-3006/90/$3.50

DIAGNOSTIC TEST CHARACTERISTICS

Diagnostic or screening tests are used to help determine whether or not a patient has a certain condition or disease. The test results do not influence the patient's state; the patient either has the condition or does not. The test result simply serves to alter the clinician's perception of the patient's state. The efficacy of a diagnostic test is defined as its ability to correctly indicate the presence or absence of a disease. When a diagnostic test is used there are four possible outcomes, which are illustrated in Table 1. First, the subject can have the disease and the diagnostic test can be positive, giving a true positive (TP) result. Similarly, the subject can be disease free and have a negative test, resulting in a true negative (TN) diagnosis. Incorrect diagnoses can also occur: when a subject who has disease tests negative, a false negative (FN) diagnosis, or when a subject who is disease free has a positive test, a false positive (FP) diagnosis. The proportion of true and false diagnoses produced by a specific diagnostic test determines the test characteristics, which are commonly expressed by four indices: sensitivity, specificity, predictive value positive, and predictive value negative.1 Each of the test indices has a maximum value of 1 (or 100%). Sensitivity, calculated as a/(a + c), addresses the question: If disease is present, how likely


is it that the test will be positive? Similarly, specificity, calculated as d/(b + d), addresses: If disease is absent, how likely is it that the test will be negative? The other two indices of diagnostic tests address the questions clinicians pose: If the test is positive, how likely is the patient to have disease? and: If the test is negative, how likely is the patient to be disease free? These indices are the predictive value positive and negative, respectively. They are calculated as a/(a + b) and d/(c + d). Sensitivity and specificity are independent of disease prevalence. The predictive values of a test, however, depend not only on sensitivity and specificity but also on the disease prevalence.

CLASSIFICATION ISSUES

It is important to note that few conditions are dichotomous in the sense that a patient either has the condition or does not. Instead, disease appears as a continuum of increasing severity with some level being selected as the cutoff point beyond which treatment is indicated. For example, when considering measures of blood pressure some level beyond which a patient is considered hypertensive and in need of therapy must be chosen.

Figure 1 illustrates a theoretical distribution of values for a diagnostic test using a continuous variable in a group of patients, with and without disease. The distribution below the line represents the values recorded in people with the disease, that above the line, the values recorded in people without the disease. Considering the cutoff line, those to the right of the line are the ones designated as disease positive by the diagnostic test, those to the left designated disease negative. The subjects who are correctly classified as diseased are found in the lower right (a), those correctly classified as nondiseased in the upper left (d).

Figure 1. Distribution of patient values and diagnostic test results for a continuous variable. (Recoverable labels: horizontal axis, test result; a vertical cutoff line separating "designated disease free" from "designated diseased"; regions TN (d), FP (b), FN (c), and TP (a).)

The sensitivity and specificity of a test vary inversely but not necessarily proportionately with one another. One can see that if the cutoff level were moved to the left, the sensitivity of the test would increase and more people with disease (the distribution below the line: TP) would be correctly identified by the test. At the same time, a greater proportion of those without disease (the distribution above the line: FP) would have a positive test, and thus the specificity would decrease. Sensitivity and specificity are the formal statistical interpretation of the distribution of test results given the known disease state as determined by some independent gold standard of diagnosis and are useful concepts for comparing diagnostic test performance.

Figure 2 illustrates the spectrum of correct classification that diagnostic tests can achieve. The first case (a) is of a worthless test in that every value of the test result in those with disease is matched by an equal proportion of positive results in those without disease, giving no differentiation between the two groups. The second case (b) illustrates a perfect test where the test result can be used to achieve complete separation of diseased and nondiseased individuals. The final case (c) is typical of clinical diagnostic tests in that the distributions of test results overlap and no test threshold or cutoff point can be selected that will identify all diseased individuals as positive, without some concomitant incorrect classifications.2

Figure 2. Distribution of test results for various diagnostic tests.

Figure 3. The effect of using alternative positivity criteria for diagnosis. (Recoverable labels: diagnostic threshold levels, including level 3; horizontal axis, test result; regions true negative, false positive, false negative, and true positive.)

Figure 3 describes the characteristics of a typical diagnostic test in more detail and illustrates what happens when different positivity criteria or cutoff points are used to indicate disease. Consider first using level 1 as the

positivity criteria. Here the value of the diagnostic criteria results in virtually all individuals having the disease being correctly classified. Moving the cutoff point to the right reduces the number of false positive diagnoses (b), but increases the number of false negatives (c). Determining the optimal cutoff position for treatment or diagnosis will depend upon the distribution of false diagnoses both in those with and those without disease, the prevalence of disease, and intuitively should also include some consideration of the consequences of treatment, based on the different diagnoses.' Using different thresholds, different distributions of true and false diagnoses can be used to calculate the different test performance characteristics at different threshold levels.

One formal way of comparing alternative diagnostic tests, or of evaluating the consequences of using different positivity criteria for a single diagnostic test with a continuous variable, is through receiver operating characteristics analysis (ROC).7 Figure 4 presents an example of an ROC curve. On the vertical axis the true-positive rate is plotted. The false-positive rate (calculated as 1 - specificity) is plotted on the horizontal axis. The ideal diagnostic test, and/or cutoff criteria, would be found in the upper left hand corner, the point that maximizes the true-positive rate and minimizes the false-positive rate. One obtains a ROC curve by plotting the values for alternative cutoff points, then connecting the points to make a smooth curve. Different tests will have different curves. Most ROC curves are convex, running from the lower left corner to the upper right. A straight line in that direction represents a useless test, one that trades off one FP for each TP gained as the cutoff criteria changes. To select the test that maximizes the TP and TN rates, choose the most convex curve. To select the best point on a curve one must consider the consequences associated with the various correct and incorrect diagnoses (see next section).

Figure 4. Receiver operating characteristics curve (ROC). (Axes: false-positive rate on the horizontal axis and true-positive rate on the vertical axis, each marked from .00 to 1.00 in steps of .20.)

Although ROC analysis is usually performed when diagnosis is based on some level of a continuous variable, it is possible to use such a model for clusters of variables. In this case, it is expected that when all signs and symptoms for a particular diagnosis must be present, the number of subjects identified as having the disease will be relatively small. Stated otherwise, when the inclusion criteria are strict, there will be few false positives but numerous false negatives, and the point will lie in the lower left quadrant of the curve. Conversely, when the diagnosis is made using only one or two of a group of signs and symptoms, there will be few false negatives and many false positives, and the point will be more toward the upper right. Returning to the terminology of Figure 3, as the number of signs and symptoms required for positive diagnosis increases, the cutoff level moves from level 1 to level 3.
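The ROC construction described in this section, plotting the true-positive rate against the false-positive rate for a series of alternative cutoff points, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the function names and the test values are hypothetical, a result at or above the cutoff is assumed to count as positive, and the area under the curve is used only as one convenient stand-in for "most convex", a summary the paper itself does not name.

```python
def roc_points(diseased, healthy, cutoffs):
    """Return one (false_positive_rate, true_positive_rate) pair per cutoff.

    Assumption: a test value at or above the cutoff is called positive.
    """
    points = []
    for cutoff in cutoffs:
        tp = sum(value >= cutoff for value in diseased)
        fp = sum(value >= cutoff for value in healthy)
        points.append((fp / len(healthy), tp / len(diseased)))
    return points


def area_under_curve(points):
    """Trapezoidal area under the ROC points, anchored at (0, 0) and (1, 1)."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Hypothetical test values for subjects with and without disease.
diseased = [4, 5, 6, 6, 7, 8, 9, 9]
healthy = [1, 2, 3, 3, 4, 5, 5, 6]
cutoffs = [2, 4, 6, 8]

points = roc_points(diseased, healthy, cutoffs)
for cutoff, (fpr, tpr) in zip(cutoffs, points):
    print(f"cutoff {cutoff}: FP rate {fpr:.3f}, TP rate {tpr:.3f}")
print("area under curve:", area_under_curve(points))

# A test that cannot separate the groups falls on the diagonal.
useless = roc_points(diseased, diseased, cutoffs)
print("useless test area:", area_under_curve(useless))
```

Raising the cutoff moves the point toward the lower left of the plot (fewer false positives, more false negatives), which is the same movement the text describes when more signs and symptoms are required for a positive diagnosis.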

CONSEQUENCES OF ALTERNATIVE CUTOFF CRITERIA

The selection of the optimal diagnostic test, or test cutoff criteria, can be resolved only by considering the number of true and false diagnoses occurring, and by incorporating some estimate of the value of the treatment rendered on the basis of the test result.' Although ROC analysis can help determine the test that maximizes the proportion of true diagnoses, that test may not maximize the outcome of interest when the consequences of diagnosis and treatment are considered. For example, if a positive test results in application of an invasive procedure with negative consequences for those without disease, one may wish to trade off some TP diagnoses for a smaller number of FP diagnoses. Stated otherwise, failing to treat someone with disease (FN) is viewed as preferable to treating a nondiseased person based on an incorrect diagnosis. Conversely, if failure or delay in application of treatment has dire consequences to those with disease, then one may wish to trade off some TNs to maximize the number of TPs.

Figure 5. Decision tree used to calculate expected value of treatment following the choice of any one diagnostic threshold.

Choosing the level of tradeoff can be facilitated using the techniques of decision analysis.2 Figure 5 illustrates how this can be done. For each cutoff level there is an estimated probability of the test being positive and negative. Given a positive test, there is some probability the patient actually has disease, resulting in a true positive diagnosis. This uppermost line in the decision tree represents graphically the predictive value positive. On the next line, patients who tested positive but did not have disease are false positives (FP). Following the lowermost branch are patients who tested negative and did not have disease, true negatives (TN). This represents the predictive value negative of the test. If it is possible to account for the consequences of incorrect diagnoses by

placing a value on the four possible outcomes, TP, FP, FN, and TN, then multiplying each value by the probability of being in that category and summing the products will give the Expected Value of using that particular cutoff criterion. This process can be repeated for each cutoff: the cutoff with the greatest Expected Value then would be chosen as the optimal level for that diagnostic test. If changes occur in available therapy, or in the perception of the consequences of the alternative outcomes, then the analysis can be revised and perhaps a new cutoff will maximize Expected Value.

DATA ON CHRONIC FACIAL PAIN

Data collected as part of the Massachusetts General Hospital Facial Pain Program Project illustrate some of the concepts presented above. Patients with chronic orofacial pain seen through this project are given a variety of clinical, neurologic, biochemical, and psychological tests. A diagnosis for each subject is determined by these test results and by the consensus diagnosis of the study oral surgeon, neurologist, neurosurgeon, and psychiatrist. Seventeen clinical signs and symptoms are used to help classify patients as having either trigeminal neuralgia, myofascial pain, or atypical facial pain.

Table 2. Data from Patients with Chronic Facial Pain

Test: provokable        Disease: trigeminal neuralgia
Sensitivity = 1.00
Specificity = 0.60
PVP         = 0.54
PVN         = 1.00

Table 2 presents data from two signs or symptoms. The first, for the diagnosis of trigeminal neuralgia, is whether or not the pain is provokable. The sensitivity of this diagnostic criterion is 1.00,

specificity is 0.60, the predictive value positive (PVP) is 0.54, and the predictive value negative (PVN) is 1.00. A sensitivity of 100% means that all subjects with trigeminal neuralgia, as determined by the gold standard defined in this study, will be identified when "provokable" is used as a diagnostic criterion. A specificity of 0.60, however, means that only 60% of those without the disease will be correctly identified as disease negative using this test. The other 40% of those without disease will be incorrectly classified as false positives. Although the perfect sensitivity means that everyone with trigeminal neuralgia will have a positive test, the PVP of 0.54 means that in this sample of patients a positive test does not say very much about whether or not the patient has trigeminal neuralgia. Nearly one-half of those with a positive test (46%) will be false positives. A PVN of 100%, however, means that if the test is negative, the patient certainly does not have the disease. This makes the criterion useful as an initial screen to help rule out trigeminal neuralgia in patients with facial pain. It is important to be able to identify those without trigeminal neuralgia, as the treatment for this condition is often invasive.

The second example uses the sign of impaired range to identify patients with the problem of myofascial pain. The calculated sensitivity is 0.90, the specificity is 0.95, the predictive value positive is 0.81, and the predictive value negative is 0.97. A sensitivity of 0.90 indicates that 90% of those with the disease are correctly classified as true positives. In addition, the very high specificity of 0.95 means that most of the people without the disease are also correctly identified as such with this test. Even though the sensitivity is high, however, a PVP of 0.81 means that only approximately four of five patients with a positive test will indeed have myofascial pain, leaving a 1 in 5 chance that a positive test occurs in a patient without the condition. Conversely, with a PVN of 0.97, a negative test result carries reasonable assurance that the subject does not have disease.

These two examples are simply used to illustrate the value of test characteristics and how they can be interpreted. Further evaluation of the specific signs and symptoms used in the Facial Pain Program Project will be forthcoming, with consideration of the value of using multiple signs and symptoms in tandem and sequentially to optimally classify subjects.
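The paper notes that predictive values depend on disease prevalence as well as on sensitivity and specificity. A minimal sketch of that dependence follows. The function name `predictive_values` is hypothetical, and so are the prevalence figures: the text does not report the prevalence in the study sample, so the loop below is illustrative and is not an attempt to reproduce the published PVP and PVN.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PVP, PVN) for a test applied at a given disease prevalence."""
    # Expected proportion of all subjects falling in each cell of Table 1.
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    tn = specificity * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Sensitivity and specificity as reported for "impaired range";
# the prevalences are hypothetical.
for prevalence in (0.05, 0.20, 0.50):
    pvp, pvn = predictive_values(0.90, 0.95, prevalence)
    print(f"prevalence {prevalence:.2f}: PVP {pvp:.2f}, PVN {pvn:.2f}")
```

With sensitivity and specificity held fixed, the PVP rises and the PVN falls as the assumed prevalence increases, which is why the same sign can be more or less informative in different patient populations.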

Table 2 (continued)

Test: impaired range    Disease: myofascial pain
Sensitivity = 0.90
Specificity = 0.95
PVP         = 0.81
PVN         = 0.97
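The expected-value comparison described under Consequences of Alternative Cutoff Criteria can be sketched in a few lines. Everything numeric here is hypothetical: the sensitivity and specificity assigned to each cutoff level, the prevalence, and the values placed on the four outcomes are invented for illustration, and the function name `expected_value` is not from the paper.

```python
def expected_value(sensitivity, specificity, prevalence, values):
    """Sum of (probability of outcome) x (value of outcome) over TP, FN, FP, TN."""
    probabilities = {
        "TP": sensitivity * prevalence,
        "FN": (1 - sensitivity) * prevalence,
        "FP": (1 - specificity) * (1 - prevalence),
        "TN": specificity * (1 - prevalence),
    }
    return sum(probabilities[k] * values[k] for k in probabilities)


def best_cutoff(cutoffs, prevalence, values):
    """Name of the cutoff with the greatest expected value."""
    return max(cutoffs,
               key=lambda name: expected_value(*cutoffs[name], prevalence, values))


# Hypothetical (sensitivity, specificity) pairs for three cutoff levels.
cutoffs = {
    "level 1": (0.99, 0.60),
    "level 2": (0.90, 0.85),
    "level 3": (0.70, 0.98),
}
prevalence = 0.30  # hypothetical

# Hypothetical outcome values: a false positive keeps some value,
# a missed case has none.
mild_consequences = {"TP": 1.0, "TN": 1.0, "FP": 0.4, "FN": 0.0}
# Hypothetical values for a setting where a missed diagnosis is dire.
dire_if_missed = {"TP": 1.0, "TN": 1.0, "FP": 0.8, "FN": -2.0}

print(best_cutoff(cutoffs, prevalence, mild_consequences))
print(best_cutoff(cutoffs, prevalence, dire_if_missed))
```

Changing only the outcome values shifts the preferred cutoff, which illustrates the paper's point that a revised view of the consequences can lead to a different optimal threshold for the same test.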

ACKNOWLEDGMENTS

This work was supported by the Alfred P. Sloan Foundation grant #88-4-10 and NINCDS grant NS 23357-02.


REFERENCES

1. Ransohoff DF, Feinstein AR: Problems of spectrum and bias in evaluating the efficacy of diagnostic tests. N Engl J Med 1978;299:926-930.
2. Weinstein MC, Fineberg HV: Clinical Decision Analysis. Philadelphia: WB Saunders, 1980.
3. Weinstein MC: Challenges for cost-effectiveness research. Med Decis Making 1986;6:194-198.
4. Tulloch JFC, Antczak-Bouckoms AA, Berkey CS, Douglass CW: Selecting the optimal threshold for the radiographic diagnosis of interproximal caries. J Dent Educ 1988;52:630-636.
5. Eeckhoudt LR, Lebrun TC, Sailly J-CL: The informative content of diagnostic tests: an economic analysis. Soc Sci Med 1984;18:873-880.
6. Guyatt G, Drummond M, Feeny D, Tugwell P, Stoddart G, Haynes RB, Bennett K, Labelle R: Guidelines for the clinical and economic evaluation of health care technologies. Soc Sci Med 1986;22:393-408.
7. Swets JA, Pickett RM, Whitehead SF, Getty DJ, Schnur JA, Swets JB, Freeman BA: Assessment of diagnostic technologies. Science 1979;205:753-759.
