DIAGNOSTIC TESTING IN NEUROPSYCHIATRY

Balancing Risks and Benefits: Another Approach to Optimizing Diagnostic Tests

Douglas Mossman, M.D.
Eugene Somoza, M.D., Ph.D.

In our last two articles, we showed that one can quantify the reduction in uncertainty that results from diagnostic testing, an insight that allowed us to describe a method for optimizing the performance of a test by choosing a cutoff that maximizes its information yield. Although minimizing uncertainty is an important feature of diagnostic testing, there are many situations in which diagnostic tests are most appropriately used to balance risks and benefits associated with the various possible courses of action available to a clinician. This article shows how tests can be used to maximize the “expected utility” associated with a clinical decision. (The Journal of Neuropsychiatry and Clinical Neurosciences 1992; 4:331-335)

The article “Balancing Risks and Benefits: Another Approach to Optimizing Diagnostic Tests,” by Drs. Mossman and Somoza, is the eighth and last in a series of articles in The Journal of Neuropsychiatry and Clinical Neurosciences. The purpose of this series has been to inform and to educate the readers of the journal on the methodologies by which diagnostic tests in the realm of neuropsychiatry should be understood and interpreted. In this era of unprecedented innovation and advances in diagnostic testing, it is essential that the clinician be able to choose appropriately from the vast array of tests and to interpret accurately the meaning and implications of such tests. We are confident that those readers who devote the time and effort required to understand and apply the information that has appeared in Drs. Somoza and Mossman’s series of articles will be rewarded by insight and mastery in a realm where habit, convention, and fashion, rather than reason and understanding, most frequently guide test selection and interpretation.

Received May 11, 1992; accepted May 11, 1992. From the Psychiatry Service, Department of Veterans Affairs Medical Center, and the Division of Neuroscience, Department of Psychiatry, University of Cincinnati College of Medicine, Cincinnati, Ohio. Address reprint requests to Dr. Somoza, Psychiatry Service (116A), Department of Veterans Affairs Medical Center, 3200 Vine Street, Cincinnati, OH 45220.
Copyright © 1992 American Psychiatric Press, Inc.

The year is 2014. Neuroscientists have expanded on findings initially published 25 years earlier¹ to develop a highly accurate biological test for the short-term prediction of future violence (the “SPFV”). The SPFV performs a painless, risk-free, and very low-cost assay of several neurotransmitters and metabolites and integrates this information into discriminant function scores. These scores rank the likelihood of violence from 0 (lowest) to 100 (highest) and can be used in an emergency room to aid in the assessment of persons who might be subject to civil commitment because of dangerousness to others.

The SPFV has already been subjected to the process by which one generates a mathematical description of a diagnostic test.² An appropriate group of individuals was tested and was followed for several days thereafter. When the scores of the nonviolent subjects were compared with the scores of those who acted violently, the nonviolent subjects’ scores were lower than the violent subjects’ (as one might expect). But because the SPFV is not a perfect predictor of future dangerousness, the groups’ scores formed overlapping distributions. These distributions are shown in Figure 1: the nonviolent individuals are represented by the distribution centered at 35, the violent individuals by the distribution centered at 65. Each distribution has a standard deviation of 10.

It turns out that the SPFV results can be characterized using the “binormal assumption” used in receiver operating characteristic (ROC) analysis:⁴ in this case, the violent and nonviolent subjects’ results conform to two Gaussian curves with equal variances, separated by three standard deviations. In other words,

$$ Z_{TPR} = Z_{FPR} + 3 \tag{1} $$

where $Z_{TPR}$ is the normal deviate of the true positive rate and $Z_{FPR}$ is the normal deviate of the false positive rate. SPFV scores in Figure 1 are easily converted to values of $Z_{FPR}$ using the formula $Z_{FPR} = (35 - SPFV)/10$, and to values of $Z_{TPR}$ by using the formula $Z_{TPR} = (35 - SPFV)/10 + 3$.
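The score-to-deviate conversion can be checked numerically. The following is a minimal sketch, not part of the original article: it assumes the Figure 1 parameters (means 35 and 65, standard deviation 10), and the function names and the example cutoff of 45 are hypothetical choices made for illustration.

```python
from statistics import NormalDist

# Assumed from Figure 1: nonviolent scores ~ N(35, 10), violent scores ~ N(65, 10).
NONVIOLENT = NormalDist(mu=35.0, sigma=10.0)
VIOLENT = NormalDist(mu=65.0, sigma=10.0)
STANDARD = NormalDist()  # standard normal: turns a normal deviate into a rate


def z_fpr(score):
    """Normal deviate of the false positive rate at a given SPFV cutoff."""
    return (35.0 - score) / 10.0


def z_tpr(score):
    """Normal deviate of the true positive rate (Equation 1: Z_TPR = Z_FPR + 3)."""
    return z_fpr(score) + 3.0


def tail_rates(score):
    """FPR and TPR computed directly as the area above the cutoff in each curve."""
    return 1.0 - NONVIOLENT.cdf(score), 1.0 - VIOLENT.cdf(score)


score = 45  # hypothetical cutoff, chosen only for illustration
fpr_direct, tpr_direct = tail_rates(score)
print(z_fpr(score), z_tpr(score))                  # -1.0 2.0
print(round(STANDARD.cdf(z_fpr(score)), 4), round(fpr_direct, 4))
print(round(STANDARD.cdf(z_tpr(score)), 4), round(tpr_direct, 4))
```

Both routes (area above the cutoff, and the standard normal evaluated at the deviate) should agree to within floating-point error, which is all the conversion formulas assert.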
MAXIMIZING EXPECTED UTILITY: BALANCING THE RESULTS OF POSSIBLE OUTCOMES
The SPFV will be ready for use once the emergency room clinicians have chosen some cutoff score at which a person might be committed involuntarily. Imagine three possible cutoffs (40, 50, and 60) for the SPFV. In each case, persons with scores falling above the cutoff would be test positive, and those falling below, test negative. As one moves the cutoff higher (from 40 to 60), the performance of the scale changes: the fraction of actually violent persons correctly identified by the test (the test’s sensitivity) decreases, but the probability of correctly identifying a nonviolent person (the test’s specificity) increases.

FIGURE 1. Distributions of scores on the test for short-term prediction of future violence (SPFV). Striped = nonviolent individuals; hatched = violent individuals.
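The trade-off at the three cutoffs can be made concrete with a short calculation. This is an illustrative sketch under the Figure 1 assumptions (two Gaussian curves with means 35 and 65 and a common standard deviation of 10); the function names are hypothetical and do not come from the article.

```python
from statistics import NormalDist

# Assumed from Figure 1: nonviolent scores ~ N(35, 10), violent scores ~ N(65, 10).
NONVIOLENT = NormalDist(mu=35.0, sigma=10.0)
VIOLENT = NormalDist(mu=65.0, sigma=10.0)


def sensitivity(cutoff):
    """Fraction of violent persons scoring above the cutoff (test positive)."""
    return 1.0 - VIOLENT.cdf(cutoff)


def specificity(cutoff):
    """Fraction of nonviolent persons scoring below the cutoff (test negative)."""
    return NONVIOLENT.cdf(cutoff)


for cutoff in (40, 50, 60):
    print(f"cutoff {cutoff}: sensitivity {sensitivity(cutoff):.3f}, "
          f"specificity {specificity(cutoff):.3f}")
# cutoff 40: sensitivity 0.994, specificity 0.691
# cutoff 50: sensitivity 0.933, specificity 0.933
# cutoff 60: sensitivity 0.691, specificity 0.994
```

Under these assumptions the two curves are mirror images about a score of 50, which is why sensitivity and specificity swap values between cutoffs 40 and 60.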
The task of operationalizing a test requires that one effect a balance between sensitivity and specificity by choosing a cutoff score that reflects the risks and benefits attendant on test outcomes.⁵,⁶ Because even very accurate diagnostic tests are imperfect, diagnostic errors are inevitable; the actual use of tests therefore requires the adoption of a strategy for balancing the consequences of erroneous judgments and the benefits of correct decisions. In the case at hand, this will involve balancing the costs of a false positive prediction (committing a nonviolent person) and a false negative prediction (not committing a violent person) with the benefits yielded by correct (true positive and true negative) predictions.

There is, in theory, a rational way of finding the optimal operating point (OOP) that balances the likelihood and the values of test outcomes. One finds the point along the scale where the overall expected utility from the test is a maximum; that point, by definition, would be the test’s OOP. An optimal decision strategy would be one that maximized the SPFV’s utility; this would be accomplished if persons with scores above the OOP were deemed dangerous (and subject to commitment), and those with scores below, not dangerous. Not all of these judgments would be correct, but they would represent the best balance of errors and correct judgments.

The expected utility can be written as the sum of four terms,⁷ each of which is associated with a possible test outcome. Equation 2 shows this relationship:

$$ U = \mathrm{Pr}(TPR)U_{TP} + \mathrm{Pr}(1 - TPR)U_{FN} + (1 - \mathrm{Pr})(FPR)U_{FP} + (1 - \mathrm{Pr})(1 - FPR)U_{TN} \tag{2} $$
In Equation 2, U is the expected utility, and each term of the equation is the product of the utility associated with a test outcome ($U_{TN}$ = utility of a true negative, $U_{FP}$ = utility of a false positive, etc.) and the probability of that outcome. The probability of a true positive test (in this case, a correct prediction of violence) equals the base rate (or prior probability, Pr) for violence in the population multiplied by TPR; probabilities for other test outcomes can be calculated similarly.

To operationalize a diagnostic test, one must make some explicit estimates of the quantities in Equation 2. The relative utility of outcomes customarily is quantified with a rating scale on which 0 represents the worst possible outcome and 1 represents the best outcome. If a false negative judgment (not detecting an individual who will commit a violent act) is deemed the worst outcome, then $U_{FN}$ is set at 0. Correctly detecting a person who would otherwise commit an act of serious violence and correctly detecting a nonviolent person both allow for preservation of public safety without needless infringement of individual liberty. In this discussion, we shall assume that both $U_{TP}$ and $U_{TN}$ can be assigned values of 1, implying that they are equally desirable. A false positive judgment that involuntarily hospitalizes a nonviolent person needlessly deprives an individual of liberty. We shall assume that this outcome is neither as desirable as a correct judgment nor as undesirable as a false negative; it would therefore be assigned a utility value intermediate between 0 and 1. Methods for assigning utilities along such a scale have been discussed extensively in the decision analysis literature⁸,⁹ and are described briefly in the next section.

To find the SPFV’s OOP, we must first be able to express expected utility as a function of the cutoff, $Z_{FPR}$. Once this is done, the rules of differential calculus can be used to choose the cutoff that maximizes utility: U will reach a maximum at that value of $Z_{FPR}$ for which $dU/dZ_{FPR} = 0$. To carry out this task, we assume that test utilities and Pr are constants. Expected utility then becomes a function of two variables, TPR and FPR. The binormal assumption⁴ summarized in Equation 1 allows us to express TPR and FPR simply as functions of the cutoff, $Z_{FPR}$. If the distance (in units of standard deviation) between the means is A, and the SPFV scores follow Gaussian distributions with equal variances (as in Figure 1), then TPR and FPR may be expressed as follows:

$$ FPR = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{Z_{FPR}} e^{-x^{2}/2}\,dx \tag{3} $$

$$ TPR = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{Z_{FPR}+A} e^{-x^{2}/2}\,dx \tag{4} $$

where $Z_{FPR}$ is the cutoff and x is a “dummy” variable of integration. Substituting Equations 3 and 4 into Equation 2 makes U a function of $Z_{FPR}$. Differentiating U with respect to $Z_{FPR}$ and setting the result equal to zero, we obtain

$$ Z_{FPR} = -\frac{1}{A}\ln\left(\frac{1 - \mathrm{Pr}}{\mathrm{Pr}}\,R\right) - \frac{A}{2} \tag{5} $$

where, for convenience, the quantity $(U_{TN} - U_{FP})/(U_{TP} - U_{FN})$ has been replaced by the single variable R.

Notice that in the above derivation, we assumed that the variances of the two normal distributions were equal. This simplifies the mathematical discussion without compromising fundamental conclusions. We have discussed the more general case (unequal variances) elsewhere.⁶
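The derivation can be sanity-checked by evaluating Equation 2 directly and confirming that the closed form of Equation 5 sits at its maximum. The sketch below is illustrative only: the function names are hypothetical, the grid search is a brute-force check rather than anything the authors describe, and the parameter values (Pr = 0.02, A = 3, utilities of 1, 0, 0.984, and 1) are the ones the article adopts in its worked example.

```python
import math
from statistics import NormalDist

STANDARD = NormalDist()


def expected_utility(z, pr, a, u_tp, u_fn, u_fp, u_tn):
    """Equation 2, with FPR and TPR from Equations 3 and 4 (z is Z_FPR)."""
    fpr = STANDARD.cdf(z)
    tpr = STANDARD.cdf(z + a)
    return (pr * tpr * u_tp
            + pr * (1.0 - tpr) * u_fn
            + (1.0 - pr) * fpr * u_fp
            + (1.0 - pr) * (1.0 - fpr) * u_tn)


def optimal_z_fpr(pr, a, u_tp, u_fn, u_fp, u_tn):
    """Equation 5: closed-form Z_FPR at which dU/dZ_FPR = 0."""
    r = (u_tn - u_fp) / (u_tp - u_fn)
    return -math.log((1.0 - pr) / pr * r) / a - a / 2.0


# Values taken from the article's worked example.
params = dict(pr=0.02, a=3.0, u_tp=1.0, u_fn=0.0, u_fp=0.984, u_tn=1.0)

z_star = optimal_z_fpr(**params)

# Brute-force check: scan Z_FPR from -4 to 1 in steps of 0.001.
grid = [i / 1000.0 for i in range(-4000, 1001)]
z_grid = max(grid, key=lambda z: expected_utility(z, **params))

print(f"closed form: {z_star:.2f}, grid search: {z_grid:.2f}")
print(f"cutoff on the SPFV scale: {35.0 - 10.0 * z_star:.1f}")
```

If the two values printed on the first line agree, the closed form is behaving as a maximum of Equation 2 for these inputs; other inputs can be substituted in `params` to repeat the check.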
When the cutoff is chosen so that $Z_{FPR}$ satisfies Equation 5, the SPFV will be operationalized to allow the ideal balance of correct and incorrect predictions. To strike this balance, one need only know the base rate (Pr), the discriminating capacity of the test (represented by A), and the relative utilities of decision outcomes.

ASSIGNING UTILITIES TO OUTCOMES

In modern decision theory, a utility is established in the context of a set of consistent choices about alternatives.⁹ Suppose there are three states of affairs, L, M, and N, and that the decision maker values them in this order. We can specify $U_M$ associated with state of affairs M when we know that the decision maker is indifferent between two alternatives: 1) having that state of affairs M occur, or 2) engaging in a lottery or “standard gamble,”⁸ in which state of affairs L with known utility $U_L$ has chance C of occurring and state of affairs N with known utility $U_N$ has chance (1 - C) of occurring. In mathematical terms, $U_M = CU_L + (1 - C)U_N$.

From the standpoint of the public at large, a clinician’s predictions about violence can result in either: 1) state of affairs L (no one is harmed), which occurs when clinicians make correct positive or correct negative predictions of violence; or 2) state of affairs N (a person is harmed by a violent attack following what turned out to be a false negative prediction of violence). But considerations of equity require us not to ignore state of affairs M, the harm done to a nonviolent person who is hospitalized as a result of a false positive prediction of violence.

Let us assume that the period of involuntary hospitalization under consideration is 3 days. To find someone’s $U_{FP}$, we would want to know for what value of C he would be indifferent between 1) state of affairs M = being committed for 3 days, and 2) engaging in a lottery in which he had chance C of not being attacked (state of affairs L) and chance 1 - C of being attacked (state of affairs N). State L occurs when the SPFV renders a true negative or a true positive decision, and has a utility of 1; state N occurs when the SPFV renders a false negative decision, and has a utility of 0; state M is expected to have a utility intermediate between 1 and 0. By definition, $U_M = CU_L + (1 - C)U_N$, so $U_M = U_{FP} = C$.

Obviously, finding C for an individual or a population is a matter for empirical determination. We think that most persons would give C a value closer to 1 than to 0, but studies show that people have difficulty giving a precise value to small differences in probability and have trouble giving consistent answers when the gambles involve very serious health outcomes.¹⁰,¹¹ Moreover, individuals no doubt differ markedly in their feelings about undergoing involuntary hospitalization and about being the victim of a violent attack. There is no absolute or “right” value for C, because C (and for that matter, all utilities) merely reflects persons’ preferences. To reflect intra- or inter-individual differences, values of C and the utility quotient R, whether for an individual or for a group of persons, can be described as extending over a range of values.

Several authors have described approaches and strategies for evaluating preferences to yield the numerical quantities needed to specify R.¹²⁻¹⁴ We explain here a method that, although oversimplified, will allow readers to make their own estimates of the utility quotient needed to operationalize the SPFV. Readers, please ask yourselves: how long would an involuntary hospitalization have to be for you to be indifferent between being committed and being the victim of a violent attack? Would you prefer being attacked to being involuntarily hospitalized for a week? A month? A year? Five years?

Suppose that your answers indicated that you were indifferent between being attacked and being involuntarily hospitalized for 6 months (i.e., you would prefer the attack to a longer hospitalization but prefer a shorter hospitalization to being attacked). We could then say that for you, the utility of an attack ($U_{FN}$) equals the utility of a 6-month hospitalization; both have a utility (by definition) of 0. But we are interested in knowing $U_{FP}$, the utility of a 3-day hospitalization, which is somewhat less than $U_{TP} = U_{TN} = 1$, the utility of not being hospitalized and not being attacked. If we assume that utility is a linear function of time, we can determine the value of C such that you would be indifferent between 1) being hospitalized for 3 days and 2) accepting the result of a lottery in which you had chance C of no hospitalization and chance (1 - C) of being attacked. We can calculate C as follows:

$$ C = 1 - \frac{3\ \text{days}}{6\ \text{months}} = 1 - \frac{3\ \text{days}}{183\ \text{days}} = 1 - 0.016 = 0.984 \tag{6} $$

Recall that C = $U_{FP}$. Suppose that Pr, the base rate of violence in the population with whom the SPFV will be used, is 2%. We now can determine the most appropriate cutoff for the SPFV, using Equation 5:

$$ Z_{FPR} = -\frac{1}{3}\ln\left(\frac{[1 - 0.02]}{0.02} \cdot \frac{[1 - 0.984]}{[1 - 0]}\right) - \frac{3}{2} = -1.42 $$

On the SPFV scale, $Z_{FPR}$ = -1.42 implies a cutoff of 49.2. In Table 1, we show the results of this decision policy in a psychiatric emergency room population of 1,000, 20 (or 2%) of whom actually are violent. Even though the SPFV is a very accurate test, the low base rate of violence coupled with an aversion to the results of releasing an actually violent person leads to a decision strategy that involuntarily hospitalizes 4 nonviolent persons for each actually violent person hospitalized.

TABLE 1. Decision outcomes from SPFV test for short-term prediction of future violence, in populations of 1,000

                          Test Positive            Test Negative
                          (Hospitalized)           (Released)               Total
Pr      R       Cutoff    Violent   Not violent    Violent   Not violent    Violent   Not violent
0.02    0.016   49.2      19        76             1         904            20        980
0.04    0.016   46.8      39        114            1         846            40        960
0.02    0.033   51.6      18        48             2         932            20        980

Note: Pr = prior probability; R = $(U_{TN} - U_{FP})/(U_{TP} - U_{FN})$, where U = expected utility, TN = true negative, FP = false positive, TP = true positive, and FN = false negative; SPFV = test for short-term prediction of future violence. Column headings refer to actual behavior.
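The entries in Table 1 can be approximated from Equation 5 and the binormal model. The following is a rough sketch rather than the authors' own procedure: the function names are hypothetical, the model parameters (nonviolent mean 35, standard deviation 10, separation A = 3) are read from Figure 1, and counts are obtained by rounding expected frequencies to whole persons. Because the published R values are themselves rounded, an individual count can differ by one from the table.

```python
import math
from statistics import NormalDist

STANDARD = NormalDist()
A = 3.0                            # separation of the two distributions, in SDs
NONVIOLENT_MEAN, SD = 35.0, 10.0   # assumed from Figure 1


def optimal_cutoff(pr, r):
    """Equation 5, converted from Z_FPR to the SPFV scale."""
    z = -math.log((1.0 - pr) / pr * r) / A - A / 2.0
    return NONVIOLENT_MEAN - SD * z


def decision_outcomes(pr, r, population=1000):
    """Rounded counts of hospitalized and released persons at the optimal cutoff."""
    cutoff = optimal_cutoff(pr, r)
    z = (NONVIOLENT_MEAN - cutoff) / SD
    violent = round(population * pr)
    not_violent = population - violent
    tp = round(violent * STANDARD.cdf(z + A))   # violent, hospitalized
    fp = round(not_violent * STANDARD.cdf(z))   # not violent, hospitalized
    return {"cutoff": round(cutoff, 1), "TP": tp, "FP": fp,
            "FN": violent - tp, "TN": not_violent - fp}


# (Pr, R) pairs as published in Table 1.
for pr, r in [(0.02, 0.016), (0.04, 0.016), (0.02, 0.033)]:
    print(pr, r, decision_outcomes(pr, r))
```

Running the loop with other (Pr, R) pairs is one way to extend the table to additional scenarios.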
HOW DO ESTIMATION ERRORS AFFECT THIS PROCESS?

Recognizing that our values for Pr and R are only error-prone estimates, we might wish to perform a “sensitivity analysis,” i.e., an investigation of the effect that errors in the estimates of R and Pr might have on the cutoff. One simple way to do this is to substitute different values of Pr and R into Equation 5 and examine the impact on decision outcomes. Table 1 gives two examples using this approach. Notice that fairly large changes in Pr and R translate into cutoff changes of only a few points along the SPFV scale. Yet even these small cutoff changes greatly affect the number of nonviolent persons who are hospitalized.

Clinical literature often advocates using risk-benefit considerations as a strategy for interpreting the results of diagnostic tests and for making clinical decisions. Readers should bear in mind, however, that risk-benefit balancing is a strategy whose application has severe mathematical limitations.

References

1. Virkkunen M, De Jong J, Bartko J, et al: Relationship of psychobiological variables to recidivism in violent offenders and impulsive fire setters: a follow-up study. Arch Gen Psychiatry 1989; 46:600-603
2. Somoza E, Mossman D: Introduction to neuropsychiatric decision making: Binary diagnostic tests. Journal of Neuropsychiatry and Clinical Neurosciences 1990; 2:297-300
3. Somoza E, Soutullo-Esperon L, Mossman D: Evaluation and optimization of diagnostic tests using receiver operating characteristic analysis and information theory. Int J Biomed Comput 1989; 24:153-189
4. Somoza E, Mossman D: ROC curves and the binormal assumption. Journal of Neuropsychiatry and Clinical Neurosciences 1991; 3:436-439
5. Mossman D, Somoza E: Assessing improvement in the dexamethasone suppression test using receiver operating characteristic analysis. Biol Psychiatry 1989; 25:159-173
6. Somoza E, Mossman D: “Biological markers” and psychiatric diagnosis: risk-benefit balancing using ROC analysis. Biol Psychiatry 1991; 29:811-826
7. Patton DD, Woolfenden JM: A utility-based model for comparing the cost-effectiveness of diagnostic studies. Invest Radiol 1989; 24:263-271
8. French S: Decision Theory: An Introduction to the Mathematics of Rationality. Chichester, England, Ellis Horwood, 1988
9. Luce RD, Raiffa H: Games and Decisions: Introduction and Critical Survey. New York, Dover, 1957
10. Hellinger FJ: Expected utility theory and risky choices with health outcomes. Med Care 1989; 27:273-279
11. Thornton JG, Lilford RJ: Prenatal diagnosis of Down’s syndrome: a method for measuring the consistency of women’s decisions. Med Decis Making 1990; 10:288-293
12. Behn RD, Vaupel JW: Quick Analysis for Busy Decision Makers. New York, Basic Books, 1982
13. Hogarth RM: Judgment and Choice. New York, Wiley, 1980
14. McNeil BJ, Weichselbaum R, Pauker SG: Fallacy of the five-year survival in lung cancer. N Engl J Med 1978; 299:1397-1401