Balancing risks and benefits: another approach to optimizing diagnostic tests.

DIAGNOSTIC TESTING IN NEUROPSYCHIATRY

Douglas Mossman, M.D.
Eugene Somoza, M.D., Ph.D.

In our last two articles, we showed that one can quantify the reduction in uncertainty that results from diagnostic testing, an insight that allowed us to describe a method for optimizing the performance of a test by choosing a cutoff that maximizes its information yield. Although minimizing uncertainty is an important feature of diagnostic testing, there are many situations in which diagnostic tests are most appropriately used to balance risks and benefits associated with the various possible courses of action available to a clinician. This article shows how tests can be used to maximize the “expected utility” associated with a clinical decision. (The Journal of Neuropsychiatry and Clinical Neurosciences 1992; 4:331-335)

The article “Balancing Risks and Benefits: Another Approach to Optimizing Diagnostic Tests,” by Drs. Mossman and Somoza, is the eighth and last in a series of articles in The Journal of Neuropsychiatry and Clinical Neurosciences. The purpose of this series has been to inform and to educate the readers of the journal on the methodologies by which diagnostic tests in the realm of neuropsychiatry should be understood and interpreted. In this era of unprecedented innovation and advances in diagnostic testing, it is essential that the clinician be able to choose appropriately from the vast array of tests and to interpret accurately the meaning and implications of such tests. We are confident that those readers who devote the time and effort required to understand and apply the information that has appeared in Drs. Somoza and Mossman’s series of articles will be rewarded by insight and mastery in a realm where habit, convention, and fashion, rather than reason and understanding, most frequently guide test selection and interpretation.

Received May 11, 1992; accepted May 11, 1992. From the Psychiatry Service, Department of Veterans Affairs Medical Center, and the Division of Neuroscience, Department of Psychiatry, University of Cincinnati College of Medicine, Cincinnati, Ohio. Address reprint requests to Dr. Somoza, Psychiatry Service (116A), Department of Veterans Affairs Medical Center, 3200 Vine Street, Cincinnati, OH 45220.
Copyright © 1992 American Psychiatric Press, Inc.

The year is 2014. Neuroscientists have expanded on findings initially published 25 years earlier¹ to develop a highly accurate biological test for the short-term prediction of future violence (the “SPFV”). The SPFV performs a painless, risk-free, and very low cost assay of several neurotransmitters and metabolites and integrates this information into discriminant function scores. These scores rank the likelihood of violence from 0 (lowest) to 100 (highest) and can be used in an emergency room to aid in the assessment of persons who might be subject to civil commitment because of dangerousness to others.

The SPFV has already been subjected to the process by which one generates a mathematical description of a diagnostic test.² An appropriate group of individuals was tested and followed for several days thereafter. When the scores of the nonviolent subjects were compared with the scores of those who acted violently, the nonviolent subjects’ scores were lower than the violent subjects’ (as one might expect). But because the SPFV is not a perfect predictor of future dangerousness, the groups’ scores formed overlapping distributions. These distributions are shown in Figure 1: the nonviolent individuals are represented by the distribution centered at 35, the violent individuals by the distribution centered at 65. Each distribution has a standard deviation of 10.

It turns out that the SPFV results can be characterized using the “binormal assumption” used in receiver operating characteristic (ROC) analysis:⁴ in this case, the violent and nonviolent subjects’ results conform to two Gaussian curves with equal variances, separated by three standard deviations. In other words,

[1]  $Z_{TPR} = Z_{FPR} + 3,$

where $Z_{TPR}$ is the normal deviate of the true positive rate and $Z_{FPR}$ is the normal deviate of the false positive rate.

SPFV scores in Figure 1 are easily converted to values of $Z_{FPR}$ using the formula $Z_{FPR} = (35 - \mathrm{SPFV})/10$, and to values of $Z_{TPR}$ by using the formula $Z_{TPR} = (35 - \mathrm{SPFV})/10 + 3 = Z_{FPR} + 3$.

MAXIMIZING EXPECTED UTILITY: BALANCING THE RESULTS OF POSSIBLE OUTCOMES

The SPFV will be ready for use once the emergency room clinicians have chosen some cutoff score at which a person might be committed involuntarily. Imagine three possible cutoffs (40, 50, and 60) for the SPFV. In each case, persons with scores falling above the cutoff would be test positive, and those falling below, test negative. As one moves the cutoff higher (from 40 to 60), the performance of the scale changes: the fraction of actually violent persons correctly identified by the test (the test’s sensitivity) decreases, but the probability of correctly identifying a nonviolent person (the test’s specificity) increases.
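The trade-off described above can be illustrated numerically. The following is a minimal sketch, assuming only what the text states: nonviolent scores follow a Gaussian centered at 35, violent scores a Gaussian centered at 65, and both have a standard deviation of 10. The function names are hypothetical and chosen for this illustration; the violent-group conversion (65 - SPFV)/10 is the same quantity as (35 - SPFV)/10 + 3.

```python
from statistics import NormalDist

# Assumed from the text: nonviolent ~ N(35, 10), violent ~ N(65, 10).
NONVIOLENT = NormalDist(mu=35, sigma=10)
VIOLENT = NormalDist(mu=65, sigma=10)


def operating_point(cutoff):
    """Return (sensitivity, specificity) when scores above `cutoff` are test positive."""
    sensitivity = 1.0 - VIOLENT.cdf(cutoff)  # true positive rate
    specificity = NONVIOLENT.cdf(cutoff)     # 1 - false positive rate
    return sensitivity, specificity


def normal_deviates(cutoff):
    """Return (Z_FPR, Z_TPR) for an SPFV cutoff, using the score conversions."""
    z_fpr = (35 - cutoff) / 10
    z_tpr = (65 - cutoff) / 10  # equals z_fpr + 3
    return z_fpr, z_tpr


for cutoff in (40, 50, 60):
    sens, spec = operating_point(cutoff)
    z_fpr, z_tpr = normal_deviates(cutoff)
    print(f"cutoff={cutoff}: sensitivity={sens:.3f}, specificity={spec:.3f}, "
          f"Z_FPR={z_fpr:+.1f}, Z_TPR={z_tpr:+.1f}")
```

Under these assumptions, raising the cutoff from 40 to 60 lowers sensitivity and raises specificity, while the difference between the two normal deviates stays fixed at 3.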

FIGURE 1. Distributions of scores on the test for short-term prediction of future violence (SPFV). Striped = nonviolent individuals; hatched = violent individuals. (Horizontal axis: SPFV scores, 0 to 100.)

The task of operationalizing a test requires that one effect a balance between sensitivity and specificity by choosing a cutoff score that reflects the risks and benefits attendant on test outcomes.⁵,⁶ Because even very accurate diagnostic tests are imperfect, diagnostic errors are inevitable; the actual use of tests therefore requires the adoption of a strategy for balancing the consequences of erroneous judgments and the benefits of correct decisions. In the case at hand, this will involve balancing the costs of a false positive prediction (committing a nonviolent person) and a false negative prediction (not committing a violent person) with the benefits yielded by correct (true positive and true negative) predictions.

There is, in theory, a rational way of finding the optimal operating point (OOP) that balances the likelihood and the values of test outcomes. One finds the point along the scale where the overall expected utility from the test is a maximum; that point, by definition, would be the test’s OOP. An optimal decision strategy would be one that maximized the SPFV’s utility; this would be accomplished if persons with scores above the OOP were deemed dangerous (and subject to commitment), and those with scores below, not dangerous. Not all of these judgments would be correct, but they would represent the best balance of errors and correct judgments.

The expected utility can be written as the sum of four terms,⁷ each of which is associated with a possible test outcome. Equation 2 shows this relationship.

[2]  $U = Pr(TPR)U_{TP} + Pr(1 - TPR)U_{FN} + (1 - Pr)(FPR)U_{FP} + (1 - Pr)(1 - FPR)U_{TN}.$

In Equation 2, U is the expected utility, and each term of the equation is the product of the utility associated with test outcomes ($U_{TN}$ = utility of a true negative, $U_{FP}$ = utility of a false positive, etc.) and the probability of those outcomes. The probability of a true positive test (in this case, a correct prediction of violence) equals the base rate (or prior probability, Pr) for violence in the population multiplied by TPR; probabilities for other test outcomes can be calculated similarly.

To operationalize a diagnostic test, one must make some explicit estimates of the quantities in Equation 2. The relative utility of outcomes customarily is quantified with a rating scale on which 0 represents the worst possible outcome and 1 represents the best outcome. If a false negative judgment (not detecting an individual who will commit a violent act) is deemed the worst outcome, then $U_{FN}$ is set at 0. Correctly detecting a person who would otherwise commit an act of serious violence and correctly detecting a nonviolent person both allow for preservation of public safety without needless infringement of individual liberty. In this discussion, we shall assume that both $U_{TP}$ and $U_{TN}$ can be assigned values of 1, implying that they are equally desirable. A false positive judgment that involuntarily hospitalizes a nonviolent person needlessly deprives an individual of liberty. We shall assume that this outcome is neither as desirable as a correct judgment nor as undesirable as a false negative; it would therefore be assigned a utility value intermediate between 0 and 1. Methods for assigning utilities along such a scale have been discussed extensively in the decision analysis literature⁸,⁹ and are described briefly in the next section.

To find the SPFV’s OOP, we must first be able to express expected utility as a function of the cutoff, $Z_{FPR}$. Once this is done, the rules of differential calculus can be used to choose the cutoff that maximizes utility: U will reach a maximum at that value of $Z_{FPR}$ for which $dU/dZ_{FPR} = 0$.

To carry out this task, we assume that test utilities and Pr are constants. Expected utility then becomes a function of two variables, TPR and FPR. The binormal assumption⁴ summarized in Equation 1 allows us to express TPR and FPR simply as functions of the cutoff, $Z_{FPR}$. If the distance (in units of standard deviation) between the means is A, and the SPFV scores follow Gaussian distributions with equal variances (as in Figure 1), then TPR and FPR may be expressed as follows:

[3]  $FPR = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{Z_{FPR}} e^{-x^2/2}\,dx;$

[4]  $TPR = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{Z_{FPR} + A} e^{-x^2/2}\,dx,$

where $Z_{FPR}$ is the cutoff and x is a “dummy” variable of integration. Substituting Equations 3 and 4 into Equation 2 makes U a function of $Z_{FPR}$. Differentiating U with respect to $Z_{FPR}$ and setting the result equal to zero, we obtain

[5]  $Z_{FPR} = -\frac{1}{A}\ln\left(\frac{[1 - Pr]}{Pr}\,R\right) - \frac{A}{2},$

where, for convenience, the quantity $(U_{TN} - U_{FP})/(U_{TP} - U_{FN})$ has been replaced by the single variable R. Notice that in the above derivation, we assumed that the variances of the two normal distributions were equal. This simplifies the mathematical discussion without compromising more fundamental conclusions. We have discussed the general case (unequal variances) elsewhere.⁶
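The step from Equation 2 to Equation 5 can be checked numerically. The sketch below evaluates expected utility with TPR and FPR taken from the binormal model and compares the closed-form cutoff with a brute-force grid search. The function names and the input values (base rate, utilities) are assumptions chosen only for this illustration; they are not values fixed by the derivation.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def expected_utility(z_fpr, pr, a, u_tp, u_fn, u_fp, u_tn):
    """Equation 2, with FPR and TPR from the binormal model (Equations 3 and 4)."""
    fpr = STD_NORMAL.cdf(z_fpr)
    tpr = STD_NORMAL.cdf(z_fpr + a)
    return (pr * tpr * u_tp + pr * (1 - tpr) * u_fn
            + (1 - pr) * fpr * u_fp + (1 - pr) * (1 - fpr) * u_tn)


def optimal_z_fpr(pr, a, u_tp, u_fn, u_fp, u_tn):
    """Equation 5: the cutoff at which dU/dZ_FPR = 0."""
    r = (u_tn - u_fp) / (u_tp - u_fn)
    return -math.log((1 - pr) / pr * r) / a - a / 2


# Illustrative inputs only (assumptions for this sketch).
params = dict(pr=0.10, a=3.0, u_tp=1.0, u_fn=0.0, u_fp=0.9, u_tn=1.0)

z_closed = optimal_z_fpr(**params)

# Brute-force check: scan cutoffs from -4 to +4 in steps of 0.001
# and keep the one with the highest expected utility.
grid = [i / 1000 for i in range(-4000, 4001)]
z_grid = max(grid, key=lambda z: expected_utility(z, **params))

print(f"closed form: {z_closed:.3f}, grid search: {z_grid:.3f}")
```

If the two printed values agree to within the grid spacing, the closed-form expression is locating the maximum of U rather than some other stationary point.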

When the cutoff is chosen so that $Z_{FPR}$ satisfies Equation 5, the SPFV will be operationalized to allow the ideal balance of correct and incorrect predictions. To strike this balance, one need only know the base rate (Pr), the discriminating capacity of the test (represented by A), and the relative utilities of decision outcomes.

ASSIGNING UTILITIES TO OUTCOMES

In modern decision theory, a utility is established in the context of a set of consistent choices about alternatives.⁹ Suppose there are three states of affairs, L, M, and N, and that the decision maker values them in this order. We can specify $U_M$ associated with state of affairs M when we know that the decision maker is indifferent between two alternatives: 1) having that state of affairs M occur, or 2) engaging in a lottery or “standard gamble,”⁸ in which state of affairs L with known utility $U_L$ has chance C of occurring and state of affairs N with known utility $U_N$ has chance (1 - C) of occurring. In mathematical terms, $U_M = C\,U_L + (1 - C)\,U_N$.

From the standpoint of the public at large, a clinician’s predictions about violence can result in either: 1) state of affairs L (no one is harmed), which occurs when clinicians make correct positive or correct negative predictions of violence; or 2) state of affairs N (a person is harmed by a violent attack following what turned out to be a false negative prediction of violence). But considerations of equity require us not to ignore state of affairs M, the harm done to a nonviolent person who is hospitalized as a result of a false positive prediction of violence.

Let us assume that the period of involuntary hospitalization under consideration is 3 days. To find someone’s $U_{FP}$, we would want to know for what value of C he would be indifferent between 1) state of affairs M = being committed for 3 days, and 2) engaging in a lottery in which he had chance C of not being attacked (state of affairs L) and chance 1 - C of being attacked (state of affairs N). State L occurs when the SPFV renders a true negative or a true positive decision, and has a utility of 1; state N occurs when the SPFV renders a false negative decision, and has a utility of 0; state M is expected to have a utility intermediate between 1 and 0. By definition, $U_M = C\,U_L + (1 - C)\,U_N$, so $U_M = U_{FP} = C$.

Obviously, finding C for an individual or a population is a matter for empirical determination. We think that persons would give C a value closer to 1 than to 0, but studies show that people have difficulty giving a precise value to small differences in probability and have trouble giving consistent answers when gamble outcomes have very serious health outcomes.¹⁰,¹¹ Moreover, individuals no doubt differ markedly in their feelings about undergoing involuntary hospitalization and about being the victim of a violent attack. There is no absolute or “right” value for C, because C (and for that matter, all utilities) merely reflects persons’ preferences. To reflect intra- or inter-individual differences, values of C and the utility quotient R, whether for an individual or for a group of persons, can be described as extending over a range of values.

Several authors have described approaches and strategies for evaluating preferences to yield the numerical quantities needed to specify R.¹²⁻¹⁴ We explain here a method that, although oversimplified, will allow readers to make their own estimates of the utility quotient needed to operationalize the SPFV. Readers, please ask yourselves: how long would an involuntary hospitalization have to be for you to be indifferent between being involuntarily committed and being the victim of a violent attack? Would you prefer being attacked to being involuntarily hospitalized for a week? A month? A year? Five years?

Suppose that your answers indicated that you were indifferent between being attacked and being hospitalized for 6 months (i.e., you would prefer the attack to a longer hospitalization but prefer a shorter hospitalization to being attacked). We could then say that for you, the utility of an attack ($U_{FN}$) equals the utility of a 6-month hospitalization; both have a utility (by definition) of 0. But we are interested in knowing $U_{FP}$, the utility of a 3-day hospitalization, which is somewhat less than $U_{TP} = U_{TN} = 1$, the utility of not being hospitalized and not being attacked. If we assume that utility is a linear function of time, we can determine the value of C such that you would be indifferent between 1) being hospitalized for 3 days and 2) accepting the result of a lottery in which you had chance C of no hospitalization and chance (1 - C) of being attacked. We can calculate C as follows:

[6]  $C = 1 - \frac{3\ \text{days}}{6\ \text{months}} = 1 - \frac{3\ \text{days}}{183\ \text{days}} = 1 - 0.016 = 0.984.$

Recall that $C = U_{FP}$. Suppose that Pr, the base rate of violence in the population with whom the SPFV will be used, is 2%. We now can determine the most appropriate cutoff for the SPFV, using Equation 5:

$Z_{FPR} = -\frac{1}{3}\ln\left(\frac{[1 - 0.02]}{0.02}\cdot\frac{[1 - 0.984]}{[1 - 0]}\right) - \frac{3}{2} = -1.42.$

On the SPFV scale, $Z_{FPR} = -1.42$ implies a cutoff of 49.2.

TABLE 1. Decision outcomes from test for short-term prediction of future violence, in populations of 1,000

                                              SPFV Results and Decisions
        SPFV                           Test Positive     Test Negative
Pr    R       Cutoff  Actual Behavior  (Hospitalized)    (Released)       Total
0.02  0.016   49.2    Violent                19                1             20
                      Not violent            76              904            980
0.04  0.016   46.8    Violent                39                1             40
                      Not violent           114              846            960
0.02  0.033   51.6    Violent                18                2             20
                      Not violent            48              932            980

Note: Pr = prior probability; R = $(U_{TN} - U_{FP})/(U_{TP} - U_{FN})$, where U = expected utility, TN = true negative, FP = false positive, TP = true positive, and FN = false negative; SPFV = test for short-term prediction of future violence.
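The worked example and the first row of Table 1 can be retraced in a few lines. This is a minimal sketch under the text's assumptions (utility linear in time, A = 3, the conversion SPFV = 35 - 10 Z_FPR, and R rounded to three decimals as in the text); the function names are hypothetical, and the expected counts are simply rounded to the nearest whole person.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()
A = 3.0  # separation between the two means, in standard deviations


def utility_of_false_positive(stay_days, indifference_days):
    """Equation 6: C = 1 - stay / indifference, assuming utility is linear in time."""
    return 1 - stay_days / indifference_days


def optimal_cutoff(pr, r):
    """Equation 5, returned as (Z_FPR, SPFV cutoff) with SPFV = 35 - 10 * Z_FPR."""
    z_fpr = -math.log((1 - pr) / pr * r) / A - A / 2
    return z_fpr, 35 - 10 * z_fpr


def decision_counts(pr, r, population=1000):
    """Expected outcome counts at the optimal cutoff, rounded to whole persons."""
    z_fpr, _ = optimal_cutoff(pr, r)
    violent = round(pr * population)
    nonviolent = population - violent
    tp = round(violent * STD_NORMAL.cdf(z_fpr + A))  # violent, hospitalized
    fp = round(nonviolent * STD_NORMAL.cdf(z_fpr))   # nonviolent, hospitalized
    return {"TP": tp, "FN": violent - tp, "FP": fp, "TN": nonviolent - fp}


u_fp = utility_of_false_positive(3, 183)      # about 0.984
r = round((1 - u_fp) / (1 - 0), 3)            # U_TN = U_TP = 1, U_FN = 0; rounded as in the text
z_fpr, cutoff = optimal_cutoff(0.02, r)
print(f"C = {u_fp:.3f}, R = {r:.3f}, Z_FPR = {z_fpr:.2f}, cutoff = {cutoff:.1f}")
print(decision_counts(0.02, r))
```

Calling `decision_counts` with other values of Pr and R gives the remaining rows; counts that fall very close to a rounding boundary may differ by one from a table built from rounded normal-table values.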

In Table 1, we show the results of this decision policy in a psychiatric emergency room population of 1,000, 20 (or 2%) of whom actually are violent. Even though the SPFV is a very accurate test, the low base rate of violence coupled with an aversion to the results of releasing an actually violent person leads to a decision strategy that involuntarily hospitalizes 4 nonviolent persons for each actually violent person hospitalized.

HOW DO ESTIMATION ERRORS AFFECT THIS PROCESS?

Recognizing that our values for Pr and R are only error-prone estimates, we might wish to perform a “sensitivity analysis,” i.e., an investigation of the effect that errors in the estimates of R and Pr might have on the cutoff. One simple way to do this is to substitute different values of Pr and R into Equation 5 and examine the impact on decision outcomes. Table 1 gives two examples using this approach. Notice that fairly large changes in Pr and R translate into cutoff changes of only a few points along the SPFV scale. Yet even these small cutoff changes greatly affect the number of nonviolent persons who are hospitalized.

Clinical literature often advocates using risk-benefit considerations as a strategy for interpreting the results of diagnostic tests and for making clinical decisions. Readers should bear in mind, however, that risk-benefit balancing is a mathematical strategy whose application has severe limitations.

References

1. Virkkunen M, De Jong J, Bartko J, et al: Relationship of psychobiological variables to recidivism in violent offenders and impulsive fire setters: a follow-up study. Arch Gen Psychiatry 1989; 46:600-603
2. Somoza E, Mossman D: Introduction to neuropsychiatric decision making: binary diagnostic tests. Journal of Neuropsychiatry and Clinical Neurosciences 1990; 2:297-300
3. Somoza E, Soutullo-Esperon L, Mossman D: Evaluation and optimization of diagnostic tests using receiver operating characteristic analysis and information theory. Int J Biomed Comput 1989; 24:153-189
4. Somoza E, Mossman D: ROC curves and the binormal assumption. Journal of Neuropsychiatry and Clinical Neurosciences 1991; 3:436-439
5. Mossman D, Somoza E: Assessing improvement in the dexamethasone suppression test using receiver operating characteristic analysis. Biol Psychiatry 1989; 25:159-173
6. Somoza E, Mossman D: “Biological markers” and psychiatric diagnosis: risk-benefit balancing using ROC analysis. Biol Psychiatry 1991; 29:811-826
7. Patton DD, Woolfenden JM: A utility-based model for comparing the cost-effectiveness of diagnostic studies. Invest Radiol 1989; 24:263-271
8. French S: Decision Theory: An Introduction to the Mathematics of Rationality. Chichester, England, Ellis Horwood, 1988
9. Luce RD, Raiffa H: Games and Decisions: Introduction and Critical Survey. New York, Dover, 1957
10. Hellinger FJ: Expected utility theory and risky choices with health outcomes. Med Care 1989; 27:273-279
11. Thornton JG, Lilford RJ: Prenatal diagnosis of Down’s syndrome: a method for measuring the consistency of women’s decisions. Med Decis Making 1990; 10:288-293
12. Behn RD, Vaupel JW: Quick Analysis for Busy Decision Makers. New York, Basic Books, 1982
13. Hogarth RM: Judgment and Choice. New York, Wiley, 1980
14. McNeil BJ, Weichselbaum R, Pauker SG: Fallacy of five-year survival in lung cancer. N Engl J Med 1978; 299:1397-1401
