Is earlier diagnosis really better? The misleading effects of lead time and length biases.


Perspective

William C. Black1 and Alexander Ling

Advances in diagnostic testing that increase the detectability of disease can distort our perception of disease and its response to medical intervention through the mechanisms of lead time and length biases [1-3]. Lead time bias pertains to comparisons that do not account for the progression of disease over time, while length bias pertains to comparisons that do not account for the variability of disease progression. Because of these biases, we may erroneously attribute clinical benefit to a new test that permits earlier diagnosis or a new treatment that coincidentally accompanies earlier diagnosis when these interventions provide no benefit to patients or actually harm them. These biases are especially relevant to preclinical disease, detected either incidentally or by deliberate screening. Unlike other forms of bias in the assessment of radiologic technology, lead time and length biases are not readily corrected by feedback in the clinical or research environment. Consequently, the initial appearance of benefit can initiate a vicious cycle of increasingly aggressive testing and treatment that strays far from any scientific basis.

Lead time and length biases have received little attention in the radiologic literature outside the narrow context of mass screening (e.g., mammography). However, these biases also pertain to the numerous daily decisions we must make regarding the performance and interpretation of radiologic tests in individual patients. This article explains how lead time and length biases are created and propagated by advances in radiologic technology and how we can unburden ourselves of these biases by viewing disease from its proper, dynamic perspective.

Received January 19, 1990; accepted after revision March 13, 1990.
1 Both authors: Department of Diagnostic Radiology, Warren G. Magnuson Clinical Center, National Institutes of Health, Bethesda, MD 20892; and Department of Radiology, Georgetown University Medical Center, Washington, DC 20007. Address reprint requests to W. C. Black, Diagnostic Radiology Department, Bldg. 10, Rm. 1C660, National Institutes of Health, Bethesda, MD 20892.
AJR 155:625-630, September 1990 0361-803X/90/1553-0625 © American Roentgen Ray Society

Theory

To understand lead time and length biases, we must first recognize that diseases are dynamic processes, not static entities. Consider the course of a hypothetical disease process, disease X, where X refers to a particular histologic appearance, the conventional gold standard (Fig. 1). Disease X steadily increases in size or anatomic extent with time. (For simplicity, we model progression in time as a linear process, although lead time and length biases also pertain to nonlinear rates of progression.) Disease X crosses the clinical threshold at time C when it is sufficient in size to cause signs or symptoms in the patient. The course disease X follows from time zero to C is often referred to as the total preclinical phase [2]. Disease X reaches the death threshold at time D. Therefore, when detected clinically, disease X is associated with a survival of D minus C years.

Lead Time Bias

Suppose a new test can first detect disease X at some time T during the total preclinical phase (Fig. 1). The time interval between T and C is the detectable preclinical phase (DPCP) of disease X [2]. If testing detects disease X during DPCP but provides no actual benefit (i.e., treatment begun during DPCP is no more effective than treatment begun at C), survival measured from the time of diagnosis is increased by the lead time of testing, and death occurs at the same age. Lead time is a function of the frequency of testing and the duration of DPCP. Continuous testing begun before T would provide a lead time equal to DPCP. A random one-shot screen would provide an average lead time equal to one-half DPCP.

Fig. 1. Lead time bias. Disease X enlarges and crosses the test, clinical, and death thresholds at times T, C, and D, respectively. Random testing of asymptomatic patients detects disease X during detectable preclinical phase (DPCP). Survival measured from time of diagnosis (DX) is prolonged in tested patients by lead time.

Length Bias

Suppose disease X can progress rapidly, as X0, or slowly, as X1 (Fig. 2). Disease X0 causes death before the patient would die from other causes, while disease X1 remains clinically occult throughout the patient's life span. Without testing, only disease X0 is recognized clinically. Therefore, clinically detected disease X is equated with disease X0 and associated with a 0% cure rate. However, if the new test is used to screen asymptomatic patients or diagnose diseases unrelated to disease X, then testing detects diseases X0 and X1 in proportion to the length of their respective DPCPs (assuming equal incidences of diseases X0 and X1). Therefore, in an unadjusted comparison of patients with disease X detected clinically vs by the test, the test appears to increase the cure rate percentage for disease X from 0 to DPCP1 divided by (DPCP0 plus DPCP1).

Fig. 2. Length bias. Disease X progresses rapidly (X0) in some patients and slowly (X1) in others. Random testing of asymptomatic patients detects diseases X0 and X1 in proportion to detectable preclinical phases (DPCPs), DPCP0 and DPCP1.

Example

Suppose disease X can progress at three different rates beginning at age 45 (Fig. 3). Disease X0 progresses twice as rapidly as disease X1, which progresses twice as rapidly as disease X2. Patients with disease X0 die from their disease at age 65, whereas patients with diseases X1 and X2 die from other causes at age 75, never having been affected by their milder forms of disease X. Before testing is available, disease X is associated with a 0% cure rate and survival from time of diagnosis of 5 years.

Suppose test 1, which can detect disease X in its preclinical state above a certain size threshold, becomes available. Patients with disease X incidentally detected by test 1 would have disease X0 and X1 in the proportion of 1:2 (DPCP0:DPCP1). Even if no effective treatment for disease X0 existed, test 1 would appear to improve the cure rate of disease X to 67%. In addition, test 1 would appear to prolong survival because of the average lead time of 2.5 years for patients with disease X0.

Suppose the improved prognosis for disease X is erroneously attributed to early detection by test 1. Radiologists might be encouraged to lower the positivity criterion or somehow refine the testing technique such that the detection threshold would be further lowered to that of test 2 (Fig. 3). A deliberate screening of the asymptomatic population would detect diseases X0, X1, and X2 in the proportion of 1:2:1 (DPCP0:DPCP1:DPCP2). Therefore, test 2 would appear to further improve the cure rate of disease X to 75%. In addition, test 2 would appear to further prolong survival because of the average lead time of 5.0 years for disease X0 patients. If enough milder forms of disease X existed, continual improvements in testing could permit the apparent cure rate to approach 100%, even if treatment remained totally ineffective.

Fig. 3. Example. Disease X progresses in size or anatomic extent at three different rates as X0, X1, and X2. Regardless of treatment, disease X0 causes death at age 65. Diseases X1 and X2 have no clinical effect on patients who die at age 75 from other causes. Test 1 detects diseases X0 and X1 in the proportion of 1:2 (detectable preclinical phase 0 [DPCP0] and DPCP1 equal 5 and 10 years, respectively). Test 2 detects diseases X0, X1, and X2 in the proportion of 1:2:1 (DPCP0, DPCP1, and DPCP2 equal 10, 20, and 10 years, respectively).
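The arithmetic of this example can be reproduced with a short Python sketch. It is an illustration only, not part of the original analysis. The normalization of the clinical threshold to size 1, the two test thresholds (2/3 and 1/3 of that size, chosen so that disease X0 is first detectable at ages 55 and 50), and all function and variable names are assumptions introduced here. Equal incidence of the three forms and a random one-shot test are assumed, as in the text.

```python
from fractions import Fraction

ONSET_AGE = 45               # age at which every form of disease X begins
OTHER_DEATH_AGE = 75         # age at death from other causes
CLINICAL_SIZE = Fraction(1)  # assumed normalization of the clinical threshold

# Linear growth rates (size units per year). X0 reaches the clinical
# threshold at age 60 (death at 65 minus 5 years of survival after diagnosis).
RATES = {
    "X0": Fraction(1, 15),
    "X1": Fraction(1, 30),   # half as fast as X0
    "X2": Fraction(1, 60),   # half as fast as X1
}
LETHAL = {"X0"}

TEST_1_SIZE = Fraction(2, 3)  # assumed threshold: X0 detectable from age 55
TEST_2_SIZE = Fraction(1, 3)  # assumed threshold: X0 detectable from age 50


def dpcp(rate, test_size):
    """Detectable preclinical phase (years) for one form of disease X."""
    detect_age = ONSET_AGE + test_size / rate
    # The phase ends at clinical diagnosis or at death from other causes.
    end_age = min(ONSET_AGE + CLINICAL_SIZE / rate, OTHER_DEATH_AGE)
    return max(Fraction(0), end_age - detect_age)


def summarize(test_size):
    """Return (DPCP per form, apparent cure rate, mean lead time for X0)."""
    phases = {name: dpcp(rate, test_size) for name, rate in RATES.items()}
    total = sum(phases.values())
    # Length bias: with equal incidence, a random test detects each form in
    # proportion to its DPCP, so the nonlethal forms are counted as "cures".
    apparent_cure = sum(p for n, p in phases.items() if n not in LETHAL) / total
    # Lead time bias: a random one-shot test gains half the DPCP on average.
    mean_lead_time = phases["X0"] / 2
    return phases, apparent_cure, mean_lead_time


if __name__ == "__main__":
    for label, size in (("test 1", TEST_1_SIZE), ("test 2", TEST_2_SIZE)):
        phases, cure, lead = summarize(size)
        print(label,
              {name: float(p) for name, p in phases.items()},
              f"apparent cure rate {float(cure):.0%},",
              f"mean lead time {float(lead)} years")
```

Under these assumptions the sketch gives DPCPs of 5 and 10 years for test 1 and of 10, 20, and 10 years for test 2, with apparent cure rates of 67% and 75% and mean lead times of 2.5 and 5.0 years, even though treatment has no effect anywhere in the model.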

Age, Detection Threshold, and Rate of Progression

The expected rate of progression of any disease process is related to its size and the patient's age at detection (Fig. 4). More precisely, the minimum rate of progression (averaged over the duration of the process) is equal to the detection size divided by the maximum duration of the process before detection (i.e., patient's age at detection minus the earliest age of onset). For example, consider the effects of detection size and patient age on the growth rate of a tumor with a particular histologic appearance whose earliest onset is 30 years. If this tumor was detected as a 5-cm mass in a symptomatic 35-year-old patient, then its minimum growth rate would be 10 mm/year. However, if this tumor was detected by CT as a 2-cm asymptomatic mass, then its minimum growth rate would be 4 mm/year. Finally, if the patient was 50 years old when the tumor was detected by CT, then the minimum growth rate would be 1 mm/year. Because of these constraints, the expected rate of progression decreases as the detection threshold is lowered and as the patient's age increases.

Apparent, Real, and Spurious Effects

From the perspective of the referring clinician and radiologist, each new technique or modification of interpretation criteria that lowers the detection threshold is associated with an apparent effect on the outcome of diseased patients equal to the real effect plus the spurious improvement produced by lead time and length biases:

Apparent effect = real effect + spurious improvement.

Therefore, even when the new testing method is actually detrimental to the tested population on the whole (which includes patients who are treated over a longer period than they would have been previously as well as patients who are treated for disease that would not have been previously diagnosed), the new method appears to be beneficial if the spurious improvement is greater than the magnitude of the detrimental change.

When a new treatment coincidentally accompanies a new test, the real effect of both is equal to the real effect of the new treatment, plus the real effect of earlier diagnosis afforded by the new test. When the new test is not taken into account, the apparent effect of the treatment change is greater than its real effect by the sum of the spurious improvement plus the real effect of earlier diagnosis. When this latter component is positive, that is, when earlier but not necessarily new treatment is more effective, the advantage of the new treatment is overestimated by two components. Consequently, coincidental advances in imaging can strongly bias the selection of new treatments.
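The minimum rate of progression described above under age and detection threshold (detection size divided by the maximum duration of the process before detection) is simple enough to check numerically. The Python sketch below is illustrative only. The function name and its parameters are hypothetical, and it merely restates the tumor example with an earliest onset at 30 years.

```python
def minimum_growth_rate(size_mm, age_at_detection, earliest_onset_age):
    """Minimum average growth rate (mm/year) consistent with a detection.

    The process cannot have started before `earliest_onset_age`, so the
    longest it can have been growing is the age at detection minus that age.
    """
    max_duration = age_at_detection - earliest_onset_age
    if max_duration <= 0:
        raise ValueError("detection must occur after the earliest age of onset")
    return size_mm / max_duration


EARLIEST_ONSET = 30  # years, as in the tumor example

# (description, size in mm, patient age at detection)
CASES = [
    ("5-cm symptomatic mass, age 35", 50, 35),
    ("2-cm asymptomatic mass on CT, age 35", 20, 35),
    ("2-cm asymptomatic mass on CT, age 50", 20, 50),
]

if __name__ == "__main__":
    for description, size_mm, age in CASES:
        rate = minimum_growth_rate(size_mm, age, EARLIEST_ONSET)
        print(f"{description}: at least {rate:g} mm/year")
```

The three cases give 10, 4, and 1 mm/year, the values quoted in the text: a lower detection threshold and an older patient each reduce the minimum rate of progression.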

Fig. 4. Relationship between size (cm) of tumor and patient age at detection and minimum rate of progression.

Clinical Evidence

For lead time and length biases to be operational in the clinical setting, there must be a reservoir of clinically occult disease. Furthermore, this reservoir should appear to expand with advances in imaging that lower the detection threshold. What is the evidence that this expanding reservoir exists?

Breast Cancer

Before the use of mammography, breast cancer was usually diagnosed by palpation [4]. The vast majority of these palpable neoplasms were invasive carcinomas, whose natural history is almost always fatal. Ductal carcinoma in situ (DCIS) constituted only 1-5% of breast cancers (Table 1) [4]. However, the proportion of breast cancers that are small and nonpalpable has increased with the refinements and increasing application of screening mammography. DCIS constituted 8% of mammographically detected breast cancers reported in 1981 [5] and 25-30% reported in 1988 [6]. Furthermore, DCIS constituted 40-50% of breast cancers detected as microcalcifications without an associated mass in the later report.

TABLE 1: Detection Threshold and Percentage of Breast Cancers Attributed to Carcinoma in Situ

Threshold                        %
Palpable [4]                     1-5
Mammogram, 1981 [5]              8
Mammogram, 1988 [6]              25-30
Microcalcifications only [6]     40-50

The natural history of DCIS is not precisely known because, until recently, it was assumed to inevitably progress to invasive cancer, and nearly all patients had mastectomies. However, more recent evidence suggests that only 25-50% of untreated patients with DCIS develop invasive cancer [7-9]. Furthermore, because DCIS is often multifocal and bilateral [7-9], an even smaller percentage of individual DCIS lesions progress to invasive cancer.

A strong relationship between growth rate and patient age at diagnosis is also seen with breast cancer. Kusama et al. [10] observed that rapidly growing tumors, that is, those with doubling times of less than 2 months, constituted 67% of tumors in patients under 30 years and only 30% of tumors in patients over 70 years (Table 2). On the other hand, slowly growing tumors, that is, those with doubling times of greater than 8 months, were not seen in patients under 30 years, but constituted 30% of the tumors in patients over 70 years.

TABLE 2: Age of Patient and Percentage of Breast Tumors with Short and Long Doubling Times

Age of patient     Doubling Time of Primary Tumor
(years)            <2 months     >8 months
0-29               67            0
30-49              49            19
50-69              34            21
70+                30            30

Lead time and length biases may largely explain the discrepancy between the improving relative 5-year survival rate for breast cancer (from 63% to 75% over the period 1960-1984) [11] and the worsening age-adjusted mortality rate over the same period [12]. Moskowitz [13] has argued that younger patients should be screened by mammography more frequently than older patients because the former have more rapidly growing tumors and longer life expectancies. Unfortunately, mammography is less accurate in younger patients.

Renal Cancer

There is considerable confusion over the distinction between renal cell carcinoma and renal adenoma. In fact, the two are generally regarded as indistinguishable by any "gross, histologic, histochemical, immunologic, or ultrastructural features . . . if the information on tumor size is not available" [14]. The prevalence of these tumors in patients without a premortem suspicion of renal cell carcinoma depends on how closely the kidneys are examined after death. At routine autopsy, where the threshold size by macroscopic inspection is about 2 cm, the prevalence of these renal tumors is about 1-2% [15, 16]. Two-thirds of the afflicted patients appear to have died "with rather than from" [16] their tumors. When kidneys are examined by serial 2- to 3-mm sectioning, the prevalence of these tumors exceeds 22% [17].

This contrasts sharply with recently reported age-specific prevalence rates of kidney cancer, which are only 0.03-0.06% in the general population and 0.3% in males over 70 years of age [18], the most commonly afflicted group. Furthermore, less than 0.5% of all deaths in the United States in 1986 were attributable to kidney cancer [19] (8987 deaths from cancer of the kidney and renal pelvis divided by 2,105,361 total deaths). Given the approximate 50% cure rate [11], only about 1% of the population develop potentially lethal renal cell carcinoma. However, if the detection threshold of radiologic imaging were to fall to that of serial sectioning and if histology were to remain the gold standard for diagnosis, then 22% of the screened population could be said to have pathologically proved renal cell carcinoma!

Prostatic Cancer

In a recent study of male patients undergoing radical cystectomy for bladder cancer, 61% of those 60-74 years old with a normal rectal examination had histologic evidence of prostatic cancer [19]. Eighteen percent of patients in this age group had high-grade (Gleason score greater than or equal to 6) prostatic cancer, making them candidates for definitive treatment (radical prostatectomy). However, only 3.2% of all deaths in men over 60 years of age were attributed to prostatic cancer in the United States in 1986 [20] (26,237 deaths from prostatic cancer divided by 811,963 total deaths). In other words, of asymptomatic males between the ages of 60 and 74 years, only about one in six with histologic evidence of high-grade prostatic cancer (one in 20 with any grade) will die from their disease. Furthermore, this favorable prognosis cannot be attributed to recent improvements in diagnosis and treatment because the age-specific mortality rate of prostatic cancer has not changed since 1950 [12].

Overall

Although it is commonly believed that modern imaging and innovative treatments have significantly increased cure rates and prolonged survival times associated with most forms of cancer, the age-adjusted mortality rate (the most reliable index of progress) of cancer overall actually increased 8.7% in the United States from 1962 to 1982 [12].

Nonneoplastic Disease

Other diseases, including infectious and inflammatory conditions, atherosclerosis, and various degenerative processes of aging, also vary in their rates and patterns of progression according to individual host and environmental factors. Consequently, advances in imaging are also exposing a deep reservoir of subclinical nonneoplastic diseases whose natural history is more favorable than traditionally believed. For example, the incidence of small (
