Downloaded from www.ajronline.org by 47.21.15.130 on 11/12/15 from IP address 47.21.15.130. Copyright ARRS. For personal use only; all rights reserved
Perspective
Is Earlier Diagnosis Really Better? The Misleading Effects of Lead Time and Length Biases
William C. Black1 and Alexander Ling
Advances in diagnostic testing that increase the detectability of disease can distort our perception of disease and its response to medical intervention through the mechanisms of lead time and length biases [1-3]. Lead time bias pertains to comparisons that do not account for the progression of disease over time, while length bias pertains to comparisons that do not account for the variability of disease progression. Because of these biases, we may erroneously attribute clinical benefit to a new test that permits earlier diagnosis or a new treatment that coincidentally accompanies earlier diagnosis when these interventions provide no benefit to patients or actually harm them. These biases are especially relevant to preclinical disease, detected either incidentally or by deliberate screening. Unlike other forms of bias in the assessment of radiologic technology, lead time and length biases are not readily corrected by feedback in the clinical or research environment. Consequently, the initial appearance of benefit can initiate a vicious cycle of increasingly aggressive testing and treatment that strays far from any scientific basis. Lead time and length biases have received little attention in the radiologic literature outside the narrow context of mass screening (e.g., mammography). However, these biases also pertain to the numerous daily decisions we must make regarding the performance and interpretation of radiologic tests in individual patients. This article explains how lead time and length biases are created and propagated by advances in radiologic technology and how we can unburden ourselves of these biases by viewing disease from its proper, dynamic perspective.

Received January 19, 1990; accepted after revision March 13, 1990.
1 Both authors: Department of Diagnostic Radiology, Warren G. Magnuson Clinical Center, National Institutes of Health, Bethesda, MD 20892; and Department of Radiology, Georgetown University Medical Center, Washington, DC 20007. Address reprint requests to W. C. Black, Diagnostic Radiology Department, Bldg. 10, Rm. 1C660, National Institutes of Health, Bethesda, MD 20892.
AJR 155:625-630, September 1990 0361-803X/90/1553-0625 © American Roentgen Ray Society

Theory

To understand lead time and length biases, we must first recognize that diseases are dynamic processes, not static entities. Consider the course of a hypothetical disease process, disease X, where X refers to a particular histologic appearance, the conventional gold standard (Fig. 1). Disease X steadily increases in size or anatomic extent with time. (For simplicity, we model progression in time as a linear process, although lead time and length biases also pertain to nonlinear rates of progression.) Disease X crosses the clinical threshold at time C when it is sufficient in size to cause signs or symptoms in the patient. The course disease X follows from time zero to C is often referred to as the total preclinical phase [2]. Disease X reaches the death threshold at time D. Therefore, when detected clinically, disease X is associated with a survival of D minus C years.

Lead Time Bias

Suppose a new test can first detect disease X at some time T during the total preclinical phase (Fig. 1). The time interval between T and C is the detectable preclinical phase (DPCP) of disease X [2]. If testing detects disease X during DPCP but provides no actual benefit (i.e., treatment begun during DPCP is no more effective than treatment begun at C), survival measured from the time of diagnosis is increased by the lead time of testing, and death occurs at the same age. Lead time is a function of the frequency of testing and the duration of DPCP. Continuous testing begun before T would provide a lead time equal to DPCP. A random one-shot screen would provide an average lead time equal to one-half DPCP.

Fig. 1.-Lead time bias. Disease X enlarges and crosses the test, clinical, and death thresholds at times T, C, and D, respectively. Random testing of asymptomatic patients detects disease X during detectable preclinical phase (DPCP). Survival measured from time of diagnosis (DX) is prolonged in tested patients by lead time.

Length Bias

Suppose disease X can progress in size or anatomic extent rapidly, as X0, or slowly, as X1 (Fig. 2). Disease X0 causes death before the patient would die from other causes, while disease X1 remains clinically occult throughout the patient's normal life span. Without testing, only disease X0 is recognized clinically. Therefore, clinically detected disease X is equated with disease X0 and associated with a 0% cure rate. However, if the new test is used to screen asymptomatic patients or diagnose diseases unrelated to disease X, then testing detects diseases X0 and X1 in proportion to the length of their respective DPCPs (assuming equal incidences of diseases X0 and X1). Therefore, in an unadjusted comparison of patients with disease X detected clinically vs by the test, the test appears to increase the cure rate percentage for disease X from 0 to 100 x DPCP1/(DPCP0 + DPCP1).

Fig. 2.-Length bias. Disease X progresses rapidly (X0) in some patients and slowly (X1) in others. Random testing of asymptomatic patients detects diseases X0 and X1 in proportion to detectable preclinical phases (DPCPs), DPCP0 and DPCP1.

Example

Suppose disease X can progress at three different rates beginning at age 45 (Fig. 3). Disease X0 progresses twice as rapidly as disease X1, which progresses twice as rapidly as disease X2. Patients with disease X0 die from their disease at age 65, whereas patients with diseases X1 and X2 die from other causes at age 75, never having been affected by their milder forms of disease X. Before testing is available, disease X is associated with a 0% cure rate and survival from time of diagnosis of 5 years.

Fig. 3.-Example. Disease X progresses at three different rates as X0, X1, and X2. Regardless of treatment, disease X0 causes death at age 65. Diseases X1 and X2 have no clinical effect on patients who die at age 75 from other causes. Test 1 detects diseases X0 and X1 in the proportion of 1:2 (detectable preclinical phase 0 [DPCP0] and DPCP1 equal 5 and 10 years, respectively). Test 2 detects diseases X0, X1, and X2 in the proportion of 1:2:1 (DPCP0, DPCP1, and DPCP2 equal 10, 20, and 10 years, respectively).

Suppose test 1, which can detect disease X in its preclinical state above a certain size threshold, becomes available. Patients with disease X incidentally detected by test 1 would have disease X0 and X1 in the proportion of 1:2 (DPCP0:DPCP1). Even if no effective treatment for disease X0 existed, test 1 would appear to improve the cure rate of disease X to 67%. In addition, test 1 would appear to prolong survival because of the average lead time of 2.5 years for patients with disease X0. Suppose the improved prognosis for disease X is erroneously attributed to early detection by test 1. Radiologists might be encouraged to lower the positivity criterion or somehow refine the testing technique such that the detection threshold would be further lowered to that of test 2 (Fig. 3). A deliberate screening of the asymptomatic population would detect diseases X0, X1, and X2 in the proportion of 1:2:1 (DPCP0:DPCP1:DPCP2). Therefore, test 2 would appear to further improve the cure rate of disease X to 75%. In addition, test 2 would appear to further prolong survival because of the average lead time of 5.0 years for disease X0 patients. If enough milder forms of disease X existed, continual improvements in testing could permit the apparent cure rate to approach 100%, even if treatment remained totally ineffective.
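The figures in this example can be reproduced with a few lines of Python. This is only a sketch under the example's own assumptions (equal incidence of each form of disease X, detection in proportion to DPCP length, and a random one-shot screen); the function and variable names are illustrative and do not come from the article.

```python
from fractions import Fraction

def apparent_cure_rate(dpcp_years, lethal_forms):
    """Fraction of test-detected cases that never die of disease X.

    Assumes each form is equally incident, so detected cases occur in
    proportion to the length of each form's DPCP (length bias).
    """
    total = sum(dpcp_years.values())
    nonlethal = sum(d for form, d in dpcp_years.items() if form not in lethal_forms)
    return Fraction(nonlethal, total)

def mean_lead_time(dpcp):
    """Average lead time of a random one-shot screen: one-half DPCP."""
    return dpcp / 2

# DPCPs in years, taken from the Fig. 3 caption.
test1 = {"X0": 5, "X1": 10}
test2 = {"X0": 10, "X1": 20, "X2": 10}

for name, dpcps in (("test 1", test1), ("test 2", test2)):
    rate = apparent_cure_rate(dpcps, lethal_forms={"X0"})
    lead = mean_lead_time(dpcps["X0"])
    print(f"{name}: apparent cure rate {float(rate):.0%}, "
          f"mean lead time for X0 {lead} years")
# test 1: apparent cure rate 67%, mean lead time for X0 2.5 years
# test 2: apparent cure rate 75%, mean lead time for X0 5.0 years
```

No treatment effect appears anywhere in the calculation: both apparent improvements arise purely from which cases are detected and when.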
Age, Detection Threshold, and Rate of Progression

The expected rate of progression of any disease process is related to its size and the patient's age at detection (Fig. 4). More precisely, the minimum rate of progression (averaged over the duration of the process) is equal to the detection size divided by the maximum duration of the process before detection (i.e., patient's age at detection minus the earliest age of onset). For example, consider the effects of detection size and patient age on the growth rate of a tumor with a particular histologic appearance whose earliest onset is 30 years. If this tumor was detected as a 5-cm mass in a symptomatic 35-year-old patient, then its minimum growth rate would be 10 mm/year. However, if this tumor was detected by CT as a 2-cm asymptomatic mass, then its minimum growth rate would be 4 mm/year. Finally, if the patient was 50 years old when the tumor was detected by CT, then the minimum growth rate would be 1 mm/year. Because of these constraints, the expected rate of progression decreases as the detection threshold is lowered and as the patient's age increases.

Apparent, Real, and Spurious Effects

From the perspective of the referring clinician and radiologist, each new technique or modification of interpretation criteria that lowers the detection threshold is associated with an apparent effect on the outcome of diseased patients equal to the real effect plus the spurious improvement produced by lead time and length biases:

Apparent effect = real effect + spurious improvement.

Therefore, even when the new testing method is actually detrimental to the tested population on the whole (which includes patients who are treated over a longer period than they would have been previously as well as patients who are treated for disease that would not have been previously diagnosed), the new method appears to be beneficial if the spurious improvement is greater than the magnitude of the detrimental change. When a new treatment coincidentally accompanies a new test, the real effect of both is equal to the real effect of the new treatment, plus the real effect of earlier diagnosis afforded by the new test. When the new test is not taken into account, the apparent effect of the treatment change is greater than its real effect by the sum of the spurious improvement plus the real effect of earlier diagnosis. When this latter component is positive, that is, when earlier but not necessarily new treatment is more effective, the advantage of the new treatment is overestimated by two components. Consequently, coincidental advances in imaging can strongly bias the selection of new treatments.
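The arithmetic behind the minimum growth rates quoted in the section on age and detection threshold, and the additive relation between apparent, real, and spurious effects, can be checked with a short Python sketch. The function names are illustrative, and the percentage-point values passed to `apparent_effect` are hypothetical; only the tumor sizes, the ages, and the additive relation itself are taken from the text.

```python
def min_growth_rate_mm_per_year(size_cm, age_at_detection, earliest_onset_age):
    """Minimum average growth rate: detection size / maximum duration before detection."""
    max_duration_years = age_at_detection - earliest_onset_age
    return size_cm * 10 / max_duration_years  # convert cm to mm

def apparent_effect(real_effect, spurious_improvement):
    """Apparent effect = real effect + spurious improvement."""
    return real_effect + spurious_improvement

# Tumor example from the text; earliest onset at age 30.
print(min_growth_rate_mm_per_year(5, 35, 30))  # 10.0 (5-cm mass, age 35)
print(min_growth_rate_mm_per_year(2, 35, 30))  # 4.0  (2-cm mass, age 35)
print(min_growth_rate_mm_per_year(2, 50, 30))  # 1.0  (2-cm mass, age 50)

# Hypothetical numbers: a test that truly worsens outcome by 3 points
# still looks beneficial if the biases contribute 10 points.
print(apparent_effect(real_effect=-3, spurious_improvement=10))  # 7
```

The last line illustrates the point made above: a detrimental method appears beneficial whenever the spurious improvement exceeds the magnitude of the real harm.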
Fig. 4.-Relationship between size (cm) of tumor and patient age at detection and minimum rate of progression.

Clinical Evidence

For lead time and length biases to be operational in the clinical setting, there must be a reservoir of clinically occult disease. Furthermore, this reservoir should appear to expand with advances in imaging that lower the detection threshold. What is the evidence that this expanding reservoir exists?

Breast Cancer

TABLE 1: Detection Threshold and Percentage of Breast Cancers Attributed to Carcinoma in Situ

Threshold                        %
Palpable [4]                     1-5
Mammogram, 1981 [5]              8
Mammogram, 1988 [6]              25-30
Microcalcifications only [6]     40-50

Before the use of mammography, breast cancer was usually diagnosed by palpation [4]. The vast majority of these palpable neoplasms were invasive carcinomas, whose natural history is almost always fatal. Ductal carcinoma in situ (DCIS) constituted only 1-5% of breast cancers (Table 1) [4]. However, the proportion of breast cancers that are small and nonpalpable has increased with the refinements and increasing application of screening mammography. DCIS constituted 8% of mammographically detected breast cancers reported in 1981 [5] and 25-30% reported in 1988 [6]. Furthermore, DCIS constituted 40-50% of breast cancers detected as
microcalcifications without an associated mass in the later report. The natural history of DCIS is not precisely known because, until recently, it was assumed to inevitably progress to invasive cancer, and nearly all patients had mastectomies. However, more recent evidence suggests that only 25-50% of untreated patients with DCIS develop invasive cancer [7-9]. Furthermore, because DCIS is often multifocal and bilateral [7-9], an even smaller percentage of individual DCIS lesions progress to invasive cancer.

A strong relationship between growth rate and patient age at diagnosis is also seen with breast cancer. Kusama et al. [10] observed that rapidly growing tumors, that is, those with doubling times of less than 2 months, constituted 67% of tumors in patients under 30 years and only 30% of tumors in patients over 70 years (Table 2). On the other hand, slowly growing tumors, that is, those with doubling times of greater than 8 months, were not seen in patients under 30 years, but constituted 30% of the tumors in patients over 70 years.

TABLE 2: Age of Patient and Percentage of Breast Tumors with Short and Long Doubling Times

                          Doubling Time of Primary Tumor
Age of patient (years)    <2 months    >8 months
0-29                      67           0
30-49                     49           19
50-69                     34           21
70+                       30           30

Lead time and length biases may largely explain the discrepancy between the improving relative 5-year survival rate for breast cancer (from 63% to 75% over the period 1960-1984) [11] and the worsening age-adjusted mortality rate over the same period [12]. Moskowitz [13] has argued that younger patients should be screened by mammography more frequently than older patients because the former have more rapidly growing tumors and longer life expectancies. Unfortunately, mammography is less accurate in younger patients.

Renal Cancer

There is considerable confusion over the distinction between renal cell carcinoma and renal adenoma. In fact, the two are generally regarded as indistinguishable by any “gross, histologic, histochemical, immunologic, or ultrastructural features . . . if the information on tumor size is not available” [14].

The prevalence of these tumors in patients without a premortem suspicion of renal cell carcinoma depends on how closely the kidneys are examined after death. At routine autopsy, where the threshold size by macroscopic inspection is about 2 cm, the prevalence of these renal tumors is about 1-2% [15, 16]. Two-thirds of the afflicted patients appear to have died “with rather than from” [16] their tumors. When kidneys are examined by serial 2- to 3-mm sectioning, the prevalence of these tumors exceeds 22% [17]. This contrasts sharply with recently reported age-specific prevalence rates of kidney cancer, which are only 0.03-0.06% in the general population and 0.3% in males over 70 years of age [18], the most commonly afflicted group. Furthermore, less than 0.5% of all deaths in the United States in 1986 were attributable to kidney cancer [19] (8987 deaths from cancer of the kidney and renal pelvis divided by 2,105,361 total deaths). Given the approximate 50% cure rate [11], only about 1% of the population develop potentially lethal renal cell carcinoma. However, if the detection threshold of radiologic imaging were to fall to that of serial sectioning and if histology were to remain the gold standard for diagnosis, then 22% of the screened population could be said to have pathologically proved renal cell carcinoma!

Prostatic Cancer

In a recent study of male patients undergoing radical cystectomy for bladder cancer, 61% of those 60-74 years old with a normal rectal examination had histologic evidence of prostatic cancer [19]. Eighteen percent of patients in this age group had high-grade (Gleason score greater than or equal to 6) prostatic cancer, making them candidates for definitive treatment (radical prostatectomy). However, only 3.2% of all deaths in men over 60 years of age were attributed to prostatic cancer in the United States in 1986 [20] (26,237 deaths from prostatic cancer divided by 811,963 total deaths). In other words, of asymptomatic males between the ages of 60 and 74 years, only about one in six with histologic evidence of high-grade prostatic cancer (one in 20 with any grade) will die from their disease. Furthermore, this favorable prognosis cannot be attributed to recent improvements in diagnosis and treatment because the age-specific mortality rate of prostatic cancer has not changed since 1950 [12].

Overall Cancer

Although it is commonly believed that modern imaging and innovative treatments have significantly increased cure rates and prolonged survival times associated with most forms of cancer, the age-adjusted mortality rate (the most reliable index of progress) of cancer overall actually increased 8.7% in the United States from 1962 to 1982 [12].

Nonneoplastic Disease
Other diseases, including infectious and inflammatory conditions, atherosclerosis, and various degenerative processes of aging, also vary in their rates and patterns of progression according to individual host and environmental factors. Consequently, advances in imaging are also exposing a deep reservoir of subclinical nonneoplastic diseases whose natural history is more favorable than traditionally believed. For example, the incidence of small (