J Clin Epidemiol Vol. 45, No. 9, pp. 1035-1039, 1992
Printed in Great Britain. All rights reserved
0895-4356/92 $5.00+0.00
Copyright © 1992 Pergamon Press Ltd

Comment to the Editors

CHANCE-CORRECTED SENSITIVITY AND SPECIFICITY FOR THREE-ZONE DIAGNOSTIC TESTS

I would like to comment on Feinstein’s Editorial Rumination [1] on the clinical reality of three-zone diagnostic decisions and on the subsequent reactions from Sackett [2] and Simel et al. [3], and also to propose an alternative solution to the conditional parameters discussed by the latter authors.

As pointed out by Feinstein [1], the result of a qualitative diagnostic test is not always dichotomous but generally at least trichotomous, with indeterminate or intermediate results in a middle zone between the clear positive and negative conclusions. Sackett [2] and Simel et al. [3] agree with this assertion. But, in his rumination, Feinstein [1] also suggests that the binary state of the actual condition of the patient must be expanded to a three-zone classification too. In my opinion, this assertion is more debatable. Indeed, the three-zone classification of the true status of the patient can only exist because the decision to define the subject as ill or not ill is based on the result of another diagnostic test, usually the “gold standard”, which might also be inconclusive. But this reference test must only be considered as an estimate of the true status of the patient, which cannot be intermediate because a subject either is or is not struck down by a given disease. If a reference test is not able to classify a patient as having or not having a disease, it may not be used as a gold standard. In this situation, parameters like sensitivity and specificity have to be estimated by particular methods, such as the use of instrumental variables [4,5].

Simel and his coworkers “believe that clinical investigators should not force diagnostic test into the 4 cells of a 2 x 2 matrix, but should report data in the middle zone using a 3 x 2 matrix” [3]. They therefore present a complete set of parameters, namely the conditional sensitivity and specificity, the conditional likelihood ratios of a positive, a negative and a non-positive non-negative test, and the positive and negative test yields. I think this set of estimators is too complex to allow a clear conclusion about the quality of the diagnostic test, and I would like to suggest another solution to assess sensitivity and specificity in this situation.

In a first step, I think a descriptive 3 x 2 matrix must be provided by the researcher, as advocated by the previously cited authors [1-3]. But, in a second step, the clinician may force indeterminate results into the classical 2 x 2 table and derive estimates of sensitivity and specificity, which must then be corrected for agreement by chance between the result of the test and the true status of the patient expressed by the gold standard. I propose, therefore, to estimate these chance-corrected sensitivity and specificity, say Se* and Sp*, by the relations

$$\mathrm{Se}^{*} = \frac{\mathrm{Se} - \mathrm{Se}'}{1 - \mathrm{Se}'}, \qquad \mathrm{Sp}^{*} = \frac{\mathrm{Sp} - \mathrm{Sp}'}{1 - \mathrm{Sp}'},$$

where Se and Sp are the crude observed sensitivity and specificity, and Se' and Sp' are the chance-expected values of these parameters. The reader will easily recognize the analogy between these measures and the classical kappa index initially described by Cohen [6]. Since, considering the usual 2 x 2 table of Fig. 1, the expected value of the number a is (a + c)(a + b)/n, with n = a + b + c + d, the chance-expected sensitivity is Se' = (a + b)/n, which leads by some algebraic manipulations to

$$\mathrm{Se}^{*} = \frac{ad - bc}{(a + c)(c + d)}.$$

Similarly, with Sp' = (c + d)/n, the chance-corrected specificity is

$$\mathrm{Sp}^{*} = \frac{ad - bc}{(a + b)(b + d)}.$$
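The algebra behind these two closed forms can be checked numerically. The following is a minimal sketch, not part of the original comment: the function name `chance_corrected` and the counts used are hypothetical, and the cell layout (a, b in the positive-test row; a, c in the disease-present column) is the one implied by the formulas above. Exact rational arithmetic is used so that the comparison is not blurred by rounding.

```python
from fractions import Fraction


def chance_corrected(a, b, c, d):
    """Return (Se*, Sp*) as exact fractions for a 2 x 2 table.

    Assumed layout: a = test+/disease+, b = test+/disease-,
    c = test-/disease+, d = test-/disease-. All margins must be non-zero.
    """
    n = a + b + c + d
    se = Fraction(a, a + c)        # observed sensitivity Se
    sp = Fraction(d, b + d)        # observed specificity Sp
    se_exp = Fraction(a + b, n)    # chance-expected sensitivity Se'
    sp_exp = Fraction(c + d, n)    # chance-expected specificity Sp'
    se_star = (se - se_exp) / (1 - se_exp)
    sp_star = (sp - sp_exp) / (1 - sp_exp)
    return se_star, sp_star


# Hypothetical counts chosen only for illustration (not taken from the text).
a, b, c, d = 40, 10, 20, 30
se_star, sp_star = chance_corrected(a, b, c, d)

# The kappa-like definitions agree exactly with the closed forms.
assert se_star == Fraction(a * d - b * c, (a + c) * (c + d))
assert sp_star == Fraction(a * d - b * c, (a + b) * (b + d))
print(se_star, sp_star)  # 1/3 1/2
```

With these counts the crude values are Se = 2/3 and Sp = 3/4, while the chance-corrected values drop to 1/3 and 1/2, which illustrates how much of the crude agreement is attributed to chance.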


                    Disease
                present   absent
Test   positive    a         b
result negative    c         d

Fig. 1. Conventional 2 x 2 table.

These parameters vary from -(a + b)/(c + d) for Se* and -(c + d)/(a + b) for Sp* up to 1, and this maximal value is reached only when the observed sensitivity and specificity are also equal to unity. They are null when the observed parameters are equal to their chance-expected values and, because Se* = (Se + Sp - 1)(b + d)/(c + d) and Sp* = (Se + Sp - 1)(a + c)/(a + b), when Se = 1 - Sp, i.e. when the proportions of positive and negative results are identical in diseased and non-diseased patients. Both parameters Se* and Sp* can only be null together. It must be emphasized that a non-discriminant diagnostic test implies the theoretical values Se = Sp = 0.5, but Se* = Sp* = 0.

For inferential purposes, the null hypothesis Se* = Sp* = 0 can easily be tested against the alternative hypothesis Se* ≠ 0 and Sp* ≠ 0 by the usual chi-square test of independence of the 2 x 2 matrix, as shown in Appendix A. The statistic may be directly derived from the parameters to be assessed by χ² = n Se* Sp*. Furthermore, a comparison between the chance-corrected sensitivities or specificities of two diagnostic methods studied in two different sets of subjects may be performed by the classical z test

$$z = \frac{|\mathrm{Se}^{*}_{1} - \mathrm{Se}^{*}_{2}|}{\sqrt{\mathrm{Var}(\mathrm{Se}^{*}_{1}) + \mathrm{Var}(\mathrm{Se}^{*}_{2})}}, \qquad z = \frac{|\mathrm{Sp}^{*}_{1} - \mathrm{Sp}^{*}_{2}|}{\sqrt{\mathrm{Var}(\mathrm{Sp}^{*}_{1}) + \mathrm{Var}(\mathrm{Sp}^{*}_{2})}},$$

the variances of the chance-corrected parameters being approximated by the relations derived in Appendix B, i.e.

$$\mathrm{Var}(\mathrm{Se}^{*}) \approx \frac{\mathrm{Var}(\mathrm{Se}) + (1 - \mathrm{Se}^{*})^{2}\,\mathrm{Var}(\mathrm{Se}')}{(1 - \mathrm{Se}')^{2}}$$

and

$$\mathrm{Var}(\mathrm{Sp}^{*}) \approx \frac{\mathrm{Var}(\mathrm{Sp}) + (1 - \mathrm{Sp}^{*})^{2}\,\mathrm{Var}(\mathrm{Sp}')}{(1 - \mathrm{Sp}')^{2}}$$

with

$$1 - \mathrm{Se}' = \frac{c + d}{n}, \qquad 1 - \mathrm{Sp}' = \frac{a + b}{n}, \qquad \mathrm{Var}(\mathrm{Se}) = \frac{ac}{(a + c)^{3}}, \qquad \mathrm{Var}(\mathrm{Sp}) = \frac{bd}{(b + d)^{3}},$$

and

$$\mathrm{Var}(\mathrm{Se}') = \mathrm{Var}(\mathrm{Sp}') = \frac{(a + b)(c + d)}{n^{3}}.$$

These formulas overestimate the variances of the chance-corrected sensitivity and specificity and thus lead to conservative tests of hypothesis, that is, tests that err on the safer side.

As an illustration of the proposed methodology, let us suppose that a new diagnostic test studied on a well defined population yields the data of Fig. 2. I repeat that this matrix has to be described in a first step. In a second step, the clinician is asked to force the 12 and 18 indeterminate results observed in diseased and non-diseased patients respectively into positive or negative conclusions. He (she) then reports 7 positive and 5 negative results for diseased patients, and 9 positive and 9 negative conclusions for non-diseased subjects, leading to the 2 x 2 matrix of Fig. 3.

                          Disease
                      present   absent
Test    positive         81        44
result  indeterminate    12        18
        negative         13       132

Fig. 2. Results of a hypothetical test: the initial 3 x 2 matrix.

                     Disease
                 present   absent
Test    positive    88        53
result  negative    18       141

Fig. 3. Results of a hypothetical test: the 2 x 2 matrix after forced conclusions.

The proposed parameters are thus

$$\mathrm{Se}^{*} = \frac{(88 \times 141) - (53 \times 18)}{(88 + 18) \times (18 + 141)} = 0.67960$$

and

$$\mathrm{Sp}^{*} = \frac{(88 \times 141) - (53 \times 18)}{(88 + 53) \times (53 + 141)} = 0.41873.$$

The null hypothesis of non-discriminant sensitivity and specificity, taking the possible agreement by chance of the forced conclusions into account, is tested by computing the chi-square statistic

$$\chi^{2} = 300 \times 0.67960 \times 0.41873 = 85.371,$$

which allows one to reject this hypothesis with p < 0.00001. For estimating the variances of these chance-corrected sensitivity and specificity, we have to compute

$$1 - \mathrm{Se}' = \frac{18 + 141}{300} = 0.53,$$

$$1 - \mathrm{Sp}' = \frac{88 + 53}{300} = 0.47,$$

$$\mathrm{Var}(\mathrm{Se}) = \frac{88 \times 18}{(88 + 18)^{3}} = 0.00133,$$

$$\mathrm{Var}(\mathrm{Sp}) = \frac{53 \times 141}{(53 + 141)^{3}} = 0.00102$$

and

$$\mathrm{Var}(\mathrm{Se}') = \mathrm{Var}(\mathrm{Sp}') = \frac{(88 + 53)(18 + 141)}{300^{3}} = 0.00083$$

to estimate

$$\mathrm{Var}(\mathrm{Se}^{*}) \approx \frac{0.00133 + \{(1 - 0.67960)^{2} \times 0.00083\}}{0.53^{2}} = 0.00504$$

and

$$\mathrm{Var}(\mathrm{Sp}^{*}) \approx \frac{0.00102 + \{(1 - 0.41873)^{2} \times 0.00083\}}{0.47^{2}} = 0.00589.$$

The final estimates, expressed with their standard errors computed as square roots of the variances, are Se* = 0.68 ± 0.07 and Sp* = 0.42 ± 0.08.

I think, like Feinstein [1] and previous discussants [2,3], that the assessment of a qualitative diagnostic test may allow some results to be non-positive and non-negative. The suggested methodology is intended as a solution to the inferential problem arising from such an assertion. Some properties of the proposed estimators may be viewed as disadvantages, particularly the fact that the parameters have to be estimated in a well defined population, since, like the usual positive and negative predictive values, they are dependent on the prevalence of the disease. I do not claim, therefore, that the chance-corrected sensitivity and specificity are the best way of solving the problem. This methodology is only proposed as an alternative solution which requires further theoretical work and extensive applications on simulated and real data sets to assess its value compared with other approaches.

JACQUES JAMART
Biostatistical Consultation
Mont-Godinne Academic Hospital
Yvoir
Belgium

REFERENCES

1. Feinstein AR. The inadequacy of binary models for the clinical reality of three-zone diagnostic decisions. J Clin Epidemiol 1990; 43: 109-113.
2. Sackett DL. Clinical reality, binary models, babies and bath water. J Clin Epidemiol 1991; 44: 217-218.
3. Simel DL, Matchar DB, Feussner JR. Diagnostic tests are not always black or white: or, all that glitters is not [a] gold [standard]. J Clin Epidemiol 1991; 44: 967-971.
4. Hui SL, Walter SD. Estimating the error rates of diagnostic tests. Biometrics 1980; 36: 167-171.
5. Nagelkerke NJD, Fidler V, Buwalda M. Instrumental variables in the evaluation of diagnostic test procedures when the true disease state is unknown. Stat Med 1988; 7: 739-744.
6. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960; 20: 37-46.
7. Mood AM, Graybill FA, Boes DC. Introduction to the Theory of Statistics. International student edition. Tokyo: McGraw-Hill; 1974.

APPENDIX A

Chi-square Test

Se* = 0 if Se = Se', an equality between two non-independent proportions. However, another expression of this relation,

$$\frac{a}{a + c} = \frac{a + b}{(a + c) + (b + d)},$$

allows us to see that the equality is only possible if a/(a + c) = b/(b + d). Both terms of this new null hypothesis are now two independent proportions and their equality may be assessed by the usual chi-square test

$$\chi^{2} = \frac{(ad - bc)^{2}\,n}{(a + c)(b + d)(a + b)(c + d)} = n\,\mathrm{Se}^{*}\,\mathrm{Sp}^{*}.$$
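The identity between the usual chi-square statistic and n Se* Sp* can be illustrated on the counts of the worked example (88, 53, 18 and 141). The sketch below is only illustrative: the function names are hypothetical, and the p value uses the standard relation between the upper tail of a chi-square variable with 1 degree of freedom and the complementary error function.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Usual chi-square statistic of independence for a 2 x 2 table."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + c) * (b + d) * (a + b) * (c + d))


def chi_square_from_corrected(a, b, c, d):
    """Same statistic written as n * Se* * Sp*."""
    n = a + b + c + d
    se_star = (a * d - b * c) / ((a + c) * (c + d))
    sp_star = (a * d - b * c) / ((a + b) * (b + d))
    return n * se_star * sp_star


def p_value_1df(x2):
    """Upper-tail probability of a chi-square variable with 1 df.

    If Z is standard normal, P(Z**2 > x2) = erfc(sqrt(x2 / 2)).
    """
    return math.erfc(math.sqrt(x2 / 2))


# Counts of the 2 x 2 matrix obtained after forced conclusions.
a, b, c, d = 88, 53, 18, 141
x2 = chi_square_2x2(a, b, c, d)
assert math.isclose(x2, chi_square_from_corrected(a, b, c, d))
print(round(x2, 2), p_value_1df(x2) < 0.00001)
```

Both routes give the same value, close to the 85.371 reported in the text, and the associated p value is far below 0.00001.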

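APPENDIX B

Approximate Variances of Se* and Sp*

The variance of Se* is the variance of the ratio (Se - Se')/(1 - Se') which, using a Taylor series expansion (see e.g. Mood et al. [7], p. 181), may be approximated by the formula

$$\mathrm{Var}(\mathrm{Se}^{*}) \approx \frac{(\mathrm{Se} - \mathrm{Se}')^{2}}{(1 - \mathrm{Se}')^{2}} \left[ \frac{\mathrm{Var}(\mathrm{Se} - \mathrm{Se}')}{(\mathrm{Se} - \mathrm{Se}')^{2}} + \frac{\mathrm{Var}(1 - \mathrm{Se}')}{(1 - \mathrm{Se}')^{2}} - \frac{2\,\mathrm{Cov}(\mathrm{Se} - \mathrm{Se}',\, 1 - \mathrm{Se}')}{(\mathrm{Se} - \mathrm{Se}')(1 - \mathrm{Se}')} \right].$$

Some algebraic manipulations and the use of the relation

$$2\,\mathrm{Cov}(\mathrm{Se} - \mathrm{Se}',\, 1 - \mathrm{Se}') = \mathrm{Var}(\mathrm{Se} - \mathrm{Se}') + \mathrm{Var}(1 - \mathrm{Se}') - \mathrm{Var}\{(\mathrm{Se} - \mathrm{Se}') - (1 - \mathrm{Se}')\}$$

allow us to rewrite the variance expression as

$$\mathrm{Var}(\mathrm{Se}^{*}) \approx \frac{(\mathrm{Se} - \mathrm{Se}')^{2}}{(1 - \mathrm{Se}')^{2}} \left[ \frac{\mathrm{Var}(\mathrm{Se}) + \mathrm{Var}(\mathrm{Se}')}{(\mathrm{Se} - \mathrm{Se}')^{2}} + \frac{\mathrm{Var}(\mathrm{Se}')}{(1 - \mathrm{Se}')^{2}} - \frac{2\,\mathrm{Var}(\mathrm{Se}')}{(\mathrm{Se} - \mathrm{Se}')(1 - \mathrm{Se}')} \right] + \frac{(\mathrm{Se} - \mathrm{Se}')^{2}}{(1 - \mathrm{Se}')^{2}} \left[ \frac{2\,\mathrm{Cov}(\mathrm{Se}, \mathrm{Se}')}{(\mathrm{Se} - \mathrm{Se}')(1 - \mathrm{Se}')} - \frac{2\,\mathrm{Cov}(\mathrm{Se}, \mathrm{Se}')}{(\mathrm{Se} - \mathrm{Se}')^{2}} \right].$$

It may be seen that the second term of the right side of the equation can be reduced to

$$\frac{2\,\mathrm{Cov}(\mathrm{Se}, \mathrm{Se}')}{(1 - \mathrm{Se}')^{3}}\,(\mathrm{Se} - 1),$$

with the first factor always ≥ 0 and the second one always ≤ 0. The second term is thus always ≤ 0 and, making it vanish, leads to the following overestimation of the variance of the chance-corrected sensitivity

$$\mathrm{Var}(\mathrm{Se}^{*}) \approx \frac{(\mathrm{Se} - \mathrm{Se}')^{2}}{(1 - \mathrm{Se}')^{2}} \left[ \frac{\mathrm{Var}(\mathrm{Se}) + \mathrm{Var}(\mathrm{Se}')}{(\mathrm{Se} - \mathrm{Se}')^{2}} + \frac{\mathrm{Var}(\mathrm{Se}')}{(1 - \mathrm{Se}')^{2}} - \frac{2\,\mathrm{Var}(\mathrm{Se}')}{(\mathrm{Se} - \mathrm{Se}')(1 - \mathrm{Se}')} \right].$$

Finally, using (Se - Se') = Se*(1 - Se'), other manipulations reduce the expression to

$$\mathrm{Var}(\mathrm{Se}^{*}) \approx \frac{\mathrm{Var}(\mathrm{Se}) + (1 - \mathrm{Se}^{*})^{2}\,\mathrm{Var}(\mathrm{Se}')}{(1 - \mathrm{Se}')^{2}}.$$

The derivation of Var(Sp*) is of course similar to the preceding one.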
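As a rough check of the final variance formula, the following sketch applies it to the counts of the worked example (88, 53, 18 and 141). The function names are hypothetical and the code is only an illustration under the same cell layout as in the main text. Because it carries unrounded intermediate values, its results agree with the rounded figures of the text (0.00504 and 0.00589) only to within rounding.

```python
import math


def corrected_with_variance(a, b, c, d):
    """Return (Se*, Var(Se*), Sp*, Var(Sp*)) using the conservative
    variance approximations derived in Appendix B."""
    n = a + b + c + d
    se_star = (a * d - b * c) / ((a + c) * (c + d))
    sp_star = (a * d - b * c) / ((a + b) * (b + d))
    var_se = a * c / (a + c) ** 3          # binomial variance of Se
    var_sp = b * d / (b + d) ** 3          # binomial variance of Sp
    var_exp = (a + b) * (c + d) / n ** 3   # Var(Se') = Var(Sp')
    one_minus_se_exp = (c + d) / n         # 1 - Se'
    one_minus_sp_exp = (a + b) / n         # 1 - Sp'
    var_se_star = (var_se + (1 - se_star) ** 2 * var_exp) / one_minus_se_exp ** 2
    var_sp_star = (var_sp + (1 - sp_star) ** 2 * var_exp) / one_minus_sp_exp ** 2
    return se_star, var_se_star, sp_star, var_sp_star


def z_statistic(est1, var1, est2, var2):
    """z test comparing two chance-corrected estimates obtained in
    two different sets of subjects."""
    return abs(est1 - est2) / math.sqrt(var1 + var2)


se_star, var_se_star, sp_star, var_sp_star = corrected_with_variance(88, 53, 18, 141)
print(round(se_star, 2), round(math.sqrt(var_se_star), 2))  # 0.68 0.07
print(round(sp_star, 2), round(math.sqrt(var_sp_star), 2))  # 0.42 0.08
```

The helper `z_statistic` takes the estimates and variances of two such analyses; no second data set is given in the text, so it is left without a worked example here.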

Editors’ Note-This comment was sent to the pertinent previous authors, of whom Dr Sackett declined to respond. The other responses are as follows:

Response

We enjoy the debate over new clinical paradigms for diagnostic test results that Feinstein is encouraging and facilitating [1-3]. Jamart accepts our proposal to display qualitative tests in 3 x 2 tables, but concludes that concepts of conditional test characteristics and test yield are too complex. Instead, a 2-staged analytic strategy is proposed, requiring clinicians to guess about the meaning of non-positive, non-negative results. The guess allows hypothesis tests to ascertain whether the sensitivity and specificity from the resulting 2 x 2 table were greater than expected by chance. Although there is nothing statistically wrong with Jamart’s approach, inferential testing is unnecessary for three reasons.

First, Jamart requires guesses that lack clinical meaning. Rather than reducing the response to two levels, the approach asks clinicians to rate the result as “positive”, “non-positive, non-negative but maybe positive”, “non-positive, non-negative, but maybe negative”, or “negative”. We do not understand how this is less complex and more meaningful for clinicians.

We believe that the “eyeball test” is more than adequate for assessing the diagnostic ability of test results. The null hypothesis for Jamart’s proposal tests whether a guess following the clinician’s best assessment yields results better than predicted by chance. However, when a clinician or investigator cannot look at the data in a 2 x 2 or 3 x 2 table and determine the test’s usefulness, then the test likely will be clinically useless. No amount of hypothesis testing will confer clinical significance on tests with sensitivity and specificity approaching 0.5, even when the p value is highly significant as in Jamart’s hypothetical example.

Finally, we have proposed a framework that does not require clinicians to make an additional guess, does not require hypothesis testing, and is clinically sensible [4]. We advocate that likelihood ratios be calculated for each response level. The investigator may choose whether to use multilevel or conditional likelihood ratios, but the important point is that the ratios come from the raw data without requiring additional guesses [5]. This approach has the additional advantage that investigators can use clinically sensible desired likelihood ratios for sample size estimates, and can adjust likelihood ratios for clinical covariates. (Although confidence intervals can be calculated with a hand calculator, we repeat our offer to provide a program that calculates likelihood ratio confidence intervals using the SAS language, and performs sample size estimates for studies assessing diagnostic tests [4]. Interested readers should send the authors a DOS-formatted diskette to receive the program.)
