Sensitivity and specificity for correlated observations.

STATISTICS IN MEDICINE, VOL. 11, 1503-1509 (1992)

SENSITIVITY AND SPECIFICITY FOR CORRELATED OBSERVATIONS

PHILIP J. SMITH
Statistics Section Chief, Division of Diabetes Translation, Centers for Disease Control, Mail Stop K-10, 1600 Clifton Road, N.E., Atlanta, GA 30333, U.S.A.

AND

ALULA HADGU
Mathematical Statistician, Division of Sexually Transmitted Diseases, Centers for Disease Control, Mail Stop E-02, 1600 Clifton Road, N.E., Atlanta, GA 30333, U.S.A.

SUMMARY

A generalized estimating equation approach is used to obtain estimates of sensitivity and specificity when the data consist of correlated binary outcomes. First order approximations to the variances of estimated sensitivity and specificity for prospective and retrospective studies are given. Data from a dental study are used to motivate and illustrate the methods.

1. INTRODUCTION

The performance of a new diagnostic test for predicting the presence or absence of a medical condition can be evaluated by estimating its sensitivity and specificity with respect to a traditionally used and accepted test regarded as a 'gold standard' in making the diagnosis. In this context, sensitivity is the probability that the new test indicates presence of the condition when the gold standard does. Specificity is the probability that the new test indicates absence of the condition when the gold standard indicates that it is absent. Fleiss¹ discusses the problem of estimating sensitivity and specificity when the observations are independent; Lachenbruch² discusses the situation where a new test is given to each subject several times; and Hujoel et al.³ discuss the use of the specialized correlated binary models of Bahadur⁴ and Kupper and Haseman⁵ for obtaining standard errors of sensitivity and specificity estimates when observations are correlated. Swets and Pickett⁶ provide a review of methods for evaluating diagnostic tests. Tosteson and Begg⁷ describe a general regression methodology for receiver operating characteristic curves. This paper describes the use of a generalized estimating equation (GEE) approach developed by Liang and Zeger⁸ and Zeger and Liang⁹ for estimating sensitivity and specificity and their robust standard errors from clustered observations.

Motivation for this research came from a dental study. The purpose of this study was to assess the efficacy of a new test for predicting progression to periodontal disease. For each subject in the study, data were collected at each of five different tooth sites in the mouth. At each selected tooth site a small flat rectangular strip (measuring approximately 0.0625 inch x 0.25 inch) was inserted into the crevice between the gum and the tooth. These strips were specially designed for the collection and evaluation of elastase concentration in gingival crevicular fluid of individual teeth. Since there is a belief that elevated concentrations of elastase are precursors of periodontal disease (that is, bone loss and loss of attachment between the tooth and the gingival tissue), indications of high elastase concentration from the strips provide a new test for predicting progression to periodontitis. Progression to periodontal disease was measured by bone loss coupled with a change in attachment loss between an initial baseline measurement and a second measurement 6 months later. An indication of bone loss from X-rays and a change in loss of attachment of greater than 0.6 mm were taken to indicate progression and to represent the gold standard.

In estimating sensitivity and specificity and their standard errors in the dental study, a reasonable assumption is that the observations within a mouth are correlated; owing to the manner in which an individual cares for his/her teeth, the expectation is that if some teeth have progressed to periodontal disease, other teeth within the same mouth are likely to progress also. Therefore, statistical methods that recognize and account for the correlation of observations within a subject are appropriate. Emerich¹⁰ describes the use of statistical methods that ignore the correlation of measurements obtained within subjects as the most common statistical problem found in the periodontal research literature. Zeger¹¹ gives an overview of statistical methods for analysing correlated binary data and Neuhaus et al.¹² provide a comparison of these methods.

In the next section we sketch the GEE methodology. In Sections 3 and 4 we describe how the GEE methodology can be used to estimate sensitivity and specificity and their standard errors for clustered correlated data in retrospective and prospective studies, respectively. In Section 5, data from the dental study are used to illustrate the methodologies.

0277-6715/92/111503-07$08.50 © 1992 by John Wiley & Sons, Ltd. Received October 1991; Revised February 1992

2. A SKETCH OF THE GEE METHODOLOGY

Liang and Zeger⁸ and Zeger and Liang⁹ describe the GEE approach for analysing either longitudinal or clustered data. In particular, the GEE approach extends Wedderburn's¹³ quasi-likelihood methodology to allow the analysis of correlated data. A GEE model may be summarized by five components:

1. Clustered observations $\{y_{ij}, x_{ij}\}$, $j = 1, \ldots, n_i$, obtained on each of $i = 1, \ldots, K$ subjects. Here, $y_{ij}$ is the outcome variable and $x_{ij}$ is a $p \times 1$ vector of covariates for the $j$th observation from the $i$th subject. Let $y_i$ denote the $n_i \times 1$ vector of outcomes $(y_{i1}, \ldots, y_{in_i})^T$ and let $\mu_i = (\mu_{i1}, \ldots, \mu_{in_i})^T$ denote their means.
2. A variance function $g(\cdot)$ that expresses the variance of $y_{ij}$, $v_{ij}$, in terms of its mean $\mu_{ij}$: $v_{ij} = g(\mu_{ij})/\phi$. Here $\phi$ is a scale parameter.
3. A linear predictor $\eta_{ij} = x_{ij}^T \beta$, where $\beta$ is a $p \times 1$ vector of regression coefficients.
4. A link function $h(\cdot)$ that 'links' the mean $\mu_{ij}$ to the linear predictor $\eta_{ij}$: $h(\mu_{ij}) = \eta_{ij}$.
5. An $n_i \times n_i$ 'working' correlation matrix $R_i(\alpha) = \{R_{ijk}(\alpha)\}$ for each cluster of observations $y_i$ that is fully specified as a function of the unknown $s \times 1$ vector of parameters $\alpha$.

Liang and Zeger show how a consistent estimator $\hat{\beta}_R$ of $\beta$ can be obtained by using iteratively reweighted least squares with a subiteration for estimating the correlation and scale parameters $\alpha$ and $\phi$. The asymptotic covariance matrix of $\hat{\beta}_R$ can be consistently estimated by

$$V_R = \left( \sum_{i=1}^{K} D_i^T V_i^{-1} D_i \right)^{-1} \left( \sum_{i=1}^{K} D_i^T V_i^{-1} S_i S_i^T V_i^{-1} D_i \right) \left( \sum_{i=1}^{K} D_i^T V_i^{-1} D_i \right)^{-1},$$

where $V_i = A_i^{1/2} R_i(\alpha) A_i^{1/2} / \phi$, $A_i$ is an $n_i \times n_i$ diagonal matrix with $g(\mu_{ij})$ as the $j$th diagonal element, $D_i = \partial \mu_i / \partial \beta$, and $S_i = y_i - \mu_i$.

If the data $\{y_{ij}\}$ represent binary outcomes, the variance function is $v_{ij} = \mu_{ij}(1 - \mu_{ij})$, the link function is $h(\mu_{ij}) = \log\{\mu_{ij}/(1 - \mu_{ij})\}$, and the working correlation matrix is chosen to be $R_i = I_{n_i}$, then the GEE model is identical to a logistic regression model in which all outcomes are assumed to be independent. Correlation between observations within a subject may be introduced by letting $R_{ijk} = \alpha$ for $j \neq k$. For the dental example this is equivalent to assuming that the correlation between the outcomes of any two teeth within a specific subject's mouth is identical to the correlation between any other two teeth within the same mouth. This corresponds to an 'exchangeable' correlation assumption. It may be argued that outcomes of adjacent teeth are likely to be more closely correlated than those for non-adjacent teeth. However, a useful property of the GEE methodology is that it is not necessary for the working correlation matrix to be correctly specified to obtain consistent estimates of regression parameters and their standard errors. Zeger and Liang⁹ discuss this issue.
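The robust covariance calculation sketched in this section can be illustrated with a small Python example. This is only a minimal sketch under stated assumptions, not the program used by the authors: it assumes a single binary covariate, the independence working correlation $R_i = I_{n_i}$, and $\phi = 1$, in which case the estimating equations are solved in closed form by the two group proportions. The function name `fit_independence_gee` and the data in `demo_clusters` are hypothetical.

```python
import math


def expit(z):
    """Inverse of the logistic link."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    return math.log(p / (1.0 - p))


def fit_independence_gee(clusters):
    """Fit logit Pr(y=1|x) = b0 + b1*x with working correlation R_i = I.

    clusters: list of clusters; each cluster is a list of (x, y) pairs
    with x and y in {0, 1}. Returns (beta, V_R), where V_R is the robust
    (sandwich) covariance of beta. Assumes the observed proportion in each
    x-group lies strictly between 0 and 1.
    """
    n = [0, 0]  # number of observations with x = 0 and x = 1
    s = [0, 0]  # number of y = 1 outcomes in each x-group
    for cluster in clusters:
        for x, y in cluster:
            n[x] += 1
            s[x] += y
    p0, p1 = s[0] / n[0], s[1] / n[1]
    # Saturated model: the logistic score equations are solved exactly
    # by the two group proportions.
    beta = (logit(p0), logit(p1) - logit(p0))

    bread = [[0.0, 0.0], [0.0, 0.0]]  # sum_i D_i' V_i^{-1} D_i
    meat = [[0.0, 0.0], [0.0, 0.0]]   # sum_i (D_i' V_i^{-1} S_i)(...)'
    for cluster in clusters:
        score = [0.0, 0.0]
        for x, y in cluster:
            mu = expit(beta[0] + beta[1] * x)
            row = (1.0, float(x))
            for a in range(2):
                score[a] += row[a] * (y - mu)
                for b in range(2):
                    bread[a][b] += mu * (1.0 - mu) * row[a] * row[b]
        for a in range(2):
            for b in range(2):
                meat[a][b] += score[a] * score[b]

    det = bread[0][0] * bread[1][1] - bread[0][1] * bread[1][0]
    inv = [[bread[1][1] / det, -bread[0][1] / det],
           [-bread[1][0] / det, bread[0][0] / det]]
    half = [[sum(inv[a][k] * meat[k][b] for k in range(2)) for b in range(2)]
            for a in range(2)]
    V_R = [[sum(half[a][k] * inv[k][b] for k in range(2)) for b in range(2)]
           for a in range(2)]
    return beta, V_R


# Hypothetical data: three subjects, three sites each, as (x, y) pairs.
demo_clusters = [
    [(1, 1), (1, 1), (0, 0)],
    [(1, 0), (0, 0), (0, 1)],
    [(1, 1), (0, 0), (0, 0)],
]
beta_hat, V_R = fit_independence_gee(demo_clusters)
print(beta_hat)
print(V_R)
```

Because the residuals are summed within each subject before the outer product is formed, the resulting standard errors reflect within-subject correlation even though the working correlation is the identity. An exchangeable working correlation would additionally require the iterative estimation of $\alpha$ described above, which this sketch omits.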

3. SENSITIVITY AND SPECIFICITY IN RETROSPECTIVE STUDIES

In a retrospective study aimed at estimating sensitivity and specificity, one selects subjects for study because of their outcomes indicated by a gold standard test. The outcome of the new test is subsequently observed, for example by searching medical records. To account for this sampling plan, the appropriate probability model conditions on the results of the gold standard test. In terms of the dental example, the gold standard corresponds to the determination of whether the selected sites in a subject's mouth progressed to periodontal disease, and the new test corresponds to whether the test strip indicated an elevated elastase concentration in the gingival crevicular fluid of the corresponding teeth. Therefore, sensitivity and specificity may be modelled directly. Let

$$y_{ij} = \begin{cases} 1 & \text{if high elastase concentration is indicated at site } j \text{ of subject } i \\ 0 & \text{otherwise,} \end{cases}$$

$$x_{ij} = \begin{cases} 1 & \text{if periodontitis is indicated at site } j \text{ of subject } i \\ 0 & \text{otherwise.} \end{cases}$$

A GEE model with a Bernoulli variance function, a logistic link, and a working correlation matrix $R_i = I_{n_i}$ specifies a logistic model with sensitivity

$$\Pr(y_{ij} = 1 \mid x_{ij} = 1) = \frac{\exp(\beta_0 + \beta_1)}{1 + \exp(\beta_0 + \beta_1)} = \pi_{11}$$

and specificity

$$\Pr(y_{ij} = 0 \mid x_{ij} = 0) = \frac{1}{1 + \exp(\beta_0)} = \pi_{00}.$$

By letting $R_{ijk} = \alpha$, $j \neq k$, the correlation between sites within a mouth is taken into account in estimating the regression coefficients and their standard errors. Let

$$A_1 = \bigl(\pi_{11}(1 - \pi_{11}),\ \pi_{11}(1 - \pi_{11})\bigr)$$

and

$$A_0 = \bigl(-\pi_{00}(1 - \pi_{00}),\ 0\bigr).$$

First order approximations to the variances of estimates of sensitivity and specificity that account for clustering in retrospective studies are given by

$$V(\hat{\pi}_{11}) = A_1 V_R A_1^T, \qquad V(\hat{\pi}_{00}) = A_0 V_R A_0^T,$$

where $V_R$ denotes the covariance matrix of the estimated regression coefficients described in Section 2.
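As a worked illustration of the first order (delta method) approximation above, the following Python sketch maps estimated coefficients and their covariance matrix to sensitivity, specificity and their standard errors. The function name `retrospective_sens_spec` and the numerical inputs are hypothetical; in practice `beta` and `V_R` would come from a fitted GEE model.

```python
import math


def retrospective_sens_spec(beta, V_R):
    """Sensitivity and specificity with delta-method standard errors.

    beta: (b0, b1) from the model logit Pr(y=1|x) = b0 + b1*x.
    V_R:  2 x 2 covariance matrix of the estimated coefficients.
    """
    b0, b1 = beta
    pi11 = math.exp(b0 + b1) / (1.0 + math.exp(b0 + b1))  # sensitivity
    pi00 = 1.0 / (1.0 + math.exp(b0))                      # specificity

    # Gradients of pi11 and pi00 with respect to (b0, b1).
    A1 = (pi11 * (1.0 - pi11), pi11 * (1.0 - pi11))
    A0 = (-pi00 * (1.0 - pi00), 0.0)

    def quad_form(A):
        # Computes A V_R A^T for a row vector A.
        return sum(A[a] * V_R[a][b] * A[b] for a in range(2) for b in range(2))

    return {
        "sensitivity": (pi11, math.sqrt(quad_form(A1))),
        "specificity": (pi00, math.sqrt(quad_form(A0))),
    }


# Hypothetical coefficient estimates and covariance matrix.
example = retrospective_sens_spec((0.0, 0.0), [[0.04, 0.0], [0.0, 0.04]])
for name, (estimate, se) in example.items():
    print(name, round(estimate, 3), round(se, 4))
```

The variance of the specificity estimate depends only on the variance of $\hat{\beta}_0$, since the second component of $A_0$ is zero, whereas the variance of the sensitivity estimate involves both coefficients and their covariance.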

4. SENSITIVITY AND SPECIFICITY IN PROSPECTIVE STUDIES

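In a prospective study one selects subjects because of their outcomes obtained from a new test and then follows subjects until the outcome of the gold standard is observed. To account for this sampling plan we model the probability of the gold standard results conditional on the outcomes of the new test. Table I lists probability models that account for prospective sampling by conditioning on elastase concentration. These models correspond to a GEE model with a Bernoulli variance function, a logistic link, and a working correlation matrix $R_i = I_{n_i}$. By letting $R_{ijk} = \alpha$, $j \neq k$, the correlation between sites within mouths is taken into account in estimating the regression coefficients and their standard errors. Letting $N_{1+}$ and $N_{2+}$ denote the numbers of tooth sites identified as having high and low elastase concentration, respectively, sensitivity may be modelled indirectly using Bayes rule:

$$\Pr(y_{ij} = 1 \mid x_{ij} = 1) = \frac{\Pr(x_{ij} = 1 \mid y_{ij} = 1)\Pr(y_{ij} = 1)}{\Pr(x_{ij} = 1)} = \frac{N_{1+}p_{11}}{N_{1+}p_{11} + N_{2+}p_{21}} = \pi_{11}.$$

Specificity may be obtained similarly:

$$\Pr(y_{ij} = 0 \mid x_{ij} = 0) = \frac{\Pr(x_{ij} = 0 \mid y_{ij} = 0)\Pr(y_{ij} = 0)}{\Pr(x_{ij} = 0)} = \frac{N_{2+}p_{22}}{N_{2+}p_{22} + N_{1+}p_{12}} = \pi_{00}.$$

Here $p_{11} = \Pr(x_{ij} = 1 \mid y_{ij} = 1)$, $p_{12} = \Pr(x_{ij} = 0 \mid y_{ij} = 1)$, $p_{21} = \Pr(x_{ij} = 1 \mid y_{ij} = 0)$ and $p_{22} = \Pr(x_{ij} = 0 \mid y_{ij} = 0)$. Let $D_1 = N_{1+}p_{11} + N_{2+}p_{21}$ and $D_0 = N_{1+}p_{12} + N_{2+}p_{22}$ denote the denominators of $\pi_{11}$ and $\pi_{00}$, and let $A_1$ and $A_0$ denote the vectors of partial derivatives of $\pi_{11}$ and $\pi_{00}$ with respect to the regression coefficients; for example, the component of $A_0$ that arises through $p_{12}$ has magnitude $N_{2+}p_{22}N_{1+}p_{12}(1 - p_{12})/D_0^2$. First order approximations to the variances of estimates of sensitivity and specificity that account for clustering in prospective studies are

$$V(\hat{\pi}_{11}) = A_1 V_R A_1^T, \qquad V(\hat{\pi}_{00}) = A_0 V_R A_0^T.$$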
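The indirect (Bayes rule) calculation for the prospective design can be sketched in Python as follows. The parameterisation used here, logit $\Pr(x_{ij} = 1 \mid y_{ij}) = \beta_0 + \beta_1 y_{ij}$, is an assumption made for illustration and is not confirmed by the text; the function names, the site counts and the covariance matrix are likewise hypothetical. The gradient vectors $A_1$ and $A_0$ are obtained by central finite differences rather than from closed-form expressions.

```python
import math


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def prospective_sens_spec(beta, n_high, n_low):
    """Sensitivity and specificity via Bayes rule.

    Assumed model (for illustration only): logit Pr(x=1|y) = b0 + b1*y,
    where y = 1 denotes high elastase and x = 1 denotes progression.
    n_high, n_low: numbers of sites with high and low elastase (N1+, N2+).
    """
    b0, b1 = beta
    p11 = expit(b0 + b1)   # Pr(x=1 | y=1)
    p12 = 1.0 - p11        # Pr(x=0 | y=1)
    p21 = expit(b0)        # Pr(x=1 | y=0)
    p22 = 1.0 - p21        # Pr(x=0 | y=0)
    pi11 = n_high * p11 / (n_high * p11 + n_low * p21)
    pi00 = n_low * p22 / (n_low * p22 + n_high * p12)
    return pi11, pi00


def gradient(f, beta, h=1e-6):
    """Central-difference gradient of a scalar function of beta."""
    grad = []
    for k in range(len(beta)):
        up = list(beta)
        down = list(beta)
        up[k] += h
        down[k] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad


def delta_variances(beta, V_R, n_high, n_low):
    """First order variances A V_R A^T for sensitivity and specificity."""
    A1 = gradient(lambda b: prospective_sens_spec(b, n_high, n_low)[0], beta)
    A0 = gradient(lambda b: prospective_sens_spec(b, n_high, n_low)[1], beta)

    def quad_form(A):
        return sum(A[a] * V_R[a][b] * A[b] for a in range(2) for b in range(2))

    return quad_form(A1), quad_form(A0)


# Hypothetical inputs: p21 = 0.2 and p11 = 0.5, with 60 high and 40 low sites.
beta_demo = (math.log(0.25), -math.log(0.25))
V_demo = [[0.04, -0.02], [-0.02, 0.09]]
print(prospective_sens_spec(beta_demo, 60, 40))
print(delta_variances(beta_demo, V_demo, 60, 40))
```

Numerical differentiation is used here so that the sketch does not depend on a particular closed form for the derivatives; under the assumed parameterisation the $\beta_1$ component of $A_0$ agrees with $N_{2+}p_{22}N_{1+}p_{12}(1 - p_{12})/D_0^2$.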

When cases are selected for study as a result of the outcome observed from the new test, Begg and Greenes¹⁴ have shown that a verification bias may be incurred, affecting sensitivity and specificity if they were estimated directly using the methods described in Section 3. As one referee pointed out, the methods described in this section correct for verification bias.

Table I. Probabilities that can be modelled directly in a prospective study

                 Progression to periodontitis
Test strip       x_ij = 1                          x_ij = 0
y_ij = 1         p_11 = Pr(x_ij = 1 | y_ij = 1)    p_12 = Pr(x_ij = 0 | y_ij = 1)
y_ij = 0         p_21 = Pr(x_ij = 1 | y_ij = 0)    p_22 = Pr(x_ij = 0 | y_ij = 0)

5. SENSITIVITY AND SPECIFICITY STUDY OF THE TEST STRIPS

To assess the sensitivity and specificity of the test strips, 31 subjects were recruited for a prospective study. For each subject five tooth sites were selected and a test strip was applied to each selected site. After 8 minutes the test strips were removed from the sites and 'read' by the dentist. Six months later patients returned to the clinic where it was determined whether selected sites had progressed to periodontitis.

When held under ultraviolet light, strips fluoresce along the length of the strip. The extent to which each strip fluoresces along its length depends upon the elastase concentration in the crevicular fluid. Standardized locations along the length of each strip enabled the dentist to determine an integer score for the strip that could range between 0, indicating weak elastase concentration, and 4, indicating stronger concentration. Part of the aim of the study was to pick a strip score for use as a standard in practice by dentists to predict patients' chances of progressing to periodontitis within 6 months with acceptable sensitivity and specificity.

Table II lists the estimated sensitivity and specificity of the strip device, accounting for clustering of observations within subjects' mouths in the prospective study. Diagnostic performance of the strip was estimated at each of four cut points. Table II shows that as the strip cut point increases, sensitivity decreases from 0.96 to 0.26 and specificity increases from 0.19 to 0.95. Moderate diagnostic performance is obtained at a cut point of 2+. At this level the estimates of sensitivity and specificity are 0.80 and 0.68, respectively. Cluster adjusted 95 per cent confidence intervals are moderately broad. The lower confidence bounds of these intervals suggest a less than impressive diagnostic performance.

Our internal reviewer at the Centers for Disease Control suggests that current knowledge in oral radiology indicates that measurement error associated with ascertaining the extent of bone loss is large enough to prohibit accurate measurement of bone height to within 1-2 mm even after standardization of radiographic methods. Further, the method permits one to observe only the proximal sites of teeth. In this regard, our reviewer suggests that the results we have obtained portray a somewhat optimistic picture of the sensitivity and specificity of the new test. Indeed, upon close scrutiny of the radiologic data we found many bone height measurements taken at 6 months to be greater than the baseline measurements, indicating measurement error. Estimates of the 95 per cent lower confidence bounds for sensitivity and specificity declined to 0.51 and 0.58, respectively, when the outcome variable was taken only to be loss of attachment.

A possible criticism of our analysis is that a sample size of 31 subjects is too small for the assumption of asymptotic normality of the distribution of estimated regression coefficients to be valid. If this were the case, the actual confidence levels of the confidence intervals listed in Table II could be far from their nominal 95 per cent levels. However, Smith¹⁵ shows in simulation studies that for samples having ten subjects, each with as few as three correlated observations, confidence intervals have nominal coverage levels.
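The cut-point analysis described in this section, in which a strip score of 0 to 4 is dichotomized at each of the cut points 1+ to 4+, can be sketched in Python as follows. The scores and outcomes below are hypothetical and are not the study data, and the function name `sens_spec_by_cut_point` is introduced only for illustration. The sketch computes simple pooled proportions over sites (point estimates only); cluster adjusted standard errors would still require the GEE variance calculations described earlier.

```python
def sens_spec_by_cut_point(scores, progressed, cut_points=(1, 2, 3, 4)):
    """Pooled sensitivity and specificity at each strip-score cut point.

    scores:     integer strip scores (0 to 4), one per tooth site.
    progressed: 1 if the site progressed to periodontitis, else 0.
    A site is called test-positive at cut point c when its score >= c.
    """
    results = {}
    for c in cut_points:
        tp = sum(1 for s, d in zip(scores, progressed) if d == 1 and s >= c)
        fn = sum(1 for s, d in zip(scores, progressed) if d == 1 and s < c)
        tn = sum(1 for s, d in zip(scores, progressed) if d == 0 and s < c)
        fp = sum(1 for s, d in zip(scores, progressed) if d == 0 and s >= c)
        results[c] = (tp / (tp + fn), tn / (tn + fp))
    return results


# Hypothetical scores and outcomes for ten tooth sites.
demo_scores = [0, 1, 2, 3, 4, 2, 3, 1, 0, 4]
demo_progressed = [0, 0, 1, 1, 1, 0, 1, 0, 0, 1]
demo_results = sens_spec_by_cut_point(demo_scores, demo_progressed)
for cut, (sens, spec) in demo_results.items():
    print(f"{cut}+: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

As in Table II, raising the cut point can only lower sensitivity and raise specificity, because fewer sites are classified as test-positive.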


Table II. GEE study of sensitivity and specificity of the test strip at 8 minutes

             Sensitivity                         Specificity
Cut point    L 95% CB   Estimate   U 95% CB      L 95% CB   Estimate   U 95% CB
1+           0.90       0.96       1.00          0.17       0.19       0.21
2+           0.66       0.80       0.92          0.63       0.68       0.72
3+           0.25       0.44       0.64          0.77       0.81       0.85
4+           0.14       0.26       0.38          0.93       0.95       0.98

The estimated working correlation matrix is

$$R(\hat{\alpha}) = \begin{pmatrix} 1.00 & 0.05 & 0.05 & 0.05 & 0.05 \\ 0.05 & 1.00 & 0.05 & 0.05 & 0.05 \\ 0.05 & 0.05 & 1.00 & 0.05 & 0.05 \\ 0.05 & 0.05 & 0.05 & 1.00 & 0.05 \\ 0.05 & 0.05 & 0.05 & 0.05 & 1.00 \end{pmatrix},$$

indicating that the correlation between clustered observations within a mouth may be quite small. In this case we may regard observations within subjects as being nearly independent and our effective sample size as being more like 31 × 5 = 155 independent observations than 5 correlated observations taken from each of 31 independent subjects. In this case, Smith's simulation study¹⁵ indicates that the assumption of asymptotic normality holds and that probability statements coincide with what would be expected from the large sample theory.

As a last and interesting point, our CDC reviewer suggests that the low observed correlation may be more a function of untoward variation resulting from poor definition of periodontal disease and its inappropriate quantification, and warns that our result pertaining to the test may not be reproducible using more appropriate data.

ACKNOWLEDGEMENT

The authors thank Mr. Vincent Cary for providing the computer program for GEE. This program is available through the Statlib archive described in News and Notes. Also, Cary¹⁶ gives a technical description of the program's construction. The authors thank Professor Colton, J. R. Landis and two anonymous referees for many helpful comments that enabled us to strengthen our work. Finally, the authors thank Dr. E. D. Beltran in the Division of Oral Health at the Centers for Disease Control for his incisive review of our paper.

REFERENCES

1. Fleiss, J. L. Statistical Methods for Rates and Proportions, 2nd edn, Wiley-Interscience, New York, 1981.
2. Lachenbruch, P. A. 'Multiple reading procedures: the performance of diagnostic tests', Statistics in Medicine, 7, 549-557 (1988).
3. Hujoel, P. P., Moulton, L. H. and Loesche, W. J. 'Estimation of sensitivity and specificity of site-specific diagnostic tests', Journal of Periodontal Research, 25, 193-196 (1990).
4. Bahadur, R. R. 'A representation of the joint distribution of responses to n dichotomous items', in Solomon, H. (ed.), Studies in Item Analysis and Prediction, Stanford University Press, Stanford, 1961.
5. Kupper, L. L. and Haseman, K. J. 'The use of a correlated binomial model for the analysis of certain toxicological experiments', Biometrics, 34, 69 (1978).
6. Swets, J. A. and Pickett, R. M. Evaluation of Diagnostic Systems: Methods from Signal Detection Theory, Academic Press, New York, 1982.
7. Tosteson, A. N. A. and Begg, C. B. 'A general regression methodology for ROC curve estimation', Medical Decision Making, 8, 204-215 (1988).
8. Liang, K.-Y. and Zeger, S. L. 'Longitudinal data analysis using generalized linear models', Biometrika, 73, 13-22 (1986).
9. Zeger, S. L. and Liang, K.-Y. 'Longitudinal data analysis for discrete and continuous outcomes', Biometrics, 42, 121-130 (1986).
10. Emerich, L. J. 'Common problems with statistical aspects of periodontal research papers', Journal of Periodontal Research, 61, 206-208 (1990).
11. Zeger, S. L. 'Commentary', Statistics in Medicine, 7(1/2), 161-168 (1988).
12. Neuhaus, J. M., Kalbfleisch, J. D. and Hauck, W. W. 'A comparison of cluster-specific and population-averaged approaches for analysing correlated binary data', International Statistical Review, 59, 25-35 (1991).
13. Wedderburn, R. W. M. 'Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method', Biometrika, 61, 439-447 (1974).
14. Begg, C. B. and Greenes, R. A. 'Assessment of diagnostic tests when disease verification is subject to selection bias', Biometrics, 39, 207-215 (1983).
15. Smith, P. J. 'A comparative study of generalized estimating equation approach with the Cochran-Mantel-Haenszel procedure for estimating the common odds in K 2 x 2 tables', unpublished manuscript (1992).
16. Cary, V. 'Data objects for matrix computations', Computing Science and Statistics, Proceedings of the 21st Symposium on the Interface, American Statistical Association, Alexandria, Virginia, 1989, pp. 157-160.
