
Test Equality Between Two Binary Screening Tests with a Confirmatory Procedure Restricted on Screen Positives

Kung-Jong Lui (a) & Kuang-Chao Chang (b)

(a) Department of Mathematics and Statistics, College of Sciences, San Diego State University, San Diego, California, USA
(b) Department of Statistics and Information Science, Fu-Jen Catholic University, New Taipei, Taiwan

Accepted author version posted online: 16 May 2014. Published online: 20 Jan 2015.

To cite this article: Kung-Jong Lui & Kuang-Chao Chang (2015) Test Equality Between Two Binary Screening Tests with a Confirmatory Procedure Restricted on Screen Positives, Journal of Biopharmaceutical Statistics, 25:1, 29-43, DOI: 10.1080/10543406.2014.919932




Journal of Biopharmaceutical Statistics, 25: 29–43, 2015 Copyright © Taylor & Francis Group, LLC ISSN: 1054-3406 print/1520-5711 online DOI: 10.1080/10543406.2014.919932

TEST EQUALITY BETWEEN TWO BINARY SCREENING TESTS WITH A CONFIRMATORY PROCEDURE RESTRICTED ON SCREEN POSITIVES

Kung-Jong Lui(1) and Kuang-Chao Chang(2)

(1) Department of Mathematics and Statistics, College of Sciences, San Diego State University, San Diego, California, USA
(2) Department of Statistics and Information Science, Fu-Jen Catholic University, New Taipei, Taiwan

In studies of screening accuracy, we may commonly encounter data in which a confirmatory procedure is administered, for ethical concerns, only to those subjects with screen positives. We focus our discussion on simultaneously testing equality of sensitivity and specificity between two binary screening tests when only subjects with screen positives receive the confirmatory procedure. We develop four asymptotic test procedures and one exact test procedure. We derive a sample size calculation formula for a desired power of detecting a difference at a given nominal α-level. We employ Monte Carlo simulation to evaluate the performance of these test procedures and the accuracy of the sample size calculation formula developed here in a variety of situations. Finally, we use the data obtained from a study of the prostate-specific-antigen test and digital rectal examination test on 949 Black men to illustrate the practical use of these test procedures and the sample size calculation formula.

Key Words: Power; Sample size determination; Sensitivity; Specificity; Test equality; Type I error.

1. INTRODUCTION

For ethical concerns or economic constraints in studies of screening accuracy, it is quite common to administer a gold standard procedure only to those subjects with screen positives, especially when the confirmatory procedure for a disease is invasive and expensive (Schatzkin et al., 1987; Cheng and Macaluso, 1997). It is certainly difficult to justify taking an invasive biopsy sample from a subject who tests negative on both of the two screening tests under comparison. For example, the confirmatory procedure of biopsy in a study of prostate cancer screening was performed only on patients who tested positive on either the prostate-specific-antigen (PSA) test or the digital rectal examination (DRE) test (Smith et al., 1997). Other examples, in which the true disease status is not ascertained among subjects testing negative on both screening tests, include a comparison of the Papanicolaou (Pap) smear test with a new test for cervical cancer, and a study of mammography screening vs. physical exam for breast cancer (Schatzkin et al., 1987; Alonzo et al., 2004; Stock et al., 2012).

Received August 1, 2012; Accepted July 8, 2013
Address correspondence to Kung-Jong Lui, Department of Mathematics and Statistics, San Diego State University, San Diego, CA 92182-7720, USA; E-mail: [email protected]


The sensitivity (SN) and specificity (SP) are probably the two most important and commonly used indices for measuring the diagnostic accuracy of a screening test (Zhou et al., 2002). When a new test is developed, we may be interested in studying whether the accuracy of the new test differs from that of the standard test with respect to both SN and SP. Because patients who show negative responses on all screening tests do not receive the confirmatory procedure to ascertain their true disease status, the SN and SP are not estimable here without making additional assumptions or possessing external information (Cheng et al., 1999; Walter, 1999). Schatzkin et al. (1987) noted, however, that one can still apply McNemar's test for paired-sample data to test equality of SN (or SP) even though one cannot estimate these indices directly from the data. Cheng and Macaluso (1997) proposed a conditional logistic regression and discussed interval estimation of the ratio of sensitivities (RSN) and the ratio of false positive rates (FPR) in these cases. Alonzo et al. (2004) concentrated their attention on data with the confirmatory procedure limited to subjects testing positive and discussed estimation of relative accuracy in small-sample cases. Pepe and Alonzo (2001) proposed a marginal regression model for making inference about the relative accuracy when only screen positives are ascertained for their true disease status. Also, Alonzo et al. (2002) considered joint confidence intervals for the RSN and the ratio of the false positive rates (RFPR), and proposed two separate test procedures based on the RSN and RFPR, respectively. To control the overall Type I error of multiple tests at a given α-level, we need to use a nominal level (= 1 − √(1 − α)) smaller than α for each individual test.
Thus, if our interest is to detect whether there is a difference in SN or SP, as focused on here, use of the two separate test procedures based on the RSN and RFPR, each with a smaller α-error, may generally lose power. Note that recently Lui (2012) focused discussion on testing noninferiority based on the odds ratio (OR) under the partial verification design with a confirmatory procedure limited to screen positives. This is, however, different from testing equality with respect to both SN and SP between two screening tests. Note further that Stock et al. (2012) focused attention on estimation of the disease prevalence, true positive rate (TPR), and FPR by proposing a hierarchical Bayesian logit model under the partial verification design considered here as well. In this paper, we focus our discussion on simultaneously testing equality of SN and SP between two binary screening tests when only subjects with screen positives receive the confirmatory procedure. We develop four asymptotic test procedures and one exact test procedure. We derive a sample size calculation formula for a desired power of detecting a difference at a given nominal α-level. We apply Monte Carlo simulation to evaluate the performance of these test procedures and the accuracy of the sample size calculation formula in a variety of situations. Finally, we use the data obtained from a study of the PSA test and DRE test on 949 Black men (Smith et al., 1997) to illustrate the practical use of these test procedures and the sample size calculation formula developed here.

2. NOTATION AND TEST PROCEDURES

Consider comparing the accuracy of two screening tests, a standard test and a new test. Let D denote the disease status, with D = 1 for a case and D = 0 for a non-case. Let S and N denote the subject's response on the standard and new tests, respectively, with S (or N) = 1 for a positive response and 0 otherwise. Suppose that we take a random sample of n subjects from a targeted population and apply the two screening tests under comparison to each subject. Let nijd denote the number of subjects among these n sampled subjects with test responses (S = i, N = j) (i = 1, 0 and j = 1, 0) and disease status D = d (= 1, 0).


Table 1 The probability structure and the observed frequency (in parentheses) under the partial verification design

                        New Test
Standard Test    N = 1           N = 0          Total
S = 1            π11d (n11d)     π10d (n10d)    π1+d (n1+d)
S = 0            π01d (n01d)     π00d (?)       π0+d (?)
Total            π+1d (n+1d)     π+0d (?)       π++d (?)

Note. Disease status d = 1 for case, and = 0 otherwise; ?, non-observable.

Let πijd denote the cell probability corresponding to nijd (i = 1, 0, j = 1, 0, and d = 1, 0). The random vector (n111, n101, n011, n001, n110, n100, n010, n000) then follows the multinomial distribution with parameters n and (π111, π101, π011, π001, π110, π100, π010, π000). For ethical reasons, suppose that we employ the gold standard procedure only for those subjects with a positive response on either test; thereby, we can observe only n001 + n000 rather than n001 and n000 separately. We let "+" denote summation over the corresponding subscript; for example, we define πi+d = πi1d + πi0d (i = 1, 0, d = 1, 0) and π+jd = π1jd + π0jd (j = 1, 0, d = 1, 0). For clarity, we summarize the probability structure and observed data (in parentheses) with disease status d (= 1, 0) in Table 1.

Note that the SN (or the TPR) of the standard and new tests are, by definition, π1+1/π++1 and π+11/π++1, respectively. Similarly, the SP of the standard and new tests are π0+0/π++0 and π+00/π++0. Note also that because n00d is not observable, the cell probability π00d is non-estimable, and so are the SN and SP of the standard and new tests. In this paper, we focus our discussion on studying whether there is a difference in either SN or SP between the two screening tests. That is, we wish to test the null hypothesis H0: π1+1/π++1 = π+11/π++1 and π0+0/π++0 = π+00/π++0 vs. the alternative hypothesis Ha: π1+1/π++1 ≠ π+11/π++1 or π0+0/π++0 ≠ π+00/π++0 when the confirmatory procedure is available only to subjects with a positive response on either test. Note that equality of SP between the two tests, π0+0/π++0 = π+00/π++0, holds if and only if equality of FPR, π1+0/π++0 = π+10/π++0, holds.

Under the multinomial distribution, the maximum likelihood estimator (MLE) of πijd is π̂ijd = nijd/n for (i, j) = (1, 1), (1, 0), and (0, 1) and d = 1, 0. Using the functional invariance property of the MLE (Casella and Berger, 1990), we obtain the MLEs of π1+d and π+1d as π̂1+d = π̂11d + π̂10d and π̂+1d = π̂11d + π̂01d, respectively.

2.1. Asymptotic Test Procedures

When testing H0: π1+1/π++1 = π+11/π++1 and π0+0/π++0 = π+00/π++0, we may consider testing H0: π1+1 − π+11 = 0 and π1+0 − π+10 = 0. Note that the MLE of π1+d − π+1d (for d = 1, 0) is simply π̂1+d − π̂+1d. Note also that the covariance Cov(π̂1+1 − π̂+11, π̂1+0 − π̂+10) = −[(π1+1 − π+11)(π1+0 − π+10)]/n, which equals 0 under H0. Thus, as shown in the Appendix, we may obtain the following asymptotic test statistic based on π̂1+d − π̂+1d for d = 1, 0:

T1 = (n101 − n011)²/(n101 + n011) + (n100 − n010)²/(n100 + n010).   (1)

We will reject H0 at the α-level if T1 > χ²α,2, where χ²α,df denotes the upper 100(α)th percentile of the central χ²-distribution with df degrees of freedom. Note that the test statistic T1 (1) is actually the sum of two McNemar's tests (Fleiss, 1981; Selvin, 1996; Lui, 2004) with no continuity corrections for testing equality of SN and SP separately.

The null hypothesis H0: π1+1/π++1 = π+11/π++1 and π0+0/π++0 = π+00/π++0 focused on here can also be re-expressed as H0: π1+1/π+11 = 1 and π1+0/π+10 = 1. Thus, when testing H0, we may consider using π̂1+d/π̂+1d instead of π̂1+d − π̂+1d. Using the Multivariate Central Limit Theorem (Agresti, 1990), we obtain the following asymptotic test statistic (Appendix):

T2 = [log(π̂1+1/π̂+11)]²/V̂ar(log(π̂1+1/π̂+11)) + [log(π̂1+0/π̂+10)]²/V̂ar(log(π̂1+0/π̂+10)),   (2)

where V̂ar(log(π̂1+d/π̂+1d)) = (n10d + n01d)/[(n10d + n11d)(n01d + n11d)] for d = 1, 0. Under H0, the test statistic T2 (2) approximately follows the central χ²-distribution with two degrees of freedom when the number n of subjects is large. Thus, we will reject H0 at the α-level if T2 > χ²α,2.

On the basis of prior information, we may sometimes know that both the SN and SP of a new test are expected to be higher (or lower) than those of a standard test. In other words, we want to test H0: π1+1/π++1 = π+11/π++1 and π0+0/π++0 = π+00/π++0 vs. Ha*: π1+1/π++1 > π+11/π++1 and π0+0/π++0 > π+00/π++0; or π1+1/π++1 < π+11/π++1 and π0+0/π++0 < π+00/π++0. To improve the power of T1 (1) or T2 (2), we may incorporate this prior information on the same relative directions of SN and SP between two screening tests, as specified in Ha*, into the test statistics as follows. Under Ha*, with the same relative direction in SN and SP, the statistic π̂1+1 − π̂+11 − (π̂1+0 − π̂+10) tends to lie away from 0. To improve the power of T1, we may consider combining π̂1+1 − π̂+11 and π̂1+0 − π̂+10 in (1) and employ the following test statistic:

T3 = [π̂1+1 − π̂+11 − (π̂1+0 − π̂+10)]²/[(π̂101 + π̂011 + π̂100 + π̂010)/n].   (3)

Under H0, the test statistic T3 (3) asymptotically follows the central χ²-distribution with one degree of freedom. Thus, we will reject H0 at the α-level if T3 > χ²α,1. Note that the test statistic T3 (3) is, in fact, the same as that proposed elsewhere (Schatzkin et al., 1987) but with no continuity correction. Similarly, we may also consider combining log(π̂1+1/π̂+11) and log(π̂1+0/π̂+10) in T2 to improve power under Ha*, and use the following test statistic:

T4 = [log(π̂1+1/π̂+11) − log(π̂1+0/π̂+10)]²/[V̂ar(log(π̂1+1/π̂+11)) + V̂ar(log(π̂1+0/π̂+10))].   (4)

We will reject H0 at the α-level if T4 > χ²α,1. Note that when the SN of the standard test is larger than that of the new test but the SP of the former is smaller than that of the latter, or vice versa, the value of the test statistic T3 (or T4) can be small. Thus, using T3 (3) or T4 (4) in this case can lack power. Note also that the complement of the parameter space in Ha* is not equal to the parameter space under H0. When employing T3 or T4, we may need to assume explicitly that the SN and SP of the new test cannot lie in different relative directions from those of the standard test, to avoid a possible misinterpretation of the Type I error.
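As a concrete illustration, all four statistics can be computed directly from the observable cell counts. The following is a minimal sketch in Python (the paper itself used SAS; the function names and the illustrative counts below are ours, not from the paper):

```python
import math

def asymptotic_tests(n111, n101, n011, n110, n100, n010):
    """Compute T1-T4 of equations (1)-(4) from the observable counts n_ijd,
    where i (j) is the standard (new) test response and d the disease status.
    The unobservable counts n00d are not needed."""
    # T1 (eq. 1): sum of two McNemar statistics, no continuity correction.
    T1 = (n101 - n011) ** 2 / (n101 + n011) + (n100 - n010) ** 2 / (n100 + n010)

    # Log-ratio log(pi_hat_{1+d} / pi_hat_{+1d}) and its variance estimate
    # (n10d + n01d) / [(n10d + n11d)(n01d + n11d)] for each stratum d.
    log_r, var = {}, {}
    for d, (n11, n10, n01) in {1: (n111, n101, n011), 0: (n110, n100, n010)}.items():
        log_r[d] = math.log((n11 + n10) / (n11 + n01))
        var[d] = (n10 + n01) / ((n10 + n11) * (n01 + n11))

    # T2 (eq. 2): sum of the two squared standardized log-ratios.
    T2 = log_r[1] ** 2 / var[1] + log_r[0] ** 2 / var[0]

    # T3 (eq. 3): the sample-size factors cancel, leaving a function of the
    # discordant counts alone.
    T3 = (n101 - n011 - n100 + n010) ** 2 / (n101 + n011 + n100 + n010)

    # T4 (eq. 4): difference of the two log-ratios.
    T4 = (log_r[1] - log_r[0]) ** 2 / (var[1] + var[0])
    return T1, T2, T3, T4

def chi2_sf(x, df):
    """Survival function of the central chi-square for df = 1 or 2."""
    return math.erfc(math.sqrt(x / 2)) if df == 1 else math.exp(-x / 2)
```

T1 and T2 are referred to the upper α-point of the χ²-distribution with two degrees of freedom, and T3 and T4 to that with one degree of freedom.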


2.2. Exact Test Procedure


Test procedures (1)–(4) are derived on the basis of large-sample theory. When the underlying prevalence of disease in the targeted population is low and the number of studied subjects in a trial is small, these procedures can be theoretically invalid due to small n10d and n01d. In this case, we may wish to develop and use an exact test procedure. Define n(dis)d = n10d + n01d (d = 1, 0) as the total number of subjects with discordant responses among subjects with disease status d. One can easily show that the conditional distribution of n10d, given n(dis)d, is binomial with parameters n(dis)d and π10d/(π10d + π01d). Furthermore, we can show that the joint conditional probability mass function (pmf) of the bivariate vector (n101, n100), given n(dis)1 and n(dis)0 fixed, is the product of two binomial pmfs:

f(n101, n100 | n(dis)1, n(dis)0) = [n(dis)1!/(n101! n011!)] [π101/(π101 + π011)]^n101 [π011/(π101 + π011)]^n011 × [n(dis)0!/(n100! n010!)] [π100/(π100 + π010)]^n100 [π010/(π100 + π010)]^n010.   (5)

Under H0: π1+1/π++1 = π+11/π++1 and π0+0/π++0 = π+00/π++0, the above pmf (5) reduces to

f(n101, n100 | n(dis)1, n(dis)0, H0) = [n(dis)1!/(n101! n011!)] (1/2)^(n101+n011) [n(dis)0!/(n100! n010!)] (1/2)^(n100+n010).   (6)

On the basis of the pmf (6), we can calculate the exact p-value for a given observed bivariate vector (n101, n100) as

Σ_{(x,y)∈C} f(x, y | n(dis)1, n(dis)0, H0),   (7)

where C = {(x, y) | f(x, y | n(dis)1, n(dis)0, H0) ≤ f(n101, n100 | n(dis)1, n(dis)0, H0)}. If the p-value (7) on the basis of the exact distribution (6) is less than a given small α, we will reject H0 at the α-level.
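Because (6) is just the product of two Binomial(n(dis)d, 1/2) pmfs, the p-value (7) can be evaluated by complete enumeration. A small Python sketch (the function name is ours; the paper used SAS):

```python
from math import comb

def exact_p_value(n101, n011, n100, n010):
    """Exact p-value (7): sum the null pmf (6) over all tables (x, y) whose
    null probability does not exceed that of the observed table."""
    m1, m0 = n101 + n011, n100 + n010  # discordant totals n(dis)1, n(dis)0

    # Under H0, n101 | m1 ~ Binomial(m1, 1/2) and n100 | m0 ~ Binomial(m0, 1/2).
    def f(x, y):
        return comb(m1, x) * 0.5 ** m1 * comb(m0, y) * 0.5 ** m0

    f_obs = f(n101, n100)
    eps = 1e-12  # guard against floating-point ties
    return sum(
        f(x, y)
        for x in range(m1 + 1)
        for y in range(m0 + 1)
        if f(x, y) <= f_obs + eps
    )
```

The enumeration is over (n(dis)1 + 1)(n(dis)0 + 1) points, which is trivial precisely in the small-sample settings where the exact procedure is needed.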

2.3. Sample Size Determination

To assure that we have a desired power of detecting a difference between the accuracy of two screening tests, it is important to take an adequate number n of studied subjects in the trial. Because the ratios of TPR and FPR (rather than their differences) between two tests are estimable when a confirmatory procedure is limited to only those screen positives, we focus our attention on sample size determination on the basis of T2. Note that the critical region consists of the sample points for which T2 > χ²α,2. Note also that under Ha, as the total number n of subjects is large, the test statistic T2 approximately follows the noncentral χ²-distribution with two degrees of freedom and noncentrality parameter

λ(n, RTPR, RFPR, π111, π110, π+11, π+10) = (log(RTPR))²/[(π1+1 + π+11 − 2π111)/(nπ1+1π+11)] + (log(RFPR))²/[(π1+0 + π+10 − 2π110)/(nπ1+0π+10)],

where RTPR = π1+1/π+11 and RFPR = π1+0/π+10 denote the ratio of TPRs and the ratio of FPRs, respectively. Therefore, we may obtain the minimum required sample size for a desired power of detecting a difference between two screening tests, given RTPR and RFPR in the situation specified by the parameter values π111, π110, π+11, and π+10, by searching for the smallest n satisfying

P(χ²(2, λ(n, RTPR, RFPR, π111, π110, π+11, π+10)) > χ²α(2)) ≥ 1 − β.   (8)

We may write a program in SAS (1990) and apply a trial-and-error procedure to find the smallest n satisfying equation (8) easily.

3. MONTE CARLO SIMULATION AND RESULTS

To evaluate the finite-sample performance of the asymptotic test procedures using T1 (1), T2 (2), T3 (3), T4 (4), and the exact test procedure (7), we use Monte Carlo simulation. Given the sensitivity (SNs) and specificity (SPs) of the standard test and the disease prevalence (= π++1), under H0 we can determine all the margins of the two tables via the formulas: π1+1 = π+11 = SNs π++1, π0+1 = π+01 = (1 − SNs)π++1, π1+0 = π+10 = (1 − SPs)(1 − π++1), and π0+0 = π+00 = SPs(1 − π++1). Note that the prevalence π++0 of non-cases simply equals 1 − π++1. Note further that given the margins of a 2 × 2 table with disease status d fixed, we can uniquely determine all the cell probabilities πijd once π11d is given. To assure that all πijd fall between 0 and 1, however, the parameter π11d must satisfy ad ≤ π11d ≤ bd, where ad = max{0, π1+d + π+1d − π++d} and bd = min{π1+d, π+1d} (to which the usual Fréchet-type bounds reduce here). We arbitrarily choose π11d = ad + (bd − ad)/3 in the following simulations.

To evaluate the Type I error of the asymptotic test procedures using T1 (1), T2 (2), T3 (3), T4 (4) and the exact test procedure (7), we consider the situations in which the underlying prevalence rate (π++1 =) 0.10, 0.20, 0.30; the SN of the two screening tests equals 0.80, 0.90, 0.95; the SP of the two screening tests equals 0.90, 0.95; and the total number of subjects n = 200, 500, 1000. For each configuration determined by a combination of these parameters, we write programs in SAS (1990) and generate 10,000 repeated samples from the desired multinomial distribution to calculate the estimated Type I error of these procedures.
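The simulation logic above can be sketched in a few lines of Python (the paper's programs were written in SAS; the configuration below is one illustrative null setting, and for brevity π11d is taken here as the within-stratum independence value π1+dπ+1d/π++d, which satisfies ad ≤ π11d ≤ bd, rather than the paper's choice ad + (bd − ad)/3):

```python
import random

random.seed(20140516)

# One null configuration: prevalence 0.20, SN = SP = 0.90 for both tests.
prev, sn, sp = 0.20, 0.90, 0.90
p111 = (sn * prev) ** 2 / prev            # 0.162 (independence choice)
p101 = p011 = sn * prev - p111            # 0.018
p001 = prev - p111 - 2 * p101             # 0.002
fpr = (1 - sp) * (1 - prev)               # FPR margin among non-cases
p110 = fpr ** 2 / (1 - prev)              # 0.008
p100 = p010 = fpr - p110                  # 0.072
p000 = (1 - prev) - p110 - 2 * p100       # 0.648

cells = [p111, p101, p011, p001, p110, p100, p010, p000]
n, reps, crit = 500, 2000, 5.991          # chi-square(2) upper 5% point
rejections = 0
for _ in range(reps):
    # one multinomial draw of n subjects over the eight cells
    counts = [0] * 8
    for idx in random.choices(range(8), weights=cells, k=n):
        counts[idx] += 1
    n111, n101, n011, _, n110, n100, n010, _ = counts
    if (n101 + n011) == 0 or (n100 + n010) == 0:
        continue  # T1 not computable; as in the paper, do not reject H0
    t1 = (n101 - n011) ** 2 / (n101 + n011) + (n100 - n010) ** 2 / (n100 + n010)
    rejections += t1 > crit
print(rejections / reps)  # estimated Type I error, close to the nominal 0.05
```

With 10,000 replications per configuration, as used in the paper, the simulation standard error of an estimated Type I error near 0.05 is about 0.002.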
For completeness, we also calculate the proportion of the total simulated samples for which we fail to apply a given test procedure. For example, we cannot apply the test procedures using T1 or T2, or the exact test procedure (7), in simulated samples with either n(dis)1 (= n101 + n011) or n(dis)0 (= n100 + n010) equal to 0. If this occurs, we take no action and thereby do not reject the null hypothesis H0.

We summarize in Table 2 the estimated Type I error of the asymptotic test procedures using T1 (1), T2 (2), T3 (3), T4 (4) and the exact test procedure (7) at the 0.05 level. We first note that except for a few extreme cases (in which the underlying prevalence rate and the sample size n are small, but the SN and SP are large), the probability of failure to apply the test procedures considered here is generally small or negligible (≈ 0.000) in Table 2. We see that all test procedures discussed here can perform reasonably well; all their estimated Type I errors are either less than or approximately equal to the nominal 0.05 level (Table 2).

Table 2 The estimated Type I error for test procedures using T1 (1), T2 (2), T3 (3), T4 (4) and the exact test procedure (7) at the 0.05 level

π++1  SN    SP    n     (1)     (2)     (3)     (4)     (7)
0.10  0.80  0.90  200   0.046   0.042   0.053   0.050   0.051
                  500   0.047   0.046   0.049   0.048   0.049
                  1000  0.047   0.047   0.046   0.047   0.047
            0.95  200   0.043   0.034   0.049   0.042   0.046
                  500   0.047   0.045   0.050   0.049   0.048
                  1000  0.054   0.053   0.053   0.051   0.055
      0.90  0.90  200   0.032   0.030   0.052   0.045   0.042
                  500   0.044   0.043   0.051   0.049   0.046
                  1000  0.047   0.047   0.050   0.050   0.048
            0.95  200   0.030   0.023   0.047   0.040   0.038
                  500   0.042   0.040   0.049   0.046   0.044
                  1000  0.043   0.042   0.048   0.046   0.044
      0.95  0.90  200   0.020   0.018   0.051   0.045   0.030
                  500   0.029   0.028   0.046   0.046   0.040
                  1000  0.044   0.044   0.047   0.047   0.049
            0.95  200   0.020   0.016   0.048   0.044   0.029
                  500   0.034   0.032   0.052   0.048   0.043
                  1000  0.045   0.044   0.050   0.048   0.047
0.20  0.80  0.90  200   0.047   0.043   0.050   0.045   0.049
                  500   0.048   0.046   0.050   0.050   0.048
                  1000  0.050   0.050   0.050   0.047   0.051
            0.95  200   0.044   0.038   0.053   0.040   0.046
                  500   0.045   0.043   0.046   0.046   0.046
                  1000  0.053   0.052   0.057   0.051   0.053
      0.90  0.90  200   0.042   0.040   0.052   0.046   0.047
                  500   0.045   0.045   0.049   0.047   0.046
                  1000  0.047   0.046   0.050   0.049   0.047
            0.95  200   0.042   0.034   0.048   0.040   0.044
                  500   0.050   0.048   0.054   0.049   0.052
                  1000  0.048   0.047   0.053   0.049   0.048
      0.95  0.90  200   0.029   0.026   0.058   0.050   0.042
                  500   0.048   0.047   0.050   0.047   0.051
                  1000  0.054   0.053   0.051   0.053   0.055
            0.95  200   0.027   0.021   0.046   0.040   0.038
                  500   0.044   0.041   0.050   0.047   0.047
                  1000  0.046   0.045   0.052   0.050   0.047
0.30  0.80  0.90  200   0.048   0.045   0.049   0.047   0.049
                  500   0.049   0.048   0.046   0.044   0.050
                  1000  0.054   0.054   0.051   0.054   0.055
            0.95  200   0.048   0.039   0.051   0.035   0.049
                  500   0.051   0.049   0.052   0.048   0.052
                  1000  0.043   0.042   0.048   0.046   0.043
      0.90  0.90  200   0.048   0.045   0.055   0.046   0.051
                  500   0.048   0.047   0.052   0.051   0.048
                  1000  0.047   0.047   0.051   0.049   0.047
            0.95  200   0.043   0.035   0.049   0.035   0.043
                  500   0.050   0.047   0.051   0.045   0.050
                  1000  0.046   0.045   0.046   0.046   0.046
      0.95  0.90  200   0.039   0.034   0.052   0.046   0.045
                  500   0.045   0.044   0.047   0.046   0.045
                  1000  0.050   0.050   0.048   0.050   0.051
            0.95  200   0.036   0.026   0.044   0.036   0.041
                  500   0.047   0.044   0.051   0.045   0.047
                  1000  0.049   0.048   0.050   0.049   0.050

Notes. π++1, the underlying prevalence rate; SN, the sensitivity of the two tests; SP, the specificity of the two tests; and n, the total number of subjects in the trial.

To compare power between the different test procedures, we consider the situations in which the underlying prevalence rate (π++1 =) 0.20; the SN and SP of the standard test: SNs = 0.80, 0.90, 0.95 and SPs = 0.90, 0.95; the SN and SP of the new test: SNn = SNs + δ1 and SPn = SPs + δ2, where δ1 = –0.05, 0.03 and δ2 = –0.05, 0.03; as well as the total number of subjects n = 200, 500, 1000. We calculate the simulated power as the proportion of the total 10,000 simulated samples for which we reject the null hypothesis H0. We summarize in Table 3 the simulated power for the test procedures using T1 (1), T2 (2), T3 (3), T4 (4) and the exact test procedure (7). We find that when the relative directions of SN and SP for the new test vs. the standard test are the same (i.e., δ1 > 0 and δ2 > 0, or δ1 < 0 and δ2 < 0), the test procedures using T3 (3) and T4 (4) tend to be the most powerful, whereas this advantage can be lost when δ1 and δ2 are in opposite relative directions (i.e., δ1 > 0 and δ2 < 0, or δ1 < 0 and δ2 > 0). We find that when δ1 and δ2 are in opposite relative directions, the exact test procedure developed here can be generally preferable to all the other asymptotic test procedures with respect to power (Table 3). For example, when δ1 = –0.05 and δ2 = 0.03, the simulated powers of the test procedures using T1, T2, T3, T4 and the exact test procedure (7) for SNs = 0.95, SPs = 0.90, and n = 200 at the 0.05 level (Table 3) are 0.196, 0.183, 0.093, 0.130, and 0.213, respectively.

To demonstrate the use of the sample size calculation formula (8) derived from T2, we calculate the minimum required number of subjects n for a desired power of 80% of detecting a difference at the 0.05 level based on equation (8).
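The trial-and-error search over n in (8) can be carried out in a few lines; the paper did this in SAS. A Python sketch (function names are ours; the tail probability of the noncentral χ² with two degrees of freedom is computed by the standard Poisson mixture of central χ² survival functions, which are closed-form for even degrees of freedom):

```python
import math

def ncx2_sf_df2(x, lam, terms=500):
    """P(chi-square(2, lam) > x) via the Poisson mixture representation."""
    total, pois = 0.0, math.exp(-lam / 2)
    # survival of central chi-square with df = 2(j+1):
    # exp(-x/2) * sum_{i<=j} (x/2)^i / i!
    erlang_sum, erlang_term = 1.0, 1.0
    for j in range(terms):
        total += pois * math.exp(-x / 2) * erlang_sum
        pois *= (lam / 2) / (j + 1)
        erlang_term *= (x / 2) / (j + 1)
        erlang_sum += erlang_term
    return total

def min_sample_size(rtpr, rfpr, pi111, pi110, pi_p11, pi_p10,
                    alpha=0.05, power=0.80):
    """Smallest n satisfying (8); pi_p11, pi_p10 denote pi_{+11}, pi_{+10}."""
    pi_1p1, pi_1p0 = rtpr * pi_p11, rfpr * pi_p10   # pi_{1+d} from the ratios
    crit = -2.0 * math.log(alpha)                   # chi-square(2) upper point
    # per-subject contribution to the noncentrality, lambda(n) = n * unit
    unit = (math.log(rtpr) ** 2 * pi_1p1 * pi_p11 / (pi_1p1 + pi_p11 - 2 * pi111)
            + math.log(rfpr) ** 2 * pi_1p0 * pi_p10 / (pi_1p0 + pi_p10 - 2 * pi110))
    n = 1
    while ncx2_sf_df2(crit, n * unit) < power:
        n += 1
    return n
```

Since λ in (8) is linear in n, the power is increasing in n and the linear scan (or a bisection) is guaranteed to terminate at the minimum required sample size. The π values passed to `min_sample_size` below are illustrative, not taken from the paper's tables.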
We consider the situations in which the underlying prevalence rate (π++1 =) 0.10, 0.20; the ratio of TPR and the ratio of FPR: RTPR = 0.90, 1.10 and RFPR = 0.90, 1.10; and the SN and SP of the new test: SNn = 0.80, 0.90 and SPn = 0.90, 0.95. To further evaluate the accuracy of the sample size formula (8), we again apply Monte Carlo simulation and generate 10,000 repeated samples of the resulting sample size n, each following the desired multinomial distribution, to calculate the simulated power. We summarize in Table 4 the estimated minimum required number n of subjects and the corresponding simulated power of the procedure using T2 for a desired power of 80% of detecting a difference between two screening tests at the 0.05 level. For example, when π++1 = 0.10, RTPR = 0.90, RFPR = 0.90, SNn = 0.90, and SPn = 0.95, the estimated minimum required number of subjects for a desired power of 80% is 2428; the corresponding simulated power based on the procedure using T2 is 0.807. Note that when RTPR = 1.10 and SNn = 0.90, the estimated minimum required number of subjects based on equation (8) can be conservative. For example, for π++1 = 0.10, RTPR = 1.10, RFPR = 1.10, SNn = 0.90, and SPn = 0.95, the estimated minimum required number of subjects is 1183, while its corresponding simulated power is 88% (Table 4).

Table 3 The simulated power for test procedures using T1 (1), T2 (2), T3 (3), T4 (4) and the exact test procedure (7)

δ1 = –0.05, δ2 = –0.05
SNs   SPs   n     (1)     (2)     (3)     (4)     (7)
0.80  0.90  200   0.265   0.254   0.329   0.334*  0.271
            500   0.603   0.598   0.690   0.705*  0.604
            1000  0.900   0.898   0.935   0.944*  0.900
      0.95  200   0.382   0.350   0.437   0.468*  0.389
            500   0.803   0.795   0.819   0.875*  0.805
            1000  0.982   0.982   0.982   0.992*  0.983
0.90  0.90  200   0.283   0.270   0.377*  0.359   0.293
            500   0.642   0.638   0.746*  0.725   0.645
            1000  0.925   0.924   0.962*  0.955   0.926
      0.95  200   0.408   0.377   0.519*  0.487   0.420
            500   0.827   0.818   0.888*  0.886   0.829
            1000  0.988   0.987   0.994*  0.993   0.988
0.95  0.90  200   0.299   0.288   0.403*  0.357   0.324
            500   0.698   0.696   0.782*  0.738   0.702
            1000  0.950   0.950   0.971*  0.959   0.950
      0.95  200   0.430   0.399   0.563*  0.490   0.457
            500   0.861   0.855   0.919*  0.893   0.864
            1000  0.992   0.992   0.997*  0.994   0.992

δ1 = –0.05, δ2 = 0.03
SNs   SPs   n     (1)     (2)     (3)     (4)     (7)
0.80  0.90  200   0.165   0.152   0.080   0.119   0.168*
            500   0.390*  0.385   0.119   0.258   0.390*
            1000  0.687   0.684   0.189   0.460   0.688*
      0.95  200   0.291   0.187   0.095   0.197   0.294*
            500   0.684   0.655   0.160   0.630   0.685*
            1000  0.946*  0.943   0.271   0.913   0.946*
0.90  0.90  200   0.190   0.178   0.092   0.133   0.196*
            500   0.437   0.431   0.135   0.281   0.439*
            1000  0.756*  0.754   0.220   0.503   0.756*
      0.95  200   0.313   0.206   0.106   0.206   0.320*
            500   0.716   0.694   0.193   0.645   0.718*
            1000  0.964*  0.962   0.334   0.926   0.964*
0.95  0.90  200   0.196   0.183   0.093   0.130   0.213*
            500   0.505   0.500   0.141   0.285   0.507*
            1000  0.825   0.824   0.244   0.520   0.826*
      0.95  200   0.329   0.223   0.107   0.206   0.345*
            500   0.761   0.743   0.222   0.649   0.763*
            1000  0.976*  0.975   0.403   0.929   0.976*

δ1 = 0.03, δ2 = –0.05
SNs   SPs   n     (1)     (2)     (3)     (4)     (7)
0.80  0.90  200   0.251   0.239   0.188   0.231   0.258*
            500   0.572   0.568   0.406   0.509   0.575*
            1000  0.880*  0.879   0.684   0.812   0.880*
      0.95  200   0.368   0.340   0.248   0.372   0.377*
            500   0.780   0.771   0.521   0.766   0.783*
            1000  0.980*  0.979   0.804   0.971   0.979
0.90  0.90  200   0.255   0.243   0.213   0.250   0.277*
            500   0.594   0.590   0.455   0.551   0.600*
            1000  0.899   0.899   0.738   0.847   0.900*
      0.95  200   0.379   0.346   0.299   0.397   0.408*
            500   0.802   0.794   0.616   0.802   0.808*
            1000  0.983*  0.982   0.886   0.978   0.983*
0.95  0.90  200   0.241   0.228   0.225   0.256   0.279*
            500   0.660   0.656   0.480   0.567   0.678*
            1000  0.940   0.939   0.780   0.862   0.943*
      0.95  200   0.364   0.338   0.328   0.399   0.415*
            500   0.838   0.833   0.666   0.807   0.852*
            1000  0.992*  0.991   0.922   0.980   0.991

δ1 = 0.03, δ2 = 0.03
SNs   SPs   n     (1)     (2)     (3)     (4)     (7)
0.80  0.90  200   0.142   0.131   0.189*  0.181   0.145
            500   0.344   0.338   0.419   0.440*  0.345
            1000  0.626   0.622   0.686   0.724*  0.626
      0.95  200   0.276   0.166   0.283*  0.276   0.280
            500   0.658   0.629   0.588   0.750*  0.662
            1000  0.934   0.930   0.867   0.967*  0.934
0.90  0.90  200   0.158   0.148   0.234*  0.197   0.172
            500   0.375   0.369   0.476*  0.444   0.378
            1000  0.678   0.675   0.779*  0.743   0.680
      0.95  200   0.282   0.174   0.355*  0.273   0.296
            500   0.676   0.651   0.733   0.746*  0.679
            1000  0.943   0.939   0.958   0.967*  0.943
0.95  0.90  200   0.142   0.130   0.262*  0.201   0.168
            500   0.442   0.438   0.526*  0.446   0.462
            1000  0.780   0.778   0.824*  0.749   0.781
      0.95  200   0.268   0.165   0.410*  0.270   0.304
            500   0.734   0.710   0.838*  0.746   0.747
            1000  0.970   0.967   0.987*  0.968   0.971

Notes. δ1 = SNn – SNs, where SNn and SNs are the sensitivities of the new and standard tests; δ2 = SPn – SPs, where SPn and SPs are the specificities of the new and standard tests; π++1, the underlying prevalence rate; and n, the total number of subjects in the trial. *The simulated power is the largest among the test statistics considered here.

4. AN EXAMPLE

To illustrate the use of the asymptotic test procedures T1, T2, T3, T4 and the exact test procedure (7), we consider the data regarding 949 Black men (Table 5) with the results of the PSA and DRE tests (Smith et al., 1997; Pepe and Alonzo, 2001). The confirmatory procedure of biopsy is employed only for those patients with a positive test response on either test. For readers' information, we summarize the data in Table 5. When applying the MLEs π̂1+1/π̂+11 and π̂1+0/π̂+10 to estimate the RTPR and RFPR, we obtain 2.111 and 1.414, respectively. These suggest that the SN of the PSA is higher than that of the DRE, but the

TEST EQUALITY BETWEEN TWO BINARY SCREENING TESTS


Table 4 The estimated minimum required number n of subjects and its corresponding simulated power of the test procedure using T2 (2), for a desired power of 80% of detecting a difference between two screening tests at the 0.05 level

π++1   RTPR   RFPR   SNn    SPn    n      Simulated power
0.10   0.90   0.90   0.80   0.90   3799   0.811
0.10   0.90   0.90   0.80   0.95   4399   0.800
0.10   0.90   0.90   0.90   0.90   2233   0.802
0.10   0.90   0.90   0.90   0.95   2428   0.807
0.10   0.90   1.10   0.80   0.90   3898   0.802
0.10   0.90   1.10   0.80   0.95   4465   0.799
0.10   0.90   1.10   0.90   0.90   2267   0.814
0.10   0.90   1.10   0.90   0.95   2448   0.810
0.10   1.10   0.90   0.80   0.90   2871   0.808
0.10   1.10   0.90   0.80   0.95   3201   0.812
0.10   1.10   0.90   0.90   0.90   1131   0.877
0.10   1.10   0.90   0.90   0.95   1179   0.885
0.10   1.10   1.10   0.80   0.90   2927   0.804
0.10   1.10   1.10   0.80   0.95   3235   0.807
0.10   1.10   1.10   0.90   0.90   1139   0.879
0.10   1.10   1.10   0.90   0.95   1183   0.882
0.20   0.90   0.90   0.80   0.90   2239   0.811
0.20   0.90   0.90   0.80   0.95   2411   0.803
0.20   0.90   0.90   0.90   0.90   1226   0.816
0.20   0.90   0.90   0.90   0.95   1276   0.814
0.20   0.90   1.10   0.80   0.90   2269   0.803
0.20   0.90   1.10   0.80   0.95   2429   0.803
0.20   0.90   1.10   0.90   0.90   1235   0.813
0.20   0.90   1.10   0.90   0.95   1281   0.807
0.20   1.10   0.90   0.80   0.90   1621   0.812
0.20   1.10   0.90   0.80   0.95   1710   0.802
0.20   1.10   0.90   0.90   0.90   592    0.880
0.20   1.10   0.90   0.90   0.95   604    0.886
0.20   1.10   1.10   0.80   0.90   1637   0.807
0.20   1.10   1.10   0.80   0.95   1718   0.810
0.20   1.10   1.10   0.90   0.90   594    0.886
0.20   1.10   1.10   0.90   0.95   605    0.879

Notes. π++1: the underlying prevalence rate; RTPR: the ratio of TPRs; RFPR: the ratio of FPRs; SNn: the sensitivity of the new test; SPn: the specificity of the new test.

SP of the former is lower than that of the latter. Suppose we wish to determine whether there is a difference in accuracy between the PSA and DRE screening tests. Applying the test procedures T1, T2, T3, T4 and the exact test procedure (7), we obtain p-values of 0.001, 0.002, 0.424, 0.219, and 0.001, respectively. Because the relative directions of SN and SP between the PSA and DRE tests in these data are not the same, as noted previously, the test procedures using T3 or T4 are not appropriate here. Based on the p-values for T1 (1), T2 (2), and the exact test procedure (7), we may conclude that there is strong evidence that the accuracy of the PSA differs from that of the DRE. From the data in Table 5, we obtain the parameter estimates π̂111 = 0.011, π̂110 = 0.003, π̂+11 = 0.019, and π̂+10 = 0.031. To illustrate the use of sample size calculation formula (8), suppose we wish to find the minimum required number of subjects for a desired power of 80% of detecting a difference between the PSA and


LUI AND CHANG

Table 5 The observed frequencies for 949 Black men of the prostate-specific-antigen (PSA) test and digital rectal examination (DRE) test results, with prostate cancer confirmed by biopsy limited to only those screen positives

          Prostate cancer        No prostate cancer
          DRE +    DRE –         DRE +    DRE –
PSA +     10       28            3        38
PSA –     8        ?             26       ?

+ and −, positive and negative test responses, respectively; ?, non-observable.

DRE tests when RTPR (= π1+1/π+11) = 2, RFPR (= π1+0/π+10) = 1.50, π111 = 0.011, π110 = 0.003, π+11 = 0.019, and π+10 = 0.031 (as determined by the parameter estimates from the data in Table 5) at the 0.05 level. We obtain an estimated minimum required sample size n of 729. To evaluate the accuracy of this sample size estimate, we generated 10,000 repeated samples, each of 729 subjects, in the above situation with the underlying prevalence set to 0.10. The simulated powers corresponding to the asymptotic test procedures (1) and (2) and the exact test procedure (7) are 0.828, 0.820, and 0.830, respectively. These suggest that the estimated sample size of 729 is marginally conservative but applicable when using test procedures (1), (2), and (7).

5. DISCUSSION

When the relative directions of SN and SP for the new test vs. the standard test are known to be the same before the trial, the test procedures using T3 or T4, which account for this prior information, can improve power. Although we may follow an idea similar to that used in deriving T3 (or T4) and consider a test statistic based on π̂1+1 − π̂+11 + (π̂1+0 − π̂+10) (or log(π̂1+1/π̂+11) + log(π̂1+0/π̂+10)) to improve power when the relative directions of SN and SP are known to be opposite, we should employ such procedures cautiously in practice. This is because, if the actual relative directions of SN and SP differ from what we expect, test procedures designed to detect specific alternative directions can be, as shown in the example, powerless. On the other hand, because the exact test procedure (7) is theoretically valid for a small number of subjects and is generally more powerful than the other asymptotic procedures considered here, without requiring any assumption about the direction of SN or SP in the alternative hypothesis, we recommend the exact test procedure for general use.
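The p-values for T1 and T2 reported in the example can be reproduced directly from the observed counts in Table 5. The following sketch (Python; variable and function names are ours, not the authors') computes the MLE-based RTPR and RFPR together with T1 in its McNemar-sum form (A.5) and T2 (A.7), using the fact that the survival function of a χ² variable with two degrees of freedom is exp(−x/2):

```python
import math

# Observed counts from Table 5 (PSA = "new" test, DRE = "standard" test):
# n_ijd, with i = PSA result, j = DRE result, d = biopsy-confirmed status.
n111, n101, n011 = 10, 28, 8    # diseased stratum (d = 1)
n110, n100, n010 = 3, 38, 26    # disease-free stratum (d = 0)

def chi2_sf_df2(x):
    # P(X > x) for X ~ chi-square with 2 degrees of freedom
    return math.exp(-x / 2.0)

# MLEs of RTPR and RFPR (ratios of marginal positive rates).
rtpr = (n111 + n101) / (n111 + n011)   # 38/18 ≈ 2.111
rfpr = (n110 + n100) / (n110 + n010)   # 41/29 ≈ 1.414

# T1 (A.5): sum of two McNemar statistics over the two disease strata.
t1 = (n101 - n011) ** 2 / (n101 + n011) + (n100 - n010) ** 2 / (n100 + n010)

# T2 (A.7): squared log-ratios divided by their delta-method variances (A.6).
def log_ratio_term(n10, n01, n11):
    var = (n10 + n01) / ((n10 + n11) * (n01 + n11))
    return math.log((n11 + n10) / (n11 + n01)) ** 2 / var

t2 = log_ratio_term(n101, n011, n111) + log_ratio_term(n100, n010, n110)

print(round(rtpr, 3), round(rfpr, 3))                        # 2.111 1.414
print(round(chi2_sf_df2(t1), 3), round(chi2_sf_df2(t2), 3))  # 0.001 0.002
```

Both p-values agree with those reported for T1 and T2 in Section 4 (0.001 and 0.002), as do the RTPR and RFPR estimates (2.111 and 1.414).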
Note that when employing the procedures using T1 and T3, we may consider using the continuity correction commonly suggested for McNemar's test (Fleiss, 1981; Selvin, 1996). However, Table 2 shows that the estimated Type I errors for the procedures using T1 and T3 (without the continuity correction) are generally less than, or agree well with, the nominal 0.05 level. In fact, we have also applied Monte Carlo simulation to evaluate the performance of these test procedures with the continuity correction under the same configurations as those considered in Table 3. We found that using T1 or T3 with the continuity correction can cause a substantial loss of power. For example, when δ1 = −0.05, δ2 = −0.05, SNs = 0.80, SPs = 0.90, and n = 200, the simulated power for the procedure using T1 drops from 0.265 (as shown in Table 3) to 0.175 with the continuity correction; this translates into a relative loss of power of 34% (= (0.265 − 0.175)/0.265). Thus, we do not recommend the continuity correction when employing T1 and T3
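The direction of this effect can also be seen algebraically: the continuity correction replaces (a − b)² by (|a − b| − 1)² in each McNemar component, so the corrected statistic can never exceed the uncorrected one, and the corrected test can never reject when the uncorrected test does not. A minimal sketch (Python; illustrative only, using the discordant counts of Table 5):

```python
def mcnemar_component(a, b, cc=False):
    # One McNemar component of T1: (a - b)^2/(a + b), optionally with the
    # usual continuity correction (|a - b| - 1)^2/(a + b).
    d = abs(a - b) - (1.0 if cc else 0.0)
    return max(d, 0.0) ** 2 / (a + b)

# Discordant counts from Table 5: 28 vs. 8 among the diseased,
# 38 vs. 26 among the disease-free.
t1_plain = mcnemar_component(28, 8) + mcnemar_component(38, 26)   # ≈ 13.36
t1_cc = mcnemar_component(28, 8, cc=True) + mcnemar_component(38, 26, cc=True)
assert t1_cc < t1_plain  # the correction can only shrink the statistic

# The relative power loss quoted in the text:
rel_loss = (0.265 - 0.175) / 0.265   # ≈ 0.34, i.e., 34%
```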


here. Similar conclusions regarding the use of the continuity correction in other situations have appeared elsewhere (Lui, 2001; Lui and Lin, 2003). Note that because the differences in TPR and FPR between two tests are, as noted before, not estimable from data in which the confirmatory procedure is restricted to only those patients with a positive response on either test, sample size calculation based on the test procedure using T1 (or T3) in terms of the non-estimable difference in TPR or in FPR is likely of limited use. However, because the powers of the asymptotic test procedure using T1 (1) and of the exact test procedure (7) are generally larger than, or approximately equal to, that of the procedure using T2 (Table 3), the estimated minimum required sample size n (8) determined using T2 can still be applicable, though it will tend to be conservative when we apply the procedure using T1 (1) or the exact test procedure (7). We have illustrated this point in Section 4. Except for a few extreme cases, Table 4 indicates that the sample size calculation formula (8) is accurate, as shown by the close agreement between the desired power and the simulated power. When RTPR = 1.10 and SNn = 0.90 (Table 4), the simulated power can be somewhat larger than the desired 80%, and thereby the estimated minimum required number of subjects n derived from equation (8) tends to be conservative. To investigate what may cause this loss of accuracy, we calculated the expected cell counts given the resulting sample size estimate n obtained from equation (8). We found that the expected numbers of patients nπ101 are all less than 1 whenever the simulated power is much larger than the desired power in Table 4.
This essentially accounts for why sample size calculation formula (8), derived from the normal approximation of the test statistic T2, loses accuracy in these cases. In summary, we have developed four asymptotic test procedures and one exact test procedure, together with a sample size formula for simultaneously testing the equality of SN and SP for a desired power at a given nominal α-level. We have further employed Monte Carlo simulation to evaluate the performance of these test procedures and the accuracy of the proposed sample size formula in a variety of situations. The procedures using T3 (or T4) (or their modified versions noted in the discussion) can be of use when we have prior information on the relative directions of SN and SP. Given no such prior information, the exact test developed here is generally preferable to the asymptotic test procedures and is hence recommended for general use. The results, findings, and discussion presented here should be useful to clinicians and biostatisticians who wish to test equality between two binary screening tests with a confirmatory procedure applied only to screen positives.

APPENDIX

To estimate the difference π1+d − π+1d in the positive response rates between the two screening tests, we may consider the MLE π̂1+d − π̂+1d for d = 1, 0. Under the assumed multinomial distribution, the asymptotic variance of π̂1+d − π̂+1d is given by (Fleiss, 1981; Agresti, 1990; Lui, 2004)

Var(π̂1+d − π̂+1d) = [π10d + π01d − (π1+d − π+1d)²]/n.

(A.1)


We can further show that the covariance between π̂1+1 − π̂+11 and π̂1+0 − π̂+10 is given by

Cov(π̂1+1 − π̂+11, π̂1+0 − π̂+10) = −[(π1+1 − π+11)(π1+0 − π+10)]/n.

(A.2)

Under the null hypothesis H0: π1+1 = π+11 and π1+0 = π+10, the above asymptotic variance (A.1) and covariance (A.2) reduce to VarH0(π̂1+d − π̂+1d) = (π10d + π01d)/n and CovH0(π̂1+1 − π̂+11, π̂1+0 − π̂+10) = 0. Thus, we may obtain the estimated asymptotic variance under H0 as


V̂arH0(π̂1+d − π̂+1d) = (π̂10d + π̂01d)/n.

(A.3)

On the basis of V̂arH0(π̂1+d − π̂+1d) (A.3) and CovH0(π̂1+1 − π̂+11, π̂1+0 − π̂+10) = 0, we obtain the following test statistic by using the multivariate central limit theorem:

T1 = (π̂1+1 − π̂+11)²/[(π̂101 + π̂011)/n] + (π̂1+0 − π̂+10)²/[(π̂100 + π̂010)/n],

(A.4)

which asymptotically follows the central χ²-distribution with two degrees of freedom under H0. Note that the test statistic T1 (A.4) can be rewritten as

(n101 − n011)²/(n101 + n011) + (n100 − n010)²/(n100 + n010),

(A.5)

which is simply the sum of two McNemar statistics testing the equality of SN and of SP separately. Note that the null hypothesis H0: π1+1/π++1 = π+11/π++1 and π0+0/π++0 = π+00/π++0 holds if and only if π1+1/π+11 = 1 and π1+0/π+10 = 1. Thus, we may also consider a test statistic based on the ratio π̂1+d/π̂+1d (d = 1, 0). Using the delta method (Agresti, 1990; Lui, 2004), we can show that the estimated asymptotic variance of log(π̂1+d/π̂+1d) is given by

V̂ar(log(π̂1+d/π̂+1d)) = (n10d + n01d)/[(n10d + n11d)(n01d + n11d)].

(A.6)

Furthermore, using the delta method again, we can show that the asymptotic covariance Cov(log(π̂1+1/π̂+11), log(π̂1+0/π̂+10)) = 0. On the basis of these results, we obtain the following statistic for testing H0: π1+1/π++1 = π+11/π++1 and π0+0/π++0 = π+00/π++0:

T2 = [log(π̂1+1/π̂+11)]²/V̂ar(log(π̂1+1/π̂+11)) + [log(π̂1+0/π̂+10)]²/V̂ar(log(π̂1+0/π̂+10)), (A.7)

which asymptotically follows the central χ²-distribution with two degrees of freedom under H0.

ACKNOWLEDGMENTS

The authors thank the associate editor and the reviewer for many useful and valuable comments that improved the content and clarity of this paper.
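As a numerical check on the claim that T1 is asymptotically χ² with two degrees of freedom under H0, one can simulate multinomial data from a configuration satisfying π10d = π01d and verify that the rejection rate at the 0.05 level is close to nominal. A sketch follows; the cell probabilities are hypothetical choices of ours, not taken from the paper:

```python
import random

def t1(n101, n011, n100, n010):
    # T1 in its McNemar-sum form (A.5); a zero denominator contributes 0.
    s = 0.0
    for a, b in ((n101, n011), (n100, n010)):
        if a + b > 0:
            s += (a - b) ** 2 / (a + b)
    return s

def null_rejection_rate(n=500, reps=2000, seed=7):
    # Hypothetical null configuration: pi_101 = pi_011 and pi_100 = pi_010,
    # so H0 holds; the remaining probability mass falls in cells T1 ignores.
    cells = [('n101', 0.05), ('n011', 0.05), ('n100', 0.10), ('n010', 0.10)]
    random.seed(seed)
    crit = 5.991  # upper 5% point of the chi-square distribution with 2 df
    rejections = 0
    for _ in range(reps):
        counts = {lab: 0 for lab, _ in cells}
        for _ in range(n):
            u, acc = random.random(), 0.0
            for lab, p in cells:
                acc += p
                if u < acc:
                    counts[lab] += 1
                    break
        rejections += t1(counts['n101'], counts['n011'],
                         counts['n100'], counts['n010']) > crit
    return rejections / reps

print(null_rejection_rate())  # close to the nominal 0.05
```

With n = 500 the expected discordant counts (50 and 100 per stratum) are large enough for the χ² approximation, so the simulated Type I error should sit near 0.05, mirroring the behavior reported for Table 2.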


REFERENCES

Agresti, A. (1990). Categorical Data Analysis. New York: Wiley.
Alonzo, T. A., Braun, T. M., Moskowitz, C. S. (2004). Small sample estimation of relative accuracy for binary screening tests. Statistics in Medicine 23:21–34.
Alonzo, T. A., Pepe, M. S., Moskowitz, C. S. (2002). Sample size calculations for comparative studies of medical tests for detecting presence of disease. Statistics in Medicine 21:835–852.
Casella, G., Berger, R. L. (1990). Statistical Inference. Belmont, CA: Duxbury.
Cheng, H., Macaluso, M. (1997). Comparison of the accuracy of two tests with a confirmatory procedure limited to positive results. Epidemiology 8:104–106.
Cheng, H., Macaluso, M., Waterbor, J. (1999). Estimation of relative and absolute test accuracy. Epidemiology 10:566–567.
Fleiss, J. L. (1981). Statistical Methods for Rates and Proportions, 2nd edn. New York: Wiley.
Lui, K.-J. (2001). Notes on testing equality in dichotomous data with matched pairs. Biometrical Journal 43:313–321.
Lui, K.-J. (2004). Statistical Estimation of Epidemiological Risk. New York: Wiley.
Lui, K.-J. (2012). Notes on testing non-inferiority under the partial verification design with a confirmatory procedure limited to screen positives. Contemporary Clinical Trials 33:563–571.
Lui, K.-J., Lin, C.-D. (2003). A revisit on comparing the asymptotic interval estimators of odds ratio in a single 2 × 2 table. Biometrical Journal 45:226–237.
Pepe, M. S., Alonzo, T. A. (2001). Comparing disease screening tests when true disease status is ascertained only for screen positives. Biostatistics 2:249–260.
SAS Institute, Inc. (1990). SAS Language, Reference Version 6, 1st edn. Cary, NC: SAS Institute.
Schatzkin, A., Connor, R. J., Taylor, P. R., Bunnag, B. (1987). Comparing new and old screening tests when a reference procedure cannot be performed on all screenees. American Journal of Epidemiology 125:672–678.
Selvin, S. (1996). Statistical Analysis of Epidemiological Data, 2nd edn. New York: Oxford University Press.
Smith, D., Bullock, A., Catalona, W. (1997). Racial differences in operating characteristics of prostate cancer screening tests. Journal of Urology 158:1861–1866.
Stock, E. M., Stamey, J. D., Sankaranarayanan, R., Young, D. M., Muwonge, R., Arbyn, M. (2012). Estimation of disease prevalence, true positive rates, and false positive rate of two screening tests when disease verification is applied on only screen-positives: A hierarchical model using multi-center data. Cancer Epidemiology 36:153–160.
Walter, S. D. (1999). Estimation of test sensitivity and specificity when disease confirmation is limited to positive results. Epidemiology 10:67–72.
Zhou, X.-H., Obuchowski, N. A., McClish, D. K. (2002). Statistical Methods in Diagnostic Medicine. New York: Wiley.
