XML Template (2014) [17.7.2014–9:46am] [1–12] //blrnas3/cenpro/ApplicationFiles/Journals/SAGE/3B2/SMMJ/Vol00000/140086/APPFile/SG-SMMJ140086.3d (SMM) [PREPRINTER stage]

Article

A better confidence interval for the sensitivity at a fixed level of specificity for diagnostic tests with continuous endpoints

Statistical Methods in Medical Research 0(0) 1–12 © The Author(s) 2014 Reprints and permissions: sagepub.co.uk/journalsPermissions.nav DOI: 10.1177/0962280214544313 smm.sagepub.com

Guogen Shan

Abstract

For a diagnostic test with continuous measurement, it is often important to construct confidence intervals for the sensitivity at a fixed level of specificity. Bootstrap-based confidence intervals were shown to have good performance as compared to others, and the one by Zhou and Qin (2005) was recommended as the best existing confidence interval, named the BTII interval. We propose two new confidence intervals based on the profile variance method and conduct extensive simulation studies to compare the proposed intervals and the BTII intervals under a wide range of conditions. An example from a medical study on severe head trauma is used to illustrate application of the new intervals. The new proposed intervals generally have better performance than the BTII interval.

Keywords

Confidence intervals, diagnostic test, profile variance, sensitivity, specificity

Department of Environmental and Occupational Health, Epidemiology and Biostatistics Program, School of Community Health Sciences, University of Nevada Las Vegas, Las Vegas, NV, USA

Corresponding author: Guogen Shan, Department of Environmental and Occupational Health, Epidemiology and Biostatistics Program, School of Community Health Sciences, University of Nevada Las Vegas, Las Vegas, NV 89154, USA. Email: [email protected]

1 Introduction

In diagnostic clinical studies, the sensitivity and specificity of a diagnostic test play a very important role in demonstrating the accuracy of the test.1 The sensitivity is defined as the proportion of positive outcomes in the diseased group, and the specificity is the proportion of negative outcomes in the non-diseased group. In practice, outcomes from a study are often continuous, and the specificity is fixed at the beginning of the study. The specificity often needs to be high (e.g. 80%, 90%) for the diagnostic test to be useful. For a pre-defined specificity level, the corresponding cutpoint (or threshold) must be estimated from the group of subjects without disease. Having obtained the estimator of the threshold, one then estimates the sensitivity of the diagnostic test from the diseased group. The confidence interval estimate for the sensitivity can then be calculated. The confidence interval

Downloaded from smm.sagepub.com at UNIV OF PITTSBURGH on November 18, 2015


estimate, without considering the variability of the threshold estimate, would not be appropriate, and the coverage of this confidence interval was shown to fall below the nominal level.2 For this reason, Linnet2 proposed a modified confidence interval that accounts for the extra variability from the estimate of the threshold. Later, Platt et al.3 showed that the confidence interval proposed by Linnet may have poor coverage in certain scenarios and proposed another confidence interval by adopting Efron's bias-corrected and accelerated bootstrap method. They showed that this confidence interval performs better than the one proposed by Linnet for non-normally distributed data but does not have good performance under normal distributions. Later on, Zhou and Qin4 proposed two types of bootstrap confidence intervals (BTI and BTII) based on the improved version of the Wald-type interval. The difference between the BTI and BTII lies in the estimate for the sensitivity, which can be obtained from either the data or the samples in the bootstrap step. The BTII, which uses the sensitivity estimate from the bootstrap samples, generally outperforms the BTI with regard to the coverage probability. The BTII intervals are probably the best intervals available.

Conventional confidence intervals are often calculated by plugging parameter estimates into the interval formulas. There is another way to construct confidence intervals: treat the parameter of interest in the variance estimate as unknown, then solve the resulting inequality, in which the parameter of interest is the only unknown, to find the interval limits. This method is called the profile variance method.5 Confidence intervals based on this method achieve better coverage probability than normal-approximation confidence intervals in categorical data analysis. This method was also used by Lee and Tu6 to improve the performance of confidence intervals for Cohen's Kappa.
We propose two new confidence intervals based on the profile variance method: one is based on the sensitivity estimate from the data and the other obtains the sensitivity estimate from bootstrap samples. We conduct extensive Monte Carlo exact simulation studies to compare the two new confidence intervals with the BTII intervals. The rest of this article is organized as follows. In Section 2, we briefly review the BTII bootstrap confidence intervals for sensitivity given specificity and propose two new confidence intervals based on the profile variance method. We then conduct Monte Carlo simulation studies in Section 3 to compare the new and existing confidence intervals with regard to coverage probability and length. An example from a clinical study on severe head trauma is presented at the end of Section 3 to further show the advantage of the proposed confidence intervals. Finally, some conclusions are drawn in Section 4.

2 Confidence intervals

Suppose X and Y are diagnostic results for the patients from the non-diseased group and the diseased group, respectively. Let $F_X$ and $F_Y$ be the associated distribution functions of X and Y. For a given threshold value $\eta$, the sensitivity and specificity of a diagnostic test are defined as

$$\mathrm{Sen} = P(Y \ge \eta) = 1 - F_Y(\eta)$$

and

$$\mathrm{Spe} = P(X \le \eta) = F_X(\eta).$$

At a fixed level of specificity, p, the threshold value can be estimated as $\hat{\eta} = F_X^{-1}(p)$, where $F^{-1}$ is the inverse function of F. The sensitivity at the given level of specificity p can be estimated by plugging in the estimate of $\eta$, which has the expression

$$\mathrm{Sen}(p) = 1 - F_Y\left(F_X^{-1}(p)\right).$$
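As an illustration of this plug-in expression, the following minimal Python sketch evaluates the sensitivity at a fixed specificity when both distributions are known. It assumes normal distributions, using the parameters of setting 2a in Table 2 of the simulation study; the function name `sensitivity_at_specificity` is hypothetical and is not part of the author's program.

```python
from statistics import NormalDist


def sensitivity_at_specificity(p, non_diseased, diseased):
    """Return Sen(p) = 1 - F_Y(F_X^{-1}(p)) for two known distributions."""
    threshold = non_diseased.inv_cdf(p)   # cutpoint giving specificity p
    return 1.0 - diseased.cdf(threshold)  # P(Y >= threshold)


# Setting 2a of Table 2: X ~ N(0, 1), Y ~ N(2.9264, 1), specificity 0.9.
sen_2a = sensitivity_at_specificity(0.9, NormalDist(0.0, 1.0), NormalDist(2.9264, 1.0))
print(round(sen_2a, 2))  # prints 0.95
```

The result agrees with the sensitivity listed for setting 2a in Table 2.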


Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be the observations from the non-diseased group and diseased group, respectively. The sensitivity of the test can be estimated as

$$\widehat{\mathrm{Sen}}(p) = \frac{\sum_{i=1}^{n} I\left(Y_i \ge \hat{F}_X^{-1}(p)\right)}{n}$$

where $\hat{F}_X^{-1}(p) = \sup\{z : \hat{F}_X(z) \le p\}$ and I(a) is an indicator function with value 1 if a is true and 0 otherwise.

In this article, we focus on the construction of two-sided confidence intervals for the sensitivity at a fixed level of specificity. Many methods have been developed to improve the estimate of the variance of the sensitivity.2–4 The confidence intervals based on the bootstrap method were shown to work well regarding coverage probability and length. Among these, the one from Zhou and Qin4 has the best performance. We briefly introduce their confidence intervals as follows.
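Before the intervals are introduced, the point estimate defined earlier in this section can be illustrated with a minimal Python sketch. It assumes that the estimated distribution function of the non-diseased group is the usual empirical distribution function, so that the supremum in the threshold definition is the k-th smallest observation, with k the smallest integer such that k/m exceeds p. The function names and the toy data are hypothetical.

```python
def empirical_threshold(x, p):
    """Return sup{z : F_hat_X(z) <= p}, with F_hat_X the empirical CDF of x.

    This is the k-th smallest observation, where k is the smallest
    integer satisfying k / m > p.
    """
    xs = sorted(x)
    m = len(xs)
    for k, value in enumerate(xs, start=1):
        if k / m > p:
            return value
    raise ValueError("p must be smaller than 1")


def estimate_sensitivity(x, y, p):
    """Proportion of diseased results at or above the estimated threshold."""
    threshold = empirical_threshold(x, p)
    return sum(1 for yi in y if yi >= threshold) / len(y)


# Hypothetical toy data, for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]     # non-diseased group, m = 10
y = [5, 8, 9, 10, 12]                   # diseased group, n = 5
print(empirical_threshold(x, 0.8))      # prints 9
print(estimate_sensitivity(x, y, 0.8))  # prints 0.6
```

With m = 10 and p = 0.8, the empirical distribution function first exceeds 0.8 at the ninth order statistic, so the threshold is 9 and three of the five diseased values reach it.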

2.1 BTII confidence intervals

Zhou and Qin4 proposed bootstrap-based confidence intervals for sensitivity given specificity. The approach starts with samples from the control group $X = (X_1, X_2, \ldots, X_m)$ and the diseased group $Y = (Y_1, Y_2, \ldots, Y_n)$. The first step is to draw m samples from X and n samples from Y with replacement, and the bootstrap samples are denoted as $X^*$ and $Y^*$. The second step is to estimate the bootstrap version of the sensitivity, which has the form

$$\widehat{\mathrm{Sen}}^*(p) = \frac{\sum_{i=1}^{n} I\left(Y_i^* \ge \hat{F}_{X^*}^{-1}(p)\right) + \frac{1}{2} z_{1-\alpha/2}^2}{n + z_{1-\alpha/2}^2}$$

The first two steps are repeated K times, and one obtains K sensitivity estimates based on the bootstrap samples, $\widehat{\mathrm{Sen}}_1^*(p), \widehat{\mathrm{Sen}}_2^*(p), \ldots, \widehat{\mathrm{Sen}}_K^*(p)$. The bootstrap-based estimates for mean and variance are calculated as

$$\widetilde{\mathrm{Sen}}^*(p) = \frac{1}{K} \sum_{i=1}^{K} \widehat{\mathrm{Sen}}_i^*(p)$$

and

$$\widehat{\mathrm{var}}^*(p) = \frac{1}{K-1} \sum_{i=1}^{K} \left(\widehat{\mathrm{Sen}}_i^*(p) - \widetilde{\mathrm{Sen}}^*(p)\right)^2$$

The corresponding bootstrap-based confidence intervals at the nominal level of $1 - \alpha$ are

$$\left(\widetilde{\mathrm{Sen}}^*(p) - z_{1-\alpha/2}\sqrt{\widehat{\mathrm{var}}^*(p)},\ \widetilde{\mathrm{Sen}}^*(p) + z_{1-\alpha/2}\sqrt{\widehat{\mathrm{var}}^*(p)}\right) \qquad (1)$$

This approach is referred to as the BTII interval by Zhou and Qin.4 They also proposed another bootstrap-based confidence interval by replacing the bootstrap mean $\widetilde{\mathrm{Sen}}^*(p)$ with $\widehat{\mathrm{Sen}}(p)$ in equation (1), called the BTI interval. They have shown that the BTII interval is generally better


than the BTI interval with regard to the coverage probability and length. For this reason, the BTI interval is not included in the comparisons in this article.
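The BTII steps described in this subsection can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the code of Zhou and Qin or of the author: the function names are hypothetical, the threshold in each bootstrap sample is taken as the empirical quantile of the resampled non-diseased values, `n_boot` plays the role of K, and the limits are not truncated to [0, 1] because the text does not mention truncation.

```python
import random
from statistics import NormalDist, mean, variance


def empirical_threshold(x, p):
    """k-th smallest value of x, with k the smallest integer such that k / m > p."""
    xs = sorted(x)
    return next(v for k, v in enumerate(xs, start=1) if k / len(xs) > p)


def btii_interval(x, y, p, alpha=0.05, n_boot=500, seed=0):
    """Sketch of the BTII interval; n_boot plays the role of K in the text."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    m, n = len(x), len(y)
    boot = []
    for _ in range(n_boot):
        x_star = rng.choices(x, k=m)  # step 1: resample with replacement
        y_star = rng.choices(y, k=n)
        threshold = empirical_threshold(x_star, p)
        hits = sum(1 for v in y_star if v >= threshold)
        boot.append((hits + z * z / 2) / (n + z * z))  # step 2: adjusted estimate
    center = mean(boot)
    half_width = z * variance(boot, center) ** 0.5  # variance uses K - 1
    return center - half_width, center + half_width


# Hypothetical simulated data, for illustration only.
data_rng = random.Random(1)
x = [data_rng.gauss(0.0, 1.0) for _ in range(30)]
y = [data_rng.gauss(2.0, 1.0) for _ in range(30)]
lower, upper = btii_interval(x, y, p=0.8)
print(lower, upper)
```

Fixing the seed makes the resampling reproducible, which is convenient when comparing methods on the same data set.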

2.2 Two new confidence intervals

We now propose a new method to construct confidence intervals for the sensitivity at a fixed level of specificity. The variance of the sensitivity estimate is a function of the sensitivity, which is often estimated either from the data or from bootstrap samples; the latter gives the bootstrap-type confidence interval. By substituting the sensitivity estimate into the variance, one then computes the confidence interval for the sensitivity. Existing confidence intervals therefore rely on the accuracy of the variance estimate. It is notable from the simulation by Zhou and Qin4 that these intervals do not have accurate coverage probability.

To address the unsatisfactory coverage property of existing confidence intervals, the profile variance method5 may be used for constructing the confidence interval. This method has been shown to improve the coverage probability in many statistical problems.6,7 In this method, the sensitivity in the variance is treated as an unknown parameter rather than replaced by an estimate, so the confidence interval has to be solved from an inequality. The confidence interval based on the profile variance approach can be obtained by solving the following inequality

$$\left[\mathrm{Sen}(p) - \widehat{\mathrm{Sen}}(p)\right]^2 \le z_{1-\alpha/2}^2 \, \mathrm{var}(\mathrm{Sen}(p))$$

where

$$\mathrm{var}(\mathrm{Sen}(p)) = \frac{\mathrm{Sen}(p)\left(1 - \mathrm{Sen}(p)\right)}{n + z_{1-\alpha/2}^2}$$

and $z_{1-\alpha/2}^2$ is added in the denominator of the variance by following the confidence interval from Agresti and Coull.8 A quadratic polynomial of sensitivity can be formed as a result, and its two roots are the lower and upper confidence limits. The explicit forms for lower and upper limits are

$$\left(L_{\mathrm{NewA}}(p),\ U_{\mathrm{NewA}}(p)\right) = \left(\frac{-E - \sqrt{E^2 - 4DF}}{2D},\ \frac{-E + \sqrt{E^2 - 4DF}}{2D}\right)$$

where $D = [n + 2z_{1-\alpha/2}^2]/[n + z_{1-\alpha/2}^2]$, $E = -(z_{1-\alpha/2}^2)/[n + z_{1-\alpha/2}^2] - 2\widehat{\mathrm{Sen}}(p)$, and $F = \widehat{\mathrm{Sen}}^2(p)$. The derivation of the confidence interval is given in Appendix 1. The sensitivity estimate $\widehat{\mathrm{Sen}}(p)$ in E and F may be replaced with the bootstrap sensitivity estimate $\widetilde{\mathrm{Sen}}^*(p)$ to generate another confidence interval $\left(L_{\mathrm{NewB}}(p),\ U_{\mathrm{NewB}}(p)\right)$.
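The closed-form limits can be computed directly once a sensitivity estimate and the diseased-group sample size are available. The following Python sketch is a minimal illustration with a hypothetical function name and hypothetical inputs; it follows the quadratic coefficients D, E, and F as derived in Appendix 1 and is not the author's R program.

```python
from math import sqrt
from statistics import NormalDist


def profile_variance_interval(sen_hat, n, alpha=0.05):
    """Roots of D*S^2 + E*S + F = 0, the profile variance limits for S = Sen(p)."""
    z2 = NormalDist().inv_cdf(1 - alpha / 2) ** 2
    d = (n + 2 * z2) / (n + z2)
    e = -z2 / (n + z2) - 2 * sen_hat
    f = sen_hat ** 2
    root = sqrt(e * e - 4 * d * f)
    return (-e - root) / (2 * d), (-e + root) / (2 * d)


# Hypothetical inputs: estimated sensitivity 0.8 from n = 40 diseased subjects.
lower, upper = profile_variance_interval(0.8, 40)
print(lower, upper)
```

Passing the sensitivity estimate from the data corresponds to the NewA interval; passing the bootstrap mean of the sensitivity estimates instead corresponds to the NewB interval. Because each limit solves the defining equation, both limits lie in [0, 1] whenever the estimate does.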

3 Simulation study

We conducted extensive Monte Carlo simulation studies to compare the performance between the two new methods based on the variance profile approach (NewA and NewB) and the bootstrap-based BTII method. Specificity is given at 80%, 85%, and 90%, which are the most commonly used specificities in diagnostic medicine. We compare the three methods at the 95% ($\alpha = 0.05$) confidence interval for sensitivity at each given specificity. The distributions for the diseased group and the non-diseased group are either beta or normal distributions. There are 15 total distribution configurations considered for comparison in


Table 1. The parameters used in the simulation study under beta distributions.

Setting  $(\alpha_h, \beta_h)$  $(\alpha_d, \beta_d)$  Specificity  Sensitivity
1a       (1, 3.5)               (4, 1)                 0.9          0.95
1b       (1, 3)                 (3, 1)                 0.8          0.93
1c       (1, 3)                 (3, 1)                 0.9          0.95
1d       (2, 4)                 (4, 2)                 0.8          0.82
1e       (2, 3)                 (3, 2)                 0.8          0.55

The non-diseased group and the diseased group are from beta distributions of $B(\alpha_h, \beta_h)$ and $B(\alpha_d, \beta_d)$.

Table 2. The parameters used in the simulation study under normal distributions.

Setting  $\mu_d$  Specificity  Sensitivity
2a       2.9264   0.9          0.95
2b       2.5631   0.9          0.90
2c       2.1231   0.9          0.80
2d       2.4865   0.8          0.95
2e       1.6832   0.8          0.80

The non-diseased group and the diseased group are from normal distributions of $N(0, 1)$ and $N(\mu_d, 1)$.

this article: five with both groups from beta distributions (Table 1); five with both groups from normal distributions (Table 2); and five with one group from a beta distribution and the other from a normal distribution (Table 3). In each configuration, the sensitivity is calculated under the distribution of the diseased group for a given specificity. To calculate the coverage probability and length for each configuration, we start by simulating 2000 data sets from the distributions in each configuration and then generate 500 bootstrap samples from each data set. We compute the 95% confidence intervals for the BTII method, the NewA method, and the NewB method. Sample sizes m = 20, 40, and 60 and n = 20, 40, and 60 are studied.

It is difficult to put all the results in one table; therefore, we separate the tables by the number of subjects in the non-diseased group. Tables 4, 5, and 6 give the results under beta distributions for m = 20, 40, and 60, respectively. As seen in Table 4 with m = 20, the difference between the NewA method and the NewB method is negligible; similar results are observed for other cases. Thus, it is not necessary to report both results in all tables. In addition, the NewA method does not require bootstrap samples as the NewB method does; therefore, the NewA method is computationally simpler. For these reasons, we only report the results from the NewA method in the following comparisons. In Table 4, the coverage probability of the BTII method is lower than the nominal 95% level in multiple cases and can be as low as 79.4%. The NewA method generally has better coverage than the BTII method. When both methods have similar coverage probabilities (e.g. case 1e), the new method has shorter length than the BTII. Surprisingly, the new method has both better coverage and shorter length than the BTII in some cases, for example, cases 1c, 1d, and 1e.
Given the sample size in the non-diseased group and the distribution parameters, the length of confidence interval generally decreases as the sample size in the diseased group increases for all cases, but the coverage probability does not increase as the


Table 3. The parameters used in the simulation study under beta and normal distributions.

Setting  $(\alpha_h, \beta_h)$  Specificity  Sensitivity
3a       (1, 2)                 0.8          0.93
3b       (1, 5)                 0.85         0.95
3c       (2, 1)                 0.9          0.85
3d       (3, 1)                 0.8          0.86
3e       (4, 3)                 0.8          0.90

The non-diseased group is from a beta distribution $B(\alpha_h, \beta_h)$, and the diseased group is from a normal distribution $N(2, 1)$.

Table 4. Coverage probabilities and average lengths at a 95% confidence interval under beta distributions with m = 20.

                   n = 20             n = 40             n = 60
Setting  Method    Coverage  Length   Coverage  Length   Coverage  Length
1a       BTII      0.794     0.224    0.952     0.216    0.954     0.214
1a       NewA      0.972     0.262    0.992     0.172    0.990     0.133
1a       NewB      0.972     0.267    0.992     0.177    0.990     0.139
1b       BTII      0.928     0.252    0.968     0.236    0.965     0.223
1b       NewA      0.966     0.273    0.986     0.183    0.988     0.144
1b       NewB      0.966     0.277    0.986     0.188    0.988     0.149
1c       BTII      0.950     0.374    0.948     0.352    0.924     0.348
1c       NewA      0.989     0.299    0.978     0.213    0.943     0.172
1c       NewB      0.989     0.304    0.990     0.217    0.956     0.176
1d       BTII      0.954     0.386    0.956     0.363    0.948     0.349
1d       NewA      0.985     0.310    0.984     0.224    0.962     0.185
1d       NewB      0.985     0.315    0.984     0.228    0.967     0.189
1e       BTII      0.935     0.525    0.929     0.498    0.930     0.481
1e       NewA      0.940     0.359    0.929     0.272    0.930     0.229
1e       NewB      0.947     0.361    0.930     0.275    0.930     0.230

sample size in the diseased group increases, such as the case 1c. Similar results are observed for other sample sizes under beta distributions.

The second five configurations are from normal distributions with the variance assumed to be 1 for both diseased and non-diseased groups. The mean of the non-diseased group is chosen to be 0, and the mean of the diseased group is $\mu_d$. Five $\mu_d$ values are given in Table 2. The coverage probabilities and lengths are reported in Table 7 for m = 20, and tables for other sample sizes may be obtained from the author's personal website: https://faculty.unlv.edu/gshan/. The NewA method generally has better coverage than the BTII. Similar conclusions hold under both the normal distribution and the beta distribution.

We compare the performance of the NewA method and the BTII method with the same type of distribution for both groups in the first 10 configurations. The last five configurations are for mixed


Table 5. Coverage probabilities and average lengths at a 95% confidence interval under beta distributions with m = 40.

                   n = 20             n = 40             n = 60
Setting  Method    Coverage  Length   Coverage  Length   Coverage  Length
1a       BTII      0.808     0.198    0.957     0.182    0.966     0.175
1a       NewA      0.942     0.262    0.978     0.173    0.984     0.134
1b       BTII      0.914     0.216    0.956     0.196    0.965     0.181
1b       NewA      0.958     0.271    0.973     0.186    0.982     0.145
1c       BTII      0.957     0.331    0.966     0.309    0.961     0.290
1c       NewA      0.982     0.303    0.978     0.219    0.968     0.177
1d       BTII      0.956     0.343    0.966     0.308    0.951     0.286
1d       NewA      0.978     0.314    0.986     0.228    0.964     0.187
1e       BTII      0.950     0.468    0.948     0.420    0.946     0.396
1e       NewA      0.957     0.362    0.949     0.276    0.946     0.232

Table 6. Coverage probabilities and average lengths at a 95% confidence interval under beta distributions with m = 60.

                   n = 20             n = 40             n = 60
Setting  Method    Coverage  Length   Coverage  Length   Coverage  Length
1a       BTII      0.743     0.182    0.954     0.167    0.966     0.154
1a       NewA      0.919     0.263    0.970     0.174    0.982     0.136
1b       BTII      0.894     0.206    0.942     0.180    0.964     0.165
1b       NewA      0.948     0.274    0.966     0.185    0.977     0.146
1c       BTII      0.940     0.316    0.964     0.282    0.960     0.262
1c       NewA      0.966     0.305    0.976     0.219    0.969     0.179
1d       BTII      0.954     0.322    0.958     0.282    0.951     0.261
1d       NewA      0.976     0.314    0.980     0.229    0.964     0.188
1e       BTII      0.950     0.441    0.956     0.385    0.951     0.356
1e       NewA      0.954     0.363    0.956     0.277    0.951     0.233

distributions, with a normal distribution for the diseased group and a beta distribution for the non-diseased group. The results are presented in Table 8 for m = 20, and tables for other sample sizes can be found at the author's personal website: https://faculty.unlv.edu/gshan/. In small sample settings with m = 20 (Table 8), the coverage probabilities for the BTII method are very low for cases 3a and 3b when n = 20, falling as low as 52.9% at the nominal level of 95%. The coverage probability of the BTII method is generally lower than the nominal level. Although the intervals from the NewA method are slightly longer than those from the BTII method under these five configurations, the NewA method has much better coverage.

We also present the comparison between the two methods graphically. Figure 1 shows the coverage comparison between the NewA method and the BTII method when the data follow normal distributions for both the diseased and non-diseased groups, and the sample size in the non-diseased


Table 7. Coverage probabilities and average lengths at a 95% confidence interval under normal distributions with m = 20.

                   n = 20             n = 40             n = 60
Setting  Method    Coverage  Length   Coverage  Length   Coverage  Length
2a       BTII      0.794     0.214    0.948     0.207    0.958     0.200
2a       NewA      0.976     0.258    0.994     0.169    0.994     0.130
2b       BTII      0.941     0.307    0.948     0.296    0.938     0.286
2b       NewA      0.987     0.281    0.994     0.192    0.997     0.152
2c       BTII      0.946     0.431    0.923     0.411    0.912     0.400
2c       NewA      0.987     0.313    0.961     0.226    0.932     0.186
2d       BTII      0.830     0.212    0.972     0.196    0.968     0.182
2d       NewA      0.944     0.263    0.984     0.172    0.990     0.132
2e       BTII      0.956     0.412    0.959     0.388    0.950     0.375
2e       NewA      0.978     0.319    0.978     0.233    0.959     0.191

Table 8. Coverage probabilities and average lengths at a 95% confidence interval under a normal distribution for the diseased group and a beta distribution for the non-diseased group with m = 20.

                   n = 20             n = 40             n = 60
Setting  Method    Coverage  Length   Coverage  Length   Coverage  Length
3a       BTII      0.771     0.178    0.925     0.152    0.953     0.134
3a       NewA      0.898     0.274    0.956     0.185    0.966     0.147
3b       BTII      0.529     0.128    0.867     0.112    0.934     0.100
3b       NewA      0.841     0.260    0.950     0.169    0.974     0.131
3c       BTII      0.910     0.242    0.944     0.197    0.950     0.169
3c       NewA      0.956     0.304    0.972     0.218    0.975     0.179
3d       BTII      0.908     0.244    0.948     0.195    0.948     0.167
3d       NewA      0.951     0.304    0.962     0.218    0.970     0.178
3e       BTII      0.854     0.203    0.942     0.171    0.950     0.147
3e       NewA      0.937     0.286    0.958     0.201    0.968     0.161

group is 20. For all five parameter configurations, the NewA method has higher coverage than the BTII method. We also present the length comparison between the two methods under the five configurations with sample sizes n = 20, 40, and 60 in Figure 2. Dots under the diagonal line indicate that the length of the NewA method is shorter than that of the BTII method. Out of a total of 15 cases, the NewA method has shorter length than the BTII method in 13 cases.

3.1 An example

We consider an example from a clinical trial in Hans et al.9 to predict the outcome of severe head trauma. A total of n + m = 60 patients with possible severe head trauma were enrolled in


Figure 1 (panels (a) to (e)). Coverage for the NewA method and the BTII method under normal distributions when m = 20.


Figure 2. Comparison of the lengths of the NewA method and the BTII method under normal distributions when m = 20. The three dots on each line represent the lengths of the two methods for n = 20, 40, and 60. The length is a non-increasing function of sample size n. The dots under the diagonal line show that the NewA method has shorter length than the BTII method.

Table 9. Confidence intervals and lengths for sensitivity at the fixed level of specificity of 80%, 85%, and 90% at the nominal level of 95% for the trauma example.

                          Confidence interval                           Length
                          BTII            NewA            NewB
Specificity  Sensitivity  Lower   Upper   Lower   Upper   Lower   Upper   BTII    NewA    NewB
0.80         0.659        0.451   0.870   0.499   0.768   0.526   0.791   0.419   0.270   0.265
0.85         0.634        0.428   0.813   0.477   0.750   0.486   0.758   0.385   0.273   0.272
0.90         0.561        0.413   0.775   0.412   0.691   0.447   0.723   0.362   0.279   0.276

the hospital. Each patient was measured on admission for cerebrospinal fluid creatine kinase (CK)-BB isoenzyme, which presents within 24 hours after injury. Among these 60 patients, m = 19 patients were considered non-diseased subjects, and the remaining n = 41 patients were in poor condition and considered to be in the diseased group. The associated data for the measurement of CK-BB may be found in Table 4.9 of Zhou et al.10 For given specificities at the level of 80%, 85%, and 90%, the confidence intervals and their lengths for the new methods and the BTII method are presented in Table 9. The two new methods are comparable in this example.


The lengths of two-sided 95% confidence intervals for the NewA method and the NewB method are similar, and both are shorter than those from the BTII method.

4 Conclusion

In this article, we propose two confidence intervals for sensitivity at given specificity, for diagnostic tests with continuous endpoints. The new methods are based on the profile variance approach, and the confidence intervals are solved from a quadratic polynomial. We compare the proposed intervals with the existing BTII interval which is known to be associated with good performance in coverage probability and length. By conducting extensive Monte Carlo exact simulation studies, we show that the new method generally has better coverage and shorter length than the BTII method. The coverage probability of the BTII method could be well below the nominal level in some cases when the sample size is small. The proposed confidence intervals are recommended for use in practice. The program is written in R code,11 which is available from the author. We consider applying the profile variance method to construct other confidence intervals as future work (e.g. confidence interval for the difference in sensitivities given specificity,12 confidence interval for trends with binary endpoints13–15).

Acknowledgment

The author would like to thank the two referees for their valuable comments and suggestions that helped to improve this manuscript.

Funding

The author’s work is partially supported by a FOA grant from UNLV.

References

1. Shapiro DE. The interpretation of diagnostic tests. Stat Methods Med Res 1999; 8: 113–134.
2. Linnet K. Comparison of quantitative diagnostic tests: type I error, power, and sample size. Stat Med 1987; 6: 147–158.
3. Platt RW, Hanley JA and Yang H. Bootstrap confidence intervals for the sensitivity of a quantitative diagnostic test. Stat Med 2000; 19: 313–322.
4. Zhou XH and Qin G. Improved confidence intervals for the sensitivity at a fixed level of specificity of a continuous-scale diagnostic test. Stat Med 2005; 24: 465–477.
5. Wilson EB. Probable inference, the law of succession, and statistical inference. J Am Stat Assoc 1927; 22: 209–212.
6. Lee JJ and Tu ZN. A better confidence interval for kappa (κ) on measuring agreement between two raters with binary outcomes. J Comput Graph Stat 1994; 3: 301–321.
7. Bickel PJ and Doksum KA. Mathematical statistics. San Francisco: Holden-Day, Inc, 1977.
8. Agresti A and Coull BA. Approximate is better than "exact" for interval estimation of binomial proportions. Am Stat 1998; 52: 119–126.
9. Hans P, Albert A, Born JD, et al. Derivation of a bioclinical prognostic index in severe head injury. Intensive Care Medicine 1985; 11: 186–191.
10. Zhou X-H, Obuchowski NA and McClish DK. Statistical methods in diagnostic medicine. Hoboken, NJ: John Wiley & Sons, Inc, 2011.
11. Shan G and Wang W. ExactCIdiff: an R package for computing exact confidence intervals for the difference of two proportions. R Journal 2013; 5: 62–71.
12. Qin G, Hsu YS and Zhou XH. New confidence intervals for the difference between two sensitivities at a fixed level of specificity. Stat Med 2006; 25: 3487–3502.
13. Shan G, Ma C, Hutson AD, et al. An efficient and exact approach for detecting trends with binary endpoints. Stat Med 2012; 31: 155–164.
14. Shan G, Ma C, Hutson AD, et al. Some tests for detecting trends based on the modified Baumgartner-Weiß-Schindler statistics. Comput Stat Data Anal 2013; 57: 246–261.
15. Shan G and Ma C. Unconditional tests for comparing two ordered multinomials. Stat Methods Med Res 2012. Epub ahead of print 13 June 2012. DOI: 10.1177/0962280212450957.


Appendix 1

The confidence interval based on the profile variance method can be obtained by solving the equation

$$\left[\mathrm{Sen}(p) - \widehat{\mathrm{Sen}}(p)\right]^2 = z_{1-\alpha/2}^2 \left[\frac{\mathrm{Sen}(p)\left(1 - \mathrm{Sen}(p)\right)}{n + z_{1-\alpha/2}^2}\right]$$

Expanding the square on the left-hand side and collecting the terms in $\mathrm{Sen}(p)^2$ and $\mathrm{Sen}(p)$, it can be rewritten as

$$D \cdot \mathrm{Sen}(p)^2 + E \cdot \mathrm{Sen}(p) + F = 0$$

where $D = [n + 2z_{1-\alpha/2}^2]/[n + z_{1-\alpha/2}^2]$, $E = -(z_{1-\alpha/2}^2)/[n + z_{1-\alpha/2}^2] - 2\widehat{\mathrm{Sen}}(p)$, and $F = \widehat{\mathrm{Sen}}^2(p)$. Therefore, the lower and upper limits are

$$\left(L(p),\ U(p)\right) = \left(\frac{-E - \sqrt{E^2 - 4DF}}{2D},\ \frac{-E + \sqrt{E^2 - 4DF}}{2D}\right)$$
