Regression analysis of informative current status data with the additive hazards model.

Lifetime Data Anal DOI 10.1007/s10985-014-9303-y

Shishun Zhao · Tao Hu · Ling Ma · Peijie Wang · Jianguo Sun

Received: 24 January 2014 / Accepted: 16 July 2014 © Springer Science+Business Media New York 2014

Abstract

This paper discusses regression analysis of current status failure time data arising from the additive hazards model in the presence of informative censoring. Many methods have been developed for regression analysis of current status data under various regression models when the censoring is noninformative, and there also exists a large literature on parametric analysis of informative current status data in the context of tumorigenicity experiments. In this paper, a semiparametric maximum likelihood estimation procedure is presented in which a copula model is employed to describe the relationship between the failure time of interest and the censoring time. Furthermore, I-splines are used to approximate the nonparametric functions involved, and the asymptotic consistency and normality of the proposed estimators are established. A simulation study is conducted and indicates that the proposed approach works well in practical situations. An illustrative example is also provided.

Keywords Additive hazards model · Current status data · Efficient estimation · Informative censoring

S. Zhao · P. Wang
College of Mathematics, Jilin University, Changchun 130012, People's Republic of China

T. Hu
School of Mathematical Sciences and BCMIIS, Capital Normal University, Beijing 100048, People's Republic of China

L. Ma · J. Sun (corresponding author)
Department of Statistics, University of Missouri, 146 Middlebush Hall, MO 65211, USA


1 Introduction

This paper discusses regression analysis of current status failure time data arising from the additive hazards model in the presence of informative censoring. By current status data, we mean that each subject is observed only once, at a censoring or observation time $C$, so that the failure time $T$ of interest is known only to be either smaller or greater than $C$. Informative censoring means that $T$ and $C$ may be correlated (Keiding 1991; Sun 2006; Titman 2013; Wang et al. 2012; Zhang et al. 2005). In the following, we discuss semiparametric regression analysis of such data and present a sieve maximum likelihood estimation procedure.

An example of informative current status data occurs in tumorigenicity experiments that concern the time to tumor onset. In these cases, the study animals are usually observed only at their death for the presence or absence of the tumor under study, and thus only current status data are available on the time to tumor onset, the failure time of interest. Since most tumors are between lethal and non-lethal, the tumor onset time and the death time tend to be correlated, thus yielding informative current status data. For the analysis of such data, without extra information, one usually has to rely on sensitivity analysis with respect to the type and degree of the correlation (Lagakos and Louis 1988). One way to specify the type of the correlation is through models such as those described below.

For the analysis of informative failure time data, a commonly used method is the frailty model approach, which describes the relationship between the failure time of interest and the censoring time through some latent variables (Chen et al. 2012a,b; Huang and Wolfe 2002; Zhang et al. 2005). For example, Huang and Wolfe (2002) and Zhang et al. (2005) applied the approach to regression analysis of informative right-censored and current status failure time data, respectively, arising from the proportional hazards model. In both methods, the dependence between the failure time of interest and the right censoring or observation time is assumed to be characterized by some latent random variables. Chen et al. (2012a) considered the same problem as Zhang et al. (2005), but under the semiparametric transformation model. The frailty model approach is restrictive, however, as it can describe only limited types of correlation. In this paper, we adopt the copula model approach, which is much more general and commonly used for the analysis of correlated failure time data (Chen et al. 2009; Hougaard 2000; Nelsen 2006; Zheng and Klein 1995).

The additive hazards model is one of the most commonly used regression models in failure time data analysis and many authors have considered inference about it under various situations (e.g. Chen and Sun 2009; Li et al. 2012; Lin et al. 1998; Lin and Ying 1994; Martinussen and Scheike 2002; Sun et al. 2006; Tong et al. 2012; Wang et al. 2010; Zhou and Sun 2003). For example, Lin and Ying (1994) discussed regression analysis of right-censored data arising from the additive hazards model and Chen and Sun (2009) presented an imputation procedure for the model when one observes current status data. Lin et al. (1998) and Martinussen and Scheike (2002) investigated the same problem as Chen and Sun (2009) and developed some estimating equation-based approaches. More recently, Li et al. (2012) and Tong et al. (2012) studied the fitting of the model to clustered interval-censored data and bivariate current status data, respectively. More work on the model can be found in Chen et al. (2012a) and Sun (2006) among others.

In the following, we will first describe in Sect. 2 the assumptions and models that will be used throughout the paper. In particular, it will be assumed that the failure time of interest follows the additive hazards model, and we will employ the copula model to describe the possible correlation between the failure time and the observation time. A main advantage of copula models is their flexibility, as they can describe various types of correlation. For inference, Sect. 3 develops a sieve maximum likelihood estimation procedure in which I-spline functions are used to approximate the unknown functions involved (Ramsay 1988; Lu et al. 2007). In addition, the asymptotic properties of the proposed estimators are provided in Sect. 3. Section 4 presents some numerical results, which indicate that the proposed methodology seems to work well in practical situations.

2 Assumptions and models

Consider a failure time study that consists of $n$ independent subjects. For subject $i$, let $T_i$ denote the failure time of interest and $Z_i$ the $p$-dimensional vector of covariates related to the subject. To describe the covariate effects, we assume that the hazard function of the $T_i$'s has the form

$$\lambda(t \mid Z) = \lambda_{10}(t) + Z^{\top}\beta. \tag{1}$$

Here $\lambda_{10}(t)$ is an unspecified baseline hazard function and $\beta$ denotes the vector of regression parameters. That is, the $T_i$'s follow the additive hazards model (Lin and Ying 1994). As mentioned above, the additive hazards model is one of the most commonly used regression models in failure time data analysis. This is especially the case when one is interested in the risk difference, as is often the situation in, for example, epidemiology and public health (Kulich and Lin 2000).

For inference about model (1), suppose that we observe current status data given by $\{C_i, \delta_i = I(T_i \le C_i)\}$, where $C_i$ denotes the censoring or observation time on subject $i$. In other words, each subject is observed only once at $C_i$ and we only know whether $T_i$ is left- or right-censored at $C_i$. In practice, the observation times $C_i$'s may depend on covariates too. For this, we will assume that they follow the proportional hazards model given by

$$\lambda(c \mid Z) = \lambda_{20}(c)\exp(\gamma^{\top} Z), \tag{2}$$

where $\lambda_{20}(c)$ denotes an unspecified baseline hazard function and $\gamma$ a vector of regression parameters.

Let $F_T$ and $F_C$ denote the marginal distribution functions of the $T_i$'s and $C_i$'s given covariates, respectively, and $F$ their joint distribution. Then it follows from Theorem 2.3.3 of Nelsen (2006) that there exists a copula function $M_\alpha(u, v)$ defined on $I^2 = [0, 1] \times [0, 1]$ with $M_\alpha(u, 0) = M_\alpha(0, v) = 0$, $M_\alpha(u, 1) = u$ and $M_\alpha(1, v) = v$ such that

$$F(t, c) = M_\alpha(F_T(t), F_C(c)).$$


In the above, the parameter $\alpha$ represents the association between the $T_i$'s and $C_i$'s. Furthermore, by following the conditional inversion idea (Nelsen 2006), we have

$$P(T \le t \mid C = c, Z) = \left.\frac{\partial M_\alpha(u, v)}{\partial v}\right|_{u = F_T(t),\, v = F_C(c)}.$$

In the following, we will denote the function above by $m_\alpha(F_T(t), F_C(c))$ for simplicity. As mentioned above, the copula model is commonly used to describe the correlation between variables and, among other advantages, it allows one to model the correlation and the marginal distributions separately (Zheng and Klein 1995). On the other hand, it has been pointed out that given the copula function, the association parameter $\alpha$ is generally not identifiable without prior or extra information (Wang et al. 2012; Titman 2013; Zheng and Klein 1995). In the following, we will assume that both the copula function and $\alpha$ are known and the interest is to estimate the regression parameters $\beta$ and $\gamma$. More comments on this are given below.

3 Estimation and inference procedures

Now we discuss inference about model (1) as well as model (2). For this, we will employ the sieve maximum likelihood approach and first develop the likelihood function and a sieve space constructed based on monotone increasing and non-negative I-spline functions.

Define $\Lambda_{10}(t) = \int_0^t \lambda_{10}(s)\,ds$, $\Lambda_{20}(c) = \int_0^c \lambda_{20}(s)\,ds$, and $\theta = (\beta, \gamma, \Lambda_1, \Lambda_2)$. Let $f_C$ denote the marginal density function of the $C_i$'s given covariates. Then under the models above, we have

$$F_T(t) = 1 - \exp\{-\Lambda_{10}(t) - Z^{\top}\beta\, t\}, \qquad F_C(c) = 1 - \exp\{-\Lambda_{20}(c)\exp(\gamma^{\top} Z)\},$$

and

$$f_C(c) = \exp\{-\Lambda_{20}(c)\exp(\gamma^{\top} Z)\}\,\lambda_{20}(c)\exp(\gamma^{\top} Z).$$

Furthermore, the likelihood function can be written as

$$
\begin{aligned}
L(\theta) &= \prod_{i=1}^{n} P(C_i = c_i, \delta_i = 1 \mid Z_i)^{\delta_i}\, P(C_i = c_i, \delta_i = 0 \mid Z_i)^{1-\delta_i} \\
&= \prod_{i=1}^{n} \big\{P(T_i \le c_i \mid C_i = c_i, Z_i)\, f_C(c_i)\big\}^{\delta_i} \big\{[1 - P(T_i \le c_i \mid C_i = c_i, Z_i)]\, f_C(c_i)\big\}^{1-\delta_i} \\
&= \prod_{i=1}^{n} \big\{m_\alpha(F_T(c_i), F_C(c_i))\, f_C(c_i)\big\}^{\delta_i} \big\{[1 - m_\alpha(F_T(c_i), F_C(c_i))]\, f_C(c_i)\big\}^{1-\delta_i}.
\end{aligned}
\tag{3}
$$
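To make the structure of (3) concrete, the following minimal Python sketch evaluates the log-likelihood for a scalar covariate. It is an illustration under stated assumptions rather than the implementation used for the results reported here: the FGM copula of Sect. 4 is plugged in for $M_\alpha$, and the names `fgm_dv`, `log_likelihood`, `Lam1`, `Lam2` and `lam2` are hypothetical.

```python
import math


def fgm_dv(u, v, alpha):
    """m_alpha(u, v) = dM/dv for the FGM copula M = uv + alpha*u*v*(1-u)*(1-v)."""
    return u * (1.0 + alpha * (1.0 - u) * (1.0 - 2.0 * v))


def log_likelihood(data, beta, gamma, alpha, Lam1, Lam2, lam2, m_alpha=fgm_dv):
    """Log of likelihood (3) for observations (c_i, delta_i, z_i), scalar covariate.

    Lam1, Lam2 : callables giving the cumulative baseline hazards (assumed inputs)
    lam2       : callable giving the baseline hazard of the observation time
    The additive model needs Lam1(c) + z*beta*c >= 0 for F_T to be valid.
    """
    total = 0.0
    for c, delta, z in data:
        F_T = 1.0 - math.exp(-Lam1(c) - z * beta * c)   # model (1)
        risk = math.exp(gamma * z)                      # model (2)
        surv_C = math.exp(-Lam2(c) * risk)
        F_C = 1.0 - surv_C
        f_C = surv_C * lam2(c) * risk
        m = m_alpha(F_T, F_C, alpha)                    # P(T <= c | C = c, Z)
        total += math.log(m) if delta == 1 else math.log(1.0 - m)
        total += math.log(f_C)
    return total


# Example with unit baseline hazards, i.e. Lambda_10(t) = Lambda_20(t) = t.
example = [(0.5, 1, 0), (1.2, 0, 1), (0.8, 1, 1)]
value = log_likelihood(example, beta=0.4, gamma=0.2, alpha=0.45,
                       Lam1=lambda t: t, Lam2=lambda t: t, lam2=lambda t: 1.0)
```

In the sieve procedure described next, `Lam1` and `Lam2` would be replaced by I-spline expansions with non-negative coefficients, and the function would be maximized over those coefficients together with $\beta$ and $\gamma$.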



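Let $\psi = (\Lambda_1, \Lambda_2)$ and $u_c$ denote an upper bound of the observation times $C_i$'s. Define the sieve space

$$\Theta_n = \{\theta_n = (\beta, \gamma, \psi_n) : \psi_n = (\Lambda_{1n}(t), \Lambda_{2n}(t))\} = B \otimes M_{1n} \otimes M_{2n},$$

where $B = \{(\beta, \gamma) \in R^{2p},\ \|\beta\| + \|\gamma\| \le M\}$,

$$M_{1n} = \Big\{\Lambda_{1n} : \Lambda_{1n}(t) = \sum_{j=1}^{k+k_n} \xi_j I_{j1}(t),\ \xi_j \ge 0,\ \sum_{j=1}^{k+k_n} |\xi_j| \le M_n,\ t \in [0, u_c]\Big\},$$

and

$$M_{2n} = \Big\{\Lambda_{2n} : \Lambda_{2n}(t) = \sum_{j=1}^{k+k_n} \eta_j I_{j2}(t),\ \eta_j \ge 0,\ \sum_{j=1}^{k+k_n} |\eta_j| \le M_n,\ t \in [0, u_c]\Big\}.$$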
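As an illustration of how an element of $M_{1n}$ or $M_{2n}$ can be built, the sketch below constructs I-spline basis functions by numerically integrating M-splines obtained from the standard recursion (Ramsay 1988). It is a minimal sketch, not the code used for the numerical results: the function names, the midpoint integration rule and the number of integration steps are assumptions made for illustration only.

```python
def m_spline(x, i, k, t):
    """M-spline M_i(x | k, t) of order k on the knot sequence t (recursive definition)."""
    if k == 1:
        if t[i] <= x < t[i + 1]:
            return 1.0 / (t[i + 1] - t[i])
        return 0.0
    width = t[i + k] - t[i]
    if width == 0.0:            # repeated boundary knots give an empty support
        return 0.0
    left = (x - t[i]) * m_spline(x, i, k - 1, t)
    right = (t[i + k] - x) * m_spline(x, i + 1, k - 1, t)
    return k * (left + right) / ((k - 1) * width)


def i_spline(x, i, k, t, steps=400):
    """I-spline: integral of the i-th M-spline from the lower boundary to x (midpoint rule)."""
    lo = t[0]
    if x <= lo:
        return 0.0
    h = (x - lo) / steps
    return h * sum(m_spline(lo + (s + 0.5) * h, i, k, t) for s in range(steps))


def make_knots(interior, lower, upper, k):
    """Boundary knots repeated k times, so that there are k + len(interior) basis functions."""
    return [lower] * k + sorted(interior) + [upper] * k


def cumulative_hazard(x, coefs, k, t):
    """Monotone function sum_j coef_j * I_j(x); requires coef_j >= 0."""
    return sum(c * i_spline(x, j, k, t) for j, c in enumerate(coefs))


# Order k = 3 (quadratic) with four interior knots on [0, 1]: 3 + 4 = 7 basis functions.
knots = make_knots([0.2, 0.4, 0.6, 0.8], 0.0, 1.0, 3)
n_basis = len(knots) - 3
```

Because each I-spline is non-decreasing and the coefficients are constrained to be non-negative, any such linear combination is a valid candidate for a cumulative hazard function.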

In the above, the $I_{j1}$'s and $I_{j2}$'s are I-spline basis functions, and $k$ and $k_n = o(n^{\nu})$ represent the order and the number of interior knots of the functions, respectively, with $0 < \nu < 0.5$.

For estimation of the parameter $\theta$, we propose to maximize the log-likelihood function $l_n(\theta) = \log L(\theta)$ over the sieve space $\Theta_n$. Let $\hat\theta = (\hat\beta, \hat\gamma, \hat\Lambda_{1n}, \hat\Lambda_{2n})$ denote the value of $\theta$ that maximizes $l_n(\theta_n)$ over $\Theta_n$. Then under some regularity conditions, we can show that they are consistent estimators and that the joint distribution of $\hat\beta$ and $\hat\gamma$ can be approximated by a normal distribution. More specifically, as $n \to \infty$, $\hat\beta$ and $\hat\gamma$ are strongly consistent and we have

$$\|\hat\Lambda_{1n} - \Lambda_{10}\|_2 \to 0, \qquad \|\hat\Lambda_{2n} - \Lambda_{20}\|_2 \to 0 \quad \text{almost surely} \tag{4}$$

and

$$\|\hat\Lambda_{1n} - \Lambda_{10}\|_2 + \|\hat\Lambda_{2n} - \Lambda_{20}\|_2 = O_p\big(n^{-(1-\nu)/2} + n^{-r\nu}\big), \tag{5}$$

where $\|\cdot\|_2$ denotes the $L_2$ norm and $r = k + \eta$ with $\eta$ being defined in the Appendix. Furthermore, we have

$$n^{1/2}\big((\hat\beta - \beta_0)^{\top}, (\hat\gamma - \gamma_0)^{\top}\big)^{\top} \to N(0, \Sigma) \tag{6}$$

in distribution and $\hat\beta$ and $\hat\gamma$ are semiparametrically efficient, where $\beta_0$ and $\gamma_0$ denote the true values of $\beta$ and $\gamma$, respectively, and $\Sigma$ is defined in the Appendix. The proofs of the results given above are sketched in the Appendix.

For inference about $\beta$ and $\gamma$, one needs to estimate the asymptotic covariance matrix $\Sigma$. For this, we suggest using the observed information matrix approach, which estimates $\Sigma$ by the submatrix of the inverse of the observed information matrix from $l_n(\theta_n)$ corresponding to $\beta$ and $\gamma$. Note that for a given problem, this method could be computationally intensive and one may also have to deal with a singular matrix.
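The observed information approach can be sketched numerically as follows. This is a hedged illustration only: the Hessian is approximated by central finite differences, which is an assumption of the sketch and not necessarily how the observed information was computed for the results in this paper, and the names `numerical_hessian`, `invert` and `standard_errors` are hypothetical.

```python
import math


def numerical_hessian(f, x, h=1e-4):
    """Central finite-difference Hessian of f at the point x (a list of floats)."""
    n = len(x)
    hess = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            def shifted(di, dj):
                y = list(x)
                y[i] += di
                y[j] += dj
                return f(y)
            val = (shifted(h, h) - shifted(h, -h)
                   - shifted(-h, h) + shifted(-h, -h)) / (4.0 * h * h)
            hess[i][j] = hess[j][i] = val
    return hess


def invert(a):
    """Gauss-Jordan inverse with partial pivoting; raises ValueError if singular."""
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("observed information matrix is numerically singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def standard_errors(neg_log_lik, theta_hat, keep):
    """Square roots of the diagonal entries, for the indices in keep, of the inverse
    observed information, i.e. the inverse Hessian of the negative log-likelihood."""
    cov = invert(numerical_hessian(neg_log_lik, theta_hat))
    return [math.sqrt(cov[i][i]) for i in keep]


# Toy check: a quadratic negative log-likelihood whose exact standard errors are 0.2 and 0.5.
toy = lambda th: 0.5 * ((th[0] - 1.0) ** 2 / 0.04 + (th[1] + 2.0) ** 2 / 0.25)
se = standard_errors(toy, [1.0, -2.0], keep=[0, 1])
```

In the setting above, `theta_hat` would stack $\hat\beta$, $\hat\gamma$ and the spline coefficients, with `keep` selecting the positions of $\beta$ and $\gamma$. The `ValueError` branch corresponds to the singular-matrix situation mentioned in the text.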


On the other hand, the numerical study given in the next section indicates that the method seems to work reasonably well as long as $k$ and $k_n$ are not too large.

For the implementation of the methodology proposed above, one needs to choose $k$ and $k_n$, and a simple approach is to try several different values and compare the conclusions. In practice, one also needs to choose the copula function $M$ and the association parameter $\alpha$ and for this, as pointed out above, a common and simple approach is similar to conducting a sensitivity analysis. In other words, as for the selection of $k$ and $k_n$, one can try several different copula functions $M$ as well as different association parameters $\alpha$ and compare the obtained results. Another method is to apply a selection criterion such as the AIC. More comments on both methods are given in the next section.

4 Numerical studies

In this section, we first present some results obtained from a simulation study conducted to assess the finite sample performance of the methodology proposed in the previous sections and then give an illustrative example. For the simulation study, we supposed that the covariate $Z$ follows the Bernoulli distribution with success probability 0.5 and considered the two copula models given below,

$$
M_\alpha(u, v) =
\begin{cases}
uv + \alpha uv(1 - u)(1 - v), & -1 \le \alpha \le 1, \\
\log_\alpha\{1 + (\alpha^u - 1)(\alpha^v - 1)/(\alpha - 1)\}, & \alpha > 0,\ \alpha \ne 1.
\end{cases}
$$
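The two copulas, the first being the FGM copula and the second the Frank copula, can be written down directly. The sketch below also shows how the conditional inversion idea of Sect. 2 yields a dependent pair $(u, v)$ in closed form for the first copula. It is a minimal illustration with hypothetical function names and an arbitrary seed, not the code used for the simulation study.

```python
import math
import random


def fgm(u, v, alpha):
    """First copula above: uv + alpha*u*v*(1-u)*(1-v), with -1 <= alpha <= 1."""
    return u * v + alpha * u * v * (1.0 - u) * (1.0 - v)


def frank(u, v, alpha):
    """Second copula above, in the log-base-alpha form, with alpha > 0 and alpha != 1."""
    inner = 1.0 + (alpha ** u - 1.0) * (alpha ** v - 1.0) / (alpha - 1.0)
    return math.log(inner, alpha)


def fgm_du(u, v, alpha):
    """Partial derivative dM/du of the first copula, i.e. P(V <= v | U = u)."""
    return v * (1.0 + alpha * (1.0 - 2.0 * u) * (1.0 - v))


def fgm_conditional_inverse(u, w, alpha):
    """Solve w = dM/du(u, v) for v: the root in [0, 1] of a*v^2 - (1+a)*v + w = 0."""
    a = alpha * (1.0 - 2.0 * u)
    b = 1.0 + a
    root = math.sqrt(max(b * b - 4.0 * a * w, 0.0))
    denom = b + root
    return 0.0 if denom == 0.0 else 2.0 * w / denom   # numerically stable root


def draw_fgm_pair(alpha, rng):
    """Draw (u, v) with joint distribution given by the first copula."""
    u, w = rng.random(), rng.random()
    return u, fgm_conditional_inverse(u, w, alpha)


rng = random.Random(2014)          # arbitrary seed, for reproducibility only
pairs = [draw_fgm_pair(0.9, rng) for _ in range(5)]
```

Failure and observation times are then obtained by inverting the marginal distribution functions $F_T$ and $F_C$ at $u$ and $v$, respectively. For the second copula, the same scheme applies with its own partial derivative, solved either in closed form or numerically.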

The former is commonly referred to as the FGM model, while the latter is the Frank model. Since the association parameter $\alpha$ has different ranges for different copula models, we use Kendall's $\tau$ to measure the association between the $T_i$'s and $C_i$'s. Note that for the FGM copula, we have $\tau = 2\alpha/9$ and for the Frank copula, the relationship is $\tau = 1 + 4x^{-1}\{D_1(x) - 1\}$, where $x = -\log\alpha$ and $D_1(x) = x^{-1}\int_0^x t\,(e^t - 1)^{-1}\,dt$.

For subject $i$, to generate the failure time $T_i$ and the observation time $C_i$, we first generated two independent random numbers $u_i$ and $w_i$ from the uniform distribution over $(0, 1)$ and then obtained the number $v_i$ by solving the equation $w_i = \partial M_\alpha(u, v)/\partial u\,|_{u = u_i,\, v = v_i}$. We then set $T_i = t_i$ and $C_i = c_i$, where $t_i$ and $c_i$ denote the solutions to the equations $F_T(t_i) = u_i$ and $F_C(c_i) = v_i$, respectively. The results given below are based on $n = 200$, 500 replications and $\lambda_{10}(t) = \lambda_{20}(c) = 1$.

Table 1 presents the results on estimation of $\beta$ and $\gamma$ based on the simulated data generated under the FGM model with the true values $\beta_0 = 0$ and $\gamma_0 = -0.4, -0.2, 0, 0.2, 0.4$ and $\tau = -0.2, -0.1, 0, 0.1, 0.2$. Here for the sieve space, we used quadratic spline functions with the interior knots set to be the 0.2, 0.4, 0.6 and 0.8 quantiles of the observation times $C_i$'s. That is, $k = 3$ and $k_n = 4$. The results include the estimated bias (Bias) given by the average of the estimates minus the true value, the sample standard deviation (SSE) of the estimates, the average of the estimated standard errors (SEE), and the 95% empirical coverage probability (CP). The results given in Table 2 are obtained under the same set-up as in Table 1 except that $\beta_0 = 0.4$. One can see from the two tables that the proposed estimator seems to be unbiased and the variance estimation procedure also seems to be reasonable. In addition, the normal

Table 1 Estimation of regression parameters under the FGM model with $\beta_0 = 0$

| $\gamma_0$ | $\tau$ | Bias ($\hat\beta$) | SSE ($\hat\beta$) | SEE ($\hat\beta$) | CP ($\hat\beta$) | Bias ($\hat\gamma$) | SSE ($\hat\gamma$) | SEE ($\hat\gamma$) | CP ($\hat\gamma$) |
|---|---|---|---|---|---|---|---|---|---|
| −0.4 | −0.20 | −0.0305 | 0.2903 | 0.2644 | 0.942 | −0.0222 | 0.1396 | 0.1456 | 0.964 |
| | −0.10 | −0.0251 | 0.2666 | 0.2451 | 0.936 | −0.0224 | 0.1396 | 0.1455 | 0.960 |
| | 0.00 | −0.0138 | 0.2463 | 0.2305 | 0.944 | −0.0226 | 0.1408 | 0.1455 | 0.956 |
| | 0.10 | −0.0162 | 0.2254 | 0.2179 | 0.948 | −0.0224 | 0.1426 | 0.1455 | 0.950 |
| | 0.20 | −0.0097 | 0.2187 | 0.2084 | 0.950 | −0.0217 | 0.1440 | 0.1451 | 0.952 |
| −0.2 | −0.20 | −0.0188 | 0.2935 | 0.2654 | 0.940 | −0.0177 | 0.1379 | 0.1436 | 0.966 |
| | −0.10 | −0.0218 | 0.2645 | 0.2441 | 0.941 | −0.0180 | 0.1381 | 0.1435 | 0.958 |
| | 0.00 | −0.0125 | 0.2394 | 0.2288 | 0.945 | −0.0180 | 0.1392 | 0.1435 | 0.956 |
| | 0.10 | −0.0139 | 0.2256 | 0.2165 | 0.958 | −0.0177 | 0.1412 | 0.1434 | 0.954 |
| | 0.20 | −0.0089 | 0.2172 | 0.2069 | 0.952 | −0.0169 | 0.1422 | 0.1430 | 0.952 |
| 0.0 | −0.20 | −0.0065 | 0.2874 | 0.2683 | 0.962 | −0.0136 | 0.1369 | 0.1428 | 0.958 |
| | −0.10 | −0.0110 | 0.2564 | 0.2466 | 0.953 | −0.0136 | 0.1377 | 0.1427 | 0.956 |
| | 0.00 | −0.0124 | 0.2375 | 0.2290 | 0.949 | −0.0134 | 0.1389 | 0.1428 | 0.950 |
| | 0.10 | −0.0139 | 0.2211 | 0.2179 | 0.960 | −0.0128 | 0.1408 | 0.1427 | 0.958 |
| | 0.20 | −0.0093 | 0.2161 | 0.2078 | 0.939 | −0.0119 | 0.1417 | 0.1422 | 0.956 |
| 0.2 | −0.20 | 0.0021 | 0.2858 | 0.2706 | 0.953 | −0.0096 | 0.1369 | 0.1433 | 0.962 |
| | −0.10 | −0.0011 | 0.2516 | 0.2494 | 0.967 | −0.0097 | 0.1379 | 0.1433 | 0.958 |
| | 0.00 | −0.0034 | 0.2360 | 0.2346 | 0.953 | −0.0093 | 0.1396 | 0.1434 | 0.948 |
| | 0.10 | −0.0072 | 0.2237 | 0.2209 | 0.952 | −0.0082 | 0.1415 | 0.1433 | 0.950 |
| | 0.20 | −0.0062 | 0.2165 | 0.2089 | 0.941 | −0.0069 | 0.1424 | 0.1427 | 0.960 |
| 0.4 | −0.20 | 0.0081 | 0.2943 | 0.2764 | 0.954 | −0.0056 | 0.1377 | 0.1452 | 0.964 |
| | −0.10 | −0.0018 | 0.2594 | 0.2559 | 0.947 | −0.0061 | 0.1387 | 0.1451 | 0.958 |
| | 0.00 | 0.0003 | 0.2366 | 0.2398 | 0.965 | −0.0054 | 0.1406 | 0.1452 | 0.958 |
| | 0.10 | −0.0049 | 0.2253 | 0.2284 | 0.957 | −0.0041 | 0.1430 | 0.1451 | 0.956 |
| | 0.20 | −0.0007 | 0.2196 | 0.2182 | 0.960 | −0.0024 | 0.1441 | 0.1446 | 0.958 |

approximation to the distribution of the estimated regression parameters appears to work well too.

The estimation results based on the simulated data generated under the Frank model are given in Tables 3 and 4. In these situations, all other set-ups are the same as in Tables 1 and 2, respectively, except that $\tau = -0.5, -0.25, -0.05, 0.05, 0.25$ and $0.5$. We can see that the results here give similar conclusions to those obtained from Tables 1 and 2. In addition, as seen in Tables 1 and 2, the behavior of the proposed estimator of $\beta$ seems to change with the association parameter, while the behavior of the proposed estimator of $\gamma$ seems to be robust with respect to the association parameter.

Following the suggestion of a reviewer, we also performed a study to compare the proposed estimator to the estimator that one would obtain by ignoring the informative censoring. For this, for the simulated data generated above, we pretended that $\tau = 0$ and obtained the proposed estimator, which will be denoted by $\hat\beta_0$ below. Table 5 presents

Table 2 Estimation of regression parameters under the FGM model with $\beta_0 = 0.4$

| $\gamma_0$ | $\tau$ | Bias ($\hat\beta$) | SSE ($\hat\beta$) | SEE ($\hat\beta$) | CP ($\hat\beta$) | Bias ($\hat\gamma$) | SSE ($\hat\gamma$) | SEE ($\hat\gamma$) | CP ($\hat\gamma$) |
|---|---|---|---|---|---|---|---|---|---|
| −0.4 | −0.20 | −0.0132 | 0.3438 | 0.3149 | 0.940 | −0.0232 | 0.1394 | 0.1456 | 0.964 |
| | −0.10 | −0.00009 | 0.3180 | 0.2965 | 0.946 | −0.0229 | 0.1398 | 0.1455 | 0.956 |
| | 0.00 | 0.0044 | 0.2895 | 0.2819 | 0.958 | −0.0226 | 0.1408 | 0.1455 | 0.956 |
| | 0.10 | 0.0091 | 0.2687 | 0.2704 | 0.954 | −0.0215 | 0.1429 | 0.1455 | 0.950 |
| | 0.20 | 0.0095 | 0.2633 | 0.2592 | 0.960 | −0.0199 | 0.1446 | 0.1452 | 0.948 |
| −0.2 | −0.20 | −0.0067 | 0.3464 | 0.3169 | 0.936 | −0.0186 | 0.1378 | 0.1436 | 0.964 |
| | −0.10 | 0.0019 | 0.3145 | 0.2958 | 0.942 | −0.0186 | 0.1382 | 0.1435 | 0.958 |
| | 0.00 | 0.0027 | 0.2866 | 0.2776 | 0.948 | −0.0180 | 0.1392 | 0.1435 | 0.956 |
| | 0.10 | −0.0008 | 0.2675 | 0.2645 | 0.952 | −0.0172 | 0.1413 | 0.1435 | 0.954 |
| | 0.20 | 0.0137 | 0.2587 | 0.2539 | 0.950 | −0.0155 | 0.1428 | 0.1431 | 0.954 |
| 0.0 | −0.20 | −0.0006 | 0.3533 | 0.3203 | 0.924 | −0.0144 | 0.1371 | 0.1428 | 0.956 |
| | −0.10 | −0.0029 | 0.3184 | 0.2957 | 0.934 | −0.0142 | 0.1374 | 0.1427 | 0.954 |
| | 0.00 | −0.0016 | 0.2860 | 0.2782 | 0.952 | −0.0134 | 0.1389 | 0.1428 | 0.950 |
| | 0.10 | 0.0002 | 0.2674 | 0.2632 | 0.960 | −0.0122 | 0.1410 | 0.1427 | 0.958 |
| | 0.20 | 0.0094 | 0.2636 | 0.2519 | 0.950 | −0.0110 | 0.1422 | 0.1423 | 0.958 |
| 0.2 | −0.20 | 0.0142 | 0.3426 | 0.3259 | 0.950 | −0.0103 | 0.1370 | 0.1434 | 0.960 |
| | −0.10 | 0.0056 | 0.3100 | 0.3004 | 0.950 | −0.0100 | 0.1380 | 0.1433 | 0.956 |
| | 0.00 | 0.0069 | 0.2882 | 0.2814 | 0.964 | −0.0093 | 0.1396 | 0.1434 | 0.948 |
| | 0.10 | 0.0012 | 0.2637 | 0.2652 | 0.954 | −0.0078 | 0.1417 | 0.1433 | 0.952 |
| | 0.20 | 0.0113 | 0.2646 | 0.2529 | 0.946 | −0.0057 | 0.1429 | 0.1429 | 0.960 |
| 0.4 | −0.20 | 0.0280 | 0.3372 | 0.3338 | 0.950 | −0.0065 | 0.1377 | 0.1451 | 0.964 |
| | −0.10 | 0.0138 | 0.3185 | 0.3073 | 0.942 | −0.0061 | 0.1388 | 0.1451 | 0.958 |
| | 0.00 | 0.0021 | 0.2867 | 0.2865 | 0.954 | −0.0054 | 0.1406 | 0.1452 | 0.958 |
| | 0.10 | 0.0013 | 0.2697 | 0.2709 | 0.952 | −0.0040 | 0.1428 | 0.1451 | 0.956 |
| | 0.20 | 0.0053 | 0.2624 | 0.2576 | 0.940 | −0.0016 | 0.1439 | 0.1447 | 0.956 |

the estimated bias of $\hat\beta_0$ based on the simulated data generated under the Frank model with the same set-ups as in Tables 3 and 4, and for comparison, the estimated biases of $\hat\beta$ given in Tables 3 and 4 are also included in Table 5. One can easily see that the estimator $\hat\beta_0$ seems to be biased, especially when $\tau$ is away from zero, as expected.

To illustrate the methodology proposed in the previous sections, we apply it to a set of current status data arising from a lung tumor study on 144 male RFM mice discussed in Sun (2006) among others. In the study, each animal was put into either a conventional environment (96 mice) or a germ-free environment (48 mice), and one objective of interest is to compare the tumor growth rates between the two treatments. The observed information includes the death time in days and the presence or absence of the lung tumor at death. Among the mice in the two groups, 27 and 35 mice were observed to have the lung tumor at the death time, respectively. For the analysis, let $T_i$ denote the time to tumor onset and define $Z_i = 1$ if the $i$th animal was in conventional

Table 3 Estimation of regression parameters under the Frank model with $\beta_0 = 0$

| $\gamma_0$ | $\tau$ | Bias ($\hat\beta$) | SSE ($\hat\beta$) | SEE ($\hat\beta$) | CP ($\hat\beta$) | Bias ($\hat\gamma$) | SSE ($\hat\gamma$) | SEE ($\hat\gamma$) | CP ($\hat\gamma$) |
|---|---|---|---|---|---|---|---|---|---|
| −0.4 | −0.50 | −0.0147 | 0.2907 | 0.2717 | 0.938 | −0.0204 | 0.1388 | 0.1453 | 0.962 |
| | −0.25 | −0.0166 | 0.2716 | 0.2594 | 0.954 | −0.0218 | 0.1384 | 0.1454 | 0.960 |
| | −0.05 | −0.0216 | 0.2542 | 0.2365 | 0.949 | −0.0227 | 0.1403 | 0.1455 | 0.958 |
| | 0.05 | −0.0164 | 0.2286 | 0.2240 | 0.958 | −0.0224 | 0.1416 | 0.1455 | 0.956 |
| | 0.25 | −0.0008 | 0.2081 | 0.1957 | 0.954 | −0.0207 | 0.1445 | 0.1449 | 0.948 |
| | 0.50 | 0.0046 | 0.1762 | 0.1677 | 0.946 | −0.0149 | 0.1464 | 0.1425 | 0.946 |
| −0.2 | −0.50 | −0.0082 | 0.2833 | 0.2677 | 0.960 | −0.0154 | 0.1357 | 0.1434 | 0.964 |
| | −0.25 | −0.0096 | 0.2732 | 0.2610 | 0.956 | −0.0175 | 0.1363 | 0.1433 | 0.964 |
| | −0.05 | −0.0162 | 0.2524 | 0.2358 | 0.946 | −0.0181 | 0.1386 | 0.1435 | 0.958 |
| | 0.05 | −0.0103 | 0.2236 | 0.2216 | 0.951 | −0.0178 | 0.1401 | 0.1435 | 0.950 |
| | 0.25 | −0.0050 | 0.2017 | 0.1929 | 0.945 | −0.0161 | 0.1426 | 0.1428 | 0.946 |
| | 0.50 | 0.0039 | 0.1700 | 0.1656 | 0.950 | −0.0100 | 0.1451 | 0.1402 | 0.942 |
| 0.0 | −0.50 | −0.00008 | 0.2819 | 0.2674 | 0.942 | −0.0113 | 0.1349 | 0.1428 | 0.964 |
| | −0.25 | 0.0005 | 0.2693 | 0.2644 | 0.954 | −0.0138 | 0.1353 | 0.1428 | 0.966 |
| | −0.05 | −0.0092 | 0.2442 | 0.2376 | 0.951 | −0.0136 | 0.1382 | 0.1428 | 0.952 |
| | 0.05 | −0.0108 | 0.2274 | 0.2229 | 0.961 | −0.0132 | 0.1397 | 0.1428 | 0.952 |
| | 0.25 | −0.0026 | 0.2018 | 0.1953 | 0.949 | −0.0109 | 0.1426 | 0.1420 | 0.950 |
| | 0.50 | 0.0037 | 0.1715 | 0.1654 | 0.952 | −0.0044 | 0.1448 | 0.1391 | 0.952 |
| 0.2 | −0.50 | 0.0037 | 0.2881 | 0.2703 | 0.956 | −0.0072 | 0.1344 | 0.1431 | 0.968 |
| | −0.25 | 0.0063 | 0.2761 | 0.2642 | 0.947 | −0.0104 | 0.1351 | 0.1430 | 0.968 |
| | −0.05 | −0.0011 | 0.2412 | 0.2398 | 0.951 | −0.0096 | 0.1387 | 0.1433 | 0.952 |
| | 0.05 | −0.0040 | 0.2266 | 0.2266 | 0.955 | −0.0088 | 0.1404 | 0.1434 | 0.955 |
| | 0.25 | −0.0047 | 0.2032 | 0.1992 | 0.950 | −0.0066 | 0.1428 | 0.1425 | 0.956 |
| | 0.50 | 0.0040 | 0.1757 | 0.1690 | 0.940 | 0.0020 | 0.1445 | 0.1395 | 0.946 |
| 0.4 | −0.50 | 0.0130 | 0.2962 | 0.2775 | 0.941 | −0.0023 | 0.1365 | 0.1446 | 0.960 |
| | −0.25 | 0.0082 | 0.2867 | 0.2695 | 0.952 | −0.0063 | 0.1362 | 0.1447 | 0.968 |
| | −0.05 | −0.0017 | 0.2468 | 0.2463 | 0.951 | −0.0059 | 0.1396 | 0.1451 | 0.958 |
| | 0.05 | 0.0007 | 0.2303 | 0.2329 | 0.968 | −0.0048 | 0.1420 | 0.1452 | 0.954 |
| | 0.25 | 0.0013 | 0.2077 | 0.2065 | 0.956 | −0.0017 | 0.1446 | 0.1444 | 0.952 |
| | 0.50 | 0.0061 | 0.1853 | 0.1766 | 0.933 | 0.0060 | 0.1469 | 0.1412 | 0.960 |

environment and 0 otherwise. Note that tumorigenicity experiments like the one discussed here are usually designed to determine whether a suspected agent or environment accelerates the time to tumor onset in the experimental animals, and one observes only current status data as mentioned above.

For the analysis, as in the simulation study, we considered both the FGM and Frank models with a number of different possible values for $\tau$ and calculated the AIC for each set-up. Table 6 presents the estimated treatment effect on the tumor growth at the value of $\tau$ that gives the smallest AIC and several other close values under both the FGM and

Table 4 Estimation of regression parameters under the Frank model with $\beta_0 = 0.4$

| $\gamma_0$ | $\tau$ | Bias ($\hat\beta$) | SSE ($\hat\beta$) | SEE ($\hat\beta$) | CP ($\hat\beta$) | Bias ($\hat\gamma$) | SSE ($\hat\gamma$) | SEE ($\hat\gamma$) | CP ($\hat\gamma$) |
|---|---|---|---|---|---|---|---|---|---|
| −0.4 | −0.50 | 0.0159 | 0.3450 | 0.3177 | 0.946 | −0.0227 | 0.1383 | 0.1454 | 0.968 |
| | −0.25 | 0.0032 | 0.3345 | 0.3113 | 0.940 | −0.0227 | 0.1384 | 0.1454 | 0.962 |
| | −0.05 | 0.0047 | 0.3000 | 0.2875 | 0.952 | −0.0229 | 0.1403 | 0.1455 | 0.958 |
| | 0.05 | 0.0071 | 0.2756 | 0.2749 | 0.958 | −0.0221 | 0.1418 | 0.1455 | 0.956 |
| | 0.25 | 0.0201 | 0.2466 | 0.2486 | 0.958 | −0.0190 | 0.1457 | 0.1451 | 0.950 |
| | 0.50 | 0.0397 | 0.2350 | 0.2295 | 0.954 | −0.0120 | 0.1478 | 0.1438 | 0.954 |
| −0.2 | −0.50 | 0.0401 | 0.3422 | 0.3207 | 0.944 | −0.0177 | 0.1352 | 0.1432 | 0.966 |
| | −0.25 | −0.0006 | 0.3241 | 0.3102 | 0.950 | −0.0182 | 0.1363 | 0.1434 | 0.960 |
| | −0.05 | 0.0031 | 0.2997 | 0.2862 | 0.952 | −0.0183 | 0.1386 | 0.1435 | 0.960 |
| | 0.05 | 0.0077 | 0.2785 | 0.2712 | 0.944 | −0.0175 | 0.1402 | 0.1435 | 0.950 |
| | 0.25 | 0.0150 | 0.2409 | 0.2403 | 0.966 | −0.0146 | 0.1435 | 0.1430 | 0.952 |
| | 0.50 | 0.0239 | 0.2148 | 0.2129 | 0.954 | −0.0068 | 0.1466 | 0.1413 | 0.940 |
| 0.0 | −0.50 | 0.0352 | 0.3445 | 0.3236 | 0.942 | −0.0125 | 0.1341 | 0.1427 | 0.968 |
| | −0.25 | 0.0144 | 0.3307 | 0.3143 | 0.944 | −0.0144 | 0.1356 | 0.1426 | 0.960 |
| | −0.05 | −0.0031 | 0.2906 | 0.2853 | 0.948 | −0.0139 | 0.1381 | 0.1428 | 0.952 |
| | 0.05 | 0.0060 | 0.2788 | 0.2697 | 0.958 | −0.0127 | 0.1398 | 0.1428 | 0.950 |
| | 0.25 | 0.0084 | 0.2467 | 0.2370 | 0.952 | −0.0102 | 0.1429 | 0.1422 | 0.958 |
| | 0.50 | 0.0211 | 0.2116 | 0.2054 | 0.954 | −0.0024 | 0.1456 | 0.1401 | 0.944 |
| 0.2 | −0.50 | 0.0426 | 0.3652 | 0.3328 | 0.926 | −0.0082 | 0.1346 | 0.1432 | 0.968 |
| | −0.25 | 0.0255 | 0.3379 | 0.3200 | 0.954 | −0.0107 | 0.1352 | 0.1431 | 0.968 |
| | −0.05 | 0.0007 | 0.2968 | 0.2889 | 0.946 | −0.0097 | 0.1387 | 0.1433 | 0.954 |
| | 0.05 | 0.0094 | 0.2733 | 0.2724 | 0.962 | −0.0085 | 0.1405 | 0.1434 | 0.948 |
| | 0.25 | 0.0085 | 0.2402 | 0.2372 | 0.948 | −0.0046 | 0.1433 | 0.1427 | 0.960 |
| | 0.50 | 0.0182 | 0.2090 | 0.2034 | 0.960 | 0.0026 | 0.1463 | 0.1405 | 0.944 |
| 0.4 | −0.50 | 0.0530 | 0.3785 | 0.3462 | 0.926 | −0.0043 | 0.1355 | 0.1450 | 0.966 |
| | −0.25 | 0.0314 | 0.3285 | 0.3276 | 0.950 | −0.0072 | 0.1360 | 0.1449 | 0.966 |
| | −0.05 | 0.0096 | 0.2969 | 0.2962 | 0.948 | −0.0059 | 0.1397 | 0.1451 | 0.956 |
| | 0.05 | 0.0038 | 0.2756 | 0.2776 | 0.954 | −0.0047 | 0.1418 | 0.1452 | 0.954 |
| | 0.25 | 0.0080 | 0.2451 | 0.2427 | 0.942 | −0.0006 | 0.1443 | 0.1446 | 0.954 |
| | 0.50 | 0.0170 | 0.2104 | 0.2063 | 0.950 | 0.0084 | 0.1461 | 0.1422 | 0.948 |

Frank models with $k = 3$ and $k_n = 5$. In addition, we also calculated and included in the table the estimated standard errors and the p values for testing no treatment effect. These results indicate that the treatment clearly had a significant effect on animal death, with the animals in the germ-free environment surviving much longer than those in the conventional environment. On the other hand, with respect to the treatment effect on the tumor growth, the conclusion is not straightforward and depends on the knowledge about $\tau$, that is, the correlation between the tumor onset time and the animal death time. If one believes that the two times or events had no or

Table 5 Estimated biases of regression parameter $\beta$ under the Frank model

| $\gamma_0$ | $\tau$ | $\hat\beta$ ($\beta_0 = 0$) | $\hat\beta_0$ ($\beta_0 = 0$) | $\hat\beta$ ($\beta_0 = 0.4$) | $\hat\beta_0$ ($\beta_0 = 0.4$) |
|---|---|---|---|---|---|
| −0.4 | −0.50 | −0.0147 | −0.3190 | 0.0159 | −0.4265 |
| | −0.25 | −0.0166 | −0.1799 | 0.0032 | −0.2221 |
| | −0.05 | −0.0216 | −0.0540 | 0.0047 | −0.0386 |
| | 0.05 | −0.0164 | −0.0159 | 0.0071 | 0.0503 |
| | 0.25 | −0.0008 | 0.1497 | 0.0201 | 0.2484 |
| | 0.50 | 0.0046 | 0.3155 | 0.0405 | 0.6712 |
| −0.2 | −0.50 | −0.0082 | −0.1934 | 0.0398 | −0.3044 |
| | −0.25 | −0.0096 | −0.1050 | −0.0006 | −0.1412 |
| | −0.05 | −0.0162 | −0.0331 | 0.0031 | −0.0216 |
| | 0.05 | −0.0103 | −0.0061 | 0.0077 | 0.0323 |
| | 0.25 | −0.0050 | −0.0692 | 0.0150 | 0.1428 |
| | 0.50 | 0.0039 | 0.1467 | 0.0241 | 0.3659 |
| 0.0 | −0.50 | −0.0008 | −0.0677 | 0.0350 | −0.1861 |
| | −0.25 | 0.0005 | −0.0162 | 0.0144 | −0.0379 |
| | −0.05 | −0.0092 | −0.0097 | −0.0031 | −0.0079 |
| | 0.05 | −0.0108 | −0.0103 | 0.0060 | 0.0107 |
| | 0.25 | −0.0026 | −0.0043 | 0.0084 | 0.0340 |
| | 0.50 | 0.0037 | −0.0133 | 0.0210 | 0.1192 |
| 0.2 | −0.50 | 0.0037 | 0.0454 | 0.0422 | −0.0388 |
| | −0.25 | 0.0063 | 0.0755 | 0.0255 | 0.0686 |
| | −0.05 | −0.0011 | 0.0151 | 0.0007 | 0.0166 |
| | 0.05 | −0.0040 | −0.0202 | 0.0094 | −0.0067 |
| | 0.25 | −0.0047 | −0.0832 | 0.0086 | −0.0662 |
| | 0.50 | 0.0040 | −0.1591 | 0.0179 | −0.1023 |
| 0.4 | −0.50 | 0.0130 | 0.1721 | 0.0533 | 0.1471 |
| | −0.25 | 0.0082 | 0.1677 | 0.0314 | 0.1799 |
| | −0.05 | −0.0017 | 0.0324 | 0.0096 | 0.0474 |
| | 0.05 | 0.0007 | −0.0330 | 0.0038 | −0.0339 |
| | 0.25 | 0.0013 | −0.1546 | 0.0079 | −0.1695 |
| | 0.50 | 0.0061 | −0.2886 | 0.0170 | −0.3060 |

little correlation, the analysis would indicate that there exists some mild treatment effect. However, if one believes that the correlation exists, the analysis tells us that there seems to be no treatment effect at all. As pointed out before, in this type of study, one usually expects some correlation. We tried several other values for $k$ and $k_n$ and obtained similar results.

5 Discussion

This paper proposed a sieve semiparametric maximum likelihood estimation procedure for regression analysis of current status data arising from the additive hazards model

Table 6 Estimated treatment effect on lung tumor onset time with $k_n = 5$

| Model | $\tau$ | $\hat\beta$ | SEE | p value | $\hat\gamma$ | SEE | p value ($10^{-14}$) | AIC |
|---|---|---|---|---|---|---|---|---|
| FGM | −0.10 | 0.00079 | 0.00042 | 0.056 | −1.9683 | 0.2391 | 0.0222 | 1996.163 |
| | 0 | 0.00078 | 0.00045 | 0.085 | −1.9683 | 0.2386 | 0.0111 | 1993.974 |
| | 0.01 | 0.00078 | 0.00046 | 0.091 | −1.9683 | 0.2387 | 0.0111 | 1993.952 |
| | 0.10 | 0.00076 | 0.00041 | 0.065 | −1.9683 | 0.2400 | 0.0222 | 1995.236 |
| | 0.20 | 0.00021 | 0.00039 | 0.590 | −1.9672 | 0.2387 | 0.0222 | 1994.366 |
| Frank | −0.05 | 0.00079 | 0.00043 | 0.067 | −1.9683 | 0.2386 | 0.0111 | 1994.580 |
| | 0 | 0.00078 | 0.00045 | 0.085 | −1.9683 | 0.2386 | 0.0111 | 1993.974 |
| | 0.01 | 0.00078 | 0.00046 | 0.091 | −1.9683 | 0.2387 | 0.0111 | 1993.951 |
| | 0.05 | 0.00077 | 0.00061 | 0.208 | −1.9683 | 0.2391 | 0.0222 | 1994.186 |
| | 0.25 | 0.00005 | 0.00036 | 0.885 | −1.9700 | 0.2408 | 0.0333 | 1995.035 |
| | 0.50 | −0.00025 | 0.00024 | 0.296 | −1.9190 | 0.2499 | 1.6098 | 2005.380 |

in the presence of informative censoring. As mentioned above, the additive hazards model is one of the most commonly used models in failure time data analysis and many inference procedures on it have been developed for various situations. However, it does not seem to exist an established approach for the situation discussed here. Also as mentioned before, the copula model approach used here makes the presented method applicable to much more general situations than most of the existing methods for failure time data with informative censoring since they rely on the frailty model or latent variable approach. For the presented methodology, it has been assumed that the observation times Ci ’s follow the proportional hazards model. It is straightforward to develop a similar method if they follow the additive hazards model instead. Also we have assumed that one knows the underlying copula function M and the association parameter α since they are not identifiable in general without prior or extra information. On the other hand, it is apparent that this may not be true in practice. To address this, one can use the simple procedure used in the previous section, which is to try different copula models and association levels and then to make the comparison and the selection based on the AIC or other criteria. As mentioned before, these ideas have been used in sensitivity analysis and tumorgenicity studies with unknown lethality among other areas. In the previous section, we only considered current status data from the additive hazards model. It is apparent that it would be useful to develop similar inference procedures for interval-censored failure time data or the data arising from other regression models (Kalbfleisch and Prentice 2002; Sun 2006). By interval-censored data, we mean the data in which the failure time of interest is observed only to belong to certain windows or intervals and they include current status data as a special case. 
Other commonly used regression models in failure time data analysis include the proportional odds model and linear transformation models. Another possible direction for future research is to employ nonparametric maximum likelihood estimation for Λ1 and Λ2 instead of the I-spline approximation. One general issue with this is that the determination of the resulting estimator may be more difficult or complex. Also, the convergence of the estimator could be slower.
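The sensitivity-analysis procedure mentioned above (fit the model under several assumed copulas and association levels, then compare by AIC) can be sketched in a deliberately simplified setting. The sketch below is not the semiparametric sieve estimator of this paper: it assumes no covariates, exponential margins for both $T$ and $C$, a copula applied to the two distribution functions, and a coarse grid search in place of a proper optimizer. All function names (`clayton_dv`, `compare_models`, and so on), the candidate list and the simulation settings are hypothetical choices made for illustration. The only model ingredient used is that, under a copula $M$, $P(T \le c \mid C = c) = \partial M(u, v)/\partial v$ evaluated at $u = F_T(c)$, $v = F_C(c)$.

```python
import math
import random

EPS = 1e-12  # keeps probabilities strictly inside (0, 1)


def clip(p):
    return min(max(p, EPS), 1.0 - EPS)


def indep_dv(u, v, alpha=None):
    """dM/dv for the independence copula M(u, v) = u * v."""
    return u


def clayton_dv(u, v, alpha):
    """dM/dv for the Clayton copula, i.e. P(U <= u | V = v); needs alpha > 0."""
    s = u ** (-alpha) + v ** (-alpha) - 1.0
    return v ** (-alpha - 1.0) * s ** (-1.0 / alpha - 1.0)


def clayton_cond_inv(w, v, alpha):
    """Solve clayton_dv(u, v, alpha) = w for u (conditional inversion)."""
    return ((w ** (-alpha / (1.0 + alpha)) - 1.0) * v ** (-alpha) + 1.0) ** (-1.0 / alpha)


def simulate(n, rate_t, rate_c, alpha, rng):
    """Draw n current status records (c, delta) with Clayton-dependent (T, C)."""
    data = []
    for _ in range(n):
        v, w = clip(rng.random()), clip(rng.random())
        u = clip(clayton_cond_inv(w, v, alpha))
        t = -math.log(1.0 - u) / rate_t  # exponential margin for T (assumption)
        c = -math.log(1.0 - v) / rate_c  # exponential margin for C (assumption)
        data.append((c, 1 if t <= c else 0))
    return data


def loglik(data, rate_t, rate_c, dv, alpha):
    """Current status log-likelihood with the copula and alpha held fixed."""
    total = 0.0
    for c, delta in data:
        u = clip(-math.expm1(-rate_t * c))  # F_T(c)
        v = clip(-math.expm1(-rate_c * c))  # F_C(c)
        p = clip(dv(u, v, alpha))           # P(T <= c | C = c)
        total += math.log(p) if delta else math.log(1.0 - p)
        total += math.log(rate_c) - rate_c * c  # log density of C at c
    return total


def fit(data, dv, alpha, grid):
    """Grid-search maximizer over (rate_t, rate_c); returns (AIC, rate_t, rate_c)."""
    ll, rate_t, rate_c = max(
        (loglik(data, rt, rc, dv, alpha), rt, rc) for rt in grid for rc in grid
    )
    return 2 * 2 - 2 * ll, rate_t, rate_c  # two free parameters per candidate


# Hypothetical candidate set: alpha is fixed in advance, not estimated.
CANDIDATES = [
    ("independence", indep_dv, None),
    ("Clayton", clayton_dv, 0.5),
    ("Clayton", clayton_dv, 2.0),
    ("Clayton", clayton_dv, 4.0),
]


def compare_models(data, grid=None):
    """Fit every candidate and return rows (AIC, name, alpha, rate_t, rate_c) by AIC."""
    if grid is None:
        grid = [0.3 * 1.25 ** k for k in range(15)]
    rows = []
    for name, dv, alpha in CANDIDATES:
        aic, rate_t, rate_c = fit(data, dv, alpha, grid)
        rows.append((aic, name, alpha, rate_t, rate_c))
    return sorted(rows, key=lambda r: r[0])


if __name__ == "__main__":
    sample = simulate(200, rate_t=1.0, rate_c=1.5, alpha=2.0, rng=random.Random(2014))
    for aic, name, alpha, rate_t, rate_c in compare_models(sample):
        print(f"{name:12s} alpha={alpha} AIC={aic:.2f} rate_t={rate_t:.3f} rate_c={rate_c:.3f}")
```

Because the copula and $\alpha$ are fixed within each fit, every candidate has the same number of free parameters, so in this toy setting the AIC ranking coincides with the ranking by maximized log-likelihood. With current status data the differences between candidates can be small, which is consistent with treating the comparison as a sensitivity analysis rather than as estimation of $\alpha$.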


Acknowledgments The authors wish to thank the guest editors and two reviewers for their constructive and helpful comments and suggestions. This work was partly supported by the Humanities and Social Science Research Project of Ministry of Education of P. R. China (11YJAZH125) to Shishun Zhao, the NSFC of P. R. China (11371062) to Tao Hu, and NSF and NIH Grants to Jianguo Sun.

6 Appendix: Proofs of asymptotic consistency and normality

In this Appendix, we sketch the proofs of the asymptotic consistency and normality of the proposed estimators described in Sect. 3. First we give the proofs of the consistency results given in (4) and (5), and then the proof of the asymptotic result described in (6). The following are the regularity conditions needed for these results.

(A1) The covariates $Z_i$'s have a bounded support.

(A2) The copula function $M(\cdot, \cdot)$ has bounded first order partial derivatives and both partial derivatives are Lipschitz.

(A3) Assume that inf Pl(θ, X ) > Pl(θ0 , X ). d(θ,θ0 ) 0 K

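and (8) and (9) give

$$\inf_{K} P M(\theta, X) \le \zeta_{1n} + \zeta_{2n} + P M(\theta_0, X) = \zeta_n + P M(\theta_0, X),$$

where $\zeta_n = \zeta_{1n} + \zeta_{2n}$. Hence we have that $\zeta_n \ge \delta$ and furthermore, $\{\hat\theta_n \in K\} \subseteq \{\zeta_n \ge \delta\}$. Then by using condition (A1) and the strong law of large numbers, one can show that $\zeta_{1n} = o(1)$ and $\zeta_{2n} = o(1)$ almost surely. The consistency result thus follows from $\cup_{k=1}^{\infty} \cap_{n=k}^{\infty} \{\hat\theta_n \in K\} \subseteq \cup_{k=1}^{\infty} \cap_{n=k}^{\infty} \{\zeta_n \ge \delta\}$.

Now we show the convergence rate result given in (5) and assume that the regularity conditions (A1)–(A4) given above hold. For any $\eta > 0$, define the class $\mathcal{F}_\eta = \{l(\theta_{n0}, X) - l(\theta, X) : \theta \in \Theta_n, d(\theta, \theta_{n0}) \le \eta\}$ with $\theta_{n0} = (\beta_0, \gamma_0, \Lambda_{1n0}, \Lambda_{2n0})$. Following the calculation of Shen and Wong (1994) (p. 597), we can establish that $\log N_{[\,]}(\varepsilon, \mathcal{F}_\eta, \|\cdot\|_2) \le C N \log(\eta/\varepsilon)$ with $N = 2(k + k_n)$. Moreover, some algebraic calculations lead to $\|l(\theta_{n0}, X) - l(\theta, X)\|_2^2 \le C\eta^2$ for any $l(\theta_{n0}, X) - l(\theta, X) \in \mathcal{F}_\eta$. Therefore it follows from Lemma 3.4.2 of van der Vaart and Wellner (1996) that

$$E_P \left\| n^{1/2} (P_n - P) \right\|_{\mathcal{F}_\eta} \le C J_\eta(\varepsilon, \mathcal{F}_\eta, \|\cdot\|_2) \left\{ 1 + \frac{J_\eta(\varepsilon, \mathcal{F}_\eta, \|\cdot\|_2)}{\eta^2 n^{1/2}} \right\}, \qquad (10)$$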

where $J_\eta(\varepsilon, \mathcal{F}_\eta, \|\cdot\|_2) = \int_0^\eta \{1 + \log N_{[\,]}(\varepsilon, \mathcal{F}_\eta, \|\cdot\|_2)\}^{1/2} \, d\varepsilon \le C N^{1/2} \eta$. Note that the right-hand side of (10) gives $\phi_n(\eta) = C(N^{1/2}\eta + N/n^{1/2})$. Also it is easy to see that $\phi_n(\eta)/\eta$ decreases in $\eta$ and

$$r_n^2 \phi_n(1/r_n) = r_n N^{1/2} + r_n^2 N / n^{1/2} < 2 n^{1/2},$$

where $r_n = N^{-1/2} n^{1/2} = n^{(1-\nu)/2}$ with $0 < \nu < 0.5$. Hence we have $n^{(1-\nu)/2} d(\hat\theta, \theta_{n0}) = O_P(1)$ by Theorem 3.2.5 of van der Vaart and Wellner (1996). This together with $d(\theta_{n0}, \theta_0) = O_p(n^{-r\nu})$ (Lemma A1 in Lu et al. 2007) yields that $d(\hat\theta, \theta_0) = O_p(n^{-(1-\nu)/2} + n^{-r\nu})$. The choice of $\nu = 1/(1 + 2r)$ gives the rate of convergence as $d(\hat\theta_n, \theta_0) = O_p(n^{-r/(1+2r)})$ and completes the proof.

Finally we provide a sketch of the proof of the asymptotic distribution result given in (6). For this, suppose that the regularity conditions (A1)–(A5) given above hold and $r > 2$. Denote by $V$ the linear span of $\Theta_0 - \theta_0$, where $\Theta_0$ denotes the true parameter space. Let $l(\theta, W)$ be the log-likelihood for a sample of size one and $\delta_n = n^{-(1-\nu)/2} + n^{-r\nu}$. For any $\theta \in \{\theta \in \Theta_0 : \|\theta - \theta_0\| = O(\delta_n)\}$, define the first order directional derivative of $l(\theta, X)$ at the direction $v \in V$ as

$$\dot l(\theta, X)[v] = \left. \frac{d\, l(\theta + sv, X)}{ds} \right|_{s=0}, \qquad (11)$$

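and the second order directional derivative as

$$\ddot l(\theta, X)[v, \tilde v] = \left. \frac{d^2 l(\theta + sv + \tilde s \tilde v, X)}{d\tilde s \, ds} \right|_{s=0,\ \tilde s=0} = \left. \frac{d\, \dot l(\theta + \tilde s \tilde v, X)[v]}{d\tilde s} \right|_{\tilde s=0}.$$

Also define the Fisher inner product on the space $V$ as

$$\langle v, \tilde v \rangle = P\{\dot l(\theta, X)[v] \, \dot l(\theta, X)[\tilde v]\}$$

and the Fisher norm for $v \in V$ as $\|v\| = \langle v, v \rangle^{1/2}$. Let $\bar V$ be the closed linear span of $V$ under the Fisher norm. Then $(\bar V, \|\cdot\|)$ is a Hilbert space. In addition, define the smooth functional of $\theta$ as $\gamma(\theta) = b_1'\beta + b_2'\gamma$, where $b = (b_1', b_2')'$ is any vector of dimension $2p$ with $\|b\| \le 1$. For any $v \in V$, we denote

$$\dot\gamma(\theta_0)[v] = \left. \frac{d\gamma(\theta_0 + sv)}{ds} \right|_{s=0} = r(v)$$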

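whenever the right-hand side limit is well defined. Note that $\gamma(\theta) - \gamma(\theta_0) = \dot\gamma(\theta_0)[\theta - \theta_0]$. It follows by the Riesz representation theorem that there exists $v^* \in \bar V$ such that $\dot\gamma(\theta_0)[v] = \langle v^*, v \rangle$ for all $v \in \bar V$ and $\|v^*\|^2 = \|\dot\gamma(\theta_0)\|^2$. Let $\varepsilon_n$ be any positive sequence satisfying $\varepsilon_n = o(n^{-1/2})$. For any $v^* \in \Theta_0$, by (A4) and Corollary 6.21 of Schumaker (1981) (p. 227), there exists $\Pi_n v^* \in \Theta_n$ such that $\|\Pi_n v^* - v^*\| = o(1)$ and $\delta_n \|\Pi_n v^* - v^*\| = o(n^{-1/2})$. Also define $g[\theta - \theta_0, X] = l(\theta, X) - l(\theta_0, X) - \dot l(\theta_0, X)[\theta - \theta_0]$. Then by the definition of $\hat\theta$, we have

$$
\begin{aligned}
0 &\le P_n[l(\hat\theta, W) - l(\hat\theta \pm \varepsilon_n \Pi_n v^*, W)] \\
&= (P_n - P)[l(\hat\theta, W) - l(\hat\theta \pm \varepsilon_n \Pi_n v^*, W)] + P[l(\hat\theta, W) - l(\hat\theta \pm \varepsilon_n \Pi_n v^*, W)] \\
&= \mp \varepsilon_n P_n \dot l(\theta_0, W)[\Pi_n v^*] + (P_n - P)\{g[\hat\theta - \theta_0, W] - g[\hat\theta \pm \varepsilon_n \Pi_n v^* - \theta_0, W]\} \\
&\quad + P\{g[\hat\theta - \theta_0, W] - g[\hat\theta \pm \varepsilon_n \Pi_n v^* - \theta_0, W]\} \\
&= \mp \varepsilon_n P_n \dot l(\theta_0, W)[v^*] \mp \varepsilon_n P_n \dot l(\theta_0, W)[\Pi_n v^* - v^*] + (P_n - P)\{g[\hat\theta - \theta_0, W] - g[\hat\theta \pm \varepsilon_n \Pi_n v^* - \theta_0, W]\} \\
&\quad + P\{g[\hat\theta - \theta_0, W] - g[\hat\theta \pm \varepsilon_n \Pi_n v^* - \theta_0, W]\} \\
&:= \mp \varepsilon_n P_n \dot l(\theta_0, W)[v^*] + I_1 + I_2 + I_3.
\end{aligned}
$$

Note that for $I_1$, it follows from Conditions (A1)–(A2), Chebyshev's inequality and $\|\Pi_n v^* - v^*\| = o(1)$ that $I_1 = \varepsilon_n \times o_p(n^{-1/2})$. For $I_2$, we have

$$
\begin{aligned}
I_2 &= (P_n - P)\{l(\hat\theta, W) - l(\hat\theta \pm \varepsilon_n \Pi_n v^*, W) \pm \varepsilon_n \dot l(\theta_0, W)[\Pi_n v^*]\} \\
&= \mp \varepsilon_n (P_n - P)\{\dot l(\tilde\theta, W)[\Pi_n v^*] - \dot l(\theta_0, W)[\Pi_n v^*]\},
\end{aligned}
$$

where $\tilde\theta$ lies between $\hat\theta$ and $\hat\theta \pm \varepsilon_n \Pi_n v^*$. By Theorem 2.8.3 of van der Vaart and Wellner (1996), we know that $\{\dot l(\theta; W)[\Pi_n v^*] : \|\theta - \theta_0\| = O(\delta_n)\}$ is a Donsker class. Therefore, by Theorem 2.11.23 of van der Vaart and Wellner (1996), we have $I_2 = \varepsilon_n \times o_p(n^{-1/2})$. For $I_3$, note that

$$
\begin{aligned}
P(g[\theta - \theta_0, W]) &= P\{l(\theta, W) - l(\theta_0, W) - \dot l(\theta_0, W)[\theta - \theta_0]\} \\
&= 2^{-1} P\{\ddot l(\tilde\theta, W)[\theta - \theta_0, \theta - \theta_0] - \ddot l(\theta_0, W)[\theta - \theta_0, \theta - \theta_0]\} + 2^{-1} P\{\ddot l(\theta_0, W)[\theta - \theta_0, \theta - \theta_0]\} \\
&= 2^{-1} P\{\ddot l(\theta_0, W)[\theta - \theta_0, \theta - \theta_0]\} + \varepsilon_n \times o_p(n^{-1/2}),
\end{aligned}
$$

where $\tilde\theta$ lies between $\theta_0$ and $\theta$, and the last equation is due to Taylor expansion, (A1)–(A2) and $r > 2$. Therefore,

$$
\begin{aligned}
I_3 &= -2^{-1}\{\|\hat\theta - \theta_0\|^2 - \|\hat\theta \pm \varepsilon_n \Pi_n v^* - \theta_0\|^2\} + \varepsilon_n \times o_p(n^{-1/2}) \\
&= \pm \varepsilon_n \langle \hat\theta - \theta_0, \Pi_n v^* \rangle + 2^{-1} \varepsilon_n^2 \|\Pi_n v^*\|^2 + \varepsilon_n \times o_p(n^{-1/2}) \\
&= \pm \varepsilon_n \langle \hat\theta - \theta_0, v^* \rangle + 2^{-1} \varepsilon_n^2 \|\Pi_n v^*\|^2 + \varepsilon_n \times o_p(n^{-1/2}) \\
&= \pm \varepsilon_n \langle \hat\theta - \theta_0, v^* \rangle + \varepsilon_n \times o_p(n^{-1/2}).
\end{aligned}
$$

In the above, the last equality holds because of $\delta_n \|\Pi_n v^* - v^*\| = o(n^{-1/2})$, the Cauchy-Schwarz inequality, and $\|\Pi_n v^*\|^2 \to \|v^*\|^2$. By combining the above facts together with $P \dot l(\theta_0, W)[v^*] = 0$, one can show that

$$
\begin{aligned}
0 &\le P_n\{l(\hat\theta, W) - l(\hat\theta \pm \varepsilon_n \Pi_n v^*, W)\} \\
&= \mp \varepsilon_n P_n \dot l(\theta_0, W)[v^*] \pm \varepsilon_n \langle \hat\theta - \theta_0, v^* \rangle + \varepsilon_n \times o_p(n^{-1/2}) \\
&= \mp \varepsilon_n (P_n - P)\{\dot l(\theta_0, W)[v^*]\} \pm \varepsilon_n \langle \hat\theta - \theta_0, v^* \rangle + \varepsilon_n \times o_p(n^{-1/2}).
\end{aligned}
$$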
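The passage from this two-sided bound to an asymptotic linear representation can be spelled out as follows. This is a sketch of the standard argument, assuming that the bound holds for both choices of sign and that $\varepsilon_n > 0$; the shorthand $A_n$ and $B_n$ is introduced here for illustration only and does not appear elsewhere in the paper.

```latex
\begin{align*}
A_n &:= (P_n - P)\{\dot l(\theta_0, W)[v^*]\}, \qquad
B_n := \langle \hat\theta - \theta_0, v^* \rangle, \\
\text{upper signs:}\quad 0 &\le -\varepsilon_n A_n + \varepsilon_n B_n + \varepsilon_n \times o_p(n^{-1/2})
  \;\Longrightarrow\; A_n - B_n \le o_p(n^{-1/2}), \\
\text{lower signs:}\quad 0 &\le +\varepsilon_n A_n - \varepsilon_n B_n + \varepsilon_n \times o_p(n^{-1/2})
  \;\Longrightarrow\; B_n - A_n \le o_p(n^{-1/2}), \\
\text{hence}\quad |B_n - A_n| &= o_p(n^{-1/2}), \qquad
n^{1/2} B_n = n^{1/2} A_n + o_p(1).
\end{align*}
```

Dividing by $\varepsilon_n$ is what removes the arbitrary sequence from the statement, so its only role is to make the quadratic term in $I_3$ negligible.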



Therefore, we have $\sqrt{n} \langle \hat\theta - \theta_0, v^* \rangle = \sqrt{n} (P_n - P)\{\dot l(\theta_0, W)[v^*]\} + o_p(1) \to N(0, \|v^*\|^2)$ by the central limit theorem, with the asymptotic variance being equal to $\|v^*\|^2 = \|\dot l(\theta_0, W)[v^*]\|^2$. This implies that $n^{1/2}(\gamma(\hat\theta) - \gamma(\theta_0)) = n^{1/2} \langle \hat\theta - \theta_0, v^* \rangle + o_p(1) \to N(0, \|v^*\|^2)$ in distribution. Furthermore, the semiparametric efficiency can be established by applying the result of Bickel and Kwon (2001) or Theorem 4 in Shen (1997).

For each component $\vartheta_q$, $q = 1, 2, \ldots, 2p$, denote by $\psi_q^* = (b_{1q}^*, b_{2q}^*)$ the solution to

$$\inf_{\psi_q^*} E\left\{ l_\vartheta \cdot e_q - l_{b_1^*}[b_{1q}^*] - l_{b_2^*}[b_{2q}^*] \right\}^2,$$

where $l_\vartheta = (l_\beta', l_\gamma')'$, and $l_{b_1^*}[b_{1q}^*]$ and $l_{b_2^*}[b_{2q}^*]$ are defined similarly to (11). Now let $\psi^* = (\psi_1^*, \cdots, \psi_q^*)$. By the calculations of Chen et al. (2006), we have

$$\|v^*\|^2 = \|\dot\gamma(\theta_0)\|^2 = \sup_{v \in \bar V : \|v\| > 0} \frac{|\dot\gamma(\theta_0)[v]|^2}{\|v\|^2} = b' \Sigma b,$$

where $\Sigma = E(S_\vartheta S_\vartheta')$ and $S_\vartheta = l_\vartheta - l_{b_1^*} b_1^* - l_{b_2^*} b_2^*$. Now, since $b'((\hat\beta - \beta_0)', (\hat\gamma - \gamma_0)')' = \langle \hat\theta - \theta_0, v^* \rangle$, the final result follows from the Cramér-Wold device (Theorem 3.2 in Chap. 13 of Shorack 2000).
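As a worked check of the rate calculation in the convergence-rate proof earlier in this Appendix, the choice of $\nu$ can be read as balancing the two error terms. The sketch below only rearranges quantities already defined there. It assumes that $d$ satisfies the triangle inequality, and the particular values of $r$ are illustrative.

```latex
\begin{align*}
d(\hat\theta_n, \theta_0) &\le d(\hat\theta_n, \theta_{n0}) + d(\theta_{n0}, \theta_0)
  = O_p\!\left(n^{-(1-\nu)/2}\right) + O_p\!\left(n^{-r\nu}\right), \\
\frac{1-\nu}{2} = r\nu \;&\Longleftrightarrow\; \nu = \frac{1}{1+2r},
  \qquad\text{and then}\qquad \frac{1-\nu}{2} = r\nu = \frac{r}{1+2r}, \\
r = 1:&\ \ \nu = \tfrac{1}{3},\ \ n^{-1/3}; \qquad
r = 2:\ \ \nu = \tfrac{1}{5},\ \ n^{-2/5}; \qquad
r = 3:\ \ \nu = \tfrac{1}{7},\ \ n^{-3/7}.
\end{align*}
```

The exponent $r/(1+2r)$ increases toward $1/2$ as $r$ grows, so smoother nonparametric components bring the overall rate closer to, without reaching, the parametric rate $n^{-1/2}$.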

References

Bickel PJ, Kwon J (2001) Inference for semiparametric models: some questions and an answer. Stat Sin 11:863–960
Chen X, Fan Y, Tsyrennikov V (2006) Efficient estimation of semiparametric multivariate copula models. J Am Stat Assoc 101:1228–1240
Chen L, Sun J (2009) A multiple imputation approach to the analysis of current status data with the additive hazards model. Commun Stat Theory Methods 38:1009–1018
Chen MH, Tong X, Sun J (2009) A frailty model approach for regression analysis of multivariate current status data. Stat Med 28:3424–3426
Chen CM, Lu TFC, Chen MH, Hsu CM (2012a) Semiparametric transformation models for current status data with informative censoring. Biom J 54:641–656
Chen DGD, Sun J, Peace K (2012b) Interval-censored time-to-event data: methods and applications. Chapman & Hall/CRC
Hougaard P (2000) Analysis of multivariate survival data. Springer, New York
Huang X, Wolfe RA (2002) A frailty model for informative censoring. Biometrics 58:510–520
Kalbfleisch JD, Prentice RL (2002) The statistical analysis of failure time data, 2nd edn. Wiley, New York
Keiding N (1991) Age-specific incidence and prevalence: a statistical perspective (with discussion). J R Stat Soc A 154:371–412
Kulich M, Lin DY (2000) Additive hazards regression with covariate measurement error. J Am Stat Assoc 95:238–248
Lagakos SW, Louis TA (1988) Use of tumor lethality to interpret tumorigenicity experiments lacking cause-of-death data. Appl Stat 37:169–179
Li J, Wang C, Sun J (2012) Regression analysis of clustered interval-censored failure time data with the additive hazards model. J Nonparametr Stat 24:1041–1050
Lin DY, Ying Z (1994) Semiparametric analysis of the additive risk model. Biometrika 81:61–71
Lin DY, Oakes D, Ying Z (1998) Additive hazards regression with current status data. Biometrika 85:289–298
Lu M, Zhang Y, Huang J (2007) Estimation of the mean function with panel count data using monotone polynomial splines. Biometrika 94:705–718
Martinussen T, Scheike TH (2002) Efficient estimation in additive hazards regression with current status data. Biometrika 89:649–658
Nelsen RB (2006) An introduction to copulas, 2nd edn. Springer, New York
Pollard D (1984) Convergence of stochastic processes. Springer, New York
Ramsay JO (1988) Monotone regression splines in action. Stat Sci 3:425–441
Schumaker LL (1981) Spline functions: basic theory. Wiley, New York
Shen X, Wong WH (1994) Convergence rate of sieve estimates. Ann Stat 22:580–615
Shorack GR (2000) Probability for statisticians. Springer, New York
Shen X (1997) On methods of sieves and penalization. Ann Stat 25:2555–2591
Sun J (2006) The statistical analysis of interval-censored failure time data. Springer, New York
Sun L, Park D, Sun J (2006) The additive hazards model for recurrent gap times. Stat Sin 16:919–932
Titman AC (2013) A pool-adjacent-violators type algorithm for non-parametric estimation of current status data with dependent censoring. Lifetime Data Anal. Published Online: 22 June 2013
Tong X, Hu T, Sun J (2012) Efficient estimation for additive hazards regression with bivariate current status data. Sci China Math 55:763–774
van der Vaart AW, Wellner JA (1996) Weak convergence and empirical processes. Springer, New York
Wang L, Sun J, Tong X (2010) Regression analysis of case II interval-censored failure time data with the additive hazards model. Stat Sin 20:1709–1723
Wang C, Sun J, Sun L, Zhou J, Wang D (2012) Nonparametric estimation of current status data with dependent censoring. Lifetime Data Anal 18:434–445
Zhang Z, Sun J, Sun L (2005) Statistical analysis of current status data with informative observation times. Stat Med 24:1399–1407
Zheng M, Klein JP (1995) Estimates of marginal survival for dependent competing risk based on an assumed copula. Biometrika 82:127–138
Zhou X, Sun L (2003) Additive hazards regression with missing censoring information. Stat Sin 13:1237–1258
