This article was downloaded by: [University of Sydney] On: 30 April 2015, At: 08:10 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Journal of Biopharmaceutical Statistics Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/lbps20

A Mixture of Hierarchical Joint Models for Longitudinal Data with Heterogeneity, Non-normality, Missingness and Covariate Measurement Error a

b

c

c

Yangxin Huang , Chunning Yan , Ping Yin & Meixia Lu a

Department of Epidemiology and Biostatistics, College of Public Health, University of South Florida, 33612, Tampa, Florida, USA b

School of Management, Shanghai University, Shanghai, 200444, P.R. China

c

Department of Epidemiology and Biostatistics, School of Public Health, Huazhong University of Science and Technology, Wuhan, Hubei, 430030, P.R. China Accepted author version posted online: 28 Jan 2015.

Click for updates To cite this article: Yangxin Huang, Chunning Yan, Ping Yin & Meixia Lu (2015): A Mixture of Hierarchical Joint Models for Longitudinal Data with Heterogeneity, Non-normality, Missingness and Covariate Measurement Error, Journal of Biopharmaceutical Statistics, DOI: 10.1080/10543406.2014.1000547 To link to this article: http://dx.doi.org/10.1080/10543406.2014.1000547

Disclaimer: This is a version of an unedited manuscript that has been accepted for publication. As a service to authors and researchers we are providing this version of the accepted manuscript (AM). Copyediting, typesetting, and review of the resulting proof will be undertaken on this manuscript before final publication of the Version of Record (VoR). During production and pre-press, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal relate to this version also.

PLEASE SCROLL DOWN FOR ARTICLE Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content. This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http:// www.tandfonline.com/page/terms-and-conditions

A Mixture of Hierarchical Joint Models for Longitudinal Data with Heterogeneity, Nonnormality, Missingness and Covariate Measurement Error Yangxin Huang1*, Chunning Yan2*, Ping Yin3 and Meixia Lu3 1

2

cr ip

t

Department of Epidemiology and Biostatistics, College of Public Health, University of South Florida, 33612, Tampa, Florida, USA School of Management, Shanghai University, Shanghai, 200444, P.R. China

3

us

*Joint corresponding authors: Yangxin Huang([email protected]) and Chunning Yan([email protected])

an

Abstract

ce pt ed

M

Longitudinal data arise frequently in medical studies and it is a common practice to analyze such complex data with nonlinear mixed-effects (NLME) models. However, the following four issues may be critical in longitudinal data analysis. (i) A homogeneous population assumption for models may be unrealistically obscuring important features of between-subject and withinsubject variations; (ii) normality assumption for model errors may not always give robust and reliable results, in particular, if the data exhibit skewness; (iii) the responses may be missing and the missingness may be nonignorable; and (iv) some covariates of interest may often be measured with substantial errors. When carrying out statistical inference in such settings, it is important to account for the effects of these data features; otherwise, erroneous or even misleading results may be produced. Inferential procedures can be complicated dramatically when these four data features arise. In this article, the Bayesian joint modeling approach based on a finite mixture of NLME joint models with skew distributions is developed to study simultaneous impact of these four data features, allowing estimates of both model parameters and class membership probabilities at population and individual levels. A real data example is analyzed to demonstrate the proposed methodologies, and to compare various scenarios-based potential models with different specifications of distributions. Keywords: Bayesian analysis; Covariate measurement error; Longitudinal data; Mixture joint models; Missing response; Skew distributions.

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

Department of Epidemiology and Biostatistics, School of Public Health, Huazhong University of Science and Technology, Wuhan, Hubei, 430030, P.R. China

1

t

1. Introduction

cr ip

Modeling viral load (plasma HIV-1 RNA copies) trajectories (HIV dynamics) after initiation of potent antiretroviral (ARV) therapies has been one of the most important areas in AIDS research

us

models, reflect the potency (efficacy) of ARV therapies (Perelson et al., 1997; Wu and Ding, 1999). Nonlinear mixed-effects (NLME) models in conjunction with HIV dynamic equations

an

(Huang et al., 2006;Wu and Ding, 1999) are used to model the viral load trajectories as well as to

M

quantify between-subject and within-subject variations in viral load longitudinal measurements. Modeling longitudinal data is an active area of statistical research that has received a lot of

ce pt ed

attention in the recent years. A considerable number of statistical modeling and analysis methods have been suggested in the literature for analyzing such data with various complex features including, but not limited to, NLME modeling approach (Higgins et al., 1997; Wu and Ding, 1999), nonparametric NLME modeling approach (Liu and Wu, 2007;Wu and Zhang, 2006), joint modeling approach via Monte Carlo EM algorithm (Liu and Wu, 2007; Wu, 2002), and Bayesian NLME modeling approach via Markov chain Monte Carlo (MCMC) procedure (Huang et al., 2006; Huang and Dagne, 2011). However, there is a relatively little work done on

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

in the past two decades. The viral decay rates, as regression parameters of the viral dynamic

simultaneously accounting for non-normality, heterogeneity, missingness and covariate measurement error, which are inherent features of longitudinal data.

2

First, in the previous longitudinal modeling, most studies assume that all subjects come from a homogeneous population where large between- and within-individual variations were accommodated by random-effects and/or time-varying covariates in those models. This typical

cr ip

t

large between-and within-individual variations are shown in Figure 1(a), viral load trajectory profiles of 6 representative patients in AIDS Clinical Trials Group study 398 (ACTG398) (Hammer et al., 2002)(see Section 2 for details of this study and data), which can be roughly

us

mixture joint modeling to be conducted in this article was to cluster all patients into two classes

an

with virologic suppression and failure (i.e., viral load rebound), respectively, which is of main interest from a clinical prospective. But, the other class was incorporated (class 1) to capture

M

some patients who withdrew too early to be clustered into either class 2 with virologic suppression or class 3 with viral load rebound. Thus, the number of components in this analysis

ce pt ed

was determined empirically based on the viral load trajectory patterns and clinical interpretability. Note that class 1 is referred as a confirmed ‘short-term virologic response’ due to early dropout which may be informative dropout and may not indicate virologic suppression in long-term treatment. We, therefore, can reasonably assume that patients are from a population which consists of three relatively homogeneous classes and, thus, it is motivated to consider a finite mixture of NLME models for such data set. Second, a common fundamental assumption in

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

classified into three classes with the following considerations. The original motivation based on

most studies is that model random errors are normally (symmetrically) distributed, but this assumption may lack robustness against departures from normality and/or outliers and, thus, statistical inference and analysis with normal assumption may lead to misleading results (Huang and Dagne, 2011; Lu and Huang, 2014; Sahu et al., 2003). Figure 1(b) displays the distributions

3

of repeated viral load measurements (in log10 scale) for 379 subjects enrolled in ACTG398 (Hammer et al., 2002). It can be seen that, for this data set to be analyzed in this paper, the viral load response (even after log10 transformation) is skewed and, thus, a normality assumption is

cr ip

t

not quite realistic. Alternatively, an asymmetric distribution such as skew-t (ST) and skewnormal (SN) distributions (Arellano-Valle and Genton, 2005; Azzalini and Capitanio, 1999, 2003; Lu and Huang, 2014; Sahu et al., 2003) should be more appropriate than a symmetric

us

important requirement that variables are “perfectly” measured. In practice, however, collected

an

data are often far from “perfect.” Although longitudinal studies are designed to collect data from every participant in the study at each assessment time, missing data including missing

M

observations during study and earlier dropouts from study in the response are often encountered due to patient’s dropouts or other problems. The missing data are likely to be informative (non-

ce pt ed

ignorable) in the sense that missingness may be related to the missing values. Last, measurement error in covariates is another typical feature of longitudinal data, and ignoring this phenomenon may result in unreliable statistical inference.

The majority of the statistical literature for the modeling of longitudinal data has focused on the development of models that aim at capturing only specific aspects of the motivating case studies. However, relatively few studies have been conducted on simultaneously accounting for the

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

(normal) distribution for this data set. Third, the validity of inference methods relies on an

impacts induced by heterogeneity, asymmetry, missing observations and mismeasured covariates. It is not clear how these typical features often observed in longitudinal data may inherently interact and simultaneously influence inferential procedures. This article proposes a finite mixture of NLME joint (MNLMEJ) models with skew distributions, which include ST and

4

SN distributions, to simultaneously account for response with skewness, heterogeneity and nonignorable missingness and covariate with measurement error. We demonstrate a Bayesian inferential approach to estimate both model parameters and class membership probabilities based

cr ip

t

on a finite mixture model with nonlinear mean functions (components). It is noted that the ST distribution is approximate to the SN distribution when its degrees of freedom approach infinity, and the SN distribution reduces to a normal distribution if skewness parameter is zero. Thus, we

us

extends to the other distributions such as the SN distribution. In what follows, we consider

an

multivariate ST and SN distributions introduced by Sahu et al. (2003), which are suitable for a Bayesian inference and briefly discussed in Appendix A. The rest of this paper is organized as

M

follows. Section 2 describes the data set that motivated this research and investigates a specific MNLMEJ model for viral load response process with missing data mechanism and for CD4

ce pt ed

covariate process with measurement error. The Bayesian inference method that simultaneously accounts for heterogeneity, skewness, missingness and covariate measurement error is presented in Section 3. In Section 4, an AIDS data set described in Section 2 is analyzed to demonstrate the proposed methodologies and the analysis results are reported. Finally, we conclude the paper with discussion in Section 5.

2. Data and a mixture of joint models with ST sistribution for HIV dynamics

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

use an ST distribution to develop the mixture modeling methodologies, as it can be easily

2.1. Motivating data set The data set that motivated this research is from AIDS Clinical Trials Group 398 study (ACTG398), a randomized, double-blind, placebo-controlled, with an extension to more than 72

5

weeks study of saquinavir, indinavir, or nelfinavir added as second protease inhibitor to the 4drug class regimen in patients with virologic failure (Hammer et al., 2002). Although RNA viral load was designed to be measured in copies per milliliter at study weeks 0, 2, 4, 8, 16, and every

cr ip

t

8 weeks thereafter until week 72, it contains missing observations. The number of viral load measurements for each individual varies from 2 to 13. This study consists of 481 HIV-1 infected patients. Out of total 481 patients, 379 patients who had more than 2 measurements were

us

time points. A log10 transformation of viral load was used in the analysis in order to stabilize the

an

variation of the measurement errors and to speed up estimation algorithm. CD4 cell counts were also measured throughout study on a similar scheme. The detailed data description can be found

M

in Hammer et al. (2002).

ce pt ed

As discussed in Section 1, viral load trajectory profiles of 6 representative patients displayed in Figure 1(a) can be roughly classified into three classes: (i) decrease rapidly and constantly in a short-term period (two solid lines: ID=31, 105), (ii) decrease at the beginning and then stay stable at a low level (two dashed lines: ID=29, 132), and (iii) decrease at the beginning, but rebound later (two dotted line: ID=33, 99). Along with these observations, we can reasonably assume that patients from a population consist of three relatively homogeneous classes. We can also see from the right panel in Figure 1(b), which shows the pooled viral load measurements

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

included in this data analysis. 46 patients have missing viral load measurements at scheduled

from all patients, that viral load response (even after logarithmic transformation) is skewed and, thus, an asymmetric distribution (such as ST or SN) should be more appropriate than a symmetric (normal) distribution to model this data set.

6

2.2. Measurement error process This section briefly discusses (CD4) covariate measurement error models. Various covariate mixed-effects models were investigated in the literature (Carroll et al., 2006; Wu, 2002). For

the number of measurements on the ith subject by ni. Let

. In the presence of covariate measurement

error, we employ the following LME model for CD4 covariate process.

(

)

where

,

(

zi* = zi*1 ,… , zin* i

)

)

T

with

T

zij

being the observed CD4 covariate value for individual i at time

zij* = uijT α + vijT ai

ce pt ed

tij

(

zi = zi1 ,… , zini

M

iid

an

zij = uijT α + vijT ai + ε ij ( ≡ zij* + ε ij ) , ε i ~ N 0, σ 12 I ni , (1)

and

covariate value at time ai = ( ai1 ,… , ail )

tij

,

uij

and

vij

may be viewed as the true (but unobservable)

α = ( α1 , … , α l )

T

are l × 1 design vectors,

and

T

are unknown population (fixed-effects) and individual-specific (randomiid

effects) parameter vectors, respectively. We assume that

ai ~ Nl ( 0, Σ a )

, where Σ a is unrestricted

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

be the observed CD4 covariate

us

tij ( i = 1,… n; j = 1,…, ni )

value for individual i at time

zij

cr ip

t

simplicity, we consider a single time-varying covariate. Denote the number of subjects by n and

covariance matrix. To model the CD4 covariate process conducted in this study, at the absence of theoretical rationale for the CD4 trajectories, we consider empirical polynomial LME models for the CD4 process, and choose the best model based on AIC and BIC values. Specifically, we consider the

7

uij = vij = (1, tij ,… , tijl −1 )

T

CD4 covariate model (1) with

and focus on linear (l = 2), quadratic ( l =

3) and cubic ( l = 4) polynomials. The resulting AIC (BIC) values are 1805.03 (1836.14), 1733.49 (1749.80) and 1773.44 (1792.21), respectively. We thus adopted the following quadratic

(2)

2 ij

ai = ( ai1 , ai 2 , ai 3 )

α = (α 1 , α 2 , α 3 )

T

is individual-specific (random-effects) vector with

N3 ( 0, Σ a )

. To avoid too small (large) estimates, which may be

M

multivariate normal distribution

is population (fixed-effects)

an

parameter vector,

T

us

where z ij = (α1 + ai1 ) + (α 2 + ai 2 ) tij + (α 3 + ai 3 ) t , *

unstable, we standardize the time-varying covariate CD4 cell counts and rescale the original time

ce pt ed

(in days) so that the time scale is between 0 and 1.

2.3. A mixture of NLME models with ST distribution for HIV dynamics Finite mixture models are used in longitudinal research studies (Muthen and Shedden, 1999; Pauler and Laird, 2000), where the latent classes corresponding to the mixture components and cluster individuals may provide a better model fitting. However, most finite mixture models are

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

zij = (α1 + ai1 ) + (α 2 + ai 2 ) tij + (α 3 + ai 3 ) tij2 + ε ij ,

cr ip

t

polynomial LME model for the CD4 process.

currently based on linear (polynomial) (Muthen and Shedden, 1999) or piecewise linear (Pauler and Laird, 2000) mean functions. The partial reason is that the computation for inference can be relatively conveniently carried out because the likelihood function of a model based on these linear mean functions has a closed form (Muthen and Shedden, 1999). However, in practice,

8

most longitudinal trajectories appear to be nonlinear patterns. When a model is extended to incorporate nonlinear mean functions which will be conducted in this article, inferential procedures can become complicated dramatically because a closed form of likelihood function is

be the viral load value (in log10 scale) for individual i at time

and

missing response indicator such that yi = ( ymis ,i , yobs ,i )

, where

ymis ,i

and

(

yi = yi1 ,… , yini rij = 1

yobs ,i

if

yij

)

T

. Let

subject to informative

(

ri = ri1 ,… , rini

)

T

be a vector of

us

missingness

( i = 1,… n; j = 1,…, ni )

tij

is missing and 0 otherwise. We write

an

yij

are the collections of the missing and observed

g k ( ⋅)( k = 1,…, K )

, which are known to be specified. The true

ce pt ed

classes with mean functions

M

components of yi , respectively. We assume that there are K plausible nonlinear trajectory

trajectory mean function of the ith subject might be

π k = P ( ci = k )

which satisfies ∑

K

k =1

πk =1

g k ( ⋅)

with unknown probability

, where ci is a latent indicator. A statistical nonlinear

trajectory model of individual i , given ci = k , can be formulated as follows.

( yi | ci = k ) = gk ( ti , Ak β i ) + ei ,

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

Let

cr ip

t

no longer available.

where

ti = (ti1 ,…, tini )T

,

iid

(

)

ei ~ STni ,ν − J (ν ) δ 1ni , σ 22 I ni , δ I ni ,

βi = ( β i1 ,…, β in )T i

,

β ij = ( βij1 ,…, βijs )T

(3)

is a vector of individual

J (ν ) = (ν / π ) {Γ[(ν − 1) / 2] / Γ (ν / 2 )} 1/ 2

parameters for the ith individual,

is to adjust the ST

distribution with mean zero (see (A.3) in Appendix A), where Γ(·) is a Gamma function; the

9

vector of random errors

ei = (ei1 ,…, eini )T

ν,

variance

freedom

unknown

follows a multivariate ST distribution with degrees of

parameter

and

skewness

δ;

parameter

, and Ak ( s × s )(k = 1, …, K ) is known square

t

g k (ti , Ak β i ) = g k (ti1 , Ak β i1 ),…, g k (tin i , Ak β ini ))T

σ 22

g1 (⋅),..., g K ( ⋅)

, which are specified

, may only involve different subsets of

us

by the nonlinear functions

yi j (i = 1, … n; j = 1, …, ni )

β ij

. By

to 0 in the k th trajectory class,

an

introducing Ak , Ak β ij will set unrelated elements of

β ij

respectively. We will illustrate the use of Ak and specify nonlinear mean functions gk(·) in the

M

HIV dynamic application below.

Similar to discussion in Lu and Huang (2014), model (3) can be given conditionally and

ce pt ed

marginally by

( yi | ci = k ) ~ STni ,ν ( gk (ti , Ak β i ) − J (ν )δ 1ni , σ 22 I ni , δ I ni )

K

(

)

yi ~ ∑ π k STni ,ν gk ( ti , Ak β i ) − J (ν )δ 1ni , σ 22 I ni , δ I ni . k =1

(4)

(5)

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

Ak is introduced because the mean functions of

cr ip

indicator matrix of which diagonal elements are either 0 or 1 and off-diagonal elements are all 0.

The preceding model defines a finite mixture of NLME models. In (5) the vector of mixture T probabilities π = (π 1 , … , π K ) can be also viewed as the mixture weights of all plausible

components within the finite mixture model framework. Model (5) is identifiable, as long as each of the component models is identifiable and distinguishable from each other. When the

10

component models are identifiable but not distinguishable from each other, some constraints may be required to make model (5) identifiable(Titterington et al., 1985).

β ij

, we assume,

cr ip

t

For individual-specific parameter vector iid

(6)

us

T b = (bi1 ,…, biq )T in which β = ( β1 ,… , β r ) is a vector of universal population parameters; i is a

an

Ζ (s × r ) vector of random-effects; Σ b ( q × q ) is an unknown variance-covariance matrix; ij is a design matrix including time-independent and/or time-varying covariates such as CD4 cell count.

zij*

is a summary of true (but unobservable) time-varying

M

We assume one of the elements of Zij,

ce pt ed

CD4 covariate value at time tij (see Section 2.2 in detail). It is noted that r ≥ s ≥ q . For the HIV dynamic application, based on discussion in Appendix B, we consider one- and twocompartment models with constant decay rate(s) for trajectory classes 1 and 2, respectively, and propose a two-compartment model with a time-varying decay rate in the second compartment for trajectory class 3. Thus, the mean functions of K = 3 components in the mixture model are specified as follows.

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

β ij = Z ij β + bi , bi ~ N q (0, ∑b )

1. One-compartment model with a constant decay rate for class 1 trajectory g1 (tij , A1β ij ) = log10 (e

pi1 − λi 1tij

), (7)

2. Two-compartment model with constant decay rates for class 2 trajectory

11

g 2 (tij , A2 β ij ) = log10 (e

pi1 − λi1tij

+e

pi 2 − λi 2 tij

),

(8)

3. Two-compartment model with constant and time-varying decay rates for class 3 trajectory

+e

pi 2 − λij* 2tij

).

(9)

t

pi 1 − λi 1tij

us

β i1 = pi1 = β1 + bi1 , β i 2 = λi1 = β 2 + β3 zi 0 + bi 2 , βi 3 = pi 2 = β 4 + bi 3 , β 4 i = λi 2 = β5 + bi 4 , βij 5 = λij*2 = β 5 + β 6 zij* + bi 4 ,

(10)

an

β ij = ( β i1 , β i 2 , βi 3 , βi 4 , βij 5 )T , β = ( β1 , β 2 , β3 , β 4 , β5 , β 6 )T ,

zij*

is true (but unobservable) value of CD4 at time tij defined in

ce pt ed

where zi0 is the baseline CD4,

M

A1 = diag(1, 1, 0, 0, 0), A2 =diag(1, 1, 1, 1, 0), A3 = diag(1, 1, 1, 0, 1),

(2). The decay rate of the second compartment in (9) parameters in

β ij

βij 5

is time-varying due to

zij*

, but other

are time independent. As mentioned previously, mean functions in different

components may involve different subsets of

β ij

; for example, g1 (⋅) only involves parameters

βi1 and β i 2 , and g 2 (⋅) and g 2 (⋅) share the same parameters βi1 , β i 2 , and β i 3 but have different β decay rates of the second compartment, β 4i and ij 5 , respectively. The diagonal indicator

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

In (7)–(9),

cr ip

g3 (tij , A3 β ij ) = log10 (e

matrices, A1, A 2 , and A 3 , determine which elements of

β ij

are involved and set other unrelated

parameters to be 0 in mean functions, g1 (⋅) , g 2 (⋅) , and g 3 (⋅) , respectively. It is noted that (9) is a natural extension of (8) to create a time-varying decay rate for capturing viral rebound in class

12

3 trajectories. With this mixture clustering, our mixture modeling can be used to estimate probabilities of class membership which is either viral rebound, suggesting virologic failure (class 3) or viral decrease continuously, indicating a confirmed short-term virologic response

cr ip

t

(class 1) and a confirmed long-term virologic response (class 2). By relaxing the homogeneous population assumption, finite mixture models allow for parameter

us

considering individual variation around a single mean trajectory, a finite mixture model allows different classes of individuals to vary around several different mean trajectories. In this way,

an

finite mixture modeling may do a better job of capturing inter-individual variation because it

M

does not require that all individuals follow the same average trajectory over time. To allow for non-ignorable missing mechanism in viral load response, we should incorporate a

ce pt ed

missing data model for the missing response mechanism to avoid possibly biased results. Toward this end, we may consider that both missing observations during study and earlier dropouts are likely to be informative or non-ignorable missing in the sense that the probability of missing data may depend on the missing values. Although more complicated missing models can be considered, we focus on the following simple independent missing data model to avoid too many nuisance parameters where the missingness may be related to the viral load value being missing

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

differences across several unobserved classes within a heterogeneous population. Instead of

and current CD4 observation.



f ( ri | η ) = ∏ i =1 ∏ ji=1  P ( rij = 1| η )  1 − P ( rij = 1| η )  i =1 n

n

n

1− rij

rij

13

,

(11)

where

logit  P ( rij = 1| η )  = η0 + η1 zij + η2 yij

η = (η 0 ,η1 ,η 2 )

T

and

is a vector of unknown

nuisance parameters. Note that the non-ignorable missing data model (11) is not testable based on the observed data, so we should consider alternative missing data models to perform

cr ip

t

sensitivity analysis (see Section 4.3 in detail).

us

In a longitudinal study, the longitudinal response and covariate processes are usually connected physically or biologically. Although a simultaneous inference method based on a joint likelihood

an

for the covariate and response data may be favorable, the computation associated with the joint likelihood inference in a finite MNLMEJ models for longitudinal data can be extremely intensive

M

and may lead to convergence problems(Liu and Wu, 2007; Wu, 2002). Here we propose a fully Bayesian inference method for covariate model (2) and mixture of NLME models (5) and (6) in

ce pt ed

the presence of missing data model (11) at population and individual levels, simultaneously. We obtain numerical approximations to posterior distributions using Markov chain Monte Carlo (MCMC) methods.

Let

θ = {α , β ,η , σ 12 , σ 22 , Σ a , Σ b ,ν , δ }

be the collection of unknown parameters in all models

except for the mixture weight π in (5). Under the Bayesian framework, we next need to specify

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

3. Inferential procedure

prior distributions for all the unknown parameters in these models as follows:

α ~ N 3 (τ 1 , Λ 1 ) , σ 12 ~ IG (ω1 , ω2 ) , Σ a ~ IW ( Ω1 , ρ1 ) , β ~ N 6 (τ 2 , Λ 2 ) , σ 22 ~ IG (ω3 , ω4 ) , Σ b ~ IW ( Ω2 , ρ 2 ) ,

η ~ N 3 (τ 3 , Λ 3 ) , δ ~ N ( 0, γ ) , ν ~ Exp ( v0 ) I (ν > 2 ) ,

14

(12)

where the mutually independent Inverse Gamma ( I G ), Normal (N), Exponential (Exp) and Inverse Wishart ( I W ) prior distributions are chosen to facilitate computations. The truncation point in exponential prior distribution for ν was chosen to assure a finite variance. The super-

cr ip

t

Λ parameter matrices 1 , Λ 2 , Λ 3 , Ω1 and Ω2 can be assumed to be diagonal for convenient

implementation. By its definition, the latent indicating variables

iid

(13)

an

in which π = ( π 1 , π 2 , π 3 )

T

follows a Dirichlet distribution (Dir) (Diebolt and Robert, 1994;

M

Lavine et al., 1992),

(14)

ce pt ed

π ~ Dir (φ1 , φ2 , φ3 ) .

follow a

us

ci ~ Cat ( (1, 2,3) , (π 1 , π 2 , π 3 ) ) ,

An MCMC scheme for this mixture model is composed of following two basic steps:

(i). Sampling class membership indicators

ci ( i = 1,… , n )

, conditional on population parameters,

θ, and individual random-effects, ai and bi.

ci ( i = 1,… , n )

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

Categorical distribution ( C a t )

ci ( i = 1,… , n )

Generate

P ( ci = k | ai , bi , θ , yi ) =

from

π k f ( yi | ai , bi , ci = k , θ )

∑ m=1π m f ( yi | ai , bi , ci = m,θ ) 3

15

,

(15)

where

f ( yi | ai , bi , ci = k ,θ )( k = 1, 2,3)

is a conditional density function of

yi

based on (4).

Then, update the probability (population proportion) π for next iteration from distribution

numk = ∑ i =1 I ( ci = k ) n

where

,

( k = 1, 2,3) , in which I (⋅)

t

(16)

cr ip

(π | num1 , num2 , num3 ) ~ Dir (φ1 + num1 ,φ2 + num2 ,φ3 + num3 ) ,

is an indicator function. Note that an

us

(ii). Sampling parameters θ, and individual random-effects ai and bi, conditional on class c = ( c1 ,… , cn )

an

T

membership indicator

.

M

Following the study by Sahu et al. (2003), it can be shown, conditional on ci determined in step

ce pt ed

(i), that by introducing the random vector wi and random variable ui based on the stochastic zi

representation for the ST distribution (see Appendix A for details), with random-effects bi, and

(

)

rij

( (

)

)

yi | zi , wi , ui , bi , ri , ci ~ N ni gci ti , Aci β ij + δ  wi − J (ν ) 1ni  , ui−1σ 22 I ni ,

(

−1 i ni

wi | ui ~ N ni 0, u I

) I (w

i

> 0 ) , ui ~ Γ (ν / 2,ν / 2 ) ,

bi ~ N ( 0, Σ ) , rij | yij , zij ~ Bernoulli ( Pij ) .

(

where zi* = zi*1 ,… , zin* i

)

T

with random-effects ai, yi

in missing data model (11) can be hierarchically formulated as

zi | ai ~ N zi* , σ 11 I ni , ai ~ N ( 0, Σ a ) ,

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

individual could belong to different classes in different iterations, determined by (15).

and Pij = P ( rij = 1| η ) .

16

(17)

Let the observed data

D = { yobs ,i , ti , zi , ri , ( i = 1,… , n )}

,

f ( ⋅ | ⋅)

be a conditional density function

2 2 and h ( . ) be prior density function, respectively. We assume that α , β , η, σ 1 , σ 2 , Σ a , Σ b , ν

are

independent

of

each

h (θ ) = h (α ) h ( β ) h (η ) h (σ 12 ) h (σ 22 ) h ( Σ a ) h ( Σ b ) h (ν ) h (δ )

other,

i.e.,

t

δ

cr ip

and

. After we specify the models for the

observed data conditional on membership indicator and prior distributions of the unknown model

us

the Bayesian framework. When the missing response data are nonignorable, we need to assume

an

only a possible missing data model and then incorporate it into likelihood. Thus, the joint

f (θ | D, c ) ∝

{∏

M

posterior density of θ conditional on D and c can be given by

∫ ∫ ∫ f ( y | z , w , u , b , r , c ) f ( w | u , w > 0) f (u ) × f ( z | a ) f ( r | y , z ) f ( a ) f ( b ) dy da db } h (θ ) , i

i

i

i

i

i

i

i

i

i

i

i

i

i

mis ,i

i

i

i

i

i

(18)

ce pt ed

i

n

i =1

In general, the integrals in (18) are of high dimension and do not have a closed form. Analytic approximations to the integrals may not be sufficiently accurate. Therefore, it is prohibitive to directly calculate the posterior distribution of θ based on the observed data and class membership. As an alternative, the MCMC procedure can be used to sample population parameters, θ, and random-effects, ai and bi ( i = 1, …, n), from conditional posterior

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

parameters, we can draw samples for the parameters based on their posterior distributions under

distributions, based on (18), using the Gibbs sampler along with the Metropolis-Hastings (M-H) algorithm. Steps (i) and (ii) are repeated alternatively in iterations of MCMC until convergence reached. An important advantage of the above representations based on the hierarchical models

17

is that they can be very easily implemented using the freely available WinBUGS software (Lunn

cr ip us an M ce pt ed Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

t

et al., 2000) interacted with a function called bugs in the package R2WinBUGS of R.

18

4. AIDS data analysis

t

4.1. Model implementation

cr ip

As shown in Figure 1(b), the histogram of viral load in log10 scale clearly indicates the

asymmetric feature. Thus, it seems plausible to fit models with a skew distribution to the data.

us

specifications for model error in the finite MNLMEJ models for viral load response in

compare their performance:

Model ST: A mixture of NLME models where three mean functions specified by (7)–(9)

M



an

conjunction with the missing data model (11) and the CD4 covariate model (2) were employed to



ce pt ed

with the ST distribution for response model error.

Model SN: A mixture of NLME models where three mean functions specified by (7)–(9)

with the SN distribution for response model error. • Model N: A mixture of NLME models where three mean functions specified by (7)–(9) with the normal (N) distribution for response model error.

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

With this information, the following four statistical models with different distribution

• Model C: A commonly used NLME model where the mean function specified by (9) alone with the ST distribution for response model error. We conducted the following three scenarios. First, because a normal distribution is a special case

of an SN distribution when the skewness parameter is zero, while the ST distribution reduces to

19

the SN distribution when degrees of freedom are large, we investigated how asymmetric distributions for model error (Models ST and SN) contribute to modeling results and parameter estimation in comparison with a symmetric (normal) distribution for model error (Model N).

cr ip

t

Second, we estimated the model parameters by using the “naive” method (denoted by NM), which ignores measurement error in CD4 covariate and missing data in viral load response. That

zij*

in the response model (5) without the missing viral load data model incorporated into

us

values

model fitting. We used it as a comparison to the joint modeling (JM) approach proposed in

an

Section 3. This comparison attempted to investigate how the measurement errors in CD4 and missing data in viral load contribute to modeling results. Finally, we further investigated a

M

commonly used NLME model with ST distribution (Model C) where the mean function is specified by (9) alone. This model has been widely applied to analyze viral load data (Huang et

ce pt ed

al., 2006; Liu and Wu, 2007; Wu and Ding, 1999; Wu, 2002) in significantly advancing the understanding of pathogenesis of HIV infection and the assessment of effectiveness of ARV treatment. We compared this commonly used NLME joint model (ignoring data feature of heterogeneous population) with the MNLMEJ model proposed in this article to explore how heterogeneous feature influences modeling results.

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

is, the “naive” method uses only the observed CD4 values zij rather than unobservable CD4

To carry out the Bayesian inference, we specified the values of the hyper-parameters in the prior

distributions. In the Bayesian analysis, we specified only the priors at the population level. We took weakly-informative prior distributions for the parameters in the MNLMEJ models. In particular, (i) fixed-effects were taken to be independent normal distributions N(0,100) for each

20

2 component of the population parameter vectors α, β and η ; (ii) for the variance parameters σ 1 2 and σ 2 we assume a limiting non-informative inverse gamma prior distribution, IG(0.01, 0.01)

t

so that the distribution has mean 1 and variance 100; (iii) the priors for the variance-covariance

ρ1 )

and

IW(Ω2,

ρ2 )

with

cr ip

matrices of the random-effects Σa and Σb were taken to be inverse Wishart distributions IW(Ω1, covariance

matrices

Ω1 = diag (0.01, 0.01, 0.01) ,

us

ν ~ Exp(0.1) I (ν > 2 ) parameter ν followed truncated exponential distribution ; (v) for the

an

skewness parameter δ, we chose independent normal distributions N(0,100); and finally (vi) we

φ1 = φ2 = φ3 = 1 , where hyper-parameters

M

set hyper-parameters of Dirichlet distribution in (14),

φ1 , φ2 and φ3 are set to be equal because we initially assume individuals have equal probabilities

ce pt ed

of coming from any one of three classes.

The MCMC procedure was implemented using WinBUGS software (Lunn et al., 2000) interacted with a function bugs in the package R2WinBUGS of R, and the program codes for Model ST are available in Appendix C. When the MCMC procedure was applied to the actual clinical data, convergence of the generated samples was assessed using standard tools within WinBUGS software such as trace plots and Gelman-Rubin (GR) diagnostics(Gelman and Rubin,

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

Ω2 = diag (0.01, 0.01, 0.01, 0.01) ρ = ρ 2 = 4 , respectively; (iv) the degrees of freedom and 1

1992). Figure I in Appendix D shows the dynamic version of GR diagnostics based on Model ST as obtained from the WinBUGS output for the representative parameters where the three curves are given: the middle and bottom curves below the dashed horizontal line (indicated by the value ˆ ˆ one) represent the pooled posterior variance (V ) and average within-sample variance (W ) ,

21

ˆ respectively, and the top curve represents their ratio ( R ) (Gelman and Rubin, 1992). It is seen

ˆ ˆ that Rˆ tends to 1, and V and W settle down as the number of iterations increases indicating that

t

the algorithm has reached convergence. With the convergence of diagnostics observed, we

cr ip

proposed that the three chains were run with the following considerations. After an initial

number of 50,000 burn-in iterations, every 50th MCMC sample was retained from the next

us

targeted posterior distributions of the unknown parameters for statistical inference. The computational burden for fitting such mixture of highly nonlinear joint models via MCMC

an

procedure was reasonable. For example, to fit Model ST using JM approach, it took about 7.0

M

hours on a Window PC with Intel Core i7-2600 CPU @ 6.80GHz and RAM of 16 GB. We also check the k-lag serial correlation of the samples for each parameter to diagnose independence of

ce pt ed

MCMC samples. We graphically checked the last 500 samples drawn from MCMC sampling scheme for each parameter (plot not shown here) and found that the consecutive samples move randomly towards different directions, indicating that MCMC is not “sticky” and MCMC samples are independent for each parameter, suggesting the convergence to the stationary distribution.

4.2. Comparison of modeling results

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

50,000 iterations for each chain. Thus, we obtained a total of 3,000 samples for each of the

The Bayesian modeling approach in conjunction with the CD4 covariate model (2) and the

mixture of NLME response model discussed in Section 2.3 with different distribution specifications for the model error was used to fit the viral load and CD4 data simultaneously. Tables 1 and 2 present the population posterior mean (PM), the corresponding standard deviation

22

(SD) and 95% credible interval (CI) for fixed-effects parameters based on the four models with two methods (JM approach and naive method). The following findings are obtained for estimated results of parameters.

cr ip

t

In the response mixture model, the findings, particularly for the fixed-effects ( β 5 , β 6 ) , which

are parameters of the second-phase viral decay rate, show that these estimates are statistically

us

Nevertheless, for the estimate of the coefficient of CD4 covariate β 6, its estimate is significantly positive. This means that CD4 has a significantly positive effect on the second-phase viral decay

an

rate, suggesting that the CD4 covariate may be an important predictor of the second-phase viral decay rate during the treatment. For the fixed-effects (β 2, β 3), which are parameters of the first-

M

phase viral decay rate, show that β 2 is significantly estimated, while the estimate of β 3, where

ce pt ed

the corresponding baseline CD4 covariate is interacted with time t, is not significant, indicating that the baseline CD4 has no effect on the first-phase viral decay rate. In addition, there are 2 marked differences in posterior means of the variance (scale parameter) σ 2 (0.08, 0.16 and 0.31

for Models ST, SN and N, respectively) across the three models. The estimated values of the variance for both Models ST and SN are much smaller than that of Model N because the former models take into account skewness of the data while the latter does not. The estimates of the

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

significant for three Models ST, SN and N since the 95% credible intervals do not contain zero.

skewness parameter ( δ ) of Models ST and SN are, respectively, 0.50 with 95% CI (0.39, 0.58)

and 0.62 with 95% CI (0.51, 0.72). This finding suggests that there is a significantly positive skewness in the data and confirms the fact that the distribution of the original data is skewed even after taking log10-transformation (see Figure 1(b)). Thus, incorporating a skewness

23

parameter in the modeling of the data is recommended. Furthermore, the estimate (0.50) of the skewness parameter based on Model ST is less than that (0.62) based on Model SN. This may be due to the fact that an additional parameter ν (degrees of freedom) for heaviness in the tails was

cr ip

t

estimated to be 3.77 trading-off the effect of skewness. For parameter estimates of the CD4 covariate model (2), the estimates of the coefficient α2 based on Models ST, SN and N are

us

2 negative. The estimate (0.73) of the variance σ 1 are equivalent for the three models.

an

We have seen from model fitting results that, in general, all of Models ST, SN and N provided reasonably good fit to the observed data for most patients in our study, although the fitting for a

M

few patients was not completely satisfactory due to unusual viral load fluctuation patterns for these patients, particularly for Model N. Figure 2 displays estimated viral load trajectories based

ce pt ed

on the three models for the six representative subjects shown in Figure 1(a). The estimated individual trajectories based on Models ST and SN fit the observed viral load values more closely than those based on Model N. Note that the lack of smoothness in Models ST and SN estimates of individual trajectories may be explained by the fact that a random component wi was incorporated in the expected functions related to fitted values (see model (17) for details) based on the property of the skewed distributions for “chasing the data” to the extent.

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

slightly different but significantly positive, whereas the estimates of α1 and α3 are significantly

To assess the goodness-of-fit for the proposed statistical models, Figure II in Appendix D

presents the diagnosis plots of the observed values versus the fitted values (left panel), and ST, SN, and normal Q-Q plots of different distributions (right panel) based on Models ST, SN and N. It can be seen from the left panel in Figure II that Models ST and SN gave a similar performance,

24

but provided better fit to observed data compared with Model N. This finding was also confirmed by the residual Q-Q plots (right panel) that all show the existence of outliers, but Models ST and SN clearly have fewer outliers and, thus, give better fit to the data than Model N.

cr ip

t

Further, Model ST did an even better job than Model SN in accounting for skewness. This finding is confirmed by their sums of squared residuals (SSR), which are 439.2 (Model ST),

us

To select the best model that fits the data adequately, a Bayesian selection criterion, known as deviance information criterion (DIC) suggested by Spiegelhalter et al. (2002), is used. As with

an

other model selection criteria, we caution that DIC is not intended for identification of the “correct” model, but rather merely as a method of comparing a collection of alternative

{∑

i, j

(y

rep ,ij

− yobs ,ij )

2

} for model comparison, where the predictive value y

ce pt ed

by

EPD = E

M

formulations. As an alternative, we also evaluate expected predictive deviance (EPD) formulated

rep,ij

is a

replicate of the observed yobs,ij and the expectation is taken over the posterior distribution of the model parameters θ (see Gelman et al. (2003) in detail). This criterion chooses the model where the discrepancy between predictive values and observed values is the lowest. For mixture models, the structure of DIC does not allow for automatic computation in WinBUGS software. We wrote R code interacted with WinBUGS program to calculate estimated DICs using the joint

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

458.7 (Model SN), and 875.0 (Model N), respectively.

modeling approach based on Models ST, SN and N, which are 11368.9, 11572.8 and 12699.1, respectively. As mentioned previously, it is difficult to tell which model is “correct”, only which one fits the data better. Therefore, based on the DIC, the results indicate that Model ST is the best fitting model and Model SN is next, supporting the contention of a departure from

25

normality. This finding is confirmed by the results of the EPD values (see Table 2). These results are also consistent with those in diagnosis of the goodness-of-fit displayed in Figure II in Appendix D, indicating that Model ST performs best. In summary, our results may suggest that it

cr ip

t

is important to assume a skew distribution in the mixture models for the viral load response in order to achieve reliable results, in particular if the data exhibit skewness. Based on these

us

4.3. Analysis results based on Model ST

an

As mentioned in Section 1, one of the primary objectives in this analysis was to cluster all individuals’ membership into 3 classes based on viral load trajectories. Based on the mixture

M

modeling, we are able to obtain a summary of class membership at both the population and individual levels. At population level, the MCMC procedure yields samples from the posterior (π 1 ,… , π K ) K = 3 , in (16), the population proportion of individuals in each class.

ce pt ed

distribution of

The estimates of population proportion and associated 95% CI of ( π 1 , π 2 , π 3 ) for three classes are 22.16% (20.06%, 24.39%), 41.30% (38.52%, 44.16%) and 36.54% (33.71%, 39.47%), respectively. It can be seen that class 2 (decrease and maintain stable) has the largest proportion 41.30%, followed by class 3 (decrease and rebound later) with proportion 36.54% and then class 1 (decrease all the time) with the lowest proportion 22.16%. Thus, out of 379 patients, the

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

observations, we will further report our results in detail only for the best Model ST below.

patterns of changing viral load of 84, 157 and 138 patients followed classes 1, 2 and 3, respectively. This indicates that a confirmed short-term (class 1 due to early dropout) and longterm (class 2) virologic responses were observed in a total of 63.46% of the patients in classes 1 and 2.

26

At individual level, the posterior probability of individual i belonging to the k th ( k = 1, 2, 3)

class,

pik = E [ I (ci = k )]

1 , can be approximated by M



M m =1

I (ci( m ) = k )

(m) , where ci is class

t

membership of individual i drawn from the posterior distribution (15) in the mth MCMC

cr ip

iteration ( m = 1,… , M ) and M is a total number of posterior sample iterations (M = 3000 in this

study). Barplot shown in Figure 3 displays the probabilities for the selected 20 individuals (from

us

classified as either viral load rebound or not may help physicians to refine treatment strategy and

an

to identify the reason of viral load rebound for such individual patient. Table 3 shows individual posterior probabilities for the six representative patients shown in Figure 1(a). The probabilities

M

shown in Table 3 and the classes of trajectories shown in Figure 1(a) are matched quite well: The patients 31 and 105 belong to class 1 because their viral loads decrease constantly in a early

ce pt ed

short-term period, with probabilities 81% and 57%; the viral loads of the patients 29 and 132 decrease and then maintain stable, and thus, they belong to class 2, with probabilities 92% and 87%; and finally, the patients 33 and 99 are in class 3 (with viral load rebound), with probabilities 87% and 100%.

The estimated results of fixed-effects presented in Table 1 based on Model I indicate that the first-phase decay rate, and the second-phase decay rate without and with time-varying CD4

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

the 21st to the 40th subjects). The probability corresponding to individual patient who is

covariate

may

be

approximated

by

λˆ1 = 116.5 + 4.62z0 ,

λˆ2 = −4.35

and

λˆ2* (t ) = −4.35 + 9.77(−0.21 + 1.42t − 1.46t 2 ) , respectively, where z is the baseline CD4 value. 0 Thus, the population viral load processes of 3 classes may be approximated by

27

{

},

{

}

Vˆ1 (t ) = exp 10.5 − λˆ1t

{

}

{

Vˆ2 (t ) = exp 10.5 − λˆ1t + exp 6.8 − λˆ2t

{

Vˆ3 (t ) = exp 10.5 − λˆ1t + exp 6.8 − λˆ2* (t )t

}.

}

and

Since the first-phase viral decay rate, λ1 , is not

t

significantly associated with the baseline CD4 (due to insignificant estimate of β3 ) and the

cr ip

* second-phase viral decay rate λ2 (t ) in component 3 is significantly associated with the time-

us

V(t) may be significantly associated with time-varying CD4 values, but not associated with the baseline CD4. This simple approximation considered here may provide a rough guidance and

an

point to further research even though the true association described above may be more complicated. The analysis results suggest that for patients with viral load rebound, the CD4

M

process at time t has a significantly positive effect on the second-phase viral decay rate; this finding confirms that the CD4 covariate is a more significant predictor on viral decay rate during

ce pt ed

late stage. More rapid decrease in CD4 cell count may be associated with slower viral decay in late stage, which, in turn, may result in a viral load rebound. In order to investigate how the measurement errors in CD4 and missing data in viral load contribute to modeling results, we further compare two methods for estimation based on Model ST: the proposed joint modeling approach (JM) and the ‘naive’ method (NM) where the raw

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

varying CD4 values (due to significant estimate of β 6 ), this suggests that the viral load change

(observed) CD4 values

zij

, rather than the true (unobservable) CD4 values

zij*

, are substituted in

the mixture model and missing data in viral load are ignored. It can be seen from Tables 1 and 2 that there are important differences in the estimates for the parameters β5 and β6, which are directly associated with whether or not ignoring potential CD4 measurement errors for inference.

28

The NM may produce biased estimates and substantially underestimate the covariate CD4 effect

(NM : βˆ6 = 0.02 vs. JM : βˆ6 = 9.77) . The estimated SD for the CD4 effect ( β ) using JM is at 6 least ten times larger than that using NM, probably because JM incorporates the variation from

cr ip

t

fitting the CD4 process. The difference of estimates between JM and NM, due to whether or not potential CD4 measurement errors and missing data in viral load are ignored for inference,

us

analysis. We also obtained estimated DIC values with 11368.9 (JM) and 12017.4 (NM) based on Model I, respectively, suggesting that JM performs better in comparison with NM. Thus, it is

an

important to take the CD4 measurement errors and the missing data in viral load into account

M

when collected data are “imperfectly” measured.

We further investigated a commonly used NLME model with ST distribution (Model C, ignoring

ce pt ed

data feature of heterogeneous population), where the mean function is specified by (9) alone, and compared it with the mixture of NLME model (Model ST) to explore how heterogeneous feature influences modeling results. We found that the important differences in the estimates for the parameters ( β 2 , β 3 ), which are associated with the first-phase viral decay rate, and for the parameters ( β 5 , β 6 ), which are associated with the second-phase viral decay rate. Based on the estimates of β 3 (Model ST:

βˆ3 = 4.62 vs. Model C: βˆ3 = −6.83 ), which is directly associated

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

indicates that CD4 measurement errors and missing data in viral load cannot be ignored in the

with effect of baseline CD4 covariate, and β6 (Model ST:

βˆ6 = 9.77 vs. Model C: βˆ6 = −13.5 ),

which is directly associated with effect of time-varying CD4 covariate, Models ST and C provide completely opposite relation between viral decay rates and covariates.

29

The basic tool for investigating model uncertainty is sensitivity analysis. That is, we simply make reasonable modifications to the assumptions in question, recompute the posterior quantities of interest, and see whether they have changed in a way that significantly affects the

cr ip

t

interpretations or conclusions. If the results are robust against the suspected assumptions, we can report the results with confidence. However, if the results are sensitive to the assumptions, we can choose to communicate the sensitivity results and interpret the results with caution (Gelman

us

parameter estimates on the prior distributions and missing data models. First, we implemented

an

the MCMC sampling scheme and monitored several independent MCMC runs, starting from different super-parameter values in IG (⋅, ⋅) prior distribution based on Model ST. The results

M

follow the same patterns as those presented in this paper. The conclusions of our analysis remain unchanged (data not shown due to space limitations). Second, it is important to check sensitivity

ce pt ed

of parameter estimates to various plausible missing data models. Subject-area knowledge may help us to determine alternative missing data models. It is conceivable that missing values may be related to current and previous viral load and CD4 measurements and/or missingness may link to the random-effects that characterize the true response processes in practice. Note that we should avoid building a too complicated missing data model since the parameters may become non-identifiable. Thus, we consider the mixture joint model with the ST distribution for viral

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

et al., 2003). Thus, we performed the sensitivity analysis to examine the dependence of

load response yij and CD4 covariate zij in conjunction with the following plausible missing data

models (MDM) for sensitivity analysis. • MDM I: logit [ P ( rij = 1| η )] = η0 + η1 zij + η 2 yij ;

30

• MDM II: logit [ P ( rij = 1| η )] = η0 + η1ri , j −1 + η 2 ai1 + η3 ai 2 + η 4bi 2 + η5bi 4 ; • MDM III: logit [ P ( rij = 1| η )] = η0 + η1 yi , j −1 + η 2 zij + η3 yij .

cr ip

t

MDM I is one in (11) discussed in Section 2.3. MDM II is given by inducing combination of the previous missing status and random-effects from the mixture joint models. Note that the random-

us

intercept and slope), so they are included; although ai3 in the MLE model (2) quantifies variation of individual CD4 trajectories through a quadratic form, which seems not highly predictive for

an

missing value, it is excluded here to reduce number of parameters. The random-effects bi2 and bi4 represent individual variations in the first- and second-phase viral decay rates, respectively, so

M

they may be related to missing values. While bi1 and bi3 represent variations in the baseline viral loads, they may not appear to be highly related to missing data, so they are excluded from the

ce pt ed

model to reduce the number of parameters. MDM III incorporates the previous and current viral load measurements being missing as well as current CD4 value; By comparing the results based on the MDMs I, II and III presented in Tables 1 and 2, it is seen that all of three models have very similar posterior estimates of population parameters, suggesting that the estimation of population parameters is robust against the MDMs and, thus,

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

effects ai1 and ai2 capture the main features of individual CD4 trajectories (such as initial

the parameter estimates are reliable. In addition, we found that adding previous viral load missingness (MDM III) does not improve the DIC over MDM I; this may be explained by the fact that the current observations may be most predictive for missing data since measurements are relatively sparse in this study and early observed values may not have much influence on patients’ missing values. MDM II (DIC=11437.5), which is induced by combination of the

31

previous missing status and random-effects, gives largest DIC value in comparison with MDM I (DIC=11368.9) and MDM III (DIC=11388.3). These findings are also confirmed by the results of the EPD and SSR values.

cr ip

t

5. Concluding discussion

For longitudinal biomarkers, viral load and CD4 count, that have been used as outcome measures

us

of HIV infection and to assess risk of disease progression to AIDS, we have developed a

an

Bayesian approach to the MNLMEJ models that use longitudinal viral load data with features of heterogeneous population, skewness and missingness in the presence of mismeasured CD4

M

covariate to provide estimates of both model parameters and class membership probabilities at individual and population levels. This kind of mixture modeling approach is important in many

ce pt ed

biostatistical application areas, allowing accurate inference of parameters while adjusting for heterogeneity, non-normality and incompleteness in the longitudinal data. Although this paper is motivated by AIDS clinical study, the basic concepts of the newly-developed MNLMEJ models have generally broader applications whenever the relevant technical specifications are met and longitudinal measurements are assumed to arise from two or more identifiable subclasses within a population.

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

to evaluate treatment effects of ARV therapies in AIDS clinical trials, to understand pathogenesis

This article have considered two mixture modeling methods (NM and JM) to compare potential models with different distribution specifications and various scenarios. In particular, (i) we investigated how asymmetric distributions for model error (Models ST and SN) contribute to modeling results and parameter estimation in comparison with a symmetric (normal) distribution

32

for model error (Model N). Note that the results from symmetric t model are not presented in this article because the findings are similar to those from Model N. (ii) We compared NM with JM to investigate how the measurement errors in CD4 and missing data in viral load contribute to

cr ip

t

modeling results. (iii) Under the assumption of ST distribution for model error, we further explored how heterogeneous feature influences modeling results by comparing Model ST with a commonly used NLME model (Model C), in which the data feature of heterogeneous population

us

the proposed mixture joint models and methods may have a significant impact on HIV/AIDS

an

research and ARV treatments to AIDS patients and help improve understanding of the pathogenesis of HIV infection and evaluation of ARV treatments.

M

One advantage of mixture modeling approach-based MNLMEJ models proposed in this article is

ce pt ed

its flexibility to study simultaneous impact of various data characteristics (heterogeneity, nonlinearity, skewness and incompleteness). Another advantage of our mixture modeling is not only to provide model parameter estimates, but also model-based probabilistic clustering to obtain class membership probabilities. In the data analysis presented in this paper, the probabilities of belonging to any one of three classes can be obtained at both population and individual levels. This information may help physicians refine general therapeutic strategy and develop individualized treatment. In summary, for the data analysis to demonstrate the proposed the

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

is ignored. The important findings for these scenarios are detailed in Section 4.3. In summary,

MNLMEJ models and associated methodologies, the results indicate that for reliable estimation of HIV dynamic parameters and class membership probabilities, we should simultaneously address asymmetry, heterogeneity, missingness and measurement error in longitudinal data. We conclude that it is important to take the CD4 measurement errors and viral load missing data into

33

account, indicating that JM outperforms NM; it is preferable to offer mixture models with a skew distribution, in particular, when the data exhibit skewness; and it is critical to account for heterogeneous population feature for inference if longitudinal measurements arise from two or

cr ip

t

more identifiable subclasses within a population. For inference of mixture models, parameter (or model) identifiability can be an important but

g k ( ⋅)( j = 1,…, K )

us

As discussed by Pauler and Laird (2000), the mixture components,

, in the

an

finite mixture model may have the same family of densities but differ only in specific values of parameters such as in their means and/or variances, or have completely different functional

M

forms with parameters of different dimensions and meanings across the submodels, which is the case adopted in this paper (see equations (7)–(9) in detail). These mean functions have been well

ce pt ed

developed and have biologically meaningful interpretation based on HIV dynamics (Nowak and May, 2000; Perelson et al., 1997; Wu and Ding, 1999). It is noted that we must ensure each component model to be identifiable to make whole mixture model identifiable. To make (8) and (9) identifiable, we assume

* λ1i > λ2i in (8) and λ1i > λ2ij in (9) as discussed in Appendix B. In

practice, if the models are not identifiable, the MCMC algorithm would diverge quickly. In the application considered in this article, the MCMC algorithm was converged without problems and

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

difficult problem when a large number of model parameters must be estimated simultaneously.

we did not observe potential identifiability problems. Instead of specifying different mean functions for classes, alternatively, we may also specify an universal mean function for all classes, e.g. the mean function (9), and let data themselves determine the number of clusters and/or shapes of trajectories. In that way the identifiability issue may arise. Because of

34

complexity of longitudinal studies, an advantage of choosing each component with different functional forms

gk

,

( j = 1,… , K )

rather than with the same general family in the finite mixture

t

models is that the non-identifiability can be avoided (Pauler and Laird, 2000).

cr ip

The two issues related to this study are noted as follows. First, in this article, we imputed the censored value by a limit of detection (LOD) was imputed by half of LOD for simplification, but

us

multiple imputation. This complicated problem is beyond the focus of this article, but a further

an

study may be warranted. Second, it is our purpose that this article is to demonstrate the proposed mixture joint models and methods with various scenarios for real data analysis in comparison of

M

symmetric distributions with asymmetric distributions for model error, although a limited simulation study may be conducted to evaluate the performance from different model

ce pt ed

specifications and the corresponding methods. However, since this paper investigated many different scenarios-based models and methods with real data analysis, the complex natures considered in this article, especially, skew distributions involved, will pose some challenges for such simulation studies which requires additional efforts. We are actively investigating these important research problems and hope that the relative results could be reported in the near future.

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

is not intended to consider data with LOD problems using more advanced techniques such as

Acknowledgements The authors would like to gratefully acknowledge the Editor and two anonymous referees for their insightful comments and constructive suggestions that led to a marked improvement of the article.

35

References Arellano-Valle RB, Genton, M. (2005). On fundamental skew distributions. Journal of

t

Multivariate Analysis 96: 93–116.

mixed models. Journal of Applied Statistics 34: 663–682.

cr ip

Arellano-Valle RB, Bolfarine H, Lachos, VH. (2007). Bayesian inference for skew-normal linear

us

distribution. Journal of Royal Statistical Society, Series B 61: 579–602.

an

Azzalini A. and Capitanio A. (2003). Distributions generated by perturbation of symmetry with

367–389.

M

emphasis on a multivariate skew-t distributions. Journal of Royal Statistical Society, Series B 65:

Azzalini A, Genton M. (2008). Robust likelihood methods based on the skew-t and related

ce pt ed

distributions. International Statistical Review 76: 106–129. Carroll, R.J., Ruppert, D., Stefanski, L.A., Crainiceanu, C.M. (2006). Measurement Error in Nonlinear Models: A Modern Perspective, 2nd edition. London: Chapman and Hall. Diebolt, J., Robert, C. (1994). Estimation of finite mixture distributions by Bayesian sampling. J. Royal Statist. Soc. Series B 56, 363–375.

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

Azzalini A, Capitanio A. (1999). Statistical applications of the multivariate skew normal

Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B. (2003). Bayesian Data Analysis. London:

Chapman and Hall. Gelman, A., Rubin, D. (1992). Inference from iterative simulation using multiple sequences. Statistical Science 7, 457–511.

36

Hammer S.M., et al. (2002). Dual vs single protease inhibitor therapy following antiretroviral treatment failure: a randomized trial. The Journal of the American Medical Association 288, 169–180.

cr ip

t

Higgins, M., Davidian, M. and Gilitinan, D. M. (2007). A two-step approach to measurement

error in time-dependent covariates in nonlinear mixed-effects models, with application to IGF-I

us

Ho HJ, Lin TI. (2010). Robust linear mixed models using the skew-t distribution with application

an

to schizophrenia data. Biometrical Journal 52, 449–469.

Huang, Y., Liu, D., Wu, H. (2006). Hierarchical Bayesian methods for estimation of parameters

M

in a longitudinal HIV dynamic system. Biometrics 62, 413–423.

Huang, Y., and Dagne, G. (2011). A Bayesian approach to joint mixed-effects models with a

ce pt ed

skew-normal distribution and measurement errors in covariates. Biometrics 67, 260–269. Jara A, Quintana F, Martin ES. (2008). Linear mixed models with skew-elliptical distributions: A Bayesian approach. Computational Statistics and Data Analysis 52, 5033–5045. Lavine, M., West, M. (1992). A Bayesian method for classification and discrimination. Canadian Journal of Statisttics 20, 451–461.

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

pharmacokinetics. Journal of the American Statistical association 92, 436–448.

Liu, W., and Wu, L. (2007). Simultaneous inference for semiparametric nonlinear mixed-effects

models with covariate measurement errors and missing responses. Biometrics 63, 342–350. Lu X. and Huang, Y. (2014). Bayesian analysis of nonlinear mixed-effects mixture models for longitudinal data with heterogeneity and skewness. Statistics in Medicine 33, 2830–2849.

37

Lunn, D. J., Thomas, A., Best, N., Spiegelhalter, D. (2000). WinBUGS - a Bayesian modelling framework: concepts, structure, and extensibility. Statistics and Computing 10, 325–337. Muthen, B., Shedden, K. (1999). Finite mixture modeling with mixture outcomes using the EM

cr ip

t

algorithm. Biometrics 55, 463–469.

Nowak, M.A., May, R.M., (2000). Virus Dynamics: Mathematical Principles of Immunology and

us

Pauler, D., Laird, N.M. (2000). A mixture model for longitudinal data with application to

an

assessment of noncompliance. Biometrics 56, 464–472.

Perelson, A.S., Essunger, P., Cao, Y., Vesanen, M., Hurley, A., Saksela, K., Markowitz, M., Ho,

Nature 387, 188–191.

M

D.D. (1997). Decay characteristics of HIV-1-infected compartments during combination therapy.

ce pt ed

Sahu S.K., Dey D.K., Branco M.D. (2003). A new class of multivariate skew distributions with applications to Bayesian regression models. The Canadian Journal of Statistics 31, 129–150. Spiegelhalter, D. J., Best, N. G., Carlin, B. P., and Van der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society, Series B 64, 583–639. Titterington, D.M., Smith, A.F.M., Makov, U.E. (1985). Statistical Analysis of Finite Mixture

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

Virology. Oxford: Oxford University Press.

Distributions. Chichester, U.K.: John Wiley and Sons. Wu, H., Ding, A.A. (1999). Population HIV-1 dynamics in vivo: Applicable models and inferential tools for virological data from AIDS clinical trials. Biometrics 55, 410–418.

38

Wu, L. (2002). A joint model for nonlinear mixed-effects models with censoring and covariates measured with error, with application to AIDS studies. Journal of the American Statistical Association 97, 955–964.

cr ip

t

Wu, H., Zhang, J.T. (2006). Nonparametric Regression Methods for Longitudinal Data Analysis.

us an M ce pt ed Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

Hoboken, New Jersey: Wiley.

39

Model

I

JM

ST

βl

β2

β3

β4

cr ip

Method

β5

β6

αl

α2

α3

us

MDM

an

PM 10.5 116.5 4.62 6.80 −4.35 9.77 −0.21 1.42 −1.46

M

LCI 10.3 108.0 −1.76 6.48 −5.38 6.86 −0.28 0.81 −2.34

ce pt ed

UCI 10.7 125.7 11.4 7.13 −3.35 12.9 −0.14 2.01 −0.57

SD 0.10 4.542 3.37 0.16 0.53 1.58 0.04 0.31 0.45

I

JM

SN

PM 10.4 107.1 5.38 6.55 −5.60 10.1 −0.20 1.33 −1.28

LCI 10.2 98.31 −1.12 6.21 −6.70 7.15 −0.28 0.72 −2.13

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

t

Table 1: Summary of estimated posterior mean (PM) of population (fixed-effects) parameters, the corresponding standard deviation (SD), and lower limit (LCI) and upper limit (UCI) of 95% equal-tail credible interval (CI) based on “naive” method (NM) and joint modeling approach (JM) for a finite mixture of NLME joint models (Models ST, SN and N) and a commonly used NLME joint model (Model C) in conjunction with non-ignorable missing data models (MDM) and mismeasured covariate model (2).

UCI 10.6 116.5 11.9 6.87 −4.47 13.3 −0.13 1.94 −0.40

SD 0.10 4.602 3.27 0.16 0.56 1.60 0.04 0.32 0.46

40

I

JM

N

PM 10.4 106.2 3.78 6.57 −5.49 10.0 −0.21 1.41 −1.38

cr ip

t

LCI 10.2 97.97 −2.37 6.24 −6.68 7.32 −0.28 0.89 −2.21

UCI 10.6 114.8 9.86 6.90 −4.46 13.8 −0.14 1.96 −0.63

C

us

JM

PM 10.6 134.1 −6.83 7.43 −0.90 −13.5 −0.21 1.18 −0.64

an

I

M

LCI 10.4 124.2 −14.7 7.14 −2.35 −17.0 −0.26 0.88 −0.90

ce pt ed

UCI 10.8 143.9 −1.09 7.70 0.51 −10.2 −0.14 1.43 −0.37

SD 0.10 4.959 4.03 0.14 0.71 1.81 0.03 0.13 0.12

I

NM

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

SD 0.09 4.320 3.13 0.17 0.56 1.64 0.04 0.27 0.39

ST

PM 10.5 112.7 4.61 6.87 −3.96 0.02 –





LCI 10.3 104.3 −2.15 6.56 −4.95 −0.25 –





UCI 10.7 121.5 11.5 7.19 −3.02 0.29 –





41

SD 0.11 4.402 3.46 0.16 0.49 0.14 –

JM

ST



PM 10.5 120.3 4.64 6.94 −3.71 9.76 −0.20 1.19 −1.02

cr ip

t

II



LCI 10.3 109.0 −1.71 6.52 −5.38 6.76 −0.28 0.67 −2.17

us

JM

ST

PM 10.5 120.1 4.44 6.95 −3.67 9.69 −0.20 1.17 −0.98

M

III

an

SD 0.11 6.593 4.16 0.25 1.03 1.81 0.04 0.40 0.68

ce pt ed

LCI 10.3 108.7 −1.69 6.51 −5.25 6.90 −0.27 0.65 −2.30

UCI 10.7 133.6 10.1 7.45 −1.77 12.3 −0.12 1.98 −0.27

SD 0.11 6.704 4.72 0.26 0.94 1.56 0.04 0.33 0.57

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

UCI 10.8 134.2 10.3 7.43 −1.51 13.5 −0.13 1.98 −0.22

42

JM

ST

σ 22

δ

PM

0.73

0.08

0.50

LCI

0.68

I

0.78

ce pt ed

UCI

JM

SN

ν

cr ip

I

σ 12

DIC

EPD

SSR

11368.9

0.310

439.2

11572.8

0.323

458.7

us

Model

3.77

an

Method

0.06

0.39

M

MDM

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

t

Table 2: Summary of estimated posterior mean (PM) for parameters of scale, skewness, and degree of freedom, and corresponding standard deviation (SD) and lower limit (LCI) and upper limit (UCI) of 95% equal-tail credible interval (CI) as well as DIC, EPD and SSR values based on “naive” method (NM) and joint modeling approach (JM) for a finite mixture of NLME joint models (Models ST, SN and N) and a commonly used NLME joint model (Model C) in conjunction with non-ignorable missing data models (MDM) and mismeasured covariate model (2).

3.11

0.12

0.58

4.64

SD

0.02

0.02

0.05

0.39

PM

0.73

0.16

0.62



LCI

0.68

0.12

0.51



UCI

0.78

0.21

0.72



43

JM

N

0.03

0.02

0.05



PM

0.73

0.31





LCI

0.68

0.29





UCI

0.78

0.33





SD

0.02

0.01



PM

0.78

12699.1

0.617

875.0

C

I

0.73

ce pt ed

LCI

NM

ST

cr ip

us –

an

JM

0.07

0.43

M

I

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

t

I

SD

3.29

0.05

0.34

3.01

UCI

0.83

0.10

0.51

3.85

SD

0.02

0.02

0.04

0.23

PM



0.09

0.50

3.75

LCI



0.07

0.39

3.10

44

13767.0

0.320

462.4

12017.4

0.353

499.1



0.13

0.59

4.66

SD



0.02

0.05

0.40

PM

0.75

0.08

0.49

3.65

LCI

0.69

0.05

0.38

3.04

UCI

0.81

0.11

0.58

SD

0.03

ST

PM

0.323

449.2

0.318

443.2

us

11437.5

4.60

an 0.02

0.05

M

JM

ST

0.75

ce pt ed

III

JM

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

II

cr ip

t

UCI

0.42

0.08

0.49

3.64

LCI

0.69

0.05

0.38

3.04

UCI

0.81

0.11

0.58

4.62

SD

0.03

0.01

0.05

0.41

45

11388.3

pi1

pi2

1

31

81%

17%

1

105

57%

2

29

2

132

3

us

an

38%

2%

5%

8%

0%

87 %

13%

33

0%

13%

87%

99

0%

0%

100%

M

92 %

ce pt ed

3

0%

pi3

t

Patient ID

cr ip

Class

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

Table 3: Individual posterior probabilities of belonging to 3 trajectory classes for six representative patients.

46

cr ip us an M ce pt ed Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

t

Figure 1: (a) Profile of viral load (response) in log10 scale for six representative patients. Trajectory class 1: decrease rapidly and constantly in a short-term period (solid lines, ID: 31, 105); class 2: decrease at the beginning and then maintain stable at a low level (dashed lines, ID: 29, 132) and class 3: decrease at the beginning, but rebound later (dotted line, ID: 33, 99). (b) Histogram of repeated measurements of viral load in log10 scale from an AIDS clinical trial study.

47

cr ip us an M ce pt ed Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

t

Figure 2: Individual fitted curves of viral load for six representative patients based on Model ST (dashed line), SN (dotted line) and N (solid line). Patients 31 and 105 are from class 1 with probabilities 91% and 58%; patients 29 and 132 belong to class 2 with probabilities 92% and 87%; and patients 33 and 99 are from class 3 with probabilities 88% and 100%. The observed values are indicated by circle “o”.

48

cr ip us an M ce pt ed Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

t

Figure 3: Posterior probabilities of belonging to 3 trajectory classes for the 20 individuals (from the 21st to the 40th patients).

49

Supplementary Materials: Appendices

t

Appendix A. Multivariate skew distributions

cr ip

Different versions of the multivariate skew distributions have been proposed and used in the

literature (Arellano-Valle and Genton, 2005; Arellano-Valle et al., 2007; Azzalini and Capitanio,

us

2008; Sahu et al., 2003). A new class of distributions by introducing skewness in multivariate

an

elliptically distributions were developed Arellano-Valle and Genton (2005); Sahu et al. (2003). The class, which is obtained by using transformation and conditioning, contains many standard

M

families including the multivariate skew-normal (SN) and skew-i (ST) distributions as special cases. A k-dimensional random vector Y follows a k-variate skew-elliptical (SE) distribution if its

ce pt ed

probability density function (pdf) is given by

(

f y µ , Σ, ∆; mν(

k)

) = 2 f ( y µ, A; m ) P (V > 0) , (k )

k

ν

2 where A = Σ + ∆ , µ is a location parameter vector, Σ is a k × k positive (diagonal) covariance

matrix,

∆ = diag (δ1 , δ 2 ,…, δ k )

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

1999, 2003; Azzalini and Genton, 2008; Ho and Lin, 2010; Huang and Dagne, 2011; J ara eta al.,

δ = ( δ 1 , δ 2 , … , δ k ) ;V T

follows the elliptical distribution mν(

the density generator function such that

is a k × k skewness matrix with the skewness parameter vector

f 0∞ r k 2−1mν ( u ) dr

k)

(u ) =

Γ ( k 2)

π

k 2

(

El ∆ A−1 ( y − µ ) , I k − ∆A−1∆ ; mν(

mν ( u ) f r mν ( u ) dr ∞ k 2 −1 0

exists. The function

50

mν ( u )

, with

mν ( u )

k)

) and

being a function

provides the kernel of the original

elliptical density and may depend on the parameterν . This SE distribution is denoted by

(

SE µ , Σ, ∆ ; mν(

k)

) . Two examples of m ( u ) , leading to important special cases used throughout mν ( u ) = exp ( − u 2 )

and

mν ( u ) = (1 + u ν )

− (ν + k ) 2

, where ν > 0 . These two

t

the paper, are

ν

cr ip

expressions lead to the multivariate SN and ST distributions, respectively. In the latter case, ν

us

As we know, a normal distribution is a special case of an SN distribution when the skewness parameter is zero, and the ST distribution reduces to the SN distribution when degrees of

an

freedom are large. For completeness, this Appendix briefly summarizes the multivariate ST distribution introduced by Sahu et al. (2003) to be suitable for a Bayesian inference since it is

M

built using the conditional method; see Azzalini and Capitanio (2003) and Sahu et al. (2003) for detailed discussion on properties of ST distributions. Assume that a k-dimensional random vector

ce pt ed

Y follows a k-variate ST distribution with location vector µ , k × k positive (diagonal)

∆ = diag (δ1 , δ 2 ,…, δ k ) and the degrees of freedom covariance matrix Σ, k × k skewness matrix

ν . Then, its probability density function (pdf) is given by f ( y µ , Σ, ∆ ,ν ) = 2 k tk ,ν ( y µ , A ) P (V > 0 ) , (A.2)

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

corresponds to the degrees of freedom parameter.

we denote the k-variate t distribution with parameters µ , A and degrees of freedom ν by

tk ,ν ( µ , A )

tk ,ν + k

and the corresponding pdf by

. We denote this distribution by

tk ,ν ( y µ , A )

STk ,ν ( µ , Σ, ∆ )

equation (A.2) simplifies to

51

henceforth, V follows the t distribution

2 ∆ = δ Ik . In particular, when Σ = σ I k and ,

)

where

−1 2  T 2 2 −1  y y + + − − µ µ ν σ δ ( ) ( ) ) δ ( y − µ)   (   2 2 ν +k  σ σ + δ  

Tk ,ν + k ( ⋅)

 ,  

−(ν + k ) 2

cr ip

×Tk ,ν + k

)

T Γ ( (ν + k ) 2 )  ( y − µ ) ( y − µ )  1+  k 2  ν (σ 2 + δ 2 )  Γ (ν 2 )(νπ ) 

t

(

f y µ , σ 2 , δ ,ν = 2k (σ 2 + δ

2 −k 2

denotes the cumulative distribution function (cdf) of

tk ,ν +k ( 0, I k )

. However,

us

densities. Here Y is dependent but uncorrelated. It can be shown that the mean and covariance STk ,ν ( µ , σ 2 I k , ∆ )

are given by

an

matrix of the ST distribution

E (Y ) = µ + (ν π )

Γ ( (ν − 1) 2 )

M

12

Γ (ν 2 )

ν  Γ {(ν − 1) 2}  2 −  cov (Y ) = σ I k + ∆   ∆ . ν − 2 π  Γ (ν 2 )  In

ν

ce pt ed 2

order

to

2

have

µ = − (ν π )

12

parameter

a

zero

Γ ( (ν − 1) 2 ) Γ (ν 2 )

mean

δ,

2

vector,

(A.3)

we

should

location

STk ,v ( µ , Σ, ∆ )

, it can be

.

Following Lemma 1 of Azzalini and Capitanio (2003), if Y follows represented by

Y = µ + u −1/ 2 X

the

assume

δ

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

unlike the SN distribution, the ST density can not be written as the product of univariate ST

(A.4)

52

where u follows a Gamma distribution

Γ (ν / 2,ν / 2 )

a k-dimensional SN distribution, denoted by

, which is independent of X , and X follows

SN k (0, Σ, ∆ ) . It follows from (A.4) that

t

Y | u ~ SN k ( µ , u −1 ∑ , u −1/ 2 ∆ ) . Following the results, provided by Arellano-Valle et al. (2007) and

(A.5)

where X0 and X1 are two independent

N k (0, I k )

random vectors. Note that the expression (A.5)

an

Y = µ + u −1/2 ∆ X 0 + u −1/ 2 ∑1/ 2 X1 ,

Let

w = u −1/ 2 X 0

M

provides a convenient device for random number generation and for purpose of implementation. ; then w, conditional on u, follows a k-dimensional normal distribution

ce pt ed

N k (0, u −1 I k ) truncated in the space w > 0 (i.e., the half-normal distribution). Thus, following

studies by Sahu et al. (2003), Ho and Lin (2010) and Jara eta al. (2008), a hierarchical representation based on (A.5) is given by

Y | w , u ~ N k ( µ + ∆w , u −1 ∑ ), w | u ~ N k (0, u −1 I K ) I ( w > 0), u ~ Γ (v / 2, v / 2),

(A.6)

Note that the ST distribution presented in (A.6) can be reverted to the following three special

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

stochastic representation as follows.

us

cr ip

Arellano-Valle and Genton (2005), the SN distribution of Y conditional on u has a convenient

cases: (i) as v → ∞ and u → 1 with probability 1 (i.e., the last distributional specification is omitted), then the hierarchical expression (A.6) becomes an SN distribution

SN k ( µ , Σ, ∆ )

; (ii) as

∆ = 0 , then the hierarchical expression (A.6) becomes a multivariate t-distribution; (iii)

53

as v → ∞ , u → 1 with probability 1 and ∆ = 0 , then the hierarchical expression (A.6) is reverted to a multivariate normal distribution.

cr ip

t

Appendix B. HIV dynamic models Viral dynamic models can be formulated through a system of ordinary differential equations (ODE) (Huang et al., 2006; Nowak and May, 2000; Perelson et al., 1997; Wu and Ding, 1999).

us

(B.1)

V (t ) = log10 (e p1 − λ1t + e p2 − λ2t )

(B.2)

M

V (t ) = log10 (e p1 − λ1t )

an

solution, which can be used to capture viral load responses, have been proposed as follows.

λ1 and λ2 are called the first-

ce pt ed

where V(t) is the log10 scaled plasma HIV-1 RNA levels at time t.

and second-phase viral decay rates, which may represent the minimum turnover rate of productively infected cells and that of latently or long-lived infected cells, respectively (Perelson p1 p1 p2 et al., 1997). The parameters p1 and p2 are macro-parameters; e and e + e are the baseline

viral load at time t = 0 in one- and two-compartment models, respectively. It is generally assumed that

λ1 > λ2 , which assures that the model is identifiable and is appropriate for empirical

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

Under some reasonable assumptions and simplification, two useful approximations of ODE

studies (Wu and Ding, 1999). Models (B.1) and (B.2) offer almost equal performance to capture the early fast-decaying segment of viral load trajectory (Wu and Ding, 1999), but model (B.2) performs better for long term of viral load trajectory (Huang and Dagne, 2011). It was noted that both models can be only applied to the early segment or longer term of the viral load responses

54

with decreasing trajectory patterns as discussed in Section 1 and shown in Figure 1(a) (two solid and two dashed lines), but the viral load trajectory may have a different shape or rebound in later stage (two dotted lines in Figure 1(a)). It was also noted that for some patients the second viral

t

λ2 , may vary over time because they depend on some phenomenological parameters

cr ip

decay rate

that hide with considerable microscopic complexity and change over time (Nowak and May,

us

rebound (Wu and Ding, 1999), suggesting that variation in the dynamic parameters, particularly

λ2 , may be partially associated with time-varying covariates such as repeated CD4 cell count to

an

capture the viral load change including viral rebound. Thus, the following extended function is

ce pt ed

V (t ) = log10 (e p1 − λ1t + e p2 − λ2 ( t ) t ) (B.3)

M

introduced. (Wu and Zhang, 2006).

where the second-phase decay rate

λ2 (t ) is either a function of time-varying CD4 cell count

(covariate) and/or an unknown smooth function. Intuitively, this model is more flexible because it assumes that the second-phase viral decay rate can vary with time as a result of drug resistance, medication adherence and other relevant clinical factors likely to affect changes in the viral load during the treatment. Therefore, all data obtained during whole study period can be used by

Ac

Downloaded by [University of Sydney] at 08:10 30 April 2015

2000). Negative value of the decay rate may correspond to viral increase and lead to viral

fitting this model. Appendix C. R and WinBUGS program codes for Model ST ## R program code

55

library(arm) library(R2WinBUGS)

cr ip

t

data

A mixture of hierarchical joint models for longitudinal data with heterogeneity, non-normality, missingness, and covariate measurement error.

Longitudinal data arise frequently in medical studies and it is a common practice to analyze such complex data with nonlinear mixed-effects (NLME) mod...
1MB Sizes 0 Downloads 5 Views