This article was downloaded by: [University of Sydney] On: 30 April 2015, At: 08:10 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK
Journal of Biopharmaceutical Statistics Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/lbps20
A Mixture of Hierarchical Joint Models for Longitudinal Data with Heterogeneity, Non-normality, Missingness and Covariate Measurement Error a
b
c
c
Yangxin Huang , Chunning Yan , Ping Yin & Meixia Lu a
Department of Epidemiology and Biostatistics, College of Public Health, University of South Florida, 33612, Tampa, Florida, USA b
School of Management, Shanghai University, Shanghai, 200444, P.R. China
c
Department of Epidemiology and Biostatistics, School of Public Health, Huazhong University of Science and Technology, Wuhan, Hubei, 430030, P.R. China Accepted author version posted online: 28 Jan 2015.
Click for updates To cite this article: Yangxin Huang, Chunning Yan, Ping Yin & Meixia Lu (2015): A Mixture of Hierarchical Joint Models for Longitudinal Data with Heterogeneity, Non-normality, Missingness and Covariate Measurement Error, Journal of Biopharmaceutical Statistics, DOI: 10.1080/10543406.2014.1000547 To link to this article: http://dx.doi.org/10.1080/10543406.2014.1000547
Disclaimer: This is a version of an unedited manuscript that has been accepted for publication. As a service to authors and researchers we are providing this version of the accepted manuscript (AM). Copyediting, typesetting, and review of the resulting proof will be undertaken on this manuscript before final publication of the Version of Record (VoR). During production and pre-press, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal relate to this version also.
PLEASE SCROLL DOWN FOR ARTICLE Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content. This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http:// www.tandfonline.com/page/terms-and-conditions
A Mixture of Hierarchical Joint Models for Longitudinal Data with Heterogeneity, Nonnormality, Missingness and Covariate Measurement Error Yangxin Huang1*, Chunning Yan2*, Ping Yin3 and Meixia Lu3 1
2
cr ip
t
Department of Epidemiology and Biostatistics, College of Public Health, University of South Florida, 33612, Tampa, Florida, USA School of Management, Shanghai University, Shanghai, 200444, P.R. China
3
us
*Joint corresponding authors: Yangxin Huang(
[email protected]) and Chunning Yan(
[email protected])
an
Abstract
ce pt ed
M
Longitudinal data arise frequently in medical studies and it is a common practice to analyze such complex data with nonlinear mixed-effects (NLME) models. However, the following four issues may be critical in longitudinal data analysis. (i) A homogeneous population assumption for models may be unrealistically obscuring important features of between-subject and withinsubject variations; (ii) normality assumption for model errors may not always give robust and reliable results, in particular, if the data exhibit skewness; (iii) the responses may be missing and the missingness may be nonignorable; and (iv) some covariates of interest may often be measured with substantial errors. When carrying out statistical inference in such settings, it is important to account for the effects of these data features; otherwise, erroneous or even misleading results may be produced. Inferential procedures can be complicated dramatically when these four data features arise. In this article, the Bayesian joint modeling approach based on a finite mixture of NLME joint models with skew distributions is developed to study simultaneous impact of these four data features, allowing estimates of both model parameters and class membership probabilities at population and individual levels. A real data example is analyzed to demonstrate the proposed methodologies, and to compare various scenarios-based potential models with different specifications of distributions. Keywords: Bayesian analysis; Covariate measurement error; Longitudinal data; Mixture joint models; Missing response; Skew distributions.
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
Department of Epidemiology and Biostatistics, School of Public Health, Huazhong University of Science and Technology, Wuhan, Hubei, 430030, P.R. China
1
t
1. Introduction
cr ip
Modeling viral load (plasma HIV-1 RNA copies) trajectories (HIV dynamics) after initiation of potent antiretroviral (ARV) therapies has been one of the most important areas in AIDS research
us
models, reflect the potency (efficacy) of ARV therapies (Perelson et al., 1997; Wu and Ding, 1999). Nonlinear mixed-effects (NLME) models in conjunction with HIV dynamic equations
an
(Huang et al., 2006;Wu and Ding, 1999) are used to model the viral load trajectories as well as to
M
quantify between-subject and within-subject variations in viral load longitudinal measurements. Modeling longitudinal data is an active area of statistical research that has received a lot of
ce pt ed
attention in the recent years. A considerable number of statistical modeling and analysis methods have been suggested in the literature for analyzing such data with various complex features including, but not limited to, NLME modeling approach (Higgins et al., 1997; Wu and Ding, 1999), nonparametric NLME modeling approach (Liu and Wu, 2007;Wu and Zhang, 2006), joint modeling approach via Monte Carlo EM algorithm (Liu and Wu, 2007; Wu, 2002), and Bayesian NLME modeling approach via Markov chain Monte Carlo (MCMC) procedure (Huang et al., 2006; Huang and Dagne, 2011). However, there is a relatively little work done on
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
in the past two decades. The viral decay rates, as regression parameters of the viral dynamic
simultaneously accounting for non-normality, heterogeneity, missingness and covariate measurement error, which are inherent features of longitudinal data.
2
First, in the previous longitudinal modeling, most studies assume that all subjects come from a homogeneous population where large between- and within-individual variations were accommodated by random-effects and/or time-varying covariates in those models. This typical
cr ip
t
large between-and within-individual variations are shown in Figure 1(a), viral load trajectory profiles of 6 representative patients in AIDS Clinical Trials Group study 398 (ACTG398) (Hammer et al., 2002)(see Section 2 for details of this study and data), which can be roughly
us
mixture joint modeling to be conducted in this article was to cluster all patients into two classes
an
with virologic suppression and failure (i.e., viral load rebound), respectively, which is of main interest from a clinical prospective. But, the other class was incorporated (class 1) to capture
M
some patients who withdrew too early to be clustered into either class 2 with virologic suppression or class 3 with viral load rebound. Thus, the number of components in this analysis
ce pt ed
was determined empirically based on the viral load trajectory patterns and clinical interpretability. Note that class 1 is referred as a confirmed ‘short-term virologic response’ due to early dropout which may be informative dropout and may not indicate virologic suppression in long-term treatment. We, therefore, can reasonably assume that patients are from a population which consists of three relatively homogeneous classes and, thus, it is motivated to consider a finite mixture of NLME models for such data set. Second, a common fundamental assumption in
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
classified into three classes with the following considerations. The original motivation based on
most studies is that model random errors are normally (symmetrically) distributed, but this assumption may lack robustness against departures from normality and/or outliers and, thus, statistical inference and analysis with normal assumption may lead to misleading results (Huang and Dagne, 2011; Lu and Huang, 2014; Sahu et al., 2003). Figure 1(b) displays the distributions
3
of repeated viral load measurements (in log10 scale) for 379 subjects enrolled in ACTG398 (Hammer et al., 2002). It can be seen that, for this data set to be analyzed in this paper, the viral load response (even after log10 transformation) is skewed and, thus, a normality assumption is
cr ip
t
not quite realistic. Alternatively, an asymmetric distribution such as skew-t (ST) and skewnormal (SN) distributions (Arellano-Valle and Genton, 2005; Azzalini and Capitanio, 1999, 2003; Lu and Huang, 2014; Sahu et al., 2003) should be more appropriate than a symmetric
us
important requirement that variables are “perfectly” measured. In practice, however, collected
an
data are often far from “perfect.” Although longitudinal studies are designed to collect data from every participant in the study at each assessment time, missing data including missing
M
observations during study and earlier dropouts from study in the response are often encountered due to patient’s dropouts or other problems. The missing data are likely to be informative (non-
ce pt ed
ignorable) in the sense that missingness may be related to the missing values. Last, measurement error in covariates is another typical feature of longitudinal data, and ignoring this phenomenon may result in unreliable statistical inference.
The majority of the statistical literature for the modeling of longitudinal data has focused on the development of models that aim at capturing only specific aspects of the motivating case studies. However, relatively few studies have been conducted on simultaneously accounting for the
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
(normal) distribution for this data set. Third, the validity of inference methods relies on an
impacts induced by heterogeneity, asymmetry, missing observations and mismeasured covariates. It is not clear how these typical features often observed in longitudinal data may inherently interact and simultaneously influence inferential procedures. This article proposes a finite mixture of NLME joint (MNLMEJ) models with skew distributions, which include ST and
4
SN distributions, to simultaneously account for response with skewness, heterogeneity and nonignorable missingness and covariate with measurement error. We demonstrate a Bayesian inferential approach to estimate both model parameters and class membership probabilities based
cr ip
t
on a finite mixture model with nonlinear mean functions (components). It is noted that the ST distribution is approximate to the SN distribution when its degrees of freedom approach infinity, and the SN distribution reduces to a normal distribution if skewness parameter is zero. Thus, we
us
extends to the other distributions such as the SN distribution. In what follows, we consider
an
multivariate ST and SN distributions introduced by Sahu et al. (2003), which are suitable for a Bayesian inference and briefly discussed in Appendix A. The rest of this paper is organized as
M
follows. Section 2 describes the data set that motivated this research and investigates a specific MNLMEJ model for viral load response process with missing data mechanism and for CD4
ce pt ed
covariate process with measurement error. The Bayesian inference method that simultaneously accounts for heterogeneity, skewness, missingness and covariate measurement error is presented in Section 3. In Section 4, an AIDS data set described in Section 2 is analyzed to demonstrate the proposed methodologies and the analysis results are reported. Finally, we conclude the paper with discussion in Section 5.
2. Data and a mixture of joint models with ST sistribution for HIV dynamics
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
use an ST distribution to develop the mixture modeling methodologies, as it can be easily
2.1. Motivating data set The data set that motivated this research is from AIDS Clinical Trials Group 398 study (ACTG398), a randomized, double-blind, placebo-controlled, with an extension to more than 72
5
weeks study of saquinavir, indinavir, or nelfinavir added as second protease inhibitor to the 4drug class regimen in patients with virologic failure (Hammer et al., 2002). Although RNA viral load was designed to be measured in copies per milliliter at study weeks 0, 2, 4, 8, 16, and every
cr ip
t
8 weeks thereafter until week 72, it contains missing observations. The number of viral load measurements for each individual varies from 2 to 13. This study consists of 481 HIV-1 infected patients. Out of total 481 patients, 379 patients who had more than 2 measurements were
us
time points. A log10 transformation of viral load was used in the analysis in order to stabilize the
an
variation of the measurement errors and to speed up estimation algorithm. CD4 cell counts were also measured throughout study on a similar scheme. The detailed data description can be found
M
in Hammer et al. (2002).
ce pt ed
As discussed in Section 1, viral load trajectory profiles of 6 representative patients displayed in Figure 1(a) can be roughly classified into three classes: (i) decrease rapidly and constantly in a short-term period (two solid lines: ID=31, 105), (ii) decrease at the beginning and then stay stable at a low level (two dashed lines: ID=29, 132), and (iii) decrease at the beginning, but rebound later (two dotted line: ID=33, 99). Along with these observations, we can reasonably assume that patients from a population consist of three relatively homogeneous classes. We can also see from the right panel in Figure 1(b), which shows the pooled viral load measurements
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
included in this data analysis. 46 patients have missing viral load measurements at scheduled
from all patients, that viral load response (even after logarithmic transformation) is skewed and, thus, an asymmetric distribution (such as ST or SN) should be more appropriate than a symmetric (normal) distribution to model this data set.
6
2.2. Measurement error process This section briefly discusses (CD4) covariate measurement error models. Various covariate mixed-effects models were investigated in the literature (Carroll et al., 2006; Wu, 2002). For
the number of measurements on the ith subject by ni. Let
. In the presence of covariate measurement
error, we employ the following LME model for CD4 covariate process.
(
)
where
,
(
zi* = zi*1 ,… , zin* i
)
)
T
with
T
zij
being the observed CD4 covariate value for individual i at time
zij* = uijT α + vijT ai
ce pt ed
tij
(
zi = zi1 ,… , zini
M
iid
an
zij = uijT α + vijT ai + ε ij ( ≡ zij* + ε ij ) , ε i ~ N 0, σ 12 I ni , (1)
and
covariate value at time ai = ( ai1 ,… , ail )
tij
,
uij
and
vij
may be viewed as the true (but unobservable)
α = ( α1 , … , α l )
T
are l × 1 design vectors,
and
T
are unknown population (fixed-effects) and individual-specific (randomiid
effects) parameter vectors, respectively. We assume that
ai ~ Nl ( 0, Σ a )
, where Σ a is unrestricted
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
be the observed CD4 covariate
us
tij ( i = 1,… n; j = 1,…, ni )
value for individual i at time
zij
cr ip
t
simplicity, we consider a single time-varying covariate. Denote the number of subjects by n and
covariance matrix. To model the CD4 covariate process conducted in this study, at the absence of theoretical rationale for the CD4 trajectories, we consider empirical polynomial LME models for the CD4 process, and choose the best model based on AIC and BIC values. Specifically, we consider the
7
uij = vij = (1, tij ,… , tijl −1 )
T
CD4 covariate model (1) with
and focus on linear (l = 2), quadratic ( l =
3) and cubic ( l = 4) polynomials. The resulting AIC (BIC) values are 1805.03 (1836.14), 1733.49 (1749.80) and 1773.44 (1792.21), respectively. We thus adopted the following quadratic
(2)
2 ij
ai = ( ai1 , ai 2 , ai 3 )
α = (α 1 , α 2 , α 3 )
T
is individual-specific (random-effects) vector with
N3 ( 0, Σ a )
. To avoid too small (large) estimates, which may be
M
multivariate normal distribution
is population (fixed-effects)
an
parameter vector,
T
us
where z ij = (α1 + ai1 ) + (α 2 + ai 2 ) tij + (α 3 + ai 3 ) t , *
unstable, we standardize the time-varying covariate CD4 cell counts and rescale the original time
ce pt ed
(in days) so that the time scale is between 0 and 1.
2.3. A mixture of NLME models with ST distribution for HIV dynamics Finite mixture models are used in longitudinal research studies (Muthen and Shedden, 1999; Pauler and Laird, 2000), where the latent classes corresponding to the mixture components and cluster individuals may provide a better model fitting. However, most finite mixture models are
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
zij = (α1 + ai1 ) + (α 2 + ai 2 ) tij + (α 3 + ai 3 ) tij2 + ε ij ,
cr ip
t
polynomial LME model for the CD4 process.
currently based on linear (polynomial) (Muthen and Shedden, 1999) or piecewise linear (Pauler and Laird, 2000) mean functions. The partial reason is that the computation for inference can be relatively conveniently carried out because the likelihood function of a model based on these linear mean functions has a closed form (Muthen and Shedden, 1999). However, in practice,
8
most longitudinal trajectories appear to be nonlinear patterns. When a model is extended to incorporate nonlinear mean functions which will be conducted in this article, inferential procedures can become complicated dramatically because a closed form of likelihood function is
be the viral load value (in log10 scale) for individual i at time
and
missing response indicator such that yi = ( ymis ,i , yobs ,i )
, where
ymis ,i
and
(
yi = yi1 ,… , yini rij = 1
yobs ,i
if
yij
)
T
. Let
subject to informative
(
ri = ri1 ,… , rini
)
T
be a vector of
us
missingness
( i = 1,… n; j = 1,…, ni )
tij
is missing and 0 otherwise. We write
an
yij
are the collections of the missing and observed
g k ( ⋅)( k = 1,…, K )
, which are known to be specified. The true
ce pt ed
classes with mean functions
M
components of yi , respectively. We assume that there are K plausible nonlinear trajectory
trajectory mean function of the ith subject might be
π k = P ( ci = k )
which satisfies ∑
K
k =1
πk =1
g k ( ⋅)
with unknown probability
, where ci is a latent indicator. A statistical nonlinear
trajectory model of individual i , given ci = k , can be formulated as follows.
( yi | ci = k ) = gk ( ti , Ak β i ) + ei ,
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
Let
cr ip
t
no longer available.
where
ti = (ti1 ,…, tini )T
,
iid
(
)
ei ~ STni ,ν − J (ν ) δ 1ni , σ 22 I ni , δ I ni ,
βi = ( β i1 ,…, β in )T i
,
β ij = ( βij1 ,…, βijs )T
(3)
is a vector of individual
J (ν ) = (ν / π ) {Γ[(ν − 1) / 2] / Γ (ν / 2 )} 1/ 2
parameters for the ith individual,
is to adjust the ST
distribution with mean zero (see (A.3) in Appendix A), where Γ(·) is a Gamma function; the
9
vector of random errors
ei = (ei1 ,…, eini )T
ν,
variance
freedom
unknown
follows a multivariate ST distribution with degrees of
parameter
and
skewness
δ;
parameter
, and Ak ( s × s )(k = 1, …, K ) is known square
t
g k (ti , Ak β i ) = g k (ti1 , Ak β i1 ),…, g k (tin i , Ak β ini ))T
σ 22
g1 (⋅),..., g K ( ⋅)
, which are specified
, may only involve different subsets of
us
by the nonlinear functions
yi j (i = 1, … n; j = 1, …, ni )
β ij
. By
to 0 in the k th trajectory class,
an
introducing Ak , Ak β ij will set unrelated elements of
β ij
respectively. We will illustrate the use of Ak and specify nonlinear mean functions gk(·) in the
M
HIV dynamic application below.
Similar to discussion in Lu and Huang (2014), model (3) can be given conditionally and
ce pt ed
marginally by
( yi | ci = k ) ~ STni ,ν ( gk (ti , Ak β i ) − J (ν )δ 1ni , σ 22 I ni , δ I ni )
K
(
)
yi ~ ∑ π k STni ,ν gk ( ti , Ak β i ) − J (ν )δ 1ni , σ 22 I ni , δ I ni . k =1
(4)
(5)
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
Ak is introduced because the mean functions of
cr ip
indicator matrix of which diagonal elements are either 0 or 1 and off-diagonal elements are all 0.
The preceding model defines a finite mixture of NLME models. In (5) the vector of mixture T probabilities π = (π 1 , … , π K ) can be also viewed as the mixture weights of all plausible
components within the finite mixture model framework. Model (5) is identifiable, as long as each of the component models is identifiable and distinguishable from each other. When the
10
component models are identifiable but not distinguishable from each other, some constraints may be required to make model (5) identifiable(Titterington et al., 1985).
β ij
, we assume,
cr ip
t
For individual-specific parameter vector iid
(6)
us
T b = (bi1 ,…, biq )T in which β = ( β1 ,… , β r ) is a vector of universal population parameters; i is a
an
Ζ (s × r ) vector of random-effects; Σ b ( q × q ) is an unknown variance-covariance matrix; ij is a design matrix including time-independent and/or time-varying covariates such as CD4 cell count.
zij*
is a summary of true (but unobservable) time-varying
M
We assume one of the elements of Zij,
ce pt ed
CD4 covariate value at time tij (see Section 2.2 in detail). It is noted that r ≥ s ≥ q . For the HIV dynamic application, based on discussion in Appendix B, we consider one- and twocompartment models with constant decay rate(s) for trajectory classes 1 and 2, respectively, and propose a two-compartment model with a time-varying decay rate in the second compartment for trajectory class 3. Thus, the mean functions of K = 3 components in the mixture model are specified as follows.
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
β ij = Z ij β + bi , bi ~ N q (0, ∑b )
1. One-compartment model with a constant decay rate for class 1 trajectory g1 (tij , A1β ij ) = log10 (e
pi1 − λi 1tij
), (7)
2. Two-compartment model with constant decay rates for class 2 trajectory
11
g 2 (tij , A2 β ij ) = log10 (e
pi1 − λi1tij
+e
pi 2 − λi 2 tij
),
(8)
3. Two-compartment model with constant and time-varying decay rates for class 3 trajectory
+e
pi 2 − λij* 2tij
).
(9)
t
pi 1 − λi 1tij
us
β i1 = pi1 = β1 + bi1 , β i 2 = λi1 = β 2 + β3 zi 0 + bi 2 , βi 3 = pi 2 = β 4 + bi 3 , β 4 i = λi 2 = β5 + bi 4 , βij 5 = λij*2 = β 5 + β 6 zij* + bi 4 ,
(10)
an
β ij = ( β i1 , β i 2 , βi 3 , βi 4 , βij 5 )T , β = ( β1 , β 2 , β3 , β 4 , β5 , β 6 )T ,
zij*
is true (but unobservable) value of CD4 at time tij defined in
ce pt ed
where zi0 is the baseline CD4,
M
A1 = diag(1, 1, 0, 0, 0), A2 =diag(1, 1, 1, 1, 0), A3 = diag(1, 1, 1, 0, 1),
(2). The decay rate of the second compartment in (9) parameters in
β ij
βij 5
is time-varying due to
zij*
, but other
are time independent. As mentioned previously, mean functions in different
components may involve different subsets of
β ij
; for example, g1 (⋅) only involves parameters
βi1 and β i 2 , and g 2 (⋅) and g 2 (⋅) share the same parameters βi1 , β i 2 , and β i 3 but have different β decay rates of the second compartment, β 4i and ij 5 , respectively. The diagonal indicator
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
In (7)–(9),
cr ip
g3 (tij , A3 β ij ) = log10 (e
matrices, A1, A 2 , and A 3 , determine which elements of
β ij
are involved and set other unrelated
parameters to be 0 in mean functions, g1 (⋅) , g 2 (⋅) , and g 3 (⋅) , respectively. It is noted that (9) is a natural extension of (8) to create a time-varying decay rate for capturing viral rebound in class
12
3 trajectories. With this mixture clustering, our mixture modeling can be used to estimate probabilities of class membership which is either viral rebound, suggesting virologic failure (class 3) or viral decrease continuously, indicating a confirmed short-term virologic response
cr ip
t
(class 1) and a confirmed long-term virologic response (class 2). By relaxing the homogeneous population assumption, finite mixture models allow for parameter
us
considering individual variation around a single mean trajectory, a finite mixture model allows different classes of individuals to vary around several different mean trajectories. In this way,
an
finite mixture modeling may do a better job of capturing inter-individual variation because it
M
does not require that all individuals follow the same average trajectory over time. To allow for non-ignorable missing mechanism in viral load response, we should incorporate a
ce pt ed
missing data model for the missing response mechanism to avoid possibly biased results. Toward this end, we may consider that both missing observations during study and earlier dropouts are likely to be informative or non-ignorable missing in the sense that the probability of missing data may depend on the missing values. Although more complicated missing models can be considered, we focus on the following simple independent missing data model to avoid too many nuisance parameters where the missingness may be related to the viral load value being missing
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
differences across several unobserved classes within a heterogeneous population. Instead of
and current CD4 observation.
∏
f ( ri | η ) = ∏ i =1 ∏ ji=1 P ( rij = 1| η ) 1 − P ( rij = 1| η ) i =1 n
n
n
1− rij
rij
13
,
(11)
where
logit P ( rij = 1| η ) = η0 + η1 zij + η2 yij
η = (η 0 ,η1 ,η 2 )
T
and
is a vector of unknown
nuisance parameters. Note that the non-ignorable missing data model (11) is not testable based on the observed data, so we should consider alternative missing data models to perform
cr ip
t
sensitivity analysis (see Section 4.3 in detail).
us
In a longitudinal study, the longitudinal response and covariate processes are usually connected physically or biologically. Although a simultaneous inference method based on a joint likelihood
an
for the covariate and response data may be favorable, the computation associated with the joint likelihood inference in a finite MNLMEJ models for longitudinal data can be extremely intensive
M
and may lead to convergence problems(Liu and Wu, 2007; Wu, 2002). Here we propose a fully Bayesian inference method for covariate model (2) and mixture of NLME models (5) and (6) in
ce pt ed
the presence of missing data model (11) at population and individual levels, simultaneously. We obtain numerical approximations to posterior distributions using Markov chain Monte Carlo (MCMC) methods.
Let
θ = {α , β ,η , σ 12 , σ 22 , Σ a , Σ b ,ν , δ }
be the collection of unknown parameters in all models
except for the mixture weight π in (5). Under the Bayesian framework, we next need to specify
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
3. Inferential procedure
prior distributions for all the unknown parameters in these models as follows:
α ~ N 3 (τ 1 , Λ 1 ) , σ 12 ~ IG (ω1 , ω2 ) , Σ a ~ IW ( Ω1 , ρ1 ) , β ~ N 6 (τ 2 , Λ 2 ) , σ 22 ~ IG (ω3 , ω4 ) , Σ b ~ IW ( Ω2 , ρ 2 ) ,
η ~ N 3 (τ 3 , Λ 3 ) , δ ~ N ( 0, γ ) , ν ~ Exp ( v0 ) I (ν > 2 ) ,
14
(12)
where the mutually independent Inverse Gamma ( I G ), Normal (N), Exponential (Exp) and Inverse Wishart ( I W ) prior distributions are chosen to facilitate computations. The truncation point in exponential prior distribution for ν was chosen to assure a finite variance. The super-
cr ip
t
Λ parameter matrices 1 , Λ 2 , Λ 3 , Ω1 and Ω2 can be assumed to be diagonal for convenient
implementation. By its definition, the latent indicating variables
iid
(13)
an
in which π = ( π 1 , π 2 , π 3 )
T
follows a Dirichlet distribution (Dir) (Diebolt and Robert, 1994;
M
Lavine et al., 1992),
(14)
ce pt ed
π ~ Dir (φ1 , φ2 , φ3 ) .
follow a
us
ci ~ Cat ( (1, 2,3) , (π 1 , π 2 , π 3 ) ) ,
An MCMC scheme for this mixture model is composed of following two basic steps:
(i). Sampling class membership indicators
ci ( i = 1,… , n )
, conditional on population parameters,
θ, and individual random-effects, ai and bi.
ci ( i = 1,… , n )
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
Categorical distribution ( C a t )
ci ( i = 1,… , n )
Generate
P ( ci = k | ai , bi , θ , yi ) =
from
π k f ( yi | ai , bi , ci = k , θ )
∑ m=1π m f ( yi | ai , bi , ci = m,θ ) 3
15
,
(15)
where
f ( yi | ai , bi , ci = k ,θ )( k = 1, 2,3)
is a conditional density function of
yi
based on (4).
Then, update the probability (population proportion) π for next iteration from distribution
numk = ∑ i =1 I ( ci = k ) n
where
,
( k = 1, 2,3) , in which I (⋅)
t
(16)
cr ip
(π | num1 , num2 , num3 ) ~ Dir (φ1 + num1 ,φ2 + num2 ,φ3 + num3 ) ,
is an indicator function. Note that an
us
(ii). Sampling parameters θ, and individual random-effects ai and bi, conditional on class c = ( c1 ,… , cn )
an
T
membership indicator
.
M
Following the study by Sahu et al. (2003), it can be shown, conditional on ci determined in step
ce pt ed
(i), that by introducing the random vector wi and random variable ui based on the stochastic zi
representation for the ST distribution (see Appendix A for details), with random-effects bi, and
(
)
rij
( (
)
)
yi | zi , wi , ui , bi , ri , ci ~ N ni gci ti , Aci β ij + δ wi − J (ν ) 1ni , ui−1σ 22 I ni ,
(
−1 i ni
wi | ui ~ N ni 0, u I
) I (w
i
> 0 ) , ui ~ Γ (ν / 2,ν / 2 ) ,
bi ~ N ( 0, Σ ) , rij | yij , zij ~ Bernoulli ( Pij ) .
(
where zi* = zi*1 ,… , zin* i
)
T
with random-effects ai, yi
in missing data model (11) can be hierarchically formulated as
zi | ai ~ N zi* , σ 11 I ni , ai ~ N ( 0, Σ a ) ,
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
individual could belong to different classes in different iterations, determined by (15).
and Pij = P ( rij = 1| η ) .
16
(17)
Let the observed data
D = { yobs ,i , ti , zi , ri , ( i = 1,… , n )}
,
f ( ⋅ | ⋅)
be a conditional density function
2 2 and h ( . ) be prior density function, respectively. We assume that α , β , η, σ 1 , σ 2 , Σ a , Σ b , ν
are
independent
of
each
h (θ ) = h (α ) h ( β ) h (η ) h (σ 12 ) h (σ 22 ) h ( Σ a ) h ( Σ b ) h (ν ) h (δ )
other,
i.e.,
t
δ
cr ip
and
. After we specify the models for the
observed data conditional on membership indicator and prior distributions of the unknown model
us
the Bayesian framework. When the missing response data are nonignorable, we need to assume
an
only a possible missing data model and then incorporate it into likelihood. Thus, the joint
f (θ | D, c ) ∝
{∏
M
posterior density of θ conditional on D and c can be given by
∫ ∫ ∫ f ( y | z , w , u , b , r , c ) f ( w | u , w > 0) f (u ) × f ( z | a ) f ( r | y , z ) f ( a ) f ( b ) dy da db } h (θ ) , i
i
i
i
i
i
i
i
i
i
i
i
i
i
mis ,i
i
i
i
i
i
(18)
ce pt ed
i
n
i =1
In general, the integrals in (18) are of high dimension and do not have a closed form. Analytic approximations to the integrals may not be sufficiently accurate. Therefore, it is prohibitive to directly calculate the posterior distribution of θ based on the observed data and class membership. As an alternative, the MCMC procedure can be used to sample population parameters, θ, and random-effects, ai and bi ( i = 1, …, n), from conditional posterior
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
parameters, we can draw samples for the parameters based on their posterior distributions under
distributions, based on (18), using the Gibbs sampler along with the Metropolis-Hastings (M-H) algorithm. Steps (i) and (ii) are repeated alternatively in iterations of MCMC until convergence reached. An important advantage of the above representations based on the hierarchical models
17
is that they can be very easily implemented using the freely available WinBUGS software (Lunn
cr ip us an M ce pt ed Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
t
et al., 2000) interacted with a function called bugs in the package R2WinBUGS of R.
18
4. AIDS data analysis
t
4.1. Model implementation
cr ip
As shown in Figure 1(b), the histogram of viral load in log10 scale clearly indicates the
asymmetric feature. Thus, it seems plausible to fit models with a skew distribution to the data.
us
specifications for model error in the finite MNLMEJ models for viral load response in
compare their performance:
Model ST: A mixture of NLME models where three mean functions specified by (7)–(9)
M
•
an
conjunction with the missing data model (11) and the CD4 covariate model (2) were employed to
•
ce pt ed
with the ST distribution for response model error.
Model SN: A mixture of NLME models where three mean functions specified by (7)–(9)
with the SN distribution for response model error. • Model N: A mixture of NLME models where three mean functions specified by (7)–(9) with the normal (N) distribution for response model error.
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
With this information, the following four statistical models with different distribution
• Model C: A commonly used NLME model where the mean function specified by (9) alone with the ST distribution for response model error. We conducted the following three scenarios. First, because a normal distribution is a special case
of an SN distribution when the skewness parameter is zero, while the ST distribution reduces to
19
the SN distribution when degrees of freedom are large, we investigated how asymmetric distributions for model error (Models ST and SN) contribute to modeling results and parameter estimation in comparison with a symmetric (normal) distribution for model error (Model N).
cr ip
t
Second, we estimated the model parameters by using the “naive” method (denoted by NM), which ignores measurement error in CD4 covariate and missing data in viral load response. That
zij*
in the response model (5) without the missing viral load data model incorporated into
us
values
model fitting. We used it as a comparison to the joint modeling (JM) approach proposed in
an
Section 3. This comparison attempted to investigate how the measurement errors in CD4 and missing data in viral load contribute to modeling results. Finally, we further investigated a
M
commonly used NLME model with ST distribution (Model C) where the mean function is specified by (9) alone. This model has been widely applied to analyze viral load data (Huang et
ce pt ed
al., 2006; Liu and Wu, 2007; Wu and Ding, 1999; Wu, 2002) in significantly advancing the understanding of pathogenesis of HIV infection and the assessment of effectiveness of ARV treatment. We compared this commonly used NLME joint model (ignoring data feature of heterogeneous population) with the MNLMEJ model proposed in this article to explore how heterogeneous feature influences modeling results.
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
is, the “naive” method uses only the observed CD4 values zij rather than unobservable CD4
To carry out the Bayesian inference, we specified the values of the hyper-parameters in the prior
distributions. In the Bayesian analysis, we specified only the priors at the population level. We took weakly-informative prior distributions for the parameters in the MNLMEJ models. In particular, (i) fixed-effects were taken to be independent normal distributions N(0,100) for each
20
2 component of the population parameter vectors α, β and η ; (ii) for the variance parameters σ 1 2 and σ 2 we assume a limiting non-informative inverse gamma prior distribution, IG(0.01, 0.01)
t
so that the distribution has mean 1 and variance 100; (iii) the priors for the variance-covariance
ρ1 )
and
IW(Ω2,
ρ2 )
with
cr ip
matrices of the random-effects Σa and Σb were taken to be inverse Wishart distributions IW(Ω1, covariance
matrices
Ω1 = diag (0.01, 0.01, 0.01) ,
us
ν ~ Exp(0.1) I (ν > 2 ) parameter ν followed truncated exponential distribution ; (v) for the
an
skewness parameter δ, we chose independent normal distributions N(0,100); and finally (vi) we
φ1 = φ2 = φ3 = 1 , where hyper-parameters
M
set hyper-parameters of Dirichlet distribution in (14),
φ1 , φ2 and φ3 are set to be equal because we initially assume individuals have equal probabilities
ce pt ed
of coming from any one of three classes.
The MCMC procedure was implemented using WinBUGS software (Lunn et al., 2000) interacted with a function bugs in the package R2WinBUGS of R, and the program codes for Model ST are available in Appendix C. When the MCMC procedure was applied to the actual clinical data, convergence of the generated samples was assessed using standard tools within WinBUGS software such as trace plots and Gelman-Rubin (GR) diagnostics(Gelman and Rubin,
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
Ω2 = diag (0.01, 0.01, 0.01, 0.01) ρ = ρ 2 = 4 , respectively; (iv) the degrees of freedom and 1
1992). Figure I in Appendix D shows the dynamic version of GR diagnostics based on Model ST as obtained from the WinBUGS output for the representative parameters where the three curves are given: the middle and bottom curves below the dashed horizontal line (indicated by the value ˆ ˆ one) represent the pooled posterior variance (V ) and average within-sample variance (W ) ,
21
ˆ respectively, and the top curve represents their ratio ( R ) (Gelman and Rubin, 1992). It is seen
ˆ ˆ that Rˆ tends to 1, and V and W settle down as the number of iterations increases indicating that
t
the algorithm has reached convergence. With the convergence of diagnostics observed, we
cr ip
proposed that the three chains were run with the following considerations. After an initial
number of 50,000 burn-in iterations, every 50th MCMC sample was retained from the next
us
targeted posterior distributions of the unknown parameters for statistical inference. The computational burden for fitting such mixture of highly nonlinear joint models via MCMC
an
procedure was reasonable. For example, to fit Model ST using JM approach, it took about 7.0
M
hours on a Window PC with Intel Core i7-2600 CPU @ 6.80GHz and RAM of 16 GB. We also check the k-lag serial correlation of the samples for each parameter to diagnose independence of
ce pt ed
MCMC samples. We graphically checked the last 500 samples drawn from MCMC sampling scheme for each parameter (plot not shown here) and found that the consecutive samples move randomly towards different directions, indicating that MCMC is not “sticky” and MCMC samples are independent for each parameter, suggesting the convergence to the stationary distribution.
4.2. Comparison of modeling results
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
50,000 iterations for each chain. Thus, we obtained a total of 3,000 samples for each of the
The Bayesian modeling approach in conjunction with the CD4 covariate model (2) and the
mixture of NLME response model discussed in Section 2.3 with different distribution specifications for the model error was used to fit the viral load and CD4 data simultaneously. Tables 1 and 2 present the population posterior mean (PM), the corresponding standard deviation
22
(SD) and 95% credible interval (CI) for fixed-effects parameters based on the four models with two methods (JM approach and naive method). The following findings are obtained for estimated results of parameters.
cr ip
t
In the response mixture model, the findings, particularly for the fixed-effects ( β 5 , β 6 ) , which
are parameters of the second-phase viral decay rate, show that these estimates are statistically
us
Nevertheless, for the estimate of the coefficient of CD4 covariate β 6, its estimate is significantly positive. This means that CD4 has a significantly positive effect on the second-phase viral decay
an
rate, suggesting that the CD4 covariate may be an important predictor of the second-phase viral decay rate during the treatment. For the fixed-effects (β 2, β 3), which are parameters of the first-
M
phase viral decay rate, show that β 2 is significantly estimated, while the estimate of β 3, where
ce pt ed
the corresponding baseline CD4 covariate is interacted with time t, is not significant, indicating that the baseline CD4 has no effect on the first-phase viral decay rate. In addition, there are 2 marked differences in posterior means of the variance (scale parameter) σ 2 (0.08, 0.16 and 0.31
for Models ST, SN and N, respectively) across the three models. The estimated values of the variance for both Models ST and SN are much smaller than that of Model N because the former models take into account skewness of the data while the latter does not. The estimates of the
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
significant for three Models ST, SN and N since the 95% credible intervals do not contain zero.
skewness parameter ( δ ) of Models ST and SN are, respectively, 0.50 with 95% CI (0.39, 0.58)
and 0.62 with 95% CI (0.51, 0.72). This finding suggests that there is a significantly positive skewness in the data and confirms the fact that the distribution of the original data is skewed even after taking log10-transformation (see Figure 1(b)). Thus, incorporating a skewness
23
parameter in the modeling of the data is recommended. Furthermore, the estimate (0.50) of the skewness parameter based on Model ST is less than that (0.62) based on Model SN. This may be due to the fact that an additional parameter ν (degrees of freedom) for heaviness in the tails was
cr ip
t
estimated to be 3.77 trading-off the effect of skewness. For parameter estimates of the CD4 covariate model (2), the estimates of the coefficient α2 based on Models ST, SN and N are
us
2 negative. The estimate (0.73) of the variance σ 1 are equivalent for the three models.
an
We have seen from model fitting results that, in general, all of Models ST, SN and N provided reasonably good fit to the observed data for most patients in our study, although the fitting for a
M
few patients was not completely satisfactory due to unusual viral load fluctuation patterns for these patients, particularly for Model N. Figure 2 displays estimated viral load trajectories based
ce pt ed
on the three models for the six representative subjects shown in Figure 1(a). The estimated individual trajectories based on Models ST and SN fit the observed viral load values more closely than those based on Model N. Note that the lack of smoothness in Models ST and SN estimates of individual trajectories may be explained by the fact that a random component wi was incorporated in the expected functions related to fitted values (see model (17) for details) based on the property of the skewed distributions for “chasing the data” to the extent.
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
slightly different but significantly positive, whereas the estimates of α1 and α3 are significantly
To assess the goodness-of-fit for the proposed statistical models, Figure II in Appendix D
presents the diagnosis plots of the observed values versus the fitted values (left panel), and ST, SN, and normal Q-Q plots of different distributions (right panel) based on Models ST, SN and N. It can be seen from the left panel in Figure II that Models ST and SN gave a similar performance,
24
but provided better fit to observed data compared with Model N. This finding was also confirmed by the residual Q-Q plots (right panel) that all show the existence of outliers, but Models ST and SN clearly have fewer outliers and, thus, give better fit to the data than Model N.
cr ip
t
Further, Model ST did an even better job than Model SN in accounting for skewness. This finding is confirmed by their sums of squared residuals (SSR), which are 439.2 (Model ST),
us
To select the best model that fits the data adequately, a Bayesian selection criterion, known as deviance information criterion (DIC) suggested by Spiegelhalter et al. (2002), is used. As with
an
other model selection criteria, we caution that DIC is not intended for identification of the “correct” model, but rather merely as a method of comparing a collection of alternative
{∑
i, j
(y
rep ,ij
− yobs ,ij )
2
} for model comparison, where the predictive value y
ce pt ed
by
EPD = E
M
formulations. As an alternative, we also evaluate expected predictive deviance (EPD) formulated
rep,ij
is a
replicate of the observed yobs,ij and the expectation is taken over the posterior distribution of the model parameters θ (see Gelman et al. (2003) in detail). This criterion chooses the model where the discrepancy between predictive values and observed values is the lowest. For mixture models, the structure of DIC does not allow for automatic computation in WinBUGS software. We wrote R code interacted with WinBUGS program to calculate estimated DICs using the joint
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
458.7 (Model SN), and 875.0 (Model N), respectively.
modeling approach based on Models ST, SN and N, which are 11368.9, 11572.8 and 12699.1, respectively. As mentioned previously, it is difficult to tell which model is “correct”, only which one fits the data better. Therefore, based on the DIC, the results indicate that Model ST is the best fitting model and Model SN is next, supporting the contention of a departure from
25
normality. This finding is confirmed by the results of the EPD values (see Table 2). These results are also consistent with those in diagnosis of the goodness-of-fit displayed in Figure II in Appendix D, indicating that Model ST performs best. In summary, our results may suggest that it
cr ip
t
is important to assume a skew distribution in the mixture models for the viral load response in order to achieve reliable results, in particular if the data exhibit skewness. Based on these
us
4.3. Analysis results based on Model ST
an
As mentioned in Section 1, one of the primary objectives in this analysis was to cluster all individuals’ membership into 3 classes based on viral load trajectories. Based on the mixture
M
modeling, we are able to obtain a summary of class membership at both the population and individual levels. At population level, the MCMC procedure yields samples from the posterior (π 1 ,… , π K ) K = 3 , in (16), the population proportion of individuals in each class.
ce pt ed
distribution of
The estimates of population proportion and associated 95% CI of ( π 1 , π 2 , π 3 ) for three classes are 22.16% (20.06%, 24.39%), 41.30% (38.52%, 44.16%) and 36.54% (33.71%, 39.47%), respectively. It can be seen that class 2 (decrease and maintain stable) has the largest proportion 41.30%, followed by class 3 (decrease and rebound later) with proportion 36.54% and then class 1 (decrease all the time) with the lowest proportion 22.16%. Thus, out of 379 patients, the
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
observations, we will further report our results in detail only for the best Model ST below.
patterns of changing viral load of 84, 157 and 138 patients followed classes 1, 2 and 3, respectively. This indicates that a confirmed short-term (class 1 due to early dropout) and longterm (class 2) virologic responses were observed in a total of 63.46% of the patients in classes 1 and 2.
26
At individual level, the posterior probability of individual i belonging to the k th ( k = 1, 2, 3)
class,
pik = E [ I (ci = k )]
1 , can be approximated by M
∑
M m =1
I (ci( m ) = k )
(m) , where ci is class
t
membership of individual i drawn from the posterior distribution (15) in the mth MCMC
cr ip
iteration ( m = 1,… , M ) and M is a total number of posterior sample iterations (M = 3000 in this
study). Barplot shown in Figure 3 displays the probabilities for the selected 20 individuals (from
us
classified as either viral load rebound or not may help physicians to refine treatment strategy and
an
to identify the reason of viral load rebound for such individual patient. Table 3 shows individual posterior probabilities for the six representative patients shown in Figure 1(a). The probabilities
M
shown in Table 3 and the classes of trajectories shown in Figure 1(a) are matched quite well: The patients 31 and 105 belong to class 1 because their viral loads decrease constantly in a early
ce pt ed
short-term period, with probabilities 81% and 57%; the viral loads of the patients 29 and 132 decrease and then maintain stable, and thus, they belong to class 2, with probabilities 92% and 87%; and finally, the patients 33 and 99 are in class 3 (with viral load rebound), with probabilities 87% and 100%.
The estimated results of fixed-effects presented in Table 1 based on Model I indicate that the first-phase decay rate, and the second-phase decay rate without and with time-varying CD4
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
the 21st to the 40th subjects). The probability corresponding to individual patient who is
covariate
may
be
approximated
by
λˆ1 = 116.5 + 4.62z0 ,
λˆ2 = −4.35
and
λˆ2* (t ) = −4.35 + 9.77(−0.21 + 1.42t − 1.46t 2 ) , respectively, where z is the baseline CD4 value. 0 Thus, the population viral load processes of 3 classes may be approximated by
27
{
},
{
}
Vˆ1 (t ) = exp 10.5 − λˆ1t
{
}
{
Vˆ2 (t ) = exp 10.5 − λˆ1t + exp 6.8 − λˆ2t
{
Vˆ3 (t ) = exp 10.5 − λˆ1t + exp 6.8 − λˆ2* (t )t
}.
}
and
Since the first-phase viral decay rate, λ1 , is not
t
significantly associated with the baseline CD4 (due to insignificant estimate of β3 ) and the
cr ip
* second-phase viral decay rate λ2 (t ) in component 3 is significantly associated with the time-
us
V(t) may be significantly associated with time-varying CD4 values, but not associated with the baseline CD4. This simple approximation considered here may provide a rough guidance and
an
point to further research even though the true association described above may be more complicated. The analysis results suggest that for patients with viral load rebound, the CD4
M
process at time t has a significantly positive effect on the second-phase viral decay rate; this finding confirms that the CD4 covariate is a more significant predictor on viral decay rate during
ce pt ed
late stage. More rapid decrease in CD4 cell count may be associated with slower viral decay in late stage, which, in turn, may result in a viral load rebound. In order to investigate how the measurement errors in CD4 and missing data in viral load contribute to modeling results, we further compare two methods for estimation based on Model ST: the proposed joint modeling approach (JM) and the ‘naive’ method (NM) where the raw
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
varying CD4 values (due to significant estimate of β 6 ), this suggests that the viral load change
(observed) CD4 values
zij
, rather than the true (unobservable) CD4 values
zij*
, are substituted in
the mixture model and missing data in viral load are ignored. It can be seen from Tables 1 and 2 that there are important differences in the estimates for the parameters β5 and β6, which are directly associated with whether or not ignoring potential CD4 measurement errors for inference.
28
The NM may produce biased estimates and substantially underestimate the covariate CD4 effect
(NM : βˆ6 = 0.02 vs. JM : βˆ6 = 9.77) . The estimated SD for the CD4 effect ( β ) using JM is at 6 least ten times larger than that using NM, probably because JM incorporates the variation from
cr ip
t
fitting the CD4 process. The difference of estimates between JM and NM, due to whether or not potential CD4 measurement errors and missing data in viral load are ignored for inference,
us
analysis. We also obtained estimated DIC values with 11368.9 (JM) and 12017.4 (NM) based on Model I, respectively, suggesting that JM performs better in comparison with NM. Thus, it is
an
important to take the CD4 measurement errors and the missing data in viral load into account
M
when collected data are “imperfectly” measured.
We further investigated a commonly used NLME model with ST distribution (Model C, ignoring
ce pt ed
data feature of heterogeneous population), where the mean function is specified by (9) alone, and compared it with the mixture of NLME model (Model ST) to explore how heterogeneous feature influences modeling results. We found that the important differences in the estimates for the parameters ( β 2 , β 3 ), which are associated with the first-phase viral decay rate, and for the parameters ( β 5 , β 6 ), which are associated with the second-phase viral decay rate. Based on the estimates of β 3 (Model ST:
βˆ3 = 4.62 vs. Model C: βˆ3 = −6.83 ), which is directly associated
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
indicates that CD4 measurement errors and missing data in viral load cannot be ignored in the
with effect of baseline CD4 covariate, and β6 (Model ST:
βˆ6 = 9.77 vs. Model C: βˆ6 = −13.5 ),
which is directly associated with effect of time-varying CD4 covariate, Models ST and C provide completely opposite relation between viral decay rates and covariates.
29
The basic tool for investigating model uncertainty is sensitivity analysis. That is, we simply make reasonable modifications to the assumptions in question, recompute the posterior quantities of interest, and see whether they have changed in a way that significantly affects the
cr ip
t
interpretations or conclusions. If the results are robust against the suspected assumptions, we can report the results with confidence. However, if the results are sensitive to the assumptions, we can choose to communicate the sensitivity results and interpret the results with caution (Gelman
us
parameter estimates on the prior distributions and missing data models. First, we implemented
an
the MCMC sampling scheme and monitored several independent MCMC runs, starting from different super-parameter values in IG (⋅, ⋅) prior distribution based on Model ST. The results
M
follow the same patterns as those presented in this paper. The conclusions of our analysis remain unchanged (data not shown due to space limitations). Second, it is important to check sensitivity
ce pt ed
of parameter estimates to various plausible missing data models. Subject-area knowledge may help us to determine alternative missing data models. It is conceivable that missing values may be related to current and previous viral load and CD4 measurements and/or missingness may link to the random-effects that characterize the true response processes in practice. Note that we should avoid building a too complicated missing data model since the parameters may become non-identifiable. Thus, we consider the mixture joint model with the ST distribution for viral
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
et al., 2003). Thus, we performed the sensitivity analysis to examine the dependence of
load response yij and CD4 covariate zij in conjunction with the following plausible missing data
models (MDM) for sensitivity analysis. • MDM I: logit [ P ( rij = 1| η )] = η0 + η1 zij + η 2 yij ;
30
• MDM II: logit [ P ( rij = 1| η )] = η0 + η1ri , j −1 + η 2 ai1 + η3 ai 2 + η 4bi 2 + η5bi 4 ; • MDM III: logit [ P ( rij = 1| η )] = η0 + η1 yi , j −1 + η 2 zij + η3 yij .
cr ip
t
MDM I is one in (11) discussed in Section 2.3. MDM II is given by inducing combination of the previous missing status and random-effects from the mixture joint models. Note that the random-
us
intercept and slope), so they are included; although ai3 in the MLE model (2) quantifies variation of individual CD4 trajectories through a quadratic form, which seems not highly predictive for
an
missing value, it is excluded here to reduce number of parameters. The random-effects bi2 and bi4 represent individual variations in the first- and second-phase viral decay rates, respectively, so
M
they may be related to missing values. While bi1 and bi3 represent variations in the baseline viral loads, they may not appear to be highly related to missing data, so they are excluded from the
ce pt ed
model to reduce the number of parameters. MDM III incorporates the previous and current viral load measurements being missing as well as current CD4 value; By comparing the results based on the MDMs I, II and III presented in Tables 1 and 2, it is seen that all of three models have very similar posterior estimates of population parameters, suggesting that the estimation of population parameters is robust against the MDMs and, thus,
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
effects ai1 and ai2 capture the main features of individual CD4 trajectories (such as initial
the parameter estimates are reliable. In addition, we found that adding previous viral load missingness (MDM III) does not improve the DIC over MDM I; this may be explained by the fact that the current observations may be most predictive for missing data since measurements are relatively sparse in this study and early observed values may not have much influence on patients’ missing values. MDM II (DIC=11437.5), which is induced by combination of the
31
previous missing status and random-effects, gives largest DIC value in comparison with MDM I (DIC=11368.9) and MDM III (DIC=11388.3). These findings are also confirmed by the results of the EPD and SSR values.
cr ip
t
5. Concluding discussion
For longitudinal biomarkers, viral load and CD4 count, that have been used as outcome measures
us
of HIV infection and to assess risk of disease progression to AIDS, we have developed a
an
Bayesian approach to the MNLMEJ models that use longitudinal viral load data with features of heterogeneous population, skewness and missingness in the presence of mismeasured CD4
M
covariate to provide estimates of both model parameters and class membership probabilities at individual and population levels. This kind of mixture modeling approach is important in many
ce pt ed
biostatistical application areas, allowing accurate inference of parameters while adjusting for heterogeneity, non-normality and incompleteness in the longitudinal data. Although this paper is motivated by AIDS clinical study, the basic concepts of the newly-developed MNLMEJ models have generally broader applications whenever the relevant technical specifications are met and longitudinal measurements are assumed to arise from two or more identifiable subclasses within a population.
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
to evaluate treatment effects of ARV therapies in AIDS clinical trials, to understand pathogenesis
This article have considered two mixture modeling methods (NM and JM) to compare potential models with different distribution specifications and various scenarios. In particular, (i) we investigated how asymmetric distributions for model error (Models ST and SN) contribute to modeling results and parameter estimation in comparison with a symmetric (normal) distribution
32
for model error (Model N). Note that the results from symmetric t model are not presented in this article because the findings are similar to those from Model N. (ii) We compared NM with JM to investigate how the measurement errors in CD4 and missing data in viral load contribute to
cr ip
t
modeling results. (iii) Under the assumption of ST distribution for model error, we further explored how heterogeneous feature influences modeling results by comparing Model ST with a commonly used NLME model (Model C), in which the data feature of heterogeneous population
us
the proposed mixture joint models and methods may have a significant impact on HIV/AIDS
an
research and ARV treatments to AIDS patients and help improve understanding of the pathogenesis of HIV infection and evaluation of ARV treatments.
M
One advantage of mixture modeling approach-based MNLMEJ models proposed in this article is
ce pt ed
its flexibility to study simultaneous impact of various data characteristics (heterogeneity, nonlinearity, skewness and incompleteness). Another advantage of our mixture modeling is not only to provide model parameter estimates, but also model-based probabilistic clustering to obtain class membership probabilities. In the data analysis presented in this paper, the probabilities of belonging to any one of three classes can be obtained at both population and individual levels. This information may help physicians refine general therapeutic strategy and develop individualized treatment. In summary, for the data analysis to demonstrate the proposed the
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
is ignored. The important findings for these scenarios are detailed in Section 4.3. In summary,
MNLMEJ models and associated methodologies, the results indicate that for reliable estimation of HIV dynamic parameters and class membership probabilities, we should simultaneously address asymmetry, heterogeneity, missingness and measurement error in longitudinal data. We conclude that it is important to take the CD4 measurement errors and viral load missing data into
33
account, indicating that JM outperforms NM; it is preferable to offer mixture models with a skew distribution, in particular, when the data exhibit skewness; and it is critical to account for heterogeneous population feature for inference if longitudinal measurements arise from two or
cr ip
t
more identifiable subclasses within a population. For inference of mixture models, parameter (or model) identifiability can be an important but
g k ( ⋅)( j = 1,…, K )
us
As discussed by Pauler and Laird (2000), the mixture components,
, in the
an
finite mixture model may have the same family of densities but differ only in specific values of parameters such as in their means and/or variances, or have completely different functional
M
forms with parameters of different dimensions and meanings across the submodels, which is the case adopted in this paper (see equations (7)–(9) in detail). These mean functions have been well
ce pt ed
developed and have biologically meaningful interpretation based on HIV dynamics (Nowak and May, 2000; Perelson et al., 1997; Wu and Ding, 1999). It is noted that we must ensure each component model to be identifiable to make whole mixture model identifiable. To make (8) and (9) identifiable, we assume
* λ1i > λ2i in (8) and λ1i > λ2ij in (9) as discussed in Appendix B. In
practice, if the models are not identifiable, the MCMC algorithm would diverge quickly. In the application considered in this article, the MCMC algorithm was converged without problems and
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
difficult problem when a large number of model parameters must be estimated simultaneously.
we did not observe potential identifiability problems. Instead of specifying different mean functions for classes, alternatively, we may also specify an universal mean function for all classes, e.g. the mean function (9), and let data themselves determine the number of clusters and/or shapes of trajectories. In that way the identifiability issue may arise. Because of
34
complexity of longitudinal studies, an advantage of choosing each component with different functional forms
gk
,
( j = 1,… , K )
rather than with the same general family in the finite mixture
t
models is that the non-identifiability can be avoided (Pauler and Laird, 2000).
cr ip
The two issues related to this study are noted as follows. First, in this article, we imputed the censored value by a limit of detection (LOD) was imputed by half of LOD for simplification, but
us
multiple imputation. This complicated problem is beyond the focus of this article, but a further
an
study may be warranted. Second, it is our purpose that this article is to demonstrate the proposed mixture joint models and methods with various scenarios for real data analysis in comparison of
M
symmetric distributions with asymmetric distributions for model error, although a limited simulation study may be conducted to evaluate the performance from different model
ce pt ed
specifications and the corresponding methods. However, since this paper investigated many different scenarios-based models and methods with real data analysis, the complex natures considered in this article, especially, skew distributions involved, will pose some challenges for such simulation studies which requires additional efforts. We are actively investigating these important research problems and hope that the relative results could be reported in the near future.
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
is not intended to consider data with LOD problems using more advanced techniques such as
Acknowledgements The authors would like to gratefully acknowledge the Editor and two anonymous referees for their insightful comments and constructive suggestions that led to a marked improvement of the article.
35
References Arellano-Valle RB, Genton, M. (2005). On fundamental skew distributions. Journal of
t
Multivariate Analysis 96: 93–116.
mixed models. Journal of Applied Statistics 34: 663–682.
cr ip
Arellano-Valle RB, Bolfarine H, Lachos, VH. (2007). Bayesian inference for skew-normal linear
us
distribution. Journal of Royal Statistical Society, Series B 61: 579–602.
an
Azzalini A. and Capitanio A. (2003). Distributions generated by perturbation of symmetry with
367–389.
M
emphasis on a multivariate skew-t distributions. Journal of Royal Statistical Society, Series B 65:
Azzalini A, Genton M. (2008). Robust likelihood methods based on the skew-t and related
ce pt ed
distributions. International Statistical Review 76: 106–129. Carroll, R.J., Ruppert, D., Stefanski, L.A., Crainiceanu, C.M. (2006). Measurement Error in Nonlinear Models: A Modern Perspective, 2nd edition. London: Chapman and Hall. Diebolt, J., Robert, C. (1994). Estimation of finite mixture distributions by Bayesian sampling. J. Royal Statist. Soc. Series B 56, 363–375.
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
Azzalini A, Capitanio A. (1999). Statistical applications of the multivariate skew normal
Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B. (2003). Bayesian Data Analysis. London:
Chapman and Hall. Gelman, A., Rubin, D. (1992). Inference from iterative simulation using multiple sequences. Statistical Science 7, 457–511.
36
Hammer S.M., et al. (2002). Dual vs single protease inhibitor therapy following antiretroviral treatment failure: a randomized trial. The Journal of the American Medical Association 288, 169–180.
cr ip
t
Higgins, M., Davidian, M. and Gilitinan, D. M. (2007). A two-step approach to measurement
error in time-dependent covariates in nonlinear mixed-effects models, with application to IGF-I
us
Ho HJ, Lin TI. (2010). Robust linear mixed models using the skew-t distribution with application
an
to schizophrenia data. Biometrical Journal 52, 449–469.
Huang, Y., Liu, D., Wu, H. (2006). Hierarchical Bayesian methods for estimation of parameters
M
in a longitudinal HIV dynamic system. Biometrics 62, 413–423.
Huang, Y., and Dagne, G. (2011). A Bayesian approach to joint mixed-effects models with a
ce pt ed
skew-normal distribution and measurement errors in covariates. Biometrics 67, 260–269. Jara A, Quintana F, Martin ES. (2008). Linear mixed models with skew-elliptical distributions: A Bayesian approach. Computational Statistics and Data Analysis 52, 5033–5045. Lavine, M., West, M. (1992). A Bayesian method for classification and discrimination. Canadian Journal of Statisttics 20, 451–461.
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
pharmacokinetics. Journal of the American Statistical association 92, 436–448.
Liu, W., and Wu, L. (2007). Simultaneous inference for semiparametric nonlinear mixed-effects
models with covariate measurement errors and missing responses. Biometrics 63, 342–350. Lu X. and Huang, Y. (2014). Bayesian analysis of nonlinear mixed-effects mixture models for longitudinal data with heterogeneity and skewness. Statistics in Medicine 33, 2830–2849.
37
Lunn, D. J., Thomas, A., Best, N., Spiegelhalter, D. (2000). WinBUGS - a Bayesian modelling framework: concepts, structure, and extensibility. Statistics and Computing 10, 325–337. Muthen, B., Shedden, K. (1999). Finite mixture modeling with mixture outcomes using the EM
cr ip
t
algorithm. Biometrics 55, 463–469.
Nowak, M.A., May, R.M., (2000). Virus Dynamics: Mathematical Principles of Immunology and
us
Pauler, D., Laird, N.M. (2000). A mixture model for longitudinal data with application to
an
assessment of noncompliance. Biometrics 56, 464–472.
Perelson, A.S., Essunger, P., Cao, Y., Vesanen, M., Hurley, A., Saksela, K., Markowitz, M., Ho,
Nature 387, 188–191.
M
D.D. (1997). Decay characteristics of HIV-1-infected compartments during combination therapy.
ce pt ed
Sahu S.K., Dey D.K., Branco M.D. (2003). A new class of multivariate skew distributions with applications to Bayesian regression models. The Canadian Journal of Statistics 31, 129–150. Spiegelhalter, D. J., Best, N. G., Carlin, B. P., and Van der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society, Series B 64, 583–639. Titterington, D.M., Smith, A.F.M., Makov, U.E. (1985). Statistical Analysis of Finite Mixture
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
Virology. Oxford: Oxford University Press.
Distributions. Chichester, U.K.: John Wiley and Sons. Wu, H., Ding, A.A. (1999). Population HIV-1 dynamics in vivo: Applicable models and inferential tools for virological data from AIDS clinical trials. Biometrics 55, 410–418.
38
Wu, L. (2002). A joint model for nonlinear mixed-effects models with censoring and covariates measured with error, with application to AIDS studies. Journal of the American Statistical Association 97, 955–964.
cr ip
t
Wu, H., Zhang, J.T. (2006). Nonparametric Regression Methods for Longitudinal Data Analysis.
us an M ce pt ed Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
Hoboken, New Jersey: Wiley.
39
Model
I
JM
ST
βl
β2
β3
β4
cr ip
Method
β5
β6
αl
α2
α3
us
MDM
an
PM 10.5 116.5 4.62 6.80 −4.35 9.77 −0.21 1.42 −1.46
M
LCI 10.3 108.0 −1.76 6.48 −5.38 6.86 −0.28 0.81 −2.34
ce pt ed
UCI 10.7 125.7 11.4 7.13 −3.35 12.9 −0.14 2.01 −0.57
SD 0.10 4.542 3.37 0.16 0.53 1.58 0.04 0.31 0.45
I
JM
SN
PM 10.4 107.1 5.38 6.55 −5.60 10.1 −0.20 1.33 −1.28
LCI 10.2 98.31 −1.12 6.21 −6.70 7.15 −0.28 0.72 −2.13
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
t
Table 1: Summary of estimated posterior mean (PM) of population (fixed-effects) parameters, the corresponding standard deviation (SD), and lower limit (LCI) and upper limit (UCI) of 95% equal-tail credible interval (CI) based on “naive” method (NM) and joint modeling approach (JM) for a finite mixture of NLME joint models (Models ST, SN and N) and a commonly used NLME joint model (Model C) in conjunction with non-ignorable missing data models (MDM) and mismeasured covariate model (2).
UCI 10.6 116.5 11.9 6.87 −4.47 13.3 −0.13 1.94 −0.40
SD 0.10 4.602 3.27 0.16 0.56 1.60 0.04 0.32 0.46
40
I
JM
N
PM 10.4 106.2 3.78 6.57 −5.49 10.0 −0.21 1.41 −1.38
cr ip
t
LCI 10.2 97.97 −2.37 6.24 −6.68 7.32 −0.28 0.89 −2.21
UCI 10.6 114.8 9.86 6.90 −4.46 13.8 −0.14 1.96 −0.63
C
us
JM
PM 10.6 134.1 −6.83 7.43 −0.90 −13.5 −0.21 1.18 −0.64
an
I
M
LCI 10.4 124.2 −14.7 7.14 −2.35 −17.0 −0.26 0.88 −0.90
ce pt ed
UCI 10.8 143.9 −1.09 7.70 0.51 −10.2 −0.14 1.43 −0.37
SD 0.10 4.959 4.03 0.14 0.71 1.81 0.03 0.13 0.12
I
NM
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
SD 0.09 4.320 3.13 0.17 0.56 1.64 0.04 0.27 0.39
ST
PM 10.5 112.7 4.61 6.87 −3.96 0.02 –
–
–
LCI 10.3 104.3 −2.15 6.56 −4.95 −0.25 –
–
–
UCI 10.7 121.5 11.5 7.19 −3.02 0.29 –
–
–
41
SD 0.11 4.402 3.46 0.16 0.49 0.14 –
JM
ST
–
PM 10.5 120.3 4.64 6.94 −3.71 9.76 −0.20 1.19 −1.02
cr ip
t
II
–
LCI 10.3 109.0 −1.71 6.52 −5.38 6.76 −0.28 0.67 −2.17
us
JM
ST
PM 10.5 120.1 4.44 6.95 −3.67 9.69 −0.20 1.17 −0.98
M
III
an
SD 0.11 6.593 4.16 0.25 1.03 1.81 0.04 0.40 0.68
ce pt ed
LCI 10.3 108.7 −1.69 6.51 −5.25 6.90 −0.27 0.65 −2.30
UCI 10.7 133.6 10.1 7.45 −1.77 12.3 −0.12 1.98 −0.27
SD 0.11 6.704 4.72 0.26 0.94 1.56 0.04 0.33 0.57
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
UCI 10.8 134.2 10.3 7.43 −1.51 13.5 −0.13 1.98 −0.22
42
JM
ST
σ 22
δ
PM
0.73
0.08
0.50
LCI
0.68
I
0.78
ce pt ed
UCI
JM
SN
ν
cr ip
I
σ 12
DIC
EPD
SSR
11368.9
0.310
439.2
11572.8
0.323
458.7
us
Model
3.77
an
Method
0.06
0.39
M
MDM
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
t
Table 2: Summary of estimated posterior mean (PM) for parameters of scale, skewness, and degree of freedom, and corresponding standard deviation (SD) and lower limit (LCI) and upper limit (UCI) of 95% equal-tail credible interval (CI) as well as DIC, EPD and SSR values based on “naive” method (NM) and joint modeling approach (JM) for a finite mixture of NLME joint models (Models ST, SN and N) and a commonly used NLME joint model (Model C) in conjunction with non-ignorable missing data models (MDM) and mismeasured covariate model (2).
3.11
0.12
0.58
4.64
SD
0.02
0.02
0.05
0.39
PM
0.73
0.16
0.62
–
LCI
0.68
0.12
0.51
–
UCI
0.78
0.21
0.72
–
43
JM
N
0.03
0.02
0.05
–
PM
0.73
0.31
–
–
LCI
0.68
0.29
–
–
UCI
0.78
0.33
–
–
SD
0.02
0.01
–
PM
0.78
12699.1
0.617
875.0
C
I
0.73
ce pt ed
LCI
NM
ST
cr ip
us –
an
JM
0.07
0.43
M
I
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
t
I
SD
3.29
0.05
0.34
3.01
UCI
0.83
0.10
0.51
3.85
SD
0.02
0.02
0.04
0.23
PM
–
0.09
0.50
3.75
LCI
–
0.07
0.39
3.10
44
13767.0
0.320
462.4
12017.4
0.353
499.1
–
0.13
0.59
4.66
SD
–
0.02
0.05
0.40
PM
0.75
0.08
0.49
3.65
LCI
0.69
0.05
0.38
3.04
UCI
0.81
0.11
0.58
SD
0.03
ST
PM
0.323
449.2
0.318
443.2
us
11437.5
4.60
an 0.02
0.05
M
JM
ST
0.75
ce pt ed
III
JM
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
II
cr ip
t
UCI
0.42
0.08
0.49
3.64
LCI
0.69
0.05
0.38
3.04
UCI
0.81
0.11
0.58
4.62
SD
0.03
0.01
0.05
0.41
45
11388.3
pi1
pi2
1
31
81%
17%
1
105
57%
2
29
2
132
3
us
an
38%
2%
5%
8%
0%
87 %
13%
33
0%
13%
87%
99
0%
0%
100%
M
92 %
ce pt ed
3
0%
pi3
t
Patient ID
cr ip
Class
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
Table 3: Individual posterior probabilities of belonging to 3 trajectory classes for six representative patients.
46
cr ip us an M ce pt ed Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
t
Figure 1: (a) Profile of viral load (response) in log10 scale for six representative patients. Trajectory class 1: decrease rapidly and constantly in a short-term period (solid lines, ID: 31, 105); class 2: decrease at the beginning and then maintain stable at a low level (dashed lines, ID: 29, 132) and class 3: decrease at the beginning, but rebound later (dotted line, ID: 33, 99). (b) Histogram of repeated measurements of viral load in log10 scale from an AIDS clinical trial study.
47
cr ip us an M ce pt ed Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
t
Figure 2: Individual fitted curves of viral load for six representative patients based on Model ST (dashed line), SN (dotted line) and N (solid line). Patients 31 and 105 are from class 1 with probabilities 91% and 58%; patients 29 and 132 belong to class 2 with probabilities 92% and 87%; and patients 33 and 99 are from class 3 with probabilities 88% and 100%. The observed values are indicated by circle “o”.
48
cr ip us an M ce pt ed Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
t
Figure 3: Posterior probabilities of belonging to 3 trajectory classes for the 20 individuals (from the 21st to the 40th patients).
49
Supplementary Materials: Appendices
t
Appendix A. Multivariate skew distributions
cr ip
Different versions of the multivariate skew distributions have been proposed and used in the
literature (Arellano-Valle and Genton, 2005; Arellano-Valle et al., 2007; Azzalini and Capitanio,
us
2008; Sahu et al., 2003). A new class of distributions by introducing skewness in multivariate
an
elliptically distributions were developed Arellano-Valle and Genton (2005); Sahu et al. (2003). The class, which is obtained by using transformation and conditioning, contains many standard
M
families including the multivariate skew-normal (SN) and skew-i (ST) distributions as special cases. A k-dimensional random vector Y follows a k-variate skew-elliptical (SE) distribution if its
ce pt ed
probability density function (pdf) is given by
(
f y µ , Σ, ∆; mν(
k)
) = 2 f ( y µ, A; m ) P (V > 0) , (k )
k
ν
2 where A = Σ + ∆ , µ is a location parameter vector, Σ is a k × k positive (diagonal) covariance
matrix,
∆ = diag (δ1 , δ 2 ,…, δ k )
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
1999, 2003; Azzalini and Genton, 2008; Ho and Lin, 2010; Huang and Dagne, 2011; J ara eta al.,
δ = ( δ 1 , δ 2 , … , δ k ) ;V T
follows the elliptical distribution mν(
the density generator function such that
is a k × k skewness matrix with the skewness parameter vector
f 0∞ r k 2−1mν ( u ) dr
k)
(u ) =
Γ ( k 2)
π
k 2
(
El ∆ A−1 ( y − µ ) , I k − ∆A−1∆ ; mν(
mν ( u ) f r mν ( u ) dr ∞ k 2 −1 0
exists. The function
50
mν ( u )
, with
mν ( u )
k)
) and
being a function
provides the kernel of the original
elliptical density and may depend on the parameterν . This SE distribution is denoted by
(
SE µ , Σ, ∆ ; mν(
k)
) . Two examples of m ( u ) , leading to important special cases used throughout mν ( u ) = exp ( − u 2 )
and
mν ( u ) = (1 + u ν )
− (ν + k ) 2
, where ν > 0 . These two
t
the paper, are
ν
cr ip
expressions lead to the multivariate SN and ST distributions, respectively. In the latter case, ν
us
As we know, a normal distribution is a special case of an SN distribution when the skewness parameter is zero, and the ST distribution reduces to the SN distribution when degrees of
an
freedom are large. For completeness, this Appendix briefly summarizes the multivariate ST distribution introduced by Sahu et al. (2003) to be suitable for a Bayesian inference since it is
M
built using the conditional method; see Azzalini and Capitanio (2003) and Sahu et al. (2003) for detailed discussion on properties of ST distributions. Assume that a k-dimensional random vector
ce pt ed
Y follows a k-variate ST distribution with location vector µ , k × k positive (diagonal)
∆ = diag (δ1 , δ 2 ,…, δ k ) and the degrees of freedom covariance matrix Σ, k × k skewness matrix
ν . Then, its probability density function (pdf) is given by f ( y µ , Σ, ∆ ,ν ) = 2 k tk ,ν ( y µ , A ) P (V > 0 ) , (A.2)
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
corresponds to the degrees of freedom parameter.
we denote the k-variate t distribution with parameters µ , A and degrees of freedom ν by
tk ,ν ( µ , A )
tk ,ν + k
and the corresponding pdf by
. We denote this distribution by
tk ,ν ( y µ , A )
STk ,ν ( µ , Σ, ∆ )
equation (A.2) simplifies to
51
henceforth, V follows the t distribution
2 ∆ = δ Ik . In particular, when Σ = σ I k and ,
)
where
−1 2 T 2 2 −1 y y + + − − µ µ ν σ δ ( ) ( ) ) δ ( y − µ) ( 2 2 ν +k σ σ + δ
Tk ,ν + k ( ⋅)
,
−(ν + k ) 2
cr ip
×Tk ,ν + k
)
T Γ ( (ν + k ) 2 ) ( y − µ ) ( y − µ ) 1+ k 2 ν (σ 2 + δ 2 ) Γ (ν 2 )(νπ )
t
(
f y µ , σ 2 , δ ,ν = 2k (σ 2 + δ
2 −k 2
denotes the cumulative distribution function (cdf) of
tk ,ν +k ( 0, I k )
. However,
us
densities. Here Y is dependent but uncorrelated. It can be shown that the mean and covariance STk ,ν ( µ , σ 2 I k , ∆ )
are given by
an
matrix of the ST distribution
E (Y ) = µ + (ν π )
Γ ( (ν − 1) 2 )
M
12
Γ (ν 2 )
ν Γ {(ν − 1) 2} 2 − cov (Y ) = σ I k + ∆ ∆ . ν − 2 π Γ (ν 2 ) In
ν
ce pt ed 2
order
to
2
have
µ = − (ν π )
12
parameter
a
zero
Γ ( (ν − 1) 2 ) Γ (ν 2 )
mean
δ,
2
vector,
(A.3)
we
should
location
STk ,v ( µ , Σ, ∆ )
, it can be
.
Following Lemma 1 of Azzalini and Capitanio (2003), if Y follows represented by
Y = µ + u −1/ 2 X
the
assume
δ
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
unlike the SN distribution, the ST density can not be written as the product of univariate ST
(A.4)
52
where u follows a Gamma distribution
Γ (ν / 2,ν / 2 )
a k-dimensional SN distribution, denoted by
, which is independent of X , and X follows
SN k (0, Σ, ∆ ) . It follows from (A.4) that
t
Y | u ~ SN k ( µ , u −1 ∑ , u −1/ 2 ∆ ) . Following the results, provided by Arellano-Valle et al. (2007) and
(A.5)
where X0 and X1 are two independent
N k (0, I k )
random vectors. Note that the expression (A.5)
an
Y = µ + u −1/2 ∆ X 0 + u −1/ 2 ∑1/ 2 X1 ,
Let
w = u −1/ 2 X 0
M
provides a convenient device for random number generation and for purpose of implementation. ; then w, conditional on u, follows a k-dimensional normal distribution
ce pt ed
N k (0, u −1 I k ) truncated in the space w > 0 (i.e., the half-normal distribution). Thus, following
studies by Sahu et al. (2003), Ho and Lin (2010) and Jara eta al. (2008), a hierarchical representation based on (A.5) is given by
Y | w , u ~ N k ( µ + ∆w , u −1 ∑ ), w | u ~ N k (0, u −1 I K ) I ( w > 0), u ~ Γ (v / 2, v / 2),
(A.6)
Note that the ST distribution presented in (A.6) can be reverted to the following three special
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
stochastic representation as follows.
us
cr ip
Arellano-Valle and Genton (2005), the SN distribution of Y conditional on u has a convenient
cases: (i) as v → ∞ and u → 1 with probability 1 (i.e., the last distributional specification is omitted), then the hierarchical expression (A.6) becomes an SN distribution
SN k ( µ , Σ, ∆ )
; (ii) as
∆ = 0 , then the hierarchical expression (A.6) becomes a multivariate t-distribution; (iii)
53
as v → ∞ , u → 1 with probability 1 and ∆ = 0 , then the hierarchical expression (A.6) is reverted to a multivariate normal distribution.
cr ip
t
Appendix B. HIV dynamic models Viral dynamic models can be formulated through a system of ordinary differential equations (ODE) (Huang et al., 2006; Nowak and May, 2000; Perelson et al., 1997; Wu and Ding, 1999).
us
(B.1)
V (t ) = log10 (e p1 − λ1t + e p2 − λ2t )
(B.2)
M
V (t ) = log10 (e p1 − λ1t )
an
solution, which can be used to capture viral load responses, have been proposed as follows.
λ1 and λ2 are called the first-
ce pt ed
where V(t) is the log10 scaled plasma HIV-1 RNA levels at time t.
and second-phase viral decay rates, which may represent the minimum turnover rate of productively infected cells and that of latently or long-lived infected cells, respectively (Perelson p1 p1 p2 et al., 1997). The parameters p1 and p2 are macro-parameters; e and e + e are the baseline
viral load at time t = 0 in one- and two-compartment models, respectively. It is generally assumed that
λ1 > λ2 , which assures that the model is identifiable and is appropriate for empirical
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
Under some reasonable assumptions and simplification, two useful approximations of ODE
studies (Wu and Ding, 1999). Models (B.1) and (B.2) offer almost equal performance to capture the early fast-decaying segment of viral load trajectory (Wu and Ding, 1999), but model (B.2) performs better for long term of viral load trajectory (Huang and Dagne, 2011). It was noted that both models can be only applied to the early segment or longer term of the viral load responses
54
with decreasing trajectory patterns as discussed in Section 1 and shown in Figure 1(a) (two solid and two dashed lines), but the viral load trajectory may have a different shape or rebound in later stage (two dotted lines in Figure 1(a)). It was also noted that for some patients the second viral
t
λ2 , may vary over time because they depend on some phenomenological parameters
cr ip
decay rate
that hide with considerable microscopic complexity and change over time (Nowak and May,
us
rebound (Wu and Ding, 1999), suggesting that variation in the dynamic parameters, particularly
λ2 , may be partially associated with time-varying covariates such as repeated CD4 cell count to
an
capture the viral load change including viral rebound. Thus, the following extended function is
ce pt ed
V (t ) = log10 (e p1 − λ1t + e p2 − λ2 ( t ) t ) (B.3)
M
introduced. (Wu and Zhang, 2006).
where the second-phase decay rate
λ2 (t ) is either a function of time-varying CD4 cell count
(covariate) and/or an unknown smooth function. Intuitively, this model is more flexible because it assumes that the second-phase viral decay rate can vary with time as a result of drug resistance, medication adherence and other relevant clinical factors likely to affect changes in the viral load during the treatment. Therefore, all data obtained during whole study period can be used by
Ac
Downloaded by [University of Sydney] at 08:10 30 April 2015
2000). Negative value of the decay rate may correspond to viral increase and lead to viral
fitting this model. Appendix C. R and WinBUGS program codes for Model ST ## R program code
55
library(arm) library(R2WinBUGS)
cr ip
t
data