Bayesian inferences for beta semiparametric-mixed models to analyze longitudinal neuroimaging data.

662

Biometrical Journal 56 (2014) 4, 662–677

DOI: 10.1002/bimj.201300003

Bayesian inferences for beta semiparametric-mixed models to analyze longitudinal neuroimaging data Xiao-Feng Wang∗,1 and Yingxing Li2 1

2

Department of Quantitative Health Sciences/Biostatistics Section, Cleveland Clinic Lerner Research Institute, Cleveland, OH 44195, USA The Wang Yanan Institute for Studies in Economics, Xiamen University, Xiamen, China

Received 9 January 2013; revised 19 November 2013; accepted 6 December 2013

Diffusion tensor imaging (DTI) is a quantitative magnetic resonance imaging technique that measures the three-dimensional diffusion of water molecules within tissue through the application of multiple diffusion gradients. This technique is rapidly increasing in popularity for studying white matter properties and structural connectivity in the living human brain. One of the major outcomes derived from the DTI process is known as fractional anisotropy, a continuous measure restricted on the interval (0,1). Motivated from a longitudinal DTI study of multiple sclerosis, we use a beta semiparametric-mixed regression model for the neuroimaging data. This work extends the generalized additive model methodology with beta distribution family and random effects. We describe two estimation methods with penalized splines, which are formalized under a Bayesian inferential perspective. The first one is carried out by Markov chain Monte Carlo (MCMC) simulations while the second one uses a relatively new technique called integrated nested Laplace approximation (INLA). Simulations and the neuroimaging data analysis show that the estimates obtained from both approaches are stable and similar, while the INLA method provides an efficient alternative to the computationally expensive MCMC method.

Keywords: Beta distribution; Diffusion tensor imaging; Generalized semiparametric-mixed model; Integrated nested Laplace approximation; Markov chain Monte Carlo; Penalized splines.

Additional supporting information may be found in the online version of this article at the publisher’s web-site

1 Introduction and the motivating example Diffusion tensor imaging (DTI) is a popular quantitative magnetic resonance imaging (MRI) technique that measures the three-dimensional diffusion of water molecules within tissue through the application of multiple diffusion gradients (Basser and Pierpaoli, 1996). It allows one to measure the diffusion of water through tissue and track a fiber, or path, through which information travels in the brain, thus providing a powerful tool to assess the changes of brain white matter tracts due to diseases. This technique is rapidly increasing in popularity for studying white matter properties and structural connectivity in the living human brain. One of the major outcomes derived from the DTI process is known as fractional anisotropy (FA), a measure that describes the degree of anisotropy of a diffusion process in a brain. It is computed from the diffusion tensor’s eigenvalues in the image processing, and quantifies the magnitude of the directional preference. FA has been routinely used as an index of white matter integrity, which is ∗ Corresponding author: e-mail: [email protected]

C 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

Biometrical Journal 56 (2014) 4

663

Figure 1 Examples of lesion ROIs in the MS study. In total, 34 lesion ROIs were identified across four MS patients with natalizumab therapy. sensitive to white matter deterioration in neurodegenerative diseases. Specifically, in the DTI process, FA for each voxel is formulated as 1 (λ1 − λ2 )2 + (λ2 − λ3 )2 + (λ3 − λ1 )2 , (1) FA = 2 λ2 + λ2 + λ2 1

2

3

where λ1 , λ2 , λ3 ∈ R are three eigenvalues of the diffusion tensor (Le Bihan et al., 2001). Our motivating example in this paper is a longitudinal DTI study of multiple sclerosis (MS), a chronic demyelinating disease of the central nervous system. Conventional MRI easily identifies the classic hallmarks of MS in white matter, but has some apparent disadvantages. Areas of active inflammation and tissue injury could be identified by focal lesions, yet the degree of tissue injury cannot be characterized (Fox, 2008; Fox et al., 2011). As an alternative to conventional imaging techniques, DTI quantitatively measures tissue integrity and thus provides information about damage to the parts of nervous system. In a pilot MS study in Ohio, USA, four patients with relapsing-remitting MS were enrolled to undergo serial DTI examinations for 12–18 months. Neuroimages were serially acquired at the time of patient enrollment and after 1, 2, 6, 12, and/or 18 months. Imaging was performed on a 3T Siemens Trio. Diffusion-weighted imaging was performed with 71 noncollinear diffusion-weighting gradients. T1 (or spin-lattice) anatomical imaging was performed for lesion detection and coregistration. In total, 34 regions-of-interest (ROIs) for gadolinium enhancing lesion areas in the brains were identified by an experienced radiologist from T1 postgadolinium images with AFNI imaging software (Cox, 1996) (see Fig. 1). These ROIs were distributed (without overlap) across the four MS patients. For each patient, images from different time points were coregistered using FSL imaging software (Smith et al., 2004). In the data process, DTI-based FA measures were obtained by averaging FA intensities over voxels within each ROI. The time point at which the individual lesion first developed gadolinium enhancement was registered as time zero for each ROI. Negative time points refer to months prior to lesion development and positive time points refer to months after the onset of lesion development for that C 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

www.biometrical-journal.com

664

X.-F. Wang and Y. Li: Bayesian inferences for beta semiparametric-mixed models

Figure 2 The longitudinal DTI-based FA data for the MS study. The x-axis denotes the registered time (in months). The y-axis denotes the DTI-based FA outcome. Repeated measurements for an individual ROI are connected each other. A unique symbol is assigned to each patient. Time zero refers to the lesion development time of ROIs. ROI. Figure 2 shows the DTI-based FA measures of ROIs over the registered time in this longitudinal DTI study. Repeated measurements for an ROI are connected to each other, where a unique symbol and color is assigned to each patient. The goal is to understand the pre- and postlesional tissue characteristics of MS patients through evaluating the FA progress over time. By Eq. (1), measurement of FA is continuous but restrictedly bounded on the interval (0, 1). In this study, FA outcomes are not binomial proportions, because they do not represent the ratio of a count over a total number of Bernoulli trials. In medical neuroimaging literature, inferences for DTI-based FA measures were often based on statistical models with Gaussian error. Jito et al. (2008) compared FA with histological changes in the rat corpus callosum using a standard linear regression model. P´eran et al. (2010) applied a repeated measures multivariate analysis of covariance in the study of magnetic resonance imaging markers of Parkinson’s disease. Fox et al. (2011) modeled FA using a linear mixed-effects model. An analysis based on a Gaussian error model might be feasible for FA measures. However, it does not take the nature of FA’s sample space into account, which could yield fitted values for the responses that exceed their upper and lower bounds. Since the sample space of FA is naturally restricted to (0, 1), modeling FA with beta distributions becomes an appropriate choice (Ferrari and Cribari-Neto, 2004; Wang, 2012; Wang et al., 2014). Let us consider the reparameterized beta distribution to model the continuous data observed on (0, 1). The probability density function of the conventional beta distribution is given by f (y; p, q) = y p−1 (1 − y)q−1 /B(p, q), where p, q > 0, and B(p, q) is the beta function. If we let μ = p/(p + q) and τ = p + q, the density function can be reparameterized as f (y; μ, τ ) =

1 yμτ −1 (1 − y)(1−μ)τ −1 , B(μτ, (1 − μ)τ )

(2)

where μ ∈ (0, 1) and τ > 0. In the following, we denote a random variable Y that follows a beta distribution with the density form of Eq. (2) by Y ∼ beta(μ, τ ). It can be shown that E(Y ) = μ, C 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim



665

and Var(Y ) = μ(1 − μ)/(1 + τ ). The parameter τ can be interpreted as a dispersion parameter, since the variance of the distribution increases as τ decreases. The beta distribution covers a large class of distributions confined to (0, 1), including probability density shapes from right skewed, left skewed to very flat. Ferrari and Cribari-Neto (2004) gave a comprehensive discussion on the beta regression model and addressed the maximum likelihood approach for parameter estimation. Branscum et al. (2007) presented a full Bayesian approach for fitting beta parametric and nonparametric models and applied them to two data examples. In this paper, we use a beta semiparametric-mixed model for the longitudinal neuroimaging data. This model extends the generalized-mixed model methodology with a beta distribution family and nonparametric smooth component. Such an extension is essential to study the nonlinearity of FA and to properly account for the within and between ROI variations. The mean function of the longitudinal FA data is modeled by a covariate-adjusted nonparametric time curve through a link function, where the penalized spline technique is applied for curve smoothing. We present two Bayesian approaches for the model: The first uses Markov chain Monte Carlo (MCMC) simulations while the second one employs a relatively new approximate Bayesian inference technique using integrated nested Laplace approximations (INLA) (Rue et al., 2009). Through extensive simulations, we compare the MCMC and INLA approaches for beta nonparametric/semiparametric mixed models. We show that both approaches provide reliable inferences while the INLA method gives an efficient alternative to the computationally expensive MCMC method. INLA substitutes MCMC simulations with accurate approximations to the posterior marginal distribution in short computational time. Bayesian model selection within the INLA approach is also discussed in the application. The remainder of the paper is organized as follows. Section 2 introduces the Bayesian beta semiparametric-mixed model and then addresses the methods for model fitting using MCMC sampling and INLA, respectively. Section 3 presents simulation studies and discusses the MS neuroimaging data analysis. Finally, we close with discussions in Section 4.

2 Methods 2.1

The beta semiparametric-mixed model

Consider now a general longitudinal data situation, where the observed response outcome yi j for i = 1, . . . , n units (clusters) and j = 1, . . . , mi repeated measures is bounded in (0, 1). The beta semiparametric-mixed model we are concerned with is yi j |μi j , τ ∼ beta(μi j , τ ), logit(μi j ) = xi j β + s(ti j ) + zi j bi ,

(3)

where β = (β0 , β1 , . . . , β p )T is (p + 1) × 1, a vector including an intercept term β0 and fixed effects parameters β1 to β p ; xi j is 1 × (p + 1), a row of a fixed effects model matrix; s(·) is an unknown smooth function of the covariate ti j ; bi = (bi1 , . . . , biq )T is q × 1, a vector of random effects coefficients; and zi j is 1 × q, a row of a random effects model matrix. For the sake of identifiability, we require s(0) = 0. The logit link function is used here (Ferrari and Cribari-Neto, 2004). In a Bayesian modeling framework, all fixed and random effects coefficients are treated as random. The fixed effects here mean that noninformative priors will be assigned to the parameters, while the random effects mean that informative priors will be assigned to the parameters. We use penalized splines to model s(t), as they are powerful in handling longitudinal effects and in fitting generalized nonparametric/semiparametric regression. The basic idea is to approximate the unknown function s(t) by a large number of spline basis functions and to use a penalty term to avoid overfitting or undersmoothing (Crainiceanu et al., 2005; Li and Ruppert, 2008). Note that the penalized smoothing methodology has a mixed model representation (Ruppert et al., 2003; Wang et al., 2009). C 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim


666


Hence s(t) in the model (3) is able to be estimated in the generalized linear-mixed model (GLMM) framework. To be specific, we first express s(t) using the truncated power polynomial splines, s(t, θ) = α1t + · · · + αl t l +

K

dk (t − κk )l+ ,

(4)

k=1

where θ = (α1 , . . . , αl , d1 , . . . , dK )T is the l + K dimensional vector of regression coefficients, and κ1 , . . . , κK are K equally spaced knots in the support of s(t). Note that we do not include an intercept term α0 because we require s(0) = 0. Following Ruppert et al. (2003), the spline coefficients dk ’s are interpreted as deviations from the lth degree polynomial function and we treat them as random coefficients with a mean of 0 and variance of σd2 , where the amount of σd2 serves as the penalty parameter and controls the roughness of the function s(t). If we further assume that the spline coefficients are independent normally distributed, that (d1 , . . . , dK ) ∼ N 0, σd2 IK , with IK being the K × K identity matrix, we link the beta semiparametric regression problem to a GLMM, and could rewrite model (3) as yi j |μi j , τ ∼ beta(μi j , τ ), logit(μi j ) = x˜ i j β˜ + zi j bi + s˜i j d,

(5)

where the vector of the fixed effects β˜ = (β0 , β1 , . . . , β p , α1 , . . . , αl )T is (p + l + 1) × 1, the vector of the random effects bi = (bi1 , . . . , biq )T is q × 1, d = (d1 , . . . , dK )T is K × 1, x˜ i j is 1 × (p + l + 1), z˜ i j is 1 × q, and s˜i j = ((ti j − κ1 )l+ , . . . , (ti j − κK )l+ ) is 1 × K. The GLMM may be fitted using penalized quasi-likelihood (PQL) estimation in a frequentist framework (Breslow and Clayton, 1993). However, it has been shown that the PQL approximation method for a complex GLMM may result in seriously biased estimation (Breslow and Lin, 1995). Pinheiro and Chao (2006) described efficient Laplacian and adaptive Gaussian quadrature algorithms that greatly reduced the computational complexity and the memory usage for approximating the multilevel GLMM likelihood. In our study, the response distribution is not common and the dimension of the unknown parameters is large, therefore, the Bayesian inferential approaches are considered. We specify prior distributions on the model parameters and estimate their marginal posterior distributions to make Bayesian inferences. We now discuss two different estimation techniques using MCMC and INLA, respectively. 2.2

Priors

One needs to specify the priors for both MCMC and INLA approach. It is common to assume that the fixed-effect parameters, β, are a priori independent with prior distribution N(0, λ2 ), where λ2 is very large (e.g. λ2 = 105 ). In Bayesian models, the estimates of the variance components are known to be sensitive to the prior specification. Crainiceanu et al. (2007) discussed the choice of priors of the precision parameters for penalized spline regression. It was shown that giving Gamma priors for all precision parameters was a stable choice due to the conjugate property of the gamma family. In both MCMC and INLA algorithms, all precisions are given a dispersed gamma prior, Gamma(ω, ν), say, ω = ν = 10−5 . The parameterization of such a gamma distribution is chosen so that its mean equals ω/ν = 1 with a large variance of ω/ν 2 = 105 . C 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim



2.3

667

Model fitting and inference using MCMC

Assume that bi | ∼ N(0, ), where the covariance matrix = (φ) depends on the parameters φ. T Let y = (y , . . . , y )T be the vector of observations on unit i, γ = (β˜ , bT , . . . , bT , d , . . . , d )T be i

i1

1

imi

n

1

K

the vector of all unknown parameters in the linear predictor to which we shall assign Gaussian priors, and ζ = (φ, σd2 , τ ) be the vector of all variance parameters to which we shall assign non-Gaussian priors. Under the model (5), the posterior distribution π (γ, ζ|y) is given by

π (γ, ζ|y) ∝ π (γ|ζ)π (ζ)

n

π (yi |γ, ζ)

i=1

⎧ ⎫ mi n ⎨ ⎬ ˜ (φ)π σ 2 π (τ )|G|−1/2 exp − 1 b˜ T G−1 b˜ + log f yi j |x˜ i j β˜ + zi j bi + s˜i j d, τ , ∝ π (β)π d ⎩ 2 ⎭ i=1 j=1

where y = (yT1 , . . . , yTn )T is the vector of conditional independent observations, b˜ = (bT1 , . . . , bTn , d1 , . . . , dK )T , and G is a block diagonal matrix in which the first n diagonal blocks are and the last block is σd2 IK . Bayesian inference is based on the posterior distribution of all parameters. In our case, though, the joint posterior distribution of all parameters π (γ, ζ|y) is analytically intractable, and so are posterior marginal distributions. To this end, Markov chain Monte Carlo methods have been developed that generate a Markov chain through repeated sampling that converges within a burn-in phase to the ˜ ˜ marginal posterior distributions of interest, that is in our case, π (β|y), π (b|y), and π (ζ|y). A basic MCMC algorithm is the Gibbs sampler, which involves repeated sampling of full conditional distri˜ (l ) , b˜ (l ) , y), π (ζ|β˜ (l ) , b˜ (l ) , y), and π (b| ˜ β˜ (l ) , ζ (l ) , y). However, in our case, full conditional butions: π (β|ζ distributions of effect parameters β˜ and b˜ are of an unknown form, because the Gaussian priors are nonconjugate to the beta distribution. Adaptive rejection sampling or slice sampling is feasible here in our models. Adaptive rejection sampling constructs an envelope function of the log of the target density, which is then used in rejection sampling (Zeger and Karim, 1991; Gilks and Wild, 1992; Wild and Gilks, 1993). Slice sampling uses the principle that one can sample from a distribution by sampling uniformly from the region under the plot of its density function (Neal, 2003). The algorithms have been implemented in several software platforms including WinBUGS (Lunn et al., 2000) and/or JAGS (Plummer, 2003). We fit the models using JAGS in this paper, where the work-horse function for the MCMC algorithm is the slice sampling. 2.4

Model fitting and inference using INLA

INLA is a new approximate Bayesian method for parameter estimation in latent Gaussian models (Rue et al., 2009). Several studies have shown the success of INLA in a large class of statistical models ¨ (Fong et al., 2010; Roos and Held, 2011; Martino et al., 2011; Schrodle and Held, 2011a, 2011b; Wang, 2013). Here we briefly review the INLA approach. In the INLA computational approach, of main interest is the marginal posterior distribution π (γv |y), where γv is one element of the vector of all unknown regression parameters, γ. It can be written as π (γv |y) =

π (γv |ζ, y)π (ζ|y)dζ.

C 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

(6)


668


The key of INLA is to construct a nested approximation of Eq. (6). This involves three steps. First, the marginal posterior distribution π (ζ|y) is approximated using π (γ, ζ, y) π˜ (ζ|y) ∝ , π˜ G (γ|ζ, y) γ=γ ∗ (ζ) where π˜ G (γ|ζ, y) is the Gaussian approximation to the full conditional distribution of γ evaluated in the mode γ ∗ (ζ) for a given ζ. Second, compute π˜ (γv |ζ, y), an approximation of the posterior distribution of the parameter γv given the observed data and the hyperparameters. In particular, Rue et al. (2009) suggested three approaches to approximate π (γv |ζ, y): a Gaussian approximation, a full Laplace approximation, and a simplified Laplace approximation. The Gaussian approximation is the simplest method, while the Laplace approximation is the most accurate. Ultimately, the simplified Laplace approximation is less computationally intensive than the full Laplace approximation with only a slight loss of accuracy. Third, INLA combines the previous two steps and uses a numerical integration to reach the goal. The approximate posterior distribution π (γv |y) is obtained by π˜ (γv |y) = π˜ (γv |ζ q , y)π˜ (ζ q |y) q , q

where the sum is over values of ζ q ’s with area weights q . The choice of the integration points can be made using two strategies. See Rue et al. (2009) for more details on both strategies. INLA provides a fast Bayesian inference method using accurate approximation to the marginal posterior distributions for the parameters in the linear predictor and the variance components (Rue et al., 2009). In our study, we use the interface provided by the R package INLA and implement additional functions to define the user-specified latent Gaussian fields and conduct the approximations. No samples of the posterior marginal distributions are drawn using INLA, so it is a computationally convenient alternative to MCMC. In addition, the deviance information criteria (DIC) can be computed in INLA, similar to other Bayesian methods. DIC is defined as ¯ + pD , DIC = D ¯ is the posterior mean of the deviance of the model and pD is the effective number of where D parameters. In a generalized regression model, pD is approximately the trace of the product of the Fisher’s information and the posterior covariance matrix. DIC is thus the sum of a measure of goodness of fit plus a measure of model complexity. The model with the lower DIC provides the better trade-off between fit and model complexity (Spiegelhalter et al., 2002).

3 The analysis of the multiple sclerosis neuroimaging data For the MS neuroimaging study, we have the longitudinal data from an unbalanced factorial design. The FA outcome, yri j , is observed on the ith ROI at the jth time point for the rth subject, where r = 1, . . . , R, i = 1, . . . , nr , and j = 1, . . . , mri . In the application, we have R=4 subjects, 4r=1 nr = 34 ROIs, and mri ∈ {4, 5, 6}. The beta semiparametric-mixed model we applied is: yri j |μi j ∼ beta(μri j , τ ), logit(μri j ) = β0+ βr + s(tri j ) + uri , uri ∼ N 0, σu2 , where s(tri j ) is the population evolution curve as a function of time, tri j , and uri is the ROI random effect used to model a considerable degree of heterogeneity among ROIs. In this formulation, we think of β0 as a grand mean, that is the common mean level of the subjects. The parameters βr denote the unique effect due to subject r = 1, . . . , 4, the deviation from the mean level that is caused by the C 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim



669

subjects. Note that we need to impose additional constraints by setting β1 = 0 and s(0) = 0 in order to avoid unidentifiability. Following the discussion in Section 2, the function s(tri j ) is modeled using the truncated power polynomial spline in Eq. (4). Thus we have yri j |μri j logit(μri j ) dk uri

∼ = ∼ ∼

beta(μri j , τ ), l K q l β0+ βr + q=1 αq tri j + k=1 dk (tri j − κk )+ + uri , 2 N 0, σd , N 0, σu2 .

(7)

We fit the model (7) using both MCMC and INLA. Next, we present a simulation study and then the results of the data analysis. 3.1

Simulations

We illustrate the performance of the proposed methods through simulations in this section. The purpose of the simulation study is to evaluate the adequacy of using MCMC and INLA to fit the models for the FA data, and to compare the performance of these two approaches in estimating nonlinear curves, linear parameters and variance components. Our first simulation example evaluates generalized nonparametric-mixed-effects models with beta distribution. The datasets were simulated from the beta distributions, yi j ∼ beta(μi j , τ ), i = 1, . . . , n, and j = 1, . . . , m, where logit(μi j ) = s(ti j ) + ui , ui ∼ N(0, σu2 ). In these model settings, we do not have the fixed effects in (3), thus the additional constraint that s(0) = 0 in the mixed-effects models is not necessary. We considered the following four nonlinear functions, which are listed in order of complexity: (i) (ii) (iii) (iv)

s(t) = 2.5 exp(t 2 /2) − 2, s(t) = 2 sin(2t + 1) − 1, s(t) = 2 cos(πt/2 + 3) + t 2 , s(t) = 1.5 sin(πt/2 + 3) + 2 exp(−t 2 /2) − 2.5.

Two levels of variance components were considered: (a) τ = 15, σu2 = 0.52 ; (b) τ = 10, σu2 = 0.82 ; and two levels of sample size were evaluated: (1) n = 20, m = 20; (2) n = 50, m = 40. The time points ti j ’s are generated from uniform distribution U (0, 1). We used a quadratic spline with equally spaced knots to model the nonlinear functions. Ruppert (2002) showed that the number of knots K (typically 5–20) was insensitive to the model fitting in penalized spline regression. Following his recommendation, we considered K = 10 to ensure the desired flexibility. The MCMC computations were performed using the R package rjags, an R interface to the “jags” program; the INLA computations were performed using the R package INLA, an R interface to the “inla” C library. All computations were performed on a Linux server with 32 Intel(R) Xeon(R) 2.90 GHz Cores and 250 Gb memory. The results from the MCMC sampling approach were based on 50,000 total iterations, with a 20,000 burn-in-period. Unsurprisingly, INLA provided inference results with very fast computation. In estimating the model (i), our computational time using INLA is on average less than 4 s for the total sample size 400 and less than 36 second for the sample size 2000, while MCMC needs much longer time to obtain sufficiently accurate estimates. Figure 3 presents a typical simulated example for fitting a beta nonparametric-mixed model with MCMC and INLA. In this example, n = 50, m = 40, τ = 10, and σu2 = 0.82 . The true function is Model (iv), a v-shape function. The estimated curves were calculated from the posterior means of the parameters in the penalized spline. In the figure, the estimate using INLA is denoted by the dashed lines, the estimate using MCMC is denoted by the dotted line, and the true function is denoted by the solid line. Both MCMC and INLA methods work quite well even though the underlying model is relatively complex. C 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim


670


Figure 3 A simulated example for estimating the nonlinear curve s(t) in the beta nonparametric mixed models using MCMC and INLA: n = 50, m = 40, τ = 10, and σu = 0.8 for Model (iv). The estimated function using INLA is denoted by the dashed line, the estimated function using MCMC is denoted by the dotted line, and the true function is denoted by the solid line. To assess the performance of these methods, we calculate the mean absolute deviation error (MADE) to the true function, as given by: ˆ = MADE(s)

n m 1 ˆ i j ) − s(ti j ). s(t nm i=1 j=1

We performed 200 replications for each model with different sample sizes and variance components. The medians and inter-quartile ranges (IQR) of the 200 MADEs for the four models are reported in Table 1. Both MCMC and INLA yield very similar performances in terms of MADE. In general, MADE gets smaller as the sample size increases, while it becomes larger as the variance of the data increases. The estimates of the variance components for the generalized nonparametric models were accurate using both methods. The comparison of the two methods in terms of estimating the dispersion parameter τ and the precision parameter 1/σu2 are reported in the supplement of this paper. Our second simulation example evaluates the semiparametric-mixed models. The datasets were simulated from the beta distributions, yri j ∼ beta(μri j , τ ), r = 1, . . . , 4, i = 1, . . . , n, and j = 1, . . . , m, where logit(μri j ) = βr + s(tri j ) + uri , uri ∼ N(0, σu2 ). We set β1 = 0, β2 = 1, β3 = 2, β4 = 3, and the smooth function as (v)

s(t) = −3t 2 + 1.5t.

In order to judge the performance of our application, we consider sample sizes n = 20, m = 10 (the total sample size is comparable to our real study) as well as sample sizes n = 40, m = 20. Figure 4 displays the curve estimates from a simulated dataset for Model (v) with n = 20, m = 10, τ = 15, and σu2 = 0.52 . In this example, MCMC and INLA give almost identical estimates. They accurately recovered the underlying target function. Table 2 reports the summarized results for MADE of s(t) from 200 replicates of simulations. Table 3 lists the summarized results for the estimation of fixed effects parameters and variance components. We report the biases and SDs of the 200 parameter estimates, for each simulation case. In as well as the means of the 200 estimated standard errors (denoted by s.e.) C 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim



671

Table 1 The comparison of the MCMC and INLA methods in terms of estimating the nonlinear curve s(t) in the beta nonparametric-mixed models from 200 replicates of simulations. Entries without parentheses are the medians of MADEs and entries with parentheses are the IQRs of MADEs Variance level

Model

Medians and IQRs of MADEs n = 20, m = 20

n = 40, m = 50

MCMC

INLA

MCMC

INLA

(a)

(i) (ii) (iii) (iv)

0.080 (0.067) 0.095 (0.095) 0.078 (0.082) 0.087 (0.077)

0.078 (0.067) 0.094 (0.093) 0.077 (0.085) 0.085 (0.080)

0.046 (0.045) 0.051 (0.056) 0.048 (0.049) 0.047 (0.050)

0.045 (0.047) 0.050 (0.058) 0.046 (0.052) 0.047 (0.052)

(b)

(i) (ii) (iii) (iv)

0.120 (0.145) 0.121 (0.128) 0.118 (0.149) 0.129 (0.153)

0.119 (0.141) 0.116 (0.129) 0.115 (0.142) 0.127 (0.154)

0.086 (0.114) 0.087 (0.100) 0.075 (0.087) 0.089 (0.094)

0.085 (0.104) 0.080 (0.107) 0.073 (0.080) 0.084 (0.098)

Figure 4 A simulated example for estimating the nonlinear curve s(t) in the beta semiparametricmixed models using MCMC and INLA: n = 20, m = 10, τ = 15, and σu2 = 0.52 . The estimated function using INLA is denoted by the dashed line, the estimated function using MCMC is denoted by the dotted line, and the true function is denoted by the solid line.

general, both methods provide reasonable estimates for the variance parameters. The estimates become are very close to the SD’s, more accurate as the sample size increases. In the estimation of τ , the s.e.’s which indicates that, for either method, the estimates of the standard errors for τ are reasonably good. of σu2 are In the estimation of σu2 , both methods slightly underestimate its standard errors. The s.e.’s smaller than the corresponding SD’s. C 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim


672


Table 2 The comparison of the MCMC and INLA methods in terms of estimating the nonlinear curve s(t) in the beta semiparametric-mixed models from 200 replicates of simulations. Entries without parentheses are the medians of MADEs and entries with parentheses are the IQRs of MADEs Variance level

Medians and IQRs of MADEs n = 20, m = 10

(a) (b)

n = 40, m = 20

MCMC

INLA

MCMC

INLA

0.107 (0.117) 0.141 (0.164)

0.109 (0.108) 0.114 (0.145)

0.059 (0.067) 0.068 (0.085)

0.054 (0.064) 0.055 (0.070)

Table 3 The comparison of the MCMC and INLA methods in terms of estimating the fixed-effects parameters and precision parameters in the beta semiparametric-mixed models from 200 replicates of simulations Variance level

Sample size

Parameters MCMC

n = 20, m = 10 β1 β2 β3 τ σu−2 n = 40, m = 20 β1 β2 β3 τ σu−2 n = 20, m = 10 β1 β2 β3 τ σu−2 n = 40, m = 20 β1 β2 β3 τ σu−2

(a)

(b)

3.2

INLA

Bias

SD

s.e.

Bias

SD

s.e.

0.001 0.011 0.001 0.212 0.285 −0.018 0.007 0.019 −0.089 0.328 −0.041 0.036 −0.029 0.317 0.236 0.036 0.016 0.061 0.072 0.119

0.306 0.334 0.334 1.594 2.857 0.224 0.225 0.228 0.826 1.462 0.535 0.592 0.576 1.126 0.891 0.364 0.386 0.349 0.519 0.490

0.341 0.348 0.356 1.631 1.839 0.223 0.226 0.230 0.778 1.117 0.520 0.513 0.531 1.114 0.719 0.343 0.344 0.349 0.534 0.417

0.004 0.003 −0.036 −0.619 0.867 −0.026 −0.007 −0.025 −0.818 0.560 −0.041 −0.012 −0.092 −0.160 0.419 0.022 −0.016 0.022 −0.494 0.236

0.301 0.324 0.320 1.478 3.528 0.218 0.218 0.218 0.798 1.554 0.517 0.586 0.560 1.054 0.903 0.350 0.373 0.336 0.451 0.480

0.327 0.331 0.340 1.498 2.066 0.221 0.223 0.226 0.717 1.185 0.494 0.496 0.503 1.019 0.790 0.347 0.349 0.350 0.482 0.442

Results

In this subsection, we present the analysis results for the longitudinal neuroimaging data. The beta semiparametric-mixed model (7) was built for the dataset, where the quadratic splines with K = 10 equally spaced knots were applied to model the smooth function s(t). Estimation of the model was implemented using both the INLA and MCMC methods. The convergence of MCMC chains was assessed using graphical diagnostics, which indicated that the Markov chains converged to their stationary distributions. The estimates with their 95% credible bands for the nonparametric time effect C 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim



673

Figure 5 Estimates with 95% credible bands for the nonparametric time effect in the longitudinal neuroimaging study. The mean estimate and its credible bands using INLA are denoted by solid lines; the mean estimate and its credible bands using MCMC are denoted by dashed lines. are displayed in Fig. 5. The estimated function and its credible bands using INLA are denoted by the solid lines; the estimated function and its credible bands using MCMC are denoted by the dashed lines. The credible bands are the pointwise credible intervals for each s(tri j ) that were calculated from the 2.5% and 97.5% quantiles of the posterior distributions for the parameters in the penalized spline model (4). The estimates from these two methods are very close. Figure 5 indicates that the function of the time effect does not correspond to a horizontal line. The time trend s(t) starts to decline approximately 4 months prior to lesion formation, and then progressively increases after lesion formation. It becomes stable approximately 7 months after lesion formation, however it could not recover back to the level of 4 months prior to lesion formation. This nonlinear time evolution of DTI-based FA suggests that physiological changes in the brain tissue of MS patients starts prior to the development of lesions. FA reaches its minimum around the time of lesion formation. The finding suggests that FA could be potentially a biomarker for early detection of new lesion development in MS patients. The estimated-fixed effects and variance components with their posterior standard deviations and 95% credible intervals are listed in Table 4. Both the INLA and MCMC methods give concordant estimates. All estimated coefficients are significant in the analysis. The heterogeneity of subjects is captured by modeling the effect of subject class in the linear predictor function. As we have addressed in Section 2, DIC can be used to compare models in a Bayesian framework with respect to fit and complexity. In addition to fitting the full model (3) (denoted by M1), we fit three reduced models using INLA: a model without the random effect ui (denoted by M2), a model assuming a linear time trend (i.e. s(ti j ) = α1ti j ; denoted by M3), and a model assuming no time trend (i.e. s(ti j ) = 0; denoted by M4). Table 5 shows the estimated DIC for each model. It is noticeable that M2 has the largest DIC, −428.23. This indicates that it is important to take into account the random effect of ROIs in this case study. The DIC of M3, −655.75, is smaller than that of M2, but the linear time trend appears to be insufficient to model the data. Since the full model we proposed has the smallest DIC, it is the best choice here. In this application, the beta regression model appears to be a good choice due to its nice interpretation for the bounded outcomes. It allows the linear predictor to be related to the bounded response variable via a logit link function and allows the magnitude of the variance of each measurement to be a function of its predicted value. C 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim


674


Table 4 Estimated fixed effects and variance components with their posterior standard deviations and 95% credible intervals in the longitudinal neuroimaging study Parameter

Intercept Subject2 Subject3 Subject4 τ σu−2 σd−2

INLA

MCMC

Mean

SD

2.5% Quantile

97.5% Quantile

Mean

SD

2.5% Quantile

97.5% Quantile

−1.102 0.316 0.676 0.228 182.972 8.514 684.440

0.146 0.180 0.254 0.177 22.919 2.279 574.387

−1.393 −0.038 0.176 −0.121 144.075 4.642 111.101

−0.815 0.671 1.177 0.576 233.869 13.505 2194.701

−1.106 0.322 0.669 0.232 186.620 8.478 663.425

0.147 0.180 0.255 0.178 22.245 2.273 548.249

−1.397 −0.034 0.166 −0.119 146.234 4.802 123.029

−0.816 0.678 1.173 0.582 233.543 13.653 2110.270

Table 5 DICs for different regression models using INLA: M1 denotes the full model, M2 denotes the model without the random effect, M3 denotes the model assuming a linear time trend, and M4 denotes the model assuming no time trend Model

¯ D

pD

DIC

M1 M2 M3 M4

−700.72 −418.65 −621.79 −570.74

39.86 9.58 33.96 32.52

−740.58 −428.23 −655.75 −603.26

4 Closing remarks DTI is a relatively new neuroimaging platform to study the underlying physiological processes in brain research. The major outcome from the DTI process, FA, is a continuous measure restrictedly defined on the interval (0, 1). We propose unified Bayesian inference approaches via MCMC and INLA for the beta semiparametric-mixed model to analyze the neuroimaging data of special nature. The penalized spline methodology provides a flexible tool to model the nonlinear pattern for nonGaussian correlated responses. Our neuroimaging case study shows the effectiveness of our model in accounting for the nonlinear time trend and the random effects of ROIs. In the model we considered, the smooth component was expressed using truncated polynomial splines. Other splines, such as B-splines, radial splines, and O’Sullivan splines could be used as alternatives in the same modeling scenario. Through extensive simulations and the case study, we demonstrate that the INLA method provides an efficient alternative to the computationally expensive MCMC method in generalized semiparametricmixed models for the responses observed on (0, 1). The DIC, originally developed when MCMC draws were available, can be also easily obtained in this case. The use of INLA provides a powerful and speedy inferential tool, especially when the MCMC approach might be infeasible to apply. In the specification of priors for the MCMC and INLA, we use the univariate normal distribution for fixed-effect parameters and the univariate gamma distribution for precision parameters in order to keep the simplicity of the model. It appears that the choices are reasonable to obtain reasonable converging chains and are insensitive to our posterior estimates. In practice, one could consider informative priors when MCMC converges poorly. Branscum et al. (2007) suggested a method for C 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim



675

constructing informative priors of Beta regression, which could be applicable to our semiparametric model. Our current data are from a pilot DTI study of MS. The sample size is relatively small. Only a few FA trajectories cover the whole time range, and the longitudinal design is unbalanced. Although our preliminary findings are interesting, more data are needed to substantiate them and test for more flexible models. In the present modeling, we ignore the temporal correlation over repeated measures, and we only consider one smooth term in the linear predictor. Our model (3) can be naturally extended to more complex beta additive-mixed models. However, when multiple smoothing functions exist in the highly correlated linear predictors, MCMC or INLA could fail to converge. Developing more effective algorithms for generalized additive mixed models with high-dimensional variables remains an open problem left for future research. Acknowledgments We are grateful to the Associate Editor and two reviewers for their valuable suggestions which substantially improved the paper. The research of Xiao-Feng Wang is supported in part by NIH Grant UL1 RR024989. The research of Yingxing Li is partially supported by the NSF of China (no. 11201390) and NSF of Fujian (no. 2013J01024).

Conflict of interest The authors have declared no conflict of interest.

Appendix We present here the key computer code to call MCMC and INLA methods for the beta semiparametric-mixed model. Simulation examples and code files can be obtained from the journal’s webpage or http://filer.case.edu/xxw17/Software/BSMM/.

r

BUGS model definition to fit the beta semiparametric-mixed model. # ‘‘X’’ and ‘‘Z’’ are the design matrices of truncated power polynomial spline # ‘‘degree’’ is the degree of the truncated power polynomial spline # ‘‘class’’ is the class variable of patients; ‘‘ID’’ is the class variable of ROIs model{ for(i in 1:m) { alpha_sub[i] ˜dnorm(0,taualpha) } for(j in 1:degree) { beta[j] ˜dnorm(0,.001) } for(j in 1:Nknots) { b[j] ˜dnorm(0,taub) } # prior alpha[1]

STGP: Spatio-temporal Gaussian process models for longitudinal neuroimaging data.

Bayesian multi-task learning for decoding multi-subject neuroimaging data.

Bayesian Nonparametric Longitudinal Data Analysis.

Bayesian Models for fMRI Data Analysis.

Bayesian Transformation Models for Multivariate Survival Data.

Bayesian analysis of nonlinear mixed-effects mixture models for longitudinal data with heterogeneity and skewness.

Using hierarchical Bayesian binary probit models to analyze crash injury severity on high speed facilities with real-time traffic data.

Bayesian multivariate augmented Beta rectangular regression models for patient-reported outcomes and survival data.

A Bayesian mixture of semiparametric mixed-effects joint models for skewed-longitudinal and time-to-event data.

Bayesian model selection for pathological neuroimaging data applied to white matter lesion segmentation.

Right-sizing statistical models for longitudinal data.

Bayesian inference for longitudinal data with non-parametric treatment effects.

A Sparse Bayesian Learning Algorithm for Longitudinal Image Data.

A Bayesian hierarchical framework for modeling brain connectivity for neuroimaging data.

Bayesian Generalized Low Rank Regression Models for Neuroimaging Phenotypes and Genetic Markers.

SEX, LIES AND SELF-REPORTED COUNTS: BAYESIAN MIXTURE MODELS FOR HEAPING IN LONGITUDINAL COUNT DATA VIA BIRTH-DEATH PROCESSES.

Bayesian inference on mixed-effects varying-coefficient joint models with skew- t distribution for longitudinal data with multiple features.

Integrative Bayesian analysis of neuroimaging-genetic data with application to cocaine dependence.

A Bayesian approach to functional mixed-effects modeling for longitudinal data with binomial outcomes.

Bayesian inferences of latent class models with an unknown number of classes.

A statistically motivated framework for simulation of stochastic data fusion models applied to multimodal neuroimaging.

Covariance Partition Priors: A Bayesian Approach to Simultaneous Covariance Estimation for Longitudinal Data.

Inferences about population dynamics from count data using multistate models: a comparison to capture-recapture approaches.

Bayesian two-part spatial models for semicontinuous data with application to emergency department expenditures.