Research Article Received 6 June 2013,

Accepted 12 November 2013

Published online 15 December 2013 in Wiley Online Library

(wileyonlinelibrary.com) DOI: 10.1002/sim.6065

Latent variable models with nonparametric interaction effects of latent variables Xinyuan Song,* † Zhaohua Lu and Xiangnan Feng Renal disease is one of the common complications of diabetes, especially for Asian populations. Moreover, cardiovascular and renal diseases share common risk factors. This paper proposes a latent variable model with nonparametric interaction effects of latent variables for a study based on the Hong Kong Diabetes Registry, which was established in 1995 as part of a continuous quality improvement program at the Prince of Wales Hospital in Hong Kong. Renal outcome (outcome latent variable) is regressed in terms of cardiac function and diabetes (explanatory latent variables) through an additive structural equation formulated using a series of unspecified univariate and bivariate smooth functions. The Bayesian P-splines approach, along with a Markov chain Monte Carlo algorithm, is proposed to estimate smooth functions, unknown parameters, and latent variables in the model. The performance of the developed methodology is demonstrated via a simulation study. The effect of the nonparametric interaction of cardiac function and diabetes on renal outcome is investigated using the proposed methodology. Copyright © 2013 John Wiley & Sons, Ltd. Keywords:

Bayesian P-splines; MCMC algorithm; latent variables; nonparametric interaction effects; semiparametric models

1. Introduction In the biomedicine, genetics, and ecological sciences, latent variables are commonly used to represent characteristics that cannot be measured accurately by a single observed variable and instead are assessed through many observed variables. Latent variable models (LVMs) [1–3] are useful tools for assessing the interrelationships among observed and latent variables. In conventional LVMs, outcome latent variables are usually regressed on explanatory latent variables in a linear form. Extensions in the parametric framework have been conducted in two ways [4–7]. The first way is to include the univariate nonlinear functions of latent variables, for example, quadratic functions, to lessen the linear assumption on each latent variable. The second way is to consider the interactions of latent variables, for example, products of latent variables, which eliminate the assumption that the effects of explanatory latent variables are additive. Although such nonlinear LVMs have been useful, their structural equations remain parametric; hence, they may impose too many rigid parametric assumptions to model the actual conditions correctly. Thus, considering more general structural equations is necessary for revealing the true functional relationships among latent and observed variables. Abundant studies on the modeling of nonparametric functional relationships between observed response variables and covariates are available in the literature. Approaches include, but are not limited to, smoothing splines [8], local polynomials [9], and penalized splines [10]. To relax the parametric assumption on interaction effects of covariates, we need to model multivariate nonparametric functions. As pointed out in [11], however, additive univariate nonparametric functions may be insufficient for modeling multivariate regression functions because the additive assumption of each covariate induces approximation errors. More flexible approaches model the whole interaction as bivariate nonparametric functions of covariates [12–14]. Considering the useful features of the Bayesian approach and its related

Copyright © 2013 John Wiley & Sons, Ltd.

Statist. Med. 2014, 33 1723–1737

1723

Department of Statistics, Chinese University of Hong Kong, Shatin, NT, Hong Kong *Correspondence to: Xinyuan Song, Department of Statistics, Chinese University of Hong Kong, Shatin, NT, Hong Kong. † E-mail: [email protected]

X. SONG, Z. LU AND X. FENG

1724

powerful sampling-based tools such as the Markov chain Monte Carlo (MCMC) techniques, Bayesian methods for estimating the univariate and bivariate nonparametric functions of (observed) covariates have been developed extensively [15–17, among others]. Bayesian P-splines [18, 19], a Bayesian approach to penalized splines, is particularly attractive because it flexibly fits unknown smooth functions using a large amount of basis functions with a simple penalty for differences between the coefficients of adjacent basis functions, and it simultaneously estimates smooth functions and smoothing parameters. For LVMs, limited work has been carried out on the nonparametric modeling of covariates and latent variables. Fahrmeir and Raach [20] proposed an LVM with a structural equation consisting of the nonparametric functions of observed covariates and spatial effects. However, the functional effects of explanatory latent variables on the outcome latent variables were not considered in their structural equation. Song and Lu [21] proposed a more general LVM that accommodates the nonparametric functions of covariates and explanatory latent variables in an additive way. They developed a modified Bayesian P-splines approach to deal with the difficulties of modeling the nonparametric functions of latent variables. Although they relaxed the parametric assumption on the univariate functional effects of explanatory latent variables, they did not consider the functional interaction effects of latent variables. Kundu and Dunson [22] proposed latent factor models for density estimation involving univariate nonparametric functions of latent variables. In their model, latent variables were used to characterize the distributions of observed variables through nonparametric transformation functions modeled by Gaussian processes. However, the latent variables there do not have concrete meanings. Hence, the purpose of introducing latent variables in [22] is different from ours in this paper. Moreover, Kundu and Dunson [22] did not consider the functional effects of explanatory latent variables on outcome latent variables. Given that latent variables are unobservable, estimating the multivariate (or even bivariate) nonparametric functions of them is challenging. Approaches developed in this research direction are rather limited. Exception lies in a few areas, including the machining learning literature where the multivariate nonparametric functions are modeled by Gaussian processes [23]. However, the nonparametric modeling formulation in the aforementioned work does not separate the main and interaction effects of latent variables, thereby lessening the explanation power of the model. Recently, Guo et al. [24] developed a Bayesian lasso method for semiparametric LVMs, in which the nonparametric relations at the structural level were represented by basis expansions, and the Bayesian lasso was applied for simultaneous estimation and model selection. In this paper, we propose a nonparametric LVM, in which the functional effects of explanatory observed and latent variables, as well as the functional interactions of explanatory latent variables on outcome latent variables, are modeled through an additive nonparametric structural equation. Given that the unobservable characteristic of latent variables induces particular difficulty in the nonparametric modeling and posterior computation, we develop a new MCMC method to estimate nonparametric functions, unknown parameters, and latent variables. We propose a modified Bayesian P-splines approach, along with a novel penalty, to approximate the bivariate interaction of latent variables. We utilize the deviance information criterion (DIC) to assess the necessity of univariate and/or bivariate nonparametric terms through a comparison of nonparametric LVMs and their parametric counterparts. Finally, we apply the proposed methodology to analyze a new medical data set and obtain meaningful results. This research is motivated by a study based on the Hong Kong Diabetes Registry, which was established in 1995 as part of a continuous quality improvement program with a weekly enrolment of 30–50 patients at the Prince of Wales Hospital in Hong Kong. Luk et al. [25] described the patient recruitment and characterization of the Registry in detail. A total of 10,129 diabetic patients were enrolled in the Registry from 1995 to 2007, with a cohort of 3222 patients with complete documentation of measurements for renal disease, diabetes, and cardiovascular disease included in this analysis. Diabetes is one of the main driving forces for the rising prevalence of chronic kidney disease (CKD) and end-stage renal disease (ESRD) [26]. Consistent evidence has shown that Asian diabetic populations have a higher risk of renal complications than their Caucasian counterparts [27]. Moreover, cardiovascular and renal diseases share common risk factors [28]. The main objective of this analysis is to investigate how cardiac function and glycemia level functionally influence renal outcome. Given that renal disease, diabetes, and cardiovascular disease may not be completely characterized by a single symptom (index), they are treated as latent traits and are measured through several indicators in this analysis. This kind of method, which treats complicated diseases as latent traits and measures them through multiple medical indices, has received much attention recently [29–31, among others]. In a preliminary study using a parametric LVM, we found that cardiovascular complication and diabetes interactively influence renal disease. Considering this interaction is of great interest in medical study, investigating further its pattern and Copyright © 2013 John Wiley & Sons, Ltd.

Statist. Med. 2014, 33 1723–1737

X. SONG, Z. LU AND X. FENG

revealing the associated implication are of particular significance. In this study, we propose a nonparametric LVM to investigate the functional effects of cardiovascular function and diabetes, as well as their interaction effect on renal disease. The nonparametric interaction between cardiac function and diabetes discovered using our approach would be an interesting finding for understanding their joint effect on renal disease. The remainder of the paper is organized as follows. Section 2 defines the proposed model, in which the structural equation is formulated as additive univariate and/or bivariate nonparametric functions of explanatory observed and latent variables. Section 3 introduces a modified Bayesian P-splines approach for modeling the unknown functions. A novel penalty is proposed to manage the nonparametric bivariate functions. The MCMC sampling and related computational issues are presented in Section 4. Section 5 demonstrates the empirical performance of the proposed method via a simulation study. An application to the motivating example concerning the joint effect of cardiac function and diabetes on renal outcome is likewise reported. Section 6 concludes the paper. Technical details are provided in the Appendix.

2. Model description Let yi D .yi1 ;    ; yip /T be a random vector corresponding to the ith observation in a random sample of size n, !i D .!i1 ;    ; !i q /T be a random vector of latent variables, and p > q. The relationship between yi and !i is formulated through the following measurement equation: yi D Aci C ƒ!i C i ;

i D 1; : : : ; n;

(2.1)

where A is a p  r1 matrix of coefficients, ci is an r1  1 vector of fixed covariates, ƒ is a p  q factor loading matrix, i  N Œ0; ‰ is a p  1 residual random vector, which is independent of !i , and ‰ is a p  p diagonal matrix. The number of latent variables and the structure of ƒ are pre-specified, resulting in a confirmatory measurement equation. These specifications are chosen based on the nature of the observed variables and/or prior knowledge of experts to identify the model.  T To examine the interrelationships among latent variables, we partition !i into Ti ;  Ti , where i .q1  1/ and  i .q2  1/ denote the outcome and the explanatory latent vectors, respectively, and  i is assumed independent of i and distributed as N Œ0; ˆ   with a covariance matrix ˆ  . For notational simplicity, in what follows, we assume that q1 D 1. An extension to the case with q1 > 1 is straightforward. The functional effects of covariates and explanatory latent variables on the outcome latent variable i are characterized by the nonparametric structural equation X hul .i u ; i l / C ıi ; (2.2) i D g1 .xi1 / C    C gD .xiD / C f1 .i1 / C    C fq2 .i q2 / C u 0:

A vague prior distribution is assigned to b so that b is driven by data: b  Gamma.ab01 ; ab02 /;

(3.3)

where ab01 and ab02 are hyperparameters. A common choice is ab01 D 1 and ab02 D 0:005 [18, 20]. 3.2. Modeling univariate nonparametric functions of latent variables

1726

As discussed in [21], the traditional Bayesian P-splines approach (Section 3.1) cannot be applied directly to model f1 ;    ; fq2 of latent variables. The main reason is that the B-splines, Bkx ./, k D 1; : : : ; Kb , are defined in a finite interval, which is usually taken as the range of the observed covariates. However, the latent variables are unobservable, unscaled, and distributed in .1; 1/, and their observations are updated in each MCMC iteration. Hence, predefining finite ranges so that they can embrace the ranges of latent variables in every MCMC iteration and determining the positions of the knots beforehand are difficult. Technically, assuming latent variables distributed in a finite interval is a possible solution [22]. However, it causes difficulty in interpreting the meaning of latent variables, which usually represent unbounded latent constructs in LVMs. To solve the aforementioned problems, Song and Lu [21] proposed the use of natural cubic splines with an unknown scale parameter to model each fj , j D 1; : : : ; q2 . However, their method needs to carefully specify the prior distribution of the scale parameter and requires an extra step in the posterior simulation, which produces a less efficient MCMC algorithm and requires extra effort in the sensitivity analysis to the prior specification. In this paper, we consider a different approach that introduces a ‘probit transformation’ of the explanatory latent variables. Copyright © 2013 John Wiley & Sons, Ltd.

Statist. Med. 2014, 33 1723–1737

X. SONG, Z. LU AND X. FENG

For notational simplicity, j is temporarily suppressed. Let ˆ./ be the cumulative distribution function of N.0; 1/. The nonparametric function of latent variables, f , is modeled by f .i / D

Kˇ X

ˇk Bk .ˆ.i //;

(3.4)

kD1

where Kˇ is the number of splines determined by the number of knots in the range of ˆ.i /, ˇk are unknown coefficients, and Bk ./ is the B-spline basis function. The knots used to construct Bk ./ are placed uniformly in Œ0; 1. We use a linear combination of new basis functions to model f , which are composite functions of B-spline basis and probit transformation. Compared with that proposed in [21], the current approach has the following advantages: (i) ˆ.i / 2 Œ0; 1. Hence, the problems caused by the unobservable and unscaled nature of i in constructing the spline basis are circumvented. (ii) With the B-spline basis, the posterior covariance matrix of ˇ D .ˇ1 ; : : : ; ˇKˇ /T is a band matrix with small bandwidth, which qualifies for the fast MCMC algorithm in [35]. In contrast, the natural cubic splines used in [21] make the band width of the covariance matrix much larger, which leads to inefficient computation. (iii) The model is simpler because free-scale parameters are not introduced into the model. Similar to (3.1) and (3.2), a penalty is incorporated through the following prior on ˇ: r p.ˇjˇ / D

ˇ 2

Kˇ

n  o ˇ exp  ˇ T Mˇ ˇ ; 2

(3.5)

where Kˇ D rank.Mˇ /, ˇ is an unknown parameter that controls the amount of penalty, and the derivation of Mˇ is similar to Mb in (3.2). A vague prior Gamma.˛ˇ 01 ; ˛ˇ 02 / is assigned to ˇ , together with a common choice of ˛ˇ 01 D 1 and ˛ˇ 02 D 0:005 [20, 21]. 3.3. Modeling bivariate nonparametric functions of latent variables Modeling hul .i u ; i l / is the most critical issue in the current analysis. Marx and Eilers [14] and Lang and Brezger [18] proposed the tensor product B-splines approach for modeling the bivariate functions of (observed) covariates, which cannot be directly applied to handle latent variables because of their unobservable and unscaled nature as described in Section 3.2. Hence, we modify the tensor product B-splines as hul .i u ; i l / D

R K X X

ulkr Uuk .ˆ.i u //Vlr .ˆ.i l //;

(3.6)

kD1 rD1

where Uuk ./ and Vlr ./ are the B-spline basis functions constructed similar to Bk ./. We temporarily suppress the subscripts u and l in (3.6) for notational simplicity. Similar to univariate cases, a penalty is assigned to  D .11 ; : : : ; 1R ; : : : ; K1 ; : : : ; KR / to prevent overfitting. Several different versions of penalties have been proposed for estimating the bivariate functions of covariates [14, 18]. The ideas are to extend the random walk priors (3.2) to , which penalize/assign small prior probability to large differences among adjacent kr . The differences among these approaches lie in the definition of adjacency. An example is " K R r # " Y #   R Y K r Y Y 1k 2r 1k .kr  k;r1 /2 2r .kr  k1;r /2 exp  exp  ; (3.7) 2 2 2 2 rD2 rD1 kD1

kD2

Copyright © 2013 John Wiley & Sons, Ltd.

Statist. Med. 2014, 33 1723–1737

1727

which essentially assigns independent first-order random walk priors to each .k1 ; : : : ; kR / and .1r ; : : : ; Kr / with distinct smoothing parameters 1k and 2r , k D 1; : : : ; K, r D 1; : : : ; R. The idea of assigning independent random walk priors with distinct smoothing parameters can also be found in [36, 37]. For smoothing parameters 1k and 2r , a natural choice is to assign gamma prior (3.3) to them. In the context of estimating bivariate functions of latent variables, however, the prior distributions (3.7) may lead to a severely oversmooth estimation, especially when the sample size is not very large. This unappealing feature is illustrated through a simulation (the setting is the same as that presented in Section 5.1 with n D 1500 samples), where the true h12 .i1 ; i 2 / is shown in the left panel of Figure 1. The estimated h12 .i1 ; i 2 / is depicted in the center panel of Figure 1, which is severely oversmoothed, and the corresponding smoothing parameters 1k and 2r diverge to very large values. Similar results are

X. SONG, Z. LU AND X. FENG

Figure 1. The estimated bivariate functions of latent variables h12 .i1 ; i2 / D 5.ˆ.i1 /0:5/ cos.2ˆ.i2 // in the simulation study. The left panel is the true function. The center panel is the estimate based on prior distribution (3.7). The right panel is the estimate based on prior distribution (3.8).

acquired when (3.7) is replaced by the prior of  used in [18]. The explanations for this phenomenon are given later. Bayesian estimates are based on the combination of likelihood and prior information. The bivariate functions can be recovered when the information of likelihood is strong enough. However, two characteristics of the current problem cause the likelihood signal to be relatively weak compared to the prior information. First, the likelihood signal is weaker for latent variables than for covariates because of the randomness of latent variables. Second, curse of dimension substantially increases the prior information, which tends to dominate the posterior when the likelihood signal is not strong enough. For example, prior (3.7) is the product of K.R  1/ C .K  1/R independent normal distributions, which can be hundreds. One naive solution is to assign prior distributions for 1k and 2r so that smaller prior probabilities will be given to larger 1k and 2r . However, the values of the smoothing parameters are problem dependent and should be determined by the data. We also tried the half-Cauchy prior advocated by [38, 39]. We found that the half-Cauchy prior could only partially address the problem because it still severely oversmoothed certain parts of the bivariate function. In this paper, we propose the following prior distribution for , which are shown to give superior performance than the aforementioned prior distributions: QK QR hq v1kr 1 kD1

rD2

2

n n oiQ Q hq oi v  .  /2 v2kr 2 .kr k1;r /2 R K v2kr 2 exp  1kr 1 kr2 k;r1 exp  ; rD1 kD2 2 2

l  Gamma.˛ 01 ; ˛ 02 /; for l D 1; 2; and vlkr  U.0; 1/; for all l; k; r; (3.8) where U.0; 1/ is uniform distribution on Œ0; 1. ˛ 01 D 1 and ˛ 02 D 0:005 can be used. Again, we use the preceding simulation to demonstrate the performance of (3.8), under which the estimated bivariate function of latent variables is depicted in the right panel of Figure 1. Clearly, the functional pattern can be correctly captured. The reason for the dramatic improvement is the inclusion of the local smoothing parameters vlkr that can place prior information adaptively. The impact of the prior distribution is reduced in informative regions of .i u ; i l / corresponding to large kr  k;r1 or kr  k1;r . The idea of local smoothing parameters was previously proposed in [18], in which chi-square distributions with 1 degree of freedom were assigned to vlkr . However, if chi-square distributions for vlkr are adopted for small kr  k;r1 and kr  k1;r , large vlkr are likely to be generated, which exacerbate the prior dominance phenomenon. By contrast, our proposed uniform distribution exerts less prior belief on smoothing parameters, which reduces the prior dominance, thereby leading to a better performance in estimating the bivariate function of latent variables.

4. Bayesian inference and MCMC algorithm 4.1. Prior distributions and posterior inference

1728

To implement a full Bayesian analysis, except those discussed in Section 3, we also need to specify prior distributions for unknown structural parameters, such as A; ƒ; ‰; ı , and ˆ  . We let ATj be the j th row of A, ƒTj be the vector that contains the unknown parameters in the j th row of ƒ, and j be the Copyright © 2013 John Wiley & Sons, Ltd.

Statist. Med. 2014, 33 1723–1737

X. SONG, Z. LU AND X. FENG

j th diagonal element of ‰. According to the common practice in LVMs [3], we propose the following conjugate prior distributions. For j D 1;    ; p, Aj  N.Aj 0 ; † aj 0 /; 1 ı

ƒj  N.ƒj 0 ;

 Gamma.˛ı0 ; ˇı0 /; ˆ

 1

j † j 0 /;

1 j

 Gamma.˛j 0 ; ˇj 0 /;

 Wishart.R0 ; 0 /;

(4.1)

where ˛j 0 , ˇj 0 , ˛ı0 , ˇı0 , 0 , Aj 0 , and ƒj 0 , as well as positive definite matrices † aj 0 , † j 0 , and R0 , are hyperparameters whose values are assumed to be given based on the prior information. The Bayesian estimate of parameters is obtained by drawing observations from the joint posterior distribution of parameters and latent variables via MCMC tools such as the Gibbs sampler [40] and the Metropolis–Hastings algorithm [41, 42]. The MCMC procedure, the full conditional distributions, and the implementation of the Metropolis–Hastings algorithm are provided in online Appendix A. 4.2. Model comparison using DIC To facilitate the model comparison in the real data analysis, Bayesian model selection statistics, such as the Bayes factor [43] and the DIC [44], may be considered. However, the computation of the Bayes factor requires the calculation of the ratio of marginal likelihoods, which is extremely challenging for the current complex model. Although the path sampling procedure proposed by [45] has been demonstrated useful for computing the Bayes factor in the comparison of many statistical models [46], it cannot be directly applied to the current model because defining a linked model between the proposed nonparametric LVM and conventional parametric LVMs is difficult. In this paper, we adopt an index based on the DIC. This index considers a balance between the goodness-of-fit to the data and model complexity to seek for an appropriate model. The model with the smallest DIC-based index is chosen in the model comparison procedure. Song et al. [47] proposed a complete DIC that is applicable to our model with latent variables. We use  and  to include the parameters and latent variables in the model. The complete DIC is then defined as DIC D 4E; flog p.Y; j/jYg C 2E flog p.Y; jE ŒjY; /jYg;

(4.2)

where log p.Y; j/ is the complete-data log-likelihood function. For the proposed model, P log p.Y; j/ can be written as log p.Y; j/ D niD1 log p.yi ; !i j/, where log p.yi ; !i j/ D log p.yi j!i ; / C log p.i j i ; / C log p. i j/ 2 3   p p  2 1 X X 1 1 1 1 D 4 log. j /5 C   Ti ˆ   i  logjˆ  j yij  ATj ci  ƒTj !i  2 j 2 2 2 j D1 j D1 8 2 q2 K D K ˇj bd X X X X (4.3) 1 < C 4 bd k Bdxk .xid /  ˇj k Bj k .ˆ.ij // i  2 ı: j D1 kD1 d D1 kD1 3 )2 K X R XX 1  ulkr Uuk .ˆ.i u //Vlr .ˆ.i l //  log. ı /5 C constant: 2 rD1 u

Latent variable models with nonparametric interaction effects of latent variables.

Renal disease is one of the common complications of diabetes, especially for Asian populations. Moreover, cardiovascular and renal diseases share comm...
846KB Sizes 0 Downloads 0 Views