Biometrics 70, 794–801 December 2014

DOI: 10.1111/biom.12207

Linear Mixed Function-on-Function Regression Models

Wei Wang

Center for Outcomes Research, Children's Hospital of Philadelphia, Civic Center Blvd, Philadelphia, Pennsylvania 19104, U.S.A.
email: [email protected]

Summary. We develop a linear mixed regression model where both the response and the predictor are functions. Model parameters are estimated by maximizing the log likelihood via the ECME algorithm. The estimated variance parameters and covariance matrices are shown to be positive and positive definite, respectively, at each iteration. In simulation studies, the approach outperforms competing methods in terms of the fitting error and the MSE of estimating the "regression coefficients."

Key words: ECME algorithm; Functional data analysis; Linear mixed effects models; Principal component analysis.

1. Introduction
Functional data are observed in many research fields, and various methods have been proposed; see, for instance, Cardot, Ferraty, and Sarda (2003), Ramsay and Silverman (2005), Reiss and Ogden (2007), Crainiceanu, Staicu, and Di (2009), Crambes, Kneip, and Sarda (2009), Goldsmith et al. (2011), and Randolph, Harezlak, and Feng (2012). Easy-to-use statistical software packages include "fda" (Ramsay, Hooker, and Graves, 2009; Ramsay et al., 2013) and "refund" (Crainiceanu et al., 2013) of R (R Development Core Team, 2013), R functions at the companion website (www.math.univtoulouse.fr/staph/npfda) of Ferraty and Vieu (2006), and the "PACE" package (www.stat.ucdavis.edu/PACE) of MATLAB (The MathWorks Inc., 2013).

Among these studies, the regression model where both the response and the predictor are functions is of increasing interest. Methods have been developed to investigate their relationship by Ramsay and Silverman (2005), Yao, Müller, and Wang (2005), Ferraty et al. (2011), Ferraty, Keilegom, and Vieu (2012), and Scheipl, Staicu, and Greven (2013). In this article, we propose a linear mixed modeling approach and study a data set from a biological experiment.

Let X(·) be the predictor function of s with a finite interval domain, and let Y(·) be the response function of t with a finite interval domain. The response Y(t) depends on X(s) through ∫ β(t, s) X(s) ds, where the functional coefficient β describes the dependence structure of Y on X. The coefficient β(t, s) is assumed smooth and square integrable on the domain, that is, ∫∫ β²(t, s) ds dt < ∞. The model is capable of studying data where the predictor is densely observed and the sample size is relatively small. It extends the scalar-on-function regression model to the case where observations of the response are possibly correlated.

Ferraty et al. (2011, 2012) propose a functional version of the Nadaraya–Watson estimate of the regression operator E(Y|X). The method is provided in the R function "ffunopare.knn.gcv", where a global bandwidth is selected by the cross validation criterion. In the approach, an estimate of Y is obtained, but β̂, an estimate of β, is not available. One approach to obtain β̂ is through estimating the unknown coefficients in the expansion of β by basis functions. Let the ϕ_p's and ψ_q's be two series of basis functions such as splines or eigenfunctions. A tensor product expansion of β follows as

    β(t, s) = Σ_{p=1}^∞ Σ_{q=1}^∞ b_pq ϕ_p(t) ψ_q(s),    (1)

where the b_pq's are the unknown parameters to be estimated. In practice, the numbers of basis functions are truncated as p ≤ P and q ≤ Q.

Ramsay and Silverman (2005) find β̂ by solving a functional version of the normal equation, which is derived from minimizing an integrated residual sum of squares. To use the corresponding R function "linmod" of the "fda" package, one has to specify the smoothing parameter values associated with the penalties controlling the smoothness of β̂. Scheipl et al. (2013) develop a penalized regression model. The smoothing parameters are treated as variance component parameters and estimated by the restricted maximum likelihood method. The proposed method is implemented in the "pffr" function of the "refund" package. Yao et al. (2005) take the basis functions as the eigenfunctions and get β̂ using the eigenvalues and eigenfunctions from smoothed covariance matrices of the X and the Y. The approach has nice asymptotic properties (see also Hall, Müller, and Yao, 2008) and is provided by the "FPCreg" function of the "PACE" package. Peng and Paul (2009) observe that sometimes the estimated variance parameter of the error term in modeling X can be negative.

In this article, we propose a linear mixed effects modeling of the function-on-function regression. Model parameters are estimated by maximizing the log likelihood via the ECME algorithm (Liu and Rubin, 1994). Under very mild conditions, we show that at each iteration the estimated variance parameters are positive and the estimated covariance matrix of the random effects is positive definite. The rest of the article is organized as follows. In Sections 2 and 3, we present the model and the model fitting procedure. We focus on the balanced design case, as the data to be analyzed in Section 4 are balanced. We compare the performance of the developed approach (LMFF) with the approaches "ffunopare.knn.gcv" (FF), "pffr" (PFFR), and "FPCreg" (PACE) in data fitting. We also compare with the method proposed by Reiss and Ogden (2007) (FPCR), which is implemented in the R function "fpcr" of "refund." The FPCR approach is proposed in a scalar-on-function regression framework; to apply it, we obtain the estimates of Y and β at each t separately. In terms of the mean absolute fitting error, our approach (LMFF) outperforms the other methods. In Section 5, we conduct simulations to further compare the performance of these approaches. In the settings we tried, the LMFF approach has a comparable fitting error for X, the smallest fitting error for Y, and, most of the time, the smallest MSE of estimating β. In Section 6, we discuss possible directions for further investigation. Technical derivations are presented in Web Appendices A–C.

© 2014, The International Biometric Society

2. Model
Using basis function expansions, we present modeling of the response and predictor processes in a linear mixed effects model framework. We then derive the model for the observed data in vector/matrix form. The unknown "regression coefficients," the b_pq's, are contained in the design matrix associated with the random effects. We show the model is identifiable, so that no two distinct parameter values produce the same model.

2.1. Processes Modeling
Let the φ_k's be a series of basis functions. We expand the predictor X_i of the ith individual as X_i(s) = μ_x(s) + Σ_{k=1}^∞ u_ik φ_k(s), where the fixed μ_x(s) represents the mean and k is truncated at K in practice. In the expansion, we treat the coefficients u_ik as random. As we will see in (4), the random coefficients correspond to the random effects in a linear mixed effects model.
They can also be viewed as the functional principal component scores, as in Yao et al. (2005), when (i) the φ_k's are chosen as the eigenfunctions of the X process, and (ii) u_ik and u_ik' are uncorrelated whenever k ≠ k'. Let x_i be the observed trajectory of X_i contaminated with error ε_i. We write

    x_i(s) = μ_x(s) + Σ_{k=1}^K u_ik φ_k(s) + ε_i(s) ≡ μ_x(s) + φ(s)' u_i + ε_i(s),    (2)

where φ(s) = (φ_1(s), ..., φ_K(s))' and u_i = (u_i1, ..., u_iK)'. Let y_i be the observed value of the response Y_i with mean μ_y. Let y_i depend on X_i through β as y_i(t) = μ_y(t) + ∫ β(t, s)[X_i(s) − EX_i(s)] ds + e_i(t), where e_i represents the error. By the expansion of β in (1) and the modeling X_i(s) = μ_x(s) + φ(s)' u_i, we then write

    y_i(t) = μ_y(t) + Σ_{p=1}^P Σ_{q=1}^Q b_pq ϕ_p(t) ∫ ψ_q(s) φ(s)' ds u_i + e_i(t).    (3)

2.2. Observed Data
Suppose the sampling points of the ith individual are s_ij, j = 1, ..., n_i, and t_il, l = 1, ..., m_i. The observed data are then x_i = (x_i(s_i1), ..., x_i(s_in_i))' and y_i = (y_i(t_i1), ..., y_i(t_im_i))'. Data collected from the N individuals are thus {(x_i, y_i), i = 1, ..., N}. Let μ_x,i = (μ_x(s_i1), ..., μ_x(s_in_i))' and ε_i = (ε_i(s_i1), ..., ε_i(s_in_i))'. From (2), we get

    x_i = μ_x,i + A_i u_i + ε_i,  A_i[j, k] = φ_k(s_ij),  u_i ~ (0, Σ_u),  ε_i ~ (0, σ_ε² E_x,i).    (4)

The model of x_i in (4) has a linear mixed model representation with fixed effects μ_x,i, known design matrix A_i, random effects u_i with covariance matrix Σ_u, and error ε_i. We assume the error ε_i has a covariance matrix parametrized by σ_ε², where E_x,i is a known symmetric and positive definite matrix. Let μ_y,i = (μ_y(t_i1), ..., μ_y(t_im_i))' and e_i = (e_i(t_i1), ..., e_i(t_im_i))'. From (3), we have

    y_i = μ_y,i + Θ_i B T u_i + e_i,  Θ_i[l, p] = ϕ_p(t_il),  B[p, q] = b_pq,
    T[q, k] = ∫ ψ_q(s) φ_k(s) ds,  e_i ~ (0, σ_e² E_y,i),    (5)

where the covariance matrix of e_i is parametrized by σ_e² and E_y,i is also known, symmetric, and positive definite. Similarly, y_i in (5) has a linear mixed model representation, but with a partially known design matrix Θ_i B T, where the unknown B is to be estimated. Let z_i = (x_i', y_i')'. We obtain the joint model of x_i and y_i as

    z_i = (x_i', y_i')' = μ_z,i + D_i u_i + e*_i,
    μ_z,i = (μ_x,i', μ_y,i')',  D_i = (A_i', (Θ_i B T)')',  e*_i = (ε_i', e_i')',
    u_i ~ (0, Σ_u),  e*_i ~ (0, E_i),  E_i = [ σ_ε² E_x,i   0
                                               0            σ_e² E_y,i ].    (6)

The model of z_i has unknown parameters {μ_z,i, Σ_u, σ_ε², σ_e², B}. In the following, we take Σ_u as an unstructured covariance matrix. In the data analysis and simulations, we take E_x,i = I_{n_i} and E_y,i = I_{m_i}. We discuss the selection of other covariance matrix structures in Section 6. It follows that

    Cov(z_i) ≡ Σ_z,i = D_i Σ_u D_i' + E_i
             = [ A_i Σ_u A_i' + σ_ε² E_x,i   A_i Σ_u T' B' Θ_i'
                 Θ_i B T Σ_u A_i'            Θ_i B T Σ_u T' B' Θ_i' + σ_e² E_y,i ].

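As a concrete illustration of how the pieces of the joint model in (6) fit together, the following sketch assembles D_i and Cov(z_i) for toy dimensions. All dimensions, basis values, and variance parameters below are invented for illustration; E_x,i = E_y,i = I as in the data analysis, and T = I corresponds to choosing ψ_q = φ_q with an orthonormal basis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative, not from the data): n x-points, m y-points,
# K = Q basis functions for X, P basis functions for Y.
n, m, K, P = 4, 3, 2, 2
A = rng.standard_normal((n, K))       # A[j, k] = phi_k(s_j)
Theta = rng.standard_normal((m, P))   # Theta[l, p] = varphi_p(t_l)
B = rng.standard_normal((P, K))       # B[p, q] = b_pq
T = np.eye(K)                         # T = I when psi_q = phi_q, orthonormal basis

L = rng.standard_normal((K, K))
Sigma_u = L @ L.T + 0.1 * np.eye(K)   # positive definite covariance of u_i
sig2_eps, sig2_e = 0.5, 0.3

# Joint design matrix D_i and block-diagonal error covariance E_i from (6)
D = np.vstack([A, Theta @ B @ T])
E = np.block([[sig2_eps * np.eye(n), np.zeros((n, m))],
              [np.zeros((m, n)), sig2_e * np.eye(m)]])

Sigma_z = D @ Sigma_u @ D.T + E       # Cov(z_i)

# Sigma_z is symmetric positive definite by construction
assert np.allclose(Sigma_z, Sigma_z.T)
assert np.all(np.linalg.eigvalsh(Sigma_z) > 0)
```

The four blocks of Sigma_z reproduce the partitioned covariance displayed above; for example, its upper-left block equals A Σ_u A' + σ_ε² I.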
We assume the random parts in the model are all mutually independent. As in standard linear mixed models, we also assume they are all normally distributed; see, for instance, Laird and Ware (1982). The unknown parameters in the joint model of all individuals are then θ = {μ_z,1, ..., μ_z,N, Σ_u, σ_ε², σ_e², B}.

2.3. Identifiability
A multivariate normal distribution is uniquely characterized by its mean vector and covariance matrix. Since all the z_i's are normally distributed, we only need to check whether the parameters in the mean vectors and covariance matrices are identifiable. It is easy to see there is no μ*_z,i ≠ μ_z,i giving the same mean vector of z_i. As all the z_i's share the same covariance parameters {Σ_u, σ_ε², σ_e², B}, we need to check whether they are identifiable in an arbitrary z_i's model (Wang, 2013). We show in Web Appendix A that these covariance parameters are identifiable under the assumptions that T is invertible and that the A_i and the Θ_i are of full column rank. The conditions are easy to satisfy, as T is a low dimensional matrix whose elements are integrals of basis functions, and usually we have n_i ≫ K and m_i ≫ P.

3. Model Fitting
We present the model fitting procedure in the balanced design case; extension to an unbalanced design is discussed in Web Appendix C. We have n_i ≡ n, s_ij = s_j, m_i ≡ m, and t_il = t_l. It follows that {μ_z,i, D_i, Σ_z,i} are the same for all i. To save notation, we suppress the subscript i in the following. To fit the model, we first choose the basis functions φ_k's, ϕ_p's, and ψ_q's contained in the design matrix D. In this article, we use as basis functions the eigenfunctions estimated from the sample covariance matrices. Specifically, we take φ_1, ..., φ_K in (2) equal to the first K estimated eigenfunctions from the sample covariance matrix of the x_i's. As in Yao et al. (2005), we take ϕ_1, ..., ϕ_P in (1) equal to the first P estimated eigenfunctions of the y_i's, and take ψ_j = φ_j and Q = K. We then treat the basis functions as known and estimate the model parameters θ = {μ_z, Σ_u, σ_ε², σ_e², B} by maximizing the log-likelihood of the observed data {z_i ≡ (x_i', y_i')', i = 1, ..., N}.
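The eigenfunction basis described above can be approximated on the observation grid via an eigendecomposition of the sample covariance matrix; this is one standard discretization, sketched below with simulated curves standing in for the x_i's (all values synthetic).

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated trajectories standing in for the observed x_i's: N curves at n points
N, n, K = 40, 30, 2
s = np.linspace(0, 1, n)
scores = rng.standard_normal((N, 2)) * np.array([2.0, 1.0])  # two modes of variation
true_basis = np.column_stack([np.sin(np.pi * s), np.sin(2 * np.pi * s)])
x = scores @ true_basis.T + 0.05 * rng.standard_normal((N, n))

# Discretized basis: leading K eigenvectors of the n x n sample covariance matrix
S = np.cov(x, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)      # eigh returns ascending eigenvalues
phi_hat = eigvecs[:, ::-1][:, :K]         # first K eigenvectors, largest first

# Estimated scores: inner products of centered curves with the basis
u_hat = (x - x.mean(axis=0)) @ phi_hat
```

The columns of `phi_hat` are orthonormal on the grid, and with the dominant mode sin(πs) having the larger variance, the leading eigenvector aligns closely with it.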
To assess the model fit, we look at the fitted values of xi and yi based on the BLUP of ui .
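A minimal sketch of the BLUP-based fitted values, using the conditional mean formula for u_i given z_i stated in (7); the dimensions and parameter values below are hypothetical, and the example is centered so that μ_z = 0.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy setup (illustrative dimensions and parameter values, not from the data)
n, m, K, P = 4, 3, 2, 2
A = rng.standard_normal((n, K))
Theta = rng.standard_normal((m, P))
B = rng.standard_normal((P, K))
T = np.eye(K)                              # psi_q = phi_q, orthonormal basis
Sigma_u = np.diag([2.0, 1.0])
sig2_eps, sig2_e = 0.4, 0.2

D = np.vstack([A, Theta @ B @ T])
E = np.diag(np.r_[np.full(n, sig2_eps), np.full(m, sig2_e)])
Sigma_z = D @ Sigma_u @ D.T + E

mu_z = np.zeros(n + m)                     # centered example for simplicity
u = rng.multivariate_normal(np.zeros(K), Sigma_u)
z = mu_z + D @ u + rng.multivariate_normal(np.zeros(n + m), E)

# BLUP of u_i: E(u_i | z_i) = Sigma_u D' Sigma_z^{-1} (z_i - mu_z)
u_blup = Sigma_u @ D.T @ np.linalg.solve(Sigma_z, z - mu_z)

# Fitted values based on the BLUP (add mu_x, mu_y back in the uncentered case)
x_fit = A @ u_blup
y_fit = Theta @ B @ T @ u_blup
```

The `solve` call avoids forming Sigma_z^{-1} explicitly, which is the usual numerically stable choice.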

3.1. Parameter Estimates
A closed form of the estimate of μ_z can be obtained by maximizing the log-likelihood directly. From (6), it is clear that the MLE of μ_z is μ̂_z = z̄. We use an EM type algorithm (Laird and Ware, 1982; Meng and Rubin, 1993) to estimate {Σ_u, σ_ε², σ_e², B}, and the overall estimation procedure is in the spirit of the ECME algorithm (Liu and Rubin, 1994). To use the algorithm, we treat {(z_i, u_i), i = 1, ..., N} as the complete data. In the E-step, given the previous estimate θ̂, we calculate the conditional expectation of the log-likelihood of the complete data given the observed data. In the M-step, we maximize the conditional log-likelihood with respect to each of {Σ_u, σ_ε², σ_e², B}, with the others fixed at their previous or current estimates. The E-step calculation involves the following two terms,

    μ_{u_i|z_i} ≡ E(u_i | z_i) = Σ_u D' Σ_z⁻¹ (z_i − μ_z),
    Σ_{u_i|z_i} ≡ Cov(u_i | z_i) = Σ_u − Σ_u D' Σ_z⁻¹ D Σ_u,    (7)

evaluated at the previous estimate θ̂. Expressions for the current parameter estimates are given in the following, and details of the derivation are presented in Web Appendix B. In Web Appendix B, we also justify that the estimate of Σ_u is positive definite and the estimates of {σ_ε², σ_e²} are positive at each iteration if the starting values are. The estimate of Σ_u is

    Σ̂_u = (1/N) Σ_{i=1}^N ( μ̂_{u_i|z_i} μ̂_{u_i|z_i}' + Σ̂_{u|z} ).    (8)

Let

    ŝ_x,i = (x_i − μ̂_x − A μ̂_{u_i|z_i})' E_x⁻¹ (x_i − μ̂_x − A μ̂_{u_i|z_i}).    (9)

We obtain

    σ̂_ε² = (1/(nN)) Σ_{i=1}^N ŝ_x,i + (1/n) tr[E_x⁻¹ A Σ̂_{u|z} A'].    (10)

Let ŝ_y,i(B) = (y_i − μ̂_y − Θ B T μ̂_{u_i|z_i})' E_y⁻¹ (y_i − μ̂_y − Θ B T μ̂_{u_i|z_i}). We get

    σ̂_e² = (1/(mN)) Σ_{i=1}^N ŝ_y,i(B) + (1/m) tr[E_y⁻¹ Θ B T Σ̂_{u|z} T' B' Θ'].

Let "vec" be the operator which transforms a matrix into a vector by stacking its columns one underneath another, and let ⊗ be the Kronecker product, which maps two arbitrarily dimensioned matrices into a larger matrix with special block structure. We have

    vec(B̂) = [ T ( Σ_{i=1}^N μ̂_{u_i|z_i} μ̂_{u_i|z_i}' + N Σ̂_{u|z} ) T' ⊗ Θ' E_y⁻¹ Θ ]⁻¹
              × vec( Θ' E_y⁻¹ Σ_{i=1}^N (y_i − μ̂_y) μ̂_{u_i|z_i}' T' ).

From (1), the estimate of β is then

    β̂(t, s) = Σ_{p=1}^P Σ_{q=1}^Q b̂_pq ϕ_p(t) ψ_q(s) ≡ ϕ(t)' B̂ ψ(s).

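The E-step and M-step updates above can be sketched for the balanced design as follows. This is a minimal illustration, not the authors' implementation: the data are simulated, the starting values are crude, and E_x = E_y = I and T = I (the choice ψ_q = φ_q with an orthonormal discretized basis) are assumed for simplicity.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic balanced-design data: N individuals, n x-points, m y-points
N, n, m, K, P = 50, 6, 5, 2, 2
A = rng.standard_normal((n, K))
Theta = rng.standard_normal((m, P))
B_true = rng.standard_normal((P, K))
T = np.eye(K)
U = rng.multivariate_normal(np.zeros(K), np.diag([2.0, 1.0]), size=N)
X = U @ A.T + 0.3 * rng.standard_normal((N, n))
Y = U @ (Theta @ B_true @ T).T + 0.3 * rng.standard_normal((N, m))
Z = np.hstack([X, Y])

# Current parameter estimates (crude starting values)
Sigma_u, sig2_eps, sig2_e, B = np.eye(K), 1.0, 1.0, np.zeros((P, K))
mu_z = Z.mean(axis=0)                      # MLE of mu_z
mu_x, mu_y = mu_z[:n], mu_z[n:]

# E-step, equation (7): conditional moments of u_i given z_i
D = np.vstack([A, Theta @ B @ T])
E = np.diag(np.r_[np.full(n, sig2_eps), np.full(m, sig2_e)])
Sigma_z = D @ Sigma_u @ D.T + E
Mu = (Z - mu_z) @ np.linalg.solve(Sigma_z, D @ Sigma_u)   # row i: mu_{u_i|z_i}'
Sigma_uz = Sigma_u - Sigma_u @ D.T @ np.linalg.solve(Sigma_z, D @ Sigma_u)

# (8): update of Sigma_u
Sigma_u_new = Mu.T @ Mu / N + Sigma_uz

# (9)-(10): update of sig2_eps
resid_x = X - mu_x - Mu @ A.T
sig2_eps_new = (resid_x ** 2).sum() / (n * N) + np.trace(A @ Sigma_uz @ A.T) / n

# B update via the vec/Kronecker normal equations, using vec(AXC) = (C' x A)vec(X)
lhs = np.kron(T @ (Mu.T @ Mu + N * Sigma_uz) @ T.T, Theta.T @ Theta)
rhs = (Theta.T @ (Y - mu_y).T @ Mu @ T.T).reshape(-1, order="F")
B_new = np.linalg.solve(lhs, rhs).reshape(P, K, order="F")

# sig2_e update using the updated B
resid_y = Y - mu_y - Mu @ (Theta @ B_new @ T).T
sig2_e_new = (resid_y ** 2).sum() / (m * N) \
    + np.trace(Theta @ B_new @ T @ Sigma_uz @ T.T @ B_new.T @ Theta.T) / m
```

Consistent with the positivity result cited above, Sigma_u_new is positive definite (a sample second-moment matrix plus a positive definite conditional covariance) and both variance updates are sums of nonnegative residual terms and positive trace terms.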
4. Data Analysis
We apply the LMFF approach to analyze a data set from an evolutionary biology experiment in which mice were bred over generations. Research has identified a negative age-specific correlation between wheel running distance and mouse body mass. We are interested in studying the dependence of the wheel running distance, the y_i's, on the log body mass, the x_i's, simultaneously for all ages of the population of N = 38 mice. Figure 2 contains the plot of β̂, and Table 1 shows the fitting errors for the different approaches. Compared with the other methods, the LMFF approach has smaller errors.


Table 1
Comparison of the mean absolute fitting errors of log body mass (x) and of wheel running distance (y)

           Fitting error (x)             Fitting error (y)
 K  P   LMFF    PACE    PFFR       LMFF     PACE     PFFR     FF       FPCR
 2  5   0.0223  0.0301  0.0435     12.1925  12.5731  12.5449  13.2169  12.5520
 5  5   0.0168  0.0286  0.0412      7.7596  11.5153  11.4097  13.1225  11.6656
4.1. Experiment
Selective-breeding experiments were performed on house mice (Mus domesticus) for generations for increased voluntary wheel-running exercise (Swallow, Carter, and Garland, 1998; Garland, 2003). Mice were placed individually in cages with an 11.75 cm radius running wheel and an electronic wheel-revolution counter built into the cage-top. Each mouse had the option of voluntarily getting into the wheel and running, or of remaining in the cage and not running. Once a week each mouse was weighed and its weekly wheel revolutions were downloaded from the counter device. Cages were cleaned weekly and running wheels were cleaned monthly. Monthly wheel running distance was obtained from the weekly wheel revolutions.

Mice were selected for high voluntary wheel running activity at 8 weeks of age after 10 generations of selection (Swallow et al., 1998). A subsequent study, 14 generations after selection, examined wheel running and body mass in this system of mice after 8 weeks of access to running wheels (Swallow et al., 1999). The study at the younger ages (weeks 8–15) found a negative genetic correlation between age-specific body mass and wheel running, and access to running wheels did result in a decrease in body mass. Morgan, Garland, and Carter (2003) conducted a detailed study of body mass and wheel running ontogenies in the 16th generation of this system of mice. The authors found that selected mice showed a steeper decline in wheel running activity at older ages, that age-specific body mass was a significant predictor of age-specific wheel running, and that smaller individuals ran more than larger individuals.

4.2. Estimate of β
The log body mass trajectories over n = 58 weeks are shown in the upper left panel of Figure 1. Overall, body mass is an increasing function of age, with a sharp rate of increase at early weeks (
