A two-step super-Gaussian independent component analysis approach for fMRI data.

NeuroImage 118 (2015) 344–358

Contents lists available at ScienceDirect

NeuroImage journal homepage: www.elsevier.com/locate/ynimg

Full Length Articles

A two-step super-Gaussian independent component analysis approach for fMRI data Ruiyang Ge a,b,c, Li Yao a,b,c, Hang Zhang d, Zhiying Long a,b,⁎ a

State Key Laboratory of Cognitive Neuroscience and Learning& IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, 100875, China Center for Collaboration and Innovation in Brain and Learning Sciences, Beijing Normal University, Beijing, 100875, China College of Information Science and Technology, Beijing Normal University, Beijing, 100875, China d Paul C. Lauterbur Research Centers for Biomedical Imaging, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China b c

a r t i c l e

i n f o

Article history: Received 31 October 2014 Accepted 15 May 2015 Available online 7 June 2015 Keywords: ICA fMRI Sparsity Super-Gaussian ATGP

a b s t r a c t Independent component analysis (ICA) has been widely applied to functional magnetic resonance imaging (fMRI) data analysis. Although ICA assumes that the sources underlying data are statistically independent, it usually ignores sources’ additional properties, such as sparsity. In this study, we propose a two-step superGaussianICA (2SGICA) method that incorporates the sparse prior of the sources into the ICA model. 2SGICA uses the super-Gaussian ICA (SGICA) algorithm that is based on a simplified Lewicki-Sejnowski’s model to obtain the initial source estimate in the first step. Using a kernel estimator technique, the source density is acquired and fitted to the Laplacian function based on the initial source estimates. The fitted Laplacian prior is used for each source at the second SGICA step. Moreover, the automatic target generation process for initial value generation is used in 2SGICA to guarantee the stability of the algorithm. An adaptive step size selection criterion is also implemented in the proposed algorithm. We performed experimental tests on both simulated data and real fMRI data to investigate the feasibility and robustness of 2SGICA and made a performance comparison between InfomaxICA, FastICA, mean field ICA (MFICA) with Laplacian prior, sparse online dictionary learning (ODL), SGICA and 2SGICA. Both simulated and real fMRI experiments showed that the 2SGICA was most robust to noises, and had the best spatial detection power and the time course estimation among the six methods. © 2015 Published by Elsevier Inc.

1. Introduction Functional magnetic resonance imaging (fMRI), with its ability to provide noninvasive characterization of cortical activation, has become a popular technique to investigate the neural mechanisms underlying various cognitive processes. Generally, fMRI data analysis methods can be divided into two classes: univariate methods and multivariate methods. One of the most popular univariate methods is the general linear model (GLM) (Friston et al., 1994) on a voxel-byvoxel basis to estimate the conformity of the measured voxel-wise signal to the hemodynamic response function (HRF). However, the univariate GLM approach does not consider interactions between voxels, and the shape of HRF varies across subjects, scans and brain regions. Accordingly, the univariate approaches are not able to fully reveal the information held in fMRI data. Complementary to univariate approaches, multivariate statistical methods such as principal component analysis (PCA), independent component analysis (ICA), factor analysis

⁎ Corresponding author at: State Key Laboratory of Cognitive Neuroscience and Learning & IDG/McGovern Institute for Brain Research, Center for Collaboration and Innovation in Brain and Learning Sciences, Beijing Normal University, Xin Jie Kou Wai Da Jie 19#, Beijing, 100875, China. Fax: +86 10 5880 9444. E-mail address: [email protected] (Z. Long).

http://dx.doi.org/10.1016/j.neuroimage.2015.05.088 1053-8119/© 2015 Published by Elsevier Inc.

(FA) and canonical correlation analysis (CCA) consider brain voxels or regions of interest as an integrated network and analyze them jointly (Calhoun and Adali, 2006; Ma et al., 2007). Independent component analysis (ICA) is a data-driven method that can recover a set of maximally independent sources from observed multivariate data without using prior information (Hyvärinen and Oja, 2000). The assumption of independence among the sources is revealed to be powerful and effective for a wide range of problems in various practical domains. Since its initial introduction to the field of fMRI analysis by McKeown et al. (1998), spatial ICA has been widely applied to reveal the spatially independent brain networks underlying various cognitive tasks and during the brain’s resting state (Calhoun and Adali, 2012; Ge et al., 2014, 2015; Long et al., 2009). Although ICA considers the spatial or temporal independence of the sources, it usually ignores additional source properties such as sparsity. Within the signal processing community, besides independence, sparsity is another commonly imposed assumption that arises from the principle of parsimony (Babaie-Zadeh et al., 2006; Bronstein et al., 2005; Chen et al., 1998; Georgiev et al., 2007; Mairal et al., 2010; Wright et al., 2010). For neural coding, the sparseness means that the number of activated neurons is much less than the total number of neurons in the population in one time period. The sparsity of the neural response was found in visual (Olshausen, 1996), auditory (Olshausen and Field,

R. Ge et al. / NeuroImage 118 (2015) 344–358

2004), olfactory (Perez-Orive et al., 2002), motor (Beloozerova et al., 2003) and memory (Wixted et al., 2014) systems in the brain. These biological findings of sparse coding in the brain suggest that sparsity may be more effective than independence in determining neural activity (Daubechies et al., 2009). Moreover, from the point of view of energy consumption, the sparse coding of the neurons are metabolically efficient (Olshausen and Field, 2004). Several studies estimating the energy required by signaling in cortical neurons concluded that the average firing rates are rather low and that the brain may have less than 1% of its neurons active concurrently at any given time (Attwell and Laughlin, 2001; Lennie, 2003). Moreover, a common characteristic of fMRI data is the voxels participating in each brain network usually occupy a very small percentage of the whole brain voxels. Thus, the spatial activation patterns of brain networks are sparsely distributed and have the superGaussian distributions. Recently, inspired by the success of using sparse representation for signal and pattern analysis in the machine learning and pattern recognition fields (Elad and Aharon, 2006; Kreutz-Delgado et al., 2003; Wright et al., 2009, 2010), sparse representation has been used in fMRI data analysis to help decode the functions of the human brain. The assumption underlying sparse representation is that each voxel’s fMRI signal is linearly composed of sparse components. Georgiev et al. (2007) proposed a sparse component analysis algorithm to identify the sources in fMRI data without the independent assumption. Li et al. (2009, 2012) applied sparse representation for detection of voxels in fMRI data with task relevant information (Li et al., 2009) and proposed a sparse representation-based multivariate pattern analysis algorithm to localize brain activation patterns corresponding to different stimulus classes/brain states (Li et al., 2012). A data-driven sparse GLM for fMRI analysis using sparse dictionary learning that was proposed by Lee et al. (2011) was demonstrated to be able to adapt to individual variation better than conventional ICA methods (Lee et al., 2011). More recently, Abolghasemi et al. (2015) proposed a fast incoherent Ksingular value decomposition (K-SVD) method to detect regions of brain activation (Abolghasemi et al., 2015). Ramezani et al. (Ramezani et al., 2015) proposed a joint sparse representation analysis (jSRA) method to identify common information across different functional subtraction (contrast) images in data from a multi-task fMRI experiment. An online dictionary learning (ODL) method (Mairal et al., 2010) was used by Lv et al. (2015) to derive the atomic basis dictionary for identification of functional networks (Lv et al., 2015). Unlike ICAbased methods, the above sparse representation methods do not require independence assumption of sources. Although the traditional ICA algorithms do not consider the sparsity of sources, there are some ICA algorithms that explicitly model source distributions. Both mixture of Gaussian (Attias, 1999; Davies and Mitianoudis, 2004; Todros and Tabrikian, 2007; Welling and Weber, 2001) and Laplacian distribution (Allassonniere and Younes, 2012; Bermond and Cardoso, 1999; Højen-Sørensen et al., 2002) were used as the probability density functions (pdf) of sources in these ICA algorithms. The sparseness-inducing nature of the Laplacian prior is well-known and has been exploited in many areas (Kotz et al., 2001). Accordingly, the ICA algorithms with Laplacian prior consider both the independence and sparsity of sources and are especially suitable to data with super-Gaussian sources. Moreover, Lewicki and Sejnowski (2000) presented an iterative algorithm for learning an overcomplete basis by viewing the algorithm as a probabilistic model of the observed data under the assumption of both the sparsity and independence of the sources. Lewicki-Sejnowski’s algorithm also assumed that the pdf of the underlying sources were the Laplacian distributions to achieve sparseness of the sources. We simplified the iterative algorithm as superGaussian ICA (SGICA) in the case of a complete basis, since sources with Laplacian distribution are super-Gaussian. Although the ICA algorithms with Laplacian prior and the algorithm based on Lewicki-Sejnowski’s model can extract the sparse and independent components from data, these methods use fixed Laplacian

345

distribution for all the sources of data. Actually, the sources underlying the fMRI data may have different pdfs and vary across different subjects. Thus, using a fixed Laplacian distribution in blind source separation may not provide an optimal source separation of fMRI data. Previous studies have pointed out that the performance of an ICA algorithm depends on the “true” pdf of each source. However, these pdfs are usually unknown in practical applications. It has been demonstrated that adaptively estimating the score functions (Vlassis and Motomura, 2001) or choosing nonlinear function (Hong et al., 2005) associated with the pdf of each source can improve the performance of the ICA algorithms. Accordingly, it is important for blind source separation to achieve source adaptivity through estimating the distributions of the sources from the data directly. In this study, we propose a two-step super-Gaussian ICA (2SGICA) approach that combines the independence with sparsity of sources and has better source adaptivity than SGICA. In contrast to most ICA methods with Laplaican prior that use the expectation–maximization (EM) algorithm (Allassonniere and Younes, 2012; Bermond and Cardoso, 1999; Højen-Sørensen et al., 2002), the SGICA method based on Lewicki-Sejnowski’s model for a complete basis (Lewicki and Sejnowski, 2000) uses a simpler gradient method for maximum likelihood estimation and has higher computational efficiency. Therefore, 2SGICA directly uses the SGICA algorithm that is based on LewickiSejnowski’s model to estimate the initial sources. The parameters of the Laplacian prior on each source are estimated adaptively based on the initial source’s pdf, which has been calculated from the first step of SGICA. A refitted Laplacian prior is then used for each source at the second step of SGICA. Moreover, the automatic target generation process (ATGP) for initial value generation and adaptive step size selection are used in 2SGICA to further improve the stability of the algorithm. We compared the performance of 2SGICA, SGICA, the two popular ICA algorithms (FastICA and InfomaxICA), the mean field ICA (MFICA) with Laplacian prior (Højen-Sørensen et al., 2002) and ODL (Mairal et al., 2010) using both simulated and real fMRI data. The results showed that 2SGICA outperforms SGICA, FastICA, InfomaxICA, MFICA and ODL in both the time course estimation and the spatial detection power at all noise levels. Moreover, 2SGICA showed faster computation speed compared to InfomaxICA and MFICA. 2. Theory In this section, we will first briefly introduce the ICA model in the context of fMRI. Then, the SGICA and 2SGICA methods are described. Finally, the initial values generation and adaptive step size selection algorithms used in 2SGICA are introduced. 2.1. The ICA model The spatial ICA model that has been widely used in fMRI data analysis can be expressed as X ¼ AS

ð1Þ

where X is N×V observed fMRI signal data, N is the number of fMRI scans, V is the number of voxels, A is the N×N mixing matrix, and S is the N×V source matrix. Each row of matrix S represents one of the spatially independent components and each column of matrix A represents the time course of the corresponding independent component. The goal of spatial ICA is to model the data as a mixture of underlying components that are maximally spatially independent of each other. Conventional ICA algorithms usually achieve the ICA decomposition by estimating an unmixing matrix W = A−1 such that y = WX is an estimation of S (Hyvärinen and Oja, 2000). So far, InfomaxICA and FastICA are the two most popular algorithms that are applied to fMRI data analysis (Esposito et al., 2002).

346


2.2. Super-Gaussian ICA (SGICA)

2.3. Two-step super-Gaussian ICA (2SGICA)

As overcomplete coding can be viewed as a generalization of ICA, the SGICA method in this study was a simplified overcomplete representation approach that was developed by Lewicki and Sejnowski (Lewicki and Sejnowski, 2000). For the linear model in the Eq. (1), the mixing matrix A is a nonsquare matrix in the case of an overcomplete basis and A is a square matrix in the case of a complete basis. SGICA is based on the linear model with complete bases. According to the ICA model in Eq. (1), the data vector x = (x1, …, xN)T can be modeled as x = As. It is assumed that the elements of the vector s = (s1, …, sN)T are statistically independent, we have p(s) = ∏N n = 1p(sn). The estimation of the mixing matrix A can be converted to solve the optimization problem by maximizing the likelihood of the data X given A, i.e., max(logp(X|A)). It is almost always assumed in applying the maximum likelihood method that the observations xk are statistically independent of each other: p(X|A) = ∏Vk = 1p(xk|A). The objective function logp(X|A) can be approximated by Eq. (2) (Lewicki and Sejnowski, 2000):

For SGICA, the parameters θ and μ of the Laplacian prior are set the same for all of the sources, that is θ = 1 and μ = 0 as in previous studies (He et al., 2008; Lewicki and Sejnowski, 2000). Since the probability density of each source is generally different across sources and across subjects, using the same Laplacian prior to approximate the pdfs of all of the sources may degenerate the performance of SGICA. The central idea of 2SGICA is to adaptively estimate the parameters of the Laplacian distribution based on the true pdf of each source that can be calculated from the first-step SGICA. Thus, 2SGICA can be performed with the following steps: 1) SGICA for initially statistically independent sparse source estimation; 2) determination of the probabilistic density of these sources and estimation of parameters of the Laplacian distribution for each source; 3) SGICA using the optimal parameters of the Laplacian distribution for the final separation. After the first-step SGICA, the source estimation can be obtained by ~ −1 x , where ~s ¼ ð~s1 ; …~s ; …; ~sN ÞT . For a set of samples using ~s ¼ A i ~si1 ; …; ~si j ; …; ~siV that is drawn from the random variable ~si , The pdf of ~si can be calculated by the kernel estimator (Silverman, 1986):

VN 1 VN logpðXjAÞ ¼ log∏ pðxk jAÞ≈ þ log log2π 2 2 2 2πσ k ð2Þ XV 1 1 2 logp ð s Þ− ð x −As Þ − ð Þ ; log detH s þ k k k k 2 k¼1 2 2σ where σ 2 is the noise variance and H is the Hessian of the log posterior at sk, and P(sk) is the probability of the k-th source sample. A learning rule can be obtained by differentiating logP(x|A) respect to A (Lewicki and Sejnowski, 2000), i.e., ∇logpðxjAÞ ¼ ∇logpðsÞ−

1 1 ∇ðx−AsÞ2 − ∇log detHðsÞ 2 2σ 2

ð3Þ

It is known that priors with high kurtosis can be used to achieve the sparse solution. Thus, the SGICA method assumed that the probability density of sn is approximate to the Laplacian distribution in Eq. (4): pðsn Þ∝θe−θjsn −μ j ;

ð4Þ

where θ is the scale parameter and μ is the location parameter for source sn. In the case of a Laplacian prior on sn: logpðsÞ ¼ logð∏n pðsn ÞÞ ¼

X n

logðpðsn ÞÞ ¼ Nlogθ−θ

XN n¼1

jsn −μ j:

ð5Þ

Because logp(s) is not a smooth function, the cosh function in Eq. (6) was used to approximate p(sn): −βθ

pðsn Þ∝ cosh

ðβðsn −μ ÞÞ:

ð6Þ

For a large β, this function approximates the true Laplacian prior while staying smooth around zero. By using this approximation, z(sn) can be expressed as: zðsn Þ≈−θ tanhðβðsn −μ ÞÞ:

ð7Þ

The final learning rule can thus be derived as (Lewicki and Sejnowski, 2000):

A ¼ Aþ λΔA ΔA ¼ −A zðsÞsT þ I

ð8Þ

where ΔA is the natural gradient of the objective function (2), λ is the step size of the learning rule, I is the identity matrix, z(s) = (z(s1), …, z(sN))T and zðsn Þ ¼ ∂s∂n logðpðsn ÞÞ, n = 1,2,…,N.

f ðωÞ ¼

ω−~si j 1 XV ; K j¼1 h Vh

ð9Þ

where ~si j is the j-th voxel value for i-th source, V is the number of voxels, and h is the window bandwidth that was chosen as 1000 equally spaced points covering the range of data. A Gaussian kernel function was chosen as the density estimator. K(ω) with ω is a random variable ∞ that satisfies ∫+ − ∞K(ω)dω = 1. After the pdfs of the sources were estimated, the pdf of each source was fitted to the Laplacian function in Eq. (4) using an iterative nonlinear least-squares method (Dennis, 1977). The parameters θ and μ of the Laplacian prior can be estimated for each source. Subsequently, a new SPICA procedure was implemented by using the estimated parameters for each source. In the second step SGICA, the initial values of s and A can be set as ˜s and Ã, which are estimated from the first step SGICA. Fig. 1 shows the entire procedure of 2SGICA. 2.4. Initial values generation Like FastICA and InfomaxICA, the initial values of the matrix A were set as random values in SGICA (He et al., 2008; Lewicki and Sejnowski, 2000), which may result in unstable results. To avoid instability, we employed ATGP (Chang et al., 2011) to generate fixed initial values based on the observation data X for the SGICA and 2SGICA methods in the present study. ATGP can be regarded as an unsupervised and unconstrained orthogonal subspace projection (OSP) technique and is able to find targets by performing a succession of OSPs that are specified by Eq. (10): # P⊥ U ¼ I−UU

ð10Þ

where U# = (UTU)−1UT is the pseudo-inverse of U. The projector P⊥ U maps the observed data X into the orthogonal complement of b UN that can be designated by b U N ⊥. The details of this algorithm implementation are given below. a) Let k = 0 and N be the number of the target vectors, i.e., the number of sources in this study. Find the first vector t0 in X that satisfies: n h io t0 ¼ arg maxx diag XT X ; set U0 = [t0].

ð11Þ


347

Fig. 1. Framework of the 2SGICA approach.

b) Let k = k +1 and apply P⊥ t0 in Eq. (10) to all of the vectors of X. c) Find the k-th target tk which has the maximum orthogonal projection at the k-th stage as follows:

to the previous step size and the present error estimation, we define the adaptive step size as: 2

λðkÞ ¼ αλðk−1Þ þ ð1−α ÞρϕðkÞ

T

⊥ ; X P X tk ¼ arg maxx P⊥ ½Uk−1 tk ½Uk−1 tk

ð12Þ

d) Set Uk = [Uk − 1tk], 1 ≤ k ≤ N − 1. e) If k b N, then go to step b). Otherwise, the algorithm is terminated. The vectors t0,…,tk,…,tN− 1 are considered as the desired target vectors. These vectors form a matrix UN = [t0, …, tk, …, tN − 1] that can be used as the initial values of the ICA algorithm. 2.5. Adaptive step size selection A natural gradient-based learning algorithm in Eq. (8) was used in SGICA. For the SGICA and 2SGICA methods, the learning step size plays a key role in the tradeoff between convergence speed and final accuracy. One simple way to balance this tradeoff is to define the step size by an inverse proportional function or an exponential decline function of the iteration number, but this method has no tracking ability in a time-varying environment (von Hoff and Lindgren, 2002). In this study, we employed an adaptive procedure to select the optimal step size of the learning algorithm (Tang et al., 2014; von Hoff and Lindgren, 2002). Generally, the principle of an adaptive selection of the step size demands that the step size be large when the separated sources are far from the optimal ones and that be small when the separated sources are near the optimal ones. As the optimal sources are unknown in advance, it is impossible to control the step size with the actual distance (error) between the separated sources and optimal ones. Thus, we constructed the following error estimation based on the learning process. From Eq. (8), it can be seen that the difference in the mixing matrix A between the adjacent iteration is δA = A(k) − A(k − 1) ≈ 0 when the algorithm attains convergence, which means that δA ∝ ΔA ≈ 0. We constructed error estimation matrix E(k) as E(k) = ΔA = − A[z(S)ST + I], where k represents the k-th iteration of the algorithm. This error estimation matrix is related with the distance between the estimated and optimal sources. Furthermore, a smoothed form of error estimation matrix can be defined by Eq. (13) (Tang et al., 2014)

ð14Þ

where α is a forgetting factor, and ρ is a scaling factor. In the present study, we set α = 0.998 and ρ = 0.002 according to a previous study (Tang et al., 2014). To guarantee the stability of the algorithm, the range of the step size is set as 0 b λ ≤ λ0, where λ0 is the initial value of the step size. The final 2SGICA algorithm is listed in the following. a) b) c) d) e)

Center the observed signal x; Reduce the dimension of the data x using PCA; Whiten the reduced data; Initialize the mixing matrix A using ATGP; Set initial values for θ and μ as: θn = 1 and μn = 0 (n = 1,…,N) for all N sources; f) Set β = 50000 and the initial values for the step size λ = 0.15; g) For the k-th iterative step: (1) Compute s = A−1x where s = (s1,…,sN)T; (2) Compute z(sn) ≈ − θn tanh(β(sn − μn)), n = 1,2…,N; (3) Compute ΔA = − A[z(s)sT + I], where z(s) = (z(s1), … z(sn) …, z(sN))T, n = 1,2…,N; (4) Update the mixing matrix A by A = A + λΔA; (5) Set E(k) = ΔA; if k = 1, set Ē(k) = E(k); else Ē(k) = (1 − λ)Ē(k − 1) + λE(k); (6) Compute ϕ(k) = maxi,j|Ēij(k)|; (7) Adaptively update the step size λ with λ = 0.998 × λ + 0.002 × (1 − 0.998) × ϕ (k)2;

Repeat the above steps until convergence ||ΔA|| b 5 × 10−6. When the iteration stops, the mixing matrix Ã is estimated and the source estimation ~s is obtained using ˜s ¼ Ã − 1 x; h) Estimate the pdf of each source according to kernel estimator (9); i) Fit the pdf of each source to the Laplacian function, i.e., pðsn Þ∝θe−θjsn −μ j, by using an iterative nonlinear least-squares method, and thus obtain the value of θn and μn for the n-th source; j) Go to step g) by using θn and μ n values obtained in step i), and use Ã obtained in the first step SGICA as the initial value until ||ΔA|| b 10−6.

3. Materials and methods EðkÞ ¼ ð1−λðk−1ÞÞEðk−1Þ þ λðk−1ÞEðkÞ:

ð13Þ

In the present study, we take the largest element of the absolute value of the smoothed error estimation matrix as the final error estimationϕðkÞ ¼ max jEi j ðkÞj. The selection of the present step size is relevant i; j

In this section, we evaluated the feasibility and robustness of 2SGICA using simulated and real fMRI data. Moreover, the performance of 2SGICA was compared with SGICA, FastICA, InfomaxICA, MFICA and ODL. The termination criteria were set to ||ΔW|| b 10−6 for InfomaxICA, FastICA and ODL, and ||ΔA|| b 10−6 for SGICA and 2SGICA. The maximum

348


number of iterations of E-step and M-step were set to 100 and 50 that were the default values in the MFICA algorithm. Furthermore, we set β = 50000 and the initial step size λ0= 0.15 in the 2SGICA and SGICA algorithms. The use of these values for β and λ0 was validated in the following simulation. The core InfomaxICA, FastICA, MFICA and ODL algorithms were downloaded from the Internet (InfomaxICA: http:// scdcn.ucsd.edu/~scott/ica.html; FastICA: http://research.ics.aalto.fi/ica/ fastica/; ODL: http://spams-devel.gforge.inria.fr/; MFICA: http://cogsys. imm.dtu.dk/toolbox/ica/). The 2SGICA and SGICA were implemented in Matlab R2009b (MathWorks). In this paper, all the computations were performed on a computer with Intel(R) Core(TM) i7-4770, CPU 3.40 GHz, and RAM 4GB. 3.1. Simulated experiment 3.1.1. Generation of simulated data The simulated datasets with different contrast-to-noise ratio (CNR) levels were generated using the simulation toolbox SimTB (http:// mialab.mrn.org/software/simtb/index.html) (Erhardt et al., 2012). We generated 20 spatial maps, each of which had 270 × 270 voxels with a baseline intensity of 800 (see Fig. 2A). For each spatial component, percent signal change is centered at 3 with a standard deviation of 0.25. It was assumed that the simulated experiment included one task and that the task induced one task-related spatial component. The entire session lasted 180-s and consisted of four 20-s task blocks that alternated with five 20-s rest blocks with a repetition time (TR) of 2 s. We further supposed that the simulated data consisted of 20 independent components and that the task activated the spatial map of the sixth component (as shown in apple green in Fig. 2A). The simulated fMRI response that was added to the task-related spatial map was derived from a convolution of the stimulus paradigm with the HRF (see Fig. 2B). Moreover, the other 19 spatial components had unique events that occurred with a probability of 0.2 at each TR and unique event modulation coefficients that were equal to 1. The unique events refer to unexplained deviations that are unique to each source and subject (Erhardt et al., 2012). The simulated fMRI responses that were added to the other 19 spatial maps were derived from the convolution of the unique events

Fig. 2. Twenty simulated sources. (A) Activated regions of the twenty sources. (B) Time course of the sixth source.

with HRF. Rician noise was added to each dataset relative to a specified CNR. The CNR level of the simulated datasets varied from 0.40 to 2.20, with an increment of 0.20. For each CNR level, 20 simulated subjects were generated. 3.1.2. Single-subject analysis The dimensions of each dataset were reduced to 20 by PCA before 2SGICA, SGICA, FastICA, InfomaxICA, MFICA and ODL were performed. The reason that we reduced the dimension to 20 was that each simulated dataset contained 20 sources. After dimension reduction, the six methods were applied to each dataset separately. To identify the taskrelated component, the temporal correlation between the time course of each component and the reference function was calculated. The reference function was derived from the convolution of the task paradigm with the HRF. The component with the highest temporal correlation coefficient was selected as the task-related component. The receiver operation characteristic (ROC) analysis (Skudlarski et al., 1999) was applied to compare the spatial detection power of different methods. The area under the ROC curve can be used to evaluate the accuracy of the method. A larger area under the curve indicates that the method is more accurate. For each dataset, the ROC areas of the task-related components that were extracted by the six methods were recorded. Moreover, the mean ROC area across 20 subjects at each CNR level was calculated for each method. Furthermore, the computation time of each dataset was also recorded for each algorithm. 3.1.3. Multi-subject analysis Multi-subject ICA has become a standard strategy for evaluating hidden spatiotemporal structures contained in fMRI data for a group of subjects (Calhoun et al., 2001). Among the various group ICA approaches (Beckmann and Smith, 2005; Calhoun et al., 2001; Esposito et al., 2005; Svensén et al., 2002), the temporal concatenation implemented in the GIFT software (http://icatb.sourceforge.net/) is popular and easy to integrate with 2SGICA, SGICA, ODL and MFICA for group analysis. To compare the performance of all the six methods in multi-subject analysis, the simulated datasets of 20 subjects at a specific CNR level (CNR=1.0) were used. Two data reduction steps were performed on the datasets of 20 subjects by using PCA. In the first PCA stage, the functional data of each subject were reduced to 20. After concatenation across subjects, the data were again reduced to 20 through the second PCA stage. For the InfomaxICA, FastICA and ODL algorithms, decomposition processes were repeated 50 times using ICASSO toolbox (Himberg et al., 2004) to obtain the stable results. Due to the large computation time cost of MFICA, decomposition of MFICA was not repeated 50 times. Fixed initial values that were generated by ATGP were used in the 2SGICA, SGICA and MFICA methods. The same results can be acquired for the same datasets no matter how many times 2SGICA/ SGICA/MFICA were performed. Thus, 2SGICA, SGICA and MFICA only need to run once. Following the estimation of each method, backreconstruction was performed to reconstruct single subject maps and time courses. To identify the task-related component, the correlation coefficient of the mean time course of each component over 20 subjects with the reference function was calculated. Only the component with the highest correlation coefficients was selected. The ROC area of each subject’s task-related component was recorded. For each method, the mean ROC area and temporal correlation coefficient across the 20 subjects were calculated. To further examine the differences in the performance between 2SGICA, SGICA, FastICA, InfomaxICA, MFICA and ODL, nonparametric Friedman tests were used to test for differences between the six methods on temporal correlation coefficients and ROC areas. If the Friedman test showed significant differences between these methods, post-hoc Wilcoxon tests were conducted to assess any significant differences between any two pairs of these methods. Furthermore, we recorded the computation time for each decomposition to compare the computational speed of each algorithm.


3.1.4. Determination of the parameters β and λ0 In this simulation, 200 simulated datasets with different CNR (20 simulated subjects × 10 CNR levels) that were generated in the above simulation were used. The two parameters to be determined were parameter β in Eq. (6) and the initial step size λ0. The parameter β controls the similarity between the cosh function and the Laplacian function. In contrast to a large β, a small β leads to smoother shape and a larger difference between the Laplacian and cosh functions. For a large β, the θk

−

function cosh β ðβðsk −μ k ÞÞ approximates the true Laplacian prior p(sk) in Eq. (4). However, the value of β cannot be excessively large because excessively large value could result in the overflow of Matlab. Moreover, the choice of the initial step size λ0 affects the trade-off between the convergence speed and steady-state error of the algorithm. The simulated experiments were conducted to determine the optimal values of these two parameters. We first set the value of β as a large number, i.e., 50000, and sought the optimal value of λ0 given this β value. The value of λ0 was varied empirically from 0.01 to 0.45, with an increment of 0.01 from 0.01 to 0.10 and an increment of 0.05 from 0.10 to 0.45. 2SGICA using a specific λ0 was applied to each dataset. The ROC area and temporal correlation of the task-related component were recorded. Here we define a score that is a weighted sum of the ROC area and temporal correlation coefficient expressed in Eq. (15) for each estimated task-related component. SC ¼ ξROC þ ð1−ξÞTC;

ð15Þ

where SC is a measure of the estimation accuracy of the task-related component. ROC means the ROC area corresponding to the task-related component and TC means the temporal correlation coefficient between the task-related component and the reference function. The parameter ξ is bounded between 0 and 1 and can be defined according to specific contexts. In this study, we set ξ = 0.5, which implies an equal weighting of 50% spatial and 50% temporal accuracy measure. The means of the SC values across 200 datasets were calculated for each λ0. The parameter λ0 with the highest SC was selected as the optimal value. Then, we varied β from 10000 to 55000 with an increment of 5000 by setting λ0 as the optimal value to investigate the performance of 2SGICA in the case of various β values. The ROC area and temporal correlation of the task-related component were recorded. The means of the SC values across 200 datasets were calculated for each β. 3.2. Real fMRI data experiment A real fMRI experiment was performed to further demonstrate the feasibility of the proposed method and to compare the performances of FastICA, InfomaxICA, ODL, MFICA, SGICA and 2SGICA. 3.2.1. Data acquisition 1) Finger tapping task dataset. Twenty-six right hand-dominant college students with normal vision were recruited from Beijing Normal University. The participants included 14 females and 12 males (mean age: 23 years ±2 years).They had no histories of neurological disorders, psychiatric disorders, experience with typewriters, or any experience learning to play musical instruments. Moreover, they gave written informed consent according to the guidelines set by the MRI Center of Beijing Normal University. The entire experimental session lasted 270-s and was composed of four 30-s task blocks alternated with five 30-s rest blocks. Before scanning, all of the participants were instructed that from their index to little finger, each of the four fingers of their right hand represented a single digit number: one, two, three and four. Participants were required to tap the sequence 4-2-3-1-3-4-2 with their right hands during the task blocks. During rest blocks, they were instructed to maintain fixation through the whole scanning session.

349

Brain scans were performed at the MRI Center of Beijing Normal University using a 3.0-T Siemens whole-body MRI scanner. A single-shot T2*-weighted, gradient-echo, EPI sequence was used for functional imaging acquisition with the following parameters: TR/TE/flip angle = 3000 ms/40 ms/90°, acquisition matrix = 64×64; field of view (FOV) = 240 mm, and slice thickness = 5 mm with no interslice gap. Thirty-two axial slices parallel to the AC-PC line were obtained in an interleaved order to cover the entire cerebrum and cerebellum. 2) Resting state dataset . Resting state fMRI data from nineteen healthy subjects (nine males, ten females, mean age: 24±4 years), taken from the Newark dataset (TR=2s; # slice=32; # time points=135) from the publicly available 1000 Functional Connectomes Project at www. nitrc.org/projects/fcon_1000/, were used for this study. During the resting-state fMRI scan, subjects remained relaxed with their eyes closed for 270 seconds. 3.2.2. Preprocessing The serial functional images of each participant were corrected for slice-timing, realigned, spatially normalized into the standard Montreal Neurological Institute template space, resliced into 3 mm × 3 mm × 4 mm voxels, and spatially smoothed with an 8 mm × 8 mm × 8 mm Gaussian kernel by SPM8 (Statistical Parametric Mapping; http:// www.fil.ion.ucl.ac.uk/spm) software. 3.2.3. Data analysis 1) Finger tapping task dataset. The six methods were performed separately on the multi-subject fMRI data. All of the parameters for these algorithms were identical to those that were used in the simulation. Prior to group analysis, a two-step PCA was used to reduce data dimensionality, and the number of components was set to 24 according to the minimum description length (MDL) criteria (Li et al., 2007). For the InfomaxICA, FastICA and ODL algorithms, decompositions were repeated 50 times using ICASSO toolbox to obtain the stable results. For 2SGICA, SGICA and MFICA algorithms, the ICA decompositions run only once. Following the decomposition and back-reconstruction, single subject maps and time courses were reconstructed. To identify the task-related component, the correlation coefficient of the mean time courses of each component over 26 subjects with the reference function was calculated. The component with the highest correlation coefficient was selected as the task-related component. The subsequent group analyses of the task-related components obtained by the six methods were conducted using the one-sample t-tests in SPM8 to identify the brain regions that were engaged in the task-related component. Additionally, the general linear model (GLM) analysis was applied to each dataset. The group datasets were analyzed using a random effects model. A one-sample t-test was performed to identify the brain regions that were activated by the task compared to the rest condition. All of the statistical results from the GLM analysis and the four ICA methods were corrected for multiple comparisons via a false discovery rate (FDR) (Chumbley et al., 2010) at pb0.05. Furthermore, quantitative evaluations of the results obtained with these methods were conducted. We supposed that the activation pattern that was estimated by GLM was true and employed the GLM result as a suitable benchmark for a ROC-based comparison. This strategy has been applied in several previous studies (Aragri et al., 2006; Lu et al., 2004; Wang, 2009). For each method, the ROC area of each subject’s task-related component was computed using the activated regions detected by GLM as the true activation. Then, the correlation coefficient of the time course of each subject’s task-related component with the reference function was also calculated. Nonparametric Friedman tests were used to test for differences between the six methods on ROC areas and temporal correlation coefficients. If the Friedman test showed significant differences between these methods, post-hoc Wilcoxon tests were conducted to assess any significant differences

350


between any two pairs of these methods. Moreover, the computation time of each decomposition was recorded. 2) Resting state dataset. Similar to the processing that was applied to the dataset of the finger tapping task, FastICA, InfomaxICA, ODL, MFICA,SGICA and 2SGICA methods were performed on the multisubject resting-state fMRI dataset respectively. Among various resting brain networks, default mode network (DMN) is the most important one and has been studied extensively. Therefore, we mainly investigated DMN in this study. After decomposition and back-reconstruction for each method, the DMN component of each subject was identified using the Goodness of Fit method. For selection of the best-fit component, we used the region of interest (ROI) coordinates from an independent research (Fransson and Marrelec, 2008) to construct the DMN template. These ROIs include precuneus/posterior cingulate cortex (pC/PCC), left and right lateral parietal cortex (lLPC and rLPC), dorsal and ventral medial prefrontal cortex (dMPFC and vMPFC), left and right medial temporal lobe (lMTL and rMTL), left and right lateral temporal cortex (lLTC and rLTC).The Goodness of Fit of each component with the DMN template was calculated by taking the average z-score of voxels falling within the template minus the average zscore of voxels outside the template for each run of each subject (Greicius et al., 2004). The mean Goodness of Fit of each component across all subjects was obtained. The component with the greatest mean Goodness of Fit can be selected as the DMN component. The subsequent group analyses of the DMN components obtained by the six methods were conducted using the one-sample t-tests in SPM8 to identify the brain regions that were engaged in the DMN. In order to quantitatively evaluate the results obtained by each method, we constructed DMN template that were based on the anatomical structures by using WFU (Wake Forest University) Pickatlas toolbox (Maldjian et al., 2003) together with the AAL (Automatic Anatomical Labelling) atlas (Tzourio-Mazoyer et al., 2002). The AAL atlas structures included pC/PCC, bilateral LPC, dMPFC, vMPFC, bilateral MTL and bilateral LTC (Fransson and Marrelec, 2008). We then computed the number of activated voxels inside these important DMN nodes to compare the difference between the DMNs that were extracted by the six methods. 4. Results 4.1. Simulated experiment 4.1.1. Single subject analysis Fig. 3 displays the variation of mean ROC area, temporal correlation of the task-related component that was separated by the six methods and the computation time with the CNR levels. It is clear that 2SGICA produced the highest ROC area and temporal correlation at all of the CNR levels among the six methods, indicating that 2SGICA’s performance was superior. Because FastICA had the smallest ROC area, it showed the worst spatial detection power at all of the CNR levels. The spatial detection power of SGICA is better than FastICA and MFICA and worse than InfomaxICA and ODL. The temporal correlation of the taskrelated component that was estimated by ODL is the smallest. The computation time results (Fig. 3C) show that FastICA converged with minimum computation time, while MFICA took much more time than the other methods at all CNR levels. 2SGICA, SGICA and ODL were faster than InfomaxICA and slower than FastICA. 4.1.2. Multi-subject analysis Each individual’s ROC area and the temporal correlation of the task-related components that was estimated by the six methods are shown in Fig. 4A and C. The ROC area of 2SGICA was the highest for all subjects, while that of FastICA was the lowest. The mean value (± standard deviation) of the ROC area in descending order was 0.9741(± 0.0037), 0.9222(± 0.0052), 0.9127(± 0.0049),

Fig. 3. Simulated results of the single-subject analysis for the InfomaxICA, FastICA, ODL, MFICA, SGICA and 2SGICA methods. (A) The variation of the mean ROC area with the CNR level. (B) The variation of the mean temporal correlation coefficient with the CNR level. (C) Computation time of each decomposition for the six methods.

0.8949(± 0.0056), 0.8718(± 0.0061), and 0.8674(± 0.0062) for 2SGICA, InfomaxICA, ODL, SGICA, MFICA and FastICA respectively. The mean value (± standard deviation) of the temporal correlation in descending order was 0.9648(± 0.0028), 0.9646(± 0.0028), 0.9643(± 0.0028), 0.9637(± 0.0028), 0.9632(± 0.0028), and 0.9560 (± 0.0038) for 2SGICA, InfomaxICA, SGICA, MFICA, FastICA, and ODL respectively. For the ROC areas, the Friedman test showed significant differences between the six methods (χ2 = 100.00, p b 0.001). Moreover, the post-hoc Wilcoxon tests indicated that the difference of ROC area between any two of these methods was significant (pb0.001, see Fig. 4B). For the temporal correlation coefficients, the Friedman test also showed significant differences between the six methods (χ2 = 96.886, p b 0.001). The post-hoc Wilcoxon tests indicated that the difference between any two of these methods reached statistically significance (pb0.004, see Fig. 4D). All of the above results were Bonferroni corrected for multiple comparisons. Fig. 4E shows the computation time of each single decomposition in the multi-subject analysis. Since the fixed initial values are used in MFICA, SGICA, and 2SGICA, each ICA decomposition took the same computation time and the computation time of MFICA, 2SGICA and SGICA did not vary with the decomposition times. It can be seen in Fig. 4E that FastICA is the fastest algorithm and MFICA is the slowest algorithm. The computation time of MFICA is about two orders greater in magnitude than that of the other five methods. The 2SGICA, SGICA and ODL algorithms are faster than InfomaxICA and slower than FastICA.


The activation maps of the task-related components that were estimated by the six methods are presented in Fig. 5. During the generation of the simulated data, the task was added to the green regions in Fig. 2. From Fig. 5, it can be seen that the activated regions detected by the six methods covered all the green regions activated by the task. Except the true activation, all the six method detected some false activated regions. 2SGICA only detected some small false activation on the lower edge while the other five methods detected the false activations both on the edges and inside the circles. Obviously, 2SGICA produced the smallest false activations while ODL produced the largest false activation.

4.1.3. Determination of the parameters β and μ0 Fig. 6A displays the variation of the SC values with the increasing λ0 when β is set to 50000. It can be seen that the mean values of SC were very close when 0.08 ≤ λ0 ≤ 0.45. Moreover, the mean value of SC was the highest in the case of λ0 = 0.15. Thus, we selected 0.15 as the optimal value of λ0. The variations of the mean value of SC with the increasing β in the case of λ0 = 0.15 are shown in Fig. 6B. The results indicated that the mean SC value was the highest in the case of β = 50000. Accordingly, we set λ0 = 0.15 and β = 50000 as the optimal parameters in the simulated and real fMRI data analysis.

351

4.2. Real fMRI experiment 4.2.1. Finger tapping task dataset Fig. 7 shows the comparison of the performance of the six methods for the real fMRI data. The ROC areas of 2SGICA were the highest for all subjects and those of FastICA were the lowest for most subjects (see Fig. 7A). Although the temporal correlation coefficients of the six ICA methods were close to each other for all subjects, the correlations of 2SGICA were the highest and those of ODL were the lowest for most subjects (see Fig. 7C). The mean values (± standard deviation) of the ROC area in descending order were 0.7141(± 0.0348), 0.7043 (± 0.0335), 0.6966(± 0.0363), 0.6929(± 0.0288), 0.6857(± 0.0376), and 0.6782(± 0.0380) for 2SGICA, InfomaxICA, SGICA, ODL, MFICA, and FastICA, respectively. The mean values (±standard deviation) of the temporal correlation in descending order were 0.8132(±0.0716), 0.8115(± 0.0723), 0.8086(± 0.0712), 0.7941(± 0.0857), 0.7934 (± 0.0782),and 0.7761(± 0.0703) for 2SGICA, SGICA, InfomaxICA, FastICA, MFICA, and ODL, respectively. Friedman tests on ROC areas showed significant differences between the six methods (χ2 = 80.330, p b 0.001). The post-hoc Wilcoxon tests showed that ten out of fifteen pairs of these methods showed significance differences (pb0.004, see Fig. 7B), and one pair showed marginally significant difference (p=0.0068 for ODL vs. 2SGICA). For temporal correlation

Fig. 4. Simulated results of the multiple-subject analysis for the InfomaxICA, FastICA, ODL, MFICA, SGICA and 2SGICA methods. (A) The ROC area of each subject. (B) Comparisons of the ROC area between the six methods. (C) The temporal correlation coefficient of each subject. (D) Comparisons of the temporal correlation between the six methods. (E) Computation time of each decomposition for the six methods. ** represents pb0.004 (Bonferroni corrected); * represents pb0.007 (Bonferroni corrected).

352


Fig. 5. Spatial group activation maps of the task-related component that were extracted from the simulated data by FastICA (A), InfomaxICA (B), ODL (C), MFICA (D), SGICA (E) and 2SGICA (F).

coefficients, Friedman tests showed significant differences between the six methods (χ2 = 70.330, p b 0.001). The post-hoc Wilcoxon tests indicated that ten out of fifteen pairs of these methods

displayed significant difference (pb0.004, see Fig. 7D), and one pair showed marginally significant difference (p=0.0058 for InfomaxICA vs. 2SGICA). All of the above results were Bonferroni corrected for

Fig. 6. The determination of the optimal β and λ0. (A) The variation of the mean SC value with the initial step-size (λ0) for β = 50000. (B) The variation of the mean SC value with β for λ0=0.15.


353

Fig. 7. Results of the real fMRI data for the InfomaxICA, FastICA, ODL, MFICA, SGICA and 2SGICA methods. (A) The ROC area of each subject. (B) Comparisons of the ROC area between the six methods. (C) The temporal correlation coefficient of each subject. (D) Comparisons of the temporal correlation between the six methods. (E) Computation time of each decomposition for the six methods. Note that although MFICA, SGICA and 2SGICA run once with ATGP, we still plot their computation time 50 times in order to compare with other three methods. ** represents pb0.004 (Bonferroni corrected); * represents pb0.007 (Bonferroni corrected).

multiple comparisons. For the computation time, FastICA showed the fastest converging speed while the computation time of MFICA was about two orders greater in magnitude than that of the other methods. The converging speed of 2SGICA, SGICA and ODL was faster than InfomaxICA and slower than FastICA (see Fig. 7E). The spatial maps of the task-related component discovered by FastICA, InfomaxICA, MFICA, ODL, SGICA and 2SGICA are presented in Fig. 8A-F. The local maxima coordinates of the spatial maps are reported in Table 1 and Table 2. Moreover, the activated brain regions that were detected by GLM are displayed in Fig. 8G. It is clear that the spatial activations of the six methods largely overlapped with the activated regions of GLM. Consistent with previous studies (Hanakawa et al., 2008; Lacourse et al., 2005; Zhang et al., 2011), motor-related activation was mainly located in the bilateral supplementary motor area (SMA), left primary motor cortex (M1), bilateral premotor area (PMA), bilateral posterior parietal lobule (PPL), thalamus, striatum and cerebellum. For 2SGICA, SGICA, InfomaxICA and FastICA, the bilateral SMA, M1, bilateral PMA and bilateral PPL constituted the first cluster C1, the thalamus and striatum constituted the second cluster C2, and the cerebellum constituted the third cluster C3. Moreover, a cluster (C4) in white matter and cerebrospinal fluid that should not belong to the motor-related network was detected by InfomaxICA, FastICA, ODL, MFICA, SGICA and ODL, except 2SGICA. The MFICA method merged cluster C1 and C2 into one cluster while the ODL method merged all the four clusters (C1, C2, C3

and C4) into one cluster. Among the six methods, FastICA estimated the smallest activation. 2SGICA detected greater activation than MFICA, SGICA, InfomaxICA and FastICA, especially in the thalamus and cerebellum (cluster C2 and C3 in Fig. 8 and Table 1 and Table 2). Although ODL seemed to detect the biggest clusters, the task-related network that was extracted by ODL covered not only motor-related areas, but also some motor-unrelated areas, such as anterior and posterior cingulate cortex (ACC and PCC), ventromedial prefrontal cortex (vMPFC). Although the correlation coefficients between the time course and the reference function are very close for the six methods, the correlation coefficient of 2SGICA is slightly higher than that of the other methods (Fig. 7D). 4.2.2. Resting state dataset The spatial maps of the DMN component estimated by FastICA, InfomaxICA, ODL, MFICA,SGICA, 2SGICA are presented in Fig. 9A-F. The local maxima coordinates of the spatial maps are reported in Table 3. It is clear that the spatial activations of the six methods largely overlapped and contains all the important nodes of DMN. Among the six methods, 2SGICA detected the biggest clusters in lIPC, lLTC, rLTC and lMTL regions, ODL detected the biggest cluster in pC/PCC, InfomaxICA detected the biggest cluster in dMPFC/vMPFC and SGICA detected the biggest cluster in rIPC and rMTL (see Table 4). Although the cluster sizes of pC/PCC, dMPFC/vMPFC, rIPC and rMTL are not the largest for 2SGICA, they are

354


Fig. 8. Spatial activation maps of real fMRI data. (A-E) Activated regions of the task-related components that were estimated by FastICA (A), InfomaxICA (B), ODL (C), MFICA (D), SGICA (E) and 2SGICA (F). (G) Activated regions detected by GLM. SC1: size of cluster 1; SC2: size of cluster 2; SC3: size of cluster 3; SC4: size of cluster 4.

the second largest among the six methods. Moreover, the rMTL was very small in the DMN that was extracted by InfomaxICA and dMPFC/vMPFC was very small in the DMN that was extracted by ODL. 5. Discussion In this study, we proposed the 2SGICA method that incorporates the sparse prior of the sources into the ICA model and seeks a matched Laplacian pdf for each source by using a two-step decomposition procedure. We demonstrated the robustness and feasibility of the method under conditions of different noise levels. The results from the simulated data and real fMRI data demonstrated that 2SGICA outperformed SGICA, ODL, MFICA, InfomaxICA, and FastICA in both spatial detection power and temporal course estimation. The validity of the statistical assumptions implicit in the ICA method is essential to the successful application of ICA to biomedical signals. For fMRI data, independence among the brain sources turns out to be a powerful and effective assumption (Calhoun and Adali, 2006; Long et al., 2009). However, Daubechies et al. (2009) claimed that two ICA algorithms, InfomaxICA and FastICA, select for sparsity rather than

independence. The follow-up work by Calhoun et al. (2013) showed that the claims made in Daubechies et al. (2009) were not supported by experimental evidence and the ICA algorithms are indeed identifying maximally independent sources (Calhoun et al., 2013). Our study focuses on using sparse modeling for fMRI analysis. In the human nervous system, sparsity is motivated by evidence of neuronal coding efficiency and sparse coding (Vinje and Gallant, 2000; Waydo et al., 2006). For fMRI data, only a small number of voxels respond to external stimuli. Thus, the sparsity of sources underlying fMRI data is a valid assumption. Laplacian prior has higher kurtosis and puts greater weights on values close to zero than the Gaussian prior, thus the Laplacian prior can induce sparser representation of the data (Girolami, 2001; Lewicki and Sejnowski, 2000). The sparse fMRI sources can thus be obtained using Laplacian priors (Kim et al., 2012). It is the rationale that we choose Laplacian in the present study, and the results also validate this rationale. In the simulated experiment, the ROC area and the temporal correlation of the task-related components that were extracted by 2SGICA in both the single-subject and multi-subject analysis were the highest at all noise levels among the six methods. The results suggest that

Table 1 Activated regions of the task-related components that were detected by FastICA, InfomaxICA, SGICA and 2SGICA. cluster (Left/Right) Region #

BA

1

4/6 3913 (−36,−19,66) 4/6 (36,−13,62) 3/4 (−36.−25,54) 3/4 (42,−34,50) 6 (−9,−7,70) 6 (3,−1,58) 40 (−42,−40,54) 40 (45,−34,50) 7 (−27,−49,62) 7 (36,−46,62) 388 (−21,2,−10) (15,8,−10) (−12,−19,2)

2

3

L precentral gyrus R precentral gyrus L postcentral gyrus R postcentral gyrus L supplementary motor area R supplementary motor area L inferior parietal lobule R inferior parietal lobule L superior parietal lobule R superior parietal lobule L putamen R putamen L thalamus R thalamus L cerebellum R cerebellum Vermis

FastICA k

972

Coordinate

InfomaxICA Tmax

k

Coordinate

24.29 5428 (−33,−28,58) 12.28 (33,−13,62) 18.81 (−36,−25,54) 7.67 (42,−34,50) 17.22 (−9,−7,70) 15.16 (3,−1,58) 11.92 (−36,−43,54) 7.05 (45,−34,50) 11.26 (−27,−52,62) 4.59 (36,−46,62) 6.87 563 (−21,2,−10) 6.01 (21,5,−6) 6.95 (−12,−19,2) (9,−7,10) (−27,−55,−26) 7.81 1328 (−24,−55,−26) (24,−49,−30) 12.89 (24,−49,−30) (6,−64,−14) 10.18 (6,−64,−14)

Note. BA: Brodmann’s area; k: cluster size; Coordinate = max coordinate (mm) in MNI space.

SGICA Tmax

k

2SGICA Coordinate

25.67 4828 (−33,−22,66) 14.09 (33,−13,62) 20.33 (−36,−25,54) 10.04 (42,−34,50) 19.37 (−9,−7,70) 17.15 (3,−1,58) 13.71 (−42,−40,54) 9.75 (45,−34,50) 14.89 (−27,−52,62) 6.69 (36,−43,62) 6.82 606 (−21,2,−10) 5.50 (21,5,−6) 8.59 (−12,−19,2) 4.32 (9,−7,10) 8.85 1339 (−24,−55,−26) 14.42 (21,−49,−30) 12.09 (6,−64,−26)

Tmax

k

Coordinate

24.18 5871 (−33,−28,58) 12.83 (33,−13,62) 20.02 (−36,−25,54) 8.94 (42,−34,50) 19.29 (−9,−7,70) 16.05 (3,−1,58) 13.76 (−42,−40,54) 8.35 (45,−34,50) 13.91 (−27,−52,62) 6.50 (33,−49,62) 7.06 930 (−21,2,−10) 5.53 (21,5,−6) 8.03 (−12,−19,2) 4.79 (9,−7,10) 8.77 1681 (−24,−52,−26) 13.46 (24,−49,−30) 11.82 (6,−64,−26)

Tmax 26.32 13.98 20.13 10.34 19.74 17.44 14.41 10.23 14.97 7.11 7.94 6.60 10.10 5.83 9.54 15.19 12.91


355

Table 2 Activated regions of the task-related components that were detected by MFICA and ODL. cluster #

1

2

3

4

(Left/Right) Region

L precentral gyrus R precentral gyrus L postcentral gyrus R postcentral gyrus L supplementary motor area R supplementary motor area L inferior parietal lobule R inferior parietal lobule L superior parietal lobule R superior parietal lobule L putamen R putamen L thalamus R thalamus L cerebellum R cerebellum Vermis R posterior cingulate cortex L anterior cingulate cortex R anterior cingulate cortex

BA

4/6 4/6 3/4 3/4 6 6 40 40 7 7

MFICA

ODL

k

Coordinate

Tmax

k

Coordinate

Tmax

4782

(−33,−28,58) (33,−13,62) (−36,−25,54) (42,−31,50) (−9,−7,70) (3,−1,58) (−36,−43,58) (45,−34,50) (−27,−52,62) (−21,2,6) (−21,2,−10) (15,5,−10) (−12,−19,2) (9,−7,10) (−24,−55,−26) (24,−49,−30) (6,−64,−18)

24.78 12.07 19.71 7.34 18.98 15.73 14.24 6.32 12.42 5.53 7.87 6.17 7.65 5.00 8.53 13.56 11.17

10980

(−27,−16,62) (30,−10,62) (−36,−25,54) (42,−34,50) (−15,−10,62) (15,−4,54) (−39,−37,50) (42,−37,50) (−30,−52,62) (21,−55,62)

26.53 11.60 16.50 14.74 17.01 11.49 18.81 19.70 19.37 11.89

(−9,−19,−2)

8.54

(−24,−49,−26) (24,−55,−26) (6,−61,−18) (9,41,6) (−12,−55,26) (21,−61,14)

12.13 20.01 15.26 10.61 5.23 5.08

1112

30 32 32

Note. BA: Brodmann’s area; k: cluster size; Coordinate = max coordinate (mm) in MNI space.

2SGICA has better robustness to noises, spatial detection power and temporal estimation accuracy than the other five methods. Moreover, the group activation maps show that 2SGICA has the lowest false activation than the other five methods. It has been demonstrated that the potential of ICA methods can be fully realized by applying additional constraints that are physiologically realistic (Stone et al., 2002; Suzuki et al., 2002). In this study, although SGICA and MFICA integrated the sparsity prior into the ICA model, it is still outperformed by InfomaxICA in simulated. The main possibility for this finding is that the pdfs of all of the sources use the same Laplacian prior, which is not physiologically realistic. To make the Laplacian priors closer to the true pdfs of sources, 2SGICA fitted the Laplacian functions to the pdfs of the sources that were separated from the first-step SGICA. Accordingly, our results indicate that the adaptive selection of the Laplacian prior that is matched to each source is very helpful to refine the separation of sparse sources. Although ODL showed higher spatial detection power in the simulated results than SGICA and MFICA, its spatial detection power is worse than InfomaxICA and 2SGICA and its time course estimation is the worst among the six methods. The spatial activation results of the simulated data suggest that ODL has the larger false activation than the

other five methods. The worse performance of ODL than InfomaxICA and 2SGICA may be attributed to the fact that only sparsity of sources is considered in the model of ODL. All the simulated results suggest that the combination of independence and sparsity of sources in the blind source separation model using adaptive Laplacian prior is able to improve the source estimation. Moreover, the simulated results indicated that 2SGICA had faster computation speed than InfomaxICA and MFICA. The improved performance of 2SGICA did not come at the expense of computational complexity compared to InfomaxICA and MFICA. The MFICA is a time-consuming approach because MFICA employs the EM algorithm and the E-step has to be solved iteratively at each step. For the real fMRI data of the finger tapping task, 2SGICA still displayed significantly higher spatial detection power than the other five methods (See Fig. 7). All the six methods detected the important motor-related regions. The main difference of the activation maps between the six methods is the cluster size. In contrast to InfomaxICA, FastICA, MFICA and SGICA, 2SGICA detected larger cluster size in C1, C2 and C3 that included bilateral SMA, M1, bilateral PMA, bilateral PPL, thalamus, striatum and cerebellum. All these regions have been

Fig. 9. Spatial activation maps of the default mode networks that were extracted by FastICA (A), InfomaxICA (B), ODL (C), MFICA (D), SGICA (E) and 2SGICA (F).

Abbreviation: BA: Brodmann’s area; pC/PCC: precunes/posterior cingulate cortex; lIPC: left inferior parietal cortex; rIPC: right inferior parietal cortex; dMPFC: dorsal medial prefrontal cortex; vMPFC: ventral medial prefrontal cortex; lLTC: left lateral temporal cortex; rLTC: right lateral temporal cortex; lMTL: left medial temporal lobe; rMTL: right medial temporal lobe.

(−3,−58,30) (−42,−70,46) (45,−58,30) (9,62,14) (−6,56,−6) (−54,−4,−22) (60,−7,−30) (−30,−28,−18) (30,−19,−30) 29.36 19.72 13.90 13.99 14.53 9.19 10.37 7.92 6.36 (−3,−55,30) (−42,−70,46) (45,−58,34) (0,44,22) (−6,56,−6) (−54,−4,−26) (60,−7,−30) (−27,−16,−26) (30,−19,−30) 24.46 15.30 12.01 11.64 9.89 8.08 6.69 7.12 5.82 (0,−55,30) (−42,−70,38) (45,−58,30) (3,50,14) (−6,56,−10) (−54,−4,−22) (51,−4,−22) (−30,−28,−18) (18,−7,−22) 17.17 10.97 10.64 5.71 4.90 6.29 8.43 6.75 6.14 (−3,−67,42) (−42,−73,34) (45,−55,26) (−12,56,14) (−6,59,−14) (−54,−7,−22) (45,11,−38) (−24,−28,−22) (24,−22,−14) 20.62 15.35 10.23 25.97 18.76 7.50 11.22 5.44 6.35 (−3,−55,30) (−45,−70,46) (54,−58,34) (−3,44,26) (−6,50,−2) (−60,−13,−22) (54,5,−42) (−27,−22,−22) (30,−19,−30) 26.74 17.37 12.88 12.26 12.18 10.07 10.08 7.12 6.46 30/31 39 39 10/32 10/32 21 21 35/36 35/36

Tmax

pC/PCC lIPC rIPC dMPFC vMPFC lLTC rLTC lMTL rMTL

(−3,−58,30) (−42,−70,42) (45,−58,30) (−3,47,26) (−6,56,−6) (−54,−4,−22) (60,−7,−30) (−27,−16,−26) (30,−16,−30)

2SGICA

Coordinate Tmax

SGICA MFICA ODL

Coordinate Coordinate

Tmax InfomaxICA

Coordinate

FastICA BA Region

Table 3 Activated regions of the default mode networks that were detected by FastICA, InfomaxICA, ODL, MFICA, SGICA and 2SGICA.

Tmax

Coordinate

Tmax

Coordinate

Tmax

R. Ge et al. / NeuroImage 118 (2015) 344–358 30.27 19.98 14.19 16.80 14.16 10.61 11.08 8.25 6.10

356

Table 4 Size of the nine nodes of the default mode networks that were detected by FastICA, InfomaxICA, ODL, MFICA, SGICA and 2SGICA. Network node

FastICA

InfomaxICA

ODL

MFICA

SGICA

2SGICA

pC/PCC lIPC rIPC dMPFC/vMPFC lLTC rLTC lMTL rMTL

1003 356 315 1694 421 545 102 85

994 378 261 2323 408 458 49 24

1281 273 213 174 189 359 32 29

911 295 283 1252 253 361 72 81

1068 402 372 2032 377 566 104 98

1091 403 369 2128 581 726 122 96

*Note that dMPFC and vMPFC are combined together. Abbreviation: pC/PCC: precunes/ posterior cingulate cortex; lIPC: left inferior parietal cortex; rIPC: right inferior parietal cortex; dMPFC: dorsal medial prefrontal cortex; vMPFC: ventral medial prefrontal cortex; lLTC: left lateral temporal cortex; rLTC: right lateral temporal cortex; lMTL: left medial temporal lobe; rMTL: right medial temporal lobe.

identified as critical structures for the acquisition and/or retention of motor behaviors (Doyon and Benali, 2005; Doyon et al., 2003; Lacourse et al., 2005; Zhang et al., 2011). These results indicate that 2SGICAhas superior ability in detection of true activation. It should be noted that the cluster C4 that mainly located in white matter and cerebrospinal fluid should not appear in the motor-related network. However, all the five methods except 2SGICA detected the artifact cluster C4. The real fMRI results further demonstrate that 2SGICA has the smallest false positive rate among the six methods, which is consistent with the finding of the simulation experiment. Although ODL can detect large clusters in motor-related areas, the motor-related network of ODL included both the motor-related regions and some motor-unrelated regions, such as PCC, ACC and vMPFC that belong to the critical nodes of the DMN (Buckner et al., 2008; Smith et al., 2009). Sparse representation methods do not require independence or none-overlapping of the brain patterns. Thus, the motor-related network uncovered by ODL overlapped with DMN to some extent. As for resting-state fMRI data, 2SGICA detected the largest clusters in four DMN nodes and the second largest cluster in the other four DMN nodes among the six methods. The results further demonstrated that 2SGICA showed superior spatial detection power. Although InfomaxICA detected more activation in the MPFC and ODL detected more activation in the pC/PCC than 2SGICA, both methods detected very small activated clusters in the bilateral MTL that is an important node within DMN (Buckner et al., 2008; Fransson and Marrelec, 2008). Moreover, ODL detected much fewer activation in the dMPFC/vMPFC compared to the other five methods. Consistent with the results obtained by the simulation, real finger tapping task and resting-state fMRI data further demonstrate that 2SGICA tends to detect more true activation and less false activation than the other five methods. It should be noted that the improvements of 2SGICA are more obvious in the recovery of the spatial maps than in recovery of the time course for the simulated and real fMRI data although 2SGICA outperforms other methods in both the spatial maps and the time-courses estimation. For 2SGICA, it is assumed that the independent sources underlying the observed data are sparse and have Laplacian distributions. In the context of fMRI data, spatial 2SGICA is used to identify the components that are spatially independent. It is the sparse priors of spatial sources rather than the temporal sources of fMRI data that are introduced into the ICA model. Accordingly, more improvements of 2SGICA were seen in spatial maps than in time-courses estimation. Unlike the FastICA and InfomaxICA algorithms, which cannot determine the unique results in different decompositions (McKeown et al., 1998), different 2SGICA decompositions of the same data can produce exactly the same separation results. 2SGICA’s stability is due to the ATGP method, which is used in the generation of initial value. For FastICA and InfomaxICA, the initial values of the unmixing matrix W were randomly generated, which may result in two major issues. One is that the final components are generally different if two different


sets of random initial values are used. Furthermore, the orders of components produced by ICA are also different when different initial values are used. These issues are particularly severe when the data space has been suppressed by data dimensionality reduction (Ouyang et al., 2008). Several methods that perform repeated ICA decompositions were proposed to obtain more stable results (Himberg et al., 2004; Yang et al., 2008); however, repeated decompositions are timeconsuming. In the present study, the 2SGICA method generates the initial values of the mixing matrix A using ATGP, which repeatedly uses an orthogonal subspace projector to generate a suitable set of initial values to replace the random initial values. Consequently, the results acquired by 2SGICA are consistent no matter how many times 2SGICA is performed and the stability of 2SGICA was largely improved by using ATGP. In the natural gradient-based algorithm employed by 2SGICA, the learning step size plays a key role in the convergence of the algorithm. The small step size can result in the slow convergence of the algorithm. Although a large step size leads to rapid convergence, the steady-state error of the algorithm may also be large. In this study, 2SGICA and SGICA employed an adaptive step size selection procedure to determine an optimal step size adaptively based on the time-varying environment. We introduced an error-estimation that is the difference of the mixing matrix between two adjacent iterations based on the learning process. When the error estimation was large, the step size was increased to accelerate the convergence; however, when the error estimation was small, the step size was decreased to approach the optimum and avoid the bounce around the optimum. Therefore, the step size selection procedure is able to adaptively determine the step size according to the estimation of the mixing matrix in each iterative step, which can guarantee a rapid convergence speed and good stability of the algorithm. For 2SGICA, there are two additional parameters β and λ0 that needed to be specified. The parameter β determined the similarity between the Laplacian function and Cosh function in Eq. (6), and λ0 determined the initial value of the step size. The results of our simulation suggest that β and λ0 had a very slight impact on the performance of 2SGICA for βN35000 and 0.07 ≤ λ0 ≤ 0.45. Although a cosh function with large β approximates the Laplacian function, an extremely largeβ may lead to the computation overflow of Matlab. In this study, we chose 50000 as the optimal value of β and 0.15 as the optimal value of λ0 according to the ROC results and temporal correlations from the simulation. Good results can be obtained from both the simulated and real fMRI data using the current setting of β and λ0 for 2SGICA. It should be noted the parameters β ∈ [10000, 55000] and λ0 ∈ [0.01, 0.45] were tested by 200 simulated datasets with different CNR (20 simulated subjects × 10 CNR levels). Convergence is achieved in every case without exception. Moreover, both the finger tapping and resting-state real fMRI data also achieved the convergence for the 2SGICA algorithm using the optimal parameter β and λ0. Accordingly, the results indicated that the 2SGICA algorithm has good convergence and stability. 6. Conclusion In summary, we demonstrated the feasibility and robustness of the 2SGICA method, which incorporates the sparse prior into the ICA model using a two-step separation procedure. The performances of 2SGICA, SGICA, FastICA, InfomaxICA, ODL and MFICA were compared using both simulated and real fMRI data. The results demonstrated that 2SGICA displays better robustness, spatial detection power and time course estimation than the other methods. The improved source separation of 2SGICA suggests that 2SGICA has the potential to be a useful addition to the existing ICA techniques for fMRI data analysis. Acknowledgment This work is supported by Key Program of National Natural Science Foundation of China (91320201); the National Natural Science

357

Foundation of China (61271111, 61473044); the Funds for International Cooperation and Exchange of the National Natural Science Foundation of China (61210001) and the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry.

References Abolghasemi, V., Ferdowsi, S., Sanei, S., 2015. Fast and incoherent dictionary learning algorithms with application to fMRI. SIViP 9, 147–158. Allassonniere, S., Younes, L., 2012. A stochastic algorithm for probabilistic independent component analysis. Ann. Appl. Stat. 6, 125–160. Aragri, A., Scarabino, T., Seifritz, E., Comani, S., Cirillo, S., Tedeschi, G., Esposito, F., Di Salle, F., 2006. How does spatial extent of fMRI datasets affect independent component analysis decomposition? Hum. Brain Mapp. 27, 736–746. Attias, H., 1999. Independent factor analysis. Neural Comput. 11, 803–851. Attwell, D., Laughlin, S.B., 2001. An energy budget for signaling in the grey matter of the brain. J. Cereb. Blood Flow Metab. 21, 1133–1145. Babaie-Zadeh, M., Jutten, C., Mansour, A., 2006. Sparse ICA via cluster-wise PCA. Neurocomputing 69, 1458–1466. Beckmann, C.F., Smith, S.M., 2005. Tensorial extensions of independent component analysis for multisubject FMRI analysis. NeuroImage 25, 294–311. Beloozerova, I.N., Sirota, M.G., Swadlow, H.A., 2003. Activity of different classes of neurons of the motor cortex during locomotion. J. Neurosci. 23, 1087–1097. Bermond, O., Cardoso, J.-F., 1999. Approximate likelihood for noisy mixtures. Proc. ICA. Citeseer, pp. 325–330. Bronstein, A.M., Bronstein, M.M., Zibulevsky, M., Zeevi, Y.Y., 2005. Sparse ICA for blind separation of transmitted and reflected images. Int. J. Imaging Syst. Technol. 15, 84–91. Buckner, R.L., Andrews‐Hanna, J.R., Schacter, D.L., 2008. The brain's default network. Ann. N. Y. Acad. Sci. 1124, 1–38. Calhoun, V.D., Adali, T., 2006. Unmixing fMRI with independent component analysis. IEEE Eng. Med. Biol. Mag. 25, 79–90. Calhoun, V.D., Adali, T., 2012. Multisubject independent component analysis of fMRI: a decade of intrinsic networks, default mode, and neurodiagnostic discovery. IEEE Rev. Biomed. Eng. 5, 60–73. Calhoun, V., Adali, T., Pearlson, G., Pekar, J., 2001. A method for making group inferences from functional MRI data using independent component analysis. Hum. Brain Mapp. 14, 140–151. Calhoun, V.D., Potluru, V.K., Phlypo, R., Silva, R.F., Pearlmutter, B.A., Caprihan, A., Plis, S.M., Adalı, T., 2013. Independent component analysis for brain fMRI does indeed select for maximal independence. PLoS One 8, e73309. Chang, C.-I., Xiong, W., Chen, H.-M., Chai, J.-W., 2011. Maximum orthogonal subspace projection approach to estimating the number of spectral signal sources in hyperspectral imagery. IEEE J. Sel. Top. Sign. Proces. 5, 504–520. Chen, S.S., Donoho, D.L., Saunders, M.A., 1998. Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20, 33–61. Chumbley, J., Worsley, K., Flandin, G., Friston, K., 2010. Topological FDR for neuroimaging. NeuroImage 49, 3057–3064. Daubechies, I., Roussos, E., Takerkart, S., Benharrosh, M., Golden, C., D'ardenne, K., Richter, W., Cohen, J., Haxby, J., 2009. Independent component analysis for brain fMRI does not select for independence. Proc. Natl. Acad. Sci. 106, 10415–10422. Davies, M., Mitianoudis, N., 2004. Simple mixture model for sparse overcomplete ICA. IEE Proc. Vis. Image Signal Proc. 151, 35–43. Dennis, J., 1977. Non-linear least squares and equations. In: Jacobs, D. (Ed.), Nonlinear least squares, state of the art in numerical analysis. Academic Press, New York, pp. 269–312. Doyon, J., Benali, H., 2005. Reorganization and plasticity in the adult brain during learning of motor skills. Curr. Opin. Neurobiol. 15, 161–167. Doyon, J., Penhune, V., Ungerleider, L.G., 2003. Distinct contribution of the cortico-striatal and cortico-cerebellar systems to motor skill learning. Neuropsychologia 41, 252–262. Elad, M., Aharon, M., 2006. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Trans. Image Process. 15, 3736–3745. Erhardt, E.B., Allen, E.A., Wei, Y., Eichele, T., Calhoun, V.D., 2012. SimTB, a simulation toolbox for fMRI data under a model of spatiotemporal separability. NeuroImage 59, 4160–4167. Esposito, F., Formisano, E., Seifritz, E., Goebel, R., Morrone, R., Tedeschi, G., Di Salle, F., 2002. Spatial independent component analysis of functional MRI time series: To what extent do results depend on the algorithm used? Hum. Brain Mapp. 16, 146–157. Esposito, F., Scarabino, T., Hyvarinen, A., Himberg, J., Formisano, E., Comani, S., Tedeschi, G., Goebel, R., Seifritz, E., Di Salle, F., 2005. Independent component analysis of fMRI group studies by self-organizing clustering. NeuroImage 25, 193–205. Fransson, P., Marrelec, G., 2008. The precuneus/posterior cingulate cortex plays a pivotal role in the default mode network: Evidence from a partial correlation network analysis. NeuroImage 42, 1178–1184. Friston, K.J., Holmes, A.P., Worsley, K.J., Poline, J.P., Frith, C.D., Frackowiak, R.S., 1994. Statistical parametric maps in functional imaging: a general linear approach. Hum. Brain Mapp. 2, 189–210. Ge, R., Fu, Y., Wang, D., Yao, L., Long, Z., 2014. Age-related alterations of brain network underlying the retrieval of emotional autobiographical memories: An fMRI study using independent component analysis. Front. Hum. Neurosci. 8, 629-629. Ge, R., Zhang, H., Yao, L., Long, Z., 2015. Motor imagery learning induced changes in functional connectivity of the default mode network. IEEE Trans. Neural Syst. Rehabil. Eng. 23, 138–148.

358


Georgiev, P., Theis, F., Cichocki, A., Bakardjian, H., 2007. Sparse component analysis: a new tool for data mining. Data mining in biomedicine. Springer, US, pp. 91–116. Girolami, M., 2001. A variational method for learning sparse and overcomplete representations. Neural Comput. 13, 2517–2532. Greicius, M.D., Srivastava, G., Reiss, A.L., Menon, V., 2004. Default-mode network activity distinguishes Alzheimer's disease from healthy aging: evidence from functional MRI. Proc. Natl. Acad. Sci. U. S. A. 101, 4637–4642. Hanakawa, T., Dimyan, M.A., Hallett, M., 2008. Motor planning, imagery, and execution in the distributed motor network: a time-course study with functional MRI. Cereb. Cortex 18, 2775–2788. He, Z., Xie, S., Zhang, L., Cichocki, A., 2008. A note on Lewicki-Sejnowski gradient for learning overcomplete representations. Neural Comput. 20, 636–643. Himberg, J., Hyvärinen, A., Esposito, F., 2004. Validating the independent components of neuroimaging time series via clustering and visualization. NeuroImage 22, 1214–1222. Højen-Sørensen, P.A., Winther, O., Hansen, L.K., 2002. Mean-field approaches to independent component analysis. Neural Comput. 14, 889–918. Hong, B., Pearlson, G.D., Calhoun, V.D., 2005. Source density driven independent component analysis approach for fMRI data. Hum. Brain Mapp. 25, 297–307. Hyvärinen, A., Oja, E., 2000. Independent component analysis: algorithms and applications. Neural Netw. 13, 411–430. Kim, Y.-H., Kim, J., Lee, J.-H., 2012. Iterative approach of dual regression with a sparse prior enhances the performance of independent component analysis for group functional magnetic resonance imaging (fMRI) data. NeuroImage 63, 1864–1889. Kotz, S., Kozubowski, T., Podgorski, K., 2001. The Laplace distribution and generalizations: a revisit with applications to communications, exonomics, engineering, and finance. Springer Science & Business Media. Kreutz-Delgado, K., Murray, J.F., Rao, B.D., Engan, K., Lee, T.-W., Sejnowski, T.J., 2003. Dictionary learning algorithms for sparse representation. Neural Comput. 15, 349–396. Lacourse, M.G., Orr, E.L., Cramer, S.C., Cohen, M.J., 2005. Brain activation during execution and motor imagery of novel and skilled sequential hand movements. NeuroImage 27, 505–519. Lee, K., Tak, S., Ye, J.C., 2011. A data-driven sparse GLM for fMRI analysis using sparse dictionary learning with MDL criterion. IEEE Trans. Med. Imaging 30, 1076–1089. Lennie, P., 2003. The cost of cortical computation. Curr. Biol. 13, 493–497. Lewicki, M.S., Sejnowski, T.J., 2000. Learning overcomplete representations. Neural Comput. 12, 337–365. Li, Y.O., Adalı, T., Calhoun, V.D., 2007. Estimating the number of independent components for functional magnetic resonance imaging data. Hum. Brain Mapp. 28, 1251–1266. Li, Y., Namburi, P., Yu, Z., Guan, C., Feng, J., Gu, Z., 2009. Voxel selection in fMRI data analysis based on sparse representation. IEEE Trans. Biomed. Eng. 56, 2439–2451. Li, Y., Long, J., He, L., Lu, H., Gu, Z., Sun, P., 2012. A sparse representation-based algorithm for pattern localization in brain imaging data analysis. PLoS One 7 e50332. Long, Z., Chen, K., Wu, X., Reiman, E., Peng, D., Yao, L., 2009. Improved application of independent component analysis to functional magnetic resonance imaging study via linear projection techniques. Hum. Brain Mapp. 30, 417–431. Lu, Y., Jiang, T., Zang, Y., 2004. A split–merge-based region-growing method for fMRI activation detection. Hum. Brain Mapp. 22, 271–279. Lv, J., Jiang, X., Li, X., Zhu, D., Chen, H., Zhang, T., Zhang, S., Hu, X., Han, J., Huang, H., 2015. Sparse representation of whole-brain fMRI signals for identification of functional networks. Med. Image Anal. 20, 112–134. Ma, L., Wang, B., Chen, X., Xiong, J., 2007. Detecting functional connectivity in the resting brain: a comparison between ICA and CCA. Magn. Reson. Imaging 25, 47–56. Mairal, J., Bach, F., Ponce, J., Sapiro, G., 2010. Online learning for matrix factorization and sparse coding. J. Mach. Learn. Res. 11, 19–60. Maldjian, J.A., Laurienti, P.J., Kraft, R.A., Burdette, J.H., 2003. An automated method for neuroanatomic and cytoarchitectonic atlas-based interrogation of fMRI data sets. NeuroImage 19, 1233–1239. McKeown, M.J., Makeig, S., Brown, G.G., Jung, T.-P., Kindermann, S.S., Bell, A.J., Sejnowski, T.J., 1998. Analysis of fMRI data by blind separation into independent spatial components. Hum. Brain Mapp. 6, 160–188.

Olshausen, B.A., 1996. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381, 607–609. Olshausen, B.A., Field, D.J., 2004. Sparse coding of sensory inputs. Curr. Opin. Neurobiol. 14, 481–487. Ouyang, Y.-C., Chen, H.-M., Chai, J.-W., Chen, C.-C., Poon, S.-K., Yang, C.-W., Lee, S.-K., Chang, C.-I., 2008. Band expansion-based over-complete independent component analysis for multispectral processing of magnetic resonance images. IEEE Trans. Biomed. Eng. 55, 1666–1677. Perez-Orive, J., Mazor, O., Turner, G.C., Cassenaer, S., Wilson, R.I., Laurent, G., 2002. Oscillations and sparsening of odor representations in the mushroom body. Science 297, 359–365. Ramezani, M., Marble, K., Trang, H., Johnsrude, I., Abolmaesumi, P., 2015. Joint sparse representation of brain activity patterns in multi-task fMRI data. IEEE Trans. Med. Imaging 34, 2–12. Silverman, B.W., 1986. Density estimation for statistics and data analysis. CRC press. Skudlarski, P., Constable, R.T., Gore, J.C., 1999. ROC analysis of statistical methods used in functional MRI: individual subjects. NeuroImage 9, 311–329. Smith, S.M., Fox, P.T., Miller, K.L., Glahn, D.C., Fox, P.M., Mackay, C.E., Filippini, N., Watkins, K.E., Toro, R., Laird, A.R., 2009. Correspondence of the brain's functional architecture during activation and rest. Proc. Natl. Acad. Sci. 106, 13040–13045. Stone, J., Porrill, J., Porter, N., Wilkinson, I., 2002. Spatiotemporal independent component analysis of event-related fMRI data using skewed probability density functions. NeuroImage 15, 407–421. Suzuki, K., Kiryu, T., Nakada, T., 2002. Fast and precise independent component analysis for high field fMRI time series tailored using prior information on spatiotemporal structure. Hum. Brain Mapp. 15, 54–66. Svensén, M., Kruggel, F., Benali, H., 2002. ICA of fMRI group study data. NeuroImage 16, 551–563. Tang, X., Zhang, X., Ye, J., 2014. Adaptive step-size natural gradient ICA algorithm with weighted orthogonalization. Circ. Syst. Signal Process. 33, 211–221. Todros, K., Tabrikian, J., 2007. Blind separation of independent sources using Gaussian mixture model. IEEE Trans. Signal Process. 55, 3645–3658. Tzourio-Mazoyer, N., Landeau, B., Papathanassiou, D., Crivello, F., Etard, O., Delcroix, N., Mazoyer, B., Joliot, M., 2002. Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. NeuroImage 15, 273–289. Vinje, W.E., Gallant, J.L., 2000. Sparse coding and decorrelation in primary visual cortex during natural vision. Science 287, 1273–1276. Vlassis, N., Motomura, Y., 2001. Efficient source adaptivity in independent component analysis. IEEE Trans. Neural Networks 12, 559–566. von Hoff, T.P., Lindgren, A.G., 2002. Adaptive step-size control in blind source separation. Neurocomputing 49, 119–138. Wang, Z., 2009. A hybrid SVM–GLM approach for fMRI data analysis. NeuroImage 46, 608–615. Waydo, S., Kraskov, A., Quiroga, R.Q., Fried, I., Koch, C., 2006. Sparse representation in the human medial temporal lobe. J. Neurosci. 26, 10232–10234. Welling, M., Weber, M., 2001. A constrained EM algorithm for independent component analysis. Neural Comput. 13, 677–689. Wixted, J.T., Squire, L.R., Jang, Y., Papesh, M.H., Goldinger, S.D., Kuhn, J.R., Smith, K.A., Treiman, D.M., Steinmetz, P.N., 2014. Sparse and distributed coding of episodic memory in neurons of the human hippocampus. Proc. Natl. Acad. Sci. 111, 9621–9626. Wright, J., Yang, A.Y., Ganesh, A., Sastry, S.S., Ma, Y., 2009. Robust face recognition via sparse representation. IEEE Trans. Pattern Anal. Mach. Intell. 31, 210–227. Wright, J., Ma, Y., Mairal, J., Sapiro, G., Huang, T.S., Yan, S., 2010. Sparse representation for computer vision and pattern recognition. Proc. IEEE 98, 1031–1044. Yang, Z., LaConte, S., Weng, X., Hu, X., 2008. Ranking and averaging independent component analysis by reproducibility (RAICAR). Hum. Brain Mapp. 29, 711–725. Zhang, H., Xu, L., Wang, S., Xie, B., Guo, J., Long, Z., Yao, L., 2011. Behavioral improvements and brain functional alterations by motor imagery training. Brain Res. 1407, 38–46.

Parallel group independent component analysis for massive fMRI data sets.

A Parcellation Based Nonparametric Algorithm for Independent Component Analysis with Application to fMRI Data.

Robust data driven model order estimation for independent component analysis of FMRI data with low contrast to noise.

Independent component analysis of instantaneous power-based fMRI.

A Semiparametric Approach to Source Separation using Independent Component Analysis.

Validation of Shared and Specific Independent Component Analysis (SSICA) for Between-Group Comparisons in fMRI.

Spatiotemporal independent component analysis combine general linear model applied to fMRI for eliminating neural noise.

Recursive approach of EEG-segment-based principal component analysis substantially reduces cryogenic pump artifacts in simultaneous EEG-fMRI data.

Independent Vector Analysis for Gradient Artifact Removal in Concurrent EEG-fMRI Data.

Bayesian Models for fMRI Data Analysis.

A Local Learning Rule for Independent Component Analysis.

How to stop or change a motor response: Laplacian and independent component analysis approach.

Analysis of Residual Dependencies of Independent Components Extracted from fMRI Data.

A novel approach for fMRI data analysis based on the combination of sparse approximation and affinity propagation clustering.

A fuzzy based feature selection from independent component subspace for machine learning classification of microarray data.

Temporal dynamics of sensorimotor integration in speech perception and production: independent component analysis of EEG data.

Automatic denoising of functional MRI data: combining independent component analysis and hierarchical fusion of classifiers.

ECG-based gating in ultra high field cardiovascular magnetic resonance using an independent component analysis approach.

A Bayesian inverse solution using independent component analysis.

Group-level spatial independent component analysis of Fourier envelopes of resting-state MEG data.

Case for fMRI data repositories.

fMRI using an ICA approach.

Analysis of whole-brain resting-state FMRI data using hierarchical clustering approach.

Differential functional brain network connectivity during visceral interoception as revealed by independent component analysis of fMRI TIME-series.