WASICA: An effective wavelet-shrinkage based ICA model for brain fMRI data analysis.

Accepted Manuscript Title: WASICA: An effective wavelet-shrinkage based ICA model for brain fMRI data analysis Author: Nizhuan Wang Weiming Zeng Yingchao Shi Tianlong Ren Yanshan Jing Jun Yin Jiajun Yang PII: DOI: Reference:

S0165-0270(15)00094-1 http://dx.doi.org/doi:10.1016/j.jneumeth.2015.03.011 NSM 7178

To appear in:

Journal of Neuroscience Methods

Received date: Revised date: Accepted date:

16-2-2014 8-3-2015 9-3-2015

Please cite this article as: Wang N, Zeng W, Shi Y, Ren T, Jing Y, Yin J, Yang J, WASICA: An effective wavelet-shrinkage based ICA model for brain fMRI data analysis, Journal of Neuroscience Methods (2015), http://dx.doi.org/10.1016/j.jneumeth.2015.03.011 This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Title Page

Title:

ip t

WASICA: An effective wavelet-shrinkage based ICA model for brain fMRI data analysis

Author names and affiliations:

201306, China;

2

us

Lab of Digital Image and Intelligent Computation, Shanghai Maritime University, Shanghai, Department of Neurology, Shanghai Jiao Tong University Affiliated Sixth

People’s Hospital, Shanghai, 200233, China.

M

ponding author:

an

1

cr

Nizhuan Wang1, Weiming Zeng1, Yingchao Shi1, Tianlong Ren1, Yanshan Jing1, Jun Yin1, Jiajun Yang2

ce pt

Type of the article:

ed

Weiming Zeng, College of Information Engineering, Shanghai Maritime University, Shanghai, 201306, CHINA; Tel: 86-021-38282873; Fax: 86-021-38282800; E-mail address:[email protected]

Ac

Full-length research paper

Page 1 of 78

Highlights

1.

Wavelet shrinkage leads to sparse approximation of the sources and suppresses

ip t

noise; Sparse approximation coefficients set satisfies the super-Gaussian distribution;

3.

WP reconstruction generates an effective approximation with nuisance exclusion;

4.

WASICA exhibits the superiority in signal recovery and temporal dynamics

us

cr

2.

extraction;

an

WASICA could recover the super-Gaussian sources or some Gaussian-like

ce pt

ed

M

sources;

Ac

5.

Page 2 of 78

WASICA: An effective wavelet-shrinkage based ICA model for

cr

ip t

brain fMRI data analysis

1

an

Jiajun Yang2*

Lab of Digital Image and Intelligent Computation, Shanghai Maritime University, Shanghai,

M

201306, China.

Department of Neurology, Shanghai Jiao Tong University Affiliated Sixth People’s Hospital,

ce pt

Shanghai, 200233, China.

ed

2

us

Nizhuan Wang1, Weiming Zeng1*, Yingchao Shi1, Tianlong Ren1, Yanshan Jing1, Jun Yin1,

 Corresponding author at: College of Information Engineering, Shanghai Maritime University, 201306,

China;

Tel:

86-021-38282873;

Fax:

86-021-38282800;

E-mail

Ac

Shanghai,

address:[email protected] (W. Zeng). Department of Neurology, Shanghai Jiao Tong University Affiliated Sixth People’s Hospital, Shanghai, 200233, China; E-mail address: [email protected] (J. Yang).

1

Page 3 of 78

Abstract Background: Researches declared that the super-Gaussian property contributed to the success of some spatial independent component analysis (ICA) algorithms in brain

ip t

fMRI source separation (e.g., Infomax and FastICA), which implied that sparse

cr

approximation transforming the sources (super-Gaussian or Gaussian-like) with

stronger super-Gaussian feature would possibly improve the separation performance

us

of these algorithms.

an

New method: This paper presented a novel wavelet-shrinkage based ICA (WASICA) model, an extension of our previous SACICA, for single-subject analysis. In contrast,

M

two main aspects had been effectively enhanced: (1) sparse approximation

ed

coefficients set formation, made up of two sub-procedures: the wavelet-shrinkage of wavelet packet (WP) tree nodes, and the automatic nodes selection and integration

ce pt

based on the relative WP energy; (2) ICA-based decomposition and reconstruction, composed of temporal dynamics extraction using ICA, WP reconstruction based on the sparse approximation coefficients set and least-square-based functional networks

Ac

reconstruction.

Results: The wavelet-shrinkage and the automatic nodes selection and integration simultaneously transformed both the mixtures and underlying sources into effective sparse approximation coefficients sets, exhibiting stronger super-Gaussian distribution; WP projected-back approximation with nuisance-exclusion contributed to networks reconstruction. Comparison with existing methods: Simulation 1 revealed WASICA successfully 2

Page 4 of 78

recovered super-Gaussian and some Gaussian-like sources. Simulation 2 and hybrid data experiments showed that WASICA with good temporal performance had stronger source recovery ability and signal detection sensitivity spatially than FastICA,

ip t

Infomax and SACICA did; the higher intra-consistency in task-related experiments denoted WASICA occupied stronger spatial robustness against smooth kernels.

cr

Conclusions: WASICA was a promising brain signal separation model with charming

us

spatial-temporal performance.

Keywords: fMRI; wavelet packet decomposition; sparse approximation; wavelet

Ac

ce pt

ed

M

an

shrinkage; ICA

3

Page 5 of 78

1. Introduction 1.1. ICA and fMRI source separation Recently, the independent component analysis (ICA, spatial/temporal ICA) has

ip t

been widely used to analyze the blood oxygen level dependent (BOLD) functional

cr

magnetic resonance imaging (fMRI) data, aiming at detecting functional connectivity

among discrete cortical brain regions at single-subject level or multi-subject level, due

us

to its fully data-driven feature without the prior neuroscience knowledge or

an

experience (McKeown et al., 1998; Biswal et al., 1999; Beckmann & Smith, 2004, 2005; Calhoun et al., 2001a, 2001b; Schöpf et al., 2010; Wang et al., 2012a). In

M

contrast to the univariate general linear model (GLM), which is performed on a

ed

voxel-by-voxel basis (Bagarinao et al., 2003), the ICA method is intrinsically a multivariate approach, which could extract multiple brain networks that are engaged

ce pt

in various elements of cognitive processing. This capability makes ICA an increasingly attractive exploratory tool to study functional networks at rest (Beckmann et al., 2005; Damoiseaux et al., 2006; Schöpf et al., 2010; Wang et al.,

Ac

2012a), during a cognitive task (Calhoun et al., 2008), or under the mental disorder condition, for example, autism spectrum disorder (Ambrosino et al., 2014) and dementia (Rytty et al., 2013). 1.2. Combining wavelets with ICA Very recently, Calhoun et al. (2013) has fully demonstrated that the unimodal super-Gaussian distributions of the sources’ probability density functions (pdf) 4

Page 6 of 78

contribute to the success of FastICA (Hyvärinen et al., 1997) and Infomax (Bell & Sejnowski, 1995) with standard parameter settings. This verification by Calhoun and his colleagues implies that if the pdfs of the underlying sources or the transformed

ip t

sources occupy the stronger unimodal super-Gaussian distributions, the possibly better source separation performance will be achieved by ICA with regular parameters

cr

setting, e.g., Infomax with sigmoid nonlinearity assuming a unimodal super-Gaussian

us

source. Recently, some studies have suggested that the separation performance of FastICA can be further improved by sparse approximation of the fMRI mixtures, so

an

that the coefficients of the transformed data (and underlying sources) exhibit stronger

M

super-Gaussian (Wang et al., 2012b, 2013). Fortunately, according to the former researches, the sparse approximation process can be fast achieved by the wavelet or

ed

wavelet packet transformation, exploiting the sparseness of the wavelet coefficients (Kisilev et al., 2000, 2003; Bronstein et al., 2005; Khullar et al., 2011a, 2011b; Wang

ce pt

et al., 2013). For example, Khullar et al. (2011a) recently have presented a wavelet-domain based framework (w-ICA) for de-noising and source separation on

Ac

the fMRI mixtures at single-subject level, which utilized the multidirectional de-noising scheme with three dimensional stationary discrete wavelet transform. In comparison to the Gaussian smoothing de-noising, this method has demonstrated significant improvement and preserved more accurate signal shape. As a development of w-ICA, Khullar et al. (2011b) have extended this framework to de-noise the multi-group (healthy/patient) fMRI data, gaining higher specificity and increasing shape accuracy compared with Gaussian smoothing. The authors have pointed that the 5

Page 7 of 78

de-correlated nature of the denoised wavelet coefficients was an aid to the more accurate source estimation in w-ICA framework in the wavelet domain. However, in our opinion about this point, the sparseness of the denoised wavelet coefficients is

ip t

also probably beneficial to the separation, owing to that the threshed wavelet coefficients are expected to have the stronger super-Gaussian distribution. Further,

cr

Wang et al. (2012b, 2013) have reported that the sparse approximation coefficients set

us

formed based on the Lp norm of the wavelet packet tree nodes was useful to blind source separation of fMRI data using ICA. Besides the fMRI signal separation field,

an

particularly, Kisilev et al. (2000, 2003) have demonstrated that adopting the selected

M

wavelet coefficients with higher sparsity yielded better separation performance while adopting all wavelet coefficients yielded intermediate sparsity and separation

ed

performance. Bronstein et al. (2005) also have reported that the sparse ICA (SPICA) achieved better separation performance in separating the transmitted and reflected

ce pt

images blindly in comparison to the standard ICA. The above studies imply that the wavelet-based processing combining with a source separation technique such as ICA

Ac

could be possibly promising to the functional connectivity detection on human fMRI data.

1.3. What do we propose? Founding

on

the

researchers’

former

studies,

in

this

research,

a

wavelet-shrinkage based ICA model (WASICA) with more relaxed sources’ pdfs assumptions is proposed, aiming at separating the super-Gaussian sources and some Gaussian-like sources, which is a further development of our previous SACICA 6

Page 8 of 78

model (Wang et al., 2013). In contrast to SACICA, the proposed model is with main validity and novelty in two procedures: the sparse approximation coefficients set formation procedure and ICA-based decomposition and reconstruction procedure. The

ip t

sparse approximation coefficients set formation takes advantage of the following two operations: the wavelet shrinkage of the wavelet packet tree nodes and an automatic

cr

nodes selection and integration based on the relative wavelet packet energy (RWPE),

us

transforming the wavelet packet coefficients into the sparse approximation coefficients set with obviously stronger super-Gaussian distribution than the original

an

spatial domain fMRI data. Meanwhile, the formed sparse approximation coefficients

M

set forces the underlying sources in the mixtures to be sparser or stronger super-Gaussian. Thus, as the aforementioned research (Calhoun et al., 2013) implied,

ed

the sparse approximation coefficients set about the fMRI mixtures is expected to enhance the estimation accuracy of the sources (e.g., super-Gaussian and

ce pt

Gaussian-like sources) as well as the mixing matrix when combining with ICA method. Also, the unmixing operation in ICA-based decomposition and reconstruction

Ac

procedure is based on the denoised reconstruction approximation data, generated by inverse wavelet packet transformation of the sparse approximation coefficients set, which is beneficial to generate more accurate functional networks. The remainder of this paper is organized as follows: the theory and framework of WASICA will be firstly presented, and followed by the simulation and hybrid data experimental tests, which are designed to evaluate the signal recovery ability, signal detection sensitivity and temporal dynamics extraction performance concerning 7

Page 9 of 78

WASICA in comparison to FastICA, Infomax and SACICA. Meanwhile, the real fMRI datasets such as the task-related data and resting-state data are also applied to verify the effectiveness and some merits of WASICA. Finally, results and analysis

advantages and limitations of this new data analysis model.

cr

2. Theory and Method

ip t

will be presented together with interpretations, discussions and conclusions related to

us

2.1. Sparse approximation theory

an

The functional connectivity detection of the human brain activities can be formulated as a kind of blind source separation (BSS) problem, assuming that there

M

exists the functional integration of activity in multiple macroscopic loci (McKeown et

ed

al., 1998; Biswal et al., 1999; Beckmann & Smith, 2004). This BSS problem aims at retrieving the underlying sources, namely the functional networks, denoted in vector

ce pt

 s1  s  2 notation as S    with the size Q  M , assuming the observed mixtures, namely      sQ 

Ac

 x1  x  the fMRI mixing data, X   2  with the size P  M , can be written as:      xP 

X  AS ,

(1)

where A is a mixing matrix (or temporal dynamics) with the size P  Q from RQ to R P , the row vectors si and xi are both of size 1 M (Comon & Jutten, 2010). P is the number of timepoints, Q is the number of hidden sources, and M is the

number of voxels. Next, it is assumed that each source can be (approximately) 8

Page 10 of 78

represented by a linear combination of a few row atoms  m with the size 1 M , of be unit energy. Specifically, each (unknown) source signal can be written as: K

si   ci (m)m

(2)

m 1

where only few entries ci (m) of row vector cs are significant, most of them being

ip t

i

cr

negligible. Informally, a set of coefficients ci (m) are considered sparse if most of the coefficients are zero or very small and only a few of them are significant. In statistical

are said to have a sparse or

us

(or probabilistic) terms, coefficients ci (m)

an

super-Gaussian distribution if their histogram (or probability density) has a strong peak at the origin with heavy tails. Meanwhile, as the scholars Calhoun et al. (2013)

M

have pointed out, the appropriate choice of origin for the coordinate system in which one foresees to evaluate the sparseness of the signal is quite essential and should be

ed

ce pt

noted. These atoms  m

1    are called a complete dictionary   2  with the size      K 

K  M if these atoms can be utilized to reconstruct any signal si which lives in a

Ac

spatial domain of dimension R M . Thus, the sparse model of source si can be written in matrix form:

where csi   ci (1), ci (2),

si  csi  , ci ( K -1), ci ( K ) 

is a row vector of

(3) K

coefficients.

Furthermore, the sparse representation model of the Q sources S can be formulated as follow: S  Cs  ,

(4)

9

Page 11 of 78

where the C s satisfies:

, Cs ( K )  , where Cs with the size Q  K .

ip t

cs1    cs  Cs   2   Cs (1), Cs (2),   csQ 

When the dictionary  is a basis for the signal space, as is the discrete orthonormal

cr

Fourier basis, or a bi-orthogonal wavelet basis (Mallat, 1998), the only transform 

us

allowing perfect reconstruction is   1 .

the dictionary  takes the following form:

an

Very similarly, a sparse approximation process of the mixtures X considering

, C X ( K )  is a matrix with dimensions P  K in

ce pt

ed

cx1    cx2    where C X     C X (1), C X (2), cxi      c   xP 

(5)

M

X  CX  ,

which each c xi is a row vector of K coefficients cx (m) . The C X is called as the i

Ac

sparse approximation coefficients of X subjecting to the dictionary  . In this study, the K is assumed equal to M , in which case equations (2)-(4) become equalities instead of approximations. Combining formula (1), (4) and (5), the resultant formula (6) is deduced as follow: CX  ACs ,

(6)

when a proper dictionary  is selected. For example, if  is an orthonormal basis, a unique representation exists for the mixtures (and the sources) and can thus be 10

Page 12 of 78

calculated simply by inverse linear transform, necessarily satisfying CX  ACs . According to formula (6), the original BSS problem of X is reduced to a separation problem of sparse C X with likely sparser Cs . Comparing formula (1) with (6), the

ip t

mixing matrix A is unchanged if the approximation CX  ACs is with sufficient accuracy by a certain sparse approximation algorithm; for example, the matching

cr

pursuit algorithm (Tropp, 2006; Tropp et al., 2006) was shown to guarantee the

us

relation CX  ACs when Cs was sufficiently sparse (Gribonval & Nielsen, 2008). Namely, the mixing matrix A could be estimated on the sparse approximation

an

coefficients C X .

M

Sparse approximation coefficients C X of the mixtures X can facilitate both the estimation of the mixing matrix A and that of the sources by blind source

ed

separation methods such as ICA with Infomax or FastICA algorithms, owing to both C X and Cs being sparser or with stronger super-Gaussian distribution, enhancing

ce pt

the separation performance of these ICA algorithms (Calhoun et al., 2013). Good sparse approximation is challenging in fMRI because the set of significant

Ac

coefficients is different for each volume. Our approach using RWPE guarantees "good quality" estimation of the sparse approximation coefficients using a common C X across volumes (see section 2.2.2.2). Meanwhile, the sparse approximation process of the mixtures could be implemented effectively by wavelet or wavelet packet transformation, which can yield sources with diverse levels of sparsity (Kisilev et al., 2000, 2003; Bronstein et al., 2005; Wang et al., 2012b, 2013). Thus, founding on the above sparse approximation theory, the promising model, namely WASICA, with 11

Page 13 of 78

regard to the sparse coefficients set C X formation and the extraction of temporal dynamics A and functional networks S will be presented in the next section. 2.2. Framework of WASICA

ip t

The framework of WASICA is presented in Fig.1 vividly, which consists of three

cr

main procedures. The wavelet packet decomposition (P1, in Fig.1) regarding to each

volume of mixtures X is firstly implemented, which yields a series of wavelet

us

packet trees. The sparse approximation coefficients set formation (P2, in Fig.1) is

an

followed immediately. This procedure mainly consists of two sub-procedures: the wavelet shrinkage of the wavelet packet tree nodes of each volume (P2.1, in Fig.1)

M

and the automatic nodes selection and integration based on the RWPE (P2.2, in Fig.1).

ed

The ICA-based decomposition and reconstruction (P3, in Fig.1) comes lastly, which consists of three sub-procedures, namely temporal dynamics extraction using ICA

ce pt

(P3.1, in Fig.1), wavelet packet reconstruction based on the sparse approximation coefficients set (P3.2, in Fig.1) and least-square-based functional network reconstruction (P3.3, in Fig.1) orderly. Next, the details of P1, P2 and P3 are

Ac

illustrated in the following 2.2.1, 2.2.2 and 2.2.3 sections, respectively. 2.2.1. Wavelet packet decomposition Sparsity means that only a small fraction of coefficients differ significantly from zero when the signals are properly represented (Kisilev et al., 2000, 2003; Wang et al., 2013), whose pdfs exhibit the stronger super-Gaussian distributions having an obvious peak at the origin with heavy tails. This property will play a great role in 12

Page 14 of 78

blind source separation using Infomax and FastICA algorithms, with regular parameters setting (Calhoun et al., 2013). Considering this, to obtain the sparse approximation coefficients set of the fMRI mixtures X , the 1D wavelet packet

ip t

transformation is firstly implemented on the volumes of the fMRI mixtures, which is similar to the SACICA model. The main reason for the use of 1D wavelet is that each

cr

volume data embedding too many zero leads to that the noise level  i about the

us

mixture xi regarding to the following wavelet shrinkage procedure (see section 2.2.2.1) could not be accurately estimated by means of the proposed interquartile

an

range measure when using 3D wavelet under the proposed WASICA framework;

the discussion of Section 5.3).

M

however, 1D wavelet works well enough for handling this condition in WASICA (see

ed

For fMRI sources having oscillatory components, wavelet packets might be appropriate to express these signals. Also, using wavelet packets allows one to

ce pt

analyze given signals not only with a scale-oriented decomposition but also on frequency sub-bands. In the following, a brief theory of wavelet packet transform will

Ac

be presented (the details for Kisilev et al., 2000; Wang et al., 2013). Particularly, the wavelet packet library consists of the triple-indexed family of functions:

 jlk (t )  2 j /2 l (2 j t  k ), j,k  , l   .

(7)

As in the case of the wavelet packet transform, j , k are the scale and shift parameters, respectively, and l is the frequency parameter, related to the number of oscillations of a particular generating function l (t) . The set of functions  jl (t) forms a

 j, l 

wavelet packet and this set of functions can be split into two parts at a 13

Page 15 of 78

coarser scale:  j 1,2l (t) and  j 1,2l +1 (t) . It follows that these two form an orthonormal basis of the subspace which spans  jl (t) . A family of wavelet packet functions on a binary tree could be formed (as shown in Fig.2). The nodes of this tree are numbered , J -1, J

( J

denoting the

ip t

j  0, 1,

by two indices: the depth of the level

, 2 j -1 at the specific

decomposition level) and the number of nodes l = 0, 1, 2,

cr

level j (Mallat, 1998; Kisilev et al., 2000). Furthermore, for the ith source signal

(or functional network) si , the decomposition coefficients csjlk could be expressed as:

(8)

,

denotes the inner product operation and the coefficients csjlk are spilt

 j, l 

sets corresponding to the nodes of the wavelet packet tree. Further,

i

M

where into

an

csjlk = si , jlk , i

us

i

ed

equation (8) can be written in matrix form as CS  S T , where T denotes the transpose operation of a matrix. Similarly, the decomposition coefficients cxjlk of xi i

ce pt

(the ith volume of the fMRI mixtures) could be written as: cxjlki = xi , jlk .

(9)

Ac

Thus, the further matrix form of equation (9) can be expressed as CX  X T . In terms of the volume xi , the different wavelet packet tree nodes yield different degrees of source sparsity, and are naturally de-correlated. In particular, the scholars Kisilev et al. (2003) demonstrated that selecting wavelet coefficients which yield higher source sparseness also leads to better separation performance while utilizing all wavelet coefficients yield intermediate separation performance. This research denotes that the proper selection of the wavelet packet tree nodes is a key issue and has 14

Page 16 of 78

obvious impact on the signal source separation performance. Further, Wang et al. (2013) proposed a SACICA model for fMRI analysis, which adopted the FastICA to extract the mixing matrix on a single wavelet packet node selected based on a Lp

ip t

norm, generating better spatial source recoverability and signal detection sensitivity than FastICA. In contrast to SACICA, in this study, the wavelet shrinkage and novel

cr

automatic node selection and integration operations over the wavelet packet tree

us

nodes are proposed to form the more effective sparse approximation coefficients set

an

of the fMRI mixtures. The concrete procedures will be presented in the next section. 2.2.2. Sparse approximation coefficients set formation

M

2.2.2.1. Wavelet shrinkage of wavelet packet tree nodes

ed

For convenience and simplicity, suppose that there are P volumes for the fMRI mixtures and each volume with M voxels is analyzed by the 1D wavelet packet

ce pt

decomposition (which works well for noise level estimation, see Section 5.3) with J levels, yielding 2 j nodes at level j , totally generating P wavelet packet trees. The lth node at level j of the ith volume is denoted as Node(i, j , l ) , where

Ac

1  i  P , 0  j  J and 0  l  2 j  1 , respectively. Specifically, the node indexes at

the level J is recorded in the vector INDEX J   0,1,

, 2 J  1 with the size

1 2 J .

So far, the wavelet shrinkage has been extensively studied by many scholars and presents state-of-the-art performance in signal denoising (Donoho & Johnstone, 1994,1995; Donoho, 1995), power spectrum estimation (Moulin,1994), etc. The success of wavelet shrinkage is mainly owing to the fact that the wavelet transform is 15

Page 17 of 78

good at energy compression and the assumption that the small coefficients are more likely due to noise, while large coefficients are important signal features (Donoho & Johnstone, 1995). Gao (1998) has pointed out that shrinkage of the empirical wavelet

ip t

coefficients worked best when the underlying set of true coefficients was sparse. Meanwhile, we have noted that the wavelet shrinkage not only suppressed the noise in

cr

wavelet coefficients, but also simultaneously contributed to a sparse approximation to

us

both the mixtures X explicitly and the underlying sources S implicitly. In wavelet shrinkage, common shrinkage functions include the hard thresholding, soft

an

thresholding (Donoho & Johnstone, 1994, 1995; Donoho, 1995), Semisoft (Gao &

M

Bruce, 1997), Garotte (Gao, 1998), and Hard-Soft (Zeng et al., 2004). In this study, the hard shrinkage function is chosen to perform the thresholding operation for each

ed

node at the J level of each wavelet packet tree, except the approximation node (node index l  0 ). The corresponding formulas are expressed as follows:

ce pt

Node H (i,J,l )   Hi ( Node(i,J,l )), l  INDEX J  l  0 ,

 H ( x)  

(11)

i   i  2log(n) ,

(12)

i

Ac

0 x  i ,  x x  i

(10)

 i  IQR  Node(i,J,1: 2J  1)  *0.7413 ,

(13)

where 1  i  P , IQR represents the calculation of interquartile range (a robust variability measure in statistics) of a given vector, and n represents the length of the vector Node(i,J,1: 2 J  1) , which is a concatenation of the nodes at the J level of the ith wavelet packet tree except the approximation node Node(i,J,0) . After the wavelet shrinkage is implemented on the level J nodes Node(1: P,J,1: 2 J  1) , the 16

Page 18 of 78

corresponding

threshed

nodes

Node H (1: P,J,1: 2 J  1)

are

retrieved,

where

NodeH (1: P,J,0)  Node(1: P,J,0) .

2.2.2.2. Automatic nodes selection and integration based on the RWPE

ip t

After the nodes NodeH (1: P,J,0 : 2J  1) are retrieved, how to select and

cr

integrate these wavelet packet tree nodes and to form the sparse approximation coefficients set should be investigated, which has quite important effects on the

us

separation performance of WASICA. Fortunately, the wavelet-based information tools

an

are popular to apply in brain signal for quantitative analysis. For example, wavelet energy, relative wavelet energy, wavelet entropy, relative wavelet entropy and wavelet

M

statistical complexity have been often used in the characterization of scalp EEG

ed

records corresponding to secondary generalized tonic-clonic epileptic seizures (Rosso et al., 2001, 2006). In this study, the relative wavelet packet energy (RWPE) criterion

ce pt

is used to select the threshed wavelet tree nodes and form the sparse approximation coefficients set. Next, the formulation definition of RWPE of the threshed wavelet packet tree nodes is given for convenience. According to the above denotation, the

Ac

energy of the lth node at level J of the ith volume can be formulated as follow:

where

F

Energy  i,J,l   Node H (i,J,l ) , 2

(14)

F

denotes the Frobenius norm of a vector. The total energy of ith volume

can be expressed as: 2 J 1

Energytot  i    Node H (i,J,l ) l 0

2 F

.

(15)

Then, the RWPE of the lth node at level J of the ith volume has the following 17

Page 19 of 78

formation: RWPE  i,J,l  

Energy  i,J,l  . Energytot  i 

(16)

Further, the mean relative wavelet packet energy (MRWPE) and percentage of

calculated as the next respective expressions:

cr

MRWPEJ  l  2 1 J

 MRWPE  l 

,

(17)

(18)

J

an

PMRWPEJ  l  

1 P  RWPE  i,J,l   ,   P  i 1 

us

MRWPEJ  l  

ip t

MRWPE (PMRWPE) for the lth node at level J across all P volumes could be

l 0

where l  INDEX J . Thus, according to the formula (18), the measuring vector

M

PMRWPEJ  0 : 2 J  1 for each node at level J regarding to all the P wavelet

ed

packet trees is retrieved. In the follow, the vector PMRWPEJ  0 : 2 J  1 is sorted in descending order, and select the top q elements in PMRWPEJ  0 : 2 J  1 such that q 1

l 0

ce pt

 PMRWPE  l   Thresh J

is satisfied, where Thresh is an arbitrary threshold that

represents the proportion of total energy retained; meanwhile, the correspondingly

Ac

selected nodes indexes are recorded in the vector INDEX with the size of q . Immediately, the nodes selection operation can be formulated as: H   Node 1: P, J , l  , l  INDEX Node 1: P, J , l    ,  0, l  INDEX  l  INDEX J H sel

(19)

where the bold 0 denotes the zero vector. In the last, the sparse approximation coefficients set regarding to the fMRI mixtures can be integrated as follow: H CX  Nodesel 1: P, J , INDEX J  .

(20)

18

Page 20 of 78

Hereunto, the sparse approximation coefficients set C X is formed, which not only transforms the mixtures into one with sparser feature (or with stronger super-Gaussian property), but also is expected to lead the underlying sources C s in C X to stronger

ip t

sparsity (or stronger super-Gaussian distribution) than the si in X did. According to the research (Calhoun et al., 2013), the sparse approximation coefficients set C X

cr

is expected to be beneficial to the super-Gaussian or Gaussian-like sources separation

us

using ICA with FastICA and Infomax algorithms, whose performance is validated in

an

the Results and Analysis section. 2.2.3. ICA-based decomposition and reconstruction ICA-based

decomposition

and

reconstruction

M

The

consists

of

three

ed

sub-procedures: (i) temporal dynamics extraction using ICA, similar to SACICA; (ii) wavelet packet reconstruction based on the sparse approximation coefficients set; (iii)

ce pt

least-square-based functional networks reconstruction. After the sparse approximation coefficients set C X is formed, the first sub-procedure takes place. Thinking back to formula (6) CX  ACs , ICA could be utilized to extract the temporal dynamics A

Ac

from C X , where the estimator of A by ICA is denoted as A . Compared with formula (1) X  AS , in presence of noise, formula (6) is expected to retrieve the more accurate estimation A of A due to the sparser feature (or stronger super-Gaussian distribution) of C X and C s , in contrast to X and S , respectively. In the follow, the approximation xi for the fMRI mixture xi could be achieved by inverse wavelet packet transformation immediately, based on the sparse H approximation coefficients cxi , where cxi  Nodesel i, J , INDEX J  . Thus, the

19

Page 21 of 78

approximation of the ith volume could be formulated as follow: xi  cxi , i  1, 2,

,P.

(21)

Further, the wavelet packet reconstruction approximation of the fMRI mixtures X is

cr

ip t

 x1      achieved as X   xi  .      xP 

us

Since S  CS  , X  CX  , and CX  ACS , then it follows that X  AS and,



A



T

A



1

 A

T

X,

(22)

M

 s1    s2 S        sQ 

an

thus, S can be estimated using least squares as shown in following equation:

ed

where X preserves the denoising feature in contrast to X , and A is the estimator temporal dynamic A determined by ICA. Meanwhile, since X is considered a

ce pt

denoised version of X , it should improve estimation of the sources in the least squares procedure and the related experimental validation is presented in Section Appendix B.

Ac

2.3. Implementation of WASICA

In this section, the implementation of WASICA will be briefly described. The

mainly involved operations such as the wavelet packet decomposition, wavelet packet nodes thresholding, nodes energy calculation and inverse wavelet packet transformation were implemented by the in-house functions wpdec, wpthcoef, wenergy and wprec, respectively, provided by the Matlab software (Matlab 2012b, 20

Page 22 of 78

Mathworks Inc., Sherborn, MA, USA). Meanwhile, the Infomax (Makeig et al., 1996) implemented in ICA Toolbox for Psychophysiological Research (version 3.4) or FastICA

(Hyvärinen

et

al.,

1997)

in

FastICA

2.5

toolbox

ip t

(http://research.ics.aalto.fi/ica/fastica/code/dlcode.shtml) was utilized in WASICA to perform the ICA decomposition.

cr

Based on the above description, we have established all the theory, methods and

us

implementation details about WASICA; and, next, we will give some experimental

proposed WASICA in following sections.

M

3. Experimental Tests

an

tests with analysis and discussion to verify the effectiveness and merits of the

ed

The designed experimental tests consist of simulation data, hybrid data, task-related data and resting-state data experiments. The performance evaluation

ce pt

includes the following aspects: the separation performance of WASICA regarding to super-Gaussian sources or Gauss-like sources is firstly investigated on Simulation 1 dataset; then, the recovery ability and detection sensitivity of functional signal and the

Ac

accuracy of temporal dynamics extraction as to WASICA, compared with FastICA, Infomax and SACICA, are assessed on Simulation 2 dataset and the hybrid dataset; the effectiveness and merits validation of WASICA on task-related dataset are further investigated with temporal correlation analysis and spatial intra-consistency analysis, in comparison to GLM and other three above mentioned methods; at last, the effectiveness verification of WASICA on the resting-state dataset is also implemented.

21

Page 23 of 78

3.1. Simulation dataset 3.1.1. Simulation 1: Super-Gaussian and Gauss-like sources dataset These sources dataset was provided by the scholars Vince D. Calhoun and

ip t

Rogers F. Silva, which was recently utilized to explore the really successful nature of some ICA algorithms, e.g., FastICA and Infomax (Calhoun et al., 2013). These

cr

sources dataset includes four cases of simulation sources, i.e., the case of unimodal

us

super-Gaussian sources with small box size (Case A), the one of bimodal close-to-Gaussian sources (in the sense of kurtosis) with medium box size (Case B),

an

the one of bimodal very-close-to-Gaussian sources (in the sense of kurtosis) with

M

large box size (Case C) and the one of unimodal super-Gaussian sources with very large box size (Case D); each case consists of 100 realizations and each realization is

ed

with two simulated sources (S1 and S2) (for details, please see Figure 1 and Table 1 of Calhoun et al., 2013). On the basis of known paired sources S1 and S2 of every

ce pt

realization regarding to each case, the randomly generated mixing matrix with size of 2  2 , whose values belonged to

 0,1 ,

was used to form the simulated mixtures

Ac

(formed 100 mixture realizations for each case, totally 400 mixtures realizations for all above four cases), which could be applied to completely assess the performance of WASICA.

3.1.2. Simulation 2: A comprehensive simulation data The recently released toolbox SimTB (http://mialab.mrn.org/software; Erhardt et al., 2012) was used to generate a comprehensive simulation dataset to evaluate the performance of the FastICA, Infomax, SACICA and the proposed WASICA in this 22

Page 24 of 78

study. The SimTB toolbox could provide the ground truth of the simulated spatial maps (SM) and time courses (TC) that different estimation methods could be compared against, which was used in evaluating the capability of the temporal domain

ip t

concatenated group ICA (Calhoun et al., 2001b) under conditions of spatial, temporal, and amplitude variability (Allen et al., 2012) and the intrinsic networks extracting

cr

performance of first-level ICA and second-level ICA (Calhoun & Allen, 2013). Thus,

us

one subject data was generated, with V=148×148 voxels, 12 spatial sources without expansion or contraction (selected from the default source template provided by

an

SimTB) and 120 time points at TR (time of repeat)=2 seconds. The baseline intensity

M

was set to 800, and the baseline map was shown in Fig.3A. Each source, depicted in Fig.3B, represented a spatial pattern that underwent certain activation over time. Two

ed

sources (10 and 12) shared task-related modulation in addition to having unique fluctuations. For source 10, the strength of task-modulation (expressed as the ratio

ce pt

between task event amplitude and unique event amplitude) was set to 4, while task-relatedness was smaller for source 12, set to 2. Task-modulation was introduced

Ac

with a block design (24 seconds (s) off, 24s on, five blocks), which was convolved with a canonical hemodynamic response function to simulate the slow dynamics of the vascular response (Friston et al., 1995). Activation for the other ten sources was described solely by unique hemodynamic fluctuations with no task-related variation. All sources had unique events that occurred with a probability of 0.2 at each TR. For task-modulated sources (10 and 12), unique events were added with smaller amplitudes (0.2 and 0.4, respectively). For sources not of interest (no task modulation), 23

Page 25 of 78

the unique amplitude was 1. For all sources, the percent signal change was centered at 3 with a standard was deviation of 0.25. Additive noise was included to reach a specified contrast-to-noise ratio of 1. The TCs corresponding to the selected 12

ip t

sources were depicted in Fig.3C.

cr

3.2. Hybrid data

To assess the source recovery ability of WASICA at a deeper level, the hybrid

us

data experimental test was designed. One 7×7×7-voxel simulated signal S (marked in

an

Fig.4A) was added to the normalized resting-state fMRI data, at an amplitude equal to 0.3 times of the average amplitude of the signal of in-brain voxels within the mean

ed

mixing function, as shown in Fig.4B.

M

brain volume over all time-points. The signal changed over time, corresponding to

Furthermore, in order to evaluate the detection sensitivity of functional signal

ce pt

concerning WASICA, the weak signal change in contrast to the background was constructed by the smoothing technique on hybrid data at variant smooth kernels (namely full width at half maximum, FWHM). The illustration was shown in Fig.5,

Ac

where Area A denoted the added original simulated signal S, and Area B denoted the subtle signal change area (generated by smooth technique) related to the simulated signal S on the surrounding area of signal S. We called union of Area A and Area B the affected area of signal S due to smoothing, denoted as Area A B . If a

certain analysis method could discover more voxels in Area A B influenced by smooth operation, we said that this method had better performance of detection sensitivity of functional signal. Namely, for a given method, the discovered region 24

Page 26 of 78

closer to Area A B , the stronger functional signal detection sensitivity. 3.3. Task-related Data One participant took part in this designed visual stimulus task experimental test,

ip t

informed about the purpose of this study and given the written informed consent prior

cr

to inclusion. The visual paradigm was OFF-ON-OFF-ON-OFF-ON-OFF in 20 seconds each block. The visual stimulus was a radial blue/yellow checkerboard,

us

reversing at 7 Hz, corresponding to the “ON” state. At the “OFF” state, the participant

an

was required to focus on the cross at the center of the screen. The BOLD fMRI dataset was acquired on a Philips 3.0 Tesla scanner using a multi-element receiver coil

M

to allow partially parallel image acquisition, a single-shot SENSE gradient echo EPI

ed

with TR of 2.0s. There were 40 slices providing the whole-brain coverage, with a SENSE acceleration factor of 3.0 and scan resolution of 80×80; the in-plane

ce pt

resolution was 3mm×3mm; slice thickness was 3 mm; slice gap was 1 mm. 3.4. Resting-state Data

Ac

The resting-state dataset was downloaded from the public neuroimaging database (http://www.nitrc.org/projects/fcon 1000/). This dataset was released by Dr. James J. Pekar and Dr. Stewart H. Mostofsky, including twenty-three healthy participants (8 males and 15 females). The participants’ ages ranged from 20 to 40. BOLD fMRI data were acquired on a Philips 3.0 Tesla scanner using a multi-element receiver coil to allow partially parallel image acquisition with single-shot SENSE gradient echo EPI with the following acquisition parameters: a SENSE acceleration factor=2.0, TR=2.5s, 25

Page 27 of 78

scan resolution=96×96, in-plane resolution=2.67mm×2.67mm, slice thickness=3mm, 47 slices providing the whole-brain coverage and time points=123. In the resting-state experimental test, data from a single subject was selected to evaluate the performance

ip t

of the proposed WASICA.

4. Results and Analysis

cr

4.1. Data processing

us

4.1.1. General data processing description

an

All computation was implemented on a server with Intel(R) Xeon(R) E5606 2.13 GHz processor and 32 GB RAM. The operating system platform was Windows Server

ed

running on the Matlab software.

M

Enterprise Service Pack1. All steps for preprocessing or processing algorithms were

For the simulation data experiments, no preprocessing step was involved. For all experiments,

the

common

preprocessing

was

performed

in

SPM8

ce pt

other

(www.fil.ion.ucl.ac.uk/spm/), including slice-timing, motion correction, spatial normalization. To assess the functional signal detection sensitivity of WASICA, the

Ac

smooth technique with varying Gaussian kernel sizes (FWHM ranged from 1mm to 8mm with step 1mm) was used to generate subtle functional signal area surrounding the simulated signal S on the hybrid data. Similarly, for the task-related data, the smooth Gaussian kernels also ranged from 1mm to 8mm with step 1mm. On the resting-state dataset, the normalized and unsmoothed data was used in Appendix B, and the smoothed data with FWHM equal to 8mm was used in Section 4.6. All the voxels' TCs except the simulation experiments were detrended by a first-degree 26

Page 28 of 78

polynomial (i.e., linear trend) and slow oscillations (i.e., discrete cosine transform (DCT) basis function up to cut-off frequency of 1/250 Hz). In simulation and hybrid data experimental tests, the four methods, namely

ip t

FastICA, Infomax, SACICA, and WASICA, were applied to do the performance comparison, while the GLM was adopted in task-related experimental test besides the

cr

above four methods. In resting-state experimental test, only WASICA was used to

us

explore the resting-state networks (RSN). The order number of simulation data was set to 13 (twelve simulated sources and one baseline component) for the above four

an

data-driven methods. The component order of the other three kinds of experimental

M

tests, for the above four data-driven methods, were automatically estimated by Laplace approximation method (Minka, 2000), which was also used in estimating the

ed

number of sources in the probabilistic ICA model of fMRI data analysis (Beckman & Smith, 2004). For SACICA, the 1D wavelet packet decomposition with three levels

ce pt

(one level for Simulation 1) using db4 wavelet basis and Shannon entropy was applied to the experimental dataset, and the FastICA on the sparse approximation

Ac

coefficients set was implemented (Wang et al., 2013). In WASICA, the 1D wavelet packet decomposition with four levels (one level and three levels for Simulation 1 and Simulation 2, respectively), db4 wavelet basis and Shannon entropy was also applied to the experimental data, where the out-of-brain areas of this data were excluded (no exclusion in the simulation data); for the automatic nodes selection and integration based on the RWPE procedure, the thresholds Thresh for all the experimental tests were set to 0.99. Particularly, the GLM analysis using SPM8 software was applied to 27

Page 29 of 78

the task-related dataset, with the p-value and extent threshold equal to 0.05 (corrected for multiple comparisons) and 10 voxels, respectively. In addition, the location and display of these normalized networks were assessed

ip t

by the PickAtlas software toolbox (Maldjian et al., 2003, 2004) and MRIcroN software (http://www.mricron.com), respectively.

cr

4.1.2. Reliability consideration of ICA estimates

us

In our four designed experimental tests, the stochastic nature of the involved

an

ICAs (e.g., FastICA, Infomax) or ICA-based algorithms (e.g., SACICA, WASICA) was considered. Regarding to Simulation 1, each of the randomly generated 400

M

mixtures (100 mixture realizations for each case, four cases in total) was run 12 times

ed

by each testing algorithm. Meanwhile, for Simulation 2, hybrid, task-related, and resting-state experiments, the stable run results with the maximum stability metric

ce pt

based on the intra-cluster similarity matrix after ICASSO (Himberg et al., 2004) was used, which was firstly proposed by Ma et al. (2011), and was later developed as the default

approach

in

the

GIFT toolbox

(http://mialab.mrn.org/software/gift/,

Ac

GIFTv2.0a). The corresponding parameters of ICA running times, minimum cluster size and max cluster size were set to 12, 2 and 13, respectively. 4.2. Measures for performance assessment To the beginning, the Pearson correlation coefficient and normalized mean square error (NMSE) metrics used in this research could be defined as follows: corrcoef ( A,B) 

cov(vec( A), vec( B)) , var (vec( A))var (vec( B))

(23)

28

Page 30 of 78

NMSE ( A,B) 

vec( A)  vec( B) vec  B 

2

2 F

,

(24)

F

where A and B were the estimation and reference vectors (or matrices), respectively; cov( ) , var ( ) and vec( ) represented the covariance operation,

ip t

variance operation and vector transform operation (transforming a matrix into a

cr

column vector), respectively.

Further, the normalized inter-symbol interference (ISI) (Moreau & Macchi, 1994;

us

Anderson et al., 2012; Calhoun et al., 2013) was considered to evaluate the separation

an

performance in this study, which was invariant to the scaling and permutation ambiguities:

Q  Q  Q  Q  Ga,b Ga,b 1  1      1  ,     b 1  a 1 max c Gc,b  2Q  Q  1  a 1  b 1 max c Ga,c   

M

ISI  G  

(25)

ed

where G  WA , Ga,b was the ath and bth element of G and Q represented the number of sources. This performance metric was bounded between 0 and 1 and the

ce pt

lowest the ISI value the better the separation performance. In addition, the receiver operating characteristic (ROC) curve and ROC power

Ac

analysis (Skudlarski et al., 1999) were taken to evaluate the performance among different algorithms. In our hybrid data experiments, ROC curve was utilized to assess the performance of signal recovery ability corresponding to FastICA, Infomax, SACICA and WASICA. Each point determined by the true positive rate (TPR) and false positive rate (FPR) of the ROC curve was calculated on the basis of the real active template (see Fig.5, illustrating the template with FWHM=6mm) and the z-thresh values of the z-score maps varying from 1.0 to 3.0 with

2 step length. The 30

29

Page 31 of 78

sizes of the real active templates of simulated signals in hybrid data experimental test were determined by the varying Gaussian smoothing kernels. The ROC power analysis was designed to evaluate the detection sensitivity of functional signal with

ip t

regard to FastICA, Infomax, SACICA and WASICA across the variant FWHM values. Each ROC power value was the area surrounded by the ROC curve subjected to a

cr

certain FWHM level.

us

4.3. Simulation data result

an

4.3.1. Results for Simulation 1

For each simulated mixture in Simulation 1, the FastICA, Infomaxsuper,

M

Infomaxsub, SACICA, WASICAsuper, WASICAsub and WASICAfast1 were run 12 times

ed

each to alleviate the result random effects of individual runs. Thus, for each method mentioned above, we obtained 1200 separation results with regard to each case, i.e.,

ce pt

Case A, Case B, Case C and Case D. Further, the normalized ISI values were calculated according to equation (25) for each case, and the corresponding mean values and SDs (standard deviation) of ISIs for each case were recorded in Table 1. As

Ac

presented in Table 1, it was easily concluded that (1) FastICA, Infomaxsuper and all versions of WASICA perform well for Case A (the case of unimodal super-Gaussian sources with small box size); (2) WASICAsuper and WASICAsub perform well, and all others mostly fail for Case B (the case of bimodal close-to-Gaussian sources with

1

Infomaxsuper and Infomaxsub denote Infomax with sigmoid nonlinearity that assumes a unimodal super-Gaussian

source and with a nonlinearity which assumes a sub-Gaussian source, respectively. WASICAsuper, WASICAsub and WASICAfast represent WASICA combining with Infomaxsuper, Infomaxsub and FastICA, respectively. 30

Page 32 of 78

medium box size); (3) only Infomaxsub performs well for Case C (the one of bimodal very-close-to-Gaussian sources with large box size); (4) FastICA, Infomaxsuper, WASICAsuper and WASICAsub perform well for Case D (the one of unimodal

ip t

super-Gaussian sources with very large box size). In addition, we also could observe that (1) WASICA embedding with Infomax was better for bimodal Gauss-like sources

cr

estimation than the others did (Case B); WASICA combining with Infomax

us

demonstrated better separation performance than WASICA with FastICA; WASICAsuper with mostly lower mean values and less SDs of ISIs exhibited better and

an

stable performance than WASICAsub did. Thus, we selected WASICAsuper as the one

M

for the following experimental analysis, and we signified WASICAsuper as WASICA for simplicity. Meanwhile, Infomaxsuper was also selected as the representative of

ed

Infomax for the following experimental tests due to the spatial super-Gaussian

ce pt

property of brain functional networks (McKeown et al., 1998). 4.3.2. Results for Simulation 2

In Simulation 2 experimental test, the FastICA, Infomax (namely Infomaxsuper),

Ac

SACICA, WASICA (namely WASICAsuper) were run 12 times each to alleviate the result from random effects of individual runs, and the most stable run results (see Section 4.1.2) of each method were used as the representative results. Meanwhile, the intensity of all the detected SMs and TCs was normalized to the range  1, +1 and +0.5 , -0.5，

respectively, for display purposes. The SMs determined by the above

four methods and the ground truth sources were depicted in Fig.6, denoting that all four methods successfully detected the twelve simulated sources. Generally, the SMs 31

Page 33 of 78

extracted by WASICA were visually with less noise and variation, compared with the estimates by the others. Further, to perform a quantitative performance comparison of the four methods in both the spatial and temporal domains, the Pearson correlation

ip t

coefficients and NMSEs between the spatial/temporal ground truths and the corresponding spatial/temporal estimates determined by the four methods were

0.9826 ) 0.7843，

denoted that all the 12 SMs

us

(with zoom-in correlation range

cr

calculated and plotted in Fig.7. Fig.7A (with normal correlation range 0， 1 ) and B

determined by WASICA had the stronger spatial correlation with their corresponding

an

ground truth sources, compared with the corresponding ones by the other three

M

methods. Further, the minimum one among the spatial correlation coefficients corresponding to WASICA was 0.9403, whereas the minimum one among the spatial

ed

correlation coefficients corresponding to FastICA, Infomax and SACICA were 0.7841, 0.7965 and 0.7843, respectively. This phenomenon also denoted that the WASICA had

ce pt

the more accuracy of source recovery capability in spatial domain, in contrast to the other three methods. In addition, the spatial NMSE curves against the source number

Ac

as to the four methods were depicted in Fig.7E, and it was clearly shown that the spatial NMSE curve corresponding to the WASICA lay at the bottom among the four curves, which further declared that all the twelve SMs extracted by WASICA were indeed closer to the ground truth sources than the corresponding ones by the other three methods. Besides the spatial domain analysis, the temporal domain comparison analysis was also investigated. To the beginning, the temporal Pearson correlation coefficients 32

Page 34 of 78

between the ground truth TCs of all simulated sources and the corresponding TCs determined by the above four methods were calculated, and the temporal correlation curves against different sources corresponding to the above four methods were depicted in Fig.7C (with normal correlation range

1 ) 0，

and D (with zoom-in

ip t

correlation range  0.9360， 1 ). Similar to spatial domain, it also could be observed

cr

that the temporal correlation curve as to WASICA lay at the top among all the four

us

curves. Meanwhile, the mean value and the SD value of each temporal correlation curve were respectively calculated and recorded as 0.9753 and 0.0209 for FastICA,

an

0.9884 and 0.0082 for Infomax, 0.9699 and 0.0203 for SACICA, 0.9959 and 0.0031

M

for WASICA, denoting that WASICA had the largest mean value and least SD value. Furthermore, the temporal NMSE curves against the source number as to the four

ed

methods were also illustrated in Fig.7F, and it was clearly shown that the temporal NMSE curve regarding to WASICA lay at the bottom among the four curves. In

ce pt

addition, the mean ISI values (see equation (25)) and the SD values based on the 12 runs for the above four methods were calculated and recorded in Table 2; As expected,

Ac

the proposed WASICA performed the best with the least mean value and SD value of ISI. To sum up, all the above quantitative analysis in temporal domain revealed that the proposed method had better temporal dynamics extraction performance than the other three methods did on Simulation 2. 4.4. Hybrid data result The ROC curves of signal S on the unsmoothed hybrid data experimental test corresponding to FastICA, Infomax, SACICA and WASICA were depicted in Fig.8A 33

Page 35 of 78

(with normal TPR range

0,1 )

and B (with zoom-in TPR range

0.9329,1 ),

respectively. The ROC analysis demonstrated that WASICA occupied the stronger signal recovery ability in spatial domain than the other three methods did, specifically

ip t

obvious for FPR less than 0.05. Furthermore, in order to assess detection sensitivity of the functional signal regarding to the above four methods, the ROC power analysis

cr

was performed on the hybrid dataset with varying FWHM values (from 0mm to 8mm

power range

0.4284, 0.9968 )

us

with 1mm steps). Fig.8C (with normal power range  0,1 ) and D (with zoom-in presented the evaluated results of the detection

an

sensitivity performance using ROC power analysis, which denoted that the proposed

M

method had the better detection sensitivity of functional signal than the other three methods did. Specifically, the predominance in signal detection sensitivity as to

ed

WASICA was more obvious when the FWHM value was less than 6mm. According to Fig.8D, it was easily observed that FastICA had slightly weaker functional signal

ce pt

detection sensitivity than both SACICA and Infomax did, while these two methods had the similar performance in signal sensitivity detection.

Ac

4.5. Task-related data result

The visual stimulus dataset at different FWHM values was explored by GLM,

FastICA, Infomax, SACICA and WASICA, where the results were displayed in Fig.9. On the basis of Fig.9A, in terms of each row (at a specific FWHM value), the SMs (spatial map) detected by the five methods were very similar, which obviously denoted the effectiveness of the proposed method in spatial domain. Meanwhile, in terms of each column of Fig.9A, it could be observed that the SMs detected by 34

Page 36 of 78

WASICA across the varying FWHM values (the fifth column) were with more consistency and less variation than the ones by the other four methods visually. A quantitative comparison about the spatial consistency is discussed further below in

ip t

detail. Furthermore, the visual stimulus reference block regarding to GLM and the corresponding TCs (note: each method gave one TC for the entire SM with regard to

cr

visual stimulus) respectively given by FastICA, Infomax, SACICA and WASICA on

us

the smoothed data (FWFM=8mm) were exhibited in Fig.9B, and the high temporal correlation (the correlation coefficients marked in Fig.9B) between the TC extracted

an

by WASICA and the other TCs (or reference block for GLM) by the other four

M

methods further demonstrated the temporal domain effectiveness of WASICA for the visual stimulus task data. Similarly, the good temporal domain effectiveness of

ed

WASICA was also supported by the results from the visual task-related data under different FWHM values (see Fig.S1).

ce pt

In addition, the spatial map consistency analysis across variant FWHM values (called intra-consistency) for the visual stimulus task experiment was investigated.

Ac

Suppose that the SMs related to the given stimulus at variant smoothing kernels (ranging from 0mm to V mm with 1mm step) were extracted by a certain method, for example, WASICA, and we denoted the SMs with FWHM size equal to u mm as SM smoothu . Thus, the Pearson correlation coefficients between any two different spatial

maps SM smoothu and SM smoothv could be calculated, where variant smoothing kernel ranged from 0mm to V mm with 1mm step. Thus, on the basis of formula (23), the formulation expression could be represented as: 35

Page 37 of 78





CCSM  u,v   abs corrcoef ( SM smoothu ,SM smoothv ) , which subjected to u, v 0,1, 2,

(26)

,V  1,V   u  v . Further, the function vec()

was used to transform the matrix

CCSM

to a vector

V  V  1 ' . Based on CCSM , 2

ip t

' ' CCSM  vec  CCSM  , and the length n0 of CCSM was

' CCSM , namely

cr

two intra-consistency metrics across the variant smooth kernels, namely mean

follow:

 CC h 1

' SM

(h)

(27)

1

 2  CC (h)  MCL   .  h 1  n0

' SM

2

(28)

M

1 VCL    n0

n0

an

1 MCL  n0

us

consistency level (MCL) and variant consistency level (VCL) could be expressed as

ed

The higher MCL value and lower VCL value, the better intra-consistency of the SMs across the variant smooth kernels. On the visual stimulus experimental test, the MCL

ce pt

and VCL values of the SMs extracted by GLM, FastICA, Informax, SACICA and WASICA, with V equal to 8, were calculated and recorded in Table 3, respectively. As expected, it exhibited that WASICA had the highest MCL values and lowest VCL

Ac

values, which demonstrated that the SMs extracted by WASICA occupied the best intra-consistency against the smooth kernel change among the five methods. The least VCL value of WASICA also supported that the SMs (the fifth column, shown in Fig.9A, detected by WASICA) visually held the strongest intra-consistency and least variation across the variant FWHM values. Last but not the least, from the first row of Fig.9A (unsmoothed), it could be obviously observed that the proposed method discovered more active voxels on the 36

Page 38 of 78

occipital lobe related to the visual stimulus than the other four methods did, where the number of active voxels of each SM (the first row of Fig.9A) determined by GLM, FastICA, Infomax, SACICA and WASICA were 4655, 8227, 8152, 8107 and 9108,

ip t

respectively. Thus, the voxels only discovered by WASICA (on unsmoothed data) in contrast to the GLM, FastICA, Infomax and SACICA, were extracted and depicted in

cr

Fig.10A. It could be seen that the very many voxels (shown in Fig.10A) really

us

belonged to the visual stimulus related occipital cortex area. Furthermore, the TCs corresponding to the depicted voxels were extracted from the original normalized and

an

unsmoothed visual stimulus task data, and the mean TC of these voxels were

M

calculated, normalized and plotted in Fig.10B. Meanwhile, the left circular shift operation with 4 time points (the estimated delay of the BOLD response to neuronal

ed

stimulation) was performed on the normalized mean TC, and the corresponding shifted mean TC was also presented in Fig.10B. It could be clearly observed that the

ce pt

mean TC and shifted mean TC (shown in Fig.10B) of these voxels were highly correlated to the reference block with the Pearson correlation coefficients equal to

Ac

0.3519 and 0.8203 (marked in Fig.10B), respectively. Thus, these results in terms of both the spatial and temporal domains demonstrated the effectiveness of the voxels only detected by WASICA, in comparison to GLM, FastICA, Infomax and SACICA. 4.6. Resting-state data result The proposed WASICA was lastly utilized to assess the RSNs in the resting-state experimental test. As expected, some typical RSNs reported by many previous researches (Beckmann et al., 2005; Damoiseaux et al., 2006; Schöpf et al., 2010; 37

Page 39 of 78

Wang et al., 2012a, 2013) were successfully discovered such as VIN (a predominantly occipital network, Fig.11A), LVN (a lateral visual network involving the lateral and superior occipital gyrus, Fig.11B), DMN (a default mode network involving primarily

ip t

precuneus and prefrontal lobe and parietal regions, Fig.11C), AUN (a network involving the auditory regions, Fig.11D), ECN (an executive control network

cr

including superior and middle prefrontal cortex, anterior cingulate and paracingulate

us

gyri, Fig.11E), DPN1 and DPN2 (networks mainly involving dorsal parietal cortex, Fig.11F and J), LWMN and RWMN (networks including the left and right hemisphere

an

regions known to be involving in working memory, Fig.11G and H), and SMN (a

M

network including bilateral sensorimotor cortex, Fig.11I), being exhibited in Fig.11. In addition, Fig. 11K depicted a network including the superior temporal (BA 22) and

ed

superior frontal (BA 9) areas; Fig. 11L showed part of the prefrontal (BA 10/11) areas and limbic lobe, which seemed to be part of DMN shown in Fig. 11C (Damoiseaux et

ce pt

al., 2006). According to RSNs shown in Fig.11, it clearly demonstrated that WASICA was also effective to perform resting-state data analysis to extract the resting-state

Ac

functional brain sources.

5. Discussion and Conclusion 5.1. Merits of WASICA In this present study, we demonstrated an effective functional connectivity detection model──WASICA, aiming at separating the super-Gaussian sources and some Gaussian-like sources, which demonstrated the superiority of sparse approximation in functional network dynamics detection. Similar to our previous 38

Page 40 of 78

SACICA, the proposed model was composed of three procedures: the wavelet packet decomposition, the sparse approximation coefficients set formation and the ICA-based decomposition and reconstruction. In terms of theory, the proposed model

ip t

first fully took advantage of the sparsity of the wavelet packet tree nodes, which was one of the essential aspects of WASICA success. Then, the wavelet shrinkage

cr

operation of the wavelet packet tree nodes played a great role in not only contributing

us

to a sparser approximation of the fMRI mixtures explicitly and of the sources implicitly, but also suppressing the noise in wavelet coefficients. In the follow, the

an

automatic nodes selection and integration based on the RWPE acted a great part in

M

forming the effective sparse approximation coefficients set as to the mixtures. The features of the wavelet packet transform, wavelet shrinkage operation and the

ed

automatic nodes selection and integration operation were explored in Appendix A, which denoted that these operations could transform the mixtures X (representing

ce pt

the simulation dataset in Appendix A) with sub-Gaussian distribution in spatial domain into the sparse approximation coefficients set C X with strong super-Gaussian

Ac

distribution (or strong sparsity) in wavelet domain. According to the research (Calhoun et al. 2013) and sparse approximation theory, this could contribute to the accuracy of temporal dynamics model estimation for both super-Gaussian and Gaussian-like sources when using ICA (i.e., Infomax) with the regular parameters setting. Furthermore, the approximation set X reconstructed based on the sparse approximation coefficients set C X was expected to exclude much nuisance noise in original data and to enhance the functional network recovery ability and detection 39

Page 41 of 78

sensitivity. Combining with the de-noised approximation dataset

X , the

least-square-based functional networks reconstruction ensured that the estimated functional networks were achieved with more accuracy than the ones on the basis of

ip t

original mixture X , where the related performance validation was presented in Appendix B.

cr

In practice, to begin with, WASICA showed good separation performance for all

us

super-Gaussian cases (Case A and D) and close-to-Gaussian sources case (Case B), and failed in very-close-to-Gaussian sources case (Case C, discussed in Section 5.4)

an

according to the results presented in Table 1. It was worth noting that: on one hand,

M

only WASICA succeeded in Case B where other methods, i.e., FastICA, Infomax and SACICA, mostly failed, which revealed the superiority of WASICA in Gaussian-like

ed

sources separation; on other hand, compared with Infomax, the WASICA embedding with Infomax was less subject to the prerequisite that a nonlinearity assumption

ce pt

should satisfy a unimodal super-Gaussian or a sub-Gaussian source. This merits originated from the effective super-Gaussian property of the formed sparse

Ac

approximation coefficients set in WASICA. Thus, combining WASICA with Infomaxsup was a better choice for signal separation. Further, WASICA had shown the superiority in both the spatial and temporal domain performance in the designed experimental tests. For example, in Simulation 2 experiment, the spatial correlation analysis and spatial NMSE comparison (depicted in Fig.7A, B and E) clearly demonstrated the stronger source recovery ability of WASICA in contrast to the FastICA, Infomax and SACICA; further, the temporal correlation analysis, temporal 40

Page 42 of 78

domain NMSE analysis and the normalized ISI metric evaluation also revealed that the proposed method had the best temporal dynamic model extraction performance among these four methods (shown in Fig.7C, D, F and Table 2). In the follow, in the

ip t

hybrid data experimental tests, the ROC analysis on the unsmoothed hybrid data (shown in Fig.8A and B) denoted that the WASICA held the superiority in the source

cr

recovery ability compared with the other three methods; meanwhile, the ROC power

us

analysis (i.e., the area under the curve, AUC) of the hybrid dataset with different smooth kernels presented that the WASICA was clearly possessed of the stronger

an

signal detection sensitivity than the other three methods (results shown in Fig.8C and

M

D). Thus, the Simulation 2 and hybrid data experimental results revealed that the proposed WASICA had the best spatial and temporal domain separation performance

ed

among the four data-driven methods. Furthermore, the task-related data experimental results also presented the validity and merits of the proposed model. For example,

ce pt

Fig.9 revealed that WASICA had both the spatial domain and temporal domain effectiveness of functional network detection regarding to the visual stimulus data

Ac

across the variant FWHMs; meanwhile, the higher intra-consistency of the visual networks by WASICA revealed that the proposed method held the stronger spatial robustness against the smooth kernels than the GLM, FastICA, Infomax and SACICA did (shown in Table 3); especially, from the first row of Fig.9A (unsmoothed), it was clearly observed that the proposed method discovered more active voxels (depicted in Fig.10A) on the occipital lobe related to the visual stimulus than the other four methods did; the mean TC (calculated based on the time series of voxels shown in 41

Page 43 of 78

Fig.10A from the normalized and unsmoothed task-related data) and the corresponding shifted mean TC (shown in Fig.10B) of these voxels only detected by WASICA were highly correlated to the reference block design related to the visual

ip t

stimulus, which declared that the voxels (depicted in Fig.10A) were located correctly. Lastly, it was also verified that the proposed model could powerfully extract the RSNs

cr

on the resting-state dataset. Some typical RSNs such as VIN, LVN, DMN, AUN, ECN,

us

DPN1, DPN2, LWMN, RWMN, SMN and so on, were successfully extracted (exhibited in Fig.11). In a word, all the theoretical and experimental features and

an

merits mentioned above demonstrated that WASICA was a promising blind functional

M

signal separation model with stronger source recovery ability, signal detection sensitivity, better temporal dynamic model extraction and stronger spatial robustness

ed

against the smoothing kernels for fMRI data analysis.

ce pt

5.2. Selection of decomposition level & PMRWPE Thresh The parameters such as decomposition level and PMRWPE Thresh in WASICA should be noted. The authors recommend that the number of wavelet packet

Ac

decomposition level ( J ) should be set slightly larger, as compared to the simulated single-slice data, for the volume data with multi-slices and huge quantity voxels, to achieve better blind separation performance. For example, in our Simulation 1 and Simulation 2 dataset, the decomposition level was set to J  1 and J  3 , respectively, for the simulated volume data with just one slice; in the other three kinds of experimental tests, the decomposition level was set to J  4 for the normalized volume data with 91 slices. Further, though the setting of the parameter PMRWPE 42

Page 44 of 78

Thresh was empirical, the authors recommend Thresh  0.99 for fMRI data

analysis on the basis of our experimental tests, which usually generated good-enough results. In addition, on the basis of the above given PMRWPE Thresh , the automatic

ip t

nodes selection and integration procedure of WASICA can adaptively choose the different number (namely the q value ) of the shrunk wavelet nodes on the

cr

simulation dataset, hybrid dataset and real dataset, respectively (the automatically

us

determined q values and the total number of nodes at the level J on different

an

experimental tests were recorded in Table S1 of the supplementary material). 5.3. 1D wavelet vs. 3D wavelet

M

Under the WASICA’s framework, the authors recommend that the 1D wavelet

ed

packet analysis should be utilized to form the sparse approximation coefficients set, rather than the 3D wavelet or wavelet packet analysis, although each volume of the

ce pt

fMRI mixtures was 3D data. The main reason was that each data volume contained too many zeros and the noise level  i about the mixture xi could not be accurately estimated based on the interquartile range (see formula (13)) or the median estimator

Ac

(proposed by Donoho & Johnstone, 1994). For example, the common normalized volume data was usually with the dimensions 91109  91 , the non-zero voxel number was approximate to 274599, while the zero value voxel number was approximate to 628030. This phenomenon meant the noise level  i was always estimated as 0 based on the interquartile range or median measure. Thus, the incorrect noise level estimation could put the wavelet shrinkage sub-procedure and the automatic nodes selection and integration sub-procedure out of action, in the 3D 43

Page 45 of 78

wavelet case. However, as to the 1D wavelet packet transformation based WASICA model, the non-brain area voxels were excluded for the real fMRI data analysis in the implementation procedure, which was beneficial to noise level  i estimation.

ip t

Meanwhile, our experimental results also denoted the WASICA using 1D wavelet for C X formation outperformed the WASICA using 3D wavelet for C X formation

cr

under the current WASICA framework (shown in Fig.S2 of the supplementary

us

material). In spite of that, 3D wavelets plus better handling of coefficients related to areas out of the brain would possibly yield better signal compression and spatial

an

characterization (i.e., make C X more super-Gaussian). For example, Kullar et al.

M

(2011b) successfully implemented the denoising on 3D wavelets by "masking" sub-bands, which could be a potentially better way to handle zero-valued coefficients

ed

corresponding to areas out of the brain.

ce pt

5.4. Limitation and Future work

According to the results shown in Table 1, one limitation is that the Case C of bimodal very-close-to-Gaussian sources with large box size was not successfully

Ac

recovered by all versions of WASICA. However, in real fMRI data analysis, spatial functional networks usually depicted stronger super-Gaussian property (McKeown et al., 1998), which implies that this proposed WASICA still could be effective for fMRI data analysis. In spite of that, the bimodal very-close-to-Gaussian sources separation under WASICA is one of our future works. Meanwhile, another future research could be involved in the multi-subjects-based WASICA for group fMRI mixtures separation and distinguishing inference between different control groups. 44

Page 46 of 78

Acknowledgments The authors greatly appreciate the insightful comments of anonymous reviewers, which have helped the authors to improve the manuscript. The authors are highly

ip t

grateful to Dr. Jianbiao Chen and Dr. Elena Allen for help with some discussions

cr

about sparse representation and the usage of SimTB toolbox, respectively. The authors also deeply appreciate the essential help from Dr. Vince D. Calhoun and Mr. Rogers F.

us

Silva for providing the Simulation 1 dataset. Research supported by the National

an

Natural Science Foundation of China (Grant No.31170952), the Innovation Program of Shanghai Municipal Education Commission (Grant No. 11ZZ143), the Research

M

Fund for the Doctoral Program of Higher Education of China (Grant No.

ed

20113121120004), the Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning, the Project Sponsored by the

ce pt

Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry, Shanghai Municipal Natural Science Foundation of China (Grant No. 13ZR1455600) and the Programs for Graduate Special Endowment Fund for

Ac

Innovative developing (Grant No. 2013ycx072) and Excellent Doctoral Dissertation Cultivation (Grant No. 2013bxlp003) of Shanghai Maritime University.

Appendix A. Wavelet shrinkage & automatic nodes selection and integration: the effect investigation In WASICA, the sparse approximation coefficients set formation consisted of two sub-procedures, namely the wavelet shrinkage of wavelet packet tree nodes and 45

Page 47 of 78

the automatic nodes selection and integration based on the RWPE. In this part, the performance of the two sub-procedures was further investigated to declare the merits of WASICA. On Simulation 2, the kurtosis values of each volume data xi , 1  i  P ,

coefficients

Node H (i,J,0 : 2J  1)

and

the

selected

and

ip t

the corresponding wavelet coefficients Node(i,J,0 : 2 J  1) , the threshed wavelet integrated

wavelet

cr

H (i, J ,0 : 2 J  1) (namely c xi ) were calculated based on the coefficients Nodesel

us

Matlab in-house function kurtosis, respectively. Please note: the implemented kurtosis function was without subtracting 3, which made the Gaussian distribution with

an

kurtosis value equal to 3, super-Gaussian distribution with value greater than 3 and

M

sub-Gaussian distribution with value less than 3. The kurtosis curves corresponding to H xi , Node(i,J,0 : 2 J  1) , Node H (i,J,0 : 2J  1) and Nodesel (i, J ,0 : 2 J  1) , against

ed

the time points ( 1  i  120 ), were plotted in Fig.A-1 E. The mean value and the SD value as to each kurtosis curve were marked in Fig.A-1 E. It was easily observed that

ce pt

(i) the mixture data X of Simulation 2 in spatial domain had a sub-Gaussian distribution with kurtosis value less than 3; (ii) all the three kinds of the wavelet

Ac

coefficients, satisfying super-Gaussian distribution, had greater kurtosis values than X did;

(iii)

the

kurtosis

values

of

the

original

wavelet

coefficients

( Node(i,J,0 : 2 J  1) ), threshed wavelet coefficients ( Node H (i,J,0 : 2J  1) ) and H (i, J ,0 : 2 J  1) ) demonstrated an selected and integrated wavelet coefficients ( Nodesel

increasing trend, in that order. Further, Fig.A-1 F (with zoom-in mode) also depicted that the kurtosis values of Node H (i,J,0 : 2J  1) was slightly greater than the one of Node(i,J,0 : 2 J  1) , where 1  i  120 . Meanwhile, the histograms with regard to 46

Page 48 of 78

18th volume data and its three corresponding kinds of wavelet coefficients H (18, J ,0 : 2J  1) ) were ( Node(18,J,0 : 2J  1) , Node H (18,J,0 : 2J  1) and Nodesel

presented in order in Fig. A.1 (A-D), with labeling the respective kurtosis values,

ip t

which implied that the wavelet coefficients embedding with more zero were really sparser (or stronger super-Gaussian distribution) than the spatial domain, specifically

cr

H (18, J ,0 : 2J  1) . for Nodesel

us

To sum up, the above mentioned two sub-procedures were really beneficial to form the sparse approximation coefficients set, which we anticipated would help

an

identify transformed underlying sources C s with stronger super-Gaussian distribution

M

and, thus, give more accurate estimation of temporal dynamics according to the research (Calhoun et al., 2013) and sparse approximation theory (described in section

ed

2.1). As was expected, the sparse approximation coefficients set C X regarding to the original X indeed contributed to the temporal dynamic A estimation, as was fully

ce pt

validated in Section 4.3.2 Results for Simulation 2.

Appendix B. Brain networks reconstruction: Mixture

X Vs.

Ac

Approximation set X

The brain networks reconstruction in the last procedure of WASICA (i.e.,

ICA-based decomposition and reconstruction) is based on the unmixing operation of the approximation set X , which is generated by the inverse wavelet packet transformation as to the sparse approximation coefficients set C X (see Eq. (21)). The denoised feature of X in contrast to X could be beneficial to the quality of reconstructed functional networks. In order to validate this assumption, the 47

Page 49 of 78

normalized and unsmoothed resting-state BOLD fMRI data (the same subject used in Section 4.6) was analyzed by WASICA with only different reconstruction procedures, i.e., reconstruction strategies from mixture X or its approximation set X . Also, the

ip t

reliability consideration of WASICA was involved by the most stable run procedure (see Section 4.1.2). Based on two groups of results (one for X , the other for X )

cr

generated by WASICA, seven relatively obvious functional networks were selected

us

and presented by slice-by-slice (shown in Fig. B.1 A-G). Meanwhile, the obviously different regions of the equivalent SMs (spatial map) from two comparable groups

an

were marked by green box for X and blue box for X , respectively. By contrast, it

M

could be observed that the active regions of SMs reconstructed from X with relatively less noise variation visually were more clustered and obvious than the ones

ed

from X . This investigation supports the above assumption, which the approximation set X with denoised feature contributes to the quality of the separated brain

ce pt

networks in contrast to X .

References

Ac

Allen EA, Erhardt EB, Wei Y, Eichele T, Calhoun VD. Capturing inter-subject variability with group independent component analysis of fMRI data: a simulation study. Neuroimage 2012; 59(4):4141–4159. Anderson M, Li X, Rodriguez P, Adali T. An effective decoupling method for matrix optimization and its application to the ICA problem. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012; pp:1885–1888. Ambrosino S, Bos DJ, van Raalten TR, Kobussen NA, van Belle J, Oranje B, Durston 48

Page 50 of 78

S. Functional connectivity during cognitive control in children with autism spectrum disorder: an independent component analysis. J Neural Transm 2014; pp:1–11. Bagarinao E, Matsuo K, Nakai T, Sato S. Estimation of general linear model

ip t

coefficients for real-time application. Neuroimage 2003; 19(2):422–429. Beckmann CF, Smith SM. Probabilistic independent component analysis for

cr

functional magnetic resonance imaging. IEEE Trans Med Imaging 2004;

us

23(2):137–152.

Beckmann CF, Smith SM. Tensorial extensions of independent component analysis

an

for multisubject FMRI analysis. Neuroimage 2005; 25(1):294–311.

M

Beckmann CF, DeLuca M, Devlin JT, Smith SM. Investigations into resting-state connectivity using independent component analysis. Philos Trans R Soc B 2005;

ed

360(1457):1001–1013.

Bell AJ, Sejnowski TJ. An information-maximization approach to blind separation

ce pt

and blind deconvolution. Neural Comput 1995; 7(6):1129–1159. Biswal BB, Ulmer JL. Blind source separation of multiple signal sources of fMRI

Ac

data sets using independent component analysis. J Comput Assist Tomo 1999; 23(2):265–271.

Bronstein AM, Bronstein MM, Zibulevsky M, Zeevi YY. Sparse ICA for blind separation of transmitted and reflected images. Int J Imag Syst Tech 2005; 15(1):84–91. Calhoun VD, Adali T, Pearlson GD, Pekar JJ. Spatial and temporal independent component analysis of functional MRI data containing a pair of task-related 49

Page 51 of 78

waveforms. Hum Brain Mapp 2001a; 13(1):43–53. Calhoun VD, Adali T, Pearlson GD, Pekar JJ. A method for making group inferences from functional MRI data using independent component analysis. Hum Brain Mapp

ip t

2001b; 14(3):140–151. Calhoun VD, Kiehl KA, Pearlson GD. Modulation of temporally coherent brain

cr

networks estimated using ICA at rest and during cognitive tasks. Hum Brain Mapp

us

2008; 29(7):828–838.

Calhoun VD, Potluru VK, Phlypo R, Silva RF, Pearlmutter BA, Caprihan A, Plis SM,

an

Adalı T. Independent component analysis for brain fMRI does indeed select for

M

maximal independence. PloS one 2013; 8(8):e73309.

Calhoun VD, Allen E. Extracting intrinsic functional networks with feature-based

ed

group independent component analysis. Psychometrika 2013; 78(2):243–259. Comon P, Jutten C. Handbook of Blind Source Separation, Independent Component

ce pt

Analysis and Applications. New York: Academic, Elsevier, 2010. Damoiseaux J, Rombouts S, Barkhof R, Scherltens P, Stam CJ, Smith SM, Beckmann

Ac

CF. Consistent resting state networks across healthy subjects. Proc Natl Acad Sci U S A 2006; 103(37):13848–13853. Donoho DL, Johnstone IM. Ideal spatial adaption via wavelet shrinkage. Biometrika 1994; 81(3):425–455. Donoho DL, Johnstone IM. Adapting to unknown smoothness via wavelet shrinkage. J Am Stat Assoc 1995; 90(432):1200–1224. Donoho DL. De-noising via soft-thresholding. IEEE Trans Inform Theory 1995; 50

Page 52 of 78

41(3):613–627. Erhardt EB, Allen EA, Wei Y, Eichele T, Calhoun VD. SimTB, a simulation toolbox for fMRI data under a model of spatiotemporal separability. NeuroImage 2012;

ip t

59(4):4160–4167. Friston KJ, Frith CD, Turner R, Frackowiak RS. Characterizing evoked

cr

hemodynamics with fMRI. NeuroImage 1995; 2:157–165.

us

Gao HY, Bruce AG. WaveShrink with firm shrinkage. Stat Sinica 1997; 7(4):855–874. Gao HY. Wavelet shrinkage denoising using the non-negative garrote. J Comput

an

Graph Stat 1998; 7(4):469–488.

M

Gribonval R, Nielsen M. Beyond sparsity: Recovering structured representations by L1 minimization and greedy algorithms. Adv Comput Math 2008; 28(1):23–41.

ed

Himberg J, Hyvärinen A, Esposito F. Validating the independent components of neuroimaging time series via clustering and visualization. Neuroimage 2004; 22(3):

ce pt

1214–1222.Hyvärinen A, Oja E. A Fast Fixed-Point Algorithm for Independent Component Analysis. Neural Comput 1997; 9(7):1483–1492. Khullar S, Michael A,

Ac

Correa N, Adali T, Baum S, Calhoun VD. Wavelet-based fMRI analysis: 3-D denoising,

signal

separation,

and

validation

metrics.

NeuroImage

2011a;

54(4):2867–2884.

Khullar S, Michael A, Correa N, Adali T, Baum S, Calhoun VD. Wavelet-based denoising and independent component analysis for improving multi-group inference in fMRI data. IEEE International Symposium on Biomedical Imaging: From Nano to Macro, 2011b; pp:456–459. 51

Page 53 of 78

Kisilev P, Zibulevsky M, Zeevi YY, Pearlmutter BA. Multiresolution framework for blind source separation, CCIT Report no. 317, Technion Press, 2000. Kisilev P, Zibulevsky M, Zeevi YY. Multiscale Framework For Blind Separation of

ip t

Linearly Mixed Signals. J Mach Learn Res 2003; 4:1339–1363. Ma S, Correa NM, Li XL, Eichele T, Calhoun VD, Adali T. Automatic identification

cr

of functional clusters in FMRI data using spatial dependence. IEEE Trans BIO-MED

us

ENG 2011; 58(12): 3406–3417.

Makeig S, Bell AJ, Jung TP, Sejnowski TJ. Independent component analysis of

an

electroencephalographic data. Advances in neural information processing systems

M

1996; pp:145–151.

Maldjian JA, Laurienti PJ, Burdette JB, Kraft RA. An Automated Method for

ed

Neuroanatomic and Cytoarchitectonic Atlas-based Interrogation of fMRI Data Sets. NeuroImage 2003; 19(3):1233–1239.

ce pt

Maldjian JA, Laurienti PJ, Burdette JH. Precentral Gyrus Discrepancy in Electronic Versions of the Talairach Atlas. Neuroimage 2004; 21(1):450–455.

Ac

Mallat S. A Wavelet Tour of Signal Processing, Academic Press, San Diego, CA, 1998.

McKeown MJ, Makeig S, Brown GG, Jung TP, Kindermann SS, Bell AJ, Sejnowski TJ. Analysis of fMRI data by blind separation into spatial independent component analysis. Hum Brain Mapp 1998; 6:160–188. Minka TP. Automatic choice of dimensionality for PCA. Technical Report 514, MIT, 2000. 52

Page 54 of 78

Moreau E, Macchi O. A one stage self-adaptive algorithm for source separation. IEEE International Conference on Acoustics, Speech, and Signal Processing, 1994; pp: III-49–III-52.

ip t

Moulin P. Wavelet thresholding techniques for power spectrum estimation. IEEE Trans Signal Proces 1994; 42(11):3126–3136.

cr

Rytty R, Nikkinen J, Paavola L, Elseoud AA, Moilanen V, Visuri A, Tervonen O,

us

Renton AE, Traynor BJ, Kiviniemi V, Remes AM. GroupICA dual regression analysis of resting state networks in a behavioral variant of frontotemporal dementia. Front

an

Hum Neurosci 2013; 7:461.

M

Rosso OA, Blanco S, Yordanova J, Kolev V, Figliola A, Schürmann M, Başar E. Wavelet entropy: a new tool for analysis of short duration brain electrical signals. J

ed

Neurosci Methods 2001; 105(1):65–75.

Rosso OA, Martin MT, Figliola A, Keller K, Plastino A. EEG analysis using

ce pt

wavelet-based information tools. J Neurosci Methods 2006; 153(2):163–182. Schöpf V, Kasess CH, Lanzenberger R, Fischmeister F, Windischberger C, Moser E.

Ac

Fully exploratory network ICA (FENICA) on resting-state fMRI data. J Neurosci Methods 2010; 192(2):207–213. Skudlarski P, Constable RT, Gore JC. ROC analysis of statistical methods used in functional MRI: individual subjects. NeuroImage 1999; 9(3):311–329. Tropp JA, Gilbert AC, Strauss MJ. Algorithms for simultaneous sparse approximation. Part I: Greedy pursuit. Signal Process 2006; 86(3):572–588. Tropp JA. Algorithms for simultaneous sparse approximation. Part II: Convex 53

Page 55 of 78

relaxation. Signal Process 2006; 86(3):589–602. Wang N, Zeng W, Chen L. A Fast-FENICA method on resting state fMRI data. J Neurosci Methods 2012a; 209(1):1–12.

ip t

Wang N, Zeng W. A combination model of ICA and sparsity prior with respect to fMRI signal analysis. Front Comput Neurosci Conference Abstract: Bernstein

cr

Conference 2012b. doi: 10.3389/conf.fncom.2012.55.00014.

us

Wang N, Zeng W, Chen L. SACICA: A sparse approximation coefficients based ICA model for functional magnetic resonance imaging data analysis. J Neurosci Methods

an

2013; 216(1):49–61.

M

Zeng W, Luo L, Liang X, Wang S, Guo S. The pre-processing of de-noising in fMRI data based on wavelet analysis. Journal of applied sciences 2004; 22(1):118–123. (In

Ac

ce pt

ed

Chinese)

54

Page 56 of 78

Table Legends Table 1. Source estimates for four cases on Simulation 1. Table 2. The normalized ISI ( mean  SD ) values as to FastICA, Infomax, SACICA and

ip t

WASICA on Simulation 2 (12 runs for each method).

cr

Table 3. Intra-consistency metric values as to GLM, FastICA, Infomax, SACICA and

Ac

ce pt

ed

M

an

us

WASICA on visual task dataset.

55

Page 57 of 78

Table 1. Source estimates for four cases on Simulation 1 Results Algorithm

FastICA SACICA Source s1 (excess) Kurtosis 0.8840  0.1459 Infomaxsuper Infomaxsub Source s2 (excess) Kurtosis 0.8118  0.1455 WASICAsuper WASICAsub WASICAfast Summary: FastICA, Infomaxsuper and all versions of WASICA perform well.

an

cr

Unimodal, super-Gaussian sources

ISI( mean  SD ) 0.1359  0.1274 0.7595  0.1969 0.7316  0.2054 0.4151  0.2701 0.0593 ± 0.0120 0.0627 ± 0.0521 0.2164  0.2530

Algorithm

ISI( mean  SD )

B. Medium:

Algorithm

FastICA SACICA Source s1 (excess) Kurtosis 0.2572  0.1095 Infomaxsuper Infomaxsub Source s2 (excess) Kurtosis 0.0885  0.1131 WASICAsuper WASICAsub WASICAfast Summary: WASICAsuper,WASICAsub performs well. All others mostly fail.

M

us

Bimodal, close to Gaussian sources

C. Large:

good is ISI

Temporally and spatially constrained ICA of fMRI data analysis.

fMRI using an ICA approach.

Inter-Group Brain Networks across Multiple fMRI Datasets.

Comparison of IVA and GIG-ICA in Brain Functional Network Estimation Using fMRI Data.

Impact of automated ICA-based denoising of fMRI data in acute stroke patients.

An effective differential expression analysis of deep-sequencing data based on the Poisson log-normal model.

GPU-based parallel group ICA for functional magnetic resonance data.

Bayesian Models for fMRI Data Analysis.

Graph-based inter-subject pattern analysis of FMRI data.

A Bayesian probit model with spatially varying coefficients for brain decoding using fMRI data.

Is Model Fitting Necessary for Model-Based fMRI?

ICA of full complex-valued fMRI data using phase information of spatial maps.

Evidence of stimulus correlated empathy modes--Group ICA of fMRI data.

General nonunitary constrained ICA and its application to complex-valued fMRI data.

Learning Effective Connectivity Network Structure from fMRI Data Based on Artificial Immune Algorithm.

Is First-Order Vector Autoregressive Model Optimal for fMRI Data?

ICA-based artifact removal diminishes scan site differences in multi-center resting-state fMRI.

Multi-modal data fusion using source separation: Two effective models based on ICA and IVA and their properties.

Network Based Prediction Model for Genomics Data Analysis.

The connectivity domain: Analyzing resting state fMRI data using feature-based data-driven and model-based methods.

A Parcellation Based Nonparametric Algorithm for Independent Component Analysis with Application to fMRI Data.

Parallel group independent component analysis for massive fMRI data sets.

Robust data driven model order estimation for independent component analysis of FMRI data with low contrast to noise.

Effective connectivity of brain regions related to visual word recognition: An fMRI study of Chinese reading.