Journal of Neuroscience Methods 249 (2015) 41–49


Computational Neuroscience

Motor imagery classification via combinatory decomposition of ERP and ERSP using sparse nonnegative matrix factorization

Na Lu a,∗, Tao Yin b

a State Key Laboratory for Manufacturing Systems Engineering, Systems Engineering Institute, Xi'an Jiaotong University, Xi'an, Shaanxi, China
b Vision Microsystems Co., Ltd, Shanghai, China

Highlights
• MALS-NMF allows for negative entries while preserving the pure addition of components.
• A sparsity constraint has been incorporated into MALS-NMF.
• The knowledge obtained by MALS-NMF on ERP and ERSP has been combined.

Article info
Article history: Received 21 October 2014; Received in revised form 26 March 2015; Accepted 27 March 2015; Available online 3 April 2015
Keywords: Motor imagery; Classification; Event related potential; Nonnegative matrix factorization; Brain computer interface

Abstract
Background: Brain activities could be measured by devices such as EEG, MEG and MRI in terms of electric or magnetic signals, which provide information from three domains, i.e., time, frequency and space. Combinatory analysis of these features could help to improve the classification performance on brain activities. NMF (nonnegative matrix factorization) has been widely applied in pattern extraction tasks (e.g., face recognition, gene data analysis) and could provide a physically meaningful explanation of the data. However, brain signals also take negative values, so only spectral features have been employed in existing NMF studies for brain computer interfaces. In addition, sparsity is an intrinsic characteristic of electric signals.
New method: To incorporate a sparsity constraint and enable analysis of time domain features using NMF, a new solution for motor imagery classification is developed, which combinatorially analyzes the ERP (event related potential, time domain) and ERSP (event related spectral perturbation, frequency domain) features via a modified mixed alternating least squares based NMF method (MALS-NMF for short).
Results: Extensive experiments have verified the effectiveness of the proposed method. The results also showed that imposing the sparsity constraint on the coefficient matrix in the ERP factorization and on the basis matrix in the ERSP factorization could better improve the algorithm performance.
Comparison with existing methods: Comparisons with eight other representative methods have further verified the superiority of the proposed method.
Conclusions: The MALS-NMF method is an effective solution for motor imagery classification and has shed some new light on the field of brain dynamics pattern analysis.

1. Introduction

Exploring brain dynamics under different mental tasks is of essential importance for understanding the functional mechanisms of the brain and for developing techniques to assist people with disabilities (Wolpaw, 2007; Cuiwei et al., 1995). EEG (electroencephalogram) is a widely used non-invasive technique measuring the electrical signals of brain activities.


The recorded brain signals present characteristics such as very high dimensionality, non-linearity, non-stationarity and a low signal-to-noise ratio, which bring challenges to existing signal analysis methods (Wolpaw et al., 2002). In recent years, analysis methods for brain dynamics, especially classification methods, have attracted increasing attention from researchers in fields such as bioengineering, computer science, and cognitive science (Stewart et al., 2014; Heung-Il and Seong-Whan, 2013; Li et al., 2013). More specifically, within the BCI (brain computer interface) field, a wealth of studies has been conducted to mine the brain response patterns related to different mental activities.


As the most popular technique in BCI, EEG records the voltage variation over time with multiple electrodes deployed over the scalp. With respect to different mental tasks, the variation pattern of the EEG signal reveals distinct characteristics temporally and spatially. Many methods have been developed or employed to uncover the featured patterns buried in the EEG data, such as common spatial pattern (CSP) (Haiping et al., 2010; Kai Keng et al., 2008), independent component analysis (ICA) (Makeig et al., 2004; Makeig and Onton, 2009), principal component analysis (PCA), and time-frequency methods like Fourier analysis (Makeig, 1993), the Hilbert-Huang transform (HHT) (Wu et al., 2011) and wavelets (Ting et al., 2008). All these methods aim at finding a basis composed of a few intrinsic vectors (components) whose linear combination could accurately or approximately represent the samples.

NMF has attracted increasing attention in recent decades due to its physically interpretable representation as a pure addition of parts (Lee and Seung, 1999; Hoyer, 2004). Some previous research has tried to address the classification problem of brain signals, especially motor imagery, using NMF (Lee et al., 2006, 2010; Lee and Choi, 2009). However, NMF requires the raw data to be nonnegative, which is obviously not the case for electric signals. Existing papers on EEG classification using NMF mainly employ spectral features, such as PSD (power spectral density) and wavelet parameters, to analyze the data, and thus only consider the frequency domain information (Lee et al., 2006, 2010; Lee and Choi, 2009; Liu et al., 2004). However, combining time domain and frequency domain features could very likely benefit the classification algorithm, as suggested in Makeig et al. (2004). ERP (event related potential) is a widely used time domain feature for motor imagery classification, the average of which captures the fluctuation mode of the brain response to a specific motor imagery. However, it has been indicated in Makeig (1993) that the ERP cannot fully reveal the characteristics of the brain response to external stimuli. On the other hand, the frequency domain knowledge also only carries a certain aspect of the information conveyed by the data. ERSP (event related spectral perturbation) measures the change against the baseline power and could deliver information that is not revealed by ERP (Makeig et al., 2004). Therefore, combining ERP and ERSP is a promising solution for better performance in motor imagery classification.

In addition, a critical and intrinsic property of electric signals, including brain signals, is sparsity. Thriving technologies like sparse coding (Olshausen and Field, 1996) and compressive sensing (Candes et al., 2006; Candes and Tao, 2005) have constructed the basis for, and validated the effectiveness of, sparse representation. There is good reason to believe that a small number of intrinsic patterns exist for specific motion-related sensorimotor rhythms (SMRs), which could be combined to represent the recorded EEG sequence. Therefore, introducing sparse representation of EEG signals into the analysis of brain dynamics could help to uncover the intrinsic pattern structure of brain activities. We also notice that some efforts have been made to incorporate sparsity considerations into BCI research (Wang et al., 2012; Younghak et al., 2012). Based on the above considerations, a new solution for brain dynamics (motor imagery) classification is developed in this paper.
A modified MALS-NMF method is proposed first, which incorporates the sparsity constraint and eliminates the nonnegativity constraint on the basis matrix in NMF. The ERP and ERSP features are then combinatorially decomposed by MALS-NMF, and the results are used to train an SVM classifier based on the one-versus-rest strategy. Extensive experiments have verified the superiority of the proposed method. It is worthwhile to highlight several contributions of the proposed method here:

1. Different from the existing applications of the NMF method in brain signal analysis, which only work in the frequency domain, the proposed MALS-NMF method allows for negative entries while preserving the pure-addition manner of components (nonnegative coefficient matrix). This enables analysis in both the time and frequency domains and lays a basis for uncovering the intrinsic brain response patterns.
2. A sparsity constraint, an essential characteristic of the brain signals in question, has been incorporated into MALS-NMF.
3. The knowledge obtained by MALS-NMF on ERP and ERSP has been combined for the motor imagery classification task, which is verified to have superior performance through extensive experiments. The discovered patterns have also shown promise for understanding the intrinsic patterns of brain activities.

The rest of the paper is organized as follows: Section 2 discusses the related research; Section 3 describes the MALS-NMF method and the overall solution; Section 4 presents experiments and comparisons; Section 5 concludes the paper.

2. Related research

NMF is one of the matrix factorization methods that have attracted much attention from researchers in recent years (Lee and Seung, 1999; Deng et al., 2011; Ding et al., 2010). One unique property of NMF is that it imposes a nonnegativity constraint on both decomposed matrices, which could result in a physically meaningful parts-based representation due to its pure-addition manner. Many NMF variants have been developed in the past decade, and here we only describe several representative studies due to space limits. Ding et al. (2010) developed two variants of NMF, called Semi-NMF and Convex NMF. The former allows negative entries in the decomposed matrices and thus compromises the nonnegativity constraint; the latter forces the basis vectors to be convex combinations of data points, which introduces sparsity into the algorithm implicitly. However, the two variants have not been combined. Li et al. (2001) recognized that the bases learnt by the original NMF are not necessarily localized, and thus may not be well-separated parts. They proposed the local nonnegative matrix factorization (LNMF) method, which introduces sparsity and redundancy constraints to obtain a minimum number of basis vectors with better locality. However, these constraints are not sufficient for the algorithm to produce non-overlapping parts. For example, for face images, the locality level is determined by how well the faces are aligned in the training samples. The sparsity constraint in Hoyer (2004) shares the same property with that of LNMF. Cai et al. (Deng et al., 2011) proposed a graph regularized nonnegative matrix factorization (GNMF), which constructs an affinity graph to introduce the inter-sample geometric similarity as a closeness constraint on the sample representations. A constrained nonnegative matrix factorization (CNMF) was developed by Liu et al. (Haifeng et al., 2012) with the label information as a hard constraint, which leads to a merged representation of samples from the same class. However, one assumption of CNMF is that two samples from the same class must have the same representation using the bases obtained after decomposition, which may not be practical; for example, the face images of the same person can be very different under different conditions such as different view angles. Kim and Park (2007) developed a sparse NMF based on ALS, which has been modified to form the MALS-NMF in this paper by allowing for a negative basis matrix.
More specifically, for brain signal analysis, some variants of NMF such as group NMF (GNMF) (Lee and Choi, 2009) and semi-supervised NMF (Lee et al., 2010) have been developed. Among these, group NMF is specially designed for multiple-subject analysis; it maximizes the inter-subject difference and minimizes the intra-subject distinction. The semi-supervised NMF incorporates the known labels of samples into the matrix factorization problem and has obtained some improvement in classification performance.


However, all these methods only consider the frequency domain knowledge due to the nonnegativity constraint of NMF. Recent research in compressive sensing (Donoho, 2006; Candes and Wakin, 2008) suggests that natural signals are sparse in essence, which has triggered the development of sparse analysis of data, such as sparse coding and various machine learning methods with sparsity constraints. The work reported in Wang et al. (2012), Younghak et al. (2012) and Higashi and Tanaka (2012) showed the potential of sparse representation for brain signal analysis. Shin et al. (Younghak et al., 2012) proposed a sparse representation-based classification scheme for motor imagery, which employed the band power of CSP-filtered signals as the feature dictionary and sought the sparse representation using that dictionary. The sparsity discussed in this method resides in the combination parameters rather than in the feature space. Wang et al. (2012) developed an L1-norm-based CSP (common spatial pattern), which turns out to be robust to outliers. The MALS-NMF method developed in our paper combines the benefits of the NMF method and the sparsity introduced by the L1 norm, which has shed some new light on intrinsic pattern extraction in brain signal analysis.

3. Method

3.1. Sparse NMF for brain signal analysis based on MALS

NMF is one of the most popular matrix decomposition methods, formulated as

A \approx WH,   (1)

where A is the data matrix of size m × n with each column taken from a sample vector, and W ∈ R^{m×k} and H ∈ R^{k×n} are respectively the basis matrix and the coefficient matrix, both satisfying the nonnegativity constraint. The multiplicative rule has been widely employed to obtain an approximate solution of Eq. (1) since NMF was brought back to the attention of researchers (Lee and Seung, 1999), and is formulated as

H_{aj} \leftarrow H_{aj} \frac{(W^T A)_{aj}}{(W^T W H)_{aj}} \quad \text{and} \quad W_{ia} \leftarrow W_{ia} \frac{(A H^T)_{ia}}{(W (H H^T))_{ia}},

where 1 ≤ a ≤ k, 1 ≤ j ≤ n and 1 ≤ i ≤ m. Local convergence under the multiplicative rule has been proven in Lee and Seung (2001).

To acquire a sparse decomposition of brain signals, a sparsity penalty must be enforced on the matrix which is expected to be sparse. In the case of motor imagery analysis, we could expect either the basis matrix W or the coefficient matrix H to be sparse, which respectively leads to a sparse basis or a compact combination of the basis components. One should also note that when the brain signal is analyzed in the time domain, both the data matrix and the basis matrix should allow for negative entries. Therefore, the nonnegativity constraint on W should be relieved. Considering the above discussion and also a possible rigorous convergence analysis, a mixed alternating least squares (MALS) solution is developed, which combines unconstrained ALS (Berry et al., 2007) and nonnegativity-constrained ALS (Kim and Park, 2007). For spectral domain analysis, the L1 norm is used to enforce the sparsity constraint on one of the factor matrices. When we expect a sparse basis W while matrix H is nonnegative, the matrix decomposition problem could be formulated as

\min_{W,H} \frac{1}{2} \left\{ \|A - WH\|_F^2 + \eta \|H\|_F^2 + \alpha \sum_{i=1}^{m} \|W(i,:)\|_1^2 \right\}, \quad \text{s.t. } H \ge 0,   (2)

where α > 0 is a parameter balancing the approximation accuracy and the sparsity of W, called the sparsity parameter for convenience, and the parameter η > 0 is used to suppress ||H||_F^2, which may become very large when ||W||_1^2 is relatively small. The optimization problem in Eq. (2) could be solved by iterating the following ALS procedures:

\min_{H} \left\| \begin{bmatrix} W \\ \sqrt{\eta}\, I_{k \times k} \end{bmatrix} H - \begin{bmatrix} A \\ 0_{k \times n} \end{bmatrix} \right\|_F^2, \quad \text{s.t. } H \ge 0,   (3)

\min_{W} \left\| \begin{bmatrix} H^T \\ \sqrt{\alpha}\, e_{1 \times k} \end{bmatrix} W^T - \begin{bmatrix} A^T \\ 0_{1 \times m} \end{bmatrix} \right\|_F^2.   (4)

I_{k×k} is an identity matrix of size k × k, 0_{k×n} is a matrix with all entries zero, and e_{1×k} is a vector with all entries 1. One should note that the minimization problem of Eq. (4) has no nonnegativity constraint, while Eq. (3) does. Similarly, when sparsity of H is desirable, the following optimization problem could be formulated:

\min_{W,H} \frac{1}{2} \left\{ \|A - WH\|_F^2 + \eta \|W\|_F^2 + \beta \sum_{j=1}^{n} \|H(:,j)\|_1^2 \right\}, \quad \text{s.t. } H \ge 0.   (5)

To optimize Eq. (5), the corresponding ALS procedures could be written as

\min_{H} \left\| \begin{bmatrix} W \\ \sqrt{\beta}\, e_{1 \times k} \end{bmatrix} H - \begin{bmatrix} A \\ 0_{1 \times n} \end{bmatrix} \right\|_F^2, \quad \text{s.t. } H \ge 0,   (6)

\min_{W} \left\| \begin{bmatrix} H^T \\ \sqrt{\eta}\, I_{k \times k} \end{bmatrix} W^T - \begin{bmatrix} A^T \\ 0_{k \times m} \end{bmatrix} \right\|_F^2.   (7)

The parameter β is also called a sparsity parameter. Obviously, one alternating part of each of the above ALS procedures (Eqs. (3) and (4); Eqs. (6) and (7)) carries a nonnegativity constraint, while no such constraint is imposed on the other part. In this sense the above variant of ALS is termed mixed alternating least squares (MALS). The convergence of MALS-NMF could be justified in a similar way as in Kim and Park (2007). Technically, MALS-NMF relaxes the dual nonnegativity constraints to a single one. It could be seen clearly that Eqs. (2) and (5) are nonlinear programming problems whose optimal solutions need to satisfy the first-order necessary conditions, i.e., the Karush–Kuhn–Tucker (KKT) conditions (Boyd and Vandenberghe, 2004). The KKT conditions of, e.g., Eq. (2) could be written as (∀a, i, j)

H_{aj} \ge 0, \quad \left( \frac{\partial f(W,H)}{\partial W} \right)_{ia} = 0, \quad \left( \frac{\partial f(W,H)}{\partial H} \right)_{aj} \ge 0, \quad H_{aj} \left( \frac{\partial f(W,H)}{\partial H} \right)_{aj} = 0.   (8)

The four conditions in Eq. (8) respectively correspond to primal feasibility, stationarity (the 2nd and 3rd conditions) and complementary slackness. Accordingly, an L1-norm-based KKT residual could be defined as

\Delta_{kkt} = \sum_{i=1}^{m} \sum_{a=1}^{k} \left| \left( \frac{\partial f(W,H)}{\partial W} \right)_{ia} \right| + \sum_{a=1}^{k} \sum_{j=1}^{n} \left| \min\!\left( H_{aj}, \left( \frac{\partial f(W,H)}{\partial H} \right)_{aj} \right) \right|.   (9)

Considering the number of elements that have not converged in matrices W and H, a normalized residual could be formulated as

\Delta = \frac{\Delta_{kkt}}{\#_W + \#_H},   (10)

where #_W is the number of elements in matrix W that did not converge, i.e., (∂f(W,H)/∂W)_{ia} ≠ 0, and #_H is defined in the same way. When Δ < ε, the stopping criterion is reached, where ε is a predefined small positive number. In this paper, ε is set as 0.001.
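For concreteness, the following is a minimal, hedged sketch (not the authors' code) of how the MALS iteration of Eqs. (5)-(7) and the stopping rule of Eqs. (8)-(10) could be implemented in Python; the function name mals_nmf, the random initialization and the default parameter values are illustrative assumptions.

```python
# Hedged sketch of MALS-NMF with sparsity on H (Eqs. (5)-(7)); names are illustrative.
import numpy as np
from numpy.linalg import lstsq
from scipy.optimize import nnls

def mals_nmf(A, k, beta=1.0, eta=None, eps=1e-3, max_iter=200, seed=0):
    """Factorize A (m x n, may contain negative entries) as A ~ W H with H >= 0."""
    m, n = A.shape
    if eta is None:
        eta = np.max(np.abs(A)) ** 2          # as suggested in Kim and Park (2007)
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((m, k))
    H = np.abs(rng.standard_normal((k, n)))
    ones_row = np.ones((1, k))

    for _ in range(max_iter):
        # H-step, Eq. (6): nonnegativity-constrained LS on the augmented system.
        W_aug = np.vstack([W, np.sqrt(beta) * ones_row])
        A_aug = np.vstack([A, np.zeros((1, n))])
        H = np.column_stack([nnls(W_aug, A_aug[:, j])[0] for j in range(n)])

        # W-step, Eq. (7): unconstrained LS on the augmented system.
        Ht_aug = np.vstack([H.T, np.sqrt(eta) * np.eye(k)])
        At_aug = np.vstack([A.T, np.zeros((k, m))])
        W = lstsq(Ht_aug, At_aug, rcond=None)[0].T

        # KKT residual and normalized stopping rule, Eqs. (8)-(10).
        R = W @ H - A
        grad_W = R @ H.T + eta * W
        grad_H = W.T @ R + beta * np.ones((k, k)) @ H
        viol_H = np.minimum(H, grad_H)
        kkt = np.abs(grad_W).sum() + np.abs(viol_H).sum()
        n_unconverged = np.count_nonzero(grad_W) + np.count_nonzero(viol_H)
        if kkt / max(n_unconverged, 1) < eps:
            break
    return W, H
```

Swapping the roles of the augmented blocks (sqrt(eta) on H, sqrt(alpha) on W) would give the variant of Eqs. (2)-(4) with sparsity on the basis matrix instead.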

3.2. Classification of the combinatory patterns of ERP and ERSP

The brain response to a certain motor imagery task could be analyzed in the time domain, the frequency domain, or both. As a matter of fact, the existing NMF-related methods for brain dynamics analysis are limited to spectral data (Lee et al., 2010; Lee and Choi, 2009), which apparently could be improved by combining the time domain feature. ERP (event related potential) measures the time course fluctuation of the brain response after a certain event onset, and is usually averaged over multiple trials from one subject to cancel out undesirable noise. The averaged ERP is highly replicable for one specific motor imagery, which provides the basis for uncovering the intrinsic components behind the pattern. Naturally, MALS-NMF could be an ideal candidate for this task due to its purely additive combination, which allows for negative entries in the components while incorporating sparseness on the basis or coefficient matrix.

Physiologically, the motor imagery related SMR (sensorimotor rhythm) includes two frequency bands of brain waves, i.e., Mu (8–14 Hz) and Beta (14–24 Hz). The spectral feature could reveal important information about the ongoing motor imagery. ERSP (event related spectral perturbation) measures the change of spectral power with respect to the baseline, which conveys characteristics of event related brain dynamics not contained in the ERP (Makeig et al., 2004; Makeig, 1993). In this paper, the ERSP is computed by 3-cycle Morlet wavelets. Each trial is fed to the Morlet wavelet analysis, and for each time point the spectral power at each frequency within the interval of 5 to 50 Hz is computed. A baseline spectral power is also computed from the EEG data without event related stimulus. The final ERSP is obtained by computing the log power difference between the spectral estimate of each trial and the baseline estimate. The obtained vectors of all time points within a trial are then stacked to form the ERSP feature vector. However, the change of spectral power from the baseline may also take negative values, thus the MALS-NMF method is also a good choice for ERSP analysis.
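As a rough, hedged sketch of one way the per-trial ERSP described above could be computed with NumPy: the unit-energy wavelet normalization, the dB (10·log10) scaling and the function names are illustrative assumptions, not taken from the paper.

```python
# Hedged sketch: ERSP of one trial via 3-cycle complex Morlet wavelets (names illustrative).
import numpy as np

def morlet_power(x, fs, freqs, n_cycles=3):
    """Time-frequency power of a 1-D signal x for each frequency in freqs (Hz)."""
    power = np.empty((len(freqs), len(x)))
    for i, f in enumerate(freqs):
        sigma_t = n_cycles / (2 * np.pi * f)              # temporal width of the wavelet
        t = np.arange(-3 * sigma_t, 3 * sigma_t, 1.0 / fs)
        wavelet = np.exp(2j * np.pi * f * t) * np.exp(-t**2 / (2 * sigma_t**2))
        wavelet /= np.sqrt(np.sum(np.abs(wavelet) ** 2))  # unit-energy normalization (assumed)
        power[i] = np.abs(np.convolve(x, wavelet, mode="same")) ** 2
    return power

def ersp(trial, baseline, fs, freqs=np.arange(5, 51)):
    """Log power change of a trial against baseline power, stacked over time points."""
    p_trial = morlet_power(trial, fs, freqs)
    p_base = morlet_power(baseline, fs, freqs).mean(axis=1, keepdims=True)
    return 10 * np.log10(p_trial / p_base).ravel(order="F")   # stack per time point
```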

The overall flowchart of the classification method based on the combinatory patterns of ERP and ERSP is given in Fig. 1. The time sequences from multiple EEG channels are first acquired; after appropriate preprocessing, e.g., artifact removal and band filtering, the data are used to obtain the ERP feature and the ERSP feature respectively. Then MALS-NMF is applied to the ERP and ERSP features to extract the intrinsic pattern components. The decomposed ERP and ERSP parameter vectors are then concatenated to form the input feature vector for the subsequent classification. Finally, the concatenated time domain and frequency domain features are used to train an SVM classifier.

Fig. 1. Flowchart of the MALS-NMF based classification method.

It should be noted that LDA (Linear Discriminant Analysis) and SVM are the most widely used classifiers in BCI research and have turned out to be effective (Younghak et al., 2012; Blankertz et al., 2011; Wei et al., 2008; Subasi and Ismail Gursoy, 2010). In our experiments, SVM and LDA obtained very competitive results; however, the results from SVM are slightly better than those of LDA. Therefore, SVM has been finally selected as the classifier.
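Below is a minimal, hedged sketch of this classification stage using scikit-learn (an assumption; the paper does not name its software). The arrays erp_coeff and ersp_coeff are placeholders standing for the per-trial MALS-NMF coefficient vectors.

```python
# Hedged sketch of the downstream classification stage; erp_coeff/ersp_coeff are
# placeholder arrays (n_trials x k each) holding MALS-NMF coefficients per trial.
import numpy as np
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import cross_val_score

def classify(erp_coeff, ersp_coeff, labels):
    X = np.hstack([erp_coeff, ersp_coeff])           # concatenate time- and frequency-domain features
    clf = OneVsRestClassifier(SVC(kernel="linear"))  # one-versus-rest linear SVM
    scores = cross_val_score(clf, X, labels, cv=5)   # 5-fold cross-validation
    return scores.mean(), scores.std()
```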

4. Experiments

4.1. Compared methods and datasets

To evaluate the performance of the proposed MALS-NMF based motor imagery classification method, extensive comparison experiments have been performed against several representative methods. Specifically, we have considered six feature decomposition methods, i.e., NMF, ALS-NMF, Semi-NMF, ICA, PCA and EMD (Huang and Wu, 2008), one frequency domain method (Wavelets), and one classical BCI feature analysis method, CSP (the common spatial pattern method (Yijun et al., 2005)). Many variants of the CSP method have been developed in recent years (Wang et al., 2012; Lotte and Cuntai, 2011), which improve the algorithm performance by imposing regularization on either the estimation of the covariance matrix or the objective function. Different imposed priors could make different degrees of improvement to the performance, which also applies to MALS-NMF (e.g., imposing class labels as supervision). Therefore, the comparison experiments in this paper have been conducted among the most representative methods from each category and some modified versions of these methods.

Among the compared methods, the NMF and ALS-NMF methods only consider the PSD (power spectral density) feature due to the dual nonnegativity constraint. Semi-NMF employs the same features as MALS-NMF without considering sparsity. The number of components in the basis matrix is selected by experiments in a range from 10 to 30 for all the NMF family methods; specifically, it is set as 14 for dataset IIIa and 12 for dataset IVa. The datasets will be described at the end of this section. The Wavelet and EMD methods extract features from the ERP data for further classification. The number of EMD components is equal to the number of channels, because one time course component could be extracted from each channel. Six-level wavelet packet decomposition based on the Sym4 wavelet from the Symlets family is employed in the Wavelet method. For the ICA, PCA and CSP methods, the concatenated vectors of ERP and ERSP have been employed to extract the features for classifier training. In addition, the number of independent components for ICA is set equal to the number of channels used for data collection, as suggested in Groppe et al. (2009). There are no parameters to tune in PCA and CSP. The features extracted by all the compared methods are respectively employed to train a linear SVM for the motor imagery classification task. It should be emphasized that, for a fair comparison, the experimental data have been preprocessed in the same way and the parameters (if any) of all the methods have been tuned as suggested in the corresponding literature. Clustering accuracy (AC) (Wu et al., 2011; Deng et al., 2005) is employed to evaluate the performance of the compared methods, which is defined as

AC = \frac{1}{n} \sum_{i=1}^{n} \delta\!\left( C_i, \mathrm{map}(\tilde{C}_i) \right),

where n is the number of motor imagery trials, C_i is the ground-truth class label of trial i, \tilde{C}_i is the obtained label, map(\tilde{C}_i) maps \tilde{C}_i to the corresponding label in C (the optimal mapping is obtained using the Kuhn–Munkres algorithm (Kuhn, 1955)), and δ(x, y) is the delta function, which takes 1 when x = y and 0 otherwise.
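As a hedged illustration (not the authors' code), the AC metric with the Kuhn–Munkres mapping could be computed as follows in Python; scipy.optimize.linear_sum_assignment is used as a standard implementation of the assignment step, and the function name clustering_accuracy is illustrative.

```python
# Hedged sketch of the clustering-accuracy (AC) metric with the Kuhn-Munkres mapping.
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred):
    classes = np.unique(np.concatenate([y_true, y_pred]))
    cost = np.zeros((len(classes), len(classes)))
    for i, ci in enumerate(classes):
        for j, cj in enumerate(classes):
            cost[i, j] = -np.sum((y_pred == ci) & (y_true == cj))   # negative match counts
    rows, cols = linear_sum_assignment(cost)                        # optimal label mapping
    mapping = {classes[r]: classes[c] for r, c in zip(rows, cols)}
    return np.mean([mapping[p] == t for p, t in zip(y_pred, y_true)])
```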

Two publicly available datasets from BCI competition III have been employed to evaluate the performance of the proposed MALS-NMF based method, i.e., dataset IIIa (Lal et al., 2005) and dataset IVa (Dornhege et al., 2004). In these two datasets, the brain activity during the rest time of the subject has also been recorded, which provides the baseline spectral information. Dataset IIIa includes the recordings of three subjects (K3b, K6b and L1b) performing four motor imagery tasks, namely left hand, right hand, foot and tongue movement. A 64-channel EEG system from Neuroscan was employed to make the recordings and 60 channels of EEG data were collected, sampled at 250 Hz and band filtered between 1 and 50 Hz. Within one trial, the first 2 s were quiet, during which the subject sat relaxed in a chair with armrests; an acoustic stimulus was presented at t = 2 s to indicate the start of the trial, and at the same time a cross "+" was displayed; at t = 3 s an arrow pointing to the left, right, up or down was shown, and the subject was asked to imagine the specific movement related to the displayed cue until t = 7 s. There are 180, 120 and 120 trials in the training set for subjects K3b, K6b and L1b respectively, and 180, 120 and 120 trials in the corresponding testing sets. Dataset IVa contains the EEG recordings of five subjects (aa, al, av, aw and ay) performing three imagined motions (left hand, right hand and right foot); 280 trials of the two classes cued by "right hand" and "right foot" are given for the competition. The data were recorded with 118 EEG channels at a sampling frequency of 1000 Hz and band-pass filtered between 0.05 and 200 Hz. Two kinds of visual stimuli (letters behind a fixed cross or a randomly moving object) were used to cue the subjects to perform the corresponding motor imagery.

For computational efficiency, all the data have been band-pass filtered between 8 and 35 Hz using an equiripple filter and downsampled to 100 Hz in the experiments of this paper. In addition, to segment the ERP sequence, the visual cue onset has been employed as the time trigger of the event response in each trial, as done in Younghak et al. (2012) and Blankertz et al. (2011). Therefore, all the trials in dataset IIIa are windowed from 3 s to 7 s, during which the visual cue was displayed; the trials in dataset IVa are windowed by a length of 5.25 s after the visual cue onset. For ERSP extraction, a procedure similar to that used in Makeig (1993) has been employed, as discussed in Section 3.2.
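A hedged sketch of this preprocessing step (equiripple band-pass 8–35 Hz, then resampling to 100 Hz) using SciPy is given below; the filter order (201 taps) and the transition-band edges are illustrative assumptions, not values reported in the paper.

```python
# Hedged sketch of the preprocessing: equiripple band-pass 8-35 Hz, resample to 100 Hz.
from fractions import Fraction
import numpy as np
from scipy.signal import remez, filtfilt, resample_poly

def preprocess(eeg, fs, fs_new=100):
    """eeg: channels x samples array sampled at fs Hz."""
    bands = [0, 6, 8, 35, 37, fs / 2]                # stop/pass/stop band edges in Hz (assumed widths)
    taps = remez(201, bands, [0, 1, 0], fs=fs)       # equiripple FIR design
    filtered = filtfilt(taps, [1.0], eeg, axis=-1)   # zero-phase band-pass filtering
    ratio = Fraction(fs_new, int(fs))                # e.g. 1/10 for 1000 Hz, 2/5 for 250 Hz
    return resample_poly(filtered, ratio.numerator, ratio.denominator, axis=-1)
```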

4.2. Performance evaluation

A 5-fold cross-validation scheme is employed to evaluate the algorithm performance. Specifically, all the trials of each dataset are divided into 5 subsets of equal size, four of which are used as the training set and the remaining one for testing, so there are 5 runs of experiments for each dataset. The classification accuracies were averaged over these 5 runs. Tables 1 and 2 respectively give the comparison results on datasets IIIa and IVa. For dataset IIIa, the first 2 s of data in all the trials have been averaged and analyzed by Morlet wavelets as the baseline power for the computation of ERSP; for dataset IVa, the data recorded during the period of 35–40 s before the visual stimulation are used for the baseline power computation. In addition, multiple motor imagery tasks are involved, and thus the one-versus-rest strategy has been adopted in the SVM to fulfill multi-class classification (Chih-Wei and Chih-Jen, 2002).

An interesting phenomenon observed is that when sparsity is imposed on different matrices (basis matrix or coefficient matrix), different results are obtained. The best performance is obtained when the coefficient matrix in the ERP decomposition and the basis matrix in the ERSP decomposition are forced to be sparse, and these are the results reported in this section. This finding is consistent with the physical reality that the feature components of the brain dynamics are expected to be sparse in the spectrum, but less sparse along the time course. Specifically, for subjects aa, al, and ay in dataset IVa, the parameters β (sparsity parameter on the coefficient matrix in the ERP factorization) and α (sparsity parameter on the basis matrix in the ERSP factorization) are both set to 1; for subjects av and aw in dataset IVa and all the subjects in dataset IIIa, β is set to 0.1 and α is set to 1. Parameter k is set through experiments by cross-validation, and our experiments suggest that 15 could be a good option for the tested datasets. The influence of the sparsity parameters is further discussed in Section 4.4.

The results on dataset IIIa are listed in Table 1, where the best classification accuracy has been highlighted in bold face and the standard deviation is also given. It could be seen that the MALS-NMF method has obtained the best result among the 9 compared algorithms, with about a 4% improvement over the second best algorithm, CSP, on average. More specifically, at least a 3.1% increase for subject L1b and at best a 7.1% increase for subject K6b in classification accuracy have been observed compared to the second best algorithm. It should also be noted that an improvement of about 20.1% has been made compared to the NMF method with only power spectral density (PSD) considered.
ALS-NMF incorporates a sparsity constraint on the basis matrix and also only employs PSD due to the dual nonnegativity constraints, while Semi-NMF combines both ERP and ERSP features as done by MALS-NMF but without considering sparsity. ALS-NMF and Semi-NMF respectively rank third and fourth among the compared methods. These results clearly suggest the superiority of the proposed MALS-NMF method for motor imagery classification. Table 2 summarizes the results of dataset IVa, including accuracy and the corresponding standard deviation.


Table 1
Classification accuracy on BCI competition III dataset IIIa (%).

Data | MALS-NMF     | ICA          | PCA          | NMF          | ALS-NMF      | Semi-NMF     | Wavelets     | EMD          | CSP
K3b  | 90.38 ± 0.63 | 68.32 ± 2.38 | 70.28 ± 1.02 | 72.18 ± 2.23 | 85.73 ± 1.35 | 72.56 ± 1.05 | 63.28 ± 2.58 | 69.12 ± 2.01 | 87.23 ± 0.89
K6b  | 60.02 ± 1.21 | 51.12 ± 1.92 | 50.36 ± 1.21 | 53.48 ± 1.98 | 55.28 ± 2.78 | 52.37 ± 2.03 | 45.23 ± 3.03 | 50.31 ± 2.52 | 56.12 ± 1.33
L1b  | 63.25 ± 0.32 | 53.23 ± 1.81 | 52.86 ± 1.35 | 52.17 ± 2.03 | 59.39 ± 2.15 | 53.49 ± 2.53 | 46.56 ± 3.28 | 49.83 ± 2.71 | 62.23 ± 0.35
Avg. | 71.22        | 57.56        | 57.83        | 59.28        | 66.80        | 59.47        | 51.69        | 56.42        | 68.53

Table 2
Classification accuracy on BCI competition III dataset IVa (%).

Data | MALS-NMF     | ICA          | PCA          | NMF          | ALS-NMF      | Semi-NMF     | Wavelets     | EMD          | CSP
aa   | 64.81 ± 0.32 | 53.22 ± 2.20 | 61.29 ± 1.98 | 63.00 ± 0.73 | 64.18 ± 1.07 | 62.01 ± 1.28 | 54.36 ± 3.01 | 54.63 ± 2.32 | 63.33 ± 0.50
al   | 93.62 ± 1.16 | 80.82 ± 2.35 | 87.18 ± 2.01 | 83.24 ± 1.35 | 87.93 ± 1.93 | 84.11 ± 1.83 | 83.10 ± 2.13 | 82.73 ± 1.87 | 90.38 ± 1.03
av   | 59.68 ± 1.31 | 48.17 ± 3.02 | 51.92 ± 2.73 | 54.00 ± 1.98 | 57.98 ± 2.15 | 54.06 ± 2.53 | 50.18 ± 2.87 | 49.75 ± 3.20 | 55.00 ± 2.17
aw   | 73.53 ± 0.54 | 62.87 ± 2.11 | 69.73 ± 2.05 | 70.03 ± 1.84 | 71.87 ± 1.64 | 70.13 ± 2.01 | 63.73 ± 2.33 | 65.92 ± 2.17 | 71.56 ± 0.59
ay   | 55.36 ± 1.13 | 48.37 ± 2.35 | 50.93 ± 2.01 | 50.21 ± 2.38 | 53.23 ± 2.35 | 50.22 ± 2.73 | 49.25 ± 3.21 | 51.09 ± 3.23 | 51.15 ± 2.03
Avg. | 69.40        | 58.69        | 64.21        | 64.10        | 67.04        | 64.11        | 60.12        | 60.82        | 66.28

The results in Table 2 further verified the performance of the proposed method, where MALS-NMF performed the best among the nine compared algorithms. On average, the improvement of the classification accuracy is about 3.5% against the second best algorithm, ALS-NMF, and about 4.7% against the third best, CSP. It could be noticed that the two widely used feature extraction methods, i.e., ICA and PCA, do not perform as well as the MALS-NMF method in any of the cases. The reasons could be that the independence assumption of ICA may not be completely satisfied for brain signals, that the signs of the component and coefficient matrices may introduce activation polarity ambiguity in ICA (Makeig and Onton, 2009), and that the PCA method is sensitive to gross noise (Qian et al., 2014), which is an unavoidable issue for EEG signals. Therefore, based on these results we can conclude that the MALS-NMF based scheme is an effective solution for analyzing the patterns of brain EEG data.

Comparing the results of MALS-NMF, NMF, ALS-NMF and Semi-NMF, several interesting conclusions could be made. The NMF method has a dual nonnegativity constraint; Semi-NMF relaxes one nonnegativity constraint, which enables the feature combination of ERP and ERSP; a sparsity constraint is considered in ALS-NMF together with the dual nonnegativity constraint; MALS-NMF relaxes one nonnegativity constraint and also incorporates the sparsity restriction. Among these four methods, MALS-NMF obtained the best performance, while ALS-NMF and Semi-NMF respectively rank second and third, which suggests that the incorporated sparsity constraint contributes more to the improvement of the classification performance than the combination of the temporal and spectral features. Furthermore, with both the sparsity constraint and the feature combination incorporated, MALS-NMF obtained the best performance.

The results in Table 2 for dataset IVa evaluate the classification performance with each trial windowed by a length of 5.25 s after the visual cue onset. To further improve the algorithm performance, another data segmentation scheme, windowing each trial from 0.5 s to 2.5 s after the cue onset, has been adopted.

The classification results of the MALS-NMF method and the other compared methods on these data are given in Table 3 and are much better than those in Table 2. Also, the results of the MALS-NMF method are better than or comparable to those of state-of-the-art methods. In addition, to conduct a statistical analysis of the performance of the proposed method and the other compared approaches, a paired t-test is performed between MALS-NMF and each of the other methods for each subject. All the obtained p-values of the t-test are less than 0.05, which suggests that the differences are significant.
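For reference, a paired t-test of this kind could be run as in the following hedged sketch (SciPy assumed); the accuracy values shown are hypothetical placeholders, not results from the paper.

```python
# Hedged sketch: paired t-test between per-fold accuracies of two compared methods.
from scipy.stats import ttest_rel

acc_mals = [0.84, 0.82, 0.83, 0.85, 0.81]    # hypothetical per-fold accuracies (placeholders)
acc_other = [0.72, 0.70, 0.74, 0.73, 0.71]
t_stat, p_value = ttest_rel(acc_mals, acc_other)
print(f"paired t-test: t = {t_stat:.3f}, p = {p_value:.4f}")
```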

Table 3
Classification accuracy on BCI competition III dataset IVa (%).

Data | MALS-NMF     | ICA          | PCA          | NMF          | ALS-NMF      | Semi-NMF     | Wavelets     | EMD          | CSP
aa   | 84.18 ± 1.23 | 62.72 ± 1.03 | 67.38 ± 1.51 | 72.28 ± 1.21 | 71.27 ± 1.03 | 72.11 ± 1.21 | 61.23 ± 1.31 | 64.38 ± 1.21 | 72.93 ± 1.01
al   | 95.36 ± 0.78 | 85.92 ± 0.83 | 88.23 ± 1.03 | 85.43 ± 1.13 | 89.23 ± 0.83 | 86.23 ± 1.16 | 85.17 ± 1.25 | 84.39 ± 2.03 | 92.38 ± 0.53
av   | 70.38 ± 0.81 | 63.98 ± 1.27 | 65.63 ± 1.23 | 64.73 ± 1.08 | 64.88 ± 1.23 | 63.76 ± 0.93 | 60.29 ± 1.37 | 59.78 ± 1.17 | 66.25 ± 0.83
aw   | 81.92 ± 1.01 | 70.73 ± 1.35 | 71.38 ± 2.07 | 75.28 ± 1.36 | 76.32 ± 1.01 | 73.35 ± 1.03 | 69.93 ± 2.38 | 69.21 ± 2.13 | 75.46 ± 1.21
ay   | 80.38 ± 0.56 | 68.23 ± 1.61 | 63.83 ± 2.38 | 66.10 ± 0.85 | 67.36 ± 2.13 | 65.28 ± 1.02 | 60.59 ± 1.93 | 61.29 ± 1.21 | 78.95 ± 0.53
Avg. | 82.44        | 70.31        | 71.29        | 72.76        | 73.81        | 72.14        | 67.44        | 67.81        | 77.19

4.3. Discussion of the obtained features

To give an intuitive illustration of the obtained basis vectors of matrix W, Fig. 2 visualizes the basis vectors of subject aw from dataset IVa. Interestingly, one could note that the obtained basis vectors present ERP components at different time points (Luck, 2005), which suggests that the onset times of the ERP components (such as P1 and P2) vary over time even for the same subject. These results could possibly give a physically meaningful explanation of the obtained bases, which benefits from the nonnegativity constraint on the coefficient matrix. However, a thorough quantification and analysis of the properties of the obtained bases from a physiological viewpoint still needs considerable additional effort in the future, which could be of interest to the research community.

Fig. 2. Basis vectors of subject aw in dataset IVa obtained by MALS-NMF.

4.4. Influence of the parameters

As stated in Section 4.2, when the sparsity constraint is imposed on the coefficient matrix in the ERP factorization and on the basis matrix in the ERSP factorization, the best classification performance is achieved. To reveal the influence of the sparsity parameters, Tables 4 and 5 present the results on dataset IVa with variations of the sparsity parameters. In Table 4, the results are obtained with α fixed at 1.

Table 4
Results with the variation of the sparsity parameter β in ERP factorization (%).

β  | 0     | 0.01  | 0.1   | 1     | 3     | 5
aa | 62.08 | 62.25 | 64.16 | 64.81 | 62.18 | 61.03
al | 87.69 | 90.18 | 91.55 | 93.62 | 92.68 | 90.74
av | 55.39 | 55.56 | 59.68 | 58.93 | 58.37 | 55.46
aw | 70.30 | 70.36 | 73.53 | 72.37 | 71.89 | 70.23
ay | 52.37 | 53.26 | 55.13 | 55.36 | 55.08 | 52.39

It could be seen that the best results are obtained at β = 1 for subjects aa, al and ay, while for subjects av and aw, β = 0.1 gives the best performance. In Table 5, β is fixed at 1, and the best results are obtained at α = 1. It could be clearly seen that when the sparsity constraint is incorporated and set within an appropriate range, the performance of the algorithm is noticeably improved. Specifically, with the change of β (ERP) between 0.01 and 5, the accuracy varied by 3.81–7.61%; when the parameter α changes, the performance improved by about 3.89% (least) to 6.11% (best). In addition, to evaluate the performance difference with and without the sparsity constraint on one of the features, the results at β = 0 and α = 0 have also been reported in Tables 4 and 5, which show different extents of performance improvement for different subjects. However, one should note that a certain amount of sparsity is still imposed on the other feature when no sparsity is considered for one feature, i.e., α = 1 when β = 0, and β = 1 when α = 0. Results of the sparsity-free method correspond to those of the Semi-NMF method given in Tables 1 and 2.

Table 5
Results with the variation of the sparsity parameter α in ERSP factorization (%).

α  | 0     | 0.01  | 0.1   | 1     | 3     | 5
aa | 62.03 | 62.08 | 64.25 | 64.81 | 63.80 | 62.17
al | 86.23 | 90.11 | 91.23 | 93.62 | 92.26 | 90.37
av | 55.78 | 57.56 | 57.68 | 58.49 | 56.93 | 55.82
aw | 71.20 | 71.66 | 71.93 | 72.34 | 70.88 | 69.33
ay | 51.83 | 52.17 | 54.63 | 55.36 | 55.28 | 54.84

To give a better visualization, Fig. 3 plots the variation curves of Tables 4 and 5, omitting the values at 0 due to the use of a logarithmic axis. In addition, error bars are also given. To verify whether the parameters β and α significantly affect the classification accuracy, a paired t-test has been conducted between the results with different parameter settings in Tables 4 and 5. The obtained p-values are less than 0.01, which indicates that the influence of the sparsity parameter setting is statistically significant. Considering that a certain amount of sparsity has been imposed for all these results, a t-test has also been performed between the results of the Semi-NMF method (without sparsity constraint) and the results of Tables 4 and 5. In this case, p-values less than 0.001 have been obtained, which further verifies that the effect of the sparsity constraint is significant. Therefore, it could be concluded that the sparsity parameters β and α have a significant influence on the classification performance, and they could be set through experiments or cross-validation.

Besides α and β, another parameter in the MALS-NMF method is η, as shown in Eqs. (2) and (5), which is important to keep ||H||_F^2 or ||W||_F^2 small. If ||H||_F^2 becomes very large, a small value of \sum_{i=1}^{m} \|W(i,:)\|_1^2 may be caused by non-sparse low values in W, which would thus compromise the expected sparsity. A similar conclusion applies to ||W||_F^2. Therefore, an appropriate selection of η is required. In our experiments, the value of η is set to the square of the maximal element in matrix A, as suggested by the ALS-NMF method in Kim and Park (2007).

4.5. Improvement of the algorithm performance

To further improve the classification performance of the MALS-NMF method, four simple modifications of MALS-NMF have been proposed and tested on datasets IVa and IIIa, namely MALS-GNMF, MALS-SSNMF, the combination of MALS-NMF with CSP, and the combination of MALS-NMF with the continuous wavelet transform (CWT). Among these, MALS-GNMF incorporates an affinity graph as a prior regularizer, as is done in GNMF (Deng et al., 2011), and MALS-SSNMF introduces the class label matrix as the supervising prior into the NMF procedure (Lee et al., 2010).


Fig. 3. Performance variation with respect to the sparsity parameters. (a) Results of Table 4 with the variation of the sparsity parameter β in ERP factorization. (b) Results of Table 5 with the variation of the sparsity parameter α in ERSP factorization.

Table 6
Classification accuracy of the modified MALS-NMF algorithm (%). Subjects aa, al, av, aw and ay are from BCI competition III dataset IVa; K3b, K6b and L1b are from BCI competition III dataset IIIa.

Method               | aa           | al           | av           | aw           | ay           | K3b          | K6b          | L1b
MALS-GNMF            | 86.41 ± 0.83 | 98.32 ± 0.67 | 71.39 ± 0.78 | 83.08 ± 0.83 | 82.18 ± 1.12 | 92.18 ± 0.31 | 66.32 ± 0.83 | 89.43 ± 1.03
MALS-SSNMF           | 85.32 ± 1.01 | 96.28 ± 0.91 | 68.28 ± 1.21 | 82.26 ± 0.85 | 81.33 ± 1.61 | 91.11 ± 0.35 | 62.17 ± 0.93 | 86.38 ± 1.23
Combination with CSP | 83.28 ± 1.10 | 98.87 ± 1.03 | 70.92 ± 0.63 | 78.63 ± 1.03 | 88.62 ± 1.10 | 94.06 ± 0.28 | 64.68 ± 0.88 | 73.37 ± 1.81
Combination with CWT | 75.29 ± 1.38 | 94.59 ± 1.21 | 69.39 ± 1.52 | 76.03 ± 1.12 | 78.67 ± 2.01 | 83.13 ± 1.31 | 61.71 ± 0.93 | 71.62 ± 1.62
In other words, both MALS-GNMF and MALS-SSNMF employ prior label information to regularize the MALS-NMF method by imposing a penalty term. In the MALS-GNMF method, the affinity graph G is a 0-1 matrix in which an entry Gij = 1 indicates that trial i and trial j are from the same class. Matrix G is employed to weight the distances between trials in the new space spanned by the basis W, and this weighted distance is then introduced as a penalty in the objective function. The parameter controlling the importance of this penalty is set to 10, as suggested in Deng et al. (2011), which has also been verified to give the best performance through experiments with this parameter varying from 10 to 500. In the MALS-SSNMF method, the parameter balancing the importance of the supervising term is set to 1, selected from experiments conducted in the range of 0.5–3 as suggested in Lee et al. (2010). Besides modifying the MALS-NMF method itself, the features obtained through MALS-NMF have been combined respectively with features resulting from the CSP method and the CWT method to perform the classification task. There is no parameter to tune in the CSP method. For CWT, a total of 15 frequency scales are computed, equally spaced over the range of 8–15 Hz with a frequency spacing of 0.5 Hz, as suggested by Younghak et al. (2012) and Wang et al. (2013); the Morlet wavelet is employed. The classification performance of the four modified methods is given in Table 6. It could be seen from Table 6 that the algorithm performance has been significantly improved by these modifications. Among the four compared variants, the MALS-GNMF method and the combination with CSP obtain the best performance, which is better than or comparable to the results of state-of-the-art methods (Wang et al., 2012; Lotte and Cuntai, 2011). These results suggest that an appropriate prior could effectively regularize the MALS-NMF method.
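As an illustration of the affinity-graph prior described above, the following hedged sketch builds the 0-1 matrix G from trial labels and evaluates one possible graph penalty on the MALS-NMF coefficient vectors; the exact penalty form used in MALS-GNMF is not spelled out here, so this is an assumption in the style of Deng et al. (2011)-type graph regularization, with illustrative names.

```python
# Hedged sketch: 0-1 affinity graph from trial labels and one possible graph penalty
# on the coefficient vectors (columns of H). The penalty form is an assumption.
import numpy as np

def affinity_graph(labels):
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)   # G_ij = 1 for same-class trials

def graph_penalty(H, G):
    """Sum of G-weighted squared distances between trial representations (columns of H)."""
    sq_norms = np.sum(H ** 2, axis=0)
    dists = sq_norms[:, None] + sq_norms[None, :] - 2 * H.T @ H  # pairwise squared distances
    return 0.5 * np.sum(G * dists)
```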

5. Conclusion

A novel solution for motor imagery classification is developed in this paper, i.e., combinatory decomposition of ERP and ERSP based on MALS-NMF (mixed alternating least squares based NMF). The obtained combinatory features are then used to train an SVM classifier based on the one-versus-rest strategy. To the best knowledge of the authors, this solution employs NMF in both the time domain and the frequency domain for the first time. In addition, sparsity, an intrinsic characteristic of brain signals, has been incorporated into the objective function of MALS-NMF. Extensive experiments on motor imagery classification have verified the effectiveness of the proposed method. It has also been found that sparsity imposed on the coefficient matrix in the ERP factorization and on the basis matrix in the ERSP factorization could better improve the discriminant power of the extracted features.

Acknowledgements This work is supported by the Fundamental Research Funds for the Central Universities, National Natural Science Foundation of China grant 61105034, Research Fund for the Doctoral Program of Higher Education of China grant 20100201120040, and China Postdoctoral Science Foundation grants 20110491662 and 2012T50805.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.jneumeth.2015.03.031.


References Berry MW, Browne M, Langville AN, Pauca VP, Plemmons RJ. Algorithms and applications for approximate nonnegative matrix factorization. Comput Stat Data Anal 2007;52:155–73. Blankertz B, Lemm S, Treder M, Haufe S, Müller K-R. Single-trial analysis and classification of ERP components — a tutorial. Neuroimage 2011;56:814–25. Boyd S, Vandenberghe L. Convex optimization. Cambridge: Cambridge University Press; 2004. Candes EJ, Tao T. Decoding by linear programming. IEEE Trans Inform Theory 2005;51:4203–15. Candes EJ, Wakin MB. An introduction to compressive sampling. IEEE Signal Process Mag 2008;25:21–30. Candes EJ, Romberg J, Tao T. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans Inform Theory 2006;52:489–509. Chih-Wei H, Chih-Jen L. A comparison of methods for multiclass support vector machines. IEEE Trans Neural Netw 2002;13:415–25. Cuiwei L, Chongxun Z, Changfeng T. Detection of ECG characteristic points using wavelet transforms. IEEE Trans Biomed Eng 1995;42:21–8. Deng C, Xiaofei H, Jiawei H. Document clustering using locality preserving indexing. IEEE Trans Knowl Data Eng 2005;17:1624–37. Deng C, Xiaofei H, Jiawei H, Huang TS. Graph regularized nonnegative matrix factorization for data representation. IEEE Trans Pattern Anal Mach Intell 2011;33:1548–60. Ding C, Tao L, Jordan MI. Convex and semi-nonnegative matrix factorizations. IEEE Trans Pattern Anal Mach Intell 2010;32:45–55. Donoho DL. Compressed sensing. IEEE Trans Inform Theory 2006;52:1289–306. Dornhege G, Blankertz B, Curio G, Muller K. Boosting bit rates in noninvasive EEG single-trial classifications by feature combination and multiclass paradigms. IEEE Trans Biomed Eng 2004;51:993–1002. Groppe DM, Makeig S, Kutas M. Identifying reliable independent components via split-half comparisons. Neuroimage 2009;45:1199–211. Haifeng L, Zhaohui W, Xuelong L, Deng C, Huang TS. Constrained nonnegative matrix factorization for image representation. IEEE Trans Pattern Anal Mach Intell 2012;34:1299–311. Haiping L, How-Lung E, Cuntai G, Plataniotis KN, Venetsanopoulos AN. Regularized common spatial pattern with aggregation for EEG classification in small-sample setting. IEEE Trans Biomed Eng 2010;57:2936–46. Heung-Il S, Seong-Whan L. A novel bayesian framework for discriminative feature extraction in brain–computer interfaces. IEEE Trans Pattern Anal Mach Intell 2013;35:286–99. Higashi H, Tanaka T. Time sparsification of EEG signals in motor-imagery based brain computer interfaces. In: 2012 annual international conference of the IEEE engineering in medicine and biology society (EMBC); 2012. p. 4271–4. Hoyer PO. Non-negative matrix factorization with sparseness constraints. J Mach Learn Res 2004;5:1457–69. Huang NE, Wu Z. A review on Hilbert-Huang transform: method and its applications to geophysical studies. Rev Geophys 2008;46:RG2006. Kai Keng A, Zhang Yang C, Haihong Z, Cuntai G. Filter Bank Common Spatial Pattern (FBCSP) in brain–computer interface. In: IEEE international joint conference on neural networks, 2008. IJCNN 2008. (IEEE World Congress on Computational Intelligence); 2008. p. 2390–7. Kim H, Park H. Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis. Bioinformatics 2007, June;23:1495–502. Kuhn HW. The Hungarian method for the assignment problem. Naval Res Logist Q 1955;2:83–97. Lal TN, Hinterberger T, Widman G, Schröder M, Hill J, Rosenstiel W, et al. 
Methods towards invasive human brain computer interfaces. In: Advances in neural information processing system; 2005. p. 737–44. Lee H, Choi S. Group nonnegative matrix factorization for EEG classification. In: Presented at the international conference on artificial intelligence and statistics; 2009.


Lee DD, Seung HS. Learning the parts of objects by non-negative matrix factorization. Nature 1999;401:788–91. Lee D, Seung S. Algorithms for non-negative matrix factorization. In: Advances in neural information processing systems; 2001. p. 556–62. Lee H, Cichocki A, Choi S. Nonnegative matrix factorization for motor imagery EEG classification. In: Kollias S, Stafylopatis A, Duch W, Oja E, editors. Artificial Neural Networks – ICANN 2006, vol. 4132. Berlin/Heidelberg: Springer; 2006. p. 250–9. Lee H, Yoo J, Choi S. Semi-supervised nonnegative matrix factorization. IEEE Signal Process Lett 2010;17:4–7. Li SZ, XinWen H, HongJiang Z, Qiansheng C. Learning spatially localized, parts-based representation. In: Proceedings of the 2001 IEEE computer society conference on computer vision and pattern recognition, 2001. CVPR 2001, vol. 1; 2001. pp. I-207–I-212. Li P, Xu P, Zhang R, Guo L, Yao D. L1 Norm based common spatial patterns decomposition for scalp EEG BCI. BioMed Eng Online 2013;12:77. Liu W, Zheng N, Li X. Nonnegative matrix factorization for EEG signal classification. In: Yin F-L, Wang J, Guo C, editors. Advances in neural networks – ISNN 2004, vol. 3174. Berlin/Heidelberg: Springer; 2004. p. 470–5. Lotte F, Cuntai G. Regularizing common spatial patterns to improve BCI designs: unified theory and new algorithms. IEEE Trans Biomed Eng 2011;58:355–62. Luck SJ. An introduction to event-related potentials and their neural origins. MIT Press; 2005. Makeig S. Auditory event-related dynamics of the EEG spectrum and effects of exposure to tones. Electroencephalogr Clin Neurophysiol 1993;86:283–93. Makeig S, Onton J. ERP features and EEG dynamics: an ICA perspective. In: Luck SJ, Kapppenman E, editors. Oxford handbook of event-related potential components. New York: Oxford University Press; 2009. Makeig S, Debener S, Onton J, Delorme A. Mining event-related brain dynamics. Trends Cogn Sci 2004;8:204–10. Olshausen BA, Field DJ. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 1996;381:607–9. Qian Z, Deyu M, Zongben X, Wangmeng Z, Lei Z. Robust principle component analysis with complex noise. In: Presented at the international conference on machine learning; 2014. Stewart AX, Nuthmann A, Sanguinetti G. Single-trial classification of EEG in a visual object task using ICA and machine learning. J Neurosci Methods 2014;228:1–14. Subasi A, Ismail Gursoy M. EEG signal classification using PCA, ICA, LDA and support vector machines. Expert Syst Appl 2010;37:8659–66. Ting W, Guo-zheng Y, Bang-hua Y, Hong S. EEG feature extraction based on wavelet packet decomposition for brain computer interface. Measurement 2008;41:618–25. Wang H, Tang Q, Zheng W. L1-Norm-based common spatial patterns. IEEE Trans Biomed Eng 2012;59:653–62. Wang Y, Veluvolu K, Lee M. Time-frequency analysis of band-limited EEG with BMFLC and Kalman filter for BCI applications. J Neuroeng Rehabil 2013;10:109. Wei W, Xiaorong G, Bo H, Shangkai G. Classifying single-trial EEG during motor imagery by iterative spatio-spectral patterns learning (ISSPL). IEEE Trans Biomed Eng 2008;55:1733–43. Wolpaw JR. Brain–computer interfaces as new brain output pathways. J Physiol 2007, March;579:613–9. Wolpaw JR, Birbaumer N, McFarland DJ, Pfurtscheller G, Vaughan TM. Brain–computer interfaces for communication and control. Clin Neurophysiol 2002;113:767–91. Wu C-H, Chang H-C, Lee P-L, Li K-S, Sie J-J, Sun C-W, et al. 
Frequency recognition in an SSVEP-based brain computer interface using empirical mode decomposition and refined generalized zero-crossing. J Neurosci Methods 2011;196: 170–81. Yijun W, Shangkai G, Xiaorong G. Common spatial pattern method for channel selelction in motor imagery based brain–computer interface. In: 27th annual international conference of the engineering in medicine and biology society, 2005. IEEE-EMBS 2005; 2005. p. 5392–5. Younghak S, Seungchan L, Junho L, Heung-No L. Sparse representation-based classification scheme for motor imagery-based brain–computer interface systems. J Neural Eng 2012;9:056002.
