Computers in Biology and Medicine 61 (2015) 150–160


Improving brain–computer interface classification using adaptive common spatial patterns

Xiaomu Song, Suk-Chung Yoon

Department of Electrical Engineering, School of Engineering, Widener University, Chester, PA 19013, USA
Department of Computer Science, College of Arts and Sciences, Widener University, Chester, PA 19013, USA

Article history: Received 20 November 2014; Accepted 22 March 2015

Abstract

Common Spatial Patterns (CSP) is a widely used spatial filtering technique for electroencephalography (EEG)-based brain–computer interfaces (BCI). It is a two-class supervised technique that needs subject-specific training data. Due to EEG nonstationarity, EEG signals may exhibit significant intra- and inter-subject variation. As a result, spatial filters learned from a subject may not perform well for data acquired from the same subject at a different time or from other subjects performing the same task. Studies have been performed to improve CSP's performance by adding regularization terms into the training; most of them require target subjects' training data with known class labels. In this work, an adaptive CSP (ACSP) method is proposed to analyze single-trial EEG data from single and multiple subjects. The method does not estimate the target data's class labels during the adaptive learning, and updates the spatial filters for both classes simultaneously. The proposed method was evaluated in a comparison study with the classic CSP and several CSP-based adaptive methods using motor imagery EEG data from BCI competitions. Experimental results indicate that the proposed method can improve the classification performance compared to the other methods. For circumstances where true class labels of target data are not instantly available, we examined whether adding classified target data to the training data would improve the ACSP learning. Experimental results show that it is better to exclude them from the training data. The proposed ACSP method can be performed in real-time and is potentially applicable to various EEG-based BCI applications.

Keywords: Brain–computer interface; Common spatial patterns; Electroencephalography; Adaptive; Nonstationarity

1. Introduction

Brain–computer interface (BCI) is a communication technique that aims to identify a subject's brain intents and translate them into machine commands to control the operations of electromechanical devices. Electroencephalography (EEG) might be the most widely used noninvasive imaging technique in BCI. Due to the nonstationary nature of EEG, which is usually caused by changes in electrode impedance or position, subjects' attention, fatigue, eye movements, or muscular activity, EEG signals exhibit large intra- and inter-subject variation [1]. As a result, an observed EEG pattern from a subject might not be repeatable from the same subject at a different time or from different subjects performing the same task. Various methods have been proposed to address the nonstationarity in EEG-based BCI [1,2]. These methods focus on either the feature extraction process [1,3–16] or feature modelling and classification [16–29]. Some methods adapt to the intra- and/or inter-

Correspondence to: One University Place, Room 369, Chester, PA 19013. Tel.: +1 610 499 4058. E-mail address: [email protected] (X. Song).

http://dx.doi.org/10.1016/j.compbiomed.2015.03.023

subject variation through supervised adaptive learning [20,24,30] or semi-supervised/unsupervised learning [3,4,7,11,17–19,23,31–34], while others try to identify stationary patterns that are common within a single subject or across multiple subjects [1,5,6,8–10,12–15,21,22,25–27,35,36]. Among these studies, methods developed based on common spatial patterns (CSP) have received special attention. CSP is a two-class spatial filtering technique that maximizes the variance of band-passed EEG signals from one class while minimizing the signal variance from the other [37]. It is efficient in extracting representative features for BCI classification, and can be extended to multi-class BCI applications. The original CSP method is a supervised and subject-specific technique that requires training data from a target subject with known class labels. It is typically used on a subject-by-subject basis, and might not perform well for multi-subject BCI. In order to improve the multi-subject performance of CSP, prior information from different subjects can be added to the CSP learning via regularization. The regularization is typically implemented in two ways [14]. One is to calculate a weighted average of covariance matrices of EEG data from different subjects [3,4,6,7,12,38,39]. Fixed experiential weights are often used [3,4,12,39], but adaptive weights have also been proposed to better quantify the similarity between training


and testing data [6,7,38,40,41]. The other is to regularize the CSP objective function by adding a penalization term to impose prior information from multiple subjects [5,9,10,14,15]. By incorporating multi-subject information, the regularized CSP methods can outperform the conventional CSP in multi-subject BCI classification tasks. Most of the regularized CSP methods require labelled training data from target subjects. If training data are unlabelled, their class labels are estimated so that the data can be assigned to a specific class to update the covariance matrix of that class [3,4,7,9]. Erroneous estimations would affect the CSP learning and deteriorate the BCI classification performance. In this work, a different method to perform adaptive CSP (ACSP) learning is investigated. The method uses unlabelled EEG data from target subjects to learn spatial filters without estimating class labels for the target data, and updates the spatial filters for both classes simultaneously using adaptive weights. No classification is needed during the adaptive learning, and the spatial filters can be updated in real-time to adapt to intra- and inter-subject variation in EEG. The method can be used to classify single-trial EEG data from single or multiple subjects. The proposed method was evaluated using multi-subject motor imagery EEG data from BCI competitions III and IV.

The remainder of the paper is organized as follows. The classic CSP method is introduced in Section 2.1. The proposed ACSP method is described in Sections 2.2, 2.3, and 2.4. Section 2.5 explains the experimental EEG data used in this study and how the method was evaluated. Experimental results are described and discussed in Section 3. Finally, Section 4 summarizes the paper.

2. Materials and methods

2.1. Common spatial patterns

The proposed adaptive CSP method is developed based on the classic CSP approach [42,43]. CSP is a supervised two-class spatial filtering technique that aims to maximize feature variation for one class and simultaneously minimize feature variation for the other. Given an M × N matrix E_i(y) that represents the ith trial of EEG data collected under a brain task with class label y, y ∈ {1, 2}, the normalized class-specific spatial covariance matrix C_y is computed as:

C_y = (1/n_y) Σ_{i=1}^{n_y} E_i(y)E_i(y)^T / tr(E_i(y)E_i(y)^T),   (1)

where E_i(y) is mean-centered, M is the number of channels, N is the number of time points, n_y is the number of EEG trials in class y, and T is the matrix transpose operator. Based on the covariance matrices, the CSP training maximizes the following Rayleigh coefficient:

(W C_y W^T) / (W (C_1 + C_2) W^T),   (2)

which is equivalent to solving the generalized eigenvalue problem [40,14,37]:

C_1 W^T = C_2 W^T Λ,   (3)

where the matrix W consists of spatial filters in rows, and Λ is a diagonal matrix sorted in descending order of the eigenvalues of C_2^{-1}C_1, which measure the variance ratio between the two classes. With the projection matrix W, the spatial filtering of a trial E_i(y) is computed as:

Z_i = W E_i(y).   (4)

The columns of W^{-1} are the common spatial patterns that are considered as time-invariant EEG source distribution vectors. The


discrimination is based on the feature projections on W with maximal variations, which are the first and last m rows of Z_i. Based on Z_i, a feature vector is constructed for the ith trial with the rth spatial filter:

x_r = log( Var(z_r) / Σ_{j=1}^{2m} Var(z_j) ),   (5)

where Var(·) is the variance operator, and z_r is the rth row of Z_i. The logarithmic transformation is applied to make the distribution of x_r closer to Gaussian.

2.2. Adaptive common spatial patterns

In CSP and some of its extensions for multi-subject analysis, the spatial filter W is calculated once and then fixed for the processing of new data [6,12,15]. When training data from target subjects are unavailable or unlabelled, fixed spatial filters are usually not sufficient to characterize the spatial covariance structures of new data. CSP extensions have been proposed to adapt to unlabelled data [3,38,9,7,4]. For example, in an adaptive method proposed in [3], the class label of each testing trial is first estimated. The trial is then assigned to the estimated class to update its covariance matrix with a fixed weight, and CSP features are updated and reclassified. In a parametric model-based adaptive method [4], CSP features extracted from a testing trial are modelled by a two-component Gaussian mixture model (GMM). The expectation maximization (EM) algorithm is used to estimate class labels for testing trials. Classified trials showing high posterior class probabilities are added to the estimated class to update its covariance matrix and CSP features. This process is repeated until the overall change of class labels between two contiguous iterations falls below a predefined threshold. In another adaptive method [7], an initial classification is first performed on a testing trial, and the covariance matrix of the estimated class is then updated based upon a weight calculated using the Kullback–Leibler divergence (KLD) between the training and testing trials.
After updating the covariance matrix, CSP features are updated and reclassified. This process can be repeated multiple times. An initial classification is required in these methods to assign a testing trial to a class so that the class spatial covariance matrix can be updated. Ideally, if the new trial is from class y, it should be similar to training trials from y in terms of feature variation, data distribution, or normalized spatial covariance matrix, and be correctly classified by a classifier. Due to EEG nonstationarity, however, the expected similarity may not be apparent, and it is possible that the new data are more similar to training data of the opposite class. If the new trial is mis-classified, the spatial filters updated based on the erroneous classification could degrade the BCI classification.

In this work, a different way to perform the ACSP learning is proposed. Instead of estimating class labels for new EEG trials, a similarity measure between the new and training data in each class is calculated, and the spatial filters of both classes are simultaneously updated based on the similarity measure. Three different similarity measures are used, based upon which the ACSP method is developed. The details of the proposed method are described as follows. Given a new EEG trial from a target subject with an unknown class label and a normalized spatial covariance matrix C_new, the following method is proposed to calculate the new class covariance matrices:

C̄_1 = [φ_1/(n_1 + sgn(φ_1))] C_new + [n_1/(n_1 + sgn(φ_1))] C_1,
C̄_2 = [φ_2/(n_2 + sgn(φ_2))] C_new + [n_2/(n_2 + sgn(φ_2))] C_2,   (6)

where n_1 and n_2 are the numbers of training trials from classes 1 and 2, respectively, and φ_1 ≥ 0 and φ_2 ≥ 0 are two measures, to be estimated, that quantify the similarity between the target and training data in the two classes. sgn(x) is the sign function, which equals 1 if x > 0 and 0 if x = 0. Three different similarity measures are used to estimate φ_y, y ∈ {1, 2}.

2.2.1. Feature variance-based distance

When a new EEG trial is projected onto the existing spatial filters of the two classes using Eq. (4), the feature variance x_r is computed using (5). φ_y is calculated as the ratio of the sum of the feature variances in class y to the overall feature variance of the two classes:

φ_y = Σ_r x_r^y / (Σ_r x_r^1 + Σ_r x_r^2),   (7)

where x_r^1 is computed from the rth of the first m rows of the feature projection on W, x_r^2 from the rth of the last m rows, and x_r^y is the rth feature in class y. The feature variance defined in (5) was originally proposed as a primary CSP-based feature for BCI classification, where the class with the greater projected feature variance has a higher probability of being the true motor imagery class. Therefore, it may be used as a similarity indicator between training and target data, based upon which a weight parameter can be derived to update the class covariance matrices.
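As an illustration, the feature extraction of Eqs. (4)–(5) and the ACSP-Ia update of Eqs. (6)–(7) might be sketched as follows. This is a minimal sketch with our own function names; the class-wise similarity is computed here from the raw projected variances (one possible reading of Eq. (7)), and `W` is assumed to hold 2m spatial filters, the first m associated with class 1 and the last m with class 2:

```python
import numpy as np

def log_var_features(W, E):
    """Eqs. (4)-(5): project a mean-centered trial E (channels x time)
    onto the spatial filters in the rows of W, then take the
    normalized log-variance of each filtered signal."""
    Z = W @ E                                # Eq. (4)
    v = np.var(Z, axis=1)
    return np.log(v / v.sum())               # Eq. (5)

def update_covariances(C1, C2, C_new, n1, n2, W, E, m):
    """ACSP-Ia: estimate phi_1, phi_2 from projected variances (Eq. (7))
    and blend the new-trial covariance into each class (Eq. (6))."""
    Z = W @ E
    v = np.var(Z, axis=1)
    s1, s2 = v[:m].sum(), v[-m:].sum()       # variance captured per class
    phi1, phi2 = s1 / (s1 + s2), s2 / (s1 + s2)
    def blend(phi, n, Cy):                   # Eq. (6)
        d = n + np.sign(phi)
        return (phi / d) * C_new + (n / d) * Cy
    return blend(phi1, n1, C1), blend(phi2, n2, C2)
```

Note that φ_1 + φ_2 = 1 by construction, so each update assigns complementary partial memberships to the unlabelled trial rather than a hard class decision.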

2.2.2. Kullback–Leibler divergence

KLD is a distance measure between two probability distributions, and is used here to quantify the similarity between the distributions of target and training data. If the EEG data in each trial are normalized to zero mean and unit standard deviation, we may assume that the EEG data follow a zero-mean M-dimensional multivariate Gaussian distribution, where M is the number of EEG channels. The probability distributions of a new target EEG trial and the training data are represented as f_new = N(0, C_new) and f_y = N(0, C_y), respectively. The KLD between f_new and f_y is calculated as:

KL(f_new, f_y) = (1/2) [ tr(C_y^{-1} C_new) − log( det(C_new)/det(C_y) ) − M ],   (8)

where det(·) is the determinant operator, and y ∈ {1, 2}. Since KLD is not a symmetric distance measure, KL(f_new, f_y) ≠ KL(f_y, f_new). KLD can be symmetrized by adding KL(f_new, f_y) and KL(f_y, f_new) together:

KLD(f_new, f_y) = (1/2) [ KL(f_new, f_y) + KL(f_y, f_new) ].   (9)

The parameter φ_y is computed as:

φ_y = 1 − KLD(f_new, f_y) / [ KLD(f_new, f_1) + KLD(f_new, f_2) ].   (10)

If C_new is not from class y, the value of KLD(f_new, f_y)/[KLD(f_new, f_1) + KLD(f_new, f_2)] is theoretically relatively large, resulting in a small φ_y. On the contrary, a large φ_y will be obtained to assign a greater weight to C_new.

2.2.3. Frobenius norm

The Frobenius norm (FN) is a matrix norm defined as the square root of the sum of the absolute squares of the matrix elements. It can be used to measure the distance between two matrices. In this work, FN is used to estimate the similarity between C_new and C_y:

FN(C_new, C_y) = sqrt( Σ_{i,j} [C_new(i, j) − C_y(i, j)]^2 ).   (11)

The parameter φ_y is estimated using:

φ_y = 1 − FN(C_new, C_y) / [ FN(C_new, C_1) + FN(C_new, C_2) ].   (12)

If C_new is not from class y, the difference between C_new and C_y is theoretically greater than when C_new is from class y, resulting in a small φ_y. Otherwise, a large φ_y will be assigned to C_new.

φ_1 and φ_2 can be estimated using any one of the three similarity measures. After calculating C̄_y using (6), the remaining steps are the same as in the CSP method to obtain updated spatial filters for feature extraction. It can be observed from (6) that the weights for C_y and C_new are also related to the number of training trials: more training trials lead to smaller weights for the target data. A greater number of training trials provides more prior information about target subjects, and consequently a lower chance that the new trial exhibits considerably different patterns. When the number of training trials is large, the weights assigned to C_new are relatively small and have only a slight effect on the overall covariance matrices. However, even such small variation may result in a significant change of the feature distribution and final classification results, which was confirmed by the experimental results of this study.

The proposed ACSP method is equivalent to a simplified EM algorithm [44,45], where the EEG data are characterized by a zero-mean two-component multivariate GMM. The two components correspond to the two classes in CSP. The probability of each component is approximated using the weights of C_new and C_y shown in (6) (the E-step), and the covariance matrix of each component is estimated using the weighted average of C_new and C_y (the M-step). To facilitate real-time processing, no iteration between the E-step and the M-step is performed. The proposed method is quite different from another EM-based approach described previously [4], where the GMM is used to model CSP features instead of EEG data, and a decision on the class label is made in each EM iteration. By assigning partial membership to each target EEG trial, the proposed ACSP method is expected to adapt to intra- and inter-subject variation and provide a better learning performance than the classic CSP method.

2.3. Implementations of ACSP

In the following sections, the ACSP implementation using the feature variance-based distance is called ACSP-Ia, the one using the symmetrized KLD is named ACSP-Ib, and the one using FN is denoted ACSP-Ic. Although different similarity measures can be used, the proposed ACSP is implemented following a general procedure:

Step 1: A spatial filter W is computed using the classic CSP with EEG training data.
Step 2: Input a new EEG trial from a target subject.
Step 3: ACSP-Ia: Compute the feature projection of the new data on W using (4) and (5). ACSP-Ib and -Ic: Compute the covariance matrix C_new of the new data, and the KLD or FN between C_new and C_y, y ∈ {1, 2}, using (9) or (11).
Step 4: Estimate φ_1 and φ_2 using (7) (ACSP-Ia), (10) (ACSP-Ib), or (12) (ACSP-Ic).
Step 5: Compute C̄_1 and C̄_2 using (6), and update the spatial filter W. Project the target and training data onto the updated W and extract features using (4) and (5).
Step 6: Features extracted from the training data are used to train/retrain a data classifier to classify the features extracted from the target data.
Step 7: Go to Step 2 for the next target trial.
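The KLD and FN similarity measures of Eqs. (8)–(12), together with the spatial-filter update of Step 5, can be sketched as follows. This is a minimal sketch with our own function names; for Eq. (3) it uses scipy's generalized symmetric eigensolver in the common equivalent form C_1 w = λ(C_1 + C_2) w:

```python
import numpy as np
from scipy.linalg import eigh

def sym_kld(Ca, Cb):
    """Eqs. (8)-(9): symmetrized KLD between N(0, Ca) and N(0, Cb)."""
    M = Ca.shape[0]
    def kl(Cx, Cy):
        _, logdet_x = np.linalg.slogdet(Cx)
        _, logdet_y = np.linalg.slogdet(Cy)
        return 0.5 * (np.trace(np.linalg.solve(Cy, Cx))
                      - (logdet_x - logdet_y) - M)
    return 0.5 * (kl(Ca, Cb) + kl(Cb, Ca))

def frobenius(Ca, Cb):
    """Eq. (11): Frobenius norm of the matrix difference."""
    return np.linalg.norm(Ca - Cb, 'fro')

def phi_from_distance(dist, C_new, C1, C2):
    """Eqs. (10)/(12): phi_y = 1 - d(C_new, C_y) / (d1 + d2)."""
    d1, d2 = dist(C_new, C1), dist(C_new, C2)
    return 1 - d1 / (d1 + d2), 1 - d2 / (d1 + d2)

def csp_filters(C1, C2, m):
    """Step 5 / Eq. (3): solve C1 w = lambda (C1 + C2) w and keep the
    2m filters with the most extreme eigenvalues, as rows of W."""
    vals, vecs = eigh(C1, C1 + C2)           # ascending eigenvalues
    order = np.argsort(vals)
    pick = np.r_[order[:m], order[-m:]]
    return vecs[:, pick].T
```

With these pieces, one pass of the procedure amounts to: compute `C_new` for the incoming trial, obtain φ_1 and φ_2 via `phi_from_distance(sym_kld, ...)` (ACSP-Ib) or `phi_from_distance(frobenius, ...)` (ACSP-Ic), blend them into the class covariances per Eq. (6), and re-run `csp_filters` before re-extracting features.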


2.4. Non-accumulative vs. accumulative implementation of ACSP

In existing ACSP studies [3,7,4], classified target trials are added to the training data to improve the CSP learning performance. This procedure is denoted as accumulative ACSP in this work. On the contrary, non-accumulative ACSP does not include classified trials when updating the training data. The proposed ACSP method can be implemented in the accumulative or non-accumulative way, and we examined both cases in the experiments. For the accumulative ACSP, we examined two different implementations. The first is to add classified target trials to the training data of the estimated class, so that the update of the spatial covariance matrix is class-specific. The second implementation is to update C_1 and C_2 to C̄_1 and C̄_2 using (6) after the classification. This is equivalent to adding target trials to the training data of both classes with the weights shown in (6). The first, class-specific accumulative implementation is denoted "ACCU-1" in the experimental study, and the second implementation, which updates the spatial filters for both classes, is called "ACCU-2". In the non-accumulative ACSP, classified trials are not added to the training data; as a result, C_1 and C_2 are not updated to C̄_1 and C̄_2 after the classification of each target trial, although C̄_1 and C̄_2 are computed during the ACSP learning. Intuitively, the accumulative ACSP would outperform the non-accumulative one, because previous studies have shown improvements induced by including target subjects' data in the CSP training. This is usually true if there is a feedback loop or "ground truth" that reveals the true class labels of target trials. However, if the true class label is not instantly available in real-time BCI applications, mis-classified target trials will be added to the training data of the wrong class and affect the CSP learning and final classification.
The non-accumulative ACSP does not add classified trials to the training data, and would be an alternative to the accumulative implementation if it outperforms the latter in this situation. A comparison study was performed between the non-accumulative and accumulative ACSP, and the results are reported in Section 3.

2.5. Evaluation

2.5.1. EEG datasets

The proposed method was evaluated using four datasets from BCI Competitions III (datasets IIIa and IVa) and IV (datasets IIa and IIb) [46,47]. Dataset IIIa was recorded from three subjects using a 64-channel EEG system at a sampling rate of 250 Hz, band-pass filtered between 1 and 50 Hz with a 50 Hz notch filter on. One of four motor imagery tasks (left hand, right hand, foot, and tongue) was performed during the data acquisition. For each subject, there are 60 training and 60 testing trials from each class. The left hand and foot data were used in this work. A time segment from 3.5 s to 5 s was extracted from each trial. Dataset IVa was acquired from five subjects using 118 EEG channels at a sampling rate of 1000 Hz while the subjects were performing one of the left hand, right hand, and right foot motor imagery tasks. The data were band-pass filtered between 0.05 and 200 Hz, and down-sampled to 100 Hz. In each task, the numbers of training and testing trials vary over subjects, and are listed in Table 1. The five subjects' IDs are "aa", "al", "av", "aw", and "ay". The data associated with the left and right hand tasks were used in this study. Dataset IIa was collected from 9 subjects with 22 EEG channels while the subjects were performing one of four motor imagery tasks: left hand, right hand, foot, and tongue. Two sessions of data were acquired from each subject on different days. Each session consists of 288 trials, with 72 trials for each task. In this work, only the left and right hand data were used.
The data from the first session were used for training, and those from the second session were used for the evaluation. Dataset IIb was acquired from the same set of subjects as that in IIa using 3 EEG channels while the subjects


Table 1
The number of training and testing trials of five individual subjects in dataset IVa.

            aa    al    av    aw    ay
Training    168   224   84    56    28
Testing     112   56    196   224   252

were performing either the left or the right hand motor imagery task. Five sessions were recorded for each subject on different days. Three sessions were used for training, and the other two were used for testing. In each session there are about 60–80 trials from each task condition. All trials in IIa and IIb were sampled at 250 Hz and band-pass filtered between 0.5 and 100 Hz during the acquisition. A 50 Hz notch filter was applied to attenuate line noise. A time segment from 2 s to 4 s was extracted from each trial. Eye movement artifacts in IIa and IIb were attenuated using regression based on the simultaneously acquired electrooculogram data [48]. All EEG trials were band-pass filtered in the 8–32 Hz frequency band, which is considered to contain the most relevant information for motor imagery [49]. The filtered EEG data have zero mean and were normalized to unit standard deviation.
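The 8–32 Hz band-pass filtering and per-trial normalization described above could be implemented, for example, with a zero-phase Butterworth filter; the filter order, the per-channel normalization, and the function name are our choices and are not specified in the paper:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def preprocess_trial(E, fs=250.0, band=(8.0, 32.0), order=4):
    """Band-pass filter each channel of a trial E (channels x time)
    with a zero-phase Butterworth filter, then normalize each channel
    to zero mean and unit standard deviation."""
    nyq = fs / 2.0
    b, a = butter(order, [band[0] / nyq, band[1] / nyq], btype='band')
    F = filtfilt(b, a, E, axis=1)            # forward-backward: zero phase
    F -= F.mean(axis=1, keepdims=True)
    F /= F.std(axis=1, keepdims=True)
    return F
```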

2.5.2. Method evaluation

The evaluation was based upon the intra-subject and inter-subject classification performance. In the intra-subject evaluation, the training and testing trials were from the same subject. In the inter-subject study, two types of evaluations were performed. One was to use training trials from all subjects in a dataset to learn spatial filters, and apply them to the testing data from all subjects in the same dataset. The other was more challenging: the training trials from only one subject in a dataset were used to learn spatial filters, and the testing trials from all other subjects in the same dataset were used for the evaluation. A cross validation was performed so that each subject's training trials were used once to train spatial filters, while all other subjects' evaluation trials were used for the testing. A linear support vector machine (SVM) was used as the data classifier. In the accumulative or non-accumulative ACSP, the linear SVM is retrained using the updated features each time a new target trial is provided and the ACSP learning is performed. Accuracy and Cohen's kappa (κ) coefficient were computed to evaluate the classification performance [50]. κ ranges from 0 to 1, where 0 corresponds to a chance-level classification with a 50% accuracy, and 1 means a perfect classification. Compared to accuracy, Cohen's kappa coefficient provides a more reliable evaluation of classification performance for unbalanced two- or multi-class classification problems. The effects of training data size on the ACSP learning were also investigated for both intra- and inter-subject studies. The training data from each subject were split into two groups by odd and even trial indices. Each group was used once to train spatial filters, and the ACSP learning and classification were performed on all testing data. The final accuracy and κ were computed using the classification results from the two groups.
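Cohen's κ used above can be computed directly from the predicted and true labels; a generic confusion-matrix-free implementation of the standard definition is sketched below (our own formulation, not code from the paper):

```python
import numpy as np

def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: chance-corrected agreement between two labelings."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    labels = np.unique(np.r_[y_true, y_pred])
    po = np.mean(y_true == y_pred)           # observed agreement
    pe = sum(np.mean(y_true == c) * np.mean(y_pred == c)
             for c in labels)                # expected agreement by chance
    return (po - pe) / (1.0 - pe)
```

For a balanced two-class problem this reduces to κ = 2·(accuracy − 0.5), consistent with the statement that κ = 0 corresponds to chance-level (50%) accuracy.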
The evaluation of the non-accumulative ACSP was based on a comparison study with the classic CSP and the semi-supervised importance weighted extreme energy ratio (SIWEER) method [34]. The SIWEER method was developed based on the extreme energy ratio (EER) criterion [51], which is theoretically equivalent to CSP but computationally more efficient. The SIWEER method uses the Kullback–Leibler importance estimation procedure (KLIEP) to update the weights of the class covariance matrices for the CSP learning [52]. KLIEP is a covariate shift adaptation technique in which a Gaussian kernel is used to measure the distance between training and testing data, and the kernel width σ needs to be estimated using cross validation. An optimal setting of σ = 8 was also suggested based on the previous


experimental study of the SIWEER method [34]. In the experimental study, both cross validation and σ = 8 were investigated, and the results were quite similar. Finally, the results obtained using σ = 8 were selected for the comparison. The study of the accumulative ACSP was based upon a comparison with two existing accumulative ACSP methods [3,7]. The first method, briefly described in Section 2.2, uses a fixed experiential weight for the covariance matrix C_new of the testing trial [3], and is called ACSP-II in this work. The class label of a testing trial is first estimated in ACSP-II using the CSP features and a nonlinear SVM with the radial basis function (RBF) kernel. Then C_new is assigned to the estimated class to update its covariance matrix with a weight of 0.05, while the weight assigned to the current covariance is 0.95. Finally, the CSP learning is performed to update the spatial filters and features for the SVM classification. The second method is also described in Section 2.2 and named ACSP-III in the comparison study [7]. It performs an initial classification on the testing trial using the Filter Bank Common Spatial Pattern (FBCSP) method [53,54]. Then the covariance matrix of the estimated class is updated by assigning a weight of N/(N + n) to the current covariance, where N is the total number of training trials in the class, and n is the number of testing trials analyzed by ACSP-III so far. A weight of [n/(N + n)]KL(f_new, f_y) is assigned to the covariance of the testing trial, where y is the estimated class, and KL(f_new, f_y) is the KLD between the testing and training trials as defined in (8). After updating the covariance matrix, CSP learning is performed, and features are extracted based on the new spatial filters and classified using an SVM classifier. The updating of the covariance matrix, CSP learning, feature re-extraction, and classification can be repeated multiple times.
In this study, three iterations were used, as suggested in the original work [7]. Since the focus of this work is the adaptive CSP learning, and in order to keep the comparison under the same data/feature conditions, unobscured by contributions from advanced preprocessing, feature extraction/selection, or data classifiers, the input to all ACSP methods is minimally preprocessed to remove major artifacts as described in Section 2.5.1, and the same linear SVM is used as the data classifier for all methods. Some processing steps in ACSP-II and -III were not performed, including the subject-specific band-pass filtering and nonlinear SVM in ACSP-II, and the multi-band filtering and feature extraction in ACSP-III. Therefore, the numerical results reported in the next section are not directly comparable to the results published in the original works using the two methods [3,7], or to other works competing for the best results on the datasets from the BCI competitions.

3. Results

In order to determine an optimal m value in (5) for feature extraction after the CSP/ACSP training, candidate m values ranging from 1 to 5 were examined for datasets IIa, IIIa, and IVa using the classic CSP in the intra- and inter-subject studies. It was found that in most cases m = 2 generates the highest classification accuracy. Thus m = 2 was used for these three datasets in the CSP- and ACSP-based feature extraction in the experimental study. Since there are only three EEG channels in dataset IIb, m = 1 was used for this dataset.

3.1. Intra-subject study

Fig. 1 shows a comparison of the intra-subject classification accuracy between the CSP, SIWEER, and the non-accumulative implementation of the proposed ACSP method. Each circle in the figure represents the classification accuracy of an individual subject from one of the four datasets.
It was observed that the ACSP method outperforms the conventional CSP on most subjects, and ACSP-Ia, -Ib, and -Ic exhibit a similar performance in terms of the number of subjects showing increased accuracy. The SIWEER method can

improve the accuracy for some of the subjects. The overall classification accuracy P_a and κ values for the four datasets are given in Table 2. It can be seen that the adaptive method can improve the spatial filtering and lead to increased P_a and κ values compared to CSP. For dataset IIa, a P_a of 71.84% was achieved by ACSP-Ib with an associated κ value of 0.44. This P_a value is 14.43% higher than that of CSP. A two-tailed t-test was performed on the classification accuracies of all individual subjects, and the improvement obtained by ACSP-Ib is significant at the 0.003 level. The improvements of ACSP-Ia and -Ic are close to that of ACSP-Ib. For dataset IIb, the performances of ACSP-Ia, -Ib, and -Ic are also close to each other, and the increases in P_a are greater than 13.0% compared to CSP. After examining the accuracies of individual subjects, it was found that ACSP-Ia, -Ib, and -Ic improve the classification accuracy for all subjects in this dataset, with increases ranging from 0.62% to 31.88%. The t-test indicates that the increases in P_a obtained by ACSP-Ia, -Ib, and -Ic are significant at the 0.007 level. The highest P_a for dataset IIIa was from ACSP-Ia, with an increase of 21.35% compared to CSP. The corresponding κ is 0.51. The significance level of this increase is 0.18. The accuracies computed for individual subjects in this dataset range from 50.0% to 93.1% with a standard deviation of 22.60%. This large standard deviation is the primary reason for the high significance level. For dataset IVa, ACSP-Ib resulted in the highest accuracy, with a P_a of 76.55% and κ = 0.53. The increase in P_a is 9.65% compared to CSP, and the significance level of this increase is 0.09. The improvements of ACSP-Ia and -Ic are close to that of ACSP-Ib. It was also found that the improvement achieved by SIWEER is less significant than that of the proposed method.
Table 3 lists the Pa and κ values when half of the training data were used in each method. It was found that the final classification performance was not significantly affected for datasets IIa and IIb when ACSP-Ia, -Ib, -Ic, and SIWEER were used. For dataset IIIa, further increases of 10.39% (SIWEER) and 2.81% (ACSP-Ib) in Pa were achieved, while decreases of 16.79% (SIWEER) and 15.12% (ACSP-Ia, ACSP-Ib) were observed for dataset IVa. After checking the accuracies of individual subjects obtained by both methods, it was found that the largest decrease was from subject “ay”, which has the fewest training trials but the largest number of testing trials. The second largest decrease was from subject “aw”, and the third was from subject “av”. An increase in accuracy was observed for subjects “aa” and “al”. After reducing the training trials by half, the numbers of training trials are 84 for “aa”, 112 for “al”, 42 for “av”, 28 for “aw”, and 14 for “ay”. This implies that an increased number of training trials does not necessarily improve the CSP learning, because more training data can sometimes introduce more artifacts. On the other hand, if the number of training trials is too small to obtain representative spatial patterns for each subject, the CSP learning is also affected.

Fig. 1. A comparison of the intra-subject classification accuracies of all subjects in the four datasets between the classic CSP, SIWEER, and proposed non-accumulative ACSP. Each circle represents the classification accuracy from an individual subject in datasets IIa, IIb, IIIa, and IVa.

Table 2
A comparison of the intra-subject learning performance between the classic CSP, SIWEER, and proposed non-accumulative ACSP when training and testing data are from the same subjects in datasets IIa, IIb, IIIa, and IVa. Pa(%): overall accuracy, κ: kappa statistic, p: p value of the t-test.

            IIa                    IIb                    IIIa                   IVa
            Pa     κ     p         Pa     κ     p         Pa     κ     p        Pa     κ     p
CSP         57.41  0.15  NA        61.62  0.23  NA        53.93  0.08  NA       66.9   0.34  NA
SIWEER      59.18  0.18  0.54      61.73  0.23  0.35      57.87  0.16  0.34     68.81  0.38  0.35
ACSP-Ia     71.37  0.43  0.004     75.49  0.51  0.007     75.28  0.51  0.18     75.95  0.52  0.1
ACSP-Ib     71.84  0.44  0.003     75.53  0.51  0.007     74.72  0.49  0.18     76.55  0.53  0.09
ACSP-Ic     71.53  0.43  0.004     75.53  0.51  0.007     74.72  0.49  0.18     76.07  0.52  0.1

Table 3
A comparison of the intra-subject learning performance between the classic CSP, SIWEER, and proposed non-accumulative ACSP when half of the training data are used for the learning of spatial filters.

            IIa            IIb            IIIa           IVa
            Pa     κ       Pa     κ       Pa     κ       Pa     κ
CSP         60.73  0.21    61.88  0.24    60.11  0.2     52.08  0.04
SIWEER      60.42  0.21    62.46  0.25    68.26  0.37    52.02  0.04
ACSP-Ia     71.72  0.43    75.16  0.5     77.53  0.55    60.83  0.22
ACSP-Ib     71.53  0.43    75.11  0.5     78.09  0.56    61.43  0.23
ACSP-Ic     71.06  0.42    75.12  0.5     77.53  0.55    61.31  0.23

3.2. Inter-subject study

3.2.1. Multi-subject training

For the inter-subject study of ACSP with multi-subject training data, the training trials of all subjects in each dataset were used with an equal weight. Fig. 2 illustrates a comparison of the classification accuracies of all individual subjects in the four datasets between the CSP, SIWEER, and the non-accumulative ACSP. Table 4 gives the overall classification performance. For dataset IIa, ACSP-Ia, -Ib, and -Ic achieved an increase of 22.91% in Pa as compared to CSP at a significance level of 0.0004. An increase of 7.89% in Pa was obtained for dataset IIb using ACSP-Ib. This increase is significant at the 0.005 level. The highest improvement for dataset IIIa was obtained using ACSP-Ia, with an increase of 22.48% in Pa as compared to CSP at a significance level of 0.13. Similar to the intra-subject study, the high significance level is due to a large standard deviation of accuracies, which range from 56.67% to 90.0% with a standard deviation of 16.67%. For dataset IVa, the highest increase of 6.9% in Pa is from ACSP-Ib, and the significance level is 0.25. The individual subjects' accuracies range from 50% to 74.14% with a standard deviation of 10.73%. The relatively low increase and large standard deviation resulted in the high significance level. ACSP-Ia, -Ib, and -Ic exhibit a similar performance on these datasets. Slight increases in Pa were obtained by the SIWEER method for datasets IIa, IIIa, and IVa, but a decrease was observed for dataset IIb.

Reducing the training trials by half did not bring much variation in the classification performance for any of the four datasets when the non-accumulative ACSP was used, as shown in Table 5. Compared to the Pa values in Table 4, the highest Pa is slightly reduced for datasets IIa, IIb, and IVa, but slightly increased for dataset IIIa. When the SIWEER method was used with the reduced training sets, a 14.63% decrease in accuracy was observed for dataset IIb, but an increase of 8.71% was obtained for dataset IIIa. Comparing the results in Tables 2 and 4, it can be seen that the Pa and κ values obtained from CSP in the inter-subject study using multi-subject training are lower than those from the intra-subject study for three of the four datasets. The proposed ACSP method can raise the inter-subject classification performance to a level similar to that of the intra-subject classification for datasets IIa, IIb, and IIIa. For dataset IVa, the small numbers of training trials for some of the subjects and the unbalanced training and testing data are likely the two primary reasons why the ACSP method cannot bring the inter-subject classification performance close to the intra-subject case.

Fig. 2. A comparison of the inter-subject classification accuracies of all individual subjects in the four datasets between the classic CSP, SIWEER, and proposed non-accumulative ACSP when the training and testing data are from all subjects in each dataset.

Table 4
A comparison of the inter-subject learning performance between the classic CSP, SIWEER, and proposed non-accumulative ACSP when training and testing data are from all subjects in each dataset.

            IIa                      IIb                     IIIa                   IVa
            Pa     κ     p           Pa     κ     p          Pa     κ     p        Pa     κ     p
CSP         49.85  0.0   NA          67.15  0.34  NA         51.12  0.02  NA       51.79  0.04  NA
SIWEER      50.0   0.0   0.35        64.75  0.3   0.44       54.49  0.09  0.32     53.57  0.07  0.17
ACSP-Ia     72.76  0.46  0.0004      74.44  0.49  0.01       73.6   0.47  0.13     58.57  0.17  0.25
ACSP-Ib     72.76  0.46  0.0004      75.04  0.5   0.005      71.91  0.44  0.13     58.69  0.18  0.25
ACSP-Ic     72.76  0.46  0.0004      74.44  0.49  0.01       72.47  0.45  0.12     58.57  0.17  0.25

Table 5
A comparison of the inter-subject learning performance between the classic CSP, SIWEER, and proposed non-accumulative ACSP when half of the training data from all subjects in each dataset are used for the learning of spatial filters.

            IIa             IIb             IIIa           IVa
            Pa     κ        Pa     κ        Pa     κ       Pa     κ
CSP         50.19  0.004    50.12  0.002    69.1   0.38    50.18  0.0
SIWEER      50.5   0.01     50.12  0.002    63.2   0.26    53.33  0.06
ACSP-Ia     71.57  0.43     74.61  0.49     75.0   0.5     57.08  0.14
ACSP-Ib     71.6   0.43     74.61  0.49     74.44  0.49    57.02  0.14
ACSP-Ic     71.6   0.43     74.61  0.49     74.16  0.48    57.08  0.14
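The equal-weight multi-subject training used above can be made concrete with a classic-CSP sketch: each subject's class-conditional mean covariance is pooled with equal weight before the usual generalized eigendecomposition. This is a minimal baseline illustration, not the ACSP update itself; the function names and the choice of three filter pairs are our assumptions.

```python
import numpy as np
from scipy.linalg import eigh

def trial_cov(X):
    """Normalized spatial covariance of one trial X (channels x samples)."""
    C = X @ X.T
    return C / np.trace(C)

def csp_filters(cov1, cov2, n_pairs=3):
    """Solve the generalized eigenproblem cov1 w = lambda (cov1 + cov2) w and
    keep the eigenvectors with the smallest and largest eigenvalues."""
    vals, vecs = eigh(cov1, cov1 + cov2)
    order = np.argsort(vals)                      # eigenvalues ascending in [0, 1]
    keep = np.r_[order[:n_pairs], order[-n_pairs:]]
    return vecs[:, keep].T                        # rows are spatial filters

def pooled_class_cov(trials_per_subject):
    """Equal-weight pooling across subjects for one class:
    average each subject's mean trial covariance, then average over subjects.
    trials_per_subject: list (one entry per subject) of lists of
    (channels x samples) arrays."""
    subj_means = [np.mean([trial_cov(x) for x in subj], axis=0)
                  for subj in trials_per_subject]
    return np.mean(subj_means, axis=0)
```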

3.2.2. Single-subject training

The inter-subject study using single-subject training data is an extreme case that examines the adaptability of the proposed method. Fig. 3 shows a comparison of the classification accuracies of all individual subjects in the four datasets between the classic CSP, SIWEER, and the non-accumulative ACSP. Table 6 shows the overall classification performance. For dataset IIa, the highest increase in Pa compared to CSP was from ACSP-Ic, and the significance level of this 5.97% increase is 0.03. For dataset IIb, ACSP-Ib provided an increase of 13.75% in Pa at a significance level of 0.005. For dataset IIIa, the highest Pa was obtained using ACSP-Ia, and it is 3.93% higher than that of CSP. The highest improvement for dataset IVa was from ACSP-Ib, with a 2.83% increase in Pa at a significance level of 0.08. When SIWEER was used, slight increases in accuracy were obtained for all four datasets. Under such a challenging experimental condition, the proposed ACSP method and SIWEER can still improve the learning of spatial filters and result in increased Pa and κ. Additionally, ACSP-Ia, -Ib, and -Ic show a similar performance and outperform the SIWEER method.

Table 7 shows the classification performances of the CSP, SIWEER, and non-accumulative ACSP methods obtained using half of the training trials. Compared to the results in Table 6, the accuracies obtained by ACSP were slightly decreased for datasets IIa, IIb, and IIIa, but increased for dataset IVa. The accuracies obtained using the SIWEER method were increased for datasets IIb and IIIa, but decreased for datasets IIa and IVa. The Pa values obtained from ACSP are higher than those from CSP for datasets IIa, IIb, and IVa, but slightly lower for dataset IIIa. When SIWEER was used, the accuracies for datasets IIb, IIIa, and IVa are close to those obtained from CSP. Comparing the results in Tables 4 and 6, it can be seen that the classification performance of ACSP using a single subject's training data is lower than that using multi-subject training data. Similar observations were made for the SIWEER method, except for dataset IIa. This verifies that the use of multiple subjects' training data may improve the learning of spatial filters and the discrimination performance.

Fig. 3. A comparison of the inter-subject classification accuracies of all individual subjects in the four datasets between the classic CSP, SIWEER, and proposed non-accumulative ACSP when the training data are from only one subject in one of the four datasets and testing data are from all other subjects in the same dataset.

Table 6
A comparison of the inter-subject learning performance between the classic CSP, SIWEER, and proposed non-accumulative ACSP when training data are from only one subject in a dataset, and testing data are from all other subjects in the same dataset.

            IIa                     IIb                      IIIa                   IVa
            Pa     κ     p          Pa     κ     p           Pa     κ     p        Pa     κ     p
CSP         53.21  0.06  NA         56.27  0.13  NA          50.28  0.01  NA       51.01  0.02  NA
SIWEER      56.39  0.13  0.21       56.29  0.13  0.64        50.84  0.02  0.18     52.92  0.06  0.1
ACSP-Ia     58.71  0.17  0.03       69.85  0.39  0.005       54.21  0.08  0.18     53.24  0.06  0.12
ACSP-Ib     59.11  0.18  0.03       70.02  0.4   0.005       53.09  0.06  0.29     53.84  0.08  0.11
ACSP-Ic     59.19  0.18  0.03       70.0   0.4   0.005       53.37  0.07  0.25     53.66  0.07  0.07

Table 7
A comparison of the inter-subject learning performance between the classic CSP, SIWEER, and proposed non-accumulative ACSP when half of the training data from an individual subject in one of the four datasets are used for the learning of spatial filters, and testing data are from all other subjects in the same dataset.

            IIa             IIb             IIIa            IVa
            Pa     κ        Pa     κ        Pa     κ        Pa     κ
CSP         54.89  0.09     59.02  0.18     53.79  0.08     50.07  0.0
SIWEER      52.96  0.06     59.15  0.18     53.65  0.07     50.97  0.02
ACSP-Ia     58.37  0.17     67.75  0.36     53.37  0.07     55.85  0.12
ACSP-Ib     58.52  0.17     67.81  0.36     52.39  0.05     57.08  0.14
ACSP-Ic     58.42  0.17     67.81  0.36     53.37  0.07     57.2   0.14

Table 8
A comparison of the learning performance between the two accumulative implementations of the proposed ACSP method and two existing accumulative ACSP methods in the intra-subject study and the inter-subject study using single- and multi-subject training data.

Intra-subject

                    IIa            IIb            IIIa           IVa
                    Pa     κ       Pa     κ       Pa     κ       Pa     κ
ACSP-Ia  ACCU-1     62.65  0.25    74.93  0.49    75.84  0.52    58.93  0.18
         ACCU-2     69.14  0.38    75.46  0.51    75.84  0.52    67.38  0.35
ACSP-Ib  ACCU-1     62.11  0.24    74.96  0.49    75.84  0.52    58.21  0.16
         ACCU-2     72.38  0.45    75.42  0.51    75.28  0.51    68.69  0.37
ACSP-Ic  ACCU-1     62.27  0.24    74.96  0.49    75.84  0.52    56.43  0.13
         ACCU-2     71.99  0.44    75.56  0.51    76.4   0.53    67.98  0.36
ACSP-II             64.51  0.29    75.11  0.5     72.47  0.45    59.76  0.19
ACSP-III            62.19  0.24    68.49  0.37    74.72  0.49    58.57  0.17

Inter-subject (multi-subject)

                    IIa            IIb            IIIa           IVa
                    Pa     κ       Pa     κ       Pa     κ       Pa     κ
ACSP-Ia  ACCU-1     70.83  0.42    74.19  0.48    70.22  0.4     54.05  0.08
         ACCU-2     71.6   0.43    74.26  0.49    67.98  0.36    57.62  0.15
ACSP-Ib  ACCU-1     70.83  0.42    74.19  0.48    64.61  0.29    56.67  0.13
         ACCU-2     72.53  0.45    74.26  0.49    70.79  0.42    63.1   0.26
ACSP-Ic  ACCU-1     70.83  0.42    74.19  0.48    63.48  0.27    55.48  0.11
         ACCU-2     72.53  0.45    74.23  0.48    69.1   0.38    63.21  0.26
ACSP-II             59.49  0.19    74.19  0.48    55.62  0.11    54.4   0.09
ACSP-III            69.29  0.39    70.67  0.41    70.22  0.4     56.31  0.13

Inter-subject (single-subject)

                    IIa            IIb            IIIa           IVa
                    Pa     κ       Pa     κ       Pa     κ       Pa     κ
ACSP-Ia  ACCU-1     51.26  0.03    65.57  0.31    50.0   0.0     50.42  0.01
         ACCU-2     54.46  0.09    67.11  0.34    51.97  0.04    51.25  0.02
ACSP-Ib  ACCU-1     51.34  0.03    65.48  0.31    50.56  0.01    50.0   0.0
         ACCU-2     57.31  0.15    69.19  0.38    54.78  0.09    53.15  0.06
ACSP-Ic  ACCU-1     51.36  0.03    65.47  0.31    50.0   0.0     50.0   0.0
         ACCU-2     57.87  0.16    69.79  0.39    51.97  0.04    53.33  0.07
ACSP-II             52.38  0.04    63.31  0.27    50.28  0.01    49.76  0.0
ACSP-III            52.25  0.04    60.87  0.22    50.0   0.0     49.61  0.0

3.3. Non-accumulative vs. accumulative ACSP

In this work we also examined the effects of including classified trials in the training data when there is no feedback loop or “ground truth” providing the true class labels of testing trials. Table 8 shows the accuracies for the four datasets when the proposed accumulative ACSP implementations described in Section 2.4 were applied to the intra-subject study and the inter-subject study using single- and multi-subject training data. For comparison, the results obtained from the two existing accumulative ACSP methods (ACSP-II and -III) described in Section 2.5.2 are also provided. It was observed that ACCU-2 outperforms ACCU-1, ACSP-II, and -III in most cases. The performances of ACCU-1, ACSP-II, and -III are close to each other. ACCU-1, ACSP-II, and -III perform a class-specific update of the spatial covariance matrix, while ACCU-2 implements a weighted update of the spatial covariance matrices of both classes. Comparing the results in Table 8 to those shown in Tables 2, 4 and 6, it can be found that the proposed non-accumulative ACSP usually outperforms ACSP-II, -III, and the proposed accumulative ACSP.

To investigate how the proposed accumulative ACSP affects the classification performance, the EEG data from each evaluation session in datasets IIa and IIb were divided into four blocks with equal numbers of trials. The accuracy of each block was then computed from the results of the inter-subject study using single-subject training data over all evaluation sessions and all subjects in each dataset. Fig. 4 shows the overall accuracies of the individual blocks from the two datasets using the proposed non-accumulative (NACCU) ACSP, ACCU-1, and ACCU-2, where (a)–(c) are the block accuracies obtained from ACSP-Ia, -Ib, and -Ic using dataset IIa, and (d)–(f) are those obtained using dataset IIb. It was observed that the block accuracies from ACCU-1 and ACCU-2 are lower than those from the non-accumulative ACSP. This verifies that mis-classified trials can deteriorate the learning of spatial filters. In contrast, the non-accumulative ACSP does not accumulate spatial covariance information from the opposite class, and provides better discrimination performance. It was also found that the block accuracies obtained from ACCU-1 are lower than those from ACCU-2, which is consistent with the results shown in Table 8. In ACCU-2, the spatial covariance matrices are updated for both classes instead of one class. Although the accumulation of partial spatial covariance from the opposite class affects the discrimination performance of the spatial filters in ACCU-2, mis-classified trials have less effect on the learning of spatial filters in ACCU-2 than the accumulation of an entire spatial covariance from the opposite class. The observations from this comparison study imply that if true class labels cannot be provided for target trials, the non-accumulative ACSP would be the better choice for motor imagery BCI tasks.

Fig. 4. The block-wise classification accuracy obtained from the evaluation sessions of all subjects in datasets IIa and IIb when the proposed ACSP method is implemented in the two accumulative ways. ACCU-1 is the first proposed accumulative ACSP implementation; ACCU-2 is the second; NACCU represents the proposed non-accumulative ACSP. (a)–(c) The block-wise accuracies of ACSP-Ia, -Ib, and -Ic when NACCU, ACCU-1, and ACCU-2 are implemented on dataset IIa. (d)–(f) The corresponding block-wise accuracies on dataset IIb.

3.4. Limitation

A major limitation of the proposed ACSP method is that its performance is affected by EEG artifacts, such as eye blinks and swallowing. Cy is the average covariance matrix of all trials in the training data, and Cnew is the covariance matrix of a testing trial. Artifacts in training trials are attenuated by the averaging, while artifacts in a testing trial may lead to an ill-conditioned Cnew that generates unreliable estimates of the similarity measures defined in (7), (10) and (12). For instance, a poorly estimated Cnew dominates the symmetric KLD defined in (9) [1], and as a result, the similarity measure computed using the symmetric KLD is not reliable. Similar issues arise for the other two similarity measures. Therefore, some preprocessing is necessary to remove EEG artifacts before the ACSP learning.
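The sensitivity to artifacts discussed above can be illustrated numerically. For zero-mean Gaussian models, the symmetric KLD between covariances C1 and C2 reduces to 0.5*(tr(C2^{-1}C1) + tr(C1^{-1}C2)) - n. The sketch below is our illustration, not the authors' implementation; it shows how an artifact-dominated, near-singular Cnew inflates the measure.

```python
import numpy as np

def sym_kld(c1, c2):
    """Symmetric KL divergence between zero-mean Gaussians N(0, c1), N(0, c2)."""
    n = c1.shape[0]
    return 0.5 * (np.trace(np.linalg.solve(c2, c1))
                  + np.trace(np.linalg.solve(c1, c2))) - n

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 200))
c_train = (A @ A.T) / 200            # well-conditioned average covariance

B = rng.standard_normal((8, 200))
c_test = (B @ B.T) / 200             # an ordinary testing-trial covariance
d_clean = sym_kld(c_train, c_test)

# An artifact-dominated trial: one direction carries almost all variance,
# leaving the covariance close to singular and the divergence unreliable.
u = rng.standard_normal((8, 1))
c_bad = 1e-6 * np.eye(8) + u @ u.T
d_bad = sym_kld(c_train, c_bad)      # orders of magnitude larger than d_clean
```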

4. Conclusion

The CSP method has been shown to be an efficient but subject-specific tool for identifying discriminative spatial patterns in EEG-based BCI systems. In order to improve the single- and multi-subject performance of CSP, an adaptive CSP method is proposed that integrates the spatial covariance of target data into the learning of spatial filters. The proposed ACSP method was evaluated in a comparison study with the classic CSP method and SIWEER, a CSP extension that implements an importance-weighted covariate shift adaptation to update spatial filters for the CSP learning. The motor imagery data provided by BCI competitions III and IV were used in the comparison study. Three circumstances of BCI classification were examined, including intra-subject classification and inter-subject classification using multi-subject and single-subject training data. Experimental results show that the proposed ACSP outperforms the classic CSP and SIWEER methods in all three circumstances. To investigate whether classified EEG trials could be added to the training data to improve the ACSP learning when true class labels of target data are not instantly available, two accumulative implementations of the proposed ACSP were investigated and compared to two existing accumulative ACSP methods. The results show that one of the proposed accumulative ACSP implementations outperforms the others. In addition, the non-accumulative ACSP outperforms the accumulative ones. There are two major innovations in the proposed method. First, it does not estimate class labels for target data during the adaptive learning. Second, it updates spatial filters for both classes simultaneously. Since the purpose of this study is to examine the performance of the proposed adaptive learning method, the algorithm does not include advanced preprocessing and classification to compete with the best of the BCI competitions. In future work, the proposed method will be integrated with advanced noise/artifact removal, feature extraction/selection, and nonlinear data classification methods, such as those showing impressive performances in past BCI competitions, and will be evaluated on more EEG data.

Conflict of interest statement

None declared.


References

[1] W. Samek, M. Kawanabe, K.-R. Müller, Divergence-based framework for common spatial patterns algorithms, IEEE Rev. Biomed. Eng. 7 (2014) 50–72.
[2] S. Sun, J. Zhou, A review of adaptive feature extraction and classification methods for EEG-based brain–computer interfaces, in: Proceedings of the International Joint Conference on Neural Networks, 2014, pp. 1746–1753.
[3] S. Sun, C. Zhang, Adaptive feature extraction for EEG signal classification, Med. Biol. Eng. Comput. 44 (2006) 931–935.
[4] Y. Li, C. Guan, An extended EM algorithm for joint feature extraction and classification in brain–computer interfaces, Neural Comput. 18 (2006) 2730–2761.
[5] B. Blankertz, M. Kawanabe, R. Tomioka, F. Hohlefeld, V. Nikulin, K.-R. Müller, Invariant common spatial patterns: alleviating nonstationarities in brain–computer interfacing, Adv. Neural Inf. Process. Syst. 20 (2008) 1–8.
[6] H. Kang, Y. Nam, S. Choi, Composite common spatial pattern for subject-to-subject transfer, IEEE Signal Process. Lett. 16 (8) (2009) 683–686.
[7] A. Bamdadian, C. Guan, K. Ang, J. Xu, Online semi-supervised learning with KL distance weighting for motor imagery-based BCI, in: Proceedings of the 34th International Conference of IEEE EMBS, 2012, pp. 2732–2735.
[8] W. Samek, F. Meinecke, K.-R. Müller, Transferring subspaces between subjects in brain–computer interfacing, IEEE Trans. Biomed. Eng. 60 (8) (2013) 2289–2298.
[9] H. Wang, D. Xu, Comprehensive common spatial patterns with temporal structure information of EEG data: minimizing nontask related EEG component, IEEE Trans. Biomed. Eng. 59 (9) (2012) 2496–2505.
[10] W. Samek, C. Vidaurre, K.-R. Müller, M. Kawanabe, Stationary common spatial patterns for brain computer interfacing, J. Neural Eng. 9 (026013) (2012) 1–14.
[11] K. Thomas, C. Guan, C. Lau, A. Vinod, K. Ang, Adaptive tracking of discriminative frequency components in electroencephalograms for a robust brain–computer interface, J. Neural Eng. 8 (036007) (2011) 1–15.
[12] F. Lotte, C. Guan, Learning from other subjects helps reducing brain–computer interface calibration time, in: Proceedings of the 35th IEEE International Conference on Acoustics, Speech, and Signal Processing, 2010, pp. 614–617.
[13] W. Samek, M. Kawanabe, C. Vidaurre, Group-wise stationary subspace analysis – a novel method for studying non-stationarities, in: Proceedings of the Fifth International Brain–Computer Interface Conference, 2011, pp. 16–20.
[14] F. Lotte, C. Guan, Regularizing common spatial patterns to improve BCI designs: unified theory and new algorithms, IEEE Trans. Biomed. Eng. 58 (2) (2011) 1318–1324.
[15] D. Devlaminck, B. Wyns, M. Grosse-Wentrup, G. Otte, P. Santens, Multisubject learning for common spatial patterns in motor-imagery BCI, Comput. Intell. Neurosci. 2011 (217987) (2011) 1–9.
[16] M. Krauledat, M. Tangermann, B. Blankertz, K.-R. Müller, Towards zero training for brain computer interfacing, PLoS One 3 (8) (2008) 1–12.
[17] P. Shenoy, M. Krauledat, B. Blankertz, R. Rao, K.-R. Müller, Towards adaptive classification for BCI, J. Neural Eng. 3 (2006) R13–R23.
[18] C. Vidaurre, C. Sannelli, K.-R. Müller, B. Blankertz, Machine-learning-based coadaptive calibration for brain–computer interfaces, Neural Comput. 23 (3) (2011) 791–816.
[19] S. Lu, C. Guan, H. Zhang, Unsupervised brain computer interface based on intersubject information and online adaptation, IEEE Trans. Neural Syst. Rehabil. Eng. 17 (2) (2009) 135–145.
[20] Y. Li, H. Kambara, Y. Koike, M. Sugiyama, Application of covariate shift adaptation techniques in brain–computer interfaces, IEEE Trans. Biomed. Eng. 57 (6) (2010) 1318–1324.
[21] M. Arvaneh, C. Guan, K. Ang, C. Quek, Optimizing spatial filters by minimizing within-class dissimilarities in electroencephalogram-based brain–computer interface, IEEE Trans. Neural Netw. Learn. Syst. 24 (4) (2013) 610–618.
[22] H. Kang, S. Choi, Bayesian multi-subject common spatial patterns with Indian buffet process priors, in: Proceedings of the 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, 2013, pp. 3347–3351.
[23] C. Vidaurre, M. Kawanabe, P. von Bünau, B. Blankertz, K.-R. Müller, Toward unsupervised adaptation of LDA for brain computer interfaces, IEEE Trans. Biomed. Eng. 58 (3) (2011) 587–597.
[24] C. Vidaurre, A. Schlögl, R. Cabeza, R. Scherer, G. Pfurtscheller, Study of online adaptive discriminant analysis for EEG-based brain computer interfaces, IEEE Trans. Biomed. Eng. 54 (3) (2007) 550–556.
[25] C. Gouy-Pailler, M. Congedo, C. Brunner, C. Jutten, G. Pfurtscheller, Nonstationary brain source separation for multiclass motor imagery, IEEE Trans. Biomed. Eng. 57 (2) (2010) 469–478.
[26] M. Alamgir, M. Grosse-Wentrup, Y. Altun, Multitask learning for brain–computer interfaces, in: Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, 2010, pp. 17–24.
[27] S. Fazli, F. Popescu, M. Danóczy, B. Blankertz, K.-R. Müller, C. Grozea, Subject-independent mental state classification in single trials, Neural Netw. 22 (2009) 1305–1312.
[28] R. Tomioka, K.-R. Müller, A regularized discriminative framework for EEG analysis with application to brain–computer interface, NeuroImage 49 (2010) 415–432.
[29] B. Blankertz, S. Lemm, M. Treder, S. Haufe, K.-R. Müller, Single-trial analysis and classification of ERP components – a tutorial, NeuroImage 56 (2011) 814–825.
[30] M. Sugiyama, M. Krauledat, K.-R. Müller, Covariate shift adaptation by importance weighted cross validation, J. Mach. Learn. Res. 8 (2007) 985–1005.
[31] P.-J. Kindermans, D. Verstraeten, B. Schrauwen, A Bayesian model for exploiting application constraints to enable unsupervised training of a P300-based BCI, PLoS One 7 (2012) e33758.
[32] P.-J. Kindermans, M. Schreuder, B. Schrauwen, K.-R. Müller, M. Tangermann, True zero-training brain–computer interfacing – an online study, PLoS One 9 (7) (2014) e102504.
[33] J. Höhn, E. Holz, P. Staiger-Sälzer, K.-R. Müller, A. Kübler, M. Tangermann, Motor imagery for severely motor-impaired patients: evidence for brain–computer interfacing as superior control solution, PLoS One 9 (8) (2014) e104854.
[34] S. Sun, Semi-supervised feature extraction for EEG classification, Pattern Anal. Appl. 16 (2013) 213–222.
[35] P. von Bünau, F. Meinecke, F. Kira, K.-R. Müller, Finding stationary subspaces in multivariate time series, Phys. Rev. Lett. 103 (2009) 214101.
[36] W. Wojcikiewicz, C. Vidaurre, M. Kawanabe, Improving classification performance of BCIs by using stationary common spatial patterns and unsupervised bias adaptation, in: Lecture Notes in Computer Science, vol. 6679, 2011, pp. 34–41.
[37] B. Blankertz, R. Tomioka, S. Lemm, M. Kawanabe, K.-R. Müller, Optimizing spatial filters for robust EEG single-trial analysis, IEEE Signal Process. Mag. 25 (1) (2008) 41–56.
[38] C. Chen, W. Song, J. Zhang, Z. Hu, H. Xu, An adaptive feature extraction method for motor-imagery BCI systems, in: Proceedings of the International Conference on Computational Intelligence and Security, 2010, pp. 275–279.
[39] H. Lu, K. Plataniotis, A. Venetsanopoulos, Regularized common spatial patterns with generic learning for EEG signal classification, in: Proceedings of the IEEE International EMBC Conference, 2009, pp. 6599–6602.
[40] R. Tomioka, J. Hill, B. Blankertz, K. Aihara, Adapting spatial filtering methods for nonstationary BCIs, in: Proceedings of the 2006 Workshop on Information-Based Induction Sciences, 2006, pp. 65–70.
[41] Q. Zhao, L. Zhang, A. Cichocki, J. Li, Incremental common spatial pattern algorithm for BCI, in: Proceedings of the International Joint Conference on Neural Networks, 2008, pp. 2657–2660.
[42] J. Müller-Gerking, G. Pfurtscheller, H. Flyvbjerg, Designing optimal spatial filters for single-trial EEG classification in a movement task, Clin. Neurophysiol. 110 (1999) 787–798.
[43] H. Ramoser, J. Müller-Gerking, G. Pfurtscheller, Optimal spatial filtering of single trial EEG during imagined hand movement, IEEE Trans. Rehabil. Eng. 8 (4) (2000) 441–446.
[44] A.P. Dempster, N.M. Laird, D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, J. R. Stat. Soc. Ser. B 39 (1) (1977) 1–38.
[45] J.A. Bilmes, A Gentle Tutorial on the EM Algorithm and Its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models, Technical Report TR-97-021, International Computer Science Institute, 1998.
[46] B. Blankertz, K.-R. Müller, et al., The BCI competition III: validating alternative approaches to actual BCI problems, IEEE Trans. Neural Syst. Rehabil. Eng. 14 (2006) 153–159.
[47] M. Tangermann, K.-R. Müller, et al., Review of the BCI competition IV, Front. Neurosci. 6 (55) (2012) 1–31.
[48] A. Schlögl, C. Keinrath, D. Zimmermann, R. Scherer, R. Leeb, G. Pfurtscheller, A fully automated correction method of EOG artifacts in EEG recordings, Clin. Neurophysiol. 118 (1) (2007) 98–104.
[49] G. Pfurtscheller, C. Neuper, Motor imagery and direct brain–computer communication, Proc. IEEE 89 (7) (2001) 1123–1134.
[50] J. Cohen, A coefficient of agreement for nominal scales, Educ. Psychol. Meas. 20 (1960) 37–46.
[51] S. Sun, The extreme energy ratio criterion for EEG feature extraction, in: Lecture Notes in Computer Science, vol. 5164, 2008, pp. 919–928.
[52] M. Sugiyama, T. Suzuki, S. Nakajima, H. Kashima, P. von Bünau, M. Kawanabe, Direct importance estimation for covariate shift adaptation, Ann. Inst. Stat. Math. 60 (4) (2008) 699–746.
[53] K. Ang, Z. Chin, H. Zhang, C. Guan, Filter bank common spatial pattern (FBCSP) algorithm using online adaptive and semi-supervised learning, in: Proceedings of the 2011 IEEE International Joint Conference on Neural Networks, 2011, pp. 392–396.
[54] K. Ang, Z. Chin, C. Wang, C. Guan, H. Zhang, Filter bank common spatial pattern algorithm on BCI competition IV datasets 2a and 2b, Front. Neurosci. 6 (39) (2012) 1–9.
