Artificial Intelligence in Medicine 62 (2014) 165–177


Transductive domain adaptive learning for epileptic electroencephalogram recognition

Changjian Yang a, Zhaohong Deng a,b,∗∗, Kup-Sze Choi c,∗, Yizhang Jiang a, Shitong Wang a

a School of Digital Media, Jiangnan University, 1800 Lihu Avenue, Wuxi, Jiangsu Province 214122, PR China
b Department of Biomedical Engineering, University of California, Davis, One Shields Avenue, Davis, CA 95616-5270, USA
c Centre for Smart Health, School of Nursing, Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong

Article info

Article history: Received 12 November 2013; received in revised form 15 August 2014; accepted 8 October 2014.

Keywords: Transfer learning; wavelet packet decomposition; short time Fourier transform; kernel principal component analysis; electroencephalogram; epilepsy detection.

Abstract

Objective: Intelligent recognition of electroencephalogram (EEG) signals is an important means of epilepsy detection. Almost all conventional intelligent recognition methods assume that the training and testing data of the EEG signals have identical distributions. In practice, however, this assumption is often invalid because the distributions of the training and testing data differ, rendering conventional epilepsy detection algorithms infeasible in such situations. To overcome this problem, we propose a transfer-learning-based intelligent recognition method for epilepsy detection.

Methods: We used the large-margin-projected transductive support vector machine (LMPROJ) to learn useful knowledge between the training domain and the testing domain by calculating the maximum mean discrepancy. The method can effectively learn a model for the testing data from training data with a different distribution, thereby relaxing the constraint that the distributions of the training and testing samples be identical.

Results: Experimental validation was performed over six datasets of EEG signals with three feature extraction methods. The proposed LMPROJ-based transfer learning method was compared with five conventional classification methods. For the datasets with identical distributions, the performance of the six classification methods was comparable, with all achieving an accuracy of 90%. However, LMPROJ clearly outperformed the five conventional methods on the experimental datasets whose training and test data have different distributions. Regardless of the feature extraction method applied, the mean classification accuracy of the proposed method was above 93%, greater than that of the other five methods with statistical significance.
Conclusion: The proposed transfer-learning-based method has better classification accuracy and adaptability than the conventional methods in classifying EEG signals for epilepsy detection. © 2014 Elsevier B.V. All rights reserved.

1. Introduction

Epilepsy is one of the most common brain disorders, affecting 50 million people worldwide [1]. It is a transient brain dysfunction caused by abnormal activity of brain neurons. There has been increasing interest in the use of intelligent recognition technology for the detection of epilepsy based on electroencephalogram (EEG) signals. It has become an

∗ Corresponding author. Tel.: +852 3400 3214; fax: +852 2364 9663.
∗∗ Co-corresponding author at: School of Digital Media, Jiangnan University, 1800 Lihu Avenue, Wuxi, Jiangsu Province 214122, PR China. Tel.: +86 13771571629.
E-mail addresses: [email protected] (Z. Deng), [email protected] (K.-S. Choi).
http://dx.doi.org/10.1016/j.artmed.2014.10.002
0933-3657/© 2014 Elsevier B.V. All rights reserved.

important means for epilepsy detection, since much physiological and pathological information about the brain can be obtained from EEG signals. At present, a variety of intelligent recognition methods [2–7] have been applied to EEG signal identification, including the decision tree (DT) algorithm [6], the naïve Bayes (NB) algorithm [4,6], the support vector machine (SVM) algorithm [3], the nearest-mean (NM) algorithm [4], and linear discriminant analysis (LDA) [2,5,7]. Although these existing methods have demonstrated both effectiveness and feasibility for epilepsy detection, they all face a critical challenge: their effectiveness depends on the premise that the training data and testing data are drawn from samples with an identical distribution. However, this assumption cannot be satisfied in many practical application scenarios. For instance, when drift exists between the distributions of the training


and testing EEG samples, the performance of the conventional intelligent recognition methods deteriorates significantly. In fact, the distributions of EEG data collected from epileptic patients and from normal people may vary from time to time depending on multiple factors, such as health status, drug actions and the timing of the EEG measurements. Typically, the collected EEG signals can be divided into three classes: (1) Class 1: EEG signals obtained from healthy people in normal conditions; (2) Class 2: EEG signals derived from epileptic patients during the seizure-free interval (preictal); and (3) Class 3: EEG signals obtained from epileptic patients during seizure (ictal). The distribution characteristics of these three classes of signals are somewhat independent. Generally, conventional recognition methods use the labeled data from the first two classes to build a classification model. If this model is used to identify Class 3 EEG signals, the identification accuracy declines greatly, and the existing intelligent recognition methods are no longer appropriate for handling such situations. Hence, an epileptic EEG identification method that is more adaptable to differences in EEG data distributions is needed to meet this challenge.

In this paper, a novel epileptic EEG signal recognition method based on transfer learning is proposed. The method is advantageous in that it allows for differences in distribution between the training and testing data, which greatly improves the adaptability of epileptic EEG signal identification.

The rest of this paper is organized as follows. In Section 2, classical feature extraction and recognition methods for epileptic EEG signals are reviewed briefly. In Section 3, the EEG signal recognition classifier based on transfer learning is proposed. The experimental studies are reported in Section 4, and the conclusion is given in the last section.

2. Related work

2.1. EEG signal processing methods

Identification of epileptic EEG signals is generally divided into two steps. First, feature extraction methods are employed to extract useful features from the original EEG signals. A classifier is then trained on the EEG training data and subsequently used to classify the testing data of epileptic EEG signals. This section provides a brief review of several classical feature extraction and classification methods that have been widely applied to EEG signal recognition for epilepsy detection.

2.1.1. Feature extraction methods

The major feature extraction methods used for epileptic EEG signals can be divided into three types: (1) time-domain analysis, (2) frequency-domain analysis and (3) time-frequency analysis. In time-domain analysis, features are extracted by analyzing waveform parameters of the EEG signals, such as the average value, amplitude and variance of the waveform. Spike waves, sharp waves and slow waves in EEG signals can all be observed in the time domain [10]. For example, Litt et al. used time-domain analysis to obtain useful features from EEG signals and identified regular patterns of epileptic seizures [8]. In frequency-domain analysis, features of the EEG signals are analyzed in the frequency domain. Power spectrum analysis of EEG signals transforms changes in EEG signal amplitude into changes in EEG signal power so that variations in brain waves at different frequencies can be observed directly. A feature extraction method for EEG signals based on the FFT and principal component analysis has been proposed [9]. Time-frequency analysis is based on the fact that EEG signals contain not only distinctive features in the time domain but also energy distribution characteristics in the frequency domain.
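The time-domain waveform parameters mentioned above (average value, amplitude, variance) are straightforward to compute. The following minimal sketch is our own illustration, not the authors' code; the function name and the synthetic test signal are assumptions for demonstration only.

```python
import numpy as np

def time_domain_features(signal):
    """Basic waveform parameters of an EEG segment (illustrative sketch)."""
    signal = np.asarray(signal, dtype=float)
    return {
        "mean": signal.mean(),                     # average value of the waveform
        "amplitude": signal.max() - signal.min(),  # peak-to-peak wave amplitude
        "variance": signal.var(),                  # wave variance
    }

# Example on a synthetic 1-second segment at the Bonn sampling rate (173.6 Hz)
t = np.arange(0, 1, 1 / 173.6)
x = np.sin(2 * np.pi * 5 * t)  # a 5 Hz test tone standing in for an EEG trace
feats = time_domain_features(x)
```

A real pipeline would compute such parameters per segment and feed them to one of the classifiers reviewed below.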

According to the known characteristics of epileptic waveforms, the typical frequency ranges of slow waves, sharp waves and spike waves are 1–2.5 Hz, 5–12.5 Hz, and 13.5–50 Hz, respectively [10]. The wavelet transform can be used to convert the EEG signals and split them exactly into these three frequency bands. This provides a convenient way to extract features from EEG signals, making the wavelet transform a widely used approach for epilepsy detection. Susana et al. applied EEG time-frequency analysis to identify the characteristics in the frequency bands of EEG signals and studied their dynamic changes and time evolution [11]. Besides, a modified fast wavelet transform method was proposed to achieve high computational speed and improved accuracy [12].

2.1.2. Classification methods

Many intelligent classification methods have been applied to the identification of epileptic EEG signals since the 1990s [13–20]. A brief introduction to the commonly used methods is given as follows.

(1) Support vector machine: SVM is an effective tool for solving pattern recognition and function estimation problems [3]. It is particularly useful for classification involving small, high-dimensional datasets, and has been widely used in intelligent epileptic EEG detection [13].
(2) Decision tree: A decision tree and classification rules are generated by processing the training data with an induction method. The testing data are then classified using the obtained decision tree and rules. Decision tree classifiers have been used for the recognition of EEG signals with features extracted by the fast Fourier transform [14].
(3) Naïve Bayes algorithm: The algorithm is derived from the Bayes theorem in probability theory. Automatic detection of spike waves between epileptic periods has been realized using a data mining model based on NB [15].
(4) Linear discriminant analysis: While LDA is a classical and widely used feature extraction method, it can be further exploited for classification [16]. For EEG signals, LDA has been used for feature extraction and identification by employing the extracted features as new features for classification [17].
(5) Nearest-mean algorithm: NM is a well-known algorithm that has been used in classification problems involving high-dimensional data [18,19]. It has been successfully applied to the detection of epileptic EEG spikes [20].

2.2. Challenges for the conventional methods

The existing intelligent recognition methods described above have achieved success in applications concerning epileptic EEG detection, demonstrating a high level of classification accuracy and validity. However, all these methods are based on the same assumption that the training data and testing data originate from samples with an identical distribution. When the training and testing datasets are acquired from different, yet related, distributions, the performance of these conventional methods degrades significantly. To tackle this challenge, this paper proposes a more adaptive identification algorithm for epileptic EEG signals based on transfer learning.

3. Epileptic EEG signal recognition based on transfer learning

3.1. Transfer learning technology

Conventional classification methods employ a large amount of labeled training data to obtain a decision function, which is then applied to categorize the unlabeled test samples. These methods are all based on the premise that the training data and testing data have the same distribution. When the distribution characteristics of the EEG signal samples do not meet this requirement, satisfactory classification results cannot be achieved with the conventional methods. In recent years, transfer learning is being

developed as a promising research direction for tackling this issue [21,22]. Transfer learning methods investigate knowledge transfer between related (similar but different) fields or domains, capitalizing on the useful knowledge in one domain (usually referred to as the source domain) to enhance the classification performance of a classifier in another domain (usually referred to as the target domain). Depending on whether the target domain contains labeled samples, transfer learning techniques can be divided into two categories: inductive transfer learning and transductive transfer learning [23]. Inductive transfer learning is mainly used in scenarios where a large amount of labeled data is available in the source domain and only a small amount in the target domain. Transductive transfer learning, on the other hand, applies to scenarios where labeled data are available only in the source domain and the target domain contains unlabeled data only. Clearly, transductive transfer learning can be applied to a wider variety of problems than the inductive method. In this paper, we aim to identify epileptic EEG signals without any labeled data in the target domain, and therefore the transductive transfer learning method is adopted for epileptic EEG signal recognition. Among the existing transductive transfer methods, the large-margin-projected transductive support vector machine (LMPROJ) is simple and easy to implement and, more importantly, its performance is superior to most other existing transductive transfer learning methods. LMPROJ makes use of the maximum mean discrepancy between the training domain and the test domain to learn useful knowledge, along with the incorporation of the large margin learning mechanism. In this paper, we investigate automatic epilepsy detection based on EEG signals using the LMPROJ-based transductive learning method.

3.2. LMPROJ

LMPROJ is a transductive learning method based on the large margin mechanism in the feature space [24]. To achieve a feature transformation suited to the classification task in the target domain, LMPROJ minimizes the distribution distance between the projected training data in the source domain and the projected testing data in the target domain, measured by the maximum mean discrepancy (MMD), together with the regularized risk.

First, we briefly introduce the formulation of SVM, which is the basis of the LMPROJ method. Given a dataset (x_1, y_1), ..., (x_n, y_n) ∈ X × {±1}, SVM attempts to obtain a hyperplane f dividing the given training dataset into two categories. It has been shown that a hyperplane with a large margin achieves minimum structural risk [25]. The corresponding objective function for SVM training is defined as

  f = \arg\min_{f \in H_K} C \sum_{i=1}^{n} V(x_i, y_i, f) + \frac{1}{2}\|f\|_K^2 \qquad (1)

where H_K is a kernel mapping space; ||f||_K is the L2 norm of the decision function f [26]; V is a function measuring the risk of the predicted results on the training data; and C is a regularization coefficient balancing the two terms in the objective function. When the decision function f is a linear function expressed by the projection vector w and the bias b, the optimization objective of SVM can be represented as

  \min_{w,b}\ \frac{1}{2}\|w\|^2 + C \sum_{j=1}^{n} \varepsilon_j
  \text{s.t.}\ \varepsilon_j \ge 0,\quad y_j\big(w^T \phi(x_j) + b\big) \ge 1 - \varepsilon_j \qquad (2)

where φ(x_j) is the kernel feature vector of x_j in a reproducing kernel Hilbert space (RKHS), ε_j are the slack variables of the training samples, and b is the bias of the linear hyperplane.

The LMPROJ method estimates the projected distance between the distributions of the source domain and the target domain for transfer learning in the RKHS based on the MMD measure [27]. Given a set of n training samples D_s = {(x_1, y_1), ..., (x_n, y_n)} and a set of m testing samples D_t = {z_1, ..., z_m}, the squared MMD distance between the distributions of the two domains is expressed as

  \mathrm{MMD}^2 = \Big\| \frac{1}{n}\sum_{j=1}^{n}\phi(x_j) - \frac{1}{m}\sum_{k=1}^{m}\phi(z_k) \Big\|^2 = \frac{1}{n^2}\sum_{j,k=1}^{n} K(x_j, x_k) + \frac{1}{m^2}\sum_{j,k=1}^{m} K(z_j, z_k) - \frac{2}{nm}\sum_{j,k=1}^{n,m} K(x_j, z_k) \qquad (3)

where K(·,·) is a kernel function.

The goal of the MMD-mechanism-based transfer learning method is to obtain a projection vector w that minimizes the distance between the two distributions in the projected space while, at the same time, maximizing the classification performance of the classifier on the training data. Thus, the general objective decision function for transfer learning based on MMD can be formulated as

  f = \arg\min\ C \sum_{k=1}^{n} V(x_k, y_k, w) + \frac{1}{2}\|f\|_K^2 + \lambda\, d_{f,k}(P, \tilde{P}) \qquad (4)

Compared with Eq. (1), an additional term, the distance measure d_{f,k}(P, \tilde{P}) of the two distributions, is introduced into Eq. (4), where P represents the distribution of the training samples, \tilde{P} is the distribution of the testing samples, and λ is a coefficient controlling the influence of the distribution distance term in Eq. (4).

Based on the general framework above, LMPROJ uses the MMD distance to measure d_{f,k}(P, \tilde{P}) [28]. Given the training dataset D_s and the testing dataset D_t above, the LMPROJ method estimates the distribution distance under a given projection w with the MMD as follows:

  d_{f,k}^2(P, \tilde{P}) = \Big\| \frac{1}{n}\sum_{j=1}^{n} f(x_j) - \frac{1}{m}\sum_{k=1}^{m} f(z_k) \Big\|^2
  = \Big\| \frac{1}{n}\sum_{j=1}^{n}\big(w^T\phi(x_j) + b\big) - \frac{1}{m}\sum_{k=1}^{m}\big(w^T\phi(z_k) + b\big) \Big\|^2
  = \Big( \frac{1}{n}\sum_{j=1}^{n} w^T\phi(x_j) \Big)^2 + \Big( \frac{1}{m}\sum_{k=1}^{m} w^T\phi(z_k) \Big)^2 - \frac{2}{nm}\sum_{j,k=1}^{n,m} w^T\phi(x_j)\, w^T\phi(z_k) \qquad (5)

Based on the objective function in Eq. (4) and the distance measure in Eq. (5), the final objective function of LMPROJ can be expressed as

  \min_{w,b}\ \frac{1}{2}\|w\|^2 + C\sum_{j=1}^{n}\varepsilon_j + \lambda\, d_{f,k}^2(P, \tilde{P})
  \text{s.t.}\ \varepsilon_j \ge 0,\quad y_j\big(w^T\phi(x_j) + b\big) \ge 1 - \varepsilon_j,\quad \forall j = 1, \ldots, n \qquad (6)
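As a concrete illustration, the squared MMD of Eq. (3) can be computed directly from kernel evaluations. The following minimal numpy sketch is our own illustration, not the authors' code; the RBF kernel, the `gamma` value and the synthetic Gaussian samples standing in for EEG features are assumptions for demonstration.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # K[i, j] = exp(-gamma * ||A_i - B_j||^2)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def squared_mmd(X, Z, gamma=1.0):
    """Biased estimate of MMD^2 between samples X (n x d) and Z (m x d),
    following Eq. (3): (1/n^2)SumK(x,x') + (1/m^2)SumK(z,z') - (2/nm)SumK(x,z)."""
    n, m = len(X), len(Z)
    return (rbf_kernel(X, X, gamma).sum() / n**2
            + rbf_kernel(Z, Z, gamma).sum() / m**2
            - 2.0 * rbf_kernel(X, Z, gamma).sum() / (n * m))

rng = np.random.default_rng(0)
# Same distribution: MMD^2 should be near zero; shifted distribution: clearly larger.
same = squared_mmd(rng.normal(0, 1, (100, 2)), rng.normal(0, 1, (100, 2)))
diff = squared_mmd(rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2)))
```

With identical distributions the estimate stays close to zero, while drifted distributions yield a clearly positive value, which is exactly the signal LMPROJ exploits in Eqs. (4)–(6).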


where the vector w is a projection vector in the RKHS. According to the representer theorem [29], the projection vector w can be rewritten as

  w = \sum_{i=1}^{m+n} \beta_i \phi(s_i) = \phi(S)\beta \qquad (7)

where φ(S) is the matrix containing both the training samples and the testing samples, i.e., φ(S) = [φ(s_1), ..., φ(s_{n+m})] = [φ(x_1), ..., φ(x_n), φ(z_1), ..., φ(z_m)]. Thus the projected distance between the two distributions can be further expressed as

  d_{f,k}^2(P, \tilde{P}) = \frac{1}{n^2}\beta^T K_{\mathrm{Train}}[1]_{n\times n}K_{\mathrm{Train}}^T\beta + \frac{1}{m^2}\beta^T K_{\mathrm{Test}}[1]_{m\times m}K_{\mathrm{Test}}^T\beta - \frac{1}{mn}\beta^T\big(K_{\mathrm{Train}}[1]_{n\times m}K_{\mathrm{Test}}^T + K_{\mathrm{Test}}[1]_{m\times n}K_{\mathrm{Train}}^T\big)\beta \qquad (8)

where K_Train is the (m + n) × n kernel matrix of the training data, K_Test is the (m + n) × m kernel matrix of the testing data, and [1]_{a×b} denotes an a × b matrix whose elements are all 1. Furthermore, let

  \Omega = \frac{1}{n^2}K_{\mathrm{Train}}[1]_{n\times n}K_{\mathrm{Train}}^T + \frac{1}{m^2}K_{\mathrm{Test}}[1]_{m\times m}K_{\mathrm{Test}}^T - \frac{1}{mn}\big(K_{\mathrm{Train}}[1]_{n\times m}K_{\mathrm{Test}}^T + K_{\mathrm{Test}}[1]_{m\times n}K_{\mathrm{Train}}^T\big) \qquad (9)

As Ω is an (n + m) × (n + m) positive semi-definite matrix, d_{f,k}^2(P, \tilde{P}) can be simplified as

  d_{f,k}^2(P, \tilde{P}) = \beta^T \Omega \beta \qquad (10)

Finally, LMPROJ can be expressed as the quadratic programming problem below:

  \min_{\beta}\ \frac{1}{2}\beta^T(\Phi + \lambda\Omega)\beta + C\sum_{j=1}^{n}\varepsilon_j
  \text{s.t.}\ \varepsilon_j \ge 0,\quad y_j\big(\beta^T K_j + b\big) \ge 1 - \varepsilon_j,\quad \forall j = 1, \ldots, n \qquad (11)

where Φ = φ(S)^T φ(S) is the (m + n) × (m + n) kernel matrix over all samples and K_j = φ(S)^T φ(x_j) is an (m + n)-dimensional column vector. According to optimization theory, the Lagrange function of Eq. (11) is given by

  L(\beta, \varepsilon, \eta, \alpha) = \frac{1}{2}\beta^T(\Phi + \lambda\Omega)\beta + C\sum_{j=1}^{n}\varepsilon_j + \sum_{j=1}^{n}\alpha_j\big(1 - \varepsilon_j - y_j\beta^T K_j - y_j b\big) - \sum_{j=1}^{n}\eta_j\varepsilon_j \qquad (12)

The necessary conditions for an optimal solution are as follows:

  \frac{\partial L}{\partial \beta} = (\Phi + \lambda\Omega)\beta - \sum_{j=1}^{n}\alpha_j y_j K_j = 0 \qquad (13a)

  \frac{\partial L}{\partial \varepsilon_j} = C - \alpha_j - \eta_j = 0 \qquad (13b)

  \frac{\partial L}{\partial b} = \sum_{j=1}^{n}\alpha_j y_j = 0 \qquad (13c)

Then, the dual problem of Eq. (11) can be transformed into the following quadratic programming problem by substituting Eqs. (13a)–(13c) into Eq. (12):

  \arg\max_{\alpha}\ -\frac{1}{2}\sum_{j=1}^{n}\sum_{k=1}^{n}\alpha_j\alpha_k y_j y_k K_j^T(\Phi + \lambda\Omega)^{-1}K_k + \sum_{j=1}^{n}\alpha_j
  \text{s.t.}\ 0 \le \alpha_j \le C,\quad \sum_{j=1}^{n}\alpha_j y_j = 0 \qquad (14)

Finally, the optimal solution to the original optimization problem in Eq. (11) is given by

  \beta = \sum_{j=1}^{n}\alpha_j^{*} y_j (\Phi + \lambda\Omega)^{-1} K_j \qquad (15)
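To make the kernelized distance concrete: under the representer expansion of Eq. (7), the matrix Ω of Eq. (9) turns the projected MMD of Eq. (5) into the quadratic form β^T Ω β of Eq. (10). The following numpy sketch is our own illustration (a small random dataset stands in for EEG features); it builds Ω and checks that the quadratic form agrees with the directly computed squared difference of projected means.

```python
import numpy as np

def build_omega(K, n, m):
    """Omega from Eq. (9), assuming K is the (n+m) x (n+m) kernel matrix over
    the n training samples (first n columns) and m testing samples (last m)."""
    K_train = K[:, :n]            # (n+m) x n
    K_test = K[:, n:]             # (n+m) x m
    one_nn = np.ones((n, n))
    one_mm = np.ones((m, m))
    one_nm = np.ones((n, m))
    return (K_train @ one_nn @ K_train.T / n**2
            + K_test @ one_mm @ K_test.T / m**2
            - (K_train @ one_nm @ K_test.T
               + K_test @ one_nm.T @ K_train.T) / (n * m))

rng = np.random.default_rng(1)
S = rng.normal(size=(7, 3))       # 4 training + 3 testing samples (illustrative)
K = np.exp(-((S[:, None] - S[None, :]) ** 2).sum(-1))  # RBF kernel matrix
omega = build_omega(K, n=4, m=3)

beta = rng.normal(size=7)
dist2 = beta @ omega @ beta       # beta^T Omega beta, Eq. (10)

# Direct check: f(s_i) = sum_j beta_j K(s_j, s_i), so the projected values are K @ beta
proj = K @ beta
direct = (proj[:4].mean() - proj[4:].mean()) ** 2
```

In a full LMPROJ implementation, Ω would then enter the quadratic program of Eq. (11) alongside the kernel matrix Φ.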

Here, α*_j is the optimal solution of the dual problem in Eq. (14).

In summary, LMPROJ obtains a kernel classifier by leveraging transfer learning based on the maximum margin principle and the MMD distribution measure between the source domain and the target domain [29]. Using the LMPROJ transfer learning method discussed in this section, an epileptic EEG signal recognition algorithm is presented in this paper. The algorithm is described in Fig. 1.

4. Experiments

4.1. Data and experiments

4.1.1. Data source

The EEG data used in this study are publicly available on the web from the University of Bonn, Germany (http://www.meb.uni-bonn.de/epileptologie/science/physik/eegdata.html). The complete data archive contains five groups of data (denoted by groups A to E), each containing 100 single-channel EEG segments of 23.6 s duration. The sampling rate of all datasets was 173.6 Hz. Groups A and B consist of segments acquired by surface EEG recording from five healthy volunteer subjects using a standardized electrode placement scheme; recordings were made while the subjects were relaxed and awake, with eyes open (group A) and eyes closed (group B), respectively. Groups C, D and E were obtained from volunteer subjects with epilepsy. EEG signals in group C were recorded from the hippocampal formation of the opposite hemisphere of the brain, while those in group D were measured from within the epileptogenic zone; the data in both groups C and D were measured during seizure-free intervals. Group E contains EEG signals recorded during seizure activity. Table 1 gives a detailed description of the five groups of data, and Fig. 2 shows a typical signal in each group.

Table 1
Description of datasets.

Group  People              Size of group  Description of datasets
A      Healthy subjects    100            EEG signals measured from healthy people with eyes open
B      Healthy subjects    100            EEG signals measured from healthy people with eyes closed
C      Epileptic subjects  100            EEG signals obtained from the hippocampal formation of the opposite hemisphere of the brain during seizure-free intervals
D      Epileptic subjects  100            EEG signals obtained from within the epileptogenic zone during seizure-free intervals
E      Epileptic subjects  100            EEG signals measured during seizure activity

Further details about the setting


Fig. 1. The LMPROJ-based epilepsy detection algorithm.

of the EEG signal measurement can be found in [30]. Since each group contains 100 EEG signals and it is hard to visualize their characteristics all together, only one typical signal from each group is shown to facilitate intuitive observation of the differences among the five groups of data.

4.1.2. Experimental datasets

We constructed two types of experimental datasets to compare the performance and effectiveness of the conventional classification algorithms and the proposed LMPROJ-based transfer learning method. In the first type, the training and testing data are drawn from the same distribution; in the second type, the distributions of the training and testing data are different. In each case, the training and testing datasets were constructed by mixing the five groups of EEG signals acquired from the University of Bonn in different proportions. As shown in Table 2, six experimental datasets were constructed for performance evaluation: (1) two datasets with identical distributions in the target domain and source domain; (2) two datasets with different distributions between the two domains for binary classification; and (3) two datasets with different

Fig. 2. Typical EEG signals in groups A to E.

distributions between the two domains for multi-class classification. In particular, Table 2 presents the differences in the means and standard deviations between the training and testing data for all six experimental datasets, where the difference was evaluated using the Euclidean metric. The Euclidean metric of the standard deviations of datasets 1 and 2 is much smaller than that of datasets 3–6, showing that the training and testing data in datasets 1 and 2 have very similar distributions, while obvious drift exists between the distributions of the training and testing data in datasets 3–6. For binary classification, the positive class refers to healthy subjects and the negative class to epileptic subjects. For multi-class classification, the task is to identify three classes: healthy subjects, and epileptic subjects in the preictal and ictal states.

For the datasets with identical distributions in the target domain and source domain, the classical cross-validation strategy was used. However, for the datasets with different distributions between the two domains, i.e., the training domain and test domain in the transductive learning scenario, the classical strategy is not directly applicable, and a cross-validation-like strategy was adopted instead, illustrated as follows with dataset 5 as an example. The cross-validation-like strategy involves two steps. First, the training data were sampled from groups A, C and E, while the test data were sampled from groups B, C and E; the numbers of samples are given in Table 2. This sampling ensures that the training and test data in dataset 5 have different distributions. The classifiers are then trained and tested with these data. Next, the samples in the training and test data are swapped, and the classifiers are trained and tested again. Hence, all the sampled data in dataset 5 are tested after performing these two steps. The two-step procedure described above is therefore similar to traditional twofold cross-validation. With the classification results obtained, the classification accuracy on dataset 5 can be computed. Finally, since the data used in dataset 5 are only a subset sampled from the original groups (A, B, C, E), the above procedure was repeated 20 times to make the experimental results more statistically reliable. For performance comparison, the mean and standard deviation (S.D.) of the classification accuracies are reported, along with the p-values of t-tests calculated from the classification results obtained by the


Table 2
Composition of the experimental datasets.

Distribution characteristics          Dataset  Training dataset    Test dataset        ||m_c − m_r||  ||σ_c − σ_r||
Same distribution                     1        A, E – each 75      A, E – each 25      37.34          54.03
                                      2        A, B, E – each 75   A, B, E – each 25   25.59          42.21
Different distribution (binary)       3        A, E – each 25      A, C – each 25      29.68          191.76
                                      4        A, E – each 25      A, D – each 25      29.32          181.28
Different distribution (multi-class)  5        A, C, E – each 25   B, C, E – each 25   30.3           185.9
                                      6        A, D, E – each 25   B, D, E – each 25   30.13          176.56

m_c and m_r denote the means of the data in the training set and test set, respectively; σ_c and σ_r denote the vectors of standard deviations of the data in the training set and test set, respectively.
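The distribution-difference measures reported in Table 2 are just Euclidean distances between the per-feature means and standard deviations of the two sets. A minimal sketch (our own illustration; the synthetic data stand in for the extracted EEG features):

```python
import numpy as np

def distribution_shift(train, test):
    """Euclidean distances ||m_c - m_r|| and ||sigma_c - sigma_r|| between the
    per-feature means and standard deviations of the training and test sets."""
    dm = np.linalg.norm(train.mean(axis=0) - test.mean(axis=0))
    ds = np.linalg.norm(train.std(axis=0) - test.std(axis=0))
    return dm, ds

rng = np.random.default_rng(3)
train = rng.normal(0, 1, (75, 6))
same = rng.normal(0, 1, (25, 6))        # drawn from the same distribution
shifted = rng.normal(0.5, 3, (25, 6))   # shifted mean and inflated variance
dm_same, ds_same = distribution_shift(train, same)
dm_shift, ds_shift = distribution_shift(train, shifted)
```

As in Table 2, the standard-deviation distance is the clearer indicator of drift between the two sets.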

proposed LMPROJ-based method and other traditional methods respectively.
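The two-step swap strategy described above can be sketched in a few lines. The following is our own illustration, not the authors' code: it uses the nearest-mean classifier from Section 2.1.2 and well-separated synthetic two-class data standing in for the sampled EEG features.

```python
import numpy as np

def nearest_mean_fit(X, y):
    # Class means of the training data (the NM classifier's only parameters)
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def nearest_mean_predict(means, X):
    classes = list(means)
    centers = np.stack([means[c] for c in classes])
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.array(classes)[d.argmin(axis=1)]

def two_step_accuracy(X_a, y_a, X_b, y_b):
    """Train on split a / test on split b, then swap, as in the
    cross-validation-like strategy described above."""
    accs = []
    for Xtr, ytr, Xte, yte in [(X_a, y_a, X_b, y_b), (X_b, y_b, X_a, y_a)]:
        pred = nearest_mean_predict(nearest_mean_fit(Xtr, ytr), Xte)
        accs.append((pred == yte).mean())
    return float(np.mean(accs))

# Hypothetical two-class feature sets (25 samples per class per split)
rng = np.random.default_rng(2)
X_a = np.vstack([rng.normal(0, 1, (25, 4)), rng.normal(6, 1, (25, 4))])
X_b = np.vstack([rng.normal(0, 1, (25, 4)), rng.normal(6, 1, (25, 4))])
y = np.array([0] * 25 + [1] * 25)
acc = two_step_accuracy(X_a, y, X_b, y)
```

Repeating this resampling-and-swap procedure 20 times, as in the paper, yields the mean and S.D. figures reported in Tables 4 and 5.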

4.1.3. Feature extraction

In this paper, feature extraction was conducted using wavelet packet decomposition (WPD), short-time Fourier transform (STFT) and kernel principal component analysis (KPCA), respectively. The EEG signals processed by feature extraction were then used to train and test the different classifiers. Figs. 3–5 show the features extracted from the EEG signals in group A by WPD, STFT and KPCA, respectively.

For WPD, the Daubechies-4 (db4) wavelet coefficients were used to decompose the original EEG signals into a series of binary wavelets. The signals were split into six bands in the frequency domain; Table 3 shows the frequency bands corresponding to the db4 wavelet coefficients. The decomposed EEG signals of group A are shown in Fig. 3.

To perform STFT feature extraction on the EEG signals, a small sliding window was used for the Fourier transform. The spectrogram was computed with a 1 s Hamming window every 0.5 s, a commonly adopted approach in EEG signal processing systems. Given a continuous EEG signal x(t), a function of limited

Table 3
The bands of the wavelet packet decomposition.

Wavelet coefficient  Frequency band (Hz)
db(4,0)              0–2
db(4,5)              2–4
db(4,4)              4–8
db(4,3)              8–15
db(4,2)              16–30
db(4,1)              31–60

width window g(t) and the window center u, the STFT can be computed as

  F_{\mathrm{STFT}}(u, f) = \int_{-\infty}^{+\infty} x(t)\, g^{*}(t - u)\, e^{-j2\pi f t}\, dt \qquad (16)

where F_STFT maps the EEG signal onto the time-frequency plane. The process of performing STFT on the EEG signals is as follows. First, the STFT divides the EEG signal into locally stationary segments. A group of spectra of the local signals is then obtained through the Fourier transform, and the time-varying characteristics of the signal can be seen from the

Fig. 3. The extracted features of the EEG signals in group A by WPD (energy values of the six frequency bands for each sample).


Fig. 4. The extracted features of the EEG signals in group A by STFT (energy values of the six frequency bands for each sample).

discrepancy in the local spectra at different times. Finally, the energy of the EEG signal is divided into five frequency bands (delta 0–4 Hz, theta 4–8 Hz, alpha 8–15 Hz, beta 15–30 Hz, gamma 30–60 Hz) using the transformation function in Eq. (16); these band energies are the derived features. As an example, the features extracted from the EEG signals of group A are shown in Fig. 4.

In addition to these two commonly used EEG feature extraction methods, a nonlinear feature extraction method, kernel principal component analysis (KPCA), was also adopted to extract features from the EEG signals in the experiments [31]. KPCA can realize complicated nonlinear mappings and has demonstrated good feature extraction ability for EEG signals [32]. The features extracted by KPCA from the EEG signals of group A are shown in Fig. 5.
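The STFT band-energy extraction described above can be sketched directly from Eq. (16) in its discrete form: a Hamming-windowed FFT every 0.5 s with a 1 s window, with the power summed into the five clinical bands. This is a simplified sketch of our own (window handling and band edges follow the text; it is not the authors' code).

```python
import numpy as np

FS = 173.6  # sampling rate of the Bonn EEG data (Hz)
BANDS = {"delta": (0, 4), "theta": (4, 8), "alpha": (8, 15),
         "beta": (15, 30), "gamma": (30, 60)}

def stft_band_energies(x, fs=FS, win_s=1.0, hop_s=0.5):
    """Discrete version of Eq. (16): slide a Hamming window of win_s seconds
    in hop_s steps, take the FFT, and sum the power into the five bands."""
    nwin, hop = int(win_s * fs), int(hop_s * fs)
    window = np.hamming(nwin)
    freqs = np.fft.rfftfreq(nwin, d=1.0 / fs)
    energies = []
    for start in range(0, len(x) - nwin + 1, hop):
        spec = np.abs(np.fft.rfft(x[start:start + nwin] * window)) ** 2
        energies.append([spec[(freqs >= lo) & (freqs < hi)].sum()
                         for lo, hi in BANDS.values()])
    return np.array(energies)  # shape: (n_frames, 5)

t = np.arange(0, 10, 1 / FS)
x = np.sin(2 * np.pi * 10 * t)  # a pure 10 Hz (alpha-band) test tone
E = stft_band_energies(x)
```

For a pure 10 Hz tone, almost all of the energy lands in the alpha band, mirroring the band-energy features plotted in Fig. 4.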

4.1.4. Classification

After performing feature extraction on the original EEG signals with STFT, WPD and KPCA, respectively, six classification methods, namely SVM [13], LDA [16,17], DT [14], NB [15], NM [18–20] and LMPROJ [24], were adopted to train and test on all the datasets.

Fig. 5. The extracted features of the EEG signals in group A by KPCA (values of the six leading kernel principal components for each sample).


Table 4
Performance comparison of six classifiers on non-normalized datasets based on WPD feature extraction. Each cell gives the mean accuracy (S.D.) followed by the p-value.

Dataset 1: LDA 0.922 (0.012), 2.5e−08(+); DT 0.887 (0.01), 9.3e−07(+); NB 0.851 (0.006), 3.5e−06(+); NM 0.798 (0.008), 2.87e−06(+); SVM 0.918 (0.004), 7.8e−05(+); LMPROJ 0.937 (0.004), –.
Dataset 2: LDA 0.927 (0.087), 2.8e−04(+); DT 0.936 (0.008), 2.9e−04(−); NB 0.922 (0.012), 1.7e−05(+); NM 0.833 (0.02), 2.2e−05(+); SVM 0.92 (0.001), 4.4e−05(+); LMPROJ 0.936 (0.0017), –.
Dataset 3: LDA 0.544 (0.007), 4.2e−10(+); DT 0.825 (0.02), 0.01(+); NB 0.527 (0.007), 9.1e−06(+); NM 0.425 (0.004), 1.76e−06(+); SVM 0.824 (0.06), 0.0078(+); LMPROJ 0.941 (0.011), –.
Dataset 4: LDA 0.561 (0.009), 2.9e−08(+); DT 0.808 (0.02), 0.007(+); NB 0.573 (0.008), 1.1e−05(+); NM 0.469 (0.01), 1.49e−05(+); SVM 0.816 (0.044), 0.0043(+); LMPROJ 0.945 (0.008), –.
Dataset 5: LDA 0.772 (0.006), 1.5e−10(+); DT 0.802 (0.018), 0.0014(+); NB 0.559 (0.005), 1.6e−05(+); NM 0.515 (0.008), 6.39e−06(+); SVM 0.807 (0.03), 9.3e−04(+); LMPROJ 0.925 (0.008), –.
Dataset 6: LDA 0.739 (0.01), 9.6e−09(+); DT 0.778 (0.07), 0.0039(+); NB 0.622 (0.026), 7.4e−05(+); NM 0.535 (0.009), 3.16e−06(+); SVM 0.81 (0.06), 0.0011(+); LMPROJ 0.945 (0.005), –.

The superscripts (+) and (−) denote that the LMPROJ method is better or worse than the method under comparison based on t-test results. The smaller the p-value, the more significant the difference of the average. A p-value of 0.05 is considered statistically significant.
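The p-values in Tables 4–9 come from t-tests on repeated-run accuracies. A sketch with made-up accuracy lists, assuming an unpaired two-sample test (the paper does not specify the pairing), could be:

```python
from scipy import stats

# Hypothetical per-run accuracies for two classifiers; illustration only,
# not the paper's actual per-run results.
acc_lmproj = [0.94, 0.93, 0.94, 0.93, 0.94]
acc_svm    = [0.92, 0.91, 0.92, 0.92, 0.91]

t_stat, p_value = stats.ttest_ind(acc_lmproj, acc_svm)
# LMPROJ has the higher mean, so a small p-value corresponds to a "(+)" entry.
print(p_value < 0.05)
```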

Table 5
Performance comparison of six classifiers on non-normalized datasets based on STFT feature extraction. Each cell gives the mean accuracy (S.D.) followed by the p-value.

Dataset 1: LDA 0.983 (0.036), 0.088(−); DT 0.994 (0.003), 0.391(−); NB 0.947 (0.005), 0.052(+); NM 0.990 (0.019), 0.014(−); SVM 0.988 (0.003), 0.014(−); LMPROJ 0.977 (0.0014), –.
Dataset 2: LDA 0.981 (0), 1.57e−08(+); DT 0.951 (0.02), 7.65e−05(+); NB 0.933 (0.017), 2.64e−05(+); NM 0.890 (0.036), 2.3e−04(+); SVM 0.994 (0.002), 0.637(−); LMPROJ 0.994 (0.0013), –.
Dataset 3: LDA 0.536 (0.03), 7.6e−06(+); DT 0.62 (0.09), 0.004(+); NB 0.53 (0.022), 8.3e−06(+); NM 0.495 (0.003), 1.5e−06(+); SVM 0.51 (0.015), 8.3e−06(+); LMPROJ 0.945 (0.008), –.
Dataset 4: LDA 0.565 (0.015), 2.92e−05(+); DT 0.675 (0.05), 0.0021(+); NB 0.632 (0.003), 3.3e−05(+); NM 0.537 (0.007), 2.5e−06(+); SVM 0.562 (0.031), 8.96e−05(+); LMPROJ 0.938 (0.01), –.
Dataset 5: LDA 0.622 (0.02), 4.62e−05(+); DT 0.476 (0.04), 1.34e−04(+); NB 0.417 (0.043), 1.46e−04(+); NM 0.483 (0.02), 1.3e−05(+); SVM 0.864 (0.02), 0.0011(+); LMPROJ 0.952 (0.008), –.
Dataset 6: LDA 0.601 (0.03), 3.78e−05(+); DT 0.46 (0.02), 3.57e−06(+); NB 0.375 (0.022), 1.87e−05(+); NM 0.484 (0.018), 2.97e−05(+); SVM 0.874 (0.054), 0.158(+); LMPROJ 0.949 (0.01), –.

The superscripts (+) and (−) denote that the LMPROJ method is better or worse than the method under comparison based on t-test results. The smaller the p-value, the more significant the difference of the average. A p-value of 0.05 is considered statistically significant.

Table 6
Performance comparison of six classifiers on non-normalized datasets based on KPCA feature extraction. Each cell gives the mean accuracy (S.D.) followed by the p-value.

Dataset 1: LDA 0.882 (0.058), 0.0094(+); DT 0.932 (0.017), 0.096(+); NB 0.82 (0.075), 0.045(+); NM 0.747 (0.064), 0.006(+); SVM 0.92 (0.02), 0.023(+); LMPROJ 0.958 (0.01), –.
Dataset 2: LDA 0.827 (0.065), 0.006(+); DT 0.86 (0.026), 0.008(+); NB 0.682 (0.094), 0.01(+); NM 0.649 (0.041), 6e−04(+); SVM 0.91 (0.011), 0.024(+); LMPROJ 0.948 (0.004), –.
Dataset 3: LDA 0.741 (0.09), 0.15(+); DT 0.795 (0.07), 0.01(+); NB 0.628 (0.05), 0.0023(+); NM 0.673 (0.11), 0.034(+); SVM 0.79 (0.02), 5.1e−04(+); LMPROJ 0.949 (0.05), –.
Dataset 4: LDA 0.765 (0.09), 0.16(+); DT 0.944 (0.06), 0.67(+); NB 0.625 (0.13), 0.02(+); NM 0.721 (0.12), 0.038(+); SVM 0.82 (0.04), 0.04(+); LMPROJ 0.963 (0.009), –.
Dataset 5: LDA 0.46 (0.061), 2.9e−04(+); DT 0.739 (0.043), 0.005(+); NB 0.491 (0.3), 0.048(+); NM 0.447 (0.06), 3.1e−04(+); SVM 0.710 (0.02), 3.4e−04(+); LMPROJ 0.948 (0.015), –.
Dataset 6: LDA 0.53 (0.035), 3.3e−05(+); DT 0.711 (0.02), 6.5e−04(+); NB 0.641 (0.10), 0.023(+); NM 0.487 (0.04), 1.9e−04(+); SVM 0.721 (0.02), 2.3e−04(+); LMPROJ 0.95 (0.008), –.

The superscripts (+) and (−) denote that the LMPROJ method is better or worse than the method under comparison based on t-test results. The smaller the p-value, the more significant the difference of the average. A p-value of 0.05 is considered statistically significant.


Fig. 6. Performance of classification methods on non-normalized datasets using WPD for feature extraction (classification accuracy of LDA, DT, NB, NM, SVM and LMPROJ versus dataset number, 1–6).

Furthermore, normalization of the training and test data was also applied in our experiments, where the data were normalized to zero mean and unit variance for each feature. The classification results on the normalized datasets are reported accordingly.

4.2. Results and discussion

Experiments were conducted by applying the feature extraction and classification methods described above to the six experimental datasets. The results on the datasets without normalization are shown in Tables 4–6 and Figs. 6–8. The classification performance on the normalized datasets with the different feature extraction and classification methods is shown in Tables 7–9.
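The per-feature normalization mentioned above can be sketched in a few lines of NumPy (scikit-learn's StandardScaler is the usual equivalent); the random matrix is a stand-in for a feature table:

```python
import numpy as np

def standardize(X):
    """Shift and scale each feature column to zero mean and unit variance."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # guard against constant features
    return (X - mu) / sigma

rng = np.random.default_rng(3)
X = 5 * rng.standard_normal((100, 6)) + 2  # stand-in feature matrix
Xn = standardize(X)
print(np.allclose(Xn.mean(axis=0), 0), np.allclose(Xn.std(axis=0), 1))  # True True
```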

From the experimental results, the following observations can be made:

(1) For experimental datasets 1 and 2, where the training data and testing data have identical distributions, the performance of the five conventional classification methods and the proposed LMPROJ-based transfer learning method was comparable, as shown by the results on the non-normalized datasets in Tables 4–6 and on the normalized datasets in Tables 7–9. However, for experimental datasets 3–6 the LMPROJ method clearly outperformed the conventional approaches, always giving the best accuracy in the classification of the EEG signals.

Fig. 7. Performance of classification methods on non-normalized datasets using STFT for feature extraction (classification accuracy of LDA, DT, NB, NM, SVM and LMPROJ versus dataset number, 1–6).


Fig. 8. Performance of classification methods on non-normalized datasets using KPCA for feature extraction (classification accuracy of LDA, DT, NB, NM, SVM and LMPROJ versus dataset number, 1–6).

(2) When the difference in distributions between the training and testing datasets became more significant, as from dataset 1 to dataset 6, the performance of all the conventional methods degraded considerably, as shown by the results on the non-normalized datasets in Tables 4–6 and Figs. 6–8, and on the normalized datasets in Tables 7–9. In contrast, the LMPROJ-based transfer learning method was robust against the variation in distribution between the training and testing data; high classification accuracy could still be achieved for all the experimental datasets, regardless of the extent of the difference in distribution.

(3) In particular, although both SVM and LMPROJ are large-margin methods, LMPROJ is clearly advantageous for datasets where the distributions of the training and testing data drift apart. This finding confirms that transfer learning is necessary and indispensable for situations of this kind.

(4) The accuracy achieved on the normalized datasets improved to a certain extent compared with that on the non-normalized datasets, especially for SVM, whose classification performance increased greatly, even for datasets 3–6 with data of different distributions. This demonstrates the effectiveness of normalization. However, for the datasets with different distributions between the training and test data, the classification performance of the previous methods remained sub-optimal and inferior to that of the proposed method.

Overall, it is concluded that the proposed LMPROJ-based transfer learning method is in general more robust and that its performance in classifying EEG signals for epilepsy detection is better than

Table 7
Performance comparison of six classifiers on normalized datasets based on WPD feature extraction. Each cell gives the mean accuracy (S.D.) followed by the p-value.

Dataset 1: LDA 0.95 (0.008), 8.8e−04(−); DT 0.915 (0.009), 1(−); NB 0.896 (0.017), 0.085(+); NM 0.92 (0.015), 0.65(−); SVM 0.922 (0.005), 0.17(−); LMPROJ 0.915 (0.018), –.
Dataset 2: LDA 0.94 (0.01), 0.02(+); DT 0.943 (0.009), 0.14(+); NB 0.942 (0.008), 0.096(+); NM 0.915 (0.05), 2.2e−05(+); SVM 0.933 (0.005), 1.2e−14(+); LMPROJ 0.953 (0.012), –.
Dataset 3: LDA 0.526 (0.013), 4.5e−10(+); DT 0.705 (0.017), 7.2e−08(+); NB 0.482 (0.009), 3.8e−12(+); NM 0.439 (0.01), 8.7e−11(+); SVM 0.66 (0.002), 5.6e−09(+); LMPROJ 0.92 (0.015), –.
Dataset 4: LDA 0.575 (0.016), 7.0e−09(+); DT 0.742 (0.017), 1.2e−06(+); NB 0.473 (0.023), 1.3e−09(+); NM 0.492 (0.026), 2e−09(+); SVM 0.657 (0.006), 1.0e−10(+); LMPROJ 0.915 (0.018), –.
Dataset 5: LDA 0.730 (0.015), 6.2e−09(+); DT 0.855 (0.007), 2.4e−08(+); NB 0.697 (0.012), 3.9e−11(+); NM 0.705 (0.005), 1.5e−12(+); SVM 0.872 (0.008), 1.0e−16(+); LMPROJ 0.95 (0.006), –.
Dataset 6: LDA 0.701 (0.009), 9.6e−12(+); DT 0.834 (0.011), 5.7e−08(+); NB 0.662 (0.032), 1.2e−08(+); NM 0.645 (0.009), 1.3e−12(+); SVM 0.915 (0.013), 3.2e−15(+); LMPROJ 0.947 (0.01), –.

The superscripts (+) and (−) denote that the LMPROJ method is better or worse than the method under comparison based on t-test results. The smaller the p-value, the more significant the difference of the average. A p-value of 0.05 is considered statistically significant.


Table 8
Performance comparison of six classifiers on normalized datasets based on STFT feature extraction. Each cell gives the mean accuracy (S.D.) followed by the p-value.

Dataset 1: LDA 0.987 (0.046), 0.02(+); DT 0.993 (0.003), 0.6(−); NB 0.978 (0.004), 2.5e−06(+); NM 0.978 (0.004), 2.5e−06(+); SVM 0.993 (0.03), 0.6(−); LMPROJ 0.992 (0.02), –.
Dataset 2: LDA 0.982 (0.004), 0.049(−); DT 0.988 (0.004), 2.9e−04(−); NB 0.986 (0.009), 0.025(−); NM 0.968 (0.002), 0.004(+); SVM 0.994 (0.001), 3.3e−07(+); LMPROJ 0.978 (0.003), –.
Dataset 3: LDA 0.796 (0.01), 6.2e−10(+); DT 0.552 (0.008), 1.5e−12(+); NB 0.471 (0.05), 8.1e−09(+); NM 0.435 (0.04), 6.5e−09(+); SVM 0.523 (0.016), 1.2e−10(+); LMPROJ 0.946 (0.007), –.
Dataset 4: LDA 0.821 (0.007), 9.9e−10(+); DT 0.619 (0.01), 1e−11(+); NB 0.468 (0.01), 6.7e−13(+); NM 0.457 (0.018), 1.9e−11(+); SVM 0.575 (0.035), 6.7e−09(+); LMPROJ 0.947 (0.004), –.
Dataset 5: LDA 0.958 (0.006), 0.22(+); DT 0.904 (0.01), 4.3e−06(+); NB 0.732 (0.007), 3.3e−11(+); NM 0.630 (0.004), 1.7e−10(+); SVM 0.698 (0.09), 1.4e−10(+); LMPROJ 0.963 (0.009), –.
Dataset 6: LDA 0.950 (0.01), 8.2e−04(+); DT 0.933 (0.01), 8.3e−05(+); NB 0.798 (0.024), 5.9e−07(+); NM 0.635 (0.009), 5.8e−11(+); SVM 0.687 (0.06), 1.4e−10(+); LMPROJ 0.968 (0.01), –.

The superscripts (+) and (−) denote that the LMPROJ method is better or worse than the method under comparison based on t-test results. The smaller the p-value, the more significant the difference of the average. A p-value of 0.05 is considered statistically significant.

that of the conventional methods. The proposed method has also demonstrated stronger adaptability to datasets exhibiting a drift in distribution between the training and testing data. These findings show that transfer learning techniques can effectively improve the recognition of EEG signals for epilepsy detection, and the proposed LMPROJ-based method is a promising approach in this regard. However, it should also be noted that the LMPROJ-based method has an inherent disadvantage. When a transductive transfer learning method is used to train a classifier, both the training and testing data are required in the learning procedure. The LMPROJ-based method proposed in this paper thus tightly couples the training and testing data when optimizing the parameters of the classifier, which means that a new classifier must be re-trained for each different testing set.
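This coupling arises because LMPROJ measures the mismatch between the two domains directly on the data via the maximum mean discrepancy (MMD). A hedged sketch of the empirical (biased) MMD estimate with an RBF kernel, using made-up data and an illustrative bandwidth:

```python
import numpy as np

def rbf(A, B, gamma=0.1):
    """RBF kernel matrix between two sample sets."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2(Xs, Xt, gamma=0.1):
    """Biased estimate of the squared MMD between source and target samples."""
    return (rbf(Xs, Xs, gamma).mean()
            - 2 * rbf(Xs, Xt, gamma).mean()
            + rbf(Xt, Xt, gamma).mean())

rng = np.random.default_rng(4)
same = rng.standard_normal((50, 6))
shifted = rng.standard_normal((50, 6)) + 2.0   # simulated distribution drift
print(mmd2(same, same) <= mmd2(same, shifted)) # drift raises the discrepancy
```

Because the estimate depends on the target samples themselves, a new target set changes the objective and hence requires re-training, as noted above.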

4.3. Comparison with previous work on the same data source

While the proposed LMPROJ-based EEG classification method has shown an obvious advantage for datasets with different distributions between the training and test data, we further compare it with related methods that have also reported high accuracy on data from the same source as used in our study. In this section, the two dataset groupings that have been used in previous studies were adopted, as shown in Table 10 [7,34–39]. The accuracy obtained by the methods reported in the previous studies is also given in the table. The methods in [35,37–39] performed EEG signal classification with neural networks, whereas the methods in [7,34,36] were based on SVM. In both cases, different feature extraction methods were employed. For comparison, we also conducted the experiment using four classifiers, i.e., the LMPROJ-based classifier and the traditional DT, LDA and

Table 9 Performance comparison of six classifiers on normalized datasets based on KPCA feature extraction. Experimental datasets

LDA

DT

NB

NM

SVM

LMPROJ

1

Means (S.D.) p-Valuea

0.976 (0.004) 3.8e−15(+)

0.955 (0.023) 1.3e−06(+)

0.895 (0.02) 8.5e−11(+)

0.984 (0.001) 2e−13(+)

0.98 (0.007) 1.1e−13(+)

0.987 (0.004) –

2

Means (S.D.) p-Value

0.95 (0.003) 8.9e−09(+)

0.948 (0.008) 3.5e−05(+)

0.867 (0.014) 1.7e−07(+)

0.926 (0.006) 1.4e−07(+)

0.975 (0.01) 0.003(+)

0.99 (0.006) –

3

Means (S.D.) p-Value

0.864 (0.013) 5.9e−07(+)

0.645 (0.014) 2.6e−10(+)

0.753 (0.033) 6e−07(+)

0.86 (0.015) 3.7e−06(+)

0.88 (0.006) 5.1e−07(+)

0.97 (0.01) –

4

Means (S.D.) p-Value

0.927 (0.007) 1.3e−05(+)

0.763 (0.022) 1.7e−07(+)

0.789 (0.02) 3.9e−08(+)

0.876 (0.018) 4.1e−06(+)

0.907 (0.004) 3.6e−06(+)

0.965 (0.009) –

5

Means (S.D.) p-Value

0.66 (0.016) 2.4e−09(+)

0.625 (0.033) 2.9e−08(+)

0.598 (0.009) 2.0e−12(+)

0.562 (0.013) 1.6e−12(+)

0.885 (0.01) 6.8e−07(+)

0.96 (0.01) –

6

Means (S.D.) p-Value

0.672 (0.016) 3.9e−09(+)

0.675 (0.01) 4.4e−10(+)

0.664 (0.014) 1.9e−09(+)

0.627 (0.012) 4.2e−10(+)

0.916 (0.008) 7.6e−05(+)

0.965 (0.012) –

a The superscripts (+) and (−) denote that the LMPROJ method is better or worse than the method under comparison based on t-test results. The smaller the p-value, the more significant the difference of the average. A p-value of 0.05 is considered statistically significant.


Table 10
Comparison with previous studies that used the same data source.

Groups A and E:
Accuracy obtained in the literature: Polat and Günes (2008) [33] 100%; Kumari and Jose (2011) [34] 99%; Kumar et al. (2010) [35] 100%; Orhan and Gurbuz [36] 100%.
Our experiments, STFT for feature extraction: DT 99.3%; LDA 98.7%; SVM 99.3%; LMPROJ 99.2%.
Our experiments, KPCA for feature extraction: DT 95.5%; LDA 97.6%; SVM 98%; LMPROJ 98.7%.

Groups A, D and E:
Accuracy obtained in the literature: Guler et al. (2005) [37] 96.79%; Bao et al. (2009) [38] 94.07%; Ubeyli (2009) [39] 94.83%; Peng and Lu (2011) [7] 98.3%.
Our experiments, STFT for feature extraction: DT 95.3%; LDA 96%; SVM 94.7%; LMPROJ 98%.
Our experiments, KPCA for feature extraction: DT 96%; LDA 94.7%; SVM 95.6%; LMPROJ 98%.

SVM, based on the feature extraction methods STFT and KPCA, respectively. In our experiments, the training and test data were each normalized to zero mean and unit variance. From the results shown in Table 10, we can see that all the methods achieved high accuracy on the same data source, but these results are expected for two reasons. First, the classification tasks were simple, and thus many classifiers could be effective with the different feature extraction methods employed. Second, all the datasets used in this experiment have the same distribution for the training and test data, and therefore many traditional classifiers were suitable for the classification tasks. Despite their high classification accuracy, these previous methods cannot handle the non-ideal situations where the distributions of the training and test data differ. As demonstrated by the results on the datasets of different distributions constructed in our experiments (Section 4.1.2), the performance of traditional methods, such as SVM-based identification methods, was sub-optimal and inferior to that of the proposed method due to the complexity of the data.

5. Conclusions

This paper presents a transfer-learning-based EEG signal recognition method for epilepsy detection. The method can cope with situations where the distributions of the training and testing data differ. Although the proposed transfer learning method has demonstrated distinctive effectiveness, there is room for further study and improvement. For example, in addition to EEG signal classification, it is also important to develop more effective transfer-learning-based feature extraction methods for EEG signals in order to further improve the overall performance of automatic epilepsy detection based on EEG.
Acknowledgments

This work was supported in part by the Hong Kong Research Grants Council (PolyU 5134/12E), the National Natural Science Foundation of China (61170122, 61272210), the Ministry of Education Program for New Century Excellent Talents (NCET-120882), the Fundamental Research Funds for the Central Universities, and the Outstanding Youth Fund of Jiangsu Province (BK20140001).

References

[1] Talevi A, Cravero MS, Castro EA, Bruno-Blanch LE. Discovery of anticonvulsant activity of abietic acid through application of linear discriminant analysis. Bioorg Med Chem Lett 2007;17(6):1684–90.
[2] Dorai A, Ponnambalam K. Automated epileptic seizure onset detection. In: Proceedings of the international conference on autonomous and intelligent systems. Piscataway, NJ, USA: IEEE Press; 2010. p. 1–4.
[3] Cortes C, Vapnik V. Support vector networks. Mach Learn 1995;20(3):273–95.
[4] Iscan Z, Dokur Z, Demiralp T. Classification of electroencephalogram signals with combined time and frequency features. Expert Syst Appl 2011;38(8):10499–505.

[5] Subasi A, Ismail GM. EEG signal classification using PCA, ICA, LDA and support vector machines. Expert Syst Appl 2010;37(12):8659–66.
[6] Tzallas AT, Tsipouras MG, Fotiadis DI. Epileptic seizure detection in EEGs using time-frequency analysis. IEEE Trans Inf Technol Biomed 2009;13(5):703–10.
[7] Peng P, Lu BL. Immune clonal algorithm based feature selection for epileptic EEG signal classification. In: Proceedings of the 11th international conference on information science, signal processing and their applications. Piscataway, NJ, USA: IEEE Press; 2012. p. 848–53.
[8] Litt B, Esteller R, Echauz J, D'Alessandro M, Shor R, Henry T, et al. Epileptic seizures may begin hours in advance of clinical onset: a report of five patients. Neuron 2001;30(1):51–64.
[9] Vivaldi EA, Bassi A. Frequency domain analysis of sleep EEG for visualization and automated state detection. In: Proceedings of the 28th annual international conference of the IEEE engineering in medicine and biology society. Piscataway, NJ, USA: IEEE Press; 2006. p. 3740–3.
[10] Fong GCY, Shah PU, Gee MN, Serratosa JM, Castroviejo IP, Khan S, et al. Childhood absence epilepsy with tonic–clonic seizures and electroencephalogram 3–4-Hz spike and multispike-slow wave complexes: linkage to chromosome 8q24. Am J Hum Genet 1998;63(4):1117–29.
[11] Blanco S, Kochen S, Rosso OA, Salgado P. Applying time-frequency analysis to seizure EEG activity. IEEE Trans Eng Med Biol Mag 1997;16(1):64–71.
[12] Zhang Z, Kawabata H, Liu ZQ. EEG analysis using fast wavelet transform. In: Proceedings of the IEEE international conference on systems, man, and cybernetics. Piscataway, NJ, USA: IEEE Press; 2000. p. 2959–64.
[13] Shen M, Chen J, Lin C. Modeling of nonlinear medical signal based on local support vector machine. In: Proceedings of the IEEE instrumentation and measurement technology conference. Piscataway, NJ, USA: IEEE Press; 2009. p. 675–9.
[14] Polat K, Güneş S. Classification of epileptiform EEG using a hybrid system based on decision tree classifier and fast Fourier transform. Appl Math Comput 2007;187(2):1017–26.
[15] Valenti P, Cazamajou E, Scarpettini M, Aizemberg A, Silva W, Kochen S. Automatic detection of interictal spikes using data mining models. J Neurosci Methods 2006;150(1):105–10.
[16] Patel K, Chern-Pin C, Fau S, Bleakley CJ. Low power real-time seizure detection for ambulatory EEG. In: Proceedings of the 3rd international conference on pervasive computing technologies for healthcare. Piscataway, NJ, USA: IEEE Press; 2009. p. 1–7.
[17] Mihandoost S, Amirani MC, Varghahan BZ. Seizure detection using wavelet transform and a new statistical feature. In: Proceedings of the 5th international conference on application of information and communication technologies. Piscataway, NJ, USA: IEEE Press; 2011. p. 1–5.
[18] Fukunaga K. Introduction to statistical pattern recognition. Boston, USA: Academic Press; 1990.
[19] Shin D, Kim S. Nearest mean classification via one-class SVM. In: Proceedings of the international joint conference on computational sciences and optimization. Piscataway, NJ, USA: IEEE Press; 2009. p. 593–6.
[20] Wahlberg P, Salomonsson G. Feature extraction and clustering of EEG epileptic spikes. Comput Biomed Res 1996;29(5):382–94.
[21] Pan SJ, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng 2010;22(10):1345–59.
[22] Taylor ME, Stone P. Transfer learning for reinforcement learning domains: a survey. J Mach Learn Res 2009;10(1):1633–85.
[23] Arnold A, Nallapati R, Cohen WW. A comparative study of methods for transductive transfer learning. In: Proceedings of the 7th IEEE international conference on data mining workshops. Piscataway, NJ, USA: IEEE Press; 2007. p. 77–82.
[24] Quanz B, Huan J. Large margin transductive transfer learning. In: Proceedings of the 18th ACM conference on information and knowledge management. New York, NY, USA: ACM Press; 2009. p. 1327–36.
[25] Vapnik V. Statistical learning theory. Hoboken, NJ, USA: John Wiley & Sons Inc.; 1998.
[26] Belkin M, Niyogi P, Sindhwani V. Manifold regularization: a geometric framework for learning from labeled and unlabeled examples. J Mach Learn Res 2006;7(12):2399–434.
[27] Gretton A, Borgwardt K, Rasch MJ, Scholkopf B, Smola AJ. A kernel method for the two-sample problem. J Mach Learn Res 2008;1:1–10.

[28] Ben-David S, Blitzer J, Crammer K, Pereira F. Analysis of representations for domain adaptation. In: Proceedings of the 2006 conference on advances in neural information processing systems, vol. 19. Cambridge, MA, USA: MIT Press; 2007. p. 137–44.
[29] Schölkopf B, Herbrich R, Smola A. A generalized representer theorem. In: Helmbold D, Williamson B, editors. Proceedings of the 14th annual conference on computational learning theory. Berlin, Heidelberg: Springer; 2001. p. 416–26.
[30] Andrzejak RG, Lehnertz K, Mormann F, Rieke C, David P, Elger CE. Indications of nonlinear deterministic and finite-dimensional structures in time series of brain electrical activity: dependence on recording region and brain state. Phys Rev E 2001;64(6):1907–14.
[31] Teixeira AR, Tome AM, Stadlthanner K, Lang EW. KPCA denoising and the pre-image problem revisited. Digit Signal Process 2008;18(4):568–80.
[32] Xiong YJ, Zhang R, Zhang C, Yu XL. A novel estimation method of fatigue using EEG based on KPCA-SVM and complexity parameters. Appl Mech Mater 2013;373:965–9.
[33] Polat K, Güneş S. Artificial immune recognition system with fuzzy resource allocation mechanism classifier, principal component analysis and FFT method based new hybrid automated identification system for classification of EEG signals. Expert Syst Appl 2008;34(3):2039–48.
[34] Kumari SS, Jose JP. Seizure detection in EEG using time frequency analysis and SVM. In: Proceedings of the international conference on emerging trends in electrical and computer technology. Piscataway, NJ, USA: IEEE Press; 2011. p. 626–30.
[35] Kumar SP, Sriraam N, Benakop PG, Jinaga BC. Entropies based detection of epileptic seizures with artificial neural network classifiers. Expert Syst Appl 2010;37(4):3284–91.
[36] Orhan U, Gurbuz E. Classifying discrete interval densities of EEG signals by using DWT and SVM. In: Proceedings of the international conference on innovations in intelligent systems and applications. Piscataway, NJ, USA: IEEE Press; 2012. p. 1–4.
[37] Güler NF, Übeyli ED, Güler I. Recurrent neural networks employing Lyapunov exponents for EEG signals classification. Expert Syst Appl 2005;29(3):506–14.
[38] Bao FS, Gao JM, Hu J, Lie DYC, Zhang YL, Oommen KJ. Automated epilepsy diagnosis using interictal scalp EEG. In: Proceedings of the 31st annual international conference of the IEEE engineering in medicine and biology society. Piscataway, NJ, USA: IEEE Press; 2009. p. 6603–7.
[39] Übeyli ED. Combined neural network model employing wavelet coefficients for EEG signals classification. Digit Signal Process 2009;19(2):297–308.
