Computer Methods and Programs in Biomedicine 114 (2014) 324–336

journal homepage: www.intl.elsevierhealth.com/journals/cmpb

Epileptic seizure predictors based on computational intelligence techniques: A comparative study with 278 patients

César Alexandre Teixeira a,∗, Bruno Direito a, Mojtaba Bandarabadi a, Michel Le Van Quyen b, Mario Valderrama b, Bjoern Schelter c, Andreas Schulze-Bonhage d, Vincent Navarro e, Francisco Sales f, António Dourado a

a Centre for Informatics and Systems, University of Coimbra, Portugal
b Centre de Recherche de l'Institut du Cerveau et de la Moelle épinière (CRICM), Paris, France
c Freiburg Center for Data Analysis and Modeling, University of Freiburg, Freiburg, Germany
d Epilepsy Center, University Hospital of Freiburg, Freiburg, Germany
e Epilepsy Unit, Groupe Hospitalier Pitié-Salpêtrière, Paris, France
f Centro Hospitalar e Universitário de Coimbra, Coimbra, Portugal

ARTICLE INFO

Article history:
Received 31 July 2013
Received in revised form 2 December 2013
Accepted 15 February 2014

Keywords:
Epileptic seizure prediction
Artificial neural networks
Support vector machines
EPILEPSIAE project
European Epilepsy Database

ABSTRACT

The ability of computational intelligence methods to predict epileptic seizures is evaluated in long-term EEG recordings of 278 patients suffering from pharmaco-resistant partial epilepsy, also known as refractory epilepsy. This extensive study in seizure prediction considers the 278 patients from the European Epilepsy Database, collected in three epilepsy centres: Hôpital de la Pitié-Salpêtrière, Paris, France; Universitätsklinikum Freiburg, Germany; Centro Hospitalar e Universitário de Coimbra, Portugal.

For a considerable number of patients it was possible to find a patient-specific predictor with an acceptable performance, for example predictors that anticipate at least half of the seizures with a rate of false alarms of no more than 1 in 6 h (0.15 h−1). We observed that the epileptic focus localization, data sampling frequency, testing duration, number of seizures in testing, type of machine learning, and preictal time significantly influence the prediction performance. The results support an optimistic view of the feasibility of a patient-specific prospective alarming system based on machine learning techniques that combines several univariate (single-channel) electroencephalogram features. We envisage that this work will serve as a benchmark of valuable importance for future studies based on the European Epilepsy Database.

© 2014 Elsevier Ireland Ltd. All rights reserved.



Corresponding author. Tel.: +351 968483495. E-mail addresses: [email protected], [email protected] (C. Alexandre Teixeira).

http://dx.doi.org/10.1016/j.cmpb.2014.02.007 0169-2607/© 2014 Elsevier Ireland Ltd. All rights reserved.


1. Introduction

About 1% of the population suffers from epilepsy at some point in their lives. Of these, about 30% are resistant to antiepileptic drugs and surgical treatments, and must live with the permanent possibility of a seizure [1]. The apparent unpredictability of seizure occurrence imposes a painful limitation on their daily lives and results in a high risk of endangering situations [2,3]. A transportable system that could warn the patient of an impending seizure, or, in a more advanced state, trigger an antiepileptic device (such as a stimulator) to disarm the seizure occurrence, would dramatically improve their quality of life.

The first research on seizure prediction is considered to be the work of Viglione and Walsh, in 1975 [4], which searched for reproducible and observable patterns in EEG that could eventually predict an oncoming seizure. Since then, many research groups have devoted their efforts towards the prediction of seizures. The search for appropriate pre-seizure patterns was generally accomplished by processing the raw EEG data. This processing results in a set of descriptors, the features, that are expected to present coherent changes before seizures. Several features from the time, frequency and time–frequency domains have been considered (see for example the excellent review in [5]). Most of these features were considered independently, and the generation of alarms was mostly based on optimized threshold levels.

Computational intelligence techniques have also been proposed for seizure prediction. Several recent studies [6–10] report good results using customized algorithms based on multiple features and machine-learning classifiers. For example, on intracranial recordings of 18 patients (a total of 433 h of recording and 80 seizures) from the Freiburg iEEG database, described at http://epilepsy.uni-freiburg.de/freiburg-seizure-prediction-project/eeg-database, Park et al. [10] reported a high sensitivity (97.5%) and a low false prediction rate (0.27 per hour).
The study used support vector machines (SVM) to classify a high-dimensional feature set based on relative spectral power features. However, they eliminated samples with artifacts, in a way that is not feasible in real time, and used discontinuous intracranial EEG recordings containing just 50 min of preictal data [10]. A rigorous evaluation of the prediction performance requires information over days, as a patient undergoes different physiological or epileptic states [5]. Moreover, the evaluation should consider the raw data as it is collected. Another drawback of most studies is the lack of extensive testing on baseline data free from seizures. In summary, so far, prediction performances have not been systematically evaluated on large and collaborative databases, leaving open the question of whether they can prove clinically useful [11–13].

This work reports a study on a database with data from 278 patients with long-term EEG recordings, completely and coherently annotated [14,15]. Besides EEG data, the database also contains the ECG and other patient information (medication, clinical data, etc.). This European Database on Epilepsy was one of the main outcomes of the European project EPILEPSIAE (http://www.epilepsiae.eu). The role of the


European Database is to provide the research and clinical communities with data that eliminates the problems referred to above of short-term discontinuous data segments and low numbers of patients, enabling trustworthy evaluations of the prediction performance. An approach was followed close to the conditions that will be required for an actual clinical application in real time, i.e., not excluding any data block even if artifacts may be present. We minimize the effects of artifacts by performing supervised training, i.e., by implicitly "telling" the classifiers that they should not take artifacts into account. It is therefore expected that a properly trained classifier minimizes the influence of artifacts when applied to the testing data.

The novelties of our study can be summarized as:

(i) We tested whether an EEG-based algorithm is able to predict seizures in a quasi-prospective setting on a large multi-center database of long-term recordings. The standardized EEG annotation protocol previously developed was followed to ensure the comparability and reliability of the seizure onset times in all three participating clinical centers [14].
(ii) We employed a larger set of EEG features, simultaneously measuring several physiological parameters of the brain activity embedded in the electrical signal. Feature combination has a higher potential to address the challenges presented by the heterogeneity of EEG patterns seen in refractory epilepsy [10,16].
(iii) After feature extraction, three types of machine learning classifiers were systematically applied to distinguish between the adopted four cerebral states: preictal (immediately preceding a seizure), ictal (between seizure onset and offset), postictal (the transition state right after a seizure) and interictal (between seizures, the baseline).
(iv) For each type of classifier, a high number of distinct hidden layer structures and parameters were applied and compared. After training, the classifier whose sensitivity and false prediction rate in the testing data were closest to the optimal performance (100% sensitivity, 0 h−1 false prediction rate) was selected as the best one.
(v) Regularization techniques were developed for smoothing the output of the classifiers in order to reduce the false prediction rate, making the transition from point-by-point classification to seizure-by-seizure classification.

Section 2 describes the data used, presenting the most important patient and EEG recording characteristics. The methodological details are described in Section 3. The results, related analysis and overall conclusions are reported in Sections 4 and 5.

2. Description of the data

Long-term EEG recordings from 278 epilepsy patients (149 males (53.8%); age range, 2–67 years; mean age, 34.3 years) suffering from medically intractable partial epilepsy were analyzed. Data were recorded in three different epilepsy units (Unidade de Monitorização em Epilepsia e Sono, Centro


Hospitalar e Universitário de Coimbra, Portugal; Unité d'Épilepsie of the Pitié-Salpêtrière Hospital, Paris, France; Epilepsiezentrum, Universitätsklinikum Freiburg, Germany), in a total of almost 2031 days (48,742 h) of EEG including 2702 seizures. 68% of the patients had temporal lobe epilepsy. Concerning lateralization, 42% present epilepsies in the left side of the brain, 41% in the right side, 8% in both sides (bilateral), and for 0.7% it was impossible to define a lateralization. In 227 patients, EEG was recorded using 22–37 scalp electrodes; the average recording period was 162 h. In 42 patients, intracranial EEG with 14–124 recording sites was recorded using stereotactically implanted depth electrodes, subdural grids and/or strips; the average recording period was 253 h. Nine "mixed" patients were subjected to both intracranial and scalp EEG recordings. In the mixed patients the number of electrodes varied between 54 and 113, with an average recording time of 140 h.

EEG data were recorded using Nicolet, Micromed, Compumedics, or Neurofile NT digital video EEG systems at sampling rates of 250 Hz, 256 Hz, 400 Hz, 512 Hz, 1024 Hz, 2048 Hz or 2500 Hz. In all the acquisition systems data were recorded relative to a common electrical reference.

We considered only clinical seizures that were characterized using a standardized EEG annotation protocol, developed to ensure a uniform and reliable marking of seizure onset and offset times at all three hospitals [14] under clinical supervision. Seizure onset was defined as the first electrographic sign of the oncoming seizure. Alternatively, if the seizure onset could not be identified unequivocally based on the EEG, the clinical onset time was used. The number of recorded seizures ranged from 3 to 33, from 3 to 94, and from 5 to 26 for the scalp, invasive and mixed EEG recordings, respectively. Supplemental Tables SI and SII show the patients' characteristics and details of the recordings.
Patients with the suffixes 00, 02 and 03 were monitored in Coimbra, Freiburg, and Paris, respectively. This study was conducted under patient informed consent and with the approval of the ethics committees.

3. Methodology

The proposed methodology is summarized in Fig. 1. The first step was the selection of six electrodes. For each of the selected electrodes, 22 univariate features were computed, leading to a feature space with 132 dimensions. The overall feature time series were split into two portions: one for training the classifiers' parameters, and the other for testing, or validation, of the trained classifiers on unseen data. After training, the classifiers are applied to the testing data and finally the prediction performance is evaluated in terms of sensitivity (to be maximized) and false prediction rate (FPR) (to be minimized). These two conflicting goals pose additional challenges to the problem.

3.1. Electrode selection

A relatively small number of electrodes was used in our study. The main motivation to use as few EEG electrodes as possible was to be close to the conditions that will be required for an actual clinical application. Indeed, only a few patients are willing to wear a large number of EEG electrodes for signal acquisition on a long-term basis [3]. The fundamental reasons for this are the stigmatization and the discomfort of using electrodes for a long time. In this study it was decided to select six electrodes as a good compromise between available EEG information and patient comfort. Six electrodes were also used in other studies, proving appropriate and leading to solid conclusions [10,17,18]. The six electrodes were selected by three approaches: (i) random selection, (ii) spatial discretization of the patient scalp by choosing the electrodes F7, Fz, F8, T5, Pz, and T6, and (iii) a selection based on the seizure origin and propagation. The selection of F7, Fz, F8, T5, Pz, and T6 resulted in the analysis of the frontal, central and parietal regions, aiming to capture with six electrodes the generalized electrical activity of the brain at the scalp level. For the latter selection mode, three focal electrodes that are involved early in seizure activity, and three extra-focal electrodes that were not involved, or were involved very late, in seizure spread were selected. The electrodes involved in the seizure activity were identified by clinical neurophysiologists in the EEG data. Other authors have already considered random selection in order to perform a naive and blind selection, aiming to justify the need for intelligent techniques. The selection based on the opinion of clinical neurophysiologists was also considered for both selections of focal and extra-focal channels [18].
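The three selection modes described above can be sketched as follows. This is an illustrative sketch only: the montage list, function name, and the assumption that focal channels are supplied as a clinician-annotated list are not from the paper.

```python
import random

# Hypothetical 10-20 scalp montage; the channels actually available are
# patient-dependent in the database.
SCALP = ["Fp1", "Fp2", "F7", "F3", "Fz", "F4", "F8", "T3", "C3", "Cz",
         "C4", "T4", "T5", "P3", "Pz", "P4", "T6", "O1", "O2"]

def select_electrodes(mode, available=SCALP, focal=None, seed=0):
    """Return six channels by one of the three selection approaches."""
    rng = random.Random(seed)
    if mode == "random":                  # (i) naive/blind selection
        return rng.sample(available, 6)
    if mode == "spatial":                 # (ii) fixed spatial discretization
        return ["F7", "Fz", "F8", "T5", "Pz", "T6"]
    if mode == "focus":                   # (iii) 3 focal + 3 extra-focal
        extra = [ch for ch in available if ch not in focal]
        return list(focal[:3]) + rng.sample(extra, 3)
    raise ValueError(f"unknown mode: {mode}")
```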

3.2. Feature extraction

Twenty-two univariate features were computed for all the selected electrodes, using consecutive 5 s windows without overlap. The single pre-processing method applied was a notch filter to discard the 50 Hz power-line component. The features considered are listed in Table 1 and are briefly described in the following.

The prediction error derived from an autoregressive model of the EEG signal has been pointed out for both detection [19] and prediction [20]. The idea is that, as the seizure approaches, the EEG signals become well behaved and thus better predicted by the autoregressive model, decreasing the prediction error in the preictal phase. With the seizure onset, this decrease in the EEG prediction error disappears and higher error values are again observed.

The decorrelation time is by definition the time of the first zero-crossing of the autocorrelation sequence of a given EEG signal. If the time at which the first zero-crossing occurs approaches zero, it means that the signal samples are less correlated. The extreme case is a white noise sequence, which theoretically presents a decorrelation time of zero. In the time before seizures, a decrease in the power related to the lower frequencies of the EEG has been reported, which results in a decrease in the decorrelation time [21].

Hjorth mobility [22] is a quantification of the variance of the slopes of a time series normalized by the variance of its


Fig. 1 – Methodology. The first step is the selection of six raw EEG electrodes. The second step involves the computation of 22 univariate features for each channel, leading to a feature space with 132 dimensions. Third, the overall feature time series are split into two parts for training and testing. In the fourth stage, classifiers are trained, considering different parameters. Fifth, after training of the classifiers, alarms are generated on the test data; finally, the sixth step is the performance assessment of each predictor.

amplitude distribution, and is an estimate of the mean frequency. Hjorth complexity [22] is the variance of the rate of slope changes with reference to an ideal sinusoid, and is an estimate of the bandwidth. The decrease in the power of the lower frequencies as the seizure onset approaches also has the consequence of increasing the Hjorth mobility and complexity [21].

The spectral power in different frequency bands of the EEG was also considered for seizure prediction. As reported in [21], a transfer of power from the lower to the higher frequencies was observed before the seizure onset. The spectral edge frequency is a quantification of the power distribution along the spectral range of a given signal. Usually most of the power of an EEG signal is contained in the range 0–40 Hz, and the spectral edge frequency is defined as the minimum frequency up to which 50% of the total power up to 40 Hz is contained in a given signal. The spectral edge power is the corresponding half power of the signal up to 40 Hz.

The variance is equivalent to the energy of the signal; skewness and kurtosis are measures of the symmetry and relative peakedness or flatness of the amplitude distribution, respectively. It was reported that variance and kurtosis vary significantly in the preictal phase: a decrease in variance and an increase in kurtosis were observed in the preictal time when compared with interictal data [23].

The wavelet transform enables a time–frequency decomposition of a given signal into several levels (sub-signals). The first levels are related to the high-frequency components of the signal, while the last ones are related to the lower-frequency contents. By computing the energy of the signals originated by the decomposition, a measure of the energy in different frequency ranges can be achieved.
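Two of the features above can be sketched directly from their definitions. The snippet below is a minimal illustration on one 5 s window (function names and the test signal are ours, not the paper's implementation):

```python
import numpy as np

def decorrelation_time(x, fs):
    """Time (s) of the first zero-crossing of the autocorrelation sequence."""
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # non-negative lags
    zeros = np.flatnonzero(ac <= 0)
    return zeros[0] / fs if zeros.size else len(x) / fs

def hjorth(x):
    """Hjorth mobility (mean-frequency estimate) and complexity (bandwidth)."""
    dx, ddx = np.diff(x), np.diff(np.diff(x))
    mobility = np.sqrt(dx.var() / x.var())
    complexity = np.sqrt(ddx.var() / dx.var()) / mobility
    return mobility, complexity

# One 5 s window of a 10 Hz sinusoid sampled at 256 Hz.
fs = 256
t = np.arange(5 * fs) / fs
x = np.sin(2 * np.pi * 10 * t)
mob, comp = hjorth(x)           # complexity is ~1 for a pure sinusoid
dt = decorrelation_time(x, fs)  # ~ a quarter period, i.e. ~0.025 s
```

For the pure sinusoid, mobility is close to the normalized angular frequency (2π · 10/256 ≈ 0.25), matching its interpretation as a mean-frequency estimate.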

3.3. Data splitting: the training and the testing sets

The computed feature time series for each patient were split into two parts. The first part, containing the first two or three seizures, was used as the training set in order to optimize the parameters of the classifiers. The other part was never used for any parameter optimization and served as the testing set, also known as the validation set. The training, based on the first two or three seizures, aims to simulate a real-world situation where a predictor has to be developed based on a limited number of seizures that occurred during an initial period in a clinical environment, and should be evaluated afterwards on new EEG data collected in a second period corresponding to the testing data. The average duration of the testing set was approximately 78 h, 94 h, and 70 h for the scalp, invasive and mixed patients, respectively. In total,
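The chronological split can be sketched as a single cut placed right after the n-th training seizure (the function name and the sample indices are illustrative assumptions, not from the paper):

```python
def split_by_seizures(n_samples, seizure_end_samples, n_train_seizures=3):
    """Cut the feature time series right after the n-th training seizure.

    seizure_end_samples: sample index at which each seizure (plus any
    margin) ends, in chronological order.
    Returns (train, test) slices over the feature time series.
    """
    cut = seizure_end_samples[n_train_seizures - 1] + 1
    return slice(0, cut), slice(cut, n_samples)

# Example: 10,000 feature samples, four seizures ending at these indices.
train, test = split_by_seizures(10_000, [900, 2500, 4100, 7800], 3)
```

Note that, unlike a random split, this preserves causality: every testing sample is strictly later in time than every training sample.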


Table 1 – Univariate features computed.

Feature | Sub-features
AR model predictive error | –
Decorrelation time | –
Energy | –
Hjorth | Complexity; Mobility
Relative power | Delta, theta, alpha, beta, gamma (>30 Hz)
Spectral edge | Frequency; Power
Statistics | Mean; Variance; Skewness; Kurtosis
Energy of Daubechies 4 wavelet coefficients | Decomposition levels 1–6

approximately 22,291 h of testing data were considered, containing a total of 1519 seizures.

3.4. Training of the predictors as classifiers

3.4.1. Definition and balancing of the classes

For supervised training, in addition to the input features that form an input space with 132 dimensions, a target output is needed. The target output is a time series that indicates the cerebral state for each input sample. Here we define a four-class approach by considering that the input samples can be classified as:

1. interictal – the "normal" brain state far from seizures,
2. preictal – the time interval immediately preceding the seizure onset,
3. ictal – the time interval during a seizure,
4. postictal – the time interval between a seizure and the "normal" brain state.

The number of preictal and postictal samples depends on the preictal and postictal times and on the intervention time (IT) defined. The exact time where the preictal state starts is unknown and is probably patient-dependent. The search for appropriate predictors should consider different preictal times, and here we tested 10, 20, 30 and 40 min. The usefulness of a prediction depends upon whether or not the time to the seizure onset is sufficient to take some preventive action, imposing the definition of an IT to distinguish between early seizure detection and seizure prediction. This period is decisive for the development of intervention mechanisms designed to warn the patients or to abort the seizure generation mechanisms (i.e. mechanisms

Fig. 2 – Target time series encoding the classification of the cerebral states for three seizures. The interictal epochs are represented by black time slots while the yellow time slots represent the ictal plus postictal epochs. Red time slots indicate preictal times. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)

able to interrupt the pathways leading to seizures, such as vagus nerve stimulation [24]). An IT of 10 s was considered appropriate. After the seizure offset there is a time of instability called the postictal time. Like the preictal time, the postictal period is also not defined by neurophysiologists, and a duration of 10 min was assumed for all patients. The number of ictal samples depends on the seizure onsets and offsets, which are set by the neurophysiologists in the raw EEG. The target output was defined as a sequence of samples where the values 1, 2, 3, or 4 stand for the interictal, preictal, ictal, or postictal classes, respectively (Fig. 2). For seizure prediction the class of interest is the preictal class (level 2), and the generation of prediction alarms is explained in Section 3.4.5.

Given the previous definitions concerning electrode selection and preictal times, the number of available datasets for the patients with scalp electrodes was three (electrode selection modes) multiplied by four preictal times, leading to 12 datasets. For the patients without scalp electrodes, only two electrode selection methods (random and based on the seizure propagation information) were applied, leading to 2 × 4 = 8 available datasets.

A problem with the training data is that it is unbalanced, since the ictal, postictal and preictal classes represent relatively small portions compared to the interictal class. The patterns related to the interictal class may represent more than 90% of the training data. In this case, a model predicting only interictal labels would present an overall accuracy of more than 90%, which is clearly a misleading measure. One of the approaches suggested to overcome the class unbalance is to downsample the class with the largest proportion [25]. Thus, to reduce the effect of the class unbalance, we randomly selected samples from the interictal class, matching the number of interictal samples with the sum of the remaining classes. Let it be stressed that the class balancing is implemented only in the training data; in testing we use the original time series to evaluate the generalization capacity of the classifiers. Support vector machine classifiers can deal with class unbalance through the definition of different costs for each class, but if the unbalance is very strong, as is the case here, this technique does not work. Other studies [26] also suggest the use of downsampling to deal with class unbalance, since it may be important to reduce redundant information and to reduce
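The labeling and balancing steps can be sketched as below. This is a simplified illustration (names are ours; the 10 s intervention time is omitted for brevity, and onset/offset positions are given directly as feature-sample indices):

```python
import numpy as np

STEP_S = 5  # one feature sample per 5 s window

def make_targets(n, onsets, offsets, preictal_s, postictal_s=600):
    """Label each feature sample: 1 interictal, 2 preictal, 3 ictal, 4 postictal."""
    y = np.ones(n, dtype=int)
    pre, post = preictal_s // STEP_S, postictal_s // STEP_S
    for on, off in zip(onsets, offsets):      # sample indices of onset/offset
        y[max(0, on - pre):on] = 2            # preictal window before onset
        y[on:off + 1] = 3                     # ictal
        y[off + 1:off + 1 + post] = 4         # postictal
    return y

def balanced_indices(y, seed=0):
    """Downsample the interictal class to match the other classes combined."""
    rng = np.random.default_rng(seed)
    inter, rest = np.flatnonzero(y == 1), np.flatnonzero(y != 1)
    keep = rng.choice(inter, size=min(inter.size, rest.size), replace=False)
    return np.sort(np.concatenate([keep, rest]))
```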


the computational cost of the trained model. Moreover it represents a common framework for the training of the different machine learning strategies considered.

3.4.2. ANN classifiers

Two types of machine learning techniques were considered for seizure prediction: artificial neural networks (ANN) [27] and support vector machines (SVM) [28]. Two structures of ANN have been used: multilayer perceptron neural networks (MLP) and radial basis function networks (RBF). An RBF network is a special case of ANN defined by a single hidden layer with radial basis activation functions. The value of a radial basis function depends only on two parameters: the placement of its centre in the input space and the spread of its base. The parameters of an RBF are the parameters of the activation functions and a set of weights in the output layer, usually a linear layer. Gaussian activation functions are applied, and the training of an RBF consists of tuning the centres and spreads of the activation functions, and also defining appropriate values for the weights in the output layer. MLPs were trained by the resilient back-propagation algorithm (RBP), while the RBFs were trained by an algorithm that exploits the separability of the parameters into two classes: non-linear and linear. The RBP version used was the one implemented in the Neural Network Toolbox of Matlab and described in [29]. The initial weight values were determined by the Nguyen–Widrow [30] initialization algorithm as implemented by the Neural Network Toolbox of Matlab. The Nguyen–Widrow initialization algorithm chooses values in order to distribute the active region of each neuron in the layer approximately evenly across the layer's input space. The values contain a degree of randomness, so they are not the same each time this function is called. This procedure is applied only to neurons where the transfer function has a finite active input range, such as the hyperbolic tangent sigmoid used in this study for the hidden layers.
For transfer functions whose active input range is an infinite interval, the weights were set up randomly; this is the case of the purelin transfer function, which was considered in this study for the output layer. The use of this initialization procedure, instead of purely random initial conditions, is reported to reduce training time by more than an order of magnitude. Different numbers of training epochs were tested, and we found that 2000 or 4000 epochs were appropriate for the numbers of neurons and hidden layers used.

The RBFs were trained using a methodology that involves the Levenberg–Marquardt (LM) [31,32] algorithm and the minimization of a criterion that exploits the separability of the RBF parameters into linear and nonlinear, as described in [33]. The LM optimized only the nonlinear parameters, while the linear ones were found using the linear least-squares strategy. The initial values of the centres were set by the optimal adaptive k-means clustering method (OAKM) [34], and the values of the spreads were found using [27]:

σi = dmax / √(2n),  i = 1, . . ., n,  (1)

where dmax is the maximum distance between centres and n is the number of centres. The LM stops the parameter optimization using the "early-stopping" method.

3.4.3. Multiclass SVM classifiers

By definition, an SVM is a binary classifier, but there are situations where more than two classes are needed to solve a given classification problem, which is the case in this paper. For this purpose the SVMs were adapted to perform classification into more than two classes. The standard approach is to reduce a multi-class problem to several two-class problems, to which the standard SVM algorithm can be applied. The different approaches differ in the way in which the single SVMs are combined to give rise to a multi-class classifier. We apply the "one-versus-one" approach using "max-wins" voting, which was reported as the most appropriate method for practical use [35]. For a problem with k classes this methodology constructs k(k − 1)/2 classifiers, each defined to classify data into two of the defined classes. The final label attributed to the data is the one that receives the most votes from the individual classifiers. The SVMs were trained using Matlab and its interface to the LibSVM library [36].
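The "one-versus-one" scheme with "max-wins" voting can be sketched as follows; the pairwise classifier is a stand-in callable, not a trained SVM, and the function names are ours:

```python
from collections import Counter
from itertools import combinations

def one_vs_one_predict(x, classes, binary_predict):
    """'Max-wins' voting over one binary classifier per class pair.

    binary_predict(x, a, b) stands in for a trained pairwise SVM and
    must return either a or b for the sample x.
    """
    pairs = list(combinations(classes, 2))  # k(k-1)/2 pairwise problems
    votes = Counter(binary_predict(x, a, b) for a, b in pairs)
    return votes.most_common(1)[0][0]       # label with the most votes

CLASSES = [1, 2, 3, 4]  # interictal, preictal, ictal, postictal
```

With k = 4 classes this yields 4 · 3 / 2 = 6 pairwise classifiers, matching the k(k − 1)/2 count in the text.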

3.4.4. Analyzed structures

To obtain the best classifiers, we tested several ANN and SVM structures and parameterizations. For the RBF networks, the numbers of hidden neurons considered were [3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29] and [40, 50]. The choice of these numbers aims to test RBF networks with a very low number of neurons and also with a relatively high number of neurons. Some numbers between 3 and 50 were left out in order to limit the number of cases, and consequently to decrease the overall training time. A total of 16 RBFs were applied to the data returned by each electrode selection method and defined according to each of the four preictal times considered.

In the case of the SVM, two kernel types were considered: Gaussian and polynomial of degree two. The Gaussian kernel has by definition just one free parameter, the so-called Gamma. The polynomial kernel has two free parameters; in this work only the parameter that controls the aperture of the kernel was allowed to change, while the other, a bias term, was set to zero. In each case a grid-search approach was implemented. The free parameter Gamma in both kernel types assumed the values [2^−10, 2^−5, 2^3, 2^5, 2^10]. For all the SVMs the values of C (cost) chosen were [2^5, 2^8, 2^10, 2^13, 2^15]. The minimum and maximum values for C and Gamma were selected based on several analyses of what the appropriate ranges should be. The values between the minimum and maximum were defined as exponentially growing sequences [37]. So, a total of 50 SVMs were tested, i.e., 2 kernel types × 5 costs × 5 kernel parameters, for each prediction scenario.

In the case of the MLP, structures with two and three hidden layers were tested. For the two-layer structure, 200 and 400 neurons were considered in the first and second hidden layers, respectively. The three-layer structures assumed 15 and 200 neurons in the first hidden layer, 50 and 200 in the second, and 100 and 200 in the third hidden layer. The activation function considered for the hidden layers was the hyperbolic tangent sigmoid, while for the output layer a linear activation


Fig. 3 – Results for a representative patient: seizure times (Sz 1–Sz 9) and raised alarms over the training and testing data (time axis in hours).

function was considered. A total of 5 MLPs were analyzed for each prediction scenario. The previous choices of classifier structure were based on previous experience. In general, for each scalp and mixed patient, a total of (16 + 50 + 5) × 12 = 852 classifiers were trained, whereas this number was (16 + 50 + 5) × 8 = 568 for the invasive EEG patients. This gives a total of (227 + 9) × 852 + 42 × 568 = 224,928 classifiers trained in this study.
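The SVM grid search described above enumerates 50 configurations per prediction scenario, alongside the 16 RBF sizes and 5 MLPs. A minimal enumeration sketch (variable names are ours):

```python
from itertools import product

GAMMAS = [2.0**e for e in (-10, -5, 3, 5, 10)]   # kernel parameter Gamma
COSTS = [2.0**e for e in (5, 8, 10, 13, 15)]     # cost C
KERNELS = ["gaussian", "poly2"]                  # Gaussian, degree-2 polynomial

# 2 kernels x 5 costs x 5 Gammas = 50 SVM configurations per scenario.
svm_grid = list(product(KERNELS, COSTS, GAMMAS))

rbf_neurons = [3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 40, 50]
n_mlps = 5
```

Summing the three families and multiplying by the 12 datasets of a scalp patient reproduces the per-patient count of (16 + 50 + 5) × 12 = 852 classifiers.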

3.4.5. Alarm generation

Although the classification target presents temporal dynamics (as shown in Fig. 2), the output of the computational intelligence methods considers each input sample individually, without a temporal relation to the previous ones. Hence, the performance metrics are sample-based: every testing sample is analyzed individually, and each sample can be represented in a confusion matrix [38]. Optimally, a well-trained classifier should be able to classify all of the samples in the testing data correctly, and thus reproduce the target output. In practice, however, it is unlikely for a classifier to classify all of the samples correctly. If the output of the classifiers is used directly to predict seizures, every sample misclassified as preictal will generate a false alarm. To prevent this, methodologies that account for the temporal dynamics of the classification must be used. They may also be considered as smoothers of the output of the classifier. Here we developed and used the "firing power" technique [39]. In [40] we compared the "firing power" with a Kalman filter for output regularization. The Kalman filter was applied in the past by other authors with SVM classifiers [10]. It was concluded that the "firing power" approach is more "conservative" concerning the raising of alarms, because it maintains a longer memory of the classification dynamics, and because of its particular constraints (rules) on the times when alarms can be raised. The high number of false alarms raised by the Kalman filter approach makes it impractical for most of the analyzed patients.

In the firing power technique, the output of the classifiers is first divided into two classes, namely preictal and

nonpreictal. Then a sliding window with a size equal to the preictal time is considered. In each window a measure that represents how many samples are classified as preictal is computed. This measure is defined as the firing power of the classifier output, and is given by (2),

fp[n] = \frac{1}{\tau} \sum_{k=n-\tau}^{n} o[k], \qquad (2)

where fp[n] is the firing power at the discrete time n, τ is the number of samples of the considered preictal time, and o[k] is the two-class classifier output: o[k] is equal to one for a sample classified as preictal and zero otherwise. Thus fp[n] is a scaled function that assumes values between zero and one. If fp[n] is one, then all of the samples in the preceding preictal time have been classified as preictal. A zero fp[n] means that no samples were classified as preictal throughout the past preictal time frame. Alarms are raised based on the firing power by considering a threshold value. A threshold of 50% was considered, meaning that an alarm is generated when the firing power crosses the 0.5 value in an ascending way; a new alarm can then only be raised after a time at least equal to the defined preictal time.
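The firing power regularization and the alarm rule above can be sketched as follows. This is a minimal illustration under our own function names, not the paper's software; samples before the start of the recording are treated as nonpreictal (zero):

```python
def firing_power(outputs, tau):
    """Fraction of the last tau binary classifier outputs labeled preictal, Eq. (2).

    outputs: list of 0/1 classifier decisions (1 = preictal).
    tau: number of samples in one preictal time.
    """
    fp = []
    for n in range(len(outputs)):
        # Window of the last tau samples; shorter windows near the start are
        # still divided by tau, i.e. missing samples count as nonpreictal.
        window = outputs[max(0, n - tau + 1): n + 1]
        fp.append(sum(window) / tau)
    return fp

def raise_alarms(fp, tau, threshold=0.5):
    """Raise an alarm on an ascending crossing of the threshold, then stay
    silent for at least one preictal time (tau samples)."""
    alarms, last_alarm = [], -tau
    for n in range(1, len(fp)):
        crossed_up = fp[n - 1] < threshold <= fp[n]
        if crossed_up and n - last_alarm >= tau:
            alarms.append(n)
            last_alarm = n
    return alarms

# Example: ten nonpreictal samples followed by ten preictal ones (tau = 10).
fp = firing_power([0] * 10 + [1] * 10, 10)
print(raise_alarms(fp, 10))  # -> [14]: alarm once half the window is preictal
```

The refractory rule (`n - last_alarm >= tau`) reproduces the constraint that a new alarm can only follow a previous one after at least one preictal time.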

3.4.6. Performance evaluation

The performance was evaluated based on the sensitivity (SS) and the false prediction rate (FPR). Sensitivity is a percentage given by (3), where # means "number of":

SS = \frac{\#PredictedSeizures}{Total\#Seizures} \times 100. \qquad (3)

A seizure is considered to be correctly predicted when an alarm is raised and the seizure onset occurs within the subsequent preictal time. FPR, the number of false predictions per hour, is expressed in units of h−1 (or 1/h) and is defined by:

FPR = \frac{\#FalseAlarms}{HoursOfTesting - (\#Seizures \times P.Time)} \qquad (4)

For the calculation of the FPR, the total period during which alarms are not considered as false predictions should be removed. For this reason we subtract from the hours of testing all the preictal times (P.Times) in the denominator of (4).

Fig. 4 – Percentage of patients that present predictors with a given minimum sensitivity (SS) versus a given maximum false prediction rate (FPR) per hour.
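Definitions (3) and (4) can be computed as in this small sketch (hypothetical helper names; the numerical example assumes a 30-min preictal time for the representative patient of Fig. 3, which is our illustrative choice):

```python
def sensitivity(predicted_seizures, total_seizures):
    """SS, Eq. (3): percentage of seizures preceded by an alarm
    within the preictal time."""
    return 100.0 * predicted_seizures / total_seizures

def false_prediction_rate(false_alarms, hours_of_testing,
                          n_seizures, preictal_time_h):
    """FPR, Eq. (4): false alarms per hour of testing, with the
    preictal periods removed from the denominator."""
    return false_alarms / (hours_of_testing - n_seizures * preictal_time_h)

# Illustration: 6/6 seizures predicted, 12 false alarms over 138.58 h of
# testing, assuming a 0.5 h preictal time.
print(sensitivity(6, 6))                              # 100.0
print(false_prediction_rate(12, 138.58, 6, 0.5))      # about 0.089 h^-1
```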

4. Results and discussion

From all evaluations performed with the three different electrode combinations, four different preictal times and three different types of machine learning classifiers, we selected, for each patient, the best predictor by its proximity to the optimal performance (100% sensitivity, 0 h−1 FPR). Supplemental Table SIII lists the results for all the patients. Fig. 3 shows, for a representative patient, the alarms generated by the best-tailored predictor over approximately 10 days. This predictor was based on an MLP selected for patient 1327903, and was able to anticipate all six seizures occurring throughout the independent out-of-sample testing data lasting 138.58 h. During the testing time the predictor raised 12 false alarms, leaving long periods correctly labeled as interictal. Fig. 4 presents the percentage of patients that present predictors with a given minimum SS versus a maximum FPR. One can observe that for 32% (89 patients) testing performances with SS ≥ 50% and FPR ≤ 0.15 h−1 were achieved. For 16% (45 patients) SS = 100% and FPR ≤ 0.15 h−1 were obtained, and if we reduce the FPR level to 0.10 h−1 then 12% (33 patients) fulfil the condition. 23% (60 patients) fulfil the condition of optimal sensitivity (SS = 100%) and FPR ≤ 0.25 h−1. For the optimal FPR level of 0 h−1 one can observe that the percentage of patients varies from 4% to 6%, for SS requirements ranging from 100% down to 10%, respectively. On average over all patients, scalp EEG channels present performance values (sensitivity: 73.55 ± 24.83%; FPR: 0.28 ± 0.28 h−1) slightly better than the ones observed for the intracranial recordings (sensitivity 67.66 ± 21.83%; FPR 0.39 ± 0.37 h−1). However, these differences were found to be statistically non-significant according to a Kruskal–Wallis test of significance (K–W Test in Table 2). Concerning the preictal time, the most appropriate value is on average 30.47 min. In


a seizure-by-seizure analysis, we defined the anticipation time for each seizure as the difference between the time of the raised alarm and the seizure onset time. We obtained an average anticipation time of 15.58 min, consistent with alarms raised by considering a threshold level of 0.5 in the "firing power" output regularization and an average preictal time of about 30 min. Based on the best predictor selected for each patient, we analyzed the influence on the results of the different patient characteristics, of the different data specificities, and of the different methodological options considered. This analysis is summarized in Table 2. In order to find out whether the differences in sensitivity and FPR were significant, we applied the Kruskal–Wallis test of significance (K–W Test in Table 2). We selected this test because our results are non-normally distributed with natural limits, i.e., sensitivity values tend to approach 100% and FPR values tend to approach 0 h−1. The Kruskal–Wallis test is a nonparametric version of the one-way analysis of variance (ANOVA) test, and is an adaptation of the Wilcoxon rank sum test to more than two groups [41]. Significant differences are highlighted in bold in Table 2. For gender, age, and focus lateralization, no significant differences in SS and FPR were identified by the Kruskal–Wallis test. The epileptic focus localization did introduce significant differences in SS (p-value = 0.02). The different localizations of the epileptic focus may influence the seizure occurrence patterns, and thus their predictability (Fig. 5a and b). For example, frontal lobe seizures are more likely to occur during the night, whereas temporal lobe seizures, although less likely during the night, are more likely to secondarily generalize when they do occur [42].
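The Kruskal–Wallis H statistic used for these group comparisons can be illustrated with a minimal pure-Python implementation (a sketch without tie correction; in practice one would use a statistics library such as scipy.stats.kruskal):

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (no tie correction):
    H = 12 / (N (N + 1)) * sum(n_i * rbar_i ** 2) - 3 (N + 1),
    where rbar_i is the mean rank of group i over the pooled sample."""
    pooled = sorted(x for g in groups for x in g)
    # Assign average (mid) ranks to tied values; ranks are 1-based.
    rank = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank[pooled[i]] = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        i = j
    n_total = len(pooled)
    s = 0.0
    for g in groups:
        mean_rank = sum(rank[x] for x in g) / len(g)
        s += len(g) * mean_rank ** 2
    return 12.0 / (n_total * (n_total + 1)) * s - 3.0 * (n_total + 1)

# Identical groups yield H near 0; well-separated groups yield larger H.
print(kruskal_wallis_h([1, 2, 3], [1, 2, 3]))      # ~0.0
print(kruskal_wallis_h([1, 2, 3], [10, 11, 12]))   # ~3.857
```

Because only ranks enter H, the test is insensitive to the bounded, skewed distributions of SS and FPR, which is why it was preferred over ANOVA here.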
All data characteristics, with the exception of the recording type, induce significant changes in the predictors' performance, at both the SS and FPR levels. The increase in sampling frequency resulted in an increase in SS and a decrease in FPR up to 512 Hz (Fig. 5c and d). This is an expected behavior, given that higher sampling rates mean better signal quality, i.e., more information available. However, the higher sampling rates present worse results in SS, especially at 2048 Hz and 2500 Hz. One influencing factor may be the reduced number of patients with records at sampling frequencies of 2048 Hz and above (only nine). Very low FPR values were also observed at 2048 Hz and 2500 Hz, but again the reduced number of patients prevents solid conclusions about this observation. The duration of the testing, as expressed in Fig. 5e and f, also significantly influences both SS and FPR (p-values in Table 2).
Table 2 – Influence of patient, data and methodological characteristics on the performance of the best predictors (partially reconstructed; the columns for gender, age, focus localization/lateralization, sampling frequency and electrode selection could not be reliably recovered from the extraction; recording type, scalp vs. invasive (Avg. SS 73.55% vs. 67.66%), and electrode selection (RAND/ORIG/1020) were non-significant).

Testing duration (h), 7 bins (last bin >144):
  #Pat.: 20, 67, 70, 45, 24, 23, 29
  Avg. SS (%): 85.21, 80.82, 73.84, 66.41, 61.22, 65.23, 63.49
  Std. SS: 21.40, 22.33, 21.92, 25.10, 19.70, 26.55, 27.77; K–W Test: Sig. (p < 0.01)
  Avg. FPR (h−1): 0.31, 0.22, 0.30, 0.46, 0.27, 0.35, 0.25
  Std. FPR: 0.33, 0.23, 0.29, 0.44, 0.21, 0.32, 0.27; K–W Test: Sig. (p = 0.01)

#T. Sz. (seizures in training: 1, ]1,3], ]3,5], ]5,7], ]7,9], >9):
  #Pat.: 47, 79, 57, 34, 20, 41
  Avg. SS (%): 95.74, 81.44, 68.95, 64.64, 55.42, 46.77
  Std. SS: 20.40, 21.84, 18.24, 18.19, 13.62, 14.00; K–W Test: Sig. (p < 0.01)
  Avg. FPR (h−1): 0.11, 0.26, 0.36, 0.35, 0.33, 0.48
  Std. FPR: 0.17, 0.24, 0.34, 0.25, 0.32, 0.42; K–W Test: Sig. (p < 0.01)

Class. Type (SVM, MLP, RBF):
  #Pat.: 88, 92, 98
  Avg. SS (%): 73.73, 74.17, 69.14; Std. SS: 23.44, 25.70, 23.90
  Avg. FPR (h−1): 0.19, 0.29, 0.42; Std. FPR: 0.14, 0.30, 0.39; K–W Test: Sig. (p < 0.01)

P. I. Time (min: 10, 20, 30, 40):
  #Pat.: 27, 60, 64, 127
  Avg. SS (%): 67.83, 76.37, 74.65, 70.04 (remaining columns not recoverable)