IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, VOL. 17, NO. 2, MARCH 2013


Discriminative and Generative Classification Techniques Applied to Automated Neonatal Seizure Detection

Eoin M. Thomas, Andriy Temko, Senior Member, IEEE, William P. Marnane, Member, IEEE, Geraldine B. Boylan, and Gordon Lightbody

Abstract—A number of automated neonatal seizure detectors have been proposed in recent years. However, there exists a large variability in the morphology of seizure and background patterns, both across patients and over time. This has resulted in relatively poor performance from systems which have been tested over large datasets. Here, the benefits of employing a pattern recognition approach are discussed. Such a system may use numerous features paired with nonlinear classifiers. In particular, two types of nonlinear classifiers are contrasted for the task. Additionally, it is shown that the proposed architecture allows for efficient classifier combination, which improves the performance of the algorithm. The resulting automated detector is shown to achieve field-leading performance. A particular strength of the proposed algorithm is its performance when very few false detections are tolerated: at 0.25 false detections per hour, the system is able to detect 75.4% of the seizure events.

Index Terms—Classifier fusion, machine learning, neonatal seizure detection.

I. INTRODUCTION

NEONATAL seizures are the most common and important sign of acute neonatal encephalopathy and represent a major risk of death or subsequent neurological disability, and by themselves may contribute to an adverse neurodevelopmental outcome [1]. Clinical signs may be absent in as many as 85% of neonatal seizures. Furthermore, treatment of these events with antiepileptic drugs may further suppress clinical signs of seizure while electrographic seizures persist [2], [3]. Thus, electroencephalogram (EEG) monitoring is required both to accurately diagnose neonatal seizure events and to evaluate the efficacy of treatment. Few staff members of the neonatal intensive care unit (NICU) receive sufficient training to interpret EEG traces. Automated

Manuscript received September 12, 2011; revised October 8, 2012; December 5, 2012; accepted December 21, 2012. Date of publication January 4, 2013; date of current version March 8, 2013. This work was supported in part by Science Foundation Ireland (SFI/10/IN.1/B3036). E. M. Thomas is with the INRIA Sophia Antipolis, Valbonne 06560, France (e-mail: [email protected]). A. Temko, W. P. Marnane, and G. Lightbody are with the Department of Electrical Engineering, University College Cork, Cork, Ireland (e-mail: andreyt@rennes.ucc.ie; [email protected]; [email protected]). G. B. Boylan is with the Department of Paediatrics and Child Health, Cork University Hospital, Cork, Ireland (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/JBHI.2012.2237035

neonatal seizure detectors have thus been proposed to play an assistive role in the NICU by identifying the location of seizure events in time, either in real time or during a review.

Neonatal seizures manifest as repetitive activity in one or more EEG channels with a minimum duration of 10 s [4]; examples of neonatal EEG are given in Fig. 1. Detection of neonatal seizure events is complicated by a number of factors, which arise due to the nature of the underlying EEG signal obtained from sick babies. The morphology of background and seizure patterns varies significantly between different patients, as the EEG is greatly affected by the conceptional age of the patient and the severity of the insult leading to seizures [5]. This results in a large variability in EEG recordings both in time and between patients. Furthermore, as the EEG is recorded using surface electrodes, the signals are susceptible to physiological and environmental artifacts. Artifacts such as electrode disconnection, electrocardiogram, movement, and respiration artifacts are frequently observed, with durations that range from seconds to hours.

In recent years, the reported performance of a number of algorithms has increased to a point where online implementation is now feasible [6]–[9]. However, the majority of these approaches require numerous rules and thresholds to account for the variability present in the EEG. Automated neonatal seizure detectors are designed with no prior data from the patient being monitored, and are as such patient independent. Nonetheless, prior knowledge obtained over a population of previous patients can be utilized [10], [11]. Overall, a diverse range of patterns occurring from both seizure and nonseizure activity must be accounted for; an in-depth review of the field of neonatal seizure detection can be found in [12].

Here, a pattern recognition framework is employed in which a nonlinear classifier is used for decision making.
Support vector machines (SVMs) [13] and Gaussian mixture models (GMMs) [14] are compared here as examples of discriminative and generative approaches to classification. Algorithms based on SVMs have been validated in the field of epileptic seizure detection in adults [15], [16]. Classifiers based on GMMs have been employed in biomedical applications such as brain computer interfaces [17] and person authentication [18] based on EEG. A GMM-based system was also validated for the task of detecting seizures in intracranial recordings from adult patients [19]. In this paper, a classifier-based neonatal seizure detector is presented. In particular, this study is focused on the classification

2168-2194/$31.00 © 2013 IEEE

Fig. 1. Examples of neonatal EEG. Note that the seizure patterns are characterized by repetitive waveforms. (a) Background EEG. (b) Generalized seizure.

Fig. 2. Neonatal seizure detector system diagram. Every channel of the EEG is processed separately. The EEG is first segmented into epochs. For every epoch of each channel, a set of 55 features is extracted and fed into the classifier. The output of the classifier is then converted into an estimate of posterior probability of seizure. The probabilistic output is temporally smoothed and the maximum support taken across all the channels. A threshold is used to obtain hard decisions for the epochs. Finally, a collar operation is applied to compensate for the delay caused by the moving average filter and to better capture the start and end of the seizure events.

stage of the algorithm. Section II expands upon the design of the algorithm and the dataset employed. Section III provides a comparison of the performance of the SVM and GMM classifiers. Section IV discusses the merits of classifier fusion. Finally, Section V presents a comparison between the proposed system and leading results in the field.

TABLE I FEATURES EXTRACTED FROM EACH EPOCH IN TIME OF THE INDIVIDUAL CHANNELS OF THE EEG

II. EXPERIMENTAL SETUP

A. Overall System

The system is designed to process and classify each EEG channel independently, as shown in Fig. 2. The referential EEG channels are first converted to a standard eight-channel montage and then downsampled from 256 to 32 Hz. The signal is segmented into epochs of 8 s with 50% overlap. A total of 55 features (see Table I) are extracted from the EEG epochs for each channel. The feature set is described under three headings: the “time domain” and “frequency domain” features have been validated in previous studies (for more details and references see [20]), while the “information theory” features were chosen based on the results of Faul et al. in a feature comparison article [21]. Following feature extraction, the features are normalized to have zero mean and unit standard deviation over the training set. The normalizing template is retained and applied to features during testing. Two classifiers were tested for this study: an SVM with a radial basis function (RBF) kernel, and a GMM classifier with linear discriminant analysis (LDA) feature preprocessing. Both classifiers are designed to output an estimate of the probability of seizure. The final stage of the detector consists of postprocessing and combining the multichannel decisions. Here, the probability of seizure obtained for each channel is first smoothed over time using a central moving average filter. The number of filter taps is 15; with a new epoch every 4 s (8 s epochs with 50% overlap), this is equivalent to taking the mean probability of seizure over 1 min. The moving average filter results in a time lag of 30 s (half the filter length) between obtaining a new epoch and a decision being available for this epoch.

THOMAS et al.: DISCRIMINATIVE AND GENERATIVE CLASSIFICATION TECHNIQUES

The largest smoothed probability of seizure is then selected to represent the multichannel output of the classification stage. This value is then thresholded to produce a binary decision with 1 indicating seizure and 0 indicating nonseizure. Finally, a collar operation is used to extend the duration of each detection. Here, the beginning of each detection is extended 40 s back in time and the end of the detection is extended 40 s forward in time. The widths of the moving average filter (1 min) and collar operation (40 s) were investigated in [13] and [14].
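The postprocessing chain described above (moving average, maximum over channels, threshold, collar) can be sketched in plain Python as follows. This is a minimal illustration rather than the authors' implementation: the function names, the 0.5 threshold, and the shrinking window at the recording edges are assumptions, while the 15 taps and the 40 s collar (10 epochs at a 4 s epoch shift) follow the text.

```python
def moving_average(probs, taps=15):
    """Central moving average over `taps` epochs.

    Assumption: near the recording edges the window shrinks to the
    epochs that are actually available.
    """
    half = taps // 2
    out = []
    for k in range(len(probs)):
        window = probs[max(0, k - half):k + half + 1]
        out.append(sum(window) / len(window))
    return out


def postprocess(channel_probs, threshold=0.5, taps=15, collar_epochs=10):
    """channel_probs: one list of seizure probabilities per EEG channel.

    Returns the multichannel smoothed probability and the binary
    decisions after the collar has been applied.
    """
    smoothed = [moving_average(ch, taps) for ch in channel_probs]
    # Maximum support across channels for every epoch.
    fused = [max(values) for values in zip(*smoothed)]
    binary = [1 if p > threshold else 0 for p in fused]
    # Collar: extend every detected epoch backward and forward in time.
    decisions = list(binary)
    n = len(binary)
    for k, flag in enumerate(binary):
        if flag:
            for m in range(max(0, k - collar_epochs), min(n, k + collar_epochs + 1)):
                decisions[m] = 1
    return fused, decisions
```

With a 4 s epoch shift, `taps=15` spans 60 s and `collar_epochs=10` spans 40 s, matching the widths quoted above.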


B. Discriminative and Generative Classifier Models

1) SVM: SVMs are popular and well-understood classifiers and as such are only introduced briefly here; for more detailed information the reader is referred to [22]. In the SVM classifier, the class membership, $y \in \{1, -1\}$, for an input feature vector $x \in \mathbb{R}^d$ is found according to

$$y(x) = \operatorname{sgn}\left(\sum_{j=1}^{M} \alpha_j y_j K(x, \phi_j) + b_{SVM}\right) \tag{1}$$

where

$$\sum_{j=1}^{M} \alpha_j y_j = 0 \quad \text{and} \quad \alpha_j > 0. \tag{2}$$

The variable $\alpha_j$ can be viewed as a weighting factor for each support vector $\phi_j$ with associated label $y_j$, and $b_{SVM}$ is the bias. The value of $M$ corresponds to the number of support vectors, that is, the subset of the training data for which $\alpha_j$ is greater than zero. The kernel function used here is the Gaussian RBF

$$K(x, \phi_j) = e^{-\frac{1}{2}(x-\phi_j)^T \sigma^{-1} (x-\phi_j)} \tag{3}$$

which can be viewed as a membership function based on the Euclidean distance between the support vector $\phi_j$ and the feature vector $x$ with a scalar scaling factor $\sigma$.

2) GMM: The GMM classifier is a generative classifier, that is, the classifier models the underlying probability density function (PDF) of each class separately. A GMM represents the PDF of a random variable $x \in \mathbb{R}^d$ as a weighted sum of $N$ Gaussian distributions

$$p(x|\Theta) = \sum_{m=1}^{N} \alpha_m p(x|\theta_m) \tag{4}$$

where

$$\sum_{m=1}^{N} \alpha_m = 1 \quad \text{and} \quad \alpha_m > 0. \tag{5}$$

Here, $\Theta$ is the mixture model, $\alpha_m$ corresponds to the weight of mixture $m$, and the density of each component is given by the normal probability distribution

$$p(x|\theta_m) = \frac{|\Sigma_m|^{-1/2}}{(2\pi)^{d/2}} e^{-\frac{1}{2}(x-\mu_m)^T \Sigma_m^{-1} (x-\mu_m)}. \tag{6}$$

The parameters $\mu_m$ and $\Sigma_m$ are the mean and covariance matrix of each Gaussian distribution in the mixture. The parameters $\alpha$, $\mu$, and $\Sigma$ are optimized iteratively for each class via the expectation maximization (EM) algorithm [23] in order to maximize the log-likelihood of the model on the training data.

3) Comparison of SVM and GMM Classification Algorithms: The decision function for the SVM in (1) can be reformulated as a function of a set of $M_s$ seizure support vectors $\phi_i$, and a set of $M_{ns}$ nonseizure support vectors $\phi_j$:

$$y_{SVM}(x) = \operatorname{sgn}(\lambda_{SVM}(x)) \tag{7}$$

$$\lambda_{SVM}(x) = b_{SVM} + \sum_{i \in Q_s^{svm}} \alpha_i K(x, \phi_i) - \sum_{j \in Q_{ns}^{svm}} \alpha_j K(x, \phi_j) \tag{8}$$

where $Q_s^{svm}$ and $Q_{ns}^{svm}$ are the sets of indices of the seizure and nonseizure support vectors, respectively. Similarly, the decision function for the GMM classifier with equal priors can be given as a function of the likelihood (4) for each class

$$y_{GMM}(x) = \operatorname{sgn}(\lambda_{GMM}(x)) \tag{9}$$

$$\lambda_{GMM}(x) = b_{GMM} + \ln\left(\sum_{i \in Q_s^{gmm}} \alpha_i p(x|\theta_i)\right) - \ln\left(\sum_{j \in Q_{ns}^{gmm}} \alpha_j p(x|\theta_j)\right) \tag{10}$$

where $Q_s^{gmm}$ and $Q_{ns}^{gmm}$ are the sets of indices of the Gaussian distributions associated with the seizure and nonseizure models, respectively. The summations represent the likelihoods of each class; note that the difference in log-likelihoods is employed here so as to satisfy Bayes’ theorem [24]. Moreover, the bias term $b_{GMM}$ is included for completeness, and represents the influence of the priors.

As can be seen from (8) and (10), the final classification formulas for both a two-class discriminative SVM (with RBF kernel) and a two-class generative GMM classifier are similar. To determine the class membership of each input vector $x$, both classification formulas calculate a weighted (by $\alpha$) Gaussian distance (a similarity measure) from this vector to each class, represented by either the Gaussian centroids for GMM or the support vectors for SVM.

Note that the decision functions in (7) and (9) will provide binary decisions. Alternatively, the output of the GMM can be converted to a probability of seizure, defined as class $\omega_1$, according to Bayes’ theorem, assuming equal priors

$$P(\omega_1|\lambda_{GMM}(x_j)) = \frac{1}{1 + \exp(-\lambda_{GMM}(x_j))}. \tag{11}$$

Likewise, the output of the SVM classifier can be converted to an estimate of the probability of seizure via Platt scaling [25]. Platt scaling consists of a sigmoid function

$$P(\omega_1|\lambda_{SVM}(\phi_j)) = \frac{1}{1 + \exp(A\lambda_{SVM}(\phi_j) + B)}. \tag{12}$$
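To make the parallel between (8) and (10) concrete, the following Python sketch evaluates both decision values together with the probability mappings in (11) and (12). It is only an illustration under stated assumptions: the function names and the data layout are hypothetical, and the Gaussian components use diagonal covariances for brevity, whereas (6) allows full covariance matrices.

```python
import math


def rbf_kernel(x, sv, inv_sigma):
    """Gaussian RBF of (3); inv_sigma is the scalar 1/sigma."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, sv))
    return math.exp(-0.5 * inv_sigma * sq_dist)


def lambda_svm(x, seizure_svs, nonseizure_svs, bias, inv_sigma):
    """Eq. (8): each list holds (alpha, support_vector) pairs."""
    pos = sum(a * rbf_kernel(x, sv, inv_sigma) for a, sv in seizure_svs)
    neg = sum(a * rbf_kernel(x, sv, inv_sigma) for a, sv in nonseizure_svs)
    return bias + pos - neg


def diag_gaussian_pdf(x, mean, var):
    """Eq. (6) restricted to a diagonal covariance (an assumption)."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * math.log(2.0 * math.pi * vi) - 0.5 * (xi - mi) ** 2 / vi
    return math.exp(log_p)


def lambda_gmm(x, seizure_mix, nonseizure_mix, bias=0.0):
    """Eq. (10): each mixture is a list of (weight, mean, var) triples."""
    like_s = sum(w * diag_gaussian_pdf(x, m, v) for w, m, v in seizure_mix)
    like_ns = sum(w * diag_gaussian_pdf(x, m, v) for w, m, v in nonseizure_mix)
    return bias + math.log(like_s) - math.log(like_ns)


def gmm_posterior(lam):
    """Eq. (11): probability of seizure under equal priors."""
    return 1.0 / (1.0 + math.exp(-lam))


def platt_posterior(lam, A, B):
    """Eq. (12): Platt sigmoid with fitted parameters A and B."""
    return 1.0 / (1.0 + math.exp(A * lam + B))
```

Both `lambda_svm` and `lambda_gmm` return a signed score whose sign gives the hard decision in (7) and (9); only the way the weights and centers are obtained differs between the two classifiers.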

The parameters A and B are found from the training set using maximum likelihood estimation. Interestingly, Platt scaling has been found to reduce the testing error of SVMs [26]. It is worthwhile noting that balanced classes were used to obtain the


TABLE II PATIENT INFORMATION FOR THE EEG DATASET

weights of the Platt sigmoid. This reflects the choice of equal priors in the computation of the GMM decision.

Despite the similarity of the classification formulas for the SVM with RBF kernel and GMM classifiers, a fundamental difference lies in the training method and the optimization criterion. Roughly speaking, both approaches perform an extraction of a compact representation of the training data for each class. However, for the GMM, the representation is based on centroids obtained by averaging the training data, while for the SVM each class is represented by a subset of the training data which lie close to the discriminative boundary. Additionally, the support vectors are selected from the training data in the context of both classes, while the GMM centroids are class-indifferent, and thus are not optimized to increase the separability of the problem [27]. For the GMM, the weights α are computed based on the amount of training data modeled by each Gaussian distribution. However, for the SVM, the weights reflect the importance of a support vector for the classification task.

C. Dataset

The dataset used in this study was recorded in the NICU at Cork University Maternity Hospital over three years between 2003 and 2006. In this dataset, 17 babies with electrographic neonatal seizures that developed over 72 h from birth were recorded with EEG. More details on the recordings can be found in Table II. This recruitment reflects the real-life situation in the NICU and ensures that the performance of the seizure detection algorithm is tested on continuous, nonpreselected files [28]. The patients were full-term neonates ranging in gestational age from 39 to 42 weeks. The mean birth weight of the patients was 3.6 kg (range 1.8–4.9 kg). A NicOne video EEG machine was used to record multichannel EEG at 256 Hz using the 10–20 system of electrode placement modified for neonates. In this study, eight bipolar EEG channels are used (F4-C4, F3-C3, T4-C4, C4-CZ, CZ-C3, C3-T3, C4-O2, and C3-O1). The dataset contained over 267 h of EEG from which a total of 705 seizure events with a mean duration of 3.89 min were annotated by a neonatal neurophysiologist (G. B. Boylan).

D. Test Setup and Metrics

In clinical practice, samples of testing patient data are never available beforehand in the NICU. It is, therefore, necessary to develop a patient-independent neonatal seizure detector. For this reason, the leave-one-out (LOO) cross-validation method was used to assess the performance of the system for patient-independent seizure detection. In this manner, the data of all but one patient in the dataset (see Table II) were used for training and development, and the data of the remaining seizure patient were used for testing. This procedure was repeated until each seizure patient had been a test subject and the mean result was reported. Several alternatives to the LOO performance assessment for neonatal seizure detection have been discussed in [29]. The LOO method is known to be an almost unbiased estimation of the true generalization error [30]. What is examined with the LOO procedure is not a particular model, but the methodology used to obtain such a model. This last point means that the LOO estimate effectively gives a robust prediction of the performance that other researchers or practitioners will obtain using this method, but trained on their data.

The systems proposed here are epoch-based systems; thus, epoch-based metrics were used to assess the performance of the detector. In particular, the system was assessed using three primary metrics: sensitivity, the percentage of seizure epochs correctly classified; specificity, the percentage of nonseizure epochs correctly classified; and precision, the percentage of seizure decisions corresponding to seizures. Performance curves may be plotted by varying the decision threshold of the classifier. These include the receiver operating characteristic (ROC) curve (specificity versus sensitivity) and the precision-recall (PR) curve (sensitivity versus precision). The area under the ROC curve is a metric representing the separability of the data. Furthermore, the area under the PR curve may provide additional information in unbalanced sets, particularly when the target class is a small fraction of the testing data, as in this study.

Certain algorithms in the literature are not epoch based, and as such report event-based metrics. For comparison purposes, the percentage of seizure events detected is reported as the good detection rate (GDR) and the number of false detection events per hour (FD/h) is also reported. Note that both these metrics are based on the “any overlap” grading scheme [14].
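The epoch-based and event-based metrics defined above can be sketched as follows. The function names are hypothetical, and the reading of the “any overlap” scheme used here (a seizure event counts as detected if any detection overlaps it, and a detection counts as false if it overlaps no seizure event) is an assumption based on its name; [14] should be consulted for the exact grading rules.

```python
def _percent(num, den):
    """Percentage, or NaN when the denominator is zero."""
    return 100.0 * num / den if den else float("nan")


def epoch_metrics(truth, decisions):
    """Sensitivity, specificity, and precision (in %) from 0/1 epoch labels."""
    pairs = list(zip(truth, decisions))
    tp = sum(1 for t, d in pairs if t == 1 and d == 1)
    tn = sum(1 for t, d in pairs if t == 0 and d == 0)
    fp = sum(1 for t, d in pairs if t == 0 and d == 1)
    fn = sum(1 for t, d in pairs if t == 1 and d == 0)
    return {
        "sensitivity": _percent(tp, tp + fn),
        "specificity": _percent(tn, tn + fp),
        "precision": _percent(tp, tp + fp),
    }


def events(binary):
    """Runs of consecutive ones as (start, end) pairs, end exclusive."""
    runs, start = [], None
    for k, flag in enumerate(binary):
        if flag and start is None:
            start = k
        elif not flag and start is not None:
            runs.append((start, k))
            start = None
    if start is not None:
        runs.append((start, len(binary)))
    return runs


def event_metrics(truth, decisions, hours):
    """Good detection rate (%) and false detections per hour."""
    true_events, detections = events(truth), events(decisions)

    def overlap(a, b):
        return a[0] < b[1] and b[0] < a[1]

    detected = sum(1 for te in true_events
                   if any(overlap(te, de) for de in detections))
    false_dets = sum(1 for de in detections
                     if not any(overlap(de, te) for te in true_events))
    return _percent(detected, len(true_events)), false_dets / hours
```

Sweeping the decision threshold and recomputing `epoch_metrics` at each value traces out the ROC and PR curves described above.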

E. Model Selection and Training

For each fold in the LOO cross validation, the data from the remaining 16 subjects were used to train the classification model and tune other system parameters. This ensures that the model selection routine is completely independent from the performance assessment routine and that the testing subject was not seen or used at any time for any system tuning. The parameters of the SVM were selected using fivefold cross validation over the training data. The classifier was then trained with the optimal parameters on the full training data of 16 patients. The most frequently chosen SVM parameters were 20 for C and 0.05 for σ⁻¹.
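The nesting of model selection inside the LOO assessment can be sketched as below. This is a schematic illustration only: `evaluate` is a hypothetical user-supplied callable that trains on one set of patients and returns a score on another, and splitting the inner fivefold cross validation by patient (rather than by epoch) is an assumption, since the text does not specify it.

```python
def leave_one_patient_out(patient_ids):
    """Yield (training patients, test patient) for every outer fold."""
    for test_id in patient_ids:
        yield [p for p in patient_ids if p != test_id], test_id


def k_fold(items, k=5):
    """Yield (train, validation) splits of `items` into k disjoint folds."""
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]


def select_params(train_ids, grid, evaluate, k=5):
    """Pick the parameter set with the best mean inner-fold score.

    Only the training patients are passed in, so the test patient of
    the outer fold never influences the selection.
    """
    best, best_score = None, float("-inf")
    for params in grid:
        scores = [evaluate(params, tr, va) for tr, va in k_fold(train_ids, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best, best_score = params, mean_score
    return best
```

In use, each outer fold would call `select_params` on its 16 training patients, retrain on all 16 with the selected parameters, and only then score the held-out patient.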


TABLE III CLASSIFIER AGREEMENT, THE PERFORMANCE IS SET TO THE EQUAL ERROR RATE FOR THE PURPOSE OF COMPARISON

Fig. 3. Boxplots for the area under the ROC and PR curves over all patients.

For the GMM, an LDA algorithm was used to reduce the dimensionality of the feature set, in order to improve the classification performance in accordance with [14]. The optimal system parameters, such as the number of LDA components retained and the number of Gaussians used in each GMM, were selected to yield the highest ROC area over the training data. Such dimensionality reduction was not required for the SVM-based algorithm as kernel machines are not as sensitive to the curse of dimensionality [22]. The most frequently chosen system parameters for the GMM were 16 Gaussians and 32 LDA components preserved.
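Selecting parameters by ROC area, as described above, requires a threshold-free estimate of that area. A minimal sketch uses the standard equivalence between the ROC area and the probability that a randomly chosen seizure epoch scores higher than a randomly chosen nonseizure epoch (ties counted as one half). The function names and the dictionary of candidate configurations are hypothetical, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
def roc_area(scores, labels):
    """Area under the ROC curve from classifier scores and 0/1 labels."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def pick_best(candidates, labels):
    """candidates maps a configuration, e.g. (n_gaussians, n_lda),
    to the scores that configuration produced on the training data."""
    return max(candidates, key=lambda cfg: roc_area(candidates[cfg], labels))
```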

III. COMPARING THE DISCRIMINATIVE AND GENERATIVE CLASSIFIERS

A. Overall Performance

The mean ROC areas obtained were 0.963 and 0.956 for the SVM and GMM systems, respectively. Fig. 3 presents a comparison of the performance using boxplots. The central mark is the median, the edges of the box are the 25th and 75th percentiles, the whiskers extend to the most extreme data points not considered outliers, and outliers are plotted individually as crosses. The boxplots for the ROC area show similar median values (0.967 and 0.97 for SVM and GMM, respectively); however, the boxplot for GMM exhibits a larger interquartile range and the results are skewed toward lower values. This indicates that the GMM classifier obtains a lower ROC area than the SVM for certain patients. Furthermore, an outlier can be seen at 88% ROC area, which is more than 2% lower than the lowest result obtained by the SVM classifier.

The PR boxplots show a large interquartile range for both classifiers. This is indicative of the high variance in performance among the set of patients. The two outliers in each boxplot correspond to patients with a low number of seizures in prolonged recordings. High PR areas are difficult to achieve in these patients due to the sparsity of the target class. Despite similar interquartile ranges, the median value and lowest value are lower for the GMM than for the SVM classifier.

In order to gain further insight into the classifier outputs, the agreement of the classifiers over all epochs in testing was investigated at the equal error rate (sensitivity equal to specificity). For each epoch, the output of the classifier was labeled as a true negative (TN) for correctly classifying background, a true positive (TP) for correctly classifying seizure, a false negative (FN) for missing a seizure epoch, or a false positive (FP) for incorrectly declaring a seizure. Table III presents the agreement of both classifiers (in %) for each label (TN, TP, FN, FP), as well as the percentage of unique decisions. For example, if all TNs produced by both classifiers form 100%, then in 93.2% of TN epochs the SVM and GMM decisions coincide, whereas there are an additional 3.9% of TNs produced uniquely by the SVM and 2.9% produced uniquely by the GMM. Thus, the agreement is high between classifiers for correctly detected background patterns and correctly detected seizure epochs, with 93.2% agreement for TNs and 94.5% for TPs.

A different behavior is observed for the misclassified epochs. For missed seizures (FN) the agreement is only 59.1%, with the SVM and GMM contributing 15.6% and 25.3% of the total number of FNs, respectively. On closer inspection, on average 330 epochs of seizure activity were misclassified for each patient. The agreement of 59.1% indicates that many of these seizure epochs were not detected by either classifier, and are thus assumed to be too subtle in nature for the classifiers to recognize them. As has been shown in previous studies, the main characteristic of FNs is the short duration of the seizure events [13], [14]. However, it should be noted that 40.9% of the FNs are only missed by one classifier. Thus, certain seizures have characteristics which are only recognized by one of the classifiers. Even less agreement is observed for false alarms (FP), where the agreement is as low as 53.6%; thus, almost half of all FPs are produced by only one classifier.

B. Visualizing Disparity Among Classifiers

In the previous section, it was shown that the highest disagreement between classifiers occurs for FPs. This implies that the difference between the decision boundaries of the classifiers

Fig. 4. Scatter plots for three patients of the feature vectors resulting in FPs.

leads to the misclassification of distinct background activity. The boundaries of the classifiers and the feature vectors themselves cannot be visualized due to the high dimensionality of the data. Therefore, the feature space was projected to a 2-D space using principal component analysis (PCA) in order to visualize the disparity among classifiers [24]. The three patients in which the disparity between SVM and GMM distributions is greatest are shown in Fig. 4. The resulting scatter plots show the unique misclassified FPs produced by each classifier. It can be seen that for these three patients, the misclassified epochs from each system occupy separate locations in the feature space. Thus, despite similar overall performance, this example shows that the classifiers have distinct boundaries in the feature space, resulting in different patterns being misclassified.
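The per-label agreement percentages used throughout this section (Table III) can be computed along the following lines. This sketch follows the description given for the TN example, where the epochs assigned a label by either classifier form 100%; the function names are hypothetical and the inputs are 0/1 epoch decisions at a common operating point.

```python
def label_epochs(truth, decisions):
    """Map every epoch to TN, TP, FN, or FP."""
    labels = []
    for t, d in zip(truth, decisions):
        if t == 1:
            labels.append("TP" if d == 1 else "FN")
        else:
            labels.append("FP" if d == 1 else "TN")
    return labels


def agreement(truth, dec_a, dec_b):
    """For each label, return (% shared, % unique to A, % unique to B).

    The denominator is the number of epochs given that label by at
    least one classifier; None is returned if no epoch has the label.
    """
    la, lb = label_epochs(truth, dec_a), label_epochs(truth, dec_b)
    table = {}
    for name in ("TN", "TP", "FN", "FP"):
        both = sum(1 for a, b in zip(la, lb) if a == name and b == name)
        only_a = sum(1 for a, b in zip(la, lb) if a == name and b != name)
        only_b = sum(1 for a, b in zip(la, lb) if a != name and b == name)
        total = both + only_a + only_b
        if total:
            table[name] = tuple(100.0 * v / total for v in (both, only_a, only_b))
        else:
            table[name] = None
    return table
```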

TABLE IV MEAN AREA UNDER THE ROC AND PR CURVES

IV. CLASSIFIER COMBINATION

The diversity in decisions produced by each technique, as discussed in the previous sections, is a good premise for successful application of fusion techniques [31]. As the main target of the current study is to discuss the complementary nature of the proposed techniques rather than exploit it (which will be the aim of a separate future study), we limit ourselves to the simplest untrainable fusion methods. These techniques, namely mean, product, max, and min, are discussed in detail in [32]. The input to the fusion operation is a vector of two values corresponding to filtered posterior probabilities of seizure over all channels given by the two classifiers. The moving average filter and channel combination for each classifier are applied prior to fusion. With reference to Fig. 2, the fusion operator occurs between the max operator and the application of the threshold over the filtered probabilities of seizure. The ROC areas and PR areas obtained over all subjects are given in Table IV for all investigated classifiers. Of particular interest, max fusion results in a decrease in both ROC and PR area values compared to the SVM. This is explained by the behavior of the max combination rule, which can be viewed as declaring seizure when either or both classifiers have declared a seizure. However, from Table III it is known that the agreement for false detections is low. Thus, the

Fig. 5. Boxplots of ROC area over the subjects for each classification scheme investigated.

use of max combination yields an increase in the number of false detections, resulting in a reduction in the ROC area compared to the SVM. Combination schemes where both classifiers are required to agree on a decision (such as mean, min, and product) result in an increase of ROC and PR area due to the reduction in false detections. In particular, the min and product fusion schemes were found to give a significant improvement in ROC area based on paired t-tests (p < 0.05 in both cases). This was attributed to an increase in ROC area for the majority of subjects when compared to the SVM classifier. This trend can be seen in the boxplots shown in Fig. 5. No significant improvement in the PR area was found for the fusion schemes. Thus, despite the lower performance achieved by the GMM compared to SVM, the diversity of the decisions is sufficiently large for these simple fusion schemes to surpass the performance of the individual classifiers.
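The four untrainable fusion rules discussed in this section amount to a one-line operation per epoch. The sketch below is illustrative only (the names `FUSION_RULES` and `fuse` are hypothetical); it takes the two filtered, channel-combined probability sequences as input and returns the fused sequence to which the threshold would then be applied.

```python
import math

# Each rule maps the pair (p_svm, p_gmm) for one epoch to a fused value.
FUSION_RULES = {
    "mean": lambda p: sum(p) / len(p),
    "product": lambda p: math.prod(p),
    "max": max,
    "min": min,
}


def fuse(p_svm, p_gmm, rule):
    """Fuse two per-epoch probability sequences with the named rule."""
    combine = FUSION_RULES[rule]
    return [combine((a, b)) for a, b in zip(p_svm, p_gmm)]
```

For an epoch where the classifiers disagree, for example probabilities of 0.9 and 0.2, `max` yields 0.9 and would declare a seizure at a threshold of 0.5, whereas `min` yields 0.2 and would not, which illustrates why max fusion tends to accumulate the false detections of both classifiers.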


TABLE V STATISTICS OF THE TEST DATABASES FROM VARIOUS STUDIES

Fig. 6. Comparison of performance for epoch-based metrics.

V. PERFORMANCE ASSESSMENT

The various approaches used in the literature for reporting the performance of neonatal seizure detectors have been outlined in [29]. Additionally, the statistics of the databases employed in key published research from the field have been tabulated in Table V, in order to facilitate the comparison of results. The number of patients with seizures is relatively low for the majority of studies, and thus a set of recordings without seizures is also used by some authors. The number of seizure-free patients is shown in parentheses within the table. Note that the duration of the test set is given solely for the seizure patients where available. Different metrics were employed across studies; thus, certain results can only be discussed in terms of epoch-based metrics and others in terms of event-based metrics. In Fig. 6, studies reporting specificity and sensitivity are contrasted to the ROC curve of the fusion classifier [6], [33], [34]. Note that the fusion classifier outperforms these techniques. Greene et al. employed a linear classification strategy, yet reported low values of sensitivity in order to achieve high specificity [34]. The addition of a postprocessing scheme, such as the moving average and collar used here, has been found to improve the results of linear classifier-based systems [20]. Event-based metrics are compared to the GDR versus FD/h curve in Fig. 7. The fusion classifier proposed here improves upon the performance of previously proposed systems [6], [8],

Fig. 7. Comparison of performance for event-based metrics.

[35]. It has become clear in recent years that a neonatal seizure detector will only become accepted within the NICU if the false alarm rate is very low (1 FD/h). In this respect, it can be seen that a key strength of the proposed approach is the high GDR obtained with low false detections. For example, the system detected 75.4% of seizures with only 0.25 FD/h. Such performance is useful in the NICU considering that only ∼20% of neonatal seizures are detected without EEG monitoring. It is worth reemphasizing that, unlike other studies which report performance increases obtained on datasets of several carefully selected minutes of EEG, the results in our study are obtained on a large, unedited dataset. The dataset used here contained 267 h of continuous unedited neonatal EEG, and thus these results are stable and significant.

VI. CONCLUSION

There are a number of advantages to the proposed seizure detection system. First, as demonstrated above, the system achieves the best performance reported to date under a strict patient-independent testing methodology. Second, the majority of systems used in this comparison use multiple thresholds, and the tradeoff between good and false detections is, thus, fixed. In contrast, the fusion system can be readily adjusted to suit differing operating conditions and patient types. Extending the previous point, the output of the detector can be in the form of a probability of seizure in time. Furthermore, as each channel is analyzed independently, a probability of seizure can readily be generated


for each channel. Thus, the proposed system provides the clinician and staff in the NICU with a large amount of detailed information if required. Simple classifier fusion was shown here to result in a small increase in ROC area. This result warrants further investigation into more sophisticated methods of classifier fusion.

REFERENCES

[1] R. Clancy, “Summary proceedings from the neurology group on neonatal seizures,” Pediatrics, vol. 117, pp. 23–27, 2006.
[2] A. M. E. Bye and D. Flanagan, “Spatial and temporal characteristics of neonatal seizures,” Epilepsia, vol. 36, no. 10, pp. 1009–1016, 1995.
[3] D. M. Murray, G. B. Boylan, I. Ali, C. A. Ryan, B. P. Murphy, and S. Connolly, “Defining the gap between electrographic seizure burden, clinical expression and staff recognition of neonatal seizures,” Arch. Dis. Child Fetal Neonatal Ed., vol. 93, no. 3, pp. 187–191, 2008.
[4] R. Clancy, “Prolonged electroencephalogram monitoring for seizures and their treatment,” Clin. Perinatol., vol. 33, pp. 649–665, 2006.
[5] E. M. Mizrahi and R. R. Clancy, “Neonatal seizures, early-onset seizure syndromes and their consequences for development,” Mental Retard. Dev. Disabilit. Res. Rev., vol. 6, pp. 229–241, 2000.
[6] M. A. Navakatikyan, P. B. Colditz, C. J. Burke, T. E. Inder, J. Richmond, and C. E. Williams, “Seizure detection algorithm for neonates based on wave-sequence analysis,” Clin. Neurophysiol., vol. 117, no. 6, pp. 1190–1203, 2006.
[7] W. Deburchgraeve, P. Cherian, M. De Vos, R. Swarte, J. Blok, G. Visser, P. Govaert, and S. Van Huffel, “Automated neonatal seizure detection mimicking a human observer reading EEG,” Clin. Neurophysiol., vol. 119, no. 11, pp. 2447–2454, 2008.
[8] J. Mitra, J. R. Glover, P. Y. Ktonas, A. T. Kumar, A. Mukherjee, N. B. Karayiannis, J. D. Frost, R. A. Hrachovy, and E. M. Mizrahi, “A multistage system for the automated detection of epileptic seizures in neonatal electroencephalography,” J. Clin. Neurophysiol., vol. 26, no. 4, pp. 218–226, 2009.
[9] P. Cherian, W. Deburchgraeve, R. Swarte, M. De Vos, P. Govaert, S. Van Huffel, and G. Visser, “Validation of a new automated neonatal seizure detection system: A clinician’s perspective,” Clin. Neurophysiol., vol. 122, no. 8, pp. 1490–1499, 2011.
[10] A. Temko, N. Stevenson, W. Marnane, G. Boylan, and G. Lightbody, “Inclusion of temporal priors for automated neonatal EEG classification,” J. Neur. Eng., vol. 9, no. 4, p. 046002, 2012.
[11] A. Temko, G. Lightbody, E. Thomas, G. Boylan, and W. Marnane, “Instantaneous measure of EEG channel importance for improved patient-adaptive neonatal seizure detection,” IEEE Trans. Biomed. Eng., vol. 59, no. 3, pp. 717–727, Mar. 2012.
[12] E. Thomas, A. Temko, G. Lightbody, W. Marnane, and G. Boylan, “Advances in automated neonatal seizure detection,” New Adv. Intell. Signal Process., vol. 372, pp. 93–113, 2011.
[13] A. Temko, E. Thomas, W. Marnane, G. Lightbody, and G. B. Boylan, “EEG-based neonatal seizure detection with support vector machines,” Clin. Neurophysiol., vol. 122, no. 3, pp. 464–473, 2011.
[14] E. M. Thomas, A. Temko, G. Lightbody, W. P. Marnane, and G. B. Boylan, “Gaussian mixture models for classification of neonatal seizures using EEG,” Physiol. Meas., vol. 31, pp. 1047–1064, 2010.
[15] A. Shoeb, H. Edwards, J. Connolly, B. Bourgeois, S. T. Treves, and J. Guttag, “Patient-specific seizure onset detection,” Epilepsy Behav., vol. 5, no. 4, pp. 483–498, 2004.

[16] A. B. Gardner, A. M. Krieger, G. Vachtsevanos, and B. Litt, “One-class novelty detection for seizure analysis from intracranial EEG,” J. Mach. Learn. Res., vol. 7, pp. 1025–1044, 2006.
[17] X. Zhu, J. Wu, Y. Cheng, and Y. Wang, “GMM-based classification method for continuous prediction in brain-computer interface,” in Proc. 18th Int. Conf. Pattern Recog., 2006, pp. 1171–1174.
[18] S. Marcel and J. Millan, “Person authentication using brainwaves (EEG) and maximum a posteriori model adaptation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 4, pp. 743–752, Apr. 2007.
[19] L. Meng, M. Frei, I. Osorio, G. Strang, and T. Nguyen, “Gaussian mixture models of ECoG signal features for improved detection of epileptic seizures,” Med. Eng. Phys., vol. 26, no. 5, pp. 379–393, 2004.
[20] E. M. Thomas, “A machine learning framework for neonatal seizure detection,” Ph.D. dissertation, University College Cork, Cork, Ireland, 2011.
[21] S. Faul, G. B. Boylan, S. Connolly, W. P. Marnane, and G. Lightbody, “Chaos theory analysis of the newborn —Is it worth the wait?,” in Proc. IEEE Int. Symp. Intell. Signal Process., Sep. 2005, pp. 381–386.
[22] B. Schölkopf and A. Smola, Learning with Kernels. Cambridge, MA: MIT Press, 2002.
[23] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” J. Royal Stat. Soc., vol. 39, no. 1, pp. 1–38, 1977.
[24] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification. New York: Wiley-Interscience, 2001.
[25] J. Platt, “Probabilistic outputs for support vector machines comparison to regularized likelihood methods,” in Advances in Large Margin Classifiers. Cambridge, MA: MIT Press, pp. 61–74, 1999.
[26] A. Niculescu-Mizil and R. Caruana, “Predicting good probabilities with supervised learning,” in Proc. Int. Conf. Mach. Learn., 2005, pp. 625–632.
[27] B. Schölkopf, K. Sung, C. Burges, F. Girosi, P. Niyogi, T. Poggio, and V. Vapnik, “Comparing support vector machines with Gaussian kernels to radial basis function classifiers,” IEEE Trans. Signal Process., vol. 45, no. 11, pp. 2758–2765, Nov. 1997.
[28] J. Gotman, D. Flanagan, B. Rosenblatt, A. Bye, and E. Mizrahi, “Evaluation of an automatic seizure detection method for the newborn EEG,” Electroenceph. Clin. Neurophysiol., vol. 103, no. 3, pp. 363–369, 1997.
[29] A. Temko, E. Thomas, W. Marnane, G. Lightbody, and G. B. Boylan, “Performance assessment for EEG-based neonatal seizure detectors,” Clin. Neurophysiol., vol. 122, no. 3, pp. 474–482, 2011.
[30] V. Vapnik, Estimation of Dependences Based on Empirical Data. New York: Springer-Verlag, 1982.
[31] L. I. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms. New York: Wiley-Interscience, 2004.
[32] A. Jain, K. Nandakumar, and A. Ross, “Score normalization in multimodal biometric systems,” Pattern Recognit., vol. 38, pp. 2270–2285, 2005.
[33] A. Aarabi, R. Grebe, and F. Wallois, “A multistage knowledge-based system for EEG seizure detection in newborn infants,” Clin. Neurophysiol., vol. 118, no. 12, pp. 2781–2797, 2007.
[34] B. R. Greene, W. P. Marnane, G. Lightbody, R. B. Reilly, and G. B. Boylan, “Classifier models and architectures for EEG-based neonatal seizure detection,” Physiol. Meas., vol. 29, pp. 1157–1178, 2008.
[35] J. Gotman, D. Flanagan, J. Zhang, and B. Rosenblatt, “Automatic seizure detection in the newborn, methods and initial evaluation,” Electroenceph. Clin. Neurophysiol., vol. 103, no. 3, pp. 356–362, 1997.

Authors’ photographs and biographies not available at the time of publication.
