Computers in Biology and Medicine 64 (2015) 127–137

Contents lists available at ScienceDirect

Computers in Biology and Medicine journal homepage: www.elsevier.com/locate/cbm

A wrapper-based approach for feature selection and classification of major depressive disorder–bipolar disorders

Turker Tekin Erguzel a,*, Cumhur Tas b,c, Merve Cebi c

a Uskudar University, Faculty of Engineering and Natural Sciences, Department of Computer Engineering, Istanbul, Turkey
b NPIstanbul Hospital, Department of Psychiatry, Istanbul, Turkey
c Uskudar University, Faculty of Humanities and Social Sciences, Department of Psychology, Istanbul, Turkey

ARTICLE INFO

ABSTRACT

Article history: Received 23 February 2015 Accepted 20 June 2015

Feature selection (FS) and classification are consecutive artificial intelligence (AI) methods used in data analysis, pattern classification, data mining and medical informatics. Besides promising studies on the application of AI methods to health informatics, working with more informative features is crucial in order to contribute to early diagnosis. Being one of the prevalent psychiatric disorders, the depressive episode of bipolar disorder (BD) is often misdiagnosed as major depressive disorder (MDD), leading to suboptimal therapy and poor outcomes. Therefore, discriminating MDD and BD at earlier stages of illness could help to facilitate efficient and specific treatment. In this study, a nature-inspired and novel FS algorithm based on standard Ant Colony Optimization (ACO), called improved ACO (IACO), was used to reduce the number of features by removing irrelevant and redundant data. The selected features were then fed into a support vector machine (SVM), a powerful mathematical tool for data classification, regression, function estimation and modeling processes, in order to classify MDD and BD subjects. The proposed method used coherence values, a promising quantitative electroencephalography (EEG) biomarker, calculated from the alpha, theta and delta frequency bands. The noteworthy performance of the novel IACO–SVM approach showed that it is possible to discriminate 46 BD and 55 MDD subjects using 22 of 48 features with 80.19% overall classification accuracy. The performance of the IACO algorithm was also compared to that of the standard ACO, genetic algorithm (GA) and particle swarm optimization (PSO) algorithms in terms of classification accuracy and number of selected features. In order to provide an almost unbiased estimate of the classification error, the validation process was performed using a nested cross-validation (CV) procedure.
© 2015 Elsevier Ltd. All rights reserved.

Keywords: Artificial intelligence Support vector machine Improved Ant Colony Optimization Major depressive disorder Bipolar disorder Coherence

1. Introduction

Advances in computer science and data acquisition systems have simplified collecting and storing large data sets with long time series. These data sets have found increasingly frequent and varied fields of application, such as astronomy [1], biology [2], finance [3], marketing [4], medicine [5], and data mining and knowledge discovery [6]. Intrinsically, the evaluation of large data is a valuable process, and recent studies underline the use of FS methods with their promising outcomes [7–10]. Large-scale datasets with high feature dimensionality may yield high classification accuracy and over-fitted performance when proper validation methods are not used. In order to both overcome that biased error estimate and remove the irrelevant and

* Correspondence to: Uskudar University, Faculty of Engineering and Natural Sciences, Department of Computer Engineering, Altunizade Mah. Haluk Turksoy Sk. No: 14, PK: 34662 Uskudar/Istanbul. Tel.: +90 5054970211. E-mail address: [email protected] (T. Tekin Erguzel).

http://dx.doi.org/10.1016/j.compbiomed.2015.06.021
0010-4825/© 2015 Elsevier Ltd. All rights reserved.

noisy features that mislead or impede early diagnosis and effective treatment, nested CV is used. With nested CV, an inner CV loop is used for model selection, while an outer CV loop is used to compute an estimate of the error on completely new data. Through the use of nested FS, such ubiquitous problems can be automatically detected and removed, resulting in more reliable subset or pattern discovery in many fields [11–15]. Within this context, a wrapper-based system is generally used, combining a classifier and a meta-heuristic algorithm to identify the best subset of features without sacrificing prediction accuracy. A common way to describe meta-heuristic algorithms is that they combine randomness, probability and mathematical equations to imitate natural phenomena. These phenomena include the biological evolutionary process, as in the genetic algorithm (GA) [16] and differential evolution (DE) [17]; animal behaviour, as in particle swarm optimization (PSO) [18] and ACO [19]; and the physical annealing process, as in simulated annealing (SA) [20]. Many meta-heuristic algorithms and their improved modifications have also been successfully applied to various optimization problems in

recent studies [21–24]. Compared to conventional numerical methods, these algorithms have been shown to generate better solutions [25]. Among these meta-heuristic algorithms, ACO is a stochastic search method based on observations of the social behaviour of real insects or animals, and it has been shown to be an efficient algorithm for FS problems [15,26–30]. However, there are still some weaknesses of ACO in practice. The probability of getting trapped in a local optimal solution, the high computational time and system resources required to obtain the optimal solution, and the difficulty of setting the heuristic parameters to achieve good performance are prominent concerns. So, in order to avoid these potential weaknesses, improved ACO methods were proposed in recent studies [31–35]. In the last decade, there has been an upsurge of interest within the neuroscience community in the use of AI methods. One such method, supervised machine learning (ML), an area of AI, can automatically detect patterns in existing training data and then use the detected patterns to make predictions on future data [36]. Compared to conventional methods, the advantages of applying supervised ML are twofold. Supervised ML methods address individual differences, rather than considering group differences as most traditional statistical comparisons do, and classify subjects in order to contribute to the clinical decision process. These methods generate a model using a training set that includes input and output data. Following the classification process, the model is tested using external test data to estimate the prediction capability of the model. These methods are also sensitive to spatially distributed and subtle effects in the brain that would otherwise be indistinguishable with traditional univariate methods, which focus on gross differences at the group level [37].
SVM is a specific type of supervised ML method based on the structural risk minimization (SRM) principle. SVM is used to solve classification problems by maximizing the margin between two opposing classes separated by a hyperplane. SVMs are widely used because they handle model selection, over-fitting, nonlinearity, the curse of dimensionality and local minima in a better way [38,39]; they have promising outcomes in regression tasks [40] and are also widely used in the classification of psychiatric disorders [41–45]. Some psychiatric disorders are frequently misdiagnosed, which ultimately leads to suboptimal treatment and poor outcomes. One good example of such a diagnostic dilemma is the difficulty of discriminating depressive episodes in patients with BD from patients experiencing MDD, a clinical term used for cases experiencing depression without any lifetime presence of mania [46]. Discriminating MDD and BD at earlier stages of illness could therefore help to facilitate efficient and specific treatment. This is because bipolar disorder is linked with poorer functioning and the highest rates of completed suicide, and using specific treatments such as mood stabilizers may be crucial for the treatment of bipolar patients [47]. In addition, receiving antidepressants may induce a manic episode in patients with BD, characterized by elevated mood, agitation, grandiose delusions and a marked increase in goal-directed behaviour that may result in inappropriate risk-taking [48]. Recent studies have utilized neuroimaging methods to reveal discrete patterns of functional and structural abnormalities in neural systems between MDD and BD and found some potential [42–44]. In some other studies, conventional statistical techniques were used; those methods rely on the basic assumption of linear combinations, which has well-known inadequacies for discriminating heterogeneous, symptom-based psychiatric diagnoses [45].
Over the past decade, machine learning methods have been used increasingly in the study of affective disorders and in comparisons of these patients to those with other psychiatric disorders [49,53]. Some studies used neuroimaging methods for BD and MDD to reveal discrete patterns of functional and structural abnormalities in neural systems critical for emotion regulation [49–51]. In some other studies, traditional statistical techniques were used; those methods rely on the basic assumption of linear combinations only, so they may not be appropriate for such tasks [52]. A recent study used SVM to compare the diagnostic performance for BD and MDD and classified the subjects with 54.76% accuracy [54]. In a similar study, pattern recognition analysis was applied using subdivisions of anterior cingulate cortex (ACC) blood flow at rest; SVM classified MDD and BD subjects using subgenual ACC blood flow with 81% accuracy [55]. Another study employed multivariate pattern classification techniques using brain morphometric biomarkers and yielded up to 79.3% accuracy in differentiating the two depressed groups [56]. In another study, an automatic identification system for epileptic seizure detection in multichannel EEG signals was presented; considering both MDD and BD EEG signals, approximate entropy and statistical values were used for feature extraction, and the seizure detection accuracy of SVM with various kernel functions was also tested, with a prediction accuracy of 97.17% for the RBF-kernel SVM model [57]. Besides the contribution of high classification accuracy to the diagnosis process, computerized feature extraction methods have been used increasingly in the study of affective disorders and in comparisons of psychiatric disorders. The depressive episode in BD is regarded as among the most wearing psychiatric disorders, with a lifetime prevalence of up to 4–5% [58].
Although BD and MDD have been considered distinct clinical entities, treated with specific therapeutic methods, former studies revealed that 60% of BD cases were incorrectly diagnosed as unipolar depression (UD) and were consequently treated inappropriately [49]. Thus, it is critically important to determine the biomarkers reflecting the distinctive pathophysiologic processes in BD and MDD [59]. EEG coherence is one such potential neurophysiological biomarker reflecting brain dynamics. Coherence describes the relationship between signals in a given frequency band, and various spatial coherence signals are gathered over long distances as parallel processing [60–62]. EEG coherence is a remarkable large-scale measure of functional relationships or synchronized functioning between pairs of cortical regions; coherence is therefore appreciated as a biomarker representing the brain's functional connectivity [63–65]. Recent studies underline the clinical contribution of coherence as a biomarker in the classification of psychiatric disorders [45,66–68]. In light of the above, this paper aims to reveal the discriminating features without sacrificing the classification accuracy of MDD and BD subjects using SVM and IACO. The coherence values, a measure of functional connectivity in the brain, were first calculated using a previously defined method [69]. Subjects, EEG recordings and coherence calculation steps are given in Section 2. FS methods and the hybrid structure with SVM are described in Section 2.4. Computational experiments of the proposed approach are reported in Section 3, and finally the outcomes from the engineering and psychiatry perspectives are discussed in Section 4.

2. Materials and methods

2.1. Subjects

We conducted a retrospective investigation of 1977 patients who applied to the Neuropsychiatry Istanbul Hospital Psychiatric Outpatient Clinic between January 2010 and April 2015. Among these patients, 101 patients receiving a diagnosis of BD or MDD on admission were recruited for this study. 46

bipolar disorder patients in a depressive episode (17 males and 29 females) and 55 patients with MDD (23 males and 32 females) were matched. Eligible subjects were outpatients suffering from a depressive episode of BD or from MDD, diagnosed according to Diagnostic and Statistical Manual of Mental Disorders (DSM)-IV criteria for either a primary diagnosis of bipolar affective disorder, depressive episode, or major depressive disorder on the Structured Clinical Interview for Axis I Disorders (SCID-I). We included subjects with a diagnosis of MDD who scored at least 8 on the Hamilton Depression Rating Scale-17 item version (HDRS), or subjects with a diagnosis of BD depressive episode scoring higher than 13 points on the Young Mania Rating Scale (YMRS) [70]. We excluded subjects with a first depressive episode, an episode with current psychotic features, a history of rapid cycling (≥4 cycles during a year), a history of mixed episodes, current psychiatric comorbidity on Axis I, serious unstable medical illness or neurologic disorder (e.g., epilepsy, head trauma with loss of consciousness), alcohol or substance abuse within the 6 months preceding the study, and patients treated by electroconvulsive therapy within 3 months before their participation in the study. We also excluded subjects with fewer than four psychiatric admissions; this criterion was set to ensure the longitudinal reliability of the diagnosis. All patients were evaluated by four experienced psychiatrists with a clinical expertise of at least 5 years. Inter-rater reliability was not evaluated. As a note of caution, none of the patients were under antidepressant medication at the time of the EEG recording.
This is because QEEG recording is a routine procedure conducted before the planning of treatment for all patients who apply to Neuropsychiatry Istanbul Hospital. However, BD subjects were receiving a mood stabilizer or a combination of mood stabilizers (25% lithium, 57.5% sodium valproate, 17.5% quetiapine and olanzapine). Routine laboratory studies (complete blood count, chemistry, thyroid stimulating hormone), a urine toxicology screen, and an electrocardiogram were performed at study screening, and subjects were required to be medically stable before enrollment in the study.

2.2. EEG recordings

For all patients, EEGs were recorded for five minutes in an eyes-closed resting state condition. Patients were instructed to avoid medication for 12 h before the EEG recording took place. In order to observe and reveal the efficacy of coherence, quantitative EEG (QEEG) data were collected from the 101 subjects, who were seated in a sound-attenuated, electrically shielded room in a reclining chair with eyes closed (wakeful resting condition). The technicians monitored the QEEG data during the recording and re-alerted the subjects every minute as needed to avoid drowsiness. Electrodes were placed using an electrode cap with 19 recording electrodes distributed across the head according to the international 10–20 system. Three minutes of eyes-closed resting EEG were acquired using a Scan LT EEG amplifier and electrode cap (Compumedics/Neuroscan, USA) with a sampling rate of 250 Hz, with 19 sintered Ag/AgCl electrodes positioned according to the 10/20 International System with binaural reference. For each individual, intra-hemispheric coherence was measured across electrode pairs F3–C3, F3–P3, F3–T5, C3–P3, C3–T5, P3–T5 on the left hemisphere, and F4–C4, F4–P4, F4–T6, C4–P4, C4–T6, P4–T6 on the right hemisphere. Inter-hemispheric coherence was measured across electrode pairs F3–F4, C3–C4, P3–P4, and T7–T8.
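As an illustration, the coherence measurement described above can be sketched with SciPy. This is a minimal sketch, not the authors' pipeline (they used Neuroguide and Matlab): only the 250 Hz sampling rate and, from Section 2.3, the 650 ms Hanning windows with 50% overlap and Fisher's Z normalization come from the text; the synthetic signals and band limits are illustrative assumptions.

```python
# Minimal coherence sketch; synthetic signals stand in for one electrode pair.
import numpy as np
from scipy import signal

fs = 250                     # sampling rate (Hz), as in Section 2.2
nperseg = int(0.650 * fs)    # 650 ms windows (Section 2.3)
noverlap = nperseg // 2      # 50% overlap, Hanning window

rng = np.random.default_rng(0)
x = rng.standard_normal(fs * 10)                       # e.g. channel F3
y = 0.6 * x + 0.4 * rng.standard_normal(fs * 10)       # e.g. channel C3, correlated

f, cxy = signal.coherence(x, y, fs=fs, window="hann",
                          nperseg=nperseg, noverlap=noverlap)

# Average magnitude-squared coherence within an assumed theta band (4-8 Hz)
theta = cxy[(f >= 4) & (f < 8)].mean()

# One common convention for Fisher's Z normalization of coherence
z = np.arctanh(np.sqrt(theta))
```

The design choice of 50% overlapping Hanning windows matches Welch-style spectral estimation, which is what `scipy.signal.coherence` implements internally.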
The raw EEG signal was filtered through a band-pass filter (0.15–30 Hz) before artifact elimination, and EEG segments with obvious eye or head movements and muscle artifacts were manually


removed. The data analysis of the EEG was accomplished with the Neuroguide Deluxe 2.5.1 software (Applied Neuroscience, St. Petersburg, FL).

2.3. QEEG coherence biomarker

Classical EEG spectral analysis was implemented using magnitude-squared coherence, a function of the frequency f based on the Fourier transform. Coherence is defined as the normalized power spectrum per frequency of two signals recorded simultaneously at different sites on the scalp. The magnitude-squared coherence C_{xy}(f) is calculated for every pair of electrodes as the square of the modulus of the mean cross power spectral density (PSD) normalized to the product of the mean auto PSDs. The coherence value for an electrode pair with waveforms x and y is calculated as

C_{xy}(f) = \frac{|P_{xy}(f)|^{2}}{|P_{xx}(f)|\,|P_{yy}(f)|}    (1)

where P_{xy}(f) is the cross PSD estimate of x and y, and P_{xx}(f) and P_{yy}(f) are the PSD estimates of x and y, respectively. The power spectrum (periodogram) and cross-power spectrum are defined as

P_{xx}(f) := |\hat{x}(f)|^{2} = \hat{x}(f)\,\hat{x}^{*}(f),    (2)

P_{xy}(f) := \hat{y}^{*}(f)\,\hat{x}(f)    (3)

where \hat{x}^{*} is the complex conjugate of \hat{x} and

\hat{x}(f) := \int_{-\infty}^{\infty} x(t)\, e^{-i\omega t}\, dt    (4)

is the Fourier transform that generates the information about the frequencies occurring in the signals and the dominant frequency of those signals. During the calculation process, the raw EEG signal was divided into periods of 650 ms with a 50% overlap, and each period was windowed with a Hanning window. Matlab was then employed for the coherence analysis, using 10–14 artifact-free epochs for each subject. The scan numbers were set to be the same between the target and non-target stimulation conditions. Coherence values were calculated for the target and non-target stimuli for long-range intra-hemispheric and inter-hemispheric pairs for the delta, theta and alpha frequency bands. Finally, the distribution of coherence values was normalized using Fisher's Z transformation [68].

2.4. Feature selection

The FS process is a commonly used technique for decreasing the dimensionality of the data and increasing the efficiency of the learning algorithm. For the FS process, the whole search space covers all possible subsets of features, and the number of subsets is calculated as given in Eq. (5):

\sum_{s=0}^{n} \binom{n}{s} = \binom{n}{0} + \binom{n}{1} + \dots + \binom{n}{n} = 2^{n}    (5)

where n represents the number of features and s is the size of the current feature subset [71]. FS methods usually require heuristic or random search strategies causing high complexity, so the degree of optimality of the final subset is generally reduced [15]. FS methods can be grouped into three major classes based on their evaluation procedure [72]. If an algorithm performs FS independently of a learning algorithm, it is called a filter approach and mostly involves selecting feature subsets based on the inter-class separability principle. Due to its computational efficiency, the filter approach is popular for high-dimensional data. If the evaluation step is processed with a classification algorithm, the

FS algorithm is called a wrapper approach. In the wrapper approach, selected features are fed into a preset learning model to measure the performance of the subset. Compared to filters, in wrappers the predictive performance of the final selected subset is correlated with the chosen relevance measure, while dealing with a large dataset may increase the complexity due to the use of learning algorithms in the evaluation of feature subsets [73]. Finally, in the embedded approach the FS and learning algorithm are interleaved similarly to wrapper methods and the link between the FS and the classifier is stronger; nevertheless, the wrapper methods have a better coverage of the search space [6]. Wrappers are constituted by three components: a learning machine, a feature evaluation criterion, and a FS method, as shown in Fig. 1. Because a large number of features may cause high complexity, meta-heuristic search methods are prominent due to their flexibility in random searching to contribute to the FS process. Recent studies underline nature-inspired methods such as particle swarm optimization (PSO), genetic algorithm (GA)-based attribute reduction and the gravitational search algorithm (GSA). Besides these methods, attempting to achieve better solutions by applying knowledge from previous iterations, ACO is another auspicious approach to solve combinatorial optimization problems and has been widely employed in FS [29].

2.5. Improved ant colony optimization algorithm for feature selection

ACO is a stochastic algorithm that mimics real ant colonies to construct a solution by a sequence of probabilistic decisions. The probability matrix is initialized randomly to enable variety for each ant and is expanded by adding a solution component, pheromone, after each probabilistic decision step. The sequence of decisions taken by each ant during the search forms a pheromone trail, and the density of the pheromone on a path reflects the quality of the solutions found along it [74,75].
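The wrapper evaluation described in Section 2.4 — a candidate feature subset scored by the classifier itself — can be sketched as follows. This is a minimal sketch: the synthetic data, the particular candidate subsets and the CV settings are illustrative assumptions, not the authors' pipeline; only the dataset shape (101 subjects, 48 features) and the use of an SVM come from the text.

```python
# Wrapper-style subset evaluation: the learning machine (SVM) scores subsets.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the 101-subject, 48-feature coherence dataset.
X, y = make_classification(n_samples=101, n_features=48, n_informative=10,
                           random_state=0)

def evaluate_subset(subset):
    """Wrapper criterion: mean CV accuracy of the classifier on the subset."""
    return cross_val_score(SVC(kernel="rbf"), X[:, subset], y, cv=5).mean()

acc_all = evaluate_subset(list(range(48)))        # all features
acc_half = evaluate_subset(list(range(0, 48, 2))) # one candidate subset
```

A search method (here, the IACO of Section 2.5) would propose subsets and use `evaluate_subset` as the feedback signal.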
At the end of each iteration, ants deposit pheromone on the paths they have visited, and the pheromone density varies depending on the solution performance. The pheromone density on the trail evaporates with time so that newcomer ants can find alternative paths. Besides, at the end of each tour the pheromone density on the paths generated by the best ant and the worst ant is also updated, following an award-and-penalty strategy, in order to strengthen the search around the region near the optimal solution. The iterative process continues until the stopping criterion is reached. The stopping criterion may be either a number of iterations or a solution of desired quality [15]. ACO algorithms make probabilistic decisions in terms of the artificial pheromone trails and the local heuristic information expressing the desirability of the next node. These two factors are combined to form the so-called probabilistic transition rule, expressed as given in Eq. (6):

p_{u}^{k} = \begin{cases} \dfrac{\tau_{u}^{\sigma} \cdot \eta_{u}^{\upsilon}}{\sum_{c_{u} \in N(s^{p})} \tau_{u}^{\sigma} \cdot \eta_{u}^{\upsilon}} & \text{if } c_{u} \in N(s^{p}) \\ 0 & \text{otherwise} \end{cases}    (6)

where \tau_{u} is the pheromone level of the edge from any feature to feature u, indicating how informative the feature is, and \eta_{u} is the heuristic desirability of choosing feature u, reflecting the desirability of choosing the next edge. For an ant k, N(s^{p}) represents the set of all possible features to be connected from the current feature

Fig. 1. Wrapper approach procedure.

and C_{u} represents the selected feature for each ant. With the use of the p_{u}^{k} probability function, the probabilities of all possible next weighted features are calculated, where a high probability means a more attractive alternative for the ant. \sigma and \upsilon are constants that trade off the relative importance of the pheromone \tau_{u} and the heuristic information \eta_{u}. If \upsilon = 0, the search process will only utilize the pheromone value, which causes the rapid emergence of a stagnation situation, and the ants will not be able to find any good solution at all [76]. At the end of each iteration, the pheromone values associated with the edges joining features are updated using Eq. (7). This step of the algorithm is called the local pheromone update process:

\tau_{u} \leftarrow (1 - \rho) \cdot \tau_{u} + \sum_{k=1}^{m} \Delta\tau_{u}^{k}    (7)

where \rho is the pheromone trail decay coefficient, m is the number of ants and \Delta\tau_{u}^{k} is the pheromone laid on the edge of feature u by ant k, where

\Delta\tau_{u}^{k} = \begin{cases} \delta \cdot J_{k} & \text{if ant } k \text{ used feature } u \text{ in its tour} \\ 0 & \text{otherwise} \end{cases}    (8)

where \delta is a constant and J_{k} is the fitness value of the feature set selected by ant k [77]. By using this rule, the pheromone level of the features with the highest fitness values will increase frequently, which will make those sets more inclined to be selected during subsequent iterations by the ants. At the end of each tour, following the local pheromone update process, the global pheromone update process starts. The pheromones of the paths belonging to the best and worst ants of the tour are updated as given in Eqs. (9) and (10), respectively:

\tau_{u}^{best} = \tau_{u}^{best} + \theta \cdot J_{best}    (9)

\tau_{u}^{worst} = \tau_{u}^{best} - 0.1 \cdot \theta \cdot J_{worst}    (10)

where \tau_{u}^{best} and \tau_{u}^{worst} are the pheromones of the paths followed by the ants in the tour with the highest (J_{best}) and lowest (J_{worst}) fitness values in one iteration, respectively. With the contribution of the pheromone evaporation rate (\lambda) given in Eq. (11), the pheromone density of the visited paths is reduced to prevent the features with higher pheromone from always being chosen, so that the ants can explore features which have never been chosen:

\tau_{u} = \lambda \tau_{u} + [\tau_{u}^{best} + \tau_{u}^{worst}]    (11)
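The local and global pheromone updates of Eqs. (7)–(11) can be sketched as follows. This is a minimal sketch: the constants (`rho`, `delta`, `theta_c`, `lam`) and the stand-in fitness values are illustrative assumptions; only the update structure follows the equations above.

```python
# Sketch of one iteration of the pheromone bookkeeping in Eqs. (7)-(11).
import numpy as np

n_features, n_ants = 48, 10
rho, delta, theta_c, lam = 0.1, 1.0, 0.5, 0.9   # assumed constants
tau = np.ones(n_features)                        # pheromone table

rng = np.random.default_rng(1)
tours = [rng.choice(n_features, 22, replace=False) for _ in range(n_ants)]
fitness = rng.uniform(0.5, 0.9, n_ants)          # stand-in fitness values J_k

# Local update, Eqs. (7)-(8): decay, then deposit delta * J_k on used edges
tau *= (1.0 - rho)
for tour, J in zip(tours, fitness):
    tau[tour] += delta * J

# Global update, Eqs. (9)-(10): reward the best tour, penalize the worst
best, worst = int(np.argmax(fitness)), int(np.argmin(fitness))
tau[tours[best]] += theta_c * fitness[best]
tau[tours[worst]] -= 0.1 * theta_c * fitness[worst]

# Evaporation, Eq. (11): reduce visited-path density so new paths stay viable
tau *= lam
```

The award-and-penalty asymmetry (full reward for the best tour, a tenth-scale penalty for the worst) mirrors the 0.1 factor in Eq. (10).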

In ACO-based FS, the ants search the feature space to construct each candidate subset using the probabilistic transition rule. Since there is no certain mechanism to predetermine invalid subsets, it is possible for ACO to consider any combination of the features as a candidate, so many invalid candidates can be produced, especially in the initial phase. These invalid candidate feature subsets can then impede the algorithm from converging to the optimal solution and therefore reduce the performance of the algorithm. One of the other severe weaknesses of the conventional ACO algorithm is stagnation. Stagnation is experienced when the ants generate the same solution without any improvement, causing the algorithm to be trapped in a local optimal solution [78]. In order to address these problems of conventional ACO algorithms, an improvement technique is added to the ACO method so that the diversity of the ants' solution set is assured. Recent studies also focus on the noteworthy results of improved versions of optimization algorithms [79–81]. In classical ACO, the parameters \sigma and \upsilon in Eq. (6) are static, and all ants use the same values throughout the FS process. Thus the relative importance of each is constant during the FS process for each ant, disregarding the phase of ACO. At the beginning of the FS process the ants have little information about alternative paths,

therefore the heuristic distance has a stronger impact on routing than the pheromone. On the other hand, after the algorithm has run for a long time, the impact of the pheromone should be reinforced, because more information about better paths is stored in the pheromone. A dynamic process therefore replaced the static routing bias parameters of standard ACO in order to converge to the solution quickly and robustly. Because the pheromone calculation process given in Eqs. (7)–(10) is employed to evaluate the performance of the current ant, and the pheromone update process, as given in Eq. (11), is a static process used to update the pheromone table after each visit, they are all independent of the bias parameters. In the experiments, we used an experimental rule for adjusting the parameters, under which the algorithm finds the best solution faster on average [82]. Dorigo [83] recommended that \sigma = 1 and \upsilon = 5 are reasonable for many cases. So, we set our parameters as 1 and 5 respectively at the very beginning of the FS process. The values are updated at the end of each tour according to the accuracy values of the overall ants in the tour. If a shorter path is detected at the end of the tour, the \sigma parameter is increased by 0.25 and the value of \upsilon is decreased by 0.25 in order to increase the relative importance of the pheromone over the distance. The process is repeated until the values are 5 and 1, respectively. With IACO, the effectiveness of the improved ACO was underlined in terms of time complexity. The designed IACO algorithm is as follows:

Initialize ACO
repeat
    set heuristic parameters σ and υ
    for ant k ∈ {1, 2, …, m}
        choose a feature set using the probabilistic rule
        calculate fitness value
        local pheromone update
    end-for
    evaluate solutions generated by all ants in the tour
    identify the best and the worst of the tour
    global pheromone update
    decrease pheromone density by evaporation rate
    if a shorter path is detected at the end of the tour
        update heuristic parameters σ and υ
    end-if
until stopping criterion is reached
Select the features with the highest pheromone values
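The loop above, including the dynamic sigma/upsilon schedule (starting at 1 and 5 and stepping by 0.25 toward 5 and 1 on improvement), can be sketched as runnable code. The toy fitness function, the simplified pheromone update and the heuristic values `eta` are illustrative assumptions standing in for the SVM-based fitness of the paper; only the loop structure and the parameter schedule follow the text.

```python
# Runnable sketch of the IACO loop with the dynamic sigma/upsilon schedule.
import numpy as np

rng = np.random.default_rng(2)
n_features, n_ants, n_iters = 48, 10, 30
tau = np.ones(n_features)                   # pheromone levels
eta = rng.uniform(0.5, 1.0, n_features)     # heuristic desirability (assumed)
sigma, upsilon = 1.0, 5.0                   # initial bias parameters [83]
best_fit, best_subset = -np.inf, None

def fitness(subset):
    """Toy stand-in for the SVM-based fitness: favors good features, small sets."""
    return eta[subset].mean() - 0.005 * len(subset)

for _ in range(n_iters):
    # Transition rule of Eq. (6), normalized into a sampling distribution
    p = (tau ** sigma) * (eta ** upsilon)
    p /= p.sum()
    improved = False
    for _ in range(n_ants):
        subset = rng.choice(n_features, size=22, replace=False, p=p)
        f = fitness(subset)
        tau[subset] += 0.1 * f              # simplified local pheromone update
        if f > best_fit:
            best_fit, best_subset, improved = f, subset, True
    tau *= 0.9                              # evaporation
    if improved and sigma < 5.0:            # dynamic schedule: shift weight
        sigma, upsilon = sigma + 0.25, upsilon - 0.25   # toward pheromone
```

The schedule shifts routing influence from the heuristic term to the pheromone term as the colony accumulates information, which is the rationale given in the text.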

2.7. Support vector machines SVM is a widely used method for classification and regression problems mapping the training samples from the input space into a higher dimensional feature space. Because SVMs are based on linear or nonlinear RBF kernels, they are preferred to improve correlation of data with nonlinear nature. SVMs map raw data into a high-dimensional feature space using a nonlinear mapping function ðφÞ first, and construct the optimal separating hyperplane just on the base of support vectors to do linear regression in this space. The basic classification principle could be depicted as shown in Fig. 2.     With a training set given as S ¼ xi ; yi j xi A H; yi A þ 1 ; i ¼ 1; 2; …lg; where xi are the input vectors and yi the labels of xi ; the target function is calculated as 8 l X > > < min φðW Þ ¼ 12W:W þ C δi ð13Þ i¼1 > > : subject to y ðW:φðx ÞÞ þ b Z1  δ ; δ Z 0 i ¼ 1; 2; …l i

i

i

i

where W represents the hyperplane normal vector, C is a penalty coefficient, which controls the trade-off between maximization of the margin width and minimizing the number of misclassified samples in the training set is set as 10. δi is another hyperparameter and controls the width of kernel is set as 0.2. Finally, optimal hyperplane is transformed into the following quadratic equation: 8 l X > 1X > > max L ð α Þ ¼ αi  α α y y Kðxi xj Þ > > 2 ij i i j j < i¼1 ð14Þ l > X > > > subject to αi yi ¼ 0; 0 r αi r C; i ¼ 1; 2; …l: > : i¼1

And the output function could be expressed as " # l X f ðxÞ ¼ sign yi αi K ðxi :xÞ þ b

ð15Þ

i¼1

Depending on the data, various kernel functions could be used in decision function. Linear, kernel and radial basis function (RBF) are mostly used kernel functions and the functions are given in Eq. 16 respectively [86]: K ðx; xi Þ ¼ 〈x:xi 〉; K ðx; xi Þ ¼ ð〈x:xi 〉 þ cÞd K ðx; xi Þ ¼ expð  〈x  xi 2 〉=2σ 2 Þ

2.6. Fitness function A fitness function is used to put forth the degree of goodness of selected subset. For a classification problem, if two subsets with different number of features present quite similar performance, the subset with less number of features comes into prominence. Therefore, the evaluation of fitness function regards two concerns: the classification accuracy and the number of features in the subset. In order to satisfy these concerns, the fitness function is designed in terms of both accuracy and number of features as     f xj ¼ mnJ X j þ nnð1=j Xj j Þ ð12Þ where X j is the subset constituted by jth ant, JðX j Þ is the classification accuracy using X j , |Xj | is the number of features of X j , m A [0,1] and n A [0,1] are the two coefficients used assign relative importance to classification accuracy and number of the selected subset parameters [84]. In our study, because the classification accuracy is relatively important compared to the number of features we set m as 0.92 and n as 0.78. Substituting the coefficients in the equation, the fitness values are calculated as given in Table 3.


In this study three kernel types were evaluated, and the RBF kernel was selected for its comparatively better performance, as given in Table 1.

2.8. The proposed IACO–SVM model

This study adopts the IACO approach to present a novel IACO–SVM model for the parameter optimization problem of SVM. In order to

Fig. 2. SVM constructs a separating hyperplane to maximize the margin between classes. The samples on the dashed lines are called support vectors. New instances are classified according to the side of the hyperplane on which they fall [85].



classify validation samples or unknown data correctly, a training set $S = \{(x_i, y_i)\mid x_i \in H,\ y_i \in \{+1, -1\},\ i = 1, 2, \ldots, l\}$ and a decision function $f$ are employed to map the input vectors $x$ onto the outputs $y \in \{-1, 1\}$. In the proposed model, the inputs are the feature subset selected by IACO, while the output is the psychiatric disorder type. To this end, all features computed from the alpha, delta and theta coherence values are fed into the FS step, and the more informative ones are selected through the meta-heuristic process of the artificial ants. The performance of the selected feature subset is then evaluated by the fitness function over the SVM classifier. According to the fitness value, the pheromone update process is initiated and the heuristic parameters are subsequently modified. The modeling process loops until the stopping criterion is satisfied. At the end of each modeling process, the performance of the classifier is tested using external test data that are new to the model. Through the nested-CV process, the model with the highest performance is selected. The overall process of the proposed method is illustrated in Fig. 3.

As shown in Fig. 3, nested-CV consists of two nested cross-validations, an outer and an inner one. First, the dataset is split into six parts (6-fold outer-CV) using stratified sampling. While one fold is reserved as the outer-CV test set, the remaining five folds are held out for the training process in the inner-CV cycle. In the inner-CV cycle, a 5-fold CV process is applied and five models are generated.

Table 1
Classification performance of the SVM classifier using three different kernel types.

Kernel function type    Classification accuracy (%)
Linear kernel           56.43
Polynomial kernel       58.41
RBF kernel              62.37

The models

with the selected feature subsets are then sorted according to their classification accuracies. In order to eliminate leakage between test and training data, the best of those five models is tested on the reserved outer-CV test fold, which is completely unfamiliar to the models. Following the inner-CV loop, the reserved outer-CV test fold is swapped with one of the five training folds, so that the models generated by the inner-CV process are tested on completely new test data in each outer-CV cycle. In the inner CV, features are selected with IACO using the inner-CV training data; with the IACO method, four folds are used for training and one for validation. As the data were split into six parts in the outer CV, the aforementioned steps are repeated five more times to determine the locally best model for each loop. Finally, to discover the globally best of the six locally best models with their optimized feature subsets, the array is sorted according to classification accuracy.

2.9. Complexity analysis

2.9.1. Time complexity
In this study, a hybrid approach combining the IACO and SVM methods was employed in order to generate a model using fewer but more informative features. The time complexity of the proposed model can be computed from the parameters given in Eq. (17):

$$N_{iterations} \times N_{ants} \times \left(T_{feature\,selection} + T_{SVM\,training} + T_{pheromone\,updating}\right) \qquad (17)$$

where $N_{iterations}$ is the number of iterations and $N_{ants}$ is the number of ants in each iteration. $T_{feature\,selection}$ is the runtime for an ant to generate a feature subset, $T_{SVM\,training}$ is the runtime for an ant to train the SVM classifier using the selected features, which scales cubically ($O(n^3)$) with the size $n$ of the training set, and finally $T_{pheromone\,updating}$ is the runtime to update the pheromone table after a model is generated [87]. In order to reduce the complexity, the high-

Fig. 3. Schematic representation of the proposed IACO–SVM approach with nested-CV.


dimensional data should be transformed into a low-dimensional form that retains the essential information in the data. Transforming the data into a more condensed form not only improves the classification accuracy but also reduces the computational complexity. In this study, dimensionality reduction was realized as FS using IACO, which reduced the number of variables and the SVM training runtime while improving the overall performance of the system. Besides, the application of the IACO version reduced the number of iterations ($N_{iterations}$) from 48 to 33 while increasing the fitness value, as shown in Fig. 4. The tests were performed on a computer running Windows 7 Professional, with an Intel(R) Core(TM) i5-3470 CPU @ 3.20 GHz, 4.00 GB of physical memory and a 500 GB Seagate disk drive. As seen in Fig. 4, the fitness value calculation given in Eq. (12) is repeated in every iteration, and the performance of the algorithms improves as the number of iterations increases. In order to investigate the convergence performance of both the ACO and the IACO approach, the algorithms were executed 10 times. The mean of the minimum fitness values of those 10 sessions is plotted against the number of iterations in Fig. 4. It can be deduced that the proposed IACO approach converges faster and more accurately than the standard ACO algorithm. Besides, 20 trials were performed to compare the average running times of standard ACO, PSO, GA and IACO; the results are given in Table 2.
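The ant-based search loop sketched above (ants propose feature subsets, the best subset is reinforced in the pheromone table, and pheromone evaporates each iteration) can be illustrated with a toy example. The scoring function, evaporation rate `rho` and all constants below are illustrative stand-ins, not the paper's IACO heuristics.

```python
# Toy sketch of an ant-colony feature-selection loop: ants sample
# subsets with probability proportional to pheromone, the best subset
# is reinforced, and pheromone evaporates each iteration.
import numpy as np

rng = np.random.default_rng(1)
n_features, n_ants, n_iterations, rho = 8, 5, 20, 0.1
pheromone = np.ones(n_features)
informative = {0, 3}  # toy "informative" features rewarded by the score

def score(subset):
    # reward informative features, penalize subset size (cf. Eq. (12))
    return len(subset & informative) - 0.05 * len(subset)

best_subset, best_score = set(), float("-inf")
for _ in range(n_iterations):
    for _ in range(n_ants):
        p = pheromone / pheromone.sum()
        k = int(rng.integers(1, n_features + 1))
        subset = set(rng.choice(n_features, size=k, replace=False, p=p).tolist())
        s = score(subset)
        if s > best_score:
            best_subset, best_score = subset, s
    pheromone *= (1 - rho)                    # evaporation
    for f in best_subset:
        pheromone[f] += max(best_score, 0.0)  # reinforce the best subset
    pheromone = np.maximum(pheromone, 0.01)   # keep probabilities valid
```

In the paper's actual model, the score is the SVM-based fitness of Eq. (12) and the heuristic parameters are additionally adapted during the run; the loop structure, however, is the same.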

2.9.2. Space complexity
The space complexity of an algorithm is expressed in terms of the memory consumed throughout the overall process, and is calculated from a fixed and a variable amount of memory. The fixed amount is occupied by the variables used in the program, while the variable amount is occupied by components whose size depends on the iterations and recursive procedures. To determine the overall amount of memory used by the algorithm, the data space is expressed through the variables, data structures, allocated memory and other data components. If the scale of the problem is $p$ and the number of ants is $q$, the space complexity is as given in Eq. (18):

$$S(p) = O(4p^{2}) + O(2pq) \qquad (18)$$

For IACO, the scale of the problem is the same, and therefore its space complexity is the same as that of standard ACO.
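Before turning to the results, the nested-CV protocol of Section 2.8 can be sketched with standard tooling. The sketch below replaces the IACO feature-selection step with a plain hyperparameter grid search, so it illustrates only the 6-fold-outer / 5-fold-inner skeleton and the leakage avoidance it provides, not the paper's FS method; the synthetic data mimics the 101-subject, 48-feature setting.

```python
# Nested CV skeleton: stratified 6-fold outer loop for testing,
# stratified 5-fold inner loop for model selection. The inner search
# stands in for the IACO feature-selection step.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=101, n_features=48, random_state=0)

inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=6, shuffle=True, random_state=0)

# Inner loop: pick the best model on the five training folds only.
search = GridSearchCV(SVC(kernel="rbf"), {"C": [1, 10, 100]}, cv=inner_cv)

# Outer loop: each of the six test folds is unseen by the inner selection.
scores = cross_val_score(search, X, y, cv=outer_cv)
```

Because `cross_val_score` refits the whole `GridSearchCV` object inside each outer fold, the inner selection never sees the outer test fold, which mirrors the leakage argument made in Section 2.8.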


3. Experimental results

In this study, a series of experiments was conducted to show the effectiveness of the proposed FS algorithm. The classification of 46 BD and 55 MDD subjects was performed using SVM with the selected feature subset. The FS process was performed using the genetic algorithm (GA), particle swarm optimization (PSO), ACO and IACO. In order to overcome general drawbacks of ACO such as early stagnation [31] and low convergence speed [32,34,84], the heuristic parameters were adjusted dynamically for IACO during the optimization process. The initial feature set is composed of 12 intra-hemispheric and 4 inter-hemispheric QEEG coherence values from each of the alpha, delta and theta frequency bands. Nested-CV was performed to train and test the SVM classifier. Since ROC analysis is a significant discrimination tool illustrating how classifiers and threshold choices perform, we employed ROC and area under the ROC curve (AUC) analysis for each combined and standalone classifier model. The performance of each approach is plotted in Fig. 5. As seen from the figure, the proposed IACO–SVM approach outperforms the GA–SVM, PSO–SVM, ACO–SVM and standalone SVM classifiers. The classifiers with the aforementioned FS methods are also compared in terms of number of features, classification accuracy, sensitivity and AUC in Table 3, to emphasize the contribution of the dynamic heuristics used in IACO. Although the standalone SVM classifier uses more features than the models with FS, its accuracy, sensitivity and AUC values are not satisfactory. Adding FS methods to the classifier raised the classification accuracy to 73.26% and beyond with fewer features, and the overall performance of IACO indicates that the dynamic heuristic parameters improve the classification accuracy further. We therefore upgraded standard ACO, and a considerable increase was observed in both the overall classification accuracy and the AUC value, emphasizing the importance of assigning relevant features to the model. The selected features yielding this classification performance are given in Table 4, and their visualization is shown in Fig. 6.
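The ROC/AUC evaluation used to compare the classifiers can be sketched as follows; the labels and decision scores are synthetic stand-ins for the classifiers' outputs, so only the evaluation recipe, not the reported numbers, is illustrated.

```python
# Sketch of the ROC/AUC analysis applied to each classifier's scores.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
# A classifier whose decision scores correlate with the labels, plus noise:
y_score = y_true + rng.normal(scale=0.8, size=200)

# The ROC curve sweeps the decision threshold; the AUC summarizes it.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)
```

Plotting `tpr` against `fpr` for each model produces curves like those in Fig. 5, and the AUC values correspond to the AUC column of Table 3.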

Table 2
The average running times of the FS processes.

Algorithm       Average running time
Standard ACO    23 min 08 s
IACO            17 min 11 s
GA              21 min 13 s
PSO             19 min 38 s

Fig. 4. Fitness values of the ACOFS and IACOFS methods versus the number of iterations.



Fig. 5. ROC curves of SVM, ACO–SVM and IACO–SVM models for MDD and BD subjects' classification.

Table 3
Performance of the FS algorithms in terms of number of features, accuracy, sensitivity, AUC and fitness value.

FS method   Number of features   Accuracy (%)   Sensitivity   AUC     Fitness value
None        48                   62.37          0.636         0.590   0.631
PSO         25                   73.26          0.782         0.705   0.739
GA          24                   75.24          0.800         0.724   0.776
ACO         25                   78.21          0.836         0.750   0.779
IACO        22                   80.19          0.854         0.773   0.793

4. Discussions and conclusions

In this paper, an IACO–SVM algorithm is described in detail. The proposed framework consists of two consecutive steps, FS and classification. First, a swarm-intelligence-based FS method named IACO, utilizing both ACO and dynamic heuristic parameters, was employed to overcome the weaknesses of premature convergence and low search speed. Following the FS process, an SVM model was generated using the selected feature subset. Recent studies have focused on the classification performance of SVM using EEG signals [88–91], and the contribution of meta-heuristic FS methods to classification performance is promising [16–24]. Among these algorithms, improved versions of FS methods are being used in order to avoid the deficiencies of the classical approaches [92–94]. In our study, the QEEG coherence data of 46 BD and 55 MDD subjects were first fed into IACO. The more informative feature subset selected from 16 electrodes in the alpha, delta and theta frequency bands was then used as input to SVM. According to the classification performance, the process is repeated and both the heuristic parameters and the pheromone values are updated until the stopping criterion is satisfied. The noteworthy performance of the IACO–SVM approach shows that it is possible to discriminate the 46 BD and 55 MDD subjects using 22 of 48 features with 80.19% overall classification accuracy. The performance of the IACO algorithm was also compared to that of standard ACO in terms of computational complexity and classification accuracy. The results underline that the smaller number of features selected by IACO–SVM not only improves the accuracy of MDD–BD classification but also, importantly, reduces the computational cost. As a future direction, the performance of other improved swarm-intelligence-based optimization algorithms could be explored to find a more effective and applicable approach for FS.
This research contributes to the literature on combinatorial optimization problems and, from a practical perspective, the proposed algorithm could be used in

engineering applications and medical decision-making processes.

Regarding the clinical translation of the selected features in the IACO–SVM model, only a handful of studies have investigated differences in cerebral connectivity between MDD and BD.
Concerning QEEG coherence, a recent study by our research group found a lack of frontal inter-hemispheric alpha connectivity in BD patients as compared with MDD using univariate methods [45]. In another study, BD patients showed greater alpha activity in the bilateral temporo-parietal regions as compared with MDD [95]. Moreover, a similar study underlined that a lack of inter-hemispheric synchronization in the slow-wave frequency bands may be a unique feature distinguishing BD from MDD [96]. Different from the former studies, we included the delta frequency band in the analyses. The reason for this was, first, that the delta frequency band is related to cognitive functioning, which is more impaired in patients with bipolar disorder than in major depression [98]. Second, specific alterations in the delta band have been demonstrated during the sleep of depressed patients, and thus the delta band may serve as a control band, as our recordings were collected during an eyes-closed non-



Table 4
Selected subset features using IACO.

Delta frequency band:  f3–f4, p3–t5, p4–t6, f3–p3
Theta frequency band:  f3–c3, f3–t5, p4–t6, c4–p4, c4–t6, c3–c4, t5–t6, c3–p3
Alpha frequency band:  c4–p4, f3–p3, f3–t5, c3–p3, f4–c4, f4–t6, c4–t6, p4–t6, c3–c4, p3–p4
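Tallying the subset in Table 4 confirms the counts behind the headline result: 4 delta, 8 theta and 10 alpha coherence pairs, 22 features in all, matching the "22 of 48" reported in the text.

```python
# Feature counts per band, transcribed from Table 4.
delta = ["f3-f4", "p3-t5", "p4-t6", "f3-p3"]
theta = ["f3-c3", "f3-t5", "p4-t6", "c4-p4", "c4-t6", "c3-c4", "t5-t6", "c3-p3"]
alpha = ["c4-p4", "f3-p3", "f3-t5", "c3-p3", "f4-c4",
         "f4-t6", "c4-t6", "p4-t6", "c3-c4", "p3-p4"]

counts = {"delta": len(delta), "theta": len(theta), "alpha": len(alpha)}
total = sum(counts.values())  # 22 selected features out of 48
```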

Fig. 6. The visualization of the selected subset features using IACO.

sleep condition [99]. This may be why we found fewer contributing features from the delta band in the current study. Besides, we found that the alpha band yielded more features than the beta and delta bands. The alpha frequency band is the band most commonly found to be altered in EEG studies of depressive patients. In brief, studies have found an abnormal frontal alpha asymmetry that disrupts the behavioral approach–avoidance tendencies paving the ground for depression [100]. Lastly, differences in state anxiety could reflect increased activity in the beta frequency band; however, this may vary among the patients in the current study [101], which may be why we found fewer features in beta as compared with the alpha frequency band. Taken as a whole, the features selected in this study are generally complementary to the studies in which differences between MDD and BD were found at the group level [45,95,96]. Nonetheless, the ultimate goal of examining differences among psychiatric disorders is to identify biological markers that are specific to each disorder. Notably, the use of univariate statistics avoids making inferences at the individual level and is thus less informative for clinical use. SVM-based approaches become crucial in this respect, as they permit individual-level inferences in addition to providing specificity and sensitivity values for the selected features.

Conflict of interest statement We certify that there is no conflict of interest with any financial organization regarding the material discussed in the manuscript.

Acknowledgments

The authors would like to express their thanks to NPIstanbul Hospital for providing the required EEG data.

References

[1] D.D. Meisel, Fourier transforms of data sampled in unequally spaced segments, Astron. J. 84 (1979) 116–126.

[2] P. Bajcsy, An overview of DNA microarray grid alignment and foreground separation approaches, EURASIP J. Adv. Signal Process. (2006) 1–13. [3] E. Andreou, E. Ghysels, A. Kourtellos, Oxford Handbook on Economic Forecasting–Forecasting with Mixed-frequency Data, Oxford University Press, Oxford, USA, 2010. [4] W.H. Press, G. Rybicki, Annotation: what can be done about missing data? Astrophys. J. 338 (1989) 277–280. [5] N.I. Kalyadin, V.A. Lemenkov, I.R. Losev, et al., Problems of medical monitoring of patients and the requirements for development of computer monitoring systems, Biomed. Eng. 30 (2) (1996) 81–85. [6] M.V. Susana, F.M. Mendonca, J.F. Goncalo, et al., Modified binary PSO for feature selection using SVM applied to mortality prediction of septic patients, Appl. Soft Comput. 13 (2003) 3494–3504. [7] S.F. Yuan, F.L. Chu, Fault diagnosis based on support vector machines with parameter optimization by artificial immunization algorithm, Mech Syst. Signal Process. 21 (2007) 1318–1330. [8] X.L. Zhang, X.F. Chen, Z.J. He, An ACO-based algorithm for parameter optimization of support vector machines, Expert Syst. Appl. 37 (2007) 6618–6628. [9] F.F. Chen, B.P. Tang, R.X. Chen, A novel fault diagnosis model for gearbox based on wavelet support vector machine with immune genetic algorithm, Measurement 46 (2013) 220–232. [10] J. Huang, X.G. Hu, F. Yang, Support vector machine with genetic algorithm for machinery fault diagnosis of high voltage circuit breaker, Measurement 44 (2011) 1018–1027. [11] T. Pahikkala, S. Okser, A. Airola, T. Salakoski, T. Aittokallio, Wrapper-based selection of genetic features in genome-wide association studies through fast matrix operations, Algorithms Mol. Biol. 7 (2012) 11. http://dx.doi.org/ 10.1186/1748-7188-7-11. [12] E. Thomas, M. Dyson, M. Clerc, An analysis of performance evaluation for motor-imagery based BCI, J. Neural Eng. 10 (2013) 031001. [13] D. Dai, J. Wang, J. Hua, H. 
He, Classification of ADHD children through multimodal magnetic resonance imaging, Front. Syst. Neurosci. 6 (2012) 63. http://dx.doi.org/10.3389/fnsys.2012.00063. [14] G.C. Cawley, N.L. Talbot, On over-fitting in model selection and subsequent selection bias in performance evaluation, J. Mach. Learn. Res. 11 (2010) 2079–2107. [15] M.H. Aghdam, G.A. Nasser, M.E. Basiri, Text feature selection using ant colony optimization, Expert Syst. Appl. 36 (2009) 6843–6853. [16] J.H. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, MI, 1975. [17] R. Storn, Differential evolution design of an IIR-filter, in: Proceedings of IEEE International Conference on Evolutionary Computation, Nagoya, 1996, pp. 268–273. [18] J. Kennedy, R.C. Eberhart, Particle swarm optimization, in: Proceedings of IEEE International Conference on Neural Networks, 1995, pp. 1942–1948. [19] M. Dorigo, T. Stützle, Ant Colony Optimization, MIT Press, London, 2004. [20] S. Kirkpatrick, C. Gelatt, M. Vecchi, Optimization by simulated annealing, Science 220 (1983) 671–680.



[21] M.H. Mashinchi, A.O. Mehmet, P. Witold, Hybrid optimization with improved Tabu search, Appl. Soft Comput. 11 (2011) 1993–2006. [22] A.Y. Qing, Dynamic differential evolution strategy and applications in electromagnetic inverses catering problems, IEEE Trans. Geosci. Remote 44 (1) (2006) 116–125. [23] P.S. Shelokar, P. Siarry, V.K. Jayaraman, et al., Particle swarm and ant colony algorithms hybridized for improved continuous optimization, Appl. Math. Comput. 188 (2007) 129–142. [24] L. Chen, J. Shen, L. Qin, et al., An improved ant colony algorithm in continuous optimization, J. Syst. Sci. Syst. Eng. 12 (2) (2003) 224–235. [25] R.M. Rizk-Allah, M.Z. Elsayed, A.A. El-Sawy, Hybridizing ant colony optimization with firefly algorithm for unconstrained optimization problems, Appl. Math. Comput. 224 (2013) 473–483. [26] A. Al-Ani, Feature subset selection using ant colony optimization, Int. J. Comput. Intell. Syst. 2 (1) (2006) 53–58. [27] H. Huang, X. Hong-Bo, G. Jing-Yi, Ant colony optimization-based feature selection method for surface electromyography signals classification, Comput. Biol. Med. 42 (2012) 30–38. [28] K. Monirul, M.D. Shahjahan, A new hybrid ant colony optimization algorithm for feature selection, Expert Syst. Appl. 39 (2012) 3747–3763. [29] K. Shima, N. Hossein, An advanced ACO algorithm for feature subset selection, Neurocomputing 147 (2015) 271–279. [30] M.M. Janaki, K.R. Chandran, A. Karthik, A.S. Vijay, An enhanced ACO algorithm to select features for text categorization and its parallelization, Expert Syst. Appl. 39 (2012) 5861–5871. [31] R. Jovanovic, T. Milan, An ant colony optimization algorithm with improved pheromone correction strategy for the minimum weight vertex cover problem, Appl. Soft Comput. 11 (2011) 5360–5366. [32] D. Qiulei, H. Xiangpei, S. Lijun, W. Yunzeng, An improved ant colony optimization and its application to vehicle routing problem with time windows, Neurocomputing 98 (2012) 101–107. [33] B. Yu, Y. Zhong-Zhen, Y. 
Baozhen, An improved ant colony optimization for vehicle routing problem, Eur. J. Oper. Res. 196 (2009) 171–176. [34] K. Watcharasitthiwat, P. Wardkein, Reliability optimization of topology communication network design using an improved ant colony optimization, Comput. Electr. Eng. 35 (2009) 730–747. [35] D. Zhao, L. Luo, K. Zhang, An improved ant colony optimization for the communication network routing problem, Math. Comput. Model. 52 (2010) 1976–1981. [36] K.P. Murphy, Machine Learning: A Probabilistic Perspective, The MIT Press, 2012. [37] G. Orrùa, W. Pettersson, F. Andre, et al., Using support vector machine to identify imaging biomarkers of neurological and psychiatric disease: a critical review, Neurosci. Biobehav. Rev. 36 (2012) 1140–1152. [38] V.N. Vapink, An overview of statistical learning theory, IEEE Trans. Neural Netw. 10 (5) (1999) 988–999. [39] D.C. Lou, C.L. Liu, C.L. Lin, Message estimation for universal steganalysis using multi-classification support vector machine, Comput. Stand. Interfaces 31 (2) (2009) 420–427. [40] N. Cristianni, T.J. Shawe, Support Vector Machines and Other Kernel based Learning Methods, Cambridge University Press, 2000. [41] A. Subasi, Classification of EMG signals using PSO optimized SVM for diagnosis of neuromuscular disorders, Comput. Biol. Med. 43 (2013) 576–586. [42] I. Kalatzis, N. Piliouras, C. Ventouras, et al., Design and implementation of an SVM-based computer classification system for discriminating depressive patients from healthy controls using the P600 component of ERP signals, Comput. Method Programs Biomed. 75 (2004) 11–22. [43] Y. Chen, J. Storrs, L. Tan, et al., Detecting brain structural changes as biomarker from magnetic resonance images using a local feature based SVM approach, J. Neurosci. Methods 221 (2014) 22–31. [44] J. Dukart, K. Mueller, H. 
Barthel, et al., Meta-analysis based SVM classification enables accurate detection of Alzheimer's disease across different clinical centers using FDG-PET and MRI, Psychiatry Res.: Neuroimaging 212 (2013) 230–236. [45] C. Tas, M. Cebi, O. Tan, et al., EEG power, cordance and coherence differences between unipolar and bipolar depression, J. Affect. Disord. 172 (2015) 184–190. [46] C.L. Bowden, A different depression: clinical distinctions between bipolar and unipolar depression, J. Affect. Disord. 84 (2005) 117–125. [47] Y.W. Chen, S.C. Dilsaver, Lifetime rates of suicide attempts among subjects with bipolar and unipolar disorders relative to subjects with other Axis I disorders, Biol. Psychiatry 39 (1996) 896–899. [48] S.N. Ghaemi, D.J. Hsu, F. Soldani, et al., Antidepressants in bipolar disorder: the case for caution, Bipolar Disord. 5 (6) (2003) 421–433. [49] J.R. Almeida, A. Versace, A. Mechelli, et al., Abnormal amygdala-prefrontal effective connectivity to happy faces differentiates bipolar from major depression, Biol. Psychiatry 66 (2009) 451–459. [50] N.S. Lawrence, A.M. Williams, S. Surguladze, et al., Subcortical and ventral prefrontal cortical neural responses to facial expressions distinguish patients with bipolar disorder and major depression, Biol. Psychiatry 55 (2004) 578–587. [51] M.L. Phillips, W.C. Drevets, S.L. Rauch, et al., Neurobiology of emotion perception II: implications for major psychiatric disorders, Biol. Psychiatry 54 (2003) 515–528.

[52] M.D. Ritchie, B.C. White, J.S. Parker, et al., Optimization of neural network architecture using genetic programming improves detection and modeling of gene–gene interactions in studies of human diseases, BMC Bioinform. 4 (2003) 28. [53] S. Leslie, Neurometric quantitative EEG features of depressive disorders, in: R. Takahashi, P. Flor  Henry, J. Gruzier, S. Niwa (Eds.), Cerebral Dynamics, Laterality and Psychopathology, Elsevier Science Publishers, 1987, pp. 1–17. [54] M.H. Serpa, Y. Ou, M.S. Schaufelberger, et al., Neuroanatomical classification in a population-based sample of psychotic major depression and bipolar I disorder with 1 year of diagnostic stability, BioMed Res. Int. 2014 (2014) 9 http://dx.doi.org/10.1155/2014/706157. [55] J.R.C. Almeida, J. Mourao, H.J. Aizenstein, et al., Pattern recognition analysis of anterior cingulate cortex blood flow to classify depression polarity 203 (4) (2013) 310–311Br. J. Psychiatry 203 (4) (2013) 310–311. [56] R. Redlich, J.J. Almeida, D. Grotegerd, N. Opel, Brain morphometric biomarkers distinguishing unipolar and bipolar depression. A voxel based morphometry-pattern classification approach, JAMA Psychiatry 71 (11) (2014) 1222–1230. [57] C.P. Shen, C.M. Chan, F.S. Lin, Epileptic seizure detection for multichannel EEG signals with support vector machines, in: Proceedings of the 11th IEEE International Conference on Bioinformatics and Bioengineering, 2011. [58] K.R. Merikangas, H.S. Akishal, J. Angst, et al., Lifetime and 12-month prevalence of bipolar spectrum disorder in the national comorbidity survey replication, Arch. Gen. Psychiatry 64 (5) (2007) 543–552. [59] M. Phillips, E. Vieta, Identifying functional neuroimaging biomarkers of BD: toward DSM-V, Schizophr. Bull. 33 (4) (2007) 893–904. [60] E. Basar, EEG-Brain Dynamics, Relation Between EEG and Brain Evoked Potentials, Elsevier, Amsterdam, 1980. [61] W. Miltner, C. Braun, M. 
Arnold, Coherence of gamma-band EEG activity as a basis for associative learning, Nature 397 (1999) 434–436. [62] M. Schürmann, T. Demiralp, E. Basar, Electroencephalogram alpha (8–15 Hz), responses to visual stimuli in cat cortex, thalamus, and hippocampus: a distributed alpha network? Neurosci. Lett. 292 (2000) 175–178. [63] P.L. Nunez, EEG Coherence measures in medical and cognitive science: a general overview of experimental methods, computer algorithms and accuracy, in: M. Eselt, U. Swiener, H. Witte (Eds.), Quantitative and Topological EEG and MEG Analysis, Universitatsverlag Druckhaus, Mayer-Jena, 1997. [64] S. Lopes, F.H. Vos, J.E. Mooibroek, et al., Relative contributions of intracortical and thalamo-cortical processes in the generation of alpha rhythms, revealed by partial coherence analysis, Electroencephalogr. Clin. Neurophysiol. 50 (5–6) (1980) 449–456. [65] H. Petsche, S.C. Etlinger, EEG and Thinking: Power and Coherence Analysis of Cognitive Processes, Verlag Der Österreichischen Akademie Der Wissenscaften, Wien, 1998. [66] W.Y. Julia, R.B. Amanda, F.O. Brian, et al., Resting state EEG power and coherence abnormalities in bipolar disorder and schizophrenia, J. Psychiatr. Res. 47 (2013) 1893–1901. [67] K. Verner, M. Colleen, K. Sidney, et al., EEG power, frequency, asymmetry and coherence in male depression, Psychiatr. Res.: Neuroimaging 106 (2001) 123–140. [68] A. Özerdem, B. Güntekin, E. Saatçi, et al., Disturbance in long distance gamma coherence in bipolar disorder, Prog. Neuropsychopharmacol. Biol. Psychiatry 34 (2010) 861–865. [69] R.W. Thatcher, P. Krause, M. Hrybyk, Corticocortical association fibers and EEG coherence: a two compartmental model, Electroencephalogr. Clin. Neurophysiol. 64 (1986) 123–143. [70] R.C. Young, J.T. Biggs, E. Ziegler, et al., A rating scale for mania: reliability, validity and sensitivity, Br. J. Psychiatr. 133 (1978) 429–435. [71] D. 
Mladenic, Feature selection for dimensionality reduction Lecture Notes on Computer Science, Subspace, Latent Structure and Feature Selection, Statistical and Optimization, Perspectives Workshop, SLSFS 2005, Bohinj, Slovenia, 3940, Springer, Slovenia (2006) 84–102. [72] R.O. Duda, P.E. Hart, Pattern Classification and Scene Analysis, John Wiley & Sons, Chichester, 1973. [73] M.E. Basiri, N.A. Ghasem, M.H. Aghdam, Using Ant Colony OptimizationBased Selected Features for Predicting Post-Synaptic Activity in Proteins, EvoBIO Lecture Notes Computer Science, Springer Berlin Heidelberg (2008) 12–23. [74] M. Dorgio, T. Stutzle, Ant Colony Optimization, The MIT Press, Cambridge, 2004. [75] S. Janson, D. Merkle, M. Middendorf, Parallel ant colony algorithms, in: E. Alba (Ed.), Parallel Metaheuristics: A New Class of Algorithms, A John Wiley and Sons, 2005, Chapter 8. [76] R.L. Haupt, S.E. Haupt, Practical Genetic Algorithms, second ed., John Wiley & Sons, New Jersey, 2004. [77] R. Khushaba, A. Alsukker, A. Al-Ani, Intelligent Artificial Ants based Feature Extraction from Wavelet Packet Coefficients for Biomedical Signal Classification, ISCCSP, Malta, 2008. [78] S.L. Ho, S. Yang, H.C. Wong, et al., An improved ant colony optimization algorithm and its application to electromagnetic devices designs, IEEE Trans. Magn. 41 (2005) 1764–1767. [79] W. Kanyapat, W. Paramote, Reliability optimization of topology communication network design using an improved ant colony optimization, Comput. Electr. Eng. 35 (2009) 730–747.


[80] D. Zhao, L. Liang, K. Zhang, An improved ant colony optimization for the communication network routing problem, Math. Comput. Model. 52 (2010) 1976–1981.
[81] B. Zhang, H. Qi, R.T. Ren, et al., Inverse transient radiation analysis in one-dimensional participating slab using improved ant colony optimization algorithms, J. Quant. Spectrosc. Radiat. Transf. 133 (2014) 351–363.
[82] K. Jun-man, Z. Yi, Application of an improved ant colony optimization on generalized traveling salesman problem, Energy Procedia 17 (2012) 319–325.
[83] M. Dorigo, V. Maniezzo, A. Colorni, The ant system: optimization by a colony of cooperating agents, IEEE Trans. Syst. Man Cybern. B 26 (1996) 1–13.
[84] X. Zhao, D. Li, B. Yang, et al., Feature selection based on improved ant colony optimization for online detection of foreign fiber in cotton, Appl. Soft Comput. 24 (2014) 585–596.
[85] H. Li, J. Sun, Predicting business failure using support vector machines with straightforward wrapper: a re-sampling study, Expert Syst. Appl. 38 (2011) 12747–12756.
[86] X. Zhang, X. Chen, Z. He, An ACO-based algorithm for parameter optimization of support vector machines, Expert Syst. Appl. 37 (2010) 6618–6628.
[87] C.L. Huang, ACO-based hybrid classification system with feature subset selection and model parameters optimization, Neurocomputing 73 (2009) 438–448.
[88] T. Gandhi, B. Panigrahi, S. Anand, A comparative study of wavelet families for EEG signal classification, Neurocomputing 74 (2011) 3051–3057.
[89] N. Gary, T. Ebrahimi, V.J. Marc, Support vector EEG classification in the Fourier and time-frequency correlation domains, in: Proceedings of the 1st International IEEE EMBS Conference on Neural Engineering, Capri Island, Italy, 2003.
[90] K. Mahajan, M. Rajput, A comparative study of ANN and SVM for EEG classification, Int. J. Eng. Res. Technol. 1 (2012) 3051–3057.
[91] Y. Li, C. Guan, H. Li, A self-training semi-supervised SVM algorithm and its application in an EEG-based brain computer interface speller system, Pattern Recognit. Lett. 29 (2008) 1285–1294.
[92] C. Changdar, G.S. Mahapatra, R.K. Pal, An improved genetic algorithm based approach to solve constrained knapsack problem in fuzzy environment, Expert Syst. Appl. 42 (4) (2015) 2276–2286.
[93] K. Suresh, N. Kumarappan, Hybrid improved binary particle swarm optimization approach for generation maintenance scheduling problem, Swarm Evol. Comput. 9 (2013) 69–89.
[94] R.K. Hamidreza, F. Karim, An improved feature selection method based on ant colony optimization (ACO) evaluated on face recognition system, Appl. Math. Comput. 205 (2) (2008) 716–725.
[95] P.S. Lee, Y.S. Chen, J.C. Hsieh, et al., Distinct neuronal oscillatory responses between patients with bipolar and unipolar disorders: a magnetoencephalographic study, J. Affect. Disord. 123 (2010) 270–275.
[96] A.L. Lieber, N.D. Newbury, Diagnosis and subtyping of depressive disorders by QEEG: IV. Discriminating subtypes of unipolar depression, Hillside J. Clin. Psychiatry 10 (1988) 73–82.
[97] E. Başar, C. Başar-Eroğlu, B. Güntekin, G.G. Yener, Brain's alpha, beta, gamma, delta, and theta oscillations in neuropsychiatric diseases: proposal for biomarker strategies, in: E. Başar, C. Başar-Eroğlu, A. Ozerdem, P.M. Rossini, G.G. Yener (Eds.), Application of Brain Oscillations in Neuropsychiatric Disease, Elsevier B.V., 2013, pp. 19–54.

[98] T. Harmony, T. Fernández, J. Silva, et al., EEG delta activity: an indicator of attention to internal processing during performance of mental tasks, Int. J. Psychophysiol. 24 (1) (1996) 161–171.
[99] A. Steiger, M. Kimura, Wake and sleep EEG provide biomarkers in depression, J. Psychiatr. Res. 44 (4) (2010) 242–252.
[100] S.D. Nessler, B. Brocke, H.H.J. Kayser, Is resting anterior EEG alpha asymmetry a trait marker for depression? Neuropsychobiology 41 (2000) 31–37.
[101] W. Heller, J.B. Nitschke, M.A. Etienne, Patterns of regional brain activity differentiate types of anxiety, J. Abnorm. Psychol. 106 (3) (1997) 376.

Turker Tekin Erguzel received his Ph.D. degree in real-time system modeling, control and parameter optimization from Marmara University, Istanbul. His research interests focus on real-time systems, nonlinear system modeling, and fuzzy controller design. He has been teaching at Uskudar University in the Computer Engineering Department of the Faculty of Engineering and Natural Sciences since 2012. His recent focus is on the application of artificial intelligence methods to psychiatric disorders.

Cumhur Tas is a psychiatrist and obtained his Ph.D. in neuroscience at the International Graduate School of Neuroscience, Ruhr University Bochum. Dr. Tas is based at the Uskudar University Psychology Department and is an honorary research fellow at the Research Department of Cognitive Neuropsychiatry and Preventative Medicine, LWL-University Hospital, Bochum. He received his M.D. from Adnan Menderes University, Turkey, in 2005, and his specialization degree in psychiatry from Celal Bayar University, Turkey, in 2011. He held a visiting fellowship as an honorary research associate at the Psychology Department of the Institute of Psychiatry, King's College London, from 2008 to 2009. His primary research interests focus on social cognition, social learning and their implications for social functioning in patients with schizophrenia. Currently, he is investigating the role of oxytocin in explaining social functioning in schizophrenia patients. He is also the developer of a Family-involved Social Cognition and Interaction Training programme (f-SCIT) in schizophrenia, and thus has a special interest in the development of new rehabilitative social cognitive interventions in schizophrenia.

Merve Cebi was born in 1986 in Istanbul. She received her B.A. degree in psychology from Bogazici University in 2009, and her M.Sc. degree in neuroscience from the University of Istanbul in 2012. During her education, she took part in a summer research project at the University of Maastricht via the Erasmus internship programme. Since 2013, she has been a Ph.D. student at the University of Istanbul, in the Department of Advanced Neurological Sciences. Her research interests include memory, neurodegenerative diseases, aging, and depression.
