Computers in Biology and Medicine 43 (2013) 2156–2162


Ensemble selection for feature-based classification of diabetic maculopathy images

Pradeep Chowriappa a, Sumeet Dua a,*, U. Rajendra Acharya b,c, M. Muthu Rama Krishnan b

a Department of Computer Science, Louisiana Tech University, Ruston, LA 71272, USA
b Department of Electronics and Computer Engineering, Ngee Ann Polytechnic, Singapore 599489, Singapore
c Department of Biomedical Engineering, Faculty of Engineering, University of Malaya, Kuala Lumpur 50603, Malaysia

Article history: Received 5 June 2013; Accepted 2 October 2013

Keywords: Diabetic retinopathy; Fundus imaging; Decision system; Feature extraction; Image texture; Ensemble classifier

Abstract

As diabetic maculopathy (DM) is a prevalent cause of blindness in the world, it is increasingly important to use automated techniques for the early detection of the disease. In this paper, we propose a decision system to classify DM fundus images into normal, clinically significant macular edema (CSME), and non-clinically significant macular edema (non-CSME) classes. The objective of the proposed decision system is threefold: to automatically extract textural features (both region-specific and global), to effectively choose a subset of discriminatory features, and to classify DM fundus images into their corresponding class of disease severity. The system uses a gamut of textural features and an ensemble classifier derived from four popular classifiers: the hidden naïve Bayes, naïve Bayes, sequential minimal optimization (SMO), and tree-based J48 classifiers. We achieved an average classification accuracy of 96.7% using five-fold cross validation. © 2013 Elsevier Ltd. All rights reserved.

1. Introduction

Diabetic maculopathy (DM) is a complication of diabetic retinopathy (DR) that causes gradual and irreversible loss of central vision. DM manifests in the form of micro-aneurysms over the macular region of the retina and is found predominantly in patients whose blood sugar and blood pressure are not adequately controlled. Though evidence indicates that effective control of blood sugar and blood pressure can retard the onset and progression of DR, the symptoms of DM are often not observed until the disease is in its advanced stages. Therefore, the safe treatment of DM relies largely on the accurate identification of early indicators of the disease.

Related works have proposed successful automated techniques for the early detection of DR [1,2]. These automated techniques consist of three stages [3]: recognition, training, and classification. Recognition entails the use of image processing techniques to detect and extract patterns from a sample image; in the case of DM, local and textural features are extracted from fundus images to effectively detect micro-aneurysms. The extracted features are then subjected to the training stage, in which a supervised learning model is built. Finally, in the classification stage, test DR images are classified into their corresponding classes using the model built in the training stage.

* Corresponding author. Tel.: +1 318 257 4921. E-mail address: [email protected] (S. Dua).

0010-4825/$ - see front matter © 2013 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.compbiomed.2013.10.003

In this paper, we propose a framework (see Fig. 1) for the automatic screening of DR fundus images based on the severity of DM. The framework extracts textural features (both region-specific and global) to identify discriminatory patterns and classify samples into normal, non-CSME, and CSME classes. We adopt the ensemble selection strategy to build an ensemble model for classification.

In the remainder of this paper, Section 2 contains the related works in the area. Section 3 contains the description of the dataset used for analysis. Section 4 contains the description of the proposed framework of the decision system, covering both the feature extraction techniques employed and the model building using ensemble selection; this section also describes the performance measures used to gauge the performance of the model built using ensemble selection. Section 5 contains a discussion of the inferences and results obtained, and we conclude in Section 6.

2. Related works

Researchers in the area of DR fundus image classification have relied on feature extraction techniques in conjunction with machine-learning models to identify the features that best discriminate and classify images into normal and diseased classes. Sinthanayothin et al. [1] proposed an automatic computerized screening system based on a recursive region growing strategy. This system used selected threshold values in gray-level images in conjunction with an artificial neural network (ANN) to classify DR images.


Nayak et al. [4] used features that describe morphological characteristics of the fundus images. These features included the cup-to-disc (c/d) ratio, the distance between the optic disc center and the ONH, the diameter of the optic disc, and the ratio of the area of blood vessels on the inferior–superior side to the area of blood vessels on the nasal–temporal side. In conjunction with an ANN model, this system achieved an average classification accuracy of 95%. Ang et al. [5] described four main features based on the areas of the exudates specific to four regions within the macula: the foveola, the fovea, the parafovea, and the perifovea. As in [4], Ang et al. used these region-specific features in conjunction with an ANN to obtain an average accuracy of 96.67%. Though effective in classification, these techniques target very specific micro-aneurysms. Researchers are investigating the use of textural features to overcome this limitation. Acharya et al. [6] used higher-order spectral features derived from DR fundus images to classify images into normal and open-angle glaucoma classes with an accuracy of 91%. Similarly, Dua et al. [7] proposed the use of wavelet-based energy features on DR images to classify glaucomatous images with an accuracy of 93.33%.

In this paper, we propose a decision system framework that explores and examines the effectiveness of textural features for the classification of DM fundus images. We use known textural features to analyze DM images and propose the use of ensemble selection [8] to classify DM images into normal, non-CSME, and CSME classes.

3. Image pre-processing

The fundus images of patients with DM are categorized into four classes based on the severity of the disease: normal, mild, moderate, and severe. Cases of mild and moderate maculopathy are classified further as non-CSME, and cases of severe maculopathy are categorized as CSME (Fig. 2). Here, the image dataset consists of 30 retinal images of normal eyes, 30 retinal images of eyes affected by non-CSME, and 30 retinal images of eyes affected by CSME. The images from each class were obtained from the Kasturba Medical Hospital, Manipal, India. The 90 images were stored in a 24-bit lossless JPEG format with an image size of 720 × 576 pixels. Each image in the set was subjected to the following pre-processing steps: (1) color normalization, in which the RGB image is converted into the hue–saturation–intensity (HSI) model for contrast enhancement; (2) median filtering to eliminate noise; (3) histogram specification to remove biases caused by the skin tone of the subject [4]; and (4) conversion of the image to grayscale.
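A minimal numpy sketch of these four pre-processing steps follows. The function names are ours, and the histogram step is simplified to a min–max stretch because the reference histogram used for specification is not given here; this is an illustration under those assumptions, not the authors' implementation.

```python
import numpy as np

def median_filter3(img):
    """3x3 median filter in plain numpy; border pixels are left unfiltered."""
    out = img.copy()
    H, W = img.shape
    for i in range(1, H - 1):
        for j in range(1, W - 1):
            out[i, j] = np.median(img[i - 1:i + 2, j - 1:j + 2])
    return out

def preprocess(rgb):
    """Sketch of the pipeline: (1) RGB -> HSI intensity plane,
    (2) 3x3 median filtering for noise removal, (3) a min-max
    contrast stretch standing in for histogram specification
    (assumption: no reference histogram available), and
    (4) the result is already a single grayscale plane."""
    rgb = np.asarray(rgb, dtype=float)
    intensity = rgb.mean(axis=2)                # HSI intensity = (R + G + B) / 3
    filtered = median_filter3(intensity)
    lo, hi = filtered.min(), filtered.max()
    return (filtered - lo) / (hi - lo + 1e-9) * 255.0
```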

4. Methodology In this section, we describe the framework of the proposed decision system (the overview of which is shown in Fig. 1). The following sections describe the feature extraction strategies and classifier modeling using ensemble selection.

4.1. Feature extraction

4.1.1. Fractal dimension (FD)
To extract the FD of a DM image, we view the image as a three-dimensional (3D) object whose intensity variation (texture) forms a surface over a two-dimensional (2D) spatial plane. A surface A in Euclidean n-space is self-similar if A is the union of N_r distinct (non-overlapping) copies of itself scaled up or down by a factor of r. The FD is computed [9,10] as

    D = log(N_r) / log(1/r),                                    (1)

where D is the fractal dimension. In this paper, we use the sequential modified differential box-counting algorithm to extract the FD of a given image. To facilitate efficient computation on each gray-scale image, we fix the grid size to a power of 2. We use both the maximum and minimum intensity within each box of the grid to compute the difference count N, with r given by

    r = s / M,                                                  (2)

where M = min(R, C), s denotes the scale factor, and R and C represent the number of rows and columns, respectively. We iteratively double the grid size until it exceeds max(R, C)/2. Using the N and r values at the various grid sizes, we fit a straight line to the plot of log(N) vs. log(1/r) with a linear regression model. The FD of a given image is represented by the slope,

    FD = log(N_r) / log(1/r)                                    (3)

Fig. 1. Proposed decision system framework.

Fig. 2. Typical fundus images: (a) normal; (b) CSME; and (c) non-CSME.
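A compact numpy sketch of one common differential box-counting variant follows; the per-box counting rule and function names are our illustration, not the authors' exact implementation.

```python
import numpy as np

def fractal_dimension(img, scales=(2, 4, 8, 16)):
    """Estimate FD via differential box counting: at each grid size s,
    count boxes N_r spanned by the max-min intensity range in each cell,
    then take the slope of log(N_r) vs. log(1/r) (Eqs. 1-3)."""
    img = np.asarray(img, dtype=float)
    R, C = img.shape
    M = min(R, C)
    G = 256.0                              # number of gray levels (assumption)
    log_nr, log_inv_r = [], []
    for s in scales:                       # grid sizes fixed to powers of 2
        r = s / M                          # scale factor r = s / M  (Eq. 2)
        h = G * s / M                      # box height along the intensity axis
        n = 0.0
        for i in range(0, R - s + 1, s):
            for j in range(0, C - s + 1, s):
                cell = img[i:i + s, j:j + s]
                # differential count: boxes covered by the intensity span
                n += np.ceil((cell.max() - cell.min()) / h) + 1
        log_nr.append(np.log(n))
        log_inv_r.append(np.log(1.0 / r))
    # the least-squares slope is the fractal dimension
    slope, _ = np.polyfit(log_inv_r, log_nr, 1)
    return slope
```

A perfectly flat image yields FD = 2 (a plane), while rough textures push the estimate toward 3.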


4.1.2. Laws' mask energy (LME)
Laws' mask energy (LME) is based on texture energy transforms applied to the image to estimate the energy within the pass region of filters [11]. All masks are derived from one-dimensional (1D) vectors of length 3, L3, E3, and S3, which describe level, edge, and spot features, respectively. By convolving any vertical 1D vector with a horizontal one, nine 2D filters of size 3 × 3 are generated. To extract texture information from an image I(i, j), the image is first convolved with each 2D mask. For example, filtering I(i, j) with the mask E3E3 yields the "texture image" TI_E3E3, as follows:

    TI_E3E3 = I(i, j) ⊗ E3E3                                    (4)

According to Laws, all 2D masks except L3L3 have a zero mean [16]. Thus, the texture image TI_L3L3 is used to normalize the contrast of all other texture images TI(i, j), making these descriptors contrast-independent:

    Normalize(TI_mask) = TI(i, j)_mask / TI(i, j)_L3L3          (5)

The appropriate convolution of these masks yields nine combinations of 3  3 masks, of which we use zero-sum masks 1 through 8. The outputs ðTIÞ from Laws' masks are passed to texture energy measurement ðTEMÞ filters. These filters consist of a moving non-linear window average of absolute values as follows. TEMði;jÞ ¼

3



3   ∑ TIði þ u;j þ vÞ 

microstructures such as bright spot (U ¼ 0), flat area, or dark spot ðU ¼ 8Þ, and edges of varying positive and negative curvature ðU ¼ 1  7Þ. Therefore, a rotation invariant measure called LBP P;R that uses uniformity measure U is calculated based on the number of transitions in the neighborhood pattern. Only patterns with U r 2 are assigned to the LBP code. If the number of bittransitions in the circular bit-stream is less than or equal to 2, the center pixel is labeled as uniform. 8 P 1 > < ∑ sðg  g Þ p c LBPP;R ðxÞ ¼ P ¼ 0 > : P þ1

if UðxÞ r 2

ð7Þ

otherwise

where ( sðxÞ ¼

1;

xZ0

0;

xo0

Multi-scale image analysis using LBP is performed by choosing circles with various radii around the center pixels and, thus, constructing a separate LBP image for each scale. In this work, the energy and entropy of the LBP image, constructed over different scales (R¼1, 2, and 3 with the corresponding pixel count P being 8, 16, and 24, respectively) are used as feature descriptors.

ð6Þ

u ¼ 3 v ¼ 3

The image under inspection is filtered using these eight masks, and their energies are computed and used as feature [12]. 4.1.3. Local binary pattern (LBP) The local binary pattern (LBP) has been established as a robust and efficient texture descriptor by [13]. In its simplest form, the LBP feature vector is determined using the following method. A circular neighborhood is considered around a pixel. A set of P points that are located on the circumference of the circle with radius R and are equidistant from the center pixel are chosen. The gray values at points on the circular neighborhood that do not coincide exactly with pixel locations are estimated using bilinear interpolation. Let g c be the gray value of the center pixel and g p , p ¼ 0; …; P  1 correspond to the gray values of the P points. These P points are converted into a circular bit-stream of 0 s and 1 s according to whether the gray value of the pixel is less than or greater than the gray value of the center pixel. Fig. 3 depicts the circular symmetric neighbor sets for P and R. Ojala et al. introduced uniformity in texture analysis by classifying each pixel as uniform or non-uniform and used the uniform pixels for further computation of texture descriptor [13]. These uniform fundamental patterns have a uniform circular structure that contains few spatial transitions U (number of spatial bitwise 0/1 transitions) and function as templates for
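The Laws' pipeline of Section 4.1.2 can be sketched as below. This is a simplified illustration: the energy here is a whole-image mean of absolute normalized responses rather than the moving TEM window of Eq. (6), and the small epsilon guard against division by zero is our addition.

```python
import numpy as np

# 1D Laws vectors of length 3: level, edge, spot
L3 = np.array([1.0, 2.0, 1.0])
E3 = np.array([-1.0, 0.0, 1.0])
S3 = np.array([-1.0, 2.0, -1.0])

def conv2_valid(img, k):
    """Plain 'valid'-mode 2D convolution (kernel flipped), numpy only."""
    kh, kw = k.shape
    H, W = img.shape
    kf = k[::-1, ::-1]
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kf)
    return out

def laws_energies(img):
    """Energies of the eight zero-sum 3x3 Laws masks: build the nine
    texture images (Eq. 4), normalize by the L3L3 image (Eq. 5), and
    reduce each to a mean absolute energy (simplified from Eq. 6)."""
    vecs = {"L3": L3, "E3": E3, "S3": S3}
    ti = {a + b: conv2_valid(img, np.outer(va, vb))
          for a, va in vecs.items() for b, vb in vecs.items()}
    norm = np.abs(ti.pop("L3L3")) + 1e-9   # epsilon guard (our addition)
    return {name: np.mean(np.abs(t / norm)) for name, t in ti.items()}
```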
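The rotation-invariant uniform code of Eq. (7) can be sketched as follows, assuming a grayscale numpy array; the small comparison tolerance is our numerical guard for the bilinear interpolation, not part of the definition.

```python
import numpy as np

def lbp_uniform_codes(img, P=8, R=1):
    """Rotation-invariant uniform LBP (Eq. 7): threshold P circular
    neighbors at radius R against the center pixel, count circular
    0/1 transitions U, and emit sum(s) if U <= 2, else the
    non-uniform label P + 1. Off-grid neighbors are bilinearly
    interpolated. Border pixels are marked -1."""
    H, W = img.shape
    angles = 2 * np.pi * np.arange(P) / P
    out = np.full((H, W), -1, dtype=int)
    r = int(np.ceil(R))
    for i in range(r, H - r):
        for j in range(r, W - r):
            gc = img[i, j]
            bits = []
            for a in angles:
                y, x = i + R * np.sin(a), j + R * np.cos(a)
                y0 = min(int(np.floor(y)), H - 2)   # clamp for interpolation
                x0 = min(int(np.floor(x)), W - 2)
                dy, dx = y - y0, x - x0
                gp = (img[y0, x0] * (1 - dy) * (1 - dx)
                      + img[y0, x0 + 1] * (1 - dy) * dx
                      + img[y0 + 1, x0] * dy * (1 - dx)
                      + img[y0 + 1, x0 + 1] * dy * dx)
                # s(g_p - g_c); tolerance guards floating-point error
                bits.append(1 if gp >= gc - 1e-9 else 0)
            u = sum(bits[k] != bits[k - 1] for k in range(P))  # circular U
            out[i, j] = sum(bits) if u <= 2 else P + 1
    return out
```

The energy and entropy of the resulting code image at each scale (R = 1, 2, 3) would then serve as the feature descriptors described above.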

4.1.4. Texture from the Fourier spectrum
For an image in which (x_i, y_i), i = 1, 2, …, K, represent the edge points of an object, Fourier descriptors of that edge can be obtained by treating each point as a complex number [14], so that

    s(k) = x_k + j·y_k                                          (8)

The DFT of s(k) is then

    a(u) = Σ_{k=0}^{K−1} s(k) e^{−j2πuk/K}                      (9)

Because the length of the DFT of a sequence is the same as that of the original sequence, the number of descriptors varies as the length of the edge changes. Here, the AC power of the Fourier spectrum is computed as

    P_AC = Σ_{u≠0, v≠0} ( F_R²(u, v) + F_I²(u, v) ),            (10)

where F_R(u, v) and F_I(u, v) are the real and imaginary parts of the Fourier transform of the image, respectively, and u and v are the frequencies along the x and y axes of the image, respectively. Fourier descriptors are not invariant to scaling and translation.

Fig. 3. Circularly symmetric neighbor sets for P and R [13].
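The AC power of Eq. (10) can be sketched in a few lines of numpy; excluding the u = 0 row and v = 0 column removes the DC contributions, so a constant image has (numerically) zero AC power.

```python
import numpy as np

def fourier_ac_power(img):
    """AC power of the image spectrum (Eq. 10): sum of squared real
    and imaginary Fourier coefficients over frequencies with
    u != 0 and v != 0."""
    F = np.fft.fft2(np.asarray(img, dtype=float))
    power = F.real ** 2 + F.imag ** 2
    return power[1:, 1:].sum()     # drop the u == 0 and v == 0 terms
```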


4.1.5. Gabor wavelet
A 2D Gabor function is a Gaussian-modulated complex sinusoid, written as follows:

    ψ(m, n) = (1 / (2π σ_m σ_n)) exp[ −(1/2)(m²/σ_m² + n²/σ_n²) + 2πjωm ]    (11)

In this equation, ω is the frequency of the sinusoid, and σ_m and σ_n are the standard deviations of the Gaussian envelopes. 2D Gabor wavelets are obtained by dilating and rotating the mother Gabor wavelet ψ(m, n) using

    ψ_{l,k}(m, n) = a^{−l} ψ( a^{−l}(m cos θ + n sin θ), a^{−l}(−m sin θ + n cos θ) ),  a > 1,    (12)

where a^{−l} is a scale factor, l and k are integers, the orientation θ is given by θ = kπ/K, and K is the number of orientations. The parameters σ_m and σ_n are calculated using the design strategy proposed by [15]. Given an image I(m, n), the Gabor wavelet transform is obtained as

    x_{l,k}(m, n) = I(m, n) ∗ ψ_{l,k}(m, n),                    (13)

where ∗ denotes the convolution operator. The parameters K and S are the number of orientations and the number of scales, respectively. The mean and the standard deviation of the transform magnitudes are used as features, given by

    μ_{l,k} = (1 / (M·N)) Σ_{m=1}^{M} Σ_{n=1}^{N} |x_{l,k}(m, n)|,
    σ_{l,k} = [ (1 / (M·N)) Σ_{m=1}^{M} Σ_{n=1}^{N} ( |x_{l,k}(m, n)| − μ_{l,k} )² ]^{1/2},
    for l = 1, 2, …, S and k = 1, 2, …, K.                      (14)

The feature vector is then constructed using μ_{l,k} and σ_{l,k} as feature components, for K = 6 orientations and S = 4 scales, resulting in a feature vector of length 48:

    f = { μ_{1,1}, σ_{1,1}, …, μ_{4,6}, σ_{4,6} }               (15)

4.2. Ensemble selection (ES) for training and classification

In this section, we describe the data preprocessing steps applied before the feature vectors are subjected to the training and classification stages of the decision system. We also describe a means of gauging the performance of the proposed textural features using known feature ranking and feature selection strategies. We use two feature ranking techniques, the chi-square (χ²) and relief ranking schemes [7]. The feature subset-selection strategies used include correlation-based feature selection (CFS) with a genetic search strategy and the consistency subset evaluation (CSE) filter with a greedy stepwise search strategy [7]. Furthermore, we describe the ensemble classifier based on ensemble selection [8].

Our objective in this paper was to use a diverse set of classifiers to create the ensemble model. These classifiers include the probability-network-based hidden naïve Bayes (HNB) learner [17], the probability-based naïve Bayes (NB) learner [17], the statistical sequential minimal optimization (SMO) learner with a polynomial kernel [18], and the tree-based J48 learner [19]. We have chosen these classifiers because they are all non-linear and work on independent philosophies of model construction during the training phase. Moreover, by using a diverse set of classifiers to create the ensemble, we reduce the biases brought about by redundancy in the decision system.

4.2.1. Feature normalization and discretization
The 64 features are subject to min–max normalization, in which each feature is rescaled to lie within a predetermined range. In this paper, the values of the features are scaled to a minimum of zero and a maximum of one. After the features are normalized, they are subject to equi-depth binning.

4.3. Feature ranking and selection

In this section, we describe the two feature ranking schemes, χ² and relief, used to rank the 64 features. We also describe the two feature subset-selection strategies used to select the most discriminatory features: CFS using genetic search and CSE using the greedy stepwise search [6,7].

4.3.1. Chi-square (χ²) feature ranking
In χ² feature ranking, the value of a feature is estimated by computing the χ² statistic of each feature. This feature evaluation technique is divided into two phases. In the first phase, each feature is sorted according to a significance level (sigLevel). Based on this value, the feature values are discretized into intervals. Once discretization is performed, the χ² value is computed for every pair of adjacent intervals of the feature, and the procedure merges the pair of intervals with the lowest χ² value. Merging is terminated when the χ² value exceeds the previously set sigLevel. The χ² statistic is determined as follows:

    χ² = Σ_{i=1}^{2} Σ_{j=1}^{k} (A_ij − E_ij)² / E_ij,         (16)

where k is the number of classes, A_ij is the number of patterns in the ith interval of the jth class, R_i = Σ_{j=1}^{k} A_ij is the number of patterns in the ith interval, C_j = Σ_{i=1}^{2} A_ij is the number of patterns in the jth class, N = Σ_{i=1}^{2} R_i is the total number of patterns, and E_ij = R_i · C_j / N is the expected frequency of A_ij. If R_i or C_j is 0, E_ij is set to 0.1. The degree of freedom of the χ² statistic is one less than the number of classes.

The second phase of feature evaluation fine-tunes the process performed in the first phase. Once feature intervals have been merged, a consistency check is performed. If the merging of feature i does not pass the previously determined sigLevel(i) for that feature, the feature may not be considered potentially significant and is discouraged from further merging. In this way, the features are ranked according to their level of significance.

4.3.2. Relief feature ranking
The relief algorithm uses a relevancy parameter τ (0 < τ < 1), which acts as a threshold used to gauge the statistical relevancy of a feature to the class to which an image belongs. The difference between two images, X and Y, in the kth feature is defined by the following function:

    diff(x_k, y_k) = (x_k − y_k) / nu_k,                        (17)

where nu_k normalizes the value of diff into the interval [0, 1]. The relief method uses two measures, "near-hit" and "near-miss", to describe the proximity of an image to a subset of images belonging to a class. An image is a near-hit of X if it belongs to a close neighborhood of X and to the same class as X. Similarly, an image is a near-miss of X if it belongs to the proximity of X but to a different class than X. The algorithm creates a triplet (image X, near-hit, near-miss) for each image, where the near-hit and near-miss are chosen using Euclidean distance. Once the near-hit and near-miss are determined, a feature weight vector W is updated using the following:

    W_i = W_i − diff(x_i, near_hit_i)² + diff(x_i, near_miss_i)²    (18)
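The weight update of Eqs. (17) and (18) can be sketched as below; this is an illustrative numpy version in which every sample is visited once, the normalizer nu_k is taken as each feature's range, and the epsilon guard is our addition.

```python
import numpy as np

def relief_weights(X, y):
    """Relief sketch: for each sample, find its nearest hit (same
    class) and nearest miss (other class) by Euclidean distance,
    then decrease weights for features that differ from the hit and
    increase them for features that differ from the miss (Eq. 18)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    span = X.max(axis=0) - X.min(axis=0) + 1e-9    # nu_k normalizer (guard ours)
    W = np.zeros(X.shape[1])
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                               # exclude the sample itself
        hit = np.argmin(np.where(y == y[i], d, np.inf))
        miss = np.argmin(np.where(y != y[i], d, np.inf))
        diff_hit = (X[i] - X[hit]) / span           # Eq. (17)
        diff_miss = (X[i] - X[miss]) / span
        W += -diff_hit ** 2 + diff_miss ** 2        # Eq. (18)
    return W / len(X)                               # averaged per-feature relevance
```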


A relevance vector R is determined using every sample triplet. This relevance vector is derived from the weight vector W and is used to depict the relevance of each feature:

    R = (1/m) W                                                 (19)

The features in this paper are ranked based on increasing order of relevance R.

4.3.3. Correlation-based feature selection (CFS) using genetic search
CFS uses a heuristic measure, "merit", to evaluate the importance of a feature for predicting the class label while simultaneously gauging the level of inter-correlation between features [7]. The merit of a feature subset s is defined using the following relation:

    Merit_s = k·r_cf / sqrt( k + k(k − 1)·r_ff ),               (20)

where Merit_s is the merit heuristic of a feature subset s containing k features, r_cf is the average feature–class correlation, and r_ff is the average feature-to-feature inter-correlation. In this work, CFS is paired with the iterative genetic search algorithm, which weighs the features based on their correlation to the class labels. The genetic search algorithm has been shown to be more efficient than traditional search approaches, especially on training sets with a large number of features [20].

4.3.4. Consistency subset evaluation using greedy stepwise search
In this work, we gauge the effectiveness of the CSE strategy in conjunction with the greedy stepwise search. CSE finds combinations of features whose values divide the data into subsets containing a strong single-class majority, i.e., high class consistency. This consistency measure was first presented by [7] as follows:

    Consistency_s = 1 − ( Σ_{i=0}^{J} (|D_i| − |M_i|) ) / N,    (21)

where s is a feature subset, J is the number of distinct combinations of feature values for s, |D_i| is the number of occurrences of the ith feature-value combination, |M_i| is the cardinality of the majority class for the ith feature-value combination, and N is the number of instances in the dataset. The rank of a feature is determined according to its overall contribution to the consistency of the feature set. In this work, the greedy stepwise subset evaluation performs a greedy forward search through the feature space, resulting in a subset of discriminatory features.

4.4. Classification using ensemble selection (ES)

In this section, we describe the use of ensemble selection [8] to create an ensemble classifier for image classification. Ensemble classifiers, also known as multiple classifier systems (MCS), are a collection of learning models employed in unison for combined decision making.

4.4.1. Creation of the ensemble
ES [8] is used to build an ensemble classifier from large collections of diverse models. ES is composed of two phases. In the first phase, known as ensemble overproduction, a large set of models is generated. In the second phase, the choice phase, a subset of models is selected from those generated in the first phase.

In the first phase, we create a model library (ML). As shown in Table 1, the ML is a collection of 57 models that are derived from the four parent classifiers, HNB, NB, SMO, and J48. Each model in

Table 1
Models in model library.

Classifier                    # Models in library
Hidden naïve Bayes            2
Naïve Bayes                   3
SMO with polynomial kernel    12
J48                           40

the ML is a variation of the parent classifier (see Supplementary materials for details). These variations are brought about by changing the parameters of the parent classifiers. In the second phase of ES, the models in the ML are combined such that optimal accuracy is achieved. Analogous to the feature selection problem, ES in this work employs the forward selection strategy to choose the best subset of models. Thus, ES initiates the creation of an ensemble classifier by choosing a model from the ML at random. Models are added one at a time to form a subset of models. If the evaluation criterion of the ensemble is higher than that of the previous step, more models are added to the existing ensemble. This iterative process is terminated when the ensemble of size k + 1 models scores lower than the ensemble of size k.

4.4.2. Ensemble evaluation criteria
ES is commonly plagued by the over-fitting associated with the forward selection technique used to create the ensemble classifier. To check for over-fitting, we test the ensemble classifier with training sets of different sizes, using the hill-climbing technique. We also employ ensemble selection with 5-fold cross validation; during the creation of the ensemble, five models are chosen randomly for each iteration of ES. Furthermore, we use six known evaluation criteria to determine the performance of the ensemble: the root mean square error (RMS), overall accuracy (ACC), precision/recall F-score (FSC), area under the ROC curve (AUC), average precision (APR), and the combined squared error, accuracy, and ROC metric (SAR), SAR = (ACC + AUC + (1 − RMS))/3, which is a robust metric used when the correct metric is unknown.
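The greedy forward selection at the heart of ES can be sketched as follows. This is a simplified illustration: the authors' ES also uses bagging over the model library and selection with replacement, whereas this sketch selects each model at most once; the function and model names are ours.

```python
import numpy as np

def sar(acc, auc, rms):
    """SAR metric: (ACC + AUC + (1 - RMS)) / 3."""
    return (acc + auc + (1.0 - rms)) / 3.0

def accuracy(probs, y):
    """Fraction of samples whose argmax class-probability is correct."""
    return float(np.mean(np.argmax(probs, axis=1) == y))

def ensemble_select(model_preds, y_true, metric):
    """Greedy forward ensemble selection (after Caruana et al. [8]):
    repeatedly add the model that most improves the metric of the
    averaged class-probability predictions on a hill-climbing set;
    stop when an ensemble of size k + 1 scores no better than size k.
    `model_preds` maps model name -> (n_samples, n_classes) array."""
    names = list(model_preds)
    chosen, best_score = [], -np.inf
    while names:
        gain = None
        for n in names:
            trial = chosen + [n]
            avg = np.mean([model_preds[m] for m in trial], axis=0)
            score = metric(avg, y_true)
            if score > best_score:
                best_score, gain = score, n
        if gain is None:            # no addition improved the ensemble
            break
        chosen.append(gain)
        names.remove(gain)          # without replacement (simplification)
    return chosen, best_score
```

With a perfect model and a degenerate one in the library, the sketch keeps only the perfect model, since averaging in the second one lowers hill-climbing accuracy.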

5. Results and discussion

In this section, we describe the performance of the proposed decision system using textural features for the classification of DM images into the three known classes.

5.1. Performance evaluation

To explore the sensitivity of ES to the size of the data, we analyze the effects of under-sampling and over-sampling using the hill-climbing process. As the number of models in a library increases, the chance of finding combinations of models that over-fit the hill-climbing set also increases. To avoid this problem, we use bagged ensemble selection. Here, we reduce the number of models selected by creating a model bag, in which the models are chosen at random. We set the model bag size to 10 and repeat the bagged selection 10 times to ensure that the best models are selected. As a result, each ensemble is a weighted average of models, implying that the average of a set of ensembles is a simple weighted average of the base-level models.

Table 2 contains the results obtained from our analysis using 5-fold cross validation on over-sampling and under-sampling


results. We observe and report a decrease in all evaluation criteria as we proceed from lower to higher numbers of hill-climbing iterations. We also observe an average accuracy of 96.7%, with a highest accuracy of 97.8%. Furthermore, we compare our results to other ensemble approaches: AdaBoost using J48, bagging using J48, voting using the average of probabilities, majority voting, and random forest [8]. As observed in Table 3, the proposed decision system shows a significant improvement over these approaches in all evaluation criteria.

Table 2
Performance evaluation of decision system.

Hill climbing iterations   ACC     FSC     APR     RMS     AUC     SAR
10                         0.978   0.978   0.978   0.387   0.998   0.863
20                         0.978   0.978   0.978   0.387   0.999   0.863
30                         0.967   0.966   0.967   0.394   0.996   0.856
40                         0.967   0.966   0.967   0.393   0.996   0.865
50                         0.967   0.966   0.967   0.394   0.996   0.865
60                         0.967   0.966   0.967   0.389   0.998   0.858
70                         0.967   0.966   0.967   0.389   0.998   0.858
80                         0.967   0.966   0.967   0.389   0.998   0.858
90                         0.967   0.966   0.967   0.389   0.998   0.858
100                        0.978   0.978   0.978   0.385   0.999   0.865
250                        0.956   0.955   0.956   0.395   0.995   0.851
500                        0.967   0.966   0.967   0.393   0.997   0.856
1000                       0.956   0.955   0.956   0.396   0.994   0.851
2500                       0.944   0.944   0.945   0.400   0.989   0.844
5000                       0.967   0.966   0.967   0.388   0.998   0.858

Table 3
Comparative analysis with other ensemble techniques.

Ensemble approaches                 ACC     FSC     APR     RMS     AUC     SAR
AdaBoost (J48)                      0.467   0.459   0.455   0.558   0.625   0.511
Bagging (J48)                       0.411   0.410   0.409   0.483   0.593   0.507
Voting with avg. of probabilities   0.467   0.455   0.453   0.502   0.623   0.529
Majority voting                     0.444   0.433   0.429   0.609   0.583   0.473
Random forest                       0.456   0.443   0.442   0.474   0.619   0.533


5.2. Feature ranking and selection

To establish the effects of feature ranking and feature selection strategies on classification accuracy, we use two feature ranking strategies, the χ² and relief methods. Fig. 4 shows the ranks assigned to each feature. Gabor wavelet features and LBP features are consistently ranked higher than the other features within the vector. Furthermore, the features are subjected to two feature subset-selection strategies: CFS using genetic search and CSE using the greedy stepwise search. The feature set selected by CFS with genetic search included 14 features, of which LBP1, LBP5, and LBP6 were the three most prominent features. Similarly, LBP2, LBP5, and the Fourier feature were the three most prominent of the five features selected by CSE with greedy stepwise search. Table 3 (in Supplementary materials) enumerates the selected subsets of features that are considered highly discriminatory.

5.3. Effects of feature ranking and selection

To determine the effects of feature ranking and selection, we tested the ensemble classifier on the features after feature ranking and feature selection were performed. The log plots in Fig. 5 show ACC, RMS, and AUC for hill climbing (HC) sets of 10–100 in steps of 10, and of 250, 500, 1000, 2500, and 5000, with 5-fold cross validation. HC is carried out on the training set (with replacement) to gauge the ensemble classifier's performance on larger datasets. From Fig. 5(a), we observe that the χ² and relief ranking schemes exhibit the same trends, implying that the ensemble performance is independent of ranking. However, there is a visible decrease in ACC when the two feature selection approaches are applied. This decrease indicates that when fewer features are used for classification, there is a considerable effect on the overall performance of the system. This observation is further reinforced by the RMS and AUC readings obtained (Fig. 5(b) and (c)). We thus believe that the system's performance is affected by the number of features present and is not susceptible to over-fitting based on the number of features.

Fig. 4. Ranks of features using feature ranking and feature selection schemes.


Fig. 5. Log-scale representation of three performance measures using feature ranking and feature selection, respectively: (a) percentage accuracy (ACC), (b) root mean square error (RMS), and (c) area under the ROC curve (AUC).

6. Conclusion

In this paper, we have proposed a decision support system using textural features and an ensemble classifier for the classification of DM images. We demonstrate that, through effective optimization and performance evaluation, the proposed system achieves an average classification accuracy of 96.7%. Furthermore, we have tested the scalability of the decision system to a large number of samples using varied hill-climbing iterations and determined that it is not significantly affected by the number of samples in the training set. The performance of the system is stable, indicating that it can handle a large number of samples. In the future, we plan to extend the framework to handle other medical image datasets and a wider range of textural features.

Conflict of interest statement

The authors do not have any related conflict of interest.


Acknowledgment

The project described was supported in part by the National Institutes of Health (NIH) through the National Institute of General Medical Sciences (NIGMS) Grant 8P20GM103424. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIGMS or the NIH.


Appendix A. Supplementary material

Supplementary data associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.compbiomed.2013.10.003.

References


[1] C. Sinthanayothin, V. Kongbunkiat, S. Phoojaruenchanachai, A. Singalabanija, Automated screening system for diabetic retinopathy, in: Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis, Aizu, Japan, 2003, pp. 915–920.
[2] G.G. Yen, W.-F. Leong, A sorting system for hierarchical grading of diabetic fundus images: a preliminary study, IEEE Trans. Inf. Tech. Biomed. 12 (2008) 118–130.
[3] P. Kahai, K.R. Namuduri, H. Thompson, A decision support framework for automated screening of diabetic retinopathy, Int. J. Biomed. Imaging 2006 (2006) 1–8.
[4] J. Nayak, P.S. Bhat, U.R. Acharya, Automatic identification of diabetic maculopathy stages using fundus images, J. Med. Eng. Tech. 33 (2009) 119–129, http://dx.doi.org/10.1080/03091900701349602.
[5] M.H. Ang, U.R. Acharya, S.V. Sree, T.-C. Lim, J.S. Suri, Computer-based identification of diabetic maculopathy stages using fundus images, in: A.S. El-Baz, et al. (Eds.), Multi Modality State-of-the-Art Medical Image Segmentation and Registration Methodologies, vol. 1, Springer Science+Business Media, New York, 2011, p. 319.
[6] R. Acharya, S. Dua, X. Du, C. Chua, V. Sree, Automated diagnosis of glaucoma using texture and higher order spectra features, IEEE Trans. Inf. Tech. Biomed. 15 (2011).
[7] S. Dua, R.U. Acharya, P. Chowriappa, S.V. Sree, Wavelet-based energy features for glaucomatous image classification, IEEE Trans. Inf. Tech. Biomed. 16 (2012) 80–87, http://dx.doi.org/10.1109/TITB.2011.2176540.
[8] R. Caruana, A. Munson, A. Niculescu-Mizil, Getting the most out of ensemble selection, in: Proceedings of the 6th International Conference on Data Mining (ICDM'06), 2006, pp. 828–833.
[9] B.B. Mandelbrot, The Fractal Geometry of Nature, W.H. Freeman, New York, 1982.
[10] M.K. Biswas, T. Ghose, S. Guha, P.K. Biswas, Fractal dimension estimation for texture images: a parallel approach, Pattern Recognition Lett. 19 (1998).
[11] K.I. Laws, Rapid texture identification, in: Proceedings of SPIE Image Processing for Missile Guidance, vol. 238, 1980, pp. 376–380.
[12] M. Petrou, P.G. Sevilla, Image Processing: Dealing with Texture, John Wiley and Sons, West Sussex, England, 2006.
[13] T. Ojala, M. Pietikäinen, T. Mäenpää, Multiresolution gray-scale and rotation invariant texture classification with local binary patterns, IEEE Trans. Pattern Anal. Mach. Intell. 24 (2002) 971–987.
[14] R.C. Gonzalez, R.E. Woods, Digital Image Processing, 2nd ed., Prentice Hall, New York, 2002, pp. 655–659.
[15] B.S. Manjunath, W.Y. Ma, Texture features for browsing and retrieval of image data, IEEE Trans. Pattern Anal. Mach. Intell. 18 (1996) 837–842.
[16] S. Liao, M.W.K. Law, A.C.S. Chung, Dominant local binary patterns for texture classification, IEEE Trans. Image Process. 18 (2009) 1107–1118.
[17] H. Zhang, L. Jiang, J. Su, Hidden naive Bayes, in: Proceedings of the 20th National Conference on Artificial Intelligence, 2005, pp. 919–924.
[18] S.S. Keerthi, S.K. Shevade, C. Bhattacharyya, K.R.K. Murthy, Improvements to Platt's SMO algorithm for SVM classifier design, Neural Comput. 13 (2001) 637–649.
[19] R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, San Mateo, CA, 1993.
[20] P.-Y. Xia, X.-Q. Ding, B.-N. Jiang, A GA-based feature selection and ensemble learning for high-dimensional datasets, in: Proceedings of the 2009 International Conference on Machine Learning and Cybernetics, 2009, pp. 7–12.
