
A Methodology for Improving Tear Film Lipid Layer Classification

Beatriz Remeseiro, Verónica Bolón-Canedo, Diego Peteiro-Barral, Amparo Alonso-Betanzos, Bertha Guijarro-Berdiñas, Antonio Mosquera, Manuel G. Penedo, and Noelia Sánchez-Maroño

Abstract—Dry eye is a symptomatic disease which affects a wide sector of the population and has a negative impact on their daily activities. Its diagnosis can be achieved by analyzing the interference patterns of the tear film lipid layer and classifying them into one of the Guillon categories. The manual process done by experts is not only affected by subjective factors but is also very time consuming. In this paper, we propose a general methodology for the automatic classification of the tear film lipid layer, using color and texture information to characterize the images and feature selection methods to reduce the processing time. The adequacy of the proposed methodology was demonstrated: it achieves classification rates over 97%, maintains robustness, and provides unbiased results. It can also be applied in real time, and so allows important time savings for the experts.

Index Terms—Feature selection, Guillon categories, machine learning, tear film lipid layer, textural features.

Manuscript received February 14, 2013; revised October 10, 2013; accepted December 8, 2013. Date of publication December 11, 2013; date of current version June 30, 2014. This paper has been partially funded by the Secretaría de Estado de Investigación of the Spanish Government and FEDER funds of the European Union through the research projects PI10/00578, TIN2011-25476, and TIN2012-37954; and by the Consellería de Industria of the Xunta de Galicia through the research projects CN2011/007 and CN2012/211. B. Remeseiro, V. Bolón-Canedo, and D. Peteiro-Barral acknowledge the support of Xunta de Galicia under the Plan I2C Grant Program. B. Remeseiro, V. Bolón-Canedo, D. Peteiro-Barral, A. Alonso-Betanzos, B. Guijarro-Berdiñas, M. G. Penedo, and N. Sánchez-Maroño are with the Dept. de Computación, Universidade da Coruña, Campus de Elviña s/n, A Coruña 15071, Spain (e-mail: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]). A. Mosquera is with the Dept. de Electrónica y Computación, Universidade de Santiago de Compostela, Campus Universitario Sur, Santiago de Compostela 15782, Spain (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/JBHI.2013.2294732

I. INTRODUCTION

Tears are secreted from the lachrymal gland and distributed by blinking to form the tear film of the ocular surface [1]. The tear film is responsible for wetting the ocular surface, which is the first line of defense, and is also essential for clear visual imaging [2]. Its outer layer, known as the tear film lipid layer (TFLL), is composed of a polar phase with surfactant properties overlaid by a non-polar phase. It is the thinnest layer of the tear film and is mainly secreted by the meibomian glands, embedded in the upper and lower tarsal plates [3]. Some of its functions include forming a glossy and smooth surface with high optic qualities, establishing the tear film, sealing the lid margins

during sleep, preventing spillover of tears, and controlling water evaporation from the tear film [4]. A quantitative or qualitative change in the normal lipid layer has a negative effect on the quality of vision, measured as contrast sensitivity [5], and on the evaporation of tears from the ocular surface. In fact, it has been shown that the substantial tear evaporation caused by alterations of the lipid layer is characteristic of evaporative dry eye (EDE) [6]. This disease leads to irritation of the ocular surface and is associated with symptoms of discomfort and dryness [7]. It affects a wide sector of the population, especially contact lens users, and worsens with age. Current working conditions, such as computer use, have increased the proportion of people with EDE [8].

The lipid layer thickness can be evaluated through the classification of the interference phenomena, which correlates with tear film quality [9], since a thinner lipid layer speeds up water evaporation, decreasing tear film stability and often causing EDE. The Tearscope plus is the instrument designed by Guillon for the rapid assessment of lipid layer thickness [10]. Other devices have been designed for lipid layer examination, but the Tearscope plus is still the instrument most commonly used in clinical settings and research. In general, Tearscope images have been analyzed in vivo, without any acquisition process. However, some studies have used video/image acquisition procedures [11], [12], which help the observer to correctly classify the patterns and are indispensable for computer-based analysis. The acquisition process in those studies is similar to that performed in this paper: Rolando et al. [11] acquired lipid layer videos using the Tearscope plus, whilst Nichols et al. [12] obtained a series of photographs.

Guillon defined five main grades of lipid layer interference patterns [10] to evaluate the lipid layer thickness through the Tearscope plus: open meshwork, closed meshwork, wave, amorphous, and color fringe. However, the classification into these grades is a difficult clinical task, especially with thinner lipid layers that lack color and/or morphological features. The subjective interpretation of the experts via visual inspection may affect the classification. This time-consuming task is highly dependent on the training and experience of the optometrists, and so produces a high degree of inter- and intra-observer variability [13]. The development of a systematic, objective, computerized method for analysis and classification is thus highly desirable, allowing for homogeneous diagnosis and relieving the experts of this tedious task. Some techniques have been designed to objectively calculate the lipid layer thickness, but they require either a sophisticated optic system [14] or an interference camera that evaluates the lipid layer thickness by analyzing only the interference color [15].


Fig. 1. Steps of the research methodology.

In [13], [16], it was demonstrated that the interference phenomena can be characterized as a color texture pattern, so the classification could be automated using machine learning techniques. Similarly, in [17] a wide set of feature vectors was proposed using different texture analysis methods in three color spaces, and a classification accuracy of 95% was obtained. To the authors' knowledge, there are no other attempts in the literature to automate TFLL classification.

The problem with the approach proposed in [17] is that the time required to extract some of the textural features is too long (more than one minute). Interviews with optometrists revealed that a computation time over ten seconds per image makes the system unusable. Therefore, the previous approach prevents the practical clinical use of an application developed to automate the process, because it could not work in real time. In order to deal with this problem, feature selection can play a crucial role. In machine learning, feature selection (FS) is defined as the process of detecting the relevant features and discarding the irrelevant ones [18], so as to obtain a subset of features that properly describes the given problem. Because the number of features to extract and process is reduced, the time required is reduced accordingly and, most of the time, this can be achieved with minimal degradation of performance.

In this paper, a methodology for improving TFLL classification is presented, in which feature selection plays a crucial role by optimizing both accuracy and processing time. For this purpose, several FS methods are applied after extracting the color and texture features, trying to improve on the results achieved in [17]. Moreover, since the evaluation of the system is performed in terms of several criteria, a multiple-criteria decision-making (MCDM) method is used to obtain the best solution. This leads to a general methodology that can be applied to other classification problems.

This paper is organized as follows: Section II describes the research methodology, Section III presents the materials and methods employed in this paper, Section IV shows the results and discussion, and, finally, Section V includes the conclusions.

II. RESEARCH METHODOLOGY

In order to obtain an efficient method for automatic TFLL classification, a five-step methodology was applied, as illustrated in Fig. 1. First, feature extraction is performed, after which FS methods are applied to select the subset of relevant features that allows for correct classification of the TFLL thickness. After that, several performance measures are computed to evaluate the performance of the system. Finally, an MCDM method is carried out to obtain a final result. In what follows, every step is explained in depth.

Fig. 2. Stages of the feature extraction step.

Fig. 3. (a) Input image in RGB. (b) Sub-template and region of interest.

A. Feature Extraction

The initial stage in this methodology consists of processing the tear film images in order to extract their features (see Fig. 2). First, the region of interest of an input image in RGB is extracted. Then, this extracted region is converted to the Lab color space and its channels L, a, and b are obtained. After that, the texture features of each channel are extracted and three individual descriptors are generated. Finally, the three individual descriptors are concatenated to generate the final descriptor of the input image, which contains its color and texture features. Next, every stage is explained in detail, apart from the concatenation stage, due to its simplicity.

1) Extraction of the Region of Interest: The input images, as depicted in Fig. 3(a), include several areas of the eye that do not contain relevant information for the classification. Experts usually focus on the bottom part of the iris, because this is the area where the tear can be perceived with the best contrast. This motivates a preprocessing step aimed at extracting the region of interest (ROI) [19]. The acquisition procedure guarantees that this region corresponds to the most illuminated area of the image. To this end, the input image is transformed to the Lab color space and only the L component is considered in this step. Also, a set of ring-shaped templates that define different ROI shapes is used. Then, the normalized cross-correlation between the L component and the set of templates is computed. Finally, the region with the maximum cross-correlation value is selected [see Fig. 3(b)], and the ROI of the input image is obtained through a completely automatic process (an illustrative sketch is given at the end of this subsection).

2) Color Analysis: Color is one of the discriminant features of the Guillon categories [10] for TFLL classification. Some categories show distinctive color characteristics and, for this reason, tear film images were analyzed [17] not only in grayscale but also in the Lab and RGB color spaces. The best results were obtained in Lab, so we will focus on it from now on. The CIE 1976 L*a*b* color space [20] (Lab) was defined by the International Commission on Illumination, abbreviated as CIE from its French title Commission Internationale de l'Eclairage. It is a chromatic color space that describes all the colors that the human eye can perceive. Lab is a 3-D model whose three coordinates represent: (i) the luminance of the color, L; (ii) its position between magenta and green, a; and (iii) its position between yellow and blue, b.
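For illustration, the ROI step of Section II-A1 can be sketched with OpenCV: ring-shaped templates are correlated against the L channel and the best-scoring region is cropped. This is a minimal sketch, not the authors' implementation; the synthetic annulus below is a simplified stand-in for the paper's set of ring-shaped templates, and all function names are ours.

```python
# Sketch of ROI extraction via normalized cross-correlation (assumes OpenCV).
import cv2
import numpy as np

def ring_template(size, r_out, r_in):
    """Binary ring-shaped template: a white annulus on a black background."""
    yy, xx = np.mgrid[:size, :size]
    d = np.sqrt((xx - size / 2) ** 2 + (yy - size / 2) ** 2)
    return ((d <= r_out) & (d >= r_in)).astype(np.float32)

def extract_roi(bgr_image, templates):
    """Locate the most illuminated ring-shaped region on the L channel."""
    lab = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2LAB)
    L = lab[:, :, 0].astype(np.float32)
    best = None
    for t in templates:
        # Normalized cross-correlation between L and the current template
        score_map = cv2.matchTemplate(L, t, cv2.TM_CCORR_NORMED)
        _, score, _, loc = cv2.minMaxLoc(score_map)   # keep the maximum
        if best is None or score > best[0]:
            h, w = t.shape
            best = (score, bgr_image[loc[1]:loc[1] + h, loc[0]:loc[0] + w])
    return best[1]   # region with the maximum cross-correlation value
```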

Its use is recommended by the CIE for images with natural illumination, and its colorimetric components make this color space appropriate for texture extraction.

3) Texture Analysis: As well as color, interference patterns are one of the discriminant features of the Guillon categories. It has been demonstrated [17] that the interference phenomena can be characterized as a texture pattern, since thick lipid layers show clear patterns while thinner layers are more homogeneous. Several techniques for texture analysis can be applied; in this paper, five methods are tested. These are subsequently described:

1) Butterworth bandpass filters [21] are frequency-domain filters that have an approximately flat response in the passband, which gradually decays in the stopband. The order of the filter defines the slope of the decay: the higher the order, the faster the decay.

2) The discrete wavelet transform [22] generates a set of wavelets by scaling and translating a mother wavelet. The wavelet decomposition of an image consists of applying these wavelets horizontally and vertically, generating four images (LL, LH, HL, HH). The process is repeated iteratively on the LL subimage, resulting in the standard pyramidal wavelet decomposition.

3) The co-occurrence features method, introduced by Haralick et al. [23], is based on the computation of the conditional joint probabilities of all pairwise combinations of gray levels. This method generates a set of gray-level co-occurrence matrices and extracts several statistics from their elements (a simplified sketch follows this list). In general, the number of orientations, and therefore of matrices, for a distance d is 4d.

4) Markov random fields [25] generate a texture model by expressing the gray values of each pixel in an image as a function of the gray values in its neighborhood.

5) Gabor filters [26] are complex exponential functions modulated by Gaussian functions. The parameters of Gabor filters define their shape and represent their location in the spatial and frequency domains.
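As a concrete illustration of the co-occurrence method (item 3 above), the sketch below computes a gray-level co-occurrence descriptor with a recent scikit-image. Note the simplifications: graycoprops exposes only a subset of the 14 Haralick statistics used in the paper, and a fixed set of four orientations is used instead of the 4d orientations per distance.

```python
# Simplified co-occurrence descriptor (assumes scikit-image >= 0.19).
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def cooccurrence_descriptor(channel, distance):
    """Mean and range, across 4 orientations, of a few GLCM statistics.

    channel: 2-D uint8 array (one Lab component, rescaled to 0-255).
    """
    angles = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]
    glcm = graycomatrix(channel, distances=[distance], angles=angles,
                        levels=256, symmetric=True, normed=True)
    feats = []
    for prop in ("contrast", "homogeneity", "energy", "correlation"):
        vals = graycoprops(glcm, prop).ravel()      # one value per orientation
        feats += [vals.mean(), np.ptp(vals)]        # mean and range across matrices
    # 8 components here; the paper's full descriptor has 28 per distance
    # (mean and range of all 14 Haralick statistics).
    return np.array(feats)
```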


B. Feature Selection

Machine learning can take advantage of FS to reduce the number of features so as to improve the performance of automatic classifiers [18]. FS methods can be divided into three main models: filters, wrappers, and embedded methods [18]. The filter model relies on general characteristics of the data (correlation, entropy, etc.) to evaluate and select feature subsets without involving any learning algorithm or prediction model. Wrapper models, on the other hand, use a specific prediction method as a black box to score subsets of features as part of the selection process; finally, embedded methods perform FS as part of the training process of the prediction model. By having some interaction with the classifier, wrapper and embedded methods tend to give better performance results than filters, at the expense of a higher computational cost. It is also well known that wrappers run the risk of overfitting when there are more features than samples [27], as is the case in this paper. Trying to overcome this limitation, some preliminary tests were performed using a wrapper approach with sequential forward search; however, the performance obtained was not good. The poor behavior shown by wrappers in this kind of scenario, together with the significant computational burden required by this approach [28], prevents their use in this paper. Therefore, filters were chosen, because they allow the dimensionality of the data to be reduced without compromising the time and memory requirements of machine learning algorithms. Among the broad suite of methods present in the literature, the following filters were chosen based on previous studies [29], [30] (a simplified sketch of the CFS heuristic follows the list):

1) Correlation-based feature selection (CFS) [31] is a simple multivariate filter algorithm that ranks feature subsets according to a correlation-based heuristic. The goal is to select subsets that contain features highly correlated with the class and uncorrelated with each other.

2) The consistency-based filter [32] evaluates the worth of a subset of features by the level of consistency in the class values when the samples are projected onto the subset of attributes. The inconsistency criterion specifies to what extent the dimensionally reduced data can be accepted.

3) The INTERACT algorithm [33] is a subset filter based on symmetrical uncertainty and the consistency contribution, an indicator of how significantly the elimination of a feature will affect consistency. This method can handle feature interaction.
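To make the CFS heuristic concrete, the sketch below evaluates Hall's merit for a candidate subset and runs a greedy forward search. It is a simplified variant under an assumption: Pearson correlation is used as the correlation measure, whereas Hall's CFS employs symmetrical uncertainty over discretized features.

```python
# Simplified CFS-style merit and forward search (illustrative only).
import numpy as np

def cfs_merit(X, y, subset):
    """Merit_S = k*r_cf / sqrt(k + k*(k-1)*r_ff), Hall's heuristic."""
    k = len(subset)
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return r_cf
    r_ff = np.mean([abs(np.corrcoef(X[:, i], X[:, j])[0, 1])
                    for a, i in enumerate(subset) for j in subset[a + 1:]])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

def cfs_forward_search(X, y):
    """Greedily add the feature that most increases the merit."""
    selected, best = [], -np.inf
    while True:
        merit, j = max((cfs_merit(X, y, selected + [j]), j)
                       for j in range(X.shape[1]) if j not in selected)
        if merit <= best:          # no candidate improves the subset
            return selected
        selected.append(j)
        best = merit
```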

C. Classification

The support vector machine (SVM) is a widely used classifier based on statistical learning theory that revolves around the notion of a margin on either side of a hyperplane separating two classes [34]. Training an SVM reaches a global minimum, avoiding the local minima that may affect other algorithms. SVMs avoid problems of overfitting and, with an appropriate kernel, can work well even if the data are not linearly separable.
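A minimal sketch of this classification step with scikit-learn is shown below: an RBF-kernel SVM evaluated by ten-fold cross-validation, with C and gamma chosen by an inner grid search. The concrete parameter grid is our assumption; the paper estimates the parameters automatically following [45].

```python
# Sketch of the classification step (assumes scikit-learn).
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def evaluate_svm(X, y):
    """Mean ten-fold test accuracy of an RBF SVM with tuned parameters."""
    svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    grid = GridSearchCV(svm, {"svc__C": [1, 10, 100],            # assumed grid
                              "svc__gamma": ["scale", 0.01, 0.001]}, cv=5)
    scores = cross_val_score(grid, X, y, cv=10)   # ten-fold cross-validation
    return scores.mean()
```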

D. Performance Measures

After the SVM is trained, the performance of the system is evaluated in terms of different measures of relevance to the problem in question. The performance measures are defined as follows:

1) The accuracy is the percentage of correctly classified instances on a dataset with optimum illumination.

2) The robustness is the classification accuracy on a noisy dataset, i.e., the accuracy when the images in the dataset show illuminations outside the optimum range. This measure is related to the generalization ability of the method when handling noisy inputs: the higher the robustness, the higher the generalization performance.

3) The feature extraction time is the time that the texture analysis methods take to extract the selected features from a single image. Notice that this does not include the training time of the classifier, which is not relevant for practical applications because the classifier will be trained off-line. The same applies to FS, which is a preprocessing step performed off-line.


TABLE I LIPID LAYER INTERFERENCE PATTERNS. FROM TOP TO BOTTOM: OPEN MESHWORK, CLOSED MESHWORK, WAVE, AND COLOR FRINGE

E. Multiple-Criteria Decision-Making

Multiple-criteria decision-making (MCDM) [35] focuses on evaluating classifiers from different aspects and producing rankings of them. A multi-criteria problem is formulated using a set of alternatives and criteria. Among the many MCDM methods developed to date, the technique for order of preference by similarity to ideal solution (TOPSIS) [36] is a well-known method and is the one used here. TOPSIS finds the best algorithms by minimizing the distance to the ideal solution whilst maximizing the distance to the anti-ideal one. The extension of TOPSIS proposed by Olson [37] is used in this paper.

III. MATERIALS AND METHODS

The aim of this paper is to present a general methodology based on color texture analysis, FS, and MCDM. We test this methodology on TFLL classification to improve previous results [17]. The materials and methods used in this paper are described in this section.

A. Datasets

Guillon defined five main grades of lipid layer interference patterns, in increasing thickness [10]: open meshwork, closed meshwork, wave, amorphous, and color fringe. In this paper, a clinical dataset provided by optometrists is used. It does not contain images of the amorphous category because it is a very uncommon pattern [13]; for this reason, this category is not considered. Table I shows representative images of the four Guillon categories, with the ROIs picked from 10 different images according to Section II-A1; the appearance and estimated thickness of each category are also described there. The acquisition of the input images was carried out with the Tearscope plus [38] attached to a Topcon SL-D4 slit lamp [39] and a Topcon DV-3 digital video camera [40].

The slit lamp's magnification was set to 200X and the images were stored via the Topcon IMAGEnet i-base [41] at a spatial resolution of 1024 × 768 pixels per frame in RGB. The characteristics of the illumination are given by Topcon parameters: there are four levels of brightness, obtained with the Topcon shutter set to 16, 28, 45, and 81 units. Since the tear lipid film is not static between blinks, a video was recorded and analyzed by an optometrist in order to select the best images for processing. These images, selected when the TFLL was completely expanded after the eye blink, are the same ones that specialists analyze by hand.

In order to analyze the texture in the Lab color space, the input image in RGB was transformed to Lab and each component was analyzed separately, generating three descriptors per image corresponding to the L, a, and b components. Next, these three descriptors were concatenated to generate the final descriptor, whose size is three times that of an individual descriptor. Table II shows the arrangements for applying the texture analysis methods. Note that the number of features generated by each method is always multiplied by three because of the use of the three Lab components.

Two banks of images, acquired from the same group of healthy patients with ages ranging from 19 to 33 years, were available. All images in these banks have been annotated by optometrists from the Faculty of Optics and Optometry of the University of Santiago de Compostela (Spain). Although the interference patterns are independent of the illumination, there is an optimum range of illuminations used by optometrists to obtain the images; images with illuminations outside this range are considered noisy images. In this manner, two datasets are managed in this paper. The first bank contains 105 images from the VOPTICAL_I1 dataset [42], all of them taken under the same illumination conditions, which are considered the optimum ones by practitioners. This dataset contains the samples that are expected to be obtained in a real-case situation and will be used to compute the performance of the algorithms. It includes 29 open meshwork, 29 closed meshwork, 25 wave, and 22 color fringe images.


TABLE II
ARRANGEMENTS FOR TEXTURE ANALYSIS METHODS AND NUMBER OF FEATURES (CONFIGURATION PER COMPONENT)

Butterworth filters (9 × 16 × 3 = 432 features): A bank of Butterworth bandpass filters composed of 9 second-order filters was used, with passband frequencies covering the whole frequency spectrum. The filter bank maps each input image to 9 output images, one per frequency band. Each output image was normalized separately and then a uniform histogram with non-equidistant bins [16] was computed. Since 16-bin histograms were used, the feature vectors contain 16 components per frequency band.

Discrete wavelet transform (12 × 3 = 36 features): The Haar wavelet [22] was applied as the mother wavelet. The descriptor of an input image is constructed by calculating the mean and the absolute average deviation of the input and LL images, and the energy of the LH, HL, and HH images. Since 2 scales were used, the feature vectors contain 12 components.

Co-occurrence features (28 × 7 × 3 = 588 features): A set of 14 statistics proposed in [23] was computed from each co-occurrence matrix. These statistics represent features such as homogeneity or contrast. The descriptor of an image consists of 2 properties, the mean and range across matrices of these statistics, thus obtaining a feature vector with 28 components per distance. Distances from 1 to 7 were considered.

Markov random fields ((Σ_{d=1}^{10} 4d) × 3 = 660 features): In this work, the neighborhood of a pixel is defined as the set of pixels within a Chebyshev distance d. To generate the descriptor, the directional variances proposed by Çesmeli and Wang [44] were used. For a distance d, the descriptor comprises 4d features. Distances from 1 to 10 were considered.

Gabor filters (16 × 7 × 3 = 336 features): A bank of 16 Gabor filters centered at 4 frequencies and 4 orientations was created. The filter bank maps each input image to 16 output images, one per frequency-orientation pair. Using the same idea as in Butterworth filters, the descriptor of each output image is its uniform histogram with non-equidistant bins. Since 7-bin histograms were used, the feature vectors contain 7 components per filter.
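As an example of how one row of Table II translates into code, the sketch below computes the discrete wavelet transform descriptor with PyWavelets (an assumed library choice): Haar mother wavelet, two scales, 12 components per Lab channel. The energy is taken here as the mean squared coefficient, which is one common convention; the paper does not spell out its exact definition.

```python
# Sketch of the wavelet descriptor of Table II (assumes PyWavelets).
import numpy as np
import pywt

def aad(img):
    """Absolute average deviation from the mean."""
    return np.mean(np.abs(img - img.mean()))

def wavelet_descriptor(channel, scales=2):
    feats = [channel.mean(), aad(channel)]          # input image: mean + AAD
    ll = channel.astype(float)
    for _ in range(scales):
        ll, (lh, hl, hh) = pywt.dwt2(ll, "haar")    # pyramidal decomposition
        feats += [ll.mean(), aad(ll)]               # LL image: mean + AAD
        feats += [np.mean(np.square(d)) for d in (lh, hl, hh)]  # energies
    return np.array(feats)                          # 12 components for 2 scales
```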

TABLE III
NO. OF FEATURES

Texture analysis             None   CFS   Cons   INT
Butterworth filters           432    26     6    14
Discrete wavelet transform     36    10     8     7
Co-occurrence features        588    27     6    21
Markov random fields          660    24    13    15
Gabor filters                 336    29     7    18
Concatenation                2052    56     6    29

Fig. 4. (a) Image obtained with an optimum illumination, and its ROI, where a color fringe pattern is clearly observable. (b) Image obtained with too high an illumination, and its ROI, where a color fringe pattern is hardly observable.

The second bank contains 406 images from the VOPTICAL_Is dataset [43], taken under different illumination conditions. This bank will be used only to evaluate the sensitivity of the algorithms to noisy data. It includes 159 open meshwork, 117 closed meshwork, 90 wave, and 40 color fringe images. Fig. 4 shows an example of two images from the same subject: a too high illumination produces an image where the interference pattern is hardly appreciable.

B. Experimental Procedure

The experimental procedure is detailed as follows:

1) Apply the five texture analysis methods (see Table II) to the two banks of images. Moreover, the concatenation of all the features extracted by these five methods is also considered. As a result, six datasets with optimum illumination (105 images) and six datasets with different illuminations (406 images) are available.

2) Apply the three FS methods (see Section II-B) to the datasets with optimum illumination to provide the subsets of features that properly describe the given problem.

3) Train an SVM (see Section II-C) with a radial basis kernel and automatic parameter estimation, according to [45]. Ten-fold cross-validation was used, so the average error across all 10 trials is computed.

4) Evaluate the effectiveness of FS in terms of the three performance measures (see Section II-D), by means of the MCDM method TOPSIS (see Section II-E).

Experimentation was performed on an Intel Core i5 CPU 760 @ 2.80 GHz with 4 GB of RAM.
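Step 4 ranks the configurations with TOPSIS. The sketch below implements the standard Hwang-Yoon formulation with vector normalization; the paper actually uses Olson's extension [37], so this is only illustrative. The example matrix reuses the Concatenation rows of Tables IV-VI (accuracy, robustness, time), with time treated as a cost criterion.

```python
# Minimal TOPSIS sketch (standard Hwang-Yoon formulation).
import numpy as np

def topsis(matrix, weights, benefit):
    """matrix: alternatives x criteria; benefit[j] is True if criterion j
    is to be maximized, False if it is a cost to be minimized."""
    norm = matrix / np.linalg.norm(matrix, axis=0)     # vector normalization
    v = norm * weights
    ideal = np.where(benefit, v.max(axis=0), v.min(axis=0))
    anti = np.where(benefit, v.min(axis=0), v.max(axis=0))
    d_pos = np.linalg.norm(v - ideal, axis=1)          # distance to ideal
    d_neg = np.linalg.norm(v - anti, axis=1)           # distance to anti-ideal
    return d_neg / (d_pos + d_neg)                     # closeness: larger is better

# Rows: Concatenation with no FS and with CFS; columns: (acc, rob, time).
m = np.array([[97.14, 92.61, 116.68],
              [96.19, 93.84, 37.04]])
scores = topsis(m, np.array([1/3, 1/3, 1/3]), np.array([True, True, False]))
```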

IV. RESULTS AND DISCUSSION

In this section, the results obtained with and without FS are compared in terms of the three performance measures described above. Bear in mind that the column None in the tables shows the results when no FS was performed, and Concatenation stands for the concatenation of all the features of the five texture analysis methods. The number of features selected by each of the three FS filters is summarized in Table III. On average, CFS, the consistency-based filter (Cons), and INTERACT (INT) retain only 4.9%, 1.6%, and 3.2% of the features, respectively.

TABLE IV
MEAN TEST CLASSIFICATION ACCURACY (%), VOPTICAL_I1 DATASET

Texture analysis             None     CFS     Cons    INT
Butterworth filters          91.42    93.33*  83.81   86.67
Discrete wavelet transform   88.57    91.43   94.29   96.19*
Co-occurrence features       95.24*   94.29   86.67   93.33
Markov random fields         84.76    85.71*  83.81   75.24
Gabor filters                95.24*   91.43   86.67   86.67
Concatenation                97.14*   96.19   87.62   93.33

TABLE V
ROBUSTNESS: MEAN TEST ACCURACY (%), VOPTICAL_IS DATASET

Texture analysis             None    CFS     Cons    INT
Butterworth filters          88.18   84.98   71.92   79.56
Discrete wavelet transform   82.27   88.18   86.21   86.70
Co-occurrence features       92.17   92.36   85.22   89.16
Markov random fields         83.99   76.35   70.94   70.69
Gabor filters                89.90   85.22   69.46   82.51
Concatenation                92.61   93.84   77.59   91.87

TABLE VI
FEATURE EXTRACTION TIME (S)

Texture analysis             None     CFS     Cons    INT
Butterworth filters            0.22    0.15   0.04    0.07
Discrete wavelet transform     0.03    0.01   0.01    0.01
Co-occurrence features       102.18   27.01   0.05    9.86
Markov random fields          13.83    0.50   0.27    0.31
Gabor filters                  0.42    0.18   0.06    0.11
Concatenation                116.68   37.04   0.05    9.96

A. Classification Accuracy

Table IV shows the test accuracies for all pairwise combinations of texture analysis and FS methods after applying the SVM classifier to the VOPTICAL_I1 dataset. The best result for each texture analysis method is marked with an asterisk. As can be seen, all texture analysis techniques perform quite well, providing results over 84% accuracy. The best result is generated by the concatenation of all methods. Individually, Gabor filters and co-occurrence features without FS outperform the other methods. Although Markov random fields use information from the pixel's neighborhood, as co-occurrence features do, this method does not work as well, because the statistics proposed by Haralick et al. [23] provide much more textural information. Regarding FS, it outperforms the primal results in three out of six methods (Butterworth filters, the discrete wavelet transform, and Markov random fields), while accuracy is almost maintained for co-occurrence features and the concatenation of all methods when CFS is applied. In conclusion, the best result is obtained by the concatenation of all methods when no FS is performed (97.14%). Close behind, the discrete wavelet transform with the INTERACT filter and the concatenation of all methods with CFS obtain an accuracy of 96.19%. Notice that, although these results slightly improve on the previous ones [17] (95%), our goal is to reduce the processing time whilst maintaining accuracy.

As mentioned in Section II-B, results of the wrapper approach are not displayed because of its poor performance. In particular, the highest accuracy obtained by the wrapper was 89.52%, whilst the best one for the filters was 96.19%. This degradation in accuracy is caused by overfitting, demonstrating the inadequacy of the wrapper approach for this problem.

B. Robustness to Noise

Table V shows the robustness of the six different methods on the VOPTICAL_Is dataset. Co-occurrence features and the concatenation of all methods obtain remarkably better results than the remaining methods; both obtain robustness values over 90% for some configurations. In particular, the best result is obtained by the concatenation of all methods when the CFS filter is used (93.84%). In relative terms, co-occurrence features and the concatenation of all methods deteriorate their mean classification accuracy by 2.66% and 4.59%, respectively (mean differences between the values contained in Tables IV and V).

However, the remaining methods deteriorate their mean classification accuracy by between 6.78% and 8.23%. Note also that the illumination levels affect the robustness to different degrees: the brighter the illumination, the lower the robustness to noise. The same happens to practitioners when performing this task by hand. For this reason, their experience in controlling the illumination level during the acquisition stage is a cornerstone for ensuring good classification performance.

C. Feature Extraction Time

TFLL classification is a real-time task, so the time a method takes to process an image cannot be a bottleneck. After applying FS, and thus reducing the number of input attributes, the time needed to analyze a single image with any of the six methods was also reduced, as can be seen in Table VI. In general terms, Butterworth filters, the discrete wavelet transform, and Gabor filters take a negligible lapse of time to extract the features of an image (regardless of whether or not FS is applied as a preprocessing step). Markov random fields take a time which could be acceptable for practical applications, even when no FS is applied, although it could not work in real time. Co-occurrence features are known to be slow and, although the authors implemented an optimization of the method based on [24], the extraction time remains unacceptable. Regarding the time of the concatenation of all methods, note that it is dominated by the time of co-occurrence features. Co-occurrence features and the concatenation of all methods are only acceptable for practical applications when the consistency-based or INTERACT filters are used. The consistency-based filter selects fewer features (see Table III) and, consequently, the processing time when this filter is used is smaller.

Co-occurrence features are the core behind the good performance of the concatenation of all methods. This is demonstrated by further experiments showing that the concatenation of the other four methods achieves a maximum accuracy of 93.33% and robustness of 88.91%. These results are significantly worse (around 4%) than the best results obtained by the concatenation of all methods.

D. Overall Analysis

The efficiency of FS is tested in this paper. In general terms, we can assert that, in a field with a very large number of features, FS filters play a significant role in reducing the cost of obtaining data and the complexity of the classifier. The consistency-based filter performed the most aggressive selection, retaining only 1.6% of the features (see Table III). CFS retained three times more features (4.9%) than the former; halfway between them, INTERACT selected on average 3.2% of the features.

TABLE VII
TOPSIS VALUES OBTAINED FOR EVERY METHOD WHEN w = [1/3, 1/3, 1/3]

Texture analysis             None     CFS     Cons    INT
Butterworth filters          0.9774   0.9773  0.8159  0.9008
Discrete wavelet transform   0.9344   0.9775  0.9848  0.9900*
Co-occurrence features       0.3431   0.9416  0.9281  0.9812
Markov random fields         0.8670   0.8691  0.8081  0.6923
Gabor filters                0.9954*  0.9686  0.8295  0.9164
Concatenation                0.3066   0.8986  0.8988  0.9853*

Fig. 5. Pareto front of a multi-objective optimization problem based on accuracy (%) and robustness to noise (%); the annotated points correspond to the concatenation of all methods with no FS and with CFS.

TABLE VIII
TOPSIS VALUES OBTAINED FOR EVERY METHOD WHEN w = [1/2, 1/2, 0]

Texture analysis             None    CFS     Cons    INT
Butterworth filters          0.8983  0.9017  0.1694  0.4722
Discrete wavelet transform   0.6567  0.8986  0.9377  0.9629
Co-occurrence features       0.9923  0.9846  0.6233  0.9539
Markov random fields         0.4777  0.3361  0.1601  0.0009
Gabor filters                0.9829  0.8526  0.2717  0.5526
Concatenation                0.9991  0.9987  0.4769  0.9706

Moreover, in most cases the test accuracy is improved or maintained, with a remarkable reduction in the number of features, when FS is used (see Table IV). The effectiveness of FS on TFLL classification is thus demonstrated.

Evaluating the performance of the texture analysis methods presented in this paper is a multi-objective problem defined in terms of accuracy, robustness, and feature extraction time. Butterworth filters, the discrete wavelet transform, and Gabor filters obtain competitive classification accuracies in short spans of time (see Tables IV and VI). However, these methods are very sensitive to noisy data (see Table V), which makes them inappropriate for practical applications. On the other hand, co-occurrence features present competitive results in classification accuracy and generalization (see Tables IV and V). However, the time the method takes to extract its features is an impediment (see Table VI). The concatenation of all methods improves on these results, but at the expense of an even longer feature extraction time.

Table VII shows the TOPSIS values obtained for every method when the weights of the criteria are set equally. Note that the larger the value, the better the method. The top three methods are marked with an asterisk. As can be seen, the methods with the best balance among classification accuracy, robustness to noise, and feature extraction time are ranked in the highest positions. In particular, Gabor filters with no FS, the discrete wavelet transform with the INTERACT filter, and the concatenation of all methods with the INTERACT filter occupy the top three positions of the ranking. By contrast, methods with good performance in accuracy and robustness but very long feature extraction times are penalized, e.g., co-occurrence features or the concatenation of all methods, with no feature selection in both cases.

A more detailed look at the results contained in Tables IV, V, and VI reveals that the concatenation of all methods, both with CFS filtering and without FS, obtains the best results in terms of accuracy and robustness. These two configurations are on the Pareto front [46] of accuracy versus robustness (see Fig. 5). In multi-objective optimization, the Pareto front is defined as the border between the region of feasible points (not strictly dominated by any other), for which all constraints are satisfied, and the region of unfeasible points (dominated by others). In this case, solutions are constrained to maximize accuracy and robustness. The suitability of these two solutions to the problem in question is also corroborated by TOPSIS. Table VIII shows the TOPSIS values when only accuracy and robustness are considered (note that the third term in the weight vector corresponds to the feature extraction time).

The concatenation of all methods without FS and with CFS filtering rank first and second, respectively. However, the time for extracting the features must be shortened for practical applications. With this aim, a case study with a deeper analysis of FS is presented in the next section. Notice that the number of features in the concatenation of all methods without FS is too large (2052 features) to be optimized by hand. For this reason, we will focus on the concatenation of all methods with CFS (56 features).

E. The Concatenation of All Methods With CFS: A Case Study

When using FS, features are selected according to criteria that depend on the method. Filters remove features based on redundancy and relevance, but they do not take into account the cost of obtaining them. Note that the cost of obtaining a feature depends on the procedures required to extract it; each feature thus has an associated cost that can be related to financial cost, physical risk, or computational demands. This is the case for co-occurrence features and, consequently, for the concatenation of all methods. In co-occurrence features, the cost of obtaining the 588 features is not homogeneous. Features are vectorized in groups of 28, related to the distances and the channels in the color space. Each group of 28 features corresponds to the mean and range of 14 statistics across the co-occurrence matrices (see Table II).


TABLE IX
CO-OCCURRENCE FEATURES SELECTED BY CFS OVER THE CONCATENATION OF ALL METHODS (FEATURES CORRESPONDING TO THE 14TH STATISTIC ARE MARKED WITH AN ASTERISK)

Distance   Component L                Component a   Component b
1          –                          29, 50        66
2          98*                        121, 133      –
3          193                        –             230
4          267, 268, 275, 276, 277    –             321
5          350*, 359                  –             –
6          434*, 443, 446             –             492, 502
7          518*                       546*          576

If we focus on co-occurrence features when using CFS, the number of features was reduced by 95.41% (from 588 to 27), but the processing time was not reduced in the same proportion: it is now 27.01 s instead of 102.18 s (a reduction of 73.57%). This clearly shows that some of the 588 features take longer to compute than others. Some experimentation was performed on the time the method takes to compute each of the 14 statistics. The results disclosed that computing the 14th statistic, which corresponds to the maximal correlation coefficient [23], takes around 96% of the total time, so the time for obtaining a single matrix is negligible compared to the time for computing this statistic. Therefore, the key to reducing the feature extraction time is to reduce the number of 14th statistics in the selection.

In the case of the concatenation of all methods with CFS, the filter selects 56 features (see Table III), distributed as follows: 17 features from Butterworth filters, one from the discrete wavelet transform, 24 from co-occurrence features, one from Markov random fields, and 13 from Gabor filters. Five of the features selected from co-occurrence features correspond to the 14th statistic (see Table IX).

In co-occurrence features, the cost of obtaining the statistics also depends on the distance and the component in the color space. On the one hand, the longer the distance, the larger the number of matrices to compute (and, thus, the higher the processing time). On the other hand, the color differences have little contrast, so the colorimetric components of the Lab color space take a narrow range of values; as a consequence, the matrices for components a and b have smaller dimensions than the matrices for component L. As expected, the smaller the dimension, the shorter the time to compute a statistic. Computing the five 14th statistics at the different distances and components takes: 3.12 s (feature 98), 8.23 s (feature 350), 9.61 s (feature 434), 11.49 s (feature 518), and 4.81 s (feature 546). As can be seen, avoiding the computation of some of them will save a significant amount of time.

The aim here is to explore the impact of removing some of the five 14th statistics selected by CFS in terms of accuracy, robustness, and time. Since there are five features within the 14th statistic, only 2^5 = 32 different configurations need to be explored, so a brute-force empirical evaluation is feasible (a sketch of this enumeration follows). Table X shows the performance of the different configurations in terms of accuracy, robustness, and time. Each configuration corresponds to the features selected by CFS after removing some 14th statistics. For simplicity, only the acceptable results are shown.
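The enumeration itself is trivial, as the sketch below shows; the evaluation of each configuration (retraining the SVM and measuring accuracy, robustness, and time) is omitted here.

```python
# All 2^5 = 32 subsets of the five 14th-statistic features; each subset
# would be removed from the CFS selection and the SVM re-evaluated.
from itertools import product

candidates = [98, 350, 434, 518, 546]
configurations = [[f for f, drop in zip(candidates, mask) if drop]
                  for mask in product([False, True], repeat=len(candidates))]
print(len(configurations))   # 32 configurations to evaluate by brute force
```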

TABLE X
PERFORMANCE MEASURES FOR THE CONCATENATION OF ALL METHODS WITH CFS WHEN SOME OF THE FIVE 14TH STATISTICS ARE NOT SELECTED

Features removed             Acc (%)   Rob (%)   Time (s)
{}, baseline performance     96.19     93.84     37.04
{98, 434}*                   97.14     94.09     24.31
{98, 434, 546}               97.14     93.84     19.83
{98, 350, 518, 546}          97.14     93.60      9.72
{98, 434, 518, 546}          97.14     92.86      8.34
{98, 350, 434, 518, 546}     97.14     92.61      0.11

* Best accuracy and robustness.

A solution is considered unacceptable if it obtains lower accuracy and robustness in a longer span of time than another. In terms of accuracy and robustness to noisy data, the best result is obtained when removing the features {98, 434} (97.14% and 94.09%, respectively), but at the expense of a rather long lapse of time (24.31 s). Note that this result even improves on the baseline performance. In the remaining configurations, the classification accuracy is maintained whilst the feature extraction time is reduced, at the expense of only a slight deterioration in robustness to noise (less than 2%).

It is also important to remark on the effectiveness of the CFS filter in selecting the most appropriate features. If we do not apply feature selection and simply remove the 14th statistics from the 588 co-occurrence features in the concatenation of all methods, both the accuracy and the robustness are 92.86%. That is, the accuracy is worse than the results shown in Table X and the robustness is not significantly different. As expected, the time is also longer: 14.74 s.

V. CONCLUSION

The time required by existing approaches to TFLL classification prevents their clinical use because they cannot work in real time. In this paper, a methodology for improving this classification problem was proposed. This methodology includes the application of FS and MCDM methods which, to the authors' knowledge, had not yet been explored in this field. Three of the most popular FS methods were used: CFS, the consistency-based filter, and INTERACT. They were tested on five popular texture analysis methods: Butterworth filters, the discrete wavelet transform, co-occurrence features, Markov random fields, and Gabor filters; and on the concatenation of all these methods. In order to decide on the best combination of techniques, the MCDM method TOPSIS was applied.

The results obtained with this methodology surpass previous results in terms of processing time whilst maintaining accuracy and robustness to noise. In clinical terms, the manual process done by experts can be automated with the benefits of being faster and unaffected by subjective factors, with maximum accuracy over 97% and processing time under 1 s. The clinical significance of these results should be highlighted, as the agreement between subjective observers is between 91% and 100% according to [13].

In this paper, the proposed methodology was later optimized ad hoc with regard to the different costs of computing the features. This process was feasible because of the low number of features, but it would become unfeasible as the number of features


increases. Therefore, as future research, we plan to develop an FS method that automatically takes the costs of features into account. Regarding image processing, we would like to enlarge our database by including the amorphous category. Finally, as this is a general methodology, we plan to use it to solve other problems.

ACKNOWLEDGMENT

The authors would like to thank the Facultad de Óptica y Optometría of the Universidade de Santiago de Compostela for providing the annotated datasets.

REFERENCES

[1] S. Pflugfelder, S. Tseng, O. Sanabria, H. Kell, C. Garcia, C. Felix, W. Feuer, and B. Reis, "Evaluation of subjective assessments and objective diagnostic tests for diagnosing tear-film disorders known to cause ocular irritation," Cornea, vol. 17, no. 1, pp. 38–56, 1998.
[2] G. Rieger, "The importance of the precorneal tear film for the quality of optical imaging," Br. J. Ophthalmol., vol. 76, no. 3, pp. 157–158, 1992.
[3] K. Nichols, J. Nichols, and G. Mitchell, "The lack of association between signs and symptoms in patients with dry eye disease," Cornea, vol. 23, no. 8, pp. 762–770, 2004.
[4] A. Bron, J. Tiffany, S. Gouveia, N. Yokoi, and L. Voon, "Functional aspects of the tear film lipid layer," Exp. Eye Res., vol. 78, no. 3, pp. 347–360, 2004.
[5] M. Rolando, M. Iester, A. Macrí, and G. Calabria, "Low spatial-contrast sensitivity in dry eyes," Cornea, vol. 17, no. 4, pp. 376–379, 1998.
[6] M. Rolando, M. Refojo, and K. Kenyon, "Increased tear evaporation in eyes with keratoconjunctivitis sicca," Arch. Ophthalmol., vol. 101, no. 4, pp. 557–558, 1983.
[7] M. Lemp, "Report of the National Eye Institute/Industry workshop on clinical trials in dry eyes," CLAO J., vol. 21, no. 4, pp. 221–232, 1995.
[8] M. Lemp, C. Baudouin, J. Baum, M. Dogru, G. Foulks, S. Kinoshita, P. Laibson, J. McCulley, J. Murube, S. Pflugfelder, M. Rolando, and I. Toda, "The definition and classification of dry eye disease: Report of the definition and classification subcommittee of the International Dry Eye Workshop (2007)," Ocular Surf., vol. 5, no. 2, pp. 75–92, 2007.
[9] G. Foulks, "The correlation between the tear film lipid layer and dry eye disease," Surv. Ophthalmol., vol. 52, pp. 369–374, 2007.
[10] J. Guillon, "Non-invasive Tearscope Plus routine for contact lens fitting," Cont. Lens Anterior Eye, vol. 21, Suppl. 1, pp. 31–40, 1998.
[11] M. Rolando, C. Valente, and S. Barabino, "New test to quantify lipid layer behavior in healthy subjects and patients with keratoconjunctivitis sicca," Cornea, vol. 27, no. 8, pp. 866–870, 2008.
[12] J. Nichols, K. Nichols, B. Puent, M. Saracino, and G. Mitchell, "Evaluation of tear film interference patterns and measures of tear break-up time," Optom. Vis. Sci., vol. 79, no. 6, pp. 363–369, 2002.
[13] C. García-Resúa, M. Giráldez-Fernández, M. Penedo, D. Calvo, M. Penas, and E. Yebra-Pimentel, "New software application for clarifying tear film lipid layer patterns," Cornea, vol. 32, no. 4, pp. 538–546, 2013.
[14] P. King-Smith, B. Fink, and N. Fogt, "Three interferometric methods for measuring the thickness of layers of the tear film," Optom. Vis. Sci., vol. 76, pp. 19–32, 1999.
[15] E. Goto, Y. Yagi, M. Kaido, Y. Matsumoto, K. Konomi, and K. Tsubota, "Improved functional visual acuity after punctal occlusion in dry eye patients," Am. J. Ophthalmol., vol. 135, no. 5, pp. 704–705, 2003.
[16] L. Ramos, M. Penas, B. Remeseiro, A. Mosquera, N. Barreira, and E. Yebra-Pimentel, "Texture and color analysis for the automatic classification of the eye lipid layer," in Proc. Int. Work-Conf. Artif. Neural Netw. (LNCS: Adv. Comput. Intell.), vol. 6692, 2011, pp. 66–73.
[17] B. Remeseiro, L. Ramos, M. Penas, E. Martínez, M. Penedo, and A. Mosquera, "Color texture analysis for classifying the tear film lipid layer: A comparative study," in Proc. Int. Conf. Digital Image Comput.: Techn. Appl., Noosa, Australia, Dec. 2011, pp. 268–273.
[18] I. Guyon, S. Gunn, M. Nikravesh, and L. Zadeh, Feature Extraction: Foundations and Applications. New York, NY, USA: Springer-Verlag, 2006.
[19] D. Calvo, A. Mosquera, M. Penas, C. García-Resúa, and B. Remeseiro, "Color texture analysis for tear film classification: A preliminary study," in Proc. Int. Conf. Image Anal. Recognit. (Lecture Notes Comput. Sci.), 2010, vol. 6112, pp. 388–397.
[20] K. McLaren, "The development of the CIE 1976 (L*a*b*) uniform colour-space and colour-difference formula," J. Soc. Dyers Colourists, vol. 92, no. 9, pp. 338–341, 1976.
[21] R. Gonzalez and R. Woods, Digital Image Processing. Englewood Cliffs, NJ, USA: Pearson/Prentice-Hall, 2008.
[22] S. G. Mallat, "A theory for multiresolution signal decomposition: The wavelet representation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 11, no. 7, pp. 674–693, Jul. 1989.
[23] R. M. Haralick, K. Shanmugam, and I. Dinstein, "Texture features for image classification," IEEE Trans. Syst., Man, Cybern., vol. 3, no. 6, pp. 610–621, Nov. 1973.
[24] D. A. Clausi and M. E. Jernigan, "A fast method to determine co-occurrence texture features," IEEE Trans. Geosci. Remote Sens., vol. 36, no. 1, pp. 298–300, Jan. 1998.
[25] J. Woods, "Two-dimensional discrete Markovian fields," IEEE Trans. Inf. Theory, vol. 18, no. 2, pp. 232–240, Mar. 1972.
[26] D. Gabor, "Theory of communication," J. Inst. Electr. Eng., vol. 93, pp. 429–457, 1946.
[27] J. Loughrey and P. Cunningham, "Overfitting in wrapper-based feature subset selection: The harder you try the worse it gets," Res. Develop. Intell. Syst. XXI, pp. 33–43, 2005.
[28] M. Dong and R. Kothari, "Feature subset selection using a new definition of classifiability," Pattern Recognit. Lett., vol. 24, no. 9, pp. 1215–1225, 2003.
[29] V. Bolón-Canedo, N. Sánchez-Maroño, and A. Alonso-Betanzos, "On the behavior of feature selection methods dealing with noise and relevance over synthetic scenarios," in Proc. IEEE Int. Joint Conf. Neural Netw., 2011, pp. 1530–1537.
[30] V. Bolón-Canedo, N. Sánchez-Maroño, and A. Alonso-Betanzos, "A review of feature selection methods on synthetic data," Knowl. Inf. Syst., vol. 34, no. 3, pp. 483–519, 2013.
[31] M. Hall, "Correlation-based feature selection for machine learning," Ph.D. dissertation, Dept. Comput. Sci., Univ. Waikato, Hamilton, New Zealand, 1999.
[32] M. Dash and H. Liu, "Consistency-based search in feature selection," Artif. Intell., vol. 151, no. 1–2, pp. 155–176, 2003.
[33] Z. Zhao and H. Liu, "Searching for interacting features," in Proc. 20th Int. Joint Conf. Artif. Intell., San Mateo, CA, USA: Morgan Kaufmann, 2007, pp. 1156–1161.
[34] C. J. C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining Knowl. Discov., vol. 2, pp. 121–167, 1998.
[35] M. Zeleny, Multiple Criteria Decision Making, vol. 25. New York, NY, USA: McGraw-Hill, 1982.
[36] C. Hwang and K. Yoon, Multiple Attribute Decision Making: Methods and Applications: A State-of-the-Art Survey, vol. 13. New York, NY, USA: Springer-Verlag, 1981.
[37] D. Olson, "Comparison of weights in TOPSIS models," Math. Comput. Modell., vol. 40, no. 7–8, pp. 721–727, 2004.
[38] Tearscope Plus Clinical Hand Book and Tearscope Plus Instructions. Keeler Ltd., Windsor, Berkshire, U.K.; Keeler Inc., Broomall, PA, USA, 1997.
[39] "Topcon SL-D4 slit lamp," Topcon Medical Systems, Oakland, NJ, USA.
[40] "Topcon DV-3 digital video camera," Topcon Medical Systems, Oakland, NJ, USA.
[41] "Topcon IMAGEnet i-base," Topcon Medical Systems, Oakland, NJ, USA.
[42] "VOPTICAL_I1. VARPA optical dataset annotated by optometrists from the Faculty of Optics and Optometry, University of Santiago de Compostela (Spain)," [Online]. Available: http://www.varpa.es/voptical_I1.html, last accessed: December 2013.
[43] "VOPTICAL_Is. VARPA optical dataset annotated by optometrists from the Faculty of Optics and Optometry, University of Santiago de Compostela (Spain)," [Online]. Available: http://www.varpa.es/voptical_Is.html, last accessed: December 2013.
[44] E. Çesmeli and D. Wang, "Texture segmentation using Gaussian-Markov random fields and neural oscillator networks," IEEE Trans. Neural Netw., vol. 12, Mar. 2001.
[45] B. Remeseiro, M. Penas, A. Mosquera, J. Novo, M. Penedo, and E. Yebra-Pimentel, "Statistical comparison of classifiers applied to the interferential tear film lipid layer automatic classification," Comput. Math. Methods Med., vol. 2012, 2012.
[46] J. Teich, "Pareto-front exploration with uncertain objectives," in Evolut. Multi-Criter. Optim. New York, NY, USA: Springer-Verlag, 2001, pp. 314–328.

Authors' photographs and biographies not available at the time of publication.
