
Saliency based Ulcer Detection for Wireless Capsule Endoscopy Diagnosis

Yixuan Yuan¹, Student Member, IEEE, Jiaole Wang¹, Student Member, IEEE, Baopu Li², Member, IEEE, and Max Q.-H. Meng¹,∗, Fellow, IEEE

Abstract—Ulcer is one of the most common symptoms of many serious diseases in the human digestive tract. Especially for ulcers in the small bowel, where other procedures cannot adequately visualize, wireless capsule endoscopy (WCE) is increasingly being used in diagnosis and clinical management. Because WCE generates a large number of images over the whole inspection process, computer-aided detection of ulcers is considered an indispensable relief for clinicians. In this paper, a two-stage fully automated computer-aided detection system is proposed to detect ulcers in WCE images. In the first stage, we propose an effective saliency detection method based on a multi-level superpixel representation to outline the ulcer candidates. To find the perceptually and semantically meaningful salient regions, we first segment the image at multiple superpixel levels, where each level corresponds to a different initial superpixel size. We then evaluate the saliency of each superpixel region at each level according to its color and texture features, and finally fuse the saliency maps of all levels to obtain the final saliency map. In the second stage, we apply the obtained saliency map to better encode the image features for the ulcer image recognition task. Because the ulcer mainly corresponds to the salient region, we propose a saliency based max-pooling method integrated with the Locality-constrained Linear Coding (LLC) method to characterize the images. Experimental results achieve a promising 92.65% accuracy and 94.12% sensitivity, validating the effectiveness of the proposed method. Moreover, the comparison results show that our detection system outperforms the state-of-the-art methods on the ulcer classification task.

Index Terms—Multi-level superpixel representation; Saliency; Saliency based max-pooling method; LLC

I. INTRODUCTION

ULCER is one of the most common lesions of the gastrointestinal (GI) tract, affecting approximately 10% of the people in the world [1]. It is a chronic inflammatory sore or erosion on the internal mucous membranes. Helicobacter pylori bacteria and non-steroidal anti-inflammatory drugs (NSAIDs) are considered to be two important causes of ulcers in the digestive tract. An ulcer itself is not lethal; however, it is a symptom of some serious diseases, such as Crohn's disease and ulcerative colitis, whose complications may cause death [2].

This work is supported by RGC GRF #415613 awarded to Max Q.-H. Meng and partially by the National Natural Science Foundation of China (61305099) awarded to Baopu Li. ¹Yixuan Yuan, Jiaole Wang, and Max Q.-H. Meng are with the Department of Electronic Engineering, The Chinese University of Hong Kong, N.T., Hong Kong SAR, China ({yxyuan,jlwang,max}@ee.cuhk.edu.hk). ²Baopu Li is with the Department of Biomedical Engineering, Shenzhen University, Shenzhen, China ([email protected]). ∗Corresponding author.

Fig. 1. Wireless capsule endoscopy and one captured image. (a) A typical WCE device. (b) An example image of an ulcer.

The traditional imaging techniques for ulcers are push and sonde enteroscopy, in which the endoscope is inserted through the patient's mouth or anus by experienced physicians to view the GI tract [3]. Although these techniques play an important role in viewing the upper and lower ends of the GI tract, they are highly invasive to the patients. Moreover, it is technically difficult for the traditional procedures to gain full access to the small intestine [4], [5]. Iddan et al. [6] introduced the revolutionary wireless capsule endoscopy (WCE) as an alternative that provides direct, painless, and noninvasive inspection of the small bowel. A commercially available WCE usually consists of an optical dome, an illumination part, an imaging sensor, batteries, and a radio frequency transmitter, as shown in Fig. 1(a). After a WCE is swallowed by a patient, it is pushed by peristalsis and slowly travels through the small intestine. During the roughly 8 hours inside the patient's GI tract, the WCE takes 2-4 images per second. These images are compressed and transmitted wirelessly to a data-recording device attached to the patient's waist. All the images can then be downloaded and examined off-line by doctors to make diagnostic decisions [7]. An example ulcer image captured by a WCE during inspection is shown in Fig. 1(b).

Although the WCE has shown significant advantages over the traditional endoscopies for inspecting ulcer nidi in the small intestine, there are new challenges associated with this technology. A WCE creates 55,000 images for each patient, and the captured abnormal images occupy only 5% of the whole collection, so it is tedious for clinicians to go through all these images manually, frame by frame, to locate the abnormal ones. Therefore, it is crucial to design an automatic computer-aided system to assist the clinicians in analyzing the ulcer images.

A. Related Work

Many efforts have been devoted to developing automated computer-aided detection (CAD) for ulcer screening using WCE.


Li and Meng [8] put forward a texture feature combining the merits of the discrete curvelet transform (DCT) and the local binary pattern (LBP), which leads to better descriptions of textures with multi-directional characteristics. Charisis et al. [9] presented a method that first processes the whole WCE image with a color rotation (CR) operation and then extracts the uniform rotation-invariant LBP (ULBP) feature. Eid et al. [10] proposed a curvelet-based lacunarity texture extraction method (DCT-LAC), in which the textural information is acquired by calculating the lacunarity index of the DCT sub-bands of the WCE images. Yu et al. [11] introduced an improved bag of words model to detect the ulcer region using a spatial pyramid kernel and feature fusion techniques. Karargyris and Bourbakis [12] proposed to apply a log-Gabor filter bank to segment the hue component in the HSV color space of the WCE images to obtain the ulcer candidates; the RGB values and texture information are then extracted to classify the ulcer images from the normal ones.

The conventional ulcer classification methods have some limitations. The image features in [8]–[10] were all extracted from the whole WCE image, which may be ineffective for describing the specific ulcer information: features extracted from the non-ulcer region may bring noisy and redundant information into the classification. To handle these problems, Karargyris and Bourbakis [12] outlined the ulcer candidates and extracted the features directly from the ulcer. However, they used traditional segmentation methods, which cannot segment the ulcer region correctly and compactly because the ulcer shows a vague boundary. The saliency approach, which extracts the most important region in an image, has recently attracted increasing attention and achieved promising segmentation results [13]–[20]. Therefore, we are motivated to first outline the ulcer region in WCE images through saliency detection, and then apply the obtained saliency map to better encode the image features for the ulcer image classification task.

Abnormal image classification is a fundamental problem in automatic medical image diagnosis, and it relies greatly on the image representation. The bag of features (BoF) model [21], which demonstrates promising performance in many natural image applications, has been adopted in this area [22]–[25]. The BoF method regards a single image as an order-less collection of local key point features and represents images as histograms of visual words. In the BoF model, each key point feature of an image is assigned to the nearest codeword by hard assignment, which causes severe information loss when the feature lies near the boundary of several codewords. As an improvement over the BoF method, Wang et al. [26] introduced the Locality-constrained Linear Coding (LLC) method, which codes the image features while preserving locality and sparsity. The LLC method utilizes locality constraints to project each descriptor onto its local coordinate system, and the projected descriptors are integrated by max pooling to generate the final representation.

B. The Proposed Methods and Original Contributions

In this paper, we propose a two-stage fully automated computer-aided detection system to detect ulcers from WCE images. In the first stage, owing to the drawbacks of traditional segmentation methods [12], we focus on the automatic estimation of salient regions across the WCE images. Classical saliency extraction approaches often calculate a pixel-based saliency [14], [18]–[20], ignoring the neighborhood and boundary information of the object; moreover, human attention is drawn more by objects than by isolated pixels. As an alternative, we propose to adopt a superpixel representation [27], [28] for ulcer saliency detection in WCE images. A superpixel is a group of pixels under some restriction of local image features such as color, intensity, or texture. It preserves most of the image structure and greatly reduces the complexity of image processing. Furthermore, a single-scale superpixel segmentation may not represent the accurate contour of objects [29], so we propose a saliency calculation method based on a multi-level superpixel representation of the images. Since the ulcer shows significant color and texture information on the WCE mucosa surface, as shown in Fig. 1(b), we analyze these characteristics to evaluate the saliency value of each superpixel and obtain a saliency map for each level. The final saliency map is calculated by a fusion strategy that integrates the saliency maps from all levels.

In the second stage, we employ the obtained saliency maps for the ulcer classification task. Inspired by the promising results of the bag-of-words (BoW) or BoF model and its variants [21], [26], [30], we propose to classify the ulcer images by coding WCE images with a modified Locality-constrained Linear Coding (LLC) method. The proposed modified LLC method integrates the original LLC method [26] with a saliency based max pooling to emphasize the salient region for ulcer classification.

The main contributions of this paper are summarized as follows:
1) Instead of extracting features from whole WCE images [8]–[10], we propose a saliency map estimation method to outline the ulcer first and then extract the corresponding features to better encode WCE images. The proposed saliency method is based on a multi-level superpixel color and texture representation.
2) Different from existing image coding methods [11], [22], [23], a saliency based max pooling method integrated with the original LLC method is proposed to carry out the ulcer frame classification task.

The rest of the paper is organized as follows: Section II introduces details of the proposed ulcer saliency estimation method. Section III gives details of the saliency based image coding by the proposed modified LLC method. The experimental results, including ulcer saliency map extraction and classification, are presented in Section IV. We further discuss the results and insights in Section V, and draw conclusions at the end of the paper.


Fig. 2. Illustration of multi-level superpixel saliency extraction method. (a) WCE images as input. (b) Multi-level SLIC superpixel segmentation. (c) Saliency map estimation based on texture contrast. (c’) Saliency map estimation based on color contrast. (d) Saliency map fusion across multiple levels for both texture and color saliency. (e) Final saliency map obtained by the fused texture and color saliency maps.

II. SALIENT REGION DETECTION

Salient regions are usually defined as regions that explicitly present the main meaningful or semantic content of an image. For complex images with multiple objects or cluttered scenes, however, there is no uniform metric that describes saliency. In this study, as shown in Fig. 1(b), WCE images with ulcer lesions can be explicitly distinguished by color and texture contrasts. By exploiting these characteristics, a salient region detection method can segment the salient foreground region from WCE images, which makes it especially suitable as a first step of the ulcer recognition problem. Much research effort has been devoted to designing saliency models for natural images [13]–[20], while saliency estimation for medical WCE videos has not been developed yet. In this paper, we propose a framework to detect ulcer abnormalities through visual saliency estimation based on texture and color contrasts. The proposed saliency detection method consists of three main steps. The first step is multi-level segmentation, which decomposes the input image into multiple superpixels from a coarse level to a fine one. Then, we perform the regional saliency estimation step using the color and texture contrasts at each superpixel level. In the end, the final saliency map is obtained by fusing the multi-level saliency maps together. The flowchart of the construction of the integrated saliency map is shown in Fig. 2.

A. Superpixel Segmentation

A superpixel is defined as a meaningful entity obtained by grouping spatially neighboring pixels with similar properties. Simple Linear Iterative Clustering (SLIC) [31] is a state-of-the-art superpixel algorithm that outputs a desired number of regular, compact superpixels with low computational overhead. We propose to apply SLIC superpixels as a pre-processing step for WCE image saliency detection, because SLIC not only provides good segmentation results but also generates superpixels of a suitable size for WCE image analysis. In the SLIC method, local K-means clustering is first performed on the pixels based on color and spatial distances. Then the isolated small clusters are merged with the largest neighboring clusters to obtain the specified number of superpixels.

Fig. 3. Illustration of the mask and the superpixel segmentation with different numbers of superpixels. (a) Image mask that outlines the useful region of a WCE image. (b) A WCE image segmented into 50 superpixels; the white lines are the boundaries of the superpixels. (c) The same image segmented into 250 superpixels.

Each segmented superpixel is used as a processing unit in the proposed saliency model.

Choosing a suitable number of superpixels for a WCE image is empirical and case-specific: too many superpixels lead to over-segmentation, while too few result in loss of the boundary information of the objects. In addition, segmenting with a single superpixel size may fail to describe the boundary well in some cases. Therefore, we propose a multi-level superpixel method that first segments the image using multiple different numbers of superpixels (i.e., multiple levels of superpixels), then fuses the segmentations of all levels, as sketched in the code below. The number of superpixels K tested in this paper is set to 50, 100, 150, 200, and 250 at the respective levels, which results in a level number of L = 5. Figs. 3(b) and 3(c) show an example superpixel representation of a WCE image with 50 and 250 superpixels. Additionally, WCE images are often obscured by a large black background and obvious borders, as shown in Fig. 1(b). To address this, we consider only the superpixels inside the mask outlined in Fig. 3(a).
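For concreteness, the following is a minimal sketch of the multi-level segmentation step using scikit-image's SLIC implementation. It is illustrative rather than the authors' code: the `compactness` value is a library default, and the `mask` argument requires a recent scikit-image release.

```python
# Multi-level SLIC segmentation of a masked WCE image (illustrative sketch).
import numpy as np
from skimage.segmentation import slic

def multilevel_superpixels(image, mask, levels=(50, 100, 150, 200, 250)):
    """Return one SLIC label map per superpixel level.

    image : (M, N, 3) RGB WCE image
    mask  : (M, N) bool array, True inside the informative circular region
    """
    label_maps = []
    for n_segments in levels:
        # pixels outside the mask receive label 0 and are ignored downstream
        labels = slic(image, n_segments=n_segments, compactness=10,
                      mask=mask, start_label=1)
        label_maps.append(labels)
    return label_maps
```

Each returned label map plays the role of one level l in the saliency computation that follows.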


Fig. 4. An illustration of the LM filter bank that is used to extract texture contrast in the WCE images.

B. Saliency Region Detection based on Texture

In this paper, texture features of superpixel regions are extracted by using the Leung-Malik (LM) filter bank [32]. The LM filter bank is a multi-scale, multi-orientation combination of 48 filters. It consists of first and second derivatives of Gaussians at 6 orientations and 3 scales, 8 Laplacian of Gaussian (LOG) filters, and 4 Gaussians. The filter bank is shown in Fig. 4. Given an image $I_{M \times N}$, the response $R_m \in \mathbb{R}^{M \times N}$ to the $m$th filter $\omega_m$ of the 48 filters in the LM filter bank can be calculated by

$$R_m(x, y) = \sum_{s}\sum_{t} \omega_m(s, t)\, I(x+s,\, y+t), \qquad (1)$$

where $(\cdot\,, \cdot)$ stands for the matrix entry of the corresponding matrix. Then, the texture feature matrix $^{\mathrm{texture}}F_l \in \mathbb{R}^{96 \times K}$ for the $l$th level with $K$ superpixel regions is defined using the mean $\mu$ and the variance $\sigma^2$ of the filter responses in all the regions. The texture feature vector of the $i$th superpixel at the $l$th level is given as the $i$th column of the matrix $^{\mathrm{texture}}F_l$,

$$^{\mathrm{texture}}F_l(i) = [\,_i\mu_1,\, _i\sigma_1^2,\, \ldots,\, _i\mu_{48},\, _i\sigma_{48}^2]^T, \qquad (2)$$

where $^{\mathrm{texture}}F_l(i)$ stands for the $i$th column of the texture feature matrix at the $l$th superpixel level and $l = 1, \ldots, L$. The texture saliency $^{\mathrm{texture}}\hat{S}_l \in \mathbb{R}^{M \times N}$ at the $l$th level for the given image is defined as

$$^{\mathrm{texture}}\hat{S}_l(x, y) = \sum_{j=1,\, j \neq i}^{K} \frac{D(i, j)}{\max_{i,j}\big(D(i, j)\big)}, \qquad (3)$$

where $D(i, j)$ is the $(i, j)$ entry of the distance matrix $D$ that represents the Euclidean distance between the texture feature vectors $^{\mathrm{texture}}F_l(i)$ and $^{\mathrm{texture}}F_l(j)$ of the $i$th and $j$th regions, and $\max_{i,j}(D(i, j))$ takes the maximum entry of the matrix. The entry $(x, y)$ of the texture saliency matrix $^{\mathrm{texture}}\hat{S}_l$ corresponds to a pixel within the $i$th superpixel region, and the texture saliency value is the same for every pixel within the same superpixel region. Eq. (3) indicates that superpixels whose textures differ from those of most other regions in the image are assigned a higher texture saliency. The final texture saliency $^{\mathrm{texture}}S$ is obtained by fusing the saliency maps of the different levels,

$$^{\mathrm{texture}}S = \frac{1}{L} \sum_{l=1}^{L} {}^{\mathrm{texture}}\hat{S}_l. \qquad (4)$$
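A sketch of the per-level texture saliency of Eqs. (1)-(3) and the fusion of Eq. (4) follows. This is illustrative, not the authors' implementation: it assumes the 48 LM kernels are supplied as a list (building the bank itself is omitted) and that `labels` is one SLIC label map with labels 1..K (0 outside the mask).

```python
# Per-level texture saliency from LM filter responses (illustrative sketch).
import numpy as np
from scipy.ndimage import convolve
from scipy.spatial.distance import cdist

def texture_saliency_level(gray, labels, filters):
    K = labels.max()
    # Eq. (1): filter responses, one (M, N) map per LM kernel
    responses = np.stack([convolve(gray, f) for f in filters])
    # Eq. (2): 96-D feature = mean and variance of each response per region
    feats = np.zeros((K, 2 * len(filters)))
    for i in range(1, K + 1):
        region = labels == i
        feats[i - 1, 0::2] = responses[:, region].mean(axis=1)
        feats[i - 1, 1::2] = responses[:, region].var(axis=1)
    D = cdist(feats, feats)                     # pairwise Euclidean distances
    region_sal = D.sum(axis=1) / (D.max() + 1e-12)   # Eq. (3); D(i, i) = 0
    sal = np.zeros_like(gray, dtype=float)
    for i in range(1, K + 1):
        sal[labels == i] = region_sal[i - 1]    # same value within a region
    return sal

# Eq. (4): average the per-level maps, e.g.
# texture_S = np.mean([texture_saliency_level(gray, lab, filters)
#                      for lab in label_maps], axis=0)
```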

C. Saliency Region Detection based on Color

Intuitively, the ulcer in a WCE image shows different color information compared with the normal mucosa. However, it is not obvious which color component contains the most useful information to express the abnormality of an ulcer.

Fig. 5. WCE images with ulcers and the corresponding images in different color spaces. (a) Original WCE images. (b, c) Corresponding images of the second components of the HSV and CMYK color spaces, respectively.

We take a trial-and-error approach and inspect the ulcer images under different color components of various color spaces such as RGB, CIELAB, CIEXYZ, YUV, YIQ, CMYK, HSV, and HSI. As shown in Fig. 5, the second components of the transformed WCE images in the HSV and CMYK color spaces (the S and M planes) highlight the ulcer regions and separate the ulcerous mucosa tissue from the uninformative parts. Therefore, we extract the saliency regions for all the WCE images in these two color planes. The color feature matrix $^{\mathrm{color}}F_l \in \mathbb{R}^{4 \times K}$ for the $l$th level with $K$ superpixel regions is defined using the mean and variance values in the S and M color planes. The $i$th column of the matrix represents the feature vector of the $i$th superpixel region at the $l$th level,

$$^{\mathrm{color}}F_l(i) = [\,_i\mu_S,\, _i\sigma_S^2,\, _i\mu_M,\, _i\sigma_M^2]^T, \qquad (5)$$

where $_i\mu_\bullet$ and $_i\sigma_\bullet^2$ are the mean and variance of the $i$th superpixel in the S and M color planes. Similar to the texture saliency, the color saliency $^{\mathrm{color}}\hat{S}_l \in \mathbb{R}^{M \times N}$ at level $l$ can be obtained by the following equation:

$$^{\mathrm{color}}\hat{S}_l(x, y) = \sum_{j=1,\, j \neq i}^{K} \frac{D(i, j)}{\max_{i,j}\big(D(i, j)\big)}, \qquad (6)$$

where $D(i, j)$ is the $(i, j)$ entry of the distance matrix $D$ that represents the Euclidean distance between the color feature vectors $^{\mathrm{color}}F_l(i)$ and $^{\mathrm{color}}F_l(j)$, and $\max_{i,j}(D(i, j))$ takes the maximum entry of the matrix. The entry $(x, y)$ of the color saliency matrix $^{\mathrm{color}}\hat{S}_l$ corresponds to a pixel within the $i$th superpixel region, and the color saliency value is the same for every pixel within the same superpixel region. Eq. (6) assigns higher saliency to a region with a more distinct color distribution, and thus promotes the color saliency values of the regions that contain abnormalities in WCE images. We calculate the final saliency $^{\mathrm{color}}S$ based on the mean value of


the different levels,

$$^{\mathrm{color}}S = \frac{1}{L} \sum_{l=1}^{L} {}^{\mathrm{color}}\hat{S}_l. \qquad (7)$$
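The color feature of Eq. (5) can be sketched as follows. This is an illustrative snippet: scikit-image provides the HSV conversion, but has no CMYK conversion, so the M plane is computed from the standard RGB-to-CMYK formula, which is an assumption (the paper does not specify its conversion).

```python
# Per-superpixel color features in the HSV S plane and CMYK M plane (sketch).
import numpy as np
from skimage.color import rgb2hsv

def color_features_level(rgb_uint8, labels):
    rgb = rgb_uint8.astype(float) / 255.0
    s_plane = rgb2hsv(rgb)[..., 1]                      # second HSV component
    k = 1.0 - rgb.max(axis=2)                           # CMYK key (black)
    m_plane = (1.0 - rgb[..., 1] - k) / (1.0 - k + 1e-8)  # second CMYK component
    K = labels.max()
    feats = np.zeros((K, 4))                            # Eq. (5): [mu_S, var_S, mu_M, var_M]
    for i in range(1, K + 1):
        region = labels == i
        feats[i - 1] = [s_plane[region].mean(), s_plane[region].var(),
                        m_plane[region].mean(), m_plane[region].var()]
    return feats  # the color saliency of Eq. (6) then follows exactly as in Eq. (3)
```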

D. Final Saliency Region Fusion

Based on the texture and color saliency introduced in the aforementioned sections, we derive the proposed multi-level superpixel saliency for the WCE images in a data fusion manner. Given an image $I_{M \times N}$, we have computed two saliency maps, $^{\mathrm{texture}}S$ and $^{\mathrm{color}}S$, based on the texture and color contrast information, respectively. The final saliency map can then be defined as

$$^{\mathrm{final}}S = {}^{\mathrm{color}}S \circ {}^{\mathrm{texture}}S \circ K, \qquad (8)$$

where $\circ$ stands for the matrix Hadamard product and $K \in \mathbb{R}^{M \times N}$ is a Gaussian kernel centered at the image center that gradually declines towards the edges to mimic the human attention property,

$$K(x, y) = \exp\!\left(-\frac{(x - M/2)^2 + (y - N/2)^2}{2\sigma^2}\right). \qquad (9)$$

For simplicity, we use an equal standard deviation $\sigma$ in both the horizontal and vertical directions in this paper. By applying the matrix Hadamard product, the color and texture saliency maps are fused in an arithmetic manner such that only the regions with high values in both the texture and the color maps achieve high values in the final saliency. On the contrary, if a region has a high value in only one of the saliency maps, the Hadamard product suppresses it by multiplying it with a small saliency value. In this way, the final saliency map emphasizes the regions that stand out in both color and texture contrast.

III. SALIENCY BASED IMAGE CODING

In this paper, we propose a modified LLC method that integrates the obtained saliency map into the max-pooling step for a WCE image. The modified LLC method can be summarized in three steps. In the first step, we extract the local descriptors (dense scale-invariant feature transform (dSIFT) [33], dense histogram of oriented gradients (dHOG) [34], and dense uniform local binary pattern (duniLBP) [35] descriptors) for each image separately and produce a codebook for each descriptor type using the K-means clustering method. In the second step, the original LLC method is used for feature coding. In the last step, max pooling is applied to the salient and non-salient features separately to emphasize the features in the salient region. Finally, the three pooled local features are concatenated to produce the final image representation. By feeding the final representation into a support vector machine (SVM) classifier, we can carry out the ulcer image classification task. The work-flow of the proposed method for the WCE images is shown in Fig. 6.

A. Local Descriptors and the Codebook

In order to capture the diversity of the image characteristics, we extract different descriptors of the WCE images and then combine them to represent the images. The first feature we extract is the dense SIFT (dSIFT) feature [36]. The SIFT feature [33] is recognized as one of the most robust features with respect to geometrical changes. In the traditional SIFT feature, a local descriptor is created by forming a histogram of gradient orientations and magnitudes of image pixels in a small window. In our approach, however, the dSIFT descriptors are extracted at regular image grid points rather than only at key points. The advantage of this strategy is that the dSIFT descriptor is independent of the key point detection process, which often fails due to missing texture or ill-illuminated images [37]. The dSIFT descriptor matrix $^{\mathrm{dSIFT}}X \in \mathbb{R}^{128 \times P}$ is calculated from each WCE image and represented by $^{\mathrm{dSIFT}}X = [^{\mathrm{dSIFT}}x_i]$, where $i \in [1, P]$. Each column vector $^{\mathrm{dSIFT}}x_i \in \mathbb{R}^{128}$ corresponds to the dSIFT descriptor extracted from an image patch of size 16 × 16, and $P$ is the number of image patches. Similarly, we also extract the dHOG descriptor matrix ($^{\mathrm{dHOG}}X \in \mathbb{R}^{81 \times P}$) and the duniLBP descriptor matrix ($^{\mathrm{duniLBP}}X \in \mathbb{R}^{59 \times P}$) from the same patches of each image to obtain a diverse representation.

After obtaining the descriptors, we apply the K-means algorithm to each kind of feature extracted from the training datasets to generate the visual vocabulary. The resulting cluster centers serve as a vocabulary of visual words, and we represent the three codebooks as $^{\mathrm{dSIFT}}B \in \mathbb{R}^{128 \times M}$, $^{\mathrm{dHOG}}B \in \mathbb{R}^{81 \times M}$, and $^{\mathrm{duniLBP}}B \in \mathbb{R}^{59 \times M}$ with $M$ cluster centers. Choosing a suitable codebook size is important and difficult. If the number of visual words is too small, two key points may be assigned to the same cluster although they are not similar, which decreases the discrimination ability. On the other hand, a too large vocabulary may generalize poorly and incur extra processing overhead. To address this trade-off, we vary $M$ from 10 to 100 and evaluate the corresponding classification performance.

B. Descriptor Coding by LLC

Once the three kinds of codebooks are obtained, we map each image descriptor matrix to the codebooks to obtain the image representation. The original LLC is first applied to encode the individual descriptors of each training and testing image. Without loss of generality, we take the dSIFT descriptors as an example for the rest of this section and omit the left superscripts of the different descriptors for simplicity. Let $X = [x_1, x_2, \ldots, x_P] \in \mathbb{R}^{128 \times P}$ be the set of 128-dimensional descriptor vectors of an input sample and $Z = [z_1, z_2, \ldots, z_P] \in \mathbb{R}^{M \times P}$ be the LLC code of the image. The objective function of LLC can then be specified as

$$\min_{Z}\; \sum_{i=1}^{P} \|x_i - Bz_i\|^2 + \lambda \|d_i \odot z_i\|^2 \quad \text{subject to } \mathbf{1}^T z_i = 1,\; \forall i, \qquad (10)$$


Fig. 6. Illustration of the modified LLC coding descriptor formation. (a) WCE images as input. (b) The descriptors of dSIFT, dHOG, and duniLBP are extracted independently on each image. (c) Different codebooks for the three kinds of descriptors are formed. (d) Saliency map estimated by the aforementioned method. (e) The dSIFT, dHOG, and duniLBP codes are obtained independently by the original LLC coding method. (f) Saliency based max pooling is carried out to the three kinds of codes, and the concatenated results are obtained to represent each WCE image.

where $\lambda$ is a weight parameter for adjusting the strength of the locality constraint, $\odot$ denotes element-wise multiplication, and $d_i \in \mathbb{R}^{M}$ is the locality adapter that gives different freedom to each basis vector proportional to its similarity to the input descriptor $x_i$,

$$d_i = \exp\!\left(\frac{\mathrm{dist}(x_i, B)}{\sigma}\right), \quad \mathrm{dist}(x_i, B) = [\mathrm{dist}(x_i, b_1), \cdots, \mathrm{dist}(x_i, b_M)]^T, \qquad (11)$$

where $\mathrm{dist}(x_i, b_\bullet)$ is the Euclidean distance between $x_i$ and each codeword of the codebook $B$, and $\sigma$ adjusts the weight decay speed of the locality adapter.
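To make the coding step concrete, the following is a minimal sketch of the codebook construction and of LLC coding via the analytical solution of the constrained objective in Eqs. (10)-(11), as described in [26]. It is illustrative rather than the authors' implementation, and the `lam`, `sigma`, and `n_init` values are assumptions.

```python
# K-means codebook and LLC coding of one descriptor matrix (sketch).
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(train_descriptors, M=50):
    # train_descriptors: (num_samples, dim) stack of e.g. dSIFT vectors
    km = KMeans(n_clusters=M, n_init=10).fit(train_descriptors)
    return km.cluster_centers_.T                      # (dim, M) codebook B

def llc_code(X, B, lam=1e-4, sigma=1.0):
    # X: (dim, P) descriptors of one image; returns Z: (M, P) codes
    dim, P = X.shape
    M = B.shape[1]
    Z = np.zeros((M, P))
    for i in range(P):
        diff = B - X[:, [i]]                          # columns b_j - x_i
        C = diff.T @ diff                             # data covariance term
        d = np.exp(np.linalg.norm(diff, axis=0) / sigma)   # Eq. (11)
        # minimize z^T (C + lam * diag(d^2)) z subject to 1^T z = 1
        z = np.linalg.solve(C + lam * np.diag(d ** 2), np.ones(M))
        Z[:, i] = z / z.sum()                         # enforce 1^T z = 1
    return Z
```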

C. Saliency based Max-Pooling

After coding each descriptor vector, the codes are pooled together to generate a WCE image representation. The original LLC uses a naive max pooling method which ignores the salient region of the image. This may be harmless for recognition tasks where the input image has a clean background or contains a single object. But for WCE images, which usually have a complex background and distributed ulcer lesions, naive max pooling will ruin the important ulcer information by blending the ulcer and normal regions together indiscriminately. Since the salient regions correspond to the semantic foreground, which is most probably the abnormal region, we propose a saliency based max pooling method to emphasize the salient region in the image.

First, we segment the obtained LLC code into two parts, a salient region part and a non-salient region part, according to the previously obtained saliency map for each input image. Given a 16 × 16 patch, if the mean saliency value of this patch is larger than the mean value of the whole saliency map, we designate this patch as a salient region and the corresponding code as a salient code. We then apply max pooling to the salient and non-salient codes, respectively. By concatenating the results together, the final representation vector of each image will be an $M$-dimensional feature vector $y$ that can be calculated as

$$y = \begin{bmatrix} \max({}^{s}z_1, \cdots, {}^{s}z_U) \\ \max({}^{ns}z_1, \cdots, {}^{ns}z_{P-U}) \end{bmatrix}, \qquad (12)$$

where $^{s}z_\bullet$ and $^{ns}z_\bullet$ are the salient and non-salient codes, the $\max(\bullet)$ function returns the maximum value in a row-wise manner, and $U$ is the number of salient patches.
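A sketch of the saliency based max pooling of Eq. (12) follows. Note that stacking the two row-wise pooled vectors yields a vector of length 2M per descriptor type, whereas the paper reports M (and 3M after concatenation), so the exact pooling arrangement in this snippet is an assumption.

```python
# Saliency based max pooling of LLC codes (illustrative sketch).
import numpy as np

def saliency_max_pooling(Z, patch_saliency, saliency_map):
    # Z: (M, P) LLC code; patch_saliency: mean saliency of each of the P patches
    salient = patch_saliency > saliency_map.mean()   # patch split rule from the text
    if not salient.any() or salient.all():
        # degenerate case: fall back to plain max pooling over all codes
        m = Z.max(axis=1)
        return np.concatenate([m, m])
    y_s = Z[:, salient].max(axis=1)                  # row-wise max over salient codes
    y_ns = Z[:, ~salient].max(axis=1)                # row-wise max over non-salient codes
    return np.concatenate([y_s, y_ns])               # Eq. (12)
```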


The obtained image representation vector of each input WCE image highlights the salient region features; therefore it characterizes the WCE images better than representations calculated without considering saliency. Although the aforementioned method takes the dSIFT descriptor as an example, the same process is applied to the dHOG and duniLBP descriptors. Thus, the final image representation $^{\mathrm{final}}y \in \mathbb{R}^{3M}$ is a concatenation of the results from the three descriptors,

$$^{\mathrm{final}}y = \left[\,^{\mathrm{dSIFT}}y^T \;\; ^{\mathrm{dHOG}}y^T \;\; ^{\mathrm{duniLBP}}y^T\right]^T. \qquad (13)$$

IV. RESULTS

A. Image Acquisition and Experimental Setup

The images used for the development and evaluation of the proposed approach were extracted from the WCE videos of 20 patients with ulcerous diseases, such as unexplained ulceration, ulceration from NSAIDs, ulcerative colitis, and Crohn's disease. The examinations were conducted with the Pillcam SB WCE system, and the Rapid Reader 6.0 software (both from Given Imaging Ltd.) was employed to export the images from the video sequences. It is difficult to gather large numbers of ulcer cases in WCE videos, since the patients who take the WCE examination may not suffer from ulcers, or they may carry multiple conditions. Even for a clear ulcer case, bubbles in the GI tract may block the nidi and make it difficult to collect the ulcer images from patients. Thus, in this paper, we composed a dataset that consists of 170 ulcer and 170 normal images from the examination data of the 20 patients. The 170 ulcer images were obtained from different ulcer regions to achieve the lowest possible similarity. Furthermore, the normal images include both simple and confusing healthy tissue (folds, villi, bubbles, etc.) to simulate the actual discrimination process. Three clinicians manually traced the boundaries of the ulcer regions on each single WCE image, and these ulcer annotations served as the ground truth.

B. Experiment Results for Ulcer Extraction

The first experiment was designed to evaluate the performance of the proposed saliency extraction method for WCE images. After segmenting the WCE images at five different superpixel levels, we first calculated the corresponding texture and color saliency maps. As shown in Fig. 7(a), we illustrate this strategy using one ulcer WCE image as an example. Fig. 7(b) shows the texture and color saliency maps at the five superpixel levels. Then, we fused these two kinds of five-level saliency maps into one texture and one color saliency map, respectively. The two fused saliency maps in Fig. 7(c) show that the texture saliency and the color saliency complement each other. The final saliency map of the original image is built by fusing the two saliency maps together and applying a Gaussian kernel, as shown in Fig. 7(d) and (e).

Fig. 9. Precision-recall curves comparing the saliency estimation results of the state-of-the-art methods and the proposed one. The blue dashed line, the brown dotted line, the cyan dash-dotted line, the circle-marked black line, the cross-marked violet line, and the red line represent CA, FT, GBVS, MSSS, SDSP, and our results, respectively.

In Fig. 8, a visual comparison of the estimated saliency maps between the state-of-the-art methods and the proposed one is presented using four different images that contain ulcers. Column (a) shows the original WCE images, while columns (b) to (g) show the saliency maps estimated by FT [17], CA [14], GBVS [19], MSSS [16], SDSP [13], and our method. Column (h) shows the ground truth of the ulcer regions labeled by clinicians. As discussed in Section I, the traditional saliency extraction methods neither segment images into semantic regions such as superpixels nor analyze the color and texture information of the WCE images, and therefore fail to outline the ulcer regions accurately. In contrast, Fig. 8(g) shows more meaningful saliency maps with well-defined object boundaries produced by the proposed multi-level superpixel extraction method. By integrating salient information from a coarse saliency map to a fine one, these examples qualitatively show that the proposed approach is well suited to finding the boundaries between salient objects and background regions, and is therefore able to characterize the ulcer information well. It is also critical that our saliency detection algorithm takes account of both the color and texture variation when locating the ulcer region.

To quantitatively show the effectiveness of our method, we calculated the precision and recall rates of the saliency maps from the different methods under 256 saliency thresholds ([0, 255]). The averaged results of our method and the others are plotted as precision versus recall curves in Fig. 9. It is clear that our method achieves a higher precision at every given recall than the other methods. At the maximum recall rate, all the methods converge to the same precision, which indicates that almost 10% of the image pixels belong to the ground truth salient regions.
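The thresholded precision-recall evaluation described above can be sketched as follows: the saliency map is binarized at each of the 256 thresholds and compared against the clinician-labeled binary ground truth. This is an illustrative snippet, not the authors' evaluation code.

```python
# Precision-recall curve of a saliency map over 256 thresholds (sketch).
import numpy as np

def pr_curve(saliency_map, ground_truth):
    # saliency_map: float array; ground_truth: bool array of ulcer pixels
    s = 255.0 * (saliency_map - saliency_map.min()) / (np.ptp(saliency_map) + 1e-8)
    precision, recall = [], []
    for t in range(256):
        pred = s >= t
        tp = np.logical_and(pred, ground_truth).sum()
        precision.append(tp / max(pred.sum(), 1))       # TP / (TP + FP)
        recall.append(tp / max(ground_truth.sum(), 1))  # TP / (TP + FN)
    return np.array(precision), np.array(recall)
```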


Fig. 7. An example of saliency map extraction from an ulcer WCE image using the proposed approach. (a) The original WCE image. (b) Saliency maps extracted by texture and color contrasts under different superpixel numbers. (c) The fused saliency maps of texture and color contrasts across the superpixel levels. (d) Saliency map obtained by fusing the texture and color maps. (e) Final saliency map after applying a Gaussian kernel.

Fig. 8. Comparison of saliency estimation results between the state-of-the-art methods and the proposed one using four ulcer WCE images as examples. (a) Original images. (b) FT. (c) CA. (d) GBVS. (e) MSSS. (f) SDSP. (g) The proposed method. (h) The binary-labeled ground truth.

Fig. 10. The WCE image ulcer recognition performance by the proposed modified LLC method under different vocabulary sizes. The red circle, blue cross and green square error bars indicate the accuracy, sensitivity, and specificity indexes, respectively. The markers show the means and the error bars show the standard deviations.

C. Experiment Results for Ulcer Recognition

To evaluate the proposed ulcer frame recognition method, we carried out experiments on the dataset of 170 ulcer images and 170 normal images. For each WCE image with a size of 288 × 288, we extracted a 16 × 16 image patch every 8 pixels in both the row and column directions, which results in 1225 patches. Three types of features, the dSIFT, duniLBP, and dHOG features, were extracted from every patch to characterize the properties of the image. A support vector machine (SVM) [38], [39] with a Gaussian radial basis function kernel was used to carry out the classification.

Fig. 11. A comparison of the modified LLC method and the original one. The red square and blue circle error bars stand for the means and standard deviations of the modified LLC and the original one, respectively.

To present a quantitative measure, the performance of the classification was measured by the accuracy, sensitivity, and specificity indexes. In order to achieve as much generalization as possible in the results, 5-fold cross validation was applied to the dataset: 80% of the images were used for training and the remaining 20% for testing. This procedure was repeated 200 times with randomly selected training and test sets, and the mean and standard deviation values of the accuracy, sensitivity, and specificity were calculated. A sketch of this protocol follows.
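The snippet below is illustrative only: `features` would be the (340, 3M) stack of final image representations and `labels` the ulcer/normal tags, while the SVM hyper-parameters are assumptions rather than the authors' settings.

```python
# 200 repetitions of 5-fold cross validation with an RBF-kernel SVM (sketch).
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

def repeated_cv(features, labels, repeats=200):
    accs = []
    for r in range(repeats):
        # a different random split in every repetition
        skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=r)
        for train_idx, test_idx in skf.split(features, labels):
            clf = SVC(kernel='rbf', C=1.0, gamma='scale')
            clf.fit(features[train_idx], labels[train_idx])
            accs.append(clf.score(features[test_idx], labels[test_idx]))
    return np.mean(accs), np.std(accs)
```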


TABLE I
PERFORMANCE COMPARISON OF ULCER DIAGNOSIS METHODS†

Method           Accuracy (%)    Sensitivity (%)   Specificity (%)
Li et al. [8]    89.49 ± 0.12    87.06 ± 0.38      91.91 ± 0.13
CR-ULBP [9]      90.74 ± 0.07    86.62 ± 0.20      94.85 ± 0.15
DCT-LAC [10]     85.44 ± 0.18    86.03 ± 0.65      84.84 ± 0.40
Yu et al. [11]   82.35 ± 0.93    91.18 ± 0.05      73.53 ± 3.89
Ours∗            92.65 ± 1.20    94.12 ± 2.40      91.18 ± 0.90

† Data are in mean ± standard deviation form, calculated from the results of 200 runs of 5-fold cross validation on the same WCE image dataset.
∗ The vocabulary size of our method is set to 50 for the best performance.

Vocabulary size M of the codebook for the modified LLC method has to be carefully tuned during the construction of the image code. Therefore, we varied M from 10 to 100 in the experiments to evaluate how the vocabulary size influences the recognition performance. Fig. 10 shows the recognition performance with different numbers of codewords. The standard deviations of the recognition results are also illustrated as error bars in the figure to show the stability of the classification. The means of the accuracy, sensitivity, and specificity indexes reach their greatest values when the vocabulary size is 50. Moreover, the small standard deviations of the three indexes indicate stable performance when 50 codewords are used. The mean accuracy, sensitivity, and specificity with vocabulary size 50 are 92.65%, 94.12%, and 91.18%, respectively, demonstrating the effectiveness of the proposed method for detecting ulcer frames in WCE images.

To evaluate the proposed saliency based max pooling, we compared its performance with that of the original LLC method on the ulcer recognition task. The 5-fold cross validation was again carried out 200 times, and the mean and standard deviation of the accuracy index were calculated for both methods. The comparison of the accuracy index of the two methods is shown in Fig. 11. The modified LLC method with saliency max pooling outperforms the original LLC method under all vocabulary sizes, illustrating the effectiveness of our saliency based LLC method.

To further evaluate the proposed method, we compared it with the state-of-the-art ulcer diagnosis methods. Table I shows the averaged accuracy, sensitivity, and specificity over 200 runs of 5-fold cross validation for Li et al. [8], the CR-ULBP method [9], the DCT-LAC method [10], Yu et al. [11], and ours, respectively. The proposed method shows superior performance, with improvements of 3.16%, 1.91%, 7.21%, and 10.30% in accuracy and 7.06%, 7.50%, 8.09%, and 2.94% in sensitivity over the respective state-of-the-art methods. This result validates that the proposed method possesses a superior ability to characterize WCE images and demonstrates good discriminative capability for ulcer detection.

In [8]–[10], the authors extracted features of the whole image to represent the image information, but the features extracted from the non-ulcer region may bring noisy and redundant information into the classification. Instead, our proposed method first extracts the salient region, which corresponds to the ulcer region, and then incorporates this information into the feature coding process. In addition, compared with [11], which uses an existing BoW model for the ulcer classification problem, we propose a modified LLC method that incorporates the saliency information to better encode the WCE images. Thus, our proposed method achieves classification performance superior to that of [8]–[11].

V. DISCUSSION

A. Saliency Estimation for WCE Images

As the first stage of describing a WCE image, we proposed a novel saliency estimation method that takes advantage of superpixel segmentation and image contrast information (texture and color contrasts). Our method is based on a multi-level superpixel representation to avoid the deficiencies of pixel-level representations in salient region detection. The proposed multi-level superpixel segmentation is considered a superior alternative to traditional WCE image segmentation, since it can outline the abnormal region correctly and compactly. Since ulcer regions have particular color and texture characteristics, we exploited this information and extracted saliency maps that emphasize the color and texture contrasts of ulcer regions inside the WCE images. We tested many texture features (such as Gabor filters [40] and wavelet filters [41]) to outline the ulcer region and found that the features extracted with the LM filter bank obtain better performance than the others, which may be due to the multi-scale and multi-orientation nature of the LM filters. The experimental results in Section IV validate the effectiveness of the proposed method and give strong evidence to support our arguments.

B. Parameter Tuning for Multi-level Superpixel Approach

Although the multi-level superpixel approach and the contrast information yield significant performance, parameter tuning is a core problem that must be overcome. The number of superpixel levels and the superpixel number of each level, for instance, are important parameters that have to be tuned in our method. Too many superpixels may lead to computational overhead and over-segmentation, while too few lead to poor segmentation performance. To handle this, we first chose the optimal maximum number of superpixels. We used only one superpixel level to carry out the saliency extraction experiment and plotted the corresponding precision-recall curves for superpixel numbers from 50 to 400 in Fig. 12. By tuning the maximum number of superpixels, it is clear that the proposed saliency method achieves the best overall performance when the superpixel number is set to 250. From this comparison, the optimal maximum number of superpixels for our data was chosen as 250.

Consequently, we proceeded to choose the optimal number of superpixel levels. Fig. 13 shows the corresponding


precision-recall curves with the number of superpixel levels (L) set to 1, 3, 5, 7, and 9, respectively. With the optimal maximum superpixel number of 250, the number of superpixels K_i at the ith level is set as K_i = i × 250/L, i = 1, ..., L. If L = 3, for instance, the superpixel numbers at the different levels are K_1 = 83 (i.e., 250/3), K_2 = 166 (i.e., 250 × 2/3), and K_3 = 250, respectively. It is clear that the performance with the superpixel level set to 5 is better than the performance with 1 or 3 levels, and similar to that with 7 or 9 levels. However, the more superpixel levels there are, the longer the computing time becomes. Considering the trade-off between time and performance, we chose 5 superpixel levels as the optimal selection for the experiments. This results in superpixel numbers of 50, 100, 150, 200, and 250 at the five levels, respectively.

C. Image Coding Parameters

Fig. 12. Precision-recall figure that compares the saliency estimation results under different numbers of superpixels. The black dashed line, the cyan dotted line, the circle-marked yellow line, the red solid line, the diamond-marked green line, and the square-marked blue line represent 100, 150, 200, 250, 300, and 400 superpixels, respectively.

The proposed modified LLC image coding method acts as the second stage of WCE image description. The original LLC method gives a compact description of WCE images by taking locality and sparsity into consideration. In our method, by carrying out max pooling after dividing the original LLC code into salient and non-salient parts, the WCE images can be encoded in a way that emphasizes the salient regions during recognition. The ulcer recognition task is highly dependent on the codebook size (or vocabulary size). Intensive investigations were conducted to find the relation between the recognition performance and the codebook size. The results suggest that the optimal performance is reached by setting the number of codewords to 50, with a corresponding 92.65% accuracy and 94.12% sensitivity. The comparison between the modified and the original LLC method shows that the modified method robustly outperforms the original one under all vocabulary sizes, indicating that our saliency based approach suits WCE ulcer diagnosis very well. Furthermore, compared with other ulcer detection methods [8]–[11], the viability of our method is fully demonstrated in terms of higher accuracy, higher sensitivity, and satisfactory specificity.

D. Limitations and Future Work

Fig. 13. Precision-recall figure that compares the saliency estimation results under different numbers of levels after setting the maximum superpixel number to 200. The black dashed line, the cyan dotted line, the red solid line, the circle-marked blue line, and the square-marked green line represent 1, 3, 5, 7, and 9 levels, respectively.

There is still room for improvement of the proposed method. In order to make it practically useful in hospital clinical trials, further tests on much larger datasets are needed to validate the effectiveness and robustness of the proposed classification strategy. Furthermore, this study only addresses the automatic ulcer recognition task for WCE images. Other tasks, such as bleeding detection and cancer recognition, which use different datasets and features, will be investigated in the near future.

VI. CONCLUSION


In this paper, we proposed a two-stage fully automated computer-aided detection system to detect ulcers in WCE images. In the first stage, a saliency map extraction approach based on multi-level superpixels was proposed to segment the ulcer candidates. In the second stage, the obtained saliency map was incorporated with the image features to perform the ulcer image recognition task. Since the ulcer usually corresponds to the salient region, we proposed a saliency max-pooling method integrated with the Locality-constrained Linear Coding (LLC) method to characterize the images. Experimental results achieve a promising 92.65% accuracy and 94.12% sensitivity, validating the effectiveness of the proposed method. Furthermore, the comparison experiments showed that our method outperforms the state-of-the-art methods on the WCE ulcer classification task.


ACKNOWLEDGMENT

The authors would like to thank Thomas Yuen Tung Lam and Professor Justin Che Yuen Wu, CUHK Jockey Club Bowel Cancer Education Centre, Institute of Digestive Disease of The Chinese University of Hong Kong, for their professional suggestions on labeling the dataset.

REFERENCES

[1] V. Charisis, L. Hadjileontiadis, and G. Sergiadis, "Enhanced ulcer recognition from capsule endoscopic images using texture analysis," New Advances in the Basic and Clinical Gastroenterology, pp. 185–210, 2012.
[2] V. S. Charisis, L. J. Hadjileontiadis, J. Barroso, and G. D. Sergiadis, "Intrinsic higher-order correlation and lacunarity analysis for WCE-based ulcer classification," in Computer-Based Medical Systems (CBMS), 2012 25th International Symposium on. IEEE, 2012, pp. 1–6.
[3] M. Appleyard, Z. Fireman, A. Glukhovsky, H. Jacob, R. Shreiver, S. Kadirkamanathan, A. Lavy, S. Lewkowicz, E. Scapa, R. Shofti, P. Swain, and A. Zaretsky, "A randomized trial comparing wireless capsule endoscopy with push enteroscopy for the detection of small-bowel lesions," Gastroenterology, vol. 119, no. 6, pp. 1431–1438, 2000.
[4] B. Upchurch and J. Vargo, "Small bowel enteroscopy," Reviews in Gastroenterological Disorders, vol. 8, no. 3, pp. 169–177, 2008.
[5] M. Manno, R. Manta, and R. Conigliaro, "Single-balloon enteroscopy," in Ileoscopy, A. Trecca, Ed. Springer Milan, 2012, pp. 79–85.
[6] G. Iddan, G. Meron, A. Glukhovsky, and P. Swain, "Wireless capsule endoscopy," Nature, vol. 405, p. 417, 2000.
[7] N. M. Lee and G. M. Eisen, "10 years of capsule endoscopy: an update," Expert Review of Gastroenterology & Hepatology, vol. 4, no. 4, pp. 503–512, August 2010.
[8] B. Li and M. Q.-H. Meng, "Texture analysis for ulcer detection in capsule endoscopy images," Image and Vision Computing, vol. 27, no. 9, pp. 1336–1342, 2009.
[9] V. S. Charisis, C. Katsimerou, L. J. Hadjileontiadis, C. N. Liatsos, and G. D. Sergiadis, "Computer-aided capsule endoscopy images evaluation based on color rotation and texture features: An educational tool to physicians," in Computer-Based Medical Systems (CBMS), 2013 IEEE 26th International Symposium on. IEEE, 2013, pp. 203–208.
[10] A. Eid, V. S. Charisis, L. J. Hadjileontiadis, and G. D. Sergiadis, "A curvelet-based lacunarity approach for ulcer detection from wireless capsule endoscopy images," in Computer-Based Medical Systems (CBMS), 2013 IEEE 26th International Symposium on. IEEE, 2013, pp. 273–278.
[11] L. Yu, P. C. Yuen, and J. Lai, "Ulcer detection in wireless capsule endoscopy images," in Pattern Recognition (ICPR), 2012 21st International Conference on. IEEE, 2012, pp. 45–48.
[12] A. Karargyris and N. Bourbakis, "Detection of small bowel polyps and ulcers in wireless capsule endoscopy videos," Biomedical Engineering, IEEE Transactions on, vol. 58, no. 10, pp. 2777–2786, 2011.
[13] L. Zhang, Z. Gu, and H. Li, "SDSP: A novel saliency detection method by combining simple priors," in Image Processing (ICIP), 2013 20th IEEE International Conference on. IEEE, 2013, pp. 171–175.
[14] S. Goferman, L. Zelnik-Manor, and A. Tal, "Context-aware saliency detection," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 34, no. 10, pp. 1915–1926, 2012.
[15] M.-M. Cheng, G.-X. Zhang, N. J. Mitra, X. Huang, and S.-M. Hu, "Global contrast based salient region detection," in Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011, pp. 409–416.
[16] R. Achanta and S. Susstrunk, "Saliency detection using maximum symmetric surround," in Image Processing (ICIP), 2010 17th IEEE International Conference on. IEEE, 2010, pp. 2653–2656.
[17] R. Achanta, S. Hemami, F. Estrada, and S. Susstrunk, "Frequency-tuned salient region detection," in Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE, 2009, pp. 1597–1604.
[18] L. Zhang, M. H. Tong, T. K. Marks, H. Shan, and G. W. Cottrell, "SUN: A Bayesian framework for saliency using natural statistics," Journal of Vision, vol. 8, no. 7, p. 32, 2008.
[19] J. Harel, C. Koch, and P. Perona, "Graph-based visual saliency," in Advances in Neural Information Processing Systems, 2006, pp. 545–552.
[20] L. Itti, C. Koch, and E. Niebur, "A model of saliency-based visual attention for rapid scene analysis," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 20, no. 11, pp. 1254–1259, 1998.

[21] S. Lazebnik, C. Schmid, and J. Ponce, "Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories," in Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on, vol. 2. IEEE, 2006, pp. 2169–2178.
[22] Y. Yuan and M. Q.-H. Meng, "Polyp classification based on bag of features and saliency in wireless capsule endoscopy," in Robotics and Automation (ICRA), 2014 IEEE International Conference on. IEEE, 2014, pp. 3930–3935.
[23] T. Ma, Y. Zou, Z. Xiang, L. Li, and Y. Li, "Wireless capsule endoscopy image classification based on vector sparse coding," in Signal and Information Processing (ChinaSIP), 2014 IEEE China Summit & International Conference on. IEEE, 2014, pp. 582–586.
[24] M. Safdari, R. Pasari, D. Rubin, and H. Greenspan, "Image patch-based method for automated classification and detection of focal liver lesions on CT," in SPIE Medical Imaging. International Society for Optics and Photonics, 2013, p. 86700Y.
[25] J. Wang, Y. Li, Y. Zhang, C. Wang, H. Xie, G. Chen, and X. Gao, "Bag-of-features based medical image retrieval via multiple assignment and visual words weighting," Medical Imaging, IEEE Transactions on, vol. 30, no. 11, pp. 1996–2011, 2011.
[26] J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, and Y. Gong, "Locality-constrained linear coding for image classification," in Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE, 2010, pp. 3360–3367.
[27] A. P. Moore, S. Prince, J. Warrell, U. Mohammed, and G. Jones, "Superpixel lattices," in Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on. IEEE, 2008, pp. 1–8.
[28] A. Levinshtein, A. Stere, K. N. Kutulakos, D. J. Fleet, S. J. Dickinson, and K. Siddiqi, "TurboPixels: Fast superpixels using geometric flows," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 31, no. 12, pp. 2290–2297, 2009.
[29] L. Zhu, D. A. Klein, S. Frintrop, Z. Cao, and A. B. Cremers, "Multi-scale region-based saliency detection using W2 distance on N-dimensional normal distributions," in Image Processing (ICIP), 2013 20th IEEE International Conference on. IEEE, 2013, pp. 176–180.
[30] J. Yang, K. Yu, Y. Gong, and T. Huang, "Linear spatial pyramid matching using sparse coding for image classification," in Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE, 2009, pp. 1794–1801.
[31] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Susstrunk, "SLIC superpixels compared to state-of-the-art superpixel methods," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 34, no. 11, pp. 2274–2282, 2012.
[32] T. Leung and J. Malik, "Representing and recognizing the visual appearance of materials using three-dimensional textons," International Journal of Computer Vision, vol. 43, no. 1, pp. 29–44, 2001.
[33] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
[34] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, vol. 1. IEEE, 2005, pp. 886–893.
[35] T. Ojala, M. Pietikainen, and T. Maenpaa, "Multiresolution gray-scale and rotation invariant texture classification with local binary patterns," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 24, no. 7, pp. 971–987, 2002.
[36] J.-G. Wang, J. Li, W.-Y. Yau, and E. Sung, "Boosting dense SIFT descriptors and shape contexts of face images for gender recognition," in Computer Vision and Pattern Recognition Workshops (CVPRW), 2010 IEEE Computer Society Conference on. IEEE, 2010, pp. 96–102.
[37] H. T. Ho and R. Chellappa, "Automatic head pose estimation using randomly projected dense SIFT descriptors," in Image Processing (ICIP), 2012 19th IEEE International Conference on. IEEE, 2012, pp. 153–156.
[38] C. Cortes and V. Vapnik, "Support-vector networks," Machine Learning, vol. 20, no. 3, pp. 273–297, 1995.
[39] C.-C. Chang and C.-J. Lin, "LIBSVM: A library for support vector machines," ACM Transactions on Intelligent Systems and Technology (TIST), vol. 2, no. 3, p. 27, 2011.
[40] A. K. Jain and F. Farrokhnia, "Unsupervised texture segmentation using Gabor filters," in Systems, Man and Cybernetics, 1990. Conference Proceedings., IEEE International Conference on. IEEE, 1990, pp. 14–19.
[41] G. Van de Wouwer, P. Scheunders, and D. Van Dyck, "Statistical texture characterization from discrete wavelet representations," Image Processing, IEEE Transactions on, vol. 8, no. 4, pp. 592–598, 1999.

