IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 23, NO. 2, FEBRUARY 2014


Neighborhood Supported Model Level Fuzzy Aggregation for Moving Object Segmentation

Pojala Chiranjeevi and Somnath Sengupta

Abstract—We propose a new algorithm for moving object detection in the presence of challenging dynamic background conditions. We use a set of fuzzy aggregated multi-feature similarity measures applied on multiple models corresponding to multimodal backgrounds. The algorithm is enriched with a neighborhood-supported model initialization strategy for faster convergence, and background model maintenance driven by a model level fuzzy aggregation measure ensures more robustness. Similarity functions are evaluated between the corresponding elements of the current feature vector and the model feature vectors. Concepts from the Sugeno and Choquet integrals are incorporated in our algorithm to compute fuzzy similarities from the ordered similarity function values for each model. Model updating and the foreground/background classification decision are based on the set of fuzzy integrals. Our proposed algorithm is shown to outperform other multi-model background subtraction algorithms. The proposed approach completely avoids explicit offline training to initialize the background model and can even be initialized with frames containing moving objects. The feature space uses a combination of intensity and statistical texture features for better object localization and robustness. Our qualitative and quantitative studies illustrate the mitigation of a variety of challenging situations by our approach.

Index Terms—Moving object detection, statistical local texture features, model feature vectors, model level fuzzy similarity, neighborhood supported model initialization, model level fuzzy aggregation.

Manuscript received March 28, 2013; revised August 1, 2013 and September 19, 2013; accepted September 23, 2013. Date of publication October 17, 2013; date of current version December 17, 2013. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. James E. Fowler. The authors are with the Department of Electronics and Electrical Communication Engineering, IIT Kharagpur, Kharagpur 721302, India (e-mail: [email protected]; [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TIP.2013.2285598

I. INTRODUCTION

Moving object detection from video sequences is a fundamental step in many visual surveillance applications such as object tracking, human action recognition, and high level behavior understanding; the success of the overall system depends on this fundamental step. Research in this field has attained a level of maturity [1], [2], and the problems are reasonably well solved for static and near-static backgrounds. However, challenges still exist in mitigating difficult background conditions such as swaying vegetation, camouflage, shadows, illumination changes, relocation of background objects, initialization with moving objects, sensitivity in detecting smaller objects, flickering monitors, etc. The most popular approach for moving object detection is background subtraction, in which background models are maintained for each pixel and the foreground/background labeling decision is based

on the similarity between the stored models and the model based on the current features. A single background model for each pixel works for static backgrounds, but for highly dynamic background situations a single model is often inadequate, as recent results from background subtraction algorithms have shown. Every possible dynamic condition of the background should ideally be associated with a model that evolves from the data. Multiple background models for each pixel can efficiently handle dynamic situations like swaying vegetation, rippling water, etc., which would otherwise be falsely classified as moving foreground. Each model may be constructed using a single feature which is either pixel-based [3]-[12] or region-based [13]-[19]. Every individual feature has advantages and limitations under different conditions, as explained in [21]. The use of multiple features [20]-[29], though more complex than single-feature based techniques, exploits the advantages of different types of features. The preferred approach to background subtraction should therefore be to maintain multi-feature based multi-background models for each pixel.

In implementation, several practical questions arise. The first crucial point to address is the proper choice of features. Pixel-based features contribute to the accuracy of the shape of the detected moving objects, but exhibit some inherent weaknesses under dynamic backgrounds; region-based features exhibit exactly the opposite characteristics. Hence, by using a combination of these two types of features, the advantages of both can be exploited. In this work, we have chosen intensity (I) as the pixel feature and the Statistical Texture (ST) features, namely local homogeneity, energy, and texture mean, as the region features.

In multi-feature, multiple-model scenarios, the next major issue to be addressed is how to obtain an aggregated measure of multiple features of diverse nature for each model, so as to arrive at a correct classification decision out of multiple models. We apply the concepts of Sugeno and Choquet fuzzy aggregation to the multi-model case, in which we are required to compute a set of fuzzy integrals, each representing the fuzzy similarity between a model and the current feature vector, to guide model retention and updating as well as the foreground/background labeling decision. The extent of similarity of a model with the current feature vector decides whether the model is replaced by the current feature vector or updated with it at a given learning rate.

From a practical standpoint, it is also important to note how quickly, in terms of the number of frames, the solutions can lead to correct classification, as we cannot expect background-only frames to exist initially. In general, every model-based background subtraction algorithm considers the first frame's feature vector of each pixel as the model for that


pixel, and in subsequent frames these models evolve through the process of updating. In multi-model cases, a point to be considered is how to initialize several models for faster convergence. In the absence of other clues, past algorithms initialized all the models of a pixel identically with the first frame's feature vector. Intuitively, the convergence of an evolving solution like this depends a lot on the initialization. Moreover, in a dynamic background situation, the models of a pixel should truly capture the dynamic nature. Consider, for example, a tree swaying its leaves and branches, with a static background such as a building behind it. Even without any moving foreground object, a pixel in different frames may be part of the leaves, the branches, or the uncovered building; in all these cases, the labeling should be decisively background. Leaves, branches, and static background should all be represented in the multiple models of the pixel. In a dynamic background situation like this, a pixel in a frame may be part of a leaf while its neighboring pixels in the same frame belong to the branches, other leaves, the static background, etc. Other dynamic background conditions, such as rippling water, encounter situations similar in nature. Examples like these point out the need for model initialization with diversity, which is possible by including the feature vectors of neighborhood pixels in the model set, thereby including spatial background variations. Our argument is supported experimentally, where we establish a much faster convergence of the model using our neighborhood supported model initialization as compared to identical multi-model initialization.

In addition to model initialization, there is also the issue of model continuation. In a dynamic situation, a few rarely occurring older models may lose relevance and a few newer models may take their place. We therefore associate each model with a relevance factor, hereafter referred to as its weight, which also gets updated along with the models, and only the models with higher weights are retained.

To summarize, in our proposed work, herein referred to as Advanced Fuzzy Aggregation based Background Subtraction (AFABS), we have integrated multi-feature, multi-model representation with neighborhood-assisted initialization into a model level fuzzy aggregation framework, where each pixel is associated with weighted feature vectors composed of intensity (I) and ST features. The integrated approach works with the models and their associated weight updating within the said framework for robust foreground/background labeling. The contributions made in this paper are twofold:
1) Multi-feature fuzzy aggregation based on the concepts of the Choquet and Sugeno integrals is applied on multiple models that represent the dynamic behavior of the background. The model level fuzzy similarities, calculated between all the models and the current feature vector using the fuzzy integral, are utilized to maintain the background models in terms of creation, updating, and termination. We obtain a robust labeling of foreground/background in terms of f-measures.
2) A neighborhood-assisted background model initialization is proposed that contributes to model diversity and faster convergence. The effectiveness of AFABS is demonstrated on a wide variety of video sequences,


having challenging background situations, with respect to the state-of-the-art. A comparative performance between the Choquet and the Sugeno integral is also shown using the proposed approach, where we demonstrate the better performance of Choquet over Sugeno, as the former is more suitable for cardinal aggregation.

The rest of this paper is organized as follows. In Section II, we present the related works. The basics of the ST features and the definitions pertaining to the fuzzy aggregation of features are given in Section III. The AFABS algorithm is presented in Section IV. Section V discusses the results and Section VI concludes this paper.

II. RELATED WORKS

Background subtraction algorithms are well studied in the literature [1], [2]. Essentially, all background subtraction algorithms use models, which are composed of features. Features may be grouped under two broad categories: pixel features such as color [3]-[9] and gradient [10]-[12], and region features, which are primarily textures [13]-[20]. As compared to these single-feature based techniques, multiple features such as color and gradient [23]-[25] or color and texture [21], [26]-[29] lead to better performance, as the advantages of individual features can be combined. In multi-feature based techniques such as [23]-[26], multiple features were independently applied in the background subtraction algorithm and the segmented results from each feature were linearly combined to get the final result. Some of the methods [27]-[29] used multiple features hierarchically, using one feature for coarse detection and a set of other features to refine it; this increases the complexity of the algorithm. In [21], Chiranjeevi and Sengupta established that a combination of I and ST features is more effective as a multi-feature model than the rest, due to the shape localization of the pixel-based intensity and the robustness of the region-based ST features. Each pixel is modeled with the mean and the covariance matrix constructed using these features. However, block-based covariance fusion results in poor shape localization and requires additional computations for the mean and the covariance. In these multi-feature based approaches, each feature is considered independent (i.e., correlations or interactions among the features are ignored) and is given equal importance, which may not lead to success in all types of environments.

In multi-feature based foreground/background labeling, fuzzy fusion of various features through the Choquet and Sugeno integrals [30] has recently been explored to exploit the advantages of introducing fuzziness into the classification process. By giving an importance value to each feature and by accounting for the interactions or correlations existing among the features through a fuzzy measure, the foreground/background classification can be strengthened. For example, Zhang and Xu fused color and gradient features in [31] and color and texture features (LBP) in [32] using the Sugeno integral. Baf et al. fused color features in [33] and color and texture features in [34] using the Choquet integral. Azab et al. [35] fused color, texture, and edge features using the Choquet integral. Chiranjeevi and Sengupta [36] fused


intensity and fuzzy texture features using the Choquet integral. All these approaches [32]-[36] considered a single-model case, where the feature similarities between the model and the current set of features were fuzzy-integrated at the pixel level and then thresholded for foreground/background labeling. But single-model based techniques, which fuse features at the pixel level, may not be effective in challenging real-time situations, as stated earlier. Moreover, these techniques do not consider neighbors for model initialization. The resulting deficiencies in the performance of these approaches under highly dynamic background conditions lead to the conclusion that the background modeling issues need to be carefully investigated, to handle multimodal distributions and other canonical problems such as initialization with moving objects, illumination variations, etc. The proposed work contributes in this direction.



III. ADVANCED FUZZY AGGREGATION OF INTENSITY AND STATISTICAL TEXTURE FEATURES

Multi-feature based background subtraction techniques require efficient fusion of the information contributed by the individual features. Information fusion is the process of combining these features into a single datum, and the fusion is achieved through aggregation operators, which are mathematical functions. The aggregation operators require some kind of parameterization to express additional information about the features that take part in the aggregation process. Weighted mean and weighted maximum, for example, are simple forms of aggregation operators. A more generalized form of these simple operators is achieved by using fuzzy measures in conjunction with fuzzy integrals, such as the Choquet and Sugeno integrals, that consider interactions among the features and hence overcome the inherent limitations of crisp classification approaches [30]. In this section, we present a very brief introduction to the features that we use for fusion and the aggregation operators based on fuzzy measures that we apply on these features.

A. Computation of Feature Vector

For every frame of the video sequence and for every pixel, we compute a feature vector, which comprises four elements: the intensity (I) and three local ST features, namely, mean, local homogeneity, and energy, derived from the co-occurrence matrix, computed over a neighborhood region R of size M × N centered at that pixel. The intensity and the co-occurrence matrix are computed on a single color channel. The ST features have been described in sufficient detail in [21] and [37]; in this section, we only present the basic definitions of these features. To derive these features, we require the following matrices:
• Co-occurrence matrix K_{θ,d}, which tabulates the frequency of occurrence of two gray levels in an image at a specified distance d and a specified angle θ. Henceforth, for clarity, we omit these subscripts and simply denote the co-occurrence matrix as K. This matrix is of dimensionality l × l, where l is the number of quantized intensity levels.
• Symmetric co-occurrence matrix K_s = K + K^T.
• Normalized symmetric co-occurrence matrix

C = K_s \Big/ \sum_{i=0}^{l-1} \sum_{j=0}^{l-1} k_s(i, j)

where the k_s(i, j)'s are the elements of the symmetric co-occurrence matrix.

The ST features, local homogeneity (L), energy (E), and the x-directional mean (M_x), are calculated using the normalized symmetric co-occurrence matrix as follows:

L = \sum_{i=0}^{l-1} \sum_{j=0}^{l-1} \frac{1}{1 + (i - j)^2} \, c(i, j) \qquad (1)

M_x = \sum_{i=0}^{l-1} \sum_{j=0}^{l-1} i \, c(i, j) \qquad (2)

E = \sum_{i=0}^{l-1} \sum_{j=0}^{l-1} c(i, j)^2 \qquad (3)

where the c(i, j)'s are the elements of the normalized symmetric co-occurrence matrix. A feature vector X with a set of four features (I, L, E, M_x) is formed for each pixel:

X = [I \; L \; E \; M_x] \qquad (4)
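To make the feature computation concrete, the following is a minimal Python sketch of Eqs. (1)-(4), assuming an 8-bit single-channel patch; the quantization rule, the encoding of (d, θ) as a pixel offset, and names such as st_features are our illustrative choices rather than the paper's implementation.

    import numpy as np

    def st_features(patch, l=4, offset=(2, -2)):
        # Quantize the patch to l intensity levels (l = 4 in the paper).
        q = np.clip((patch.astype(np.int64) * l) // 256, 0, l - 1)
        # Accumulate the co-occurrence matrix K for the given offset;
        # offset = (2, -2) is one way to encode d = 2 at theta = 135 deg.
        K = np.zeros((l, l))
        oi, oj = offset
        H, W = q.shape
        for i in range(max(0, -oi), min(H, H - oi)):
            for j in range(max(0, -oj), min(W, W - oj)):
                K[q[i, j], q[i + oi, j + oj]] += 1
        Ks = K + K.T                   # symmetric co-occurrence matrix
        C = Ks / Ks.sum()              # normalized symmetric co-occurrence matrix
        i_idx, j_idx = np.indices((l, l))
        L = (C / (1.0 + (i_idx - j_idx) ** 2)).sum()  # local homogeneity, Eq. (1)
        Mx = (i_idx * C).sum()                        # x-directional mean, Eq. (2)
        E = (C ** 2).sum()                            # energy, Eq. (3)
        I = float(patch[H // 2, W // 2])              # intensity of the center pixel
        return np.array([I, L, E, Mx])                # feature vector X, Eq. (4)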

The use of intensity (I) in our algorithm is generic. For monochrome images, it is the intensity; for R-G-B color images, one can consider any one of the three color channels; and for Y-Cr-Cb images, the luminance (Y) component can be considered.

B. Model Level Fuzzy Aggregation of Features

For every pixel, a set of feature vectors is maintained as models, and each is compared with the feature vector computed at that pixel in the current frame. The similarity values obtained from such element-by-element comparisons for each model are ordered, and the fuzzy integrals are then computed on these ordered similarities using the membership values of the features. In this subsection, we present the definitions of the fuzzy measures and the fuzzy integrals, which are necessary for a proper understanding of our algorithm.

Let X = \{x_1, x_2, \ldots, x_n\} be the finite feature set and B be the Borel field of X. Let μ(x_i) ∈ (0, 1) be the importance given to the feature x_i.

Definition-1 Fuzzy measure: A fuzzy measure μ on a set X (the universe of discourse with subsets E, F, ...) is a set function μ: B → [0, 1] satisfying the following two conditions:
(a) μ(φ) = 0, μ(X) = 1 (boundary conditions);
(b) if E ⊆ F and E, F ∈ B, then μ(E) ≤ μ(F) (monotonicity condition).
The boundary and the monotonicity conditions permit us to interpret the measure of a set as the measure of its importance. As more information sources (in our context, more features)


are added, the importance increases, and attains the maximum value of one when all the sources are considered.

We now consider a set of r model feature vectors X^{m_k}_{p,q} (k = 1, \ldots, r) at the pixel (p, q) and let X_{p,q} be the current feature vector.

Definition-2 Similarity function: The similarity function h(x_i^k) for the i-th component of the k-th model feature vector is given by

h(x_i^k) = \frac{\min(x_i^{m_k}, x_i)}{\max(x_i^{m_k}, x_i)} \qquad (5)

where x_i^{m_k} indicates the i-th component of the k-th model feature vector and x_i indicates the i-th component of the current feature vector.

Let (x'^k_1, x'^k_2, \ldots, x'^k_n) be a permutation of (x^k_1, x^k_2, \ldots, x^k_n) that produces a non-decreasing order of the similarity functions, i.e., h(x'^k_1) ≤ h(x'^k_2) ≤ \ldots ≤ h(x'^k_n). From these ordered similarity function values and the membership values of all the features, the fuzzy integrals are calculated as follows.

Definition-3 Choquet Integral: The Choquet integral F_C of the similarity function with respect to the fuzzy measure μ is defined by

F_C = \sum_{i=1}^{n} \left( h(x'_i) - h(x'_{i-1}) \right) \mu(\{x'_i, \ldots, x'_n\}) \qquad (6)

where h(x'_0) = 0 and n is the number of features. The superscript k on the i-th feature x'_i is omitted, as the integral is to be computed for every model, k = 1, \ldots, r. The Choquet integral generalizes not only the arithmetic mean and the weighted mean but also the ordered weighted average.

Definition-4 Sugeno Integral: The Sugeno integral F_S of the similarity function with respect to the fuzzy measure μ is defined by

F_S = \max_{i=1}^{n} \left( \min\left( h(x'_i), \mu(\{x'_i, \ldots, x'_n\}) \right) \right) \qquad (7)

The Sugeno integral generalizes the weighted maximum, weighted minimum, and weighted median operators. We require the Sugeno λ-measure to compute the fuzzy measure μ on a subset of X, μ(\{x'_i, \ldots, x'_n\}).

Definition-5 Sugeno λ-measure: A fuzzy measure μ is a Sugeno λ-measure if it satisfies the following additional condition for λ > −1: μ(E ∪ F) = μ(E) + μ(F) + λμ(E)μ(F), where E, F ⊂ X and E ∩ F = φ. Therefore, the calculation of the fuzzy measure over a subset only requires the values of all the singletons μ({x_i}) and λ. The following equation establishes this fact with generality. Consider a subset of features E = \{x_i, \ldots, x_n\}, whose joint membership is given by

\mu(E) = \frac{1}{\lambda} \left[ \prod_{k=i}^{n} \left( 1 + \lambda \mu(x_k) \right) - 1 \right], \quad \text{if } \lambda \neq 0 \text{ and } \lambda > -1

\mu(E) = \sum_{k=i}^{n} \mu(x_k), \quad \text{if } \lambda = 0 \qquad (8)

λ is obtained by replacing E with X in (8) and then applying the boundary condition μ(X) = 1:

\mu(X) = \frac{1}{\lambda} \left[ \prod_{k=1}^{n} \left( 1 + \lambda \mu(x_k) \right) - 1 \right] = 1 \;\Rightarrow\; 1 + \lambda = \prod_{k=1}^{n} \left[ 1 + \lambda \mu(x_k) \right].
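As a worked illustration of Definition-5 and the boundary-condition derivation above, the sketch below solves 1 + λ = Π(1 + λμ(x_k)) numerically and evaluates Eq. (8); numpy/scipy are assumed, and names such as sugeno_lambda are ours.

    import numpy as np
    from scipy.optimize import brentq

    def sugeno_lambda(mu):
        # Solve prod(1 + lam*mu_k) = 1 + lam for lam > -1, lam != 0.
        mu = np.asarray(mu, dtype=float)
        s = mu.sum()
        if np.isclose(s, 1.0):
            return 0.0                      # additive case, Eq. (8) with lam = 0
        f = lambda lam: np.prod(1.0 + lam * mu) - (1.0 + lam)
        # Singletons summing above one give lam in (-1, 0); below one, lam > 0.
        lo, hi = (-1.0 + 1e-9, -1e-9) if s > 1.0 else (1e-9, 1e6)
        return brentq(f, lo, hi)

    def lambda_measure(mu_subset, lam):
        # Joint membership of a subset of features, Eq. (8).
        mu_subset = np.asarray(mu_subset, dtype=float)
        if lam == 0.0:
            return float(mu_subset.sum())
        return float((np.prod(1.0 + lam * mu_subset) - 1.0) / lam)

With the singleton memberships used later in this paper (0.45, 0.25, 0.15, and 0.25, which sum to 1.10), the solved λ is negative, i.e., the measure is sub-additive.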

IV. ADVANCED FUZZY AGGREGATION BASED BACKGROUND SUBTRACTION (AFABS)

In AFABS, each pixel is modeled with a feature vector composed of intensity and ST features, a combination of pixel- and region-based features, to inherit the advantages of both types. By giving an importance value to each feature and fusing them by a fuzzy integral, correlations or interactions between the features can be taken into account. In this approach, multiple models are constructed for each pixel, where the models are initialized with neighborhood support, thereby achieving faster convergence of the background model to the background variations. The model level fuzzy similarity, calculated between each model and the current feature vector by the fuzzy integral, represents the degree of matching between them, and the model is updated accordingly. The schematic diagram of the proposed approach is shown in Fig. 1. The algorithm consists of five steps: model initialization, background model selection, fuzzy integral calculation for all the models, background model updating, and foreground detection. The last subsection deals with the optimization of the parameter values used in the proposed algorithm.

A. Model Initialization

Model initialization is the first and the most crucial step of background subtraction. Many approaches described in the literature require a series of ideal background frames to initialize the model, and these do not exploit the spatial correlations for model initialization. In the current problem, we simply initialize the model using the first frame of the video. In our model initialization scheme, the models at a pixel are initialized with the feature vectors from the neighborhood pixels. As the neighboring pixels share their feature vectors for model initialization, the possible spatial background variations are incorporated into the model at the very beginning, resulting in faster convergence of the background model by adapting to those variations. This is experimentally validated in Figs. 2 and 3, where the algorithm is run with the same parameter set using the proposed model initialization scheme and the scheme without neighborhood support, in which all the models for a pixel are initialized with the same feature vector computed at that pixel. From Fig. 2, it can be observed that the proposed model initialization gives fewer false positives than the initialization scheme without neighborhood support, due to the faster convergence enabled by the neighborhood support. To illustrate quantitatively, false positives versus frame index are plotted in Fig. 3 for the sequence given in Fig. 2, where it is seen that the proposed initialization scheme gives fewer false positives at each frame than the initialization scheme without neighborhood support, as the former captures the spatial background variations using neighbors.


Fig. 1. Schematic diagram of the proposed approach.

Fig. 2. Evaluation of the foreground/background classifications using two initialization schemes: (a) input frame, (b) model initialization without neighborhood support, and (c) proposed model initialization scheme.

Fig. 3. Error comparison of two initialization schemes over a sequence of frames for the sequence in Fig. 2.

Fig. 4. Probability (Prob) of the number of background models (b) at the static (E) and the dynamic (F) pixels across 750 frames for the EE sequence.

Our proposed model initialization scheme is mathematically represented as follows. The model for the pixel (p, q) consists of a group of r adaptive model feature vectors, \{X^{m_1}_{p,q}, \ldots, X^{m_r}_{p,q}\}. At t = 0, r − 1 adaptive model feature vectors for the pixel are initialized with the neighboring pixels' feature vectors, and the r-th model is initialized with the feature vector computed at the same pixel using the first frame, as

X^m_{p,q} = \left\{ X^{m_1}_{p,q}, \ldots, X^{m_{r-1}}_{p,q}, X^{m_r}_{p,q} \right\} = \left\{ \{X_{i,j}\}_{(i,j) \in N_h}, \; X_{p,q} \right\} \qquad (9)

where N_h is the valid neighborhood of (p, q) and X^m_{p,q} is the set of r models for the pixel at (p, q). If the count of valid pixels in the chosen neighborhood is less than the number of models at the pixel (p, q), the feature vector computed at (p, q) is repeated for the remaining models. This case generally arises at the boundary pixels of a frame. Initially, the weights of all the models are equal and normalized such that their sum is one. The weight of the k-th model is denoted by w_k.
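A minimal sketch of this initialization, assuming a precomputed array of first-frame feature vectors, is given below; init_models and its arguments are our illustrative names.

    import numpy as np

    def init_models(frame_features, p, q, r=4):
        # frame_features[i, j] holds the feature vector X_{i,j} of the first
        # frame. r-1 models come from the valid 8-neighborhood of (p, q); the
        # r-th (and any shortfall at frame borders) is the pixel's own vector.
        H, W = frame_features.shape[:2]
        neighbors = [(p + di, q + dj)
                     for di in (-1, 0, 1) for dj in (-1, 0, 1)
                     if (di, dj) != (0, 0) and 0 <= p + di < H and 0 <= q + dj < W]
        models = [frame_features[i, j] for (i, j) in neighbors[:r - 1]]
        while len(models) < r:
            models.append(frame_features[p, q])
        weights = np.full(r, 1.0 / r)       # equal, normalized initial weights
        return np.array(models), weights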

B. Background Models Selection

The probability of a model feature vector being produced by the background process is directly related to its weight: the larger the weight, the higher the probability that the model was produced by the background process. Therefore, at each time step, when we sort the models in decreasing order of their weights, the most probable background models are at the top of the list. The first b model feature vectors are selected as the background models as follows:

b = \arg\min_j \left( \sum_{i=1}^{j} w_i > T_B \right), \quad T_B \in [0, 1] \qquad (10)

where T_B is a user-settable threshold.

As an illustrative example to depict the strengths of model level fuzzy aggregation and neighborhood supported model initialization, we considered the EE sequence, for which we show the probability of b across all the frames for one static (E) and one dynamic (F) pixel. We added a frame snapshot and a table in Fig. 4. The observations are very interesting. First, as compared to the static pixel, the dynamic pixel requires more than one background model more frequently (34% of the total frames for the dynamic background pixel as compared to only 12.3% for the static pixel), showing the necessity of model level fuzzy aggregation, where multiple models are maintained to accommodate many background models. Second, as compared to the traditional model initialization (without neighborhood support), our neighborhood supported initialization leads to fewer occurrences of more than one background model (34% as against 89% for traditional initialization) in dynamic conditions. The remaining models (total models minus background models) in the proposed initialization thereby accommodate newly arriving outliers, leading to faster convergence, as shown in Figs. 2 and 3.

C. Fuzzy Integral Calculation for All the Models

In the current frame, the same set of features is computed at the corresponding pixel to form the current feature vector (X_{p,q}). Then the similarity functions are evaluated between the


corresponding elements of the current feature vector (X_{p,q}) and the model feature vectors (X^{m_k}_{p,q}), where k varies from 1 to r, using Eq. (5). Using the ordered similarity function values and the membership values of all the features, the fuzzy integral for each model is calculated using (6) or (7) to obtain the set (F_1, \ldots, F_r).
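A compact sketch of this per-model computation, reusing the lambda_measure helper sketched in Section III-B (the function name and the eps guard are ours), could read:

    import numpy as np

    def fuzzy_similarity(x_model, x_current, mu_singletons, lam, integral="choquet"):
        eps = 1e-12                       # guards the ratio when both entries are zero
        h = np.minimum(x_model, x_current) / (np.maximum(x_model, x_current) + eps)  # Eq. (5)
        order = np.argsort(h)             # non-decreasing h(x'_1) <= ... <= h(x'_n)
        h_sorted = h[order]
        mu_sorted = np.asarray(mu_singletons, dtype=float)[order]
        n = len(h)
        # mu({x'_i, ..., x'_n}) for each i, via the Sugeno lambda-measure
        mu_tail = np.array([lambda_measure(mu_sorted[i:], lam) for i in range(n)])
        if integral == "choquet":
            h_prev = np.concatenate(([0.0], h_sorted[:-1]))       # h(x'_0) = 0
            return float(np.sum((h_sorted - h_prev) * mu_tail))   # Eq. (6)
        return float(np.max(np.minimum(h_sorted, mu_tail)))       # Eq. (7)

Computing F_k = fuzzy_similarity(models[k], x, mu, lam) for k = 1, ..., r yields the set (F_1, ..., F_r) used below.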

Algorithm 1 AFABS

D. Model Updating

The models of the pixels should be updated in order to cope with the changes that take place in the background, as follows:
• Case-1: If max(F_1, \ldots, F_r) < T_P, the current feature vector replaces the model feature vector having the lowest weight:

X^{m_g}_{p,q} = X_{p,q} \qquad (11)

where g is the index of the least weighted model.
• Case-2: If max(F_1, \ldots, F_r) ≥ T_P, the best matching model, that is, the model feature vector having the maximum integral value, is updated with the current feature vector as follows:

X^{m_s}_{p,q}(t) = (1 - \alpha_b) X^{m_s}_{p,q}(t-1) + \alpha_b X_{p,q}(t) \qquad (12)

where t corresponds to the current frame, α_b is the feature updating learning rate, and s is the index of the best matching model. Furthermore, the weights of the model feature vectors are updated as follows:

w_k = \alpha_w M_k + (1 - \alpha_w) w_k, \quad \alpha_w \in [0, 1] \qquad (13)

where α_w is the weight updating learning rate and

M_k = \begin{cases} 1, & \text{if } k = s \\ 0, & \text{otherwise} \end{cases}

Thus the weights increase for longer existent models, enabling us to choose the highly weighted models as the most probable background models according to Eq. (10). In both cases, the weights of the models are normalized using Eq. (14) so as to make their sum equal to one:

w_k = \frac{w_k}{w_1 + w_2 + \cdots + w_r} \qquad (14)
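The two cases can be sketched as follows; our reading applies Eq. (13) only in Case-2, since M_k is defined through the best-match index s, and the names and in-place updates are illustrative.

    import numpy as np

    def update_models(models, weights, x, F, T_P=0.7, alpha_b=0.05, alpha_w=0.005):
        if F.max() < T_P:                    # Case-1: no model matches
            g = int(np.argmin(weights))      # least-weighted model
            models[g] = x                    # Eq. (11)
            matched = None
        else:                                # Case-2: best matching model s
            s = int(np.argmax(F))
            models[s] = (1 - alpha_b) * models[s] + alpha_b * x   # Eq. (12)
            M = np.zeros_like(weights)
            M[s] = 1.0
            weights[:] = alpha_w * M + (1 - alpha_w) * weights    # Eq. (13)
            matched = s
        weights[:] = weights / weights.sum() # Eq. (14): renormalize
        return matched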

E. Foreground Detection

In Case-1 of Sec. IV-D, the current feature vector does not match any of the models, and hence the pixel is labeled as foreground. In Case-2, if the matched model's index s is less than the number of most probable background models b, the pixel is classified as background; otherwise, it is classified as foreground.

F. The AFABS Background Subtraction Algorithm

The steps described above are summarized in Algorithm 1.
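Putting Sections IV-B through IV-E together, a per-pixel driver in the spirit of Algorithm 1 might look like the sketch below; this is our own reconstruction, reusing the fuzzy_similarity and update_models helpers sketched above.

    import numpy as np

    def afabs_pixel(models, weights, x, mu, lam, T_P=0.7, T_B=0.7,
                    alpha_b=0.05, alpha_w=0.005):
        # Fuzzy similarity of the current feature vector to every model
        F = np.array([fuzzy_similarity(m, x, mu, lam) for m in models])
        matched = update_models(models, weights, x, F, T_P, alpha_b, alpha_w)
        if matched is None:                 # Case-1: label as foreground
            return "foreground"
        # Eq. (10): b = number of top-weighted models whose weights cover T_B
        order = np.argsort(-weights)
        idx = int(np.searchsorted(np.cumsum(weights[order]), T_B, side="right"))
        b = min(idx + 1, len(weights))
        return "background" if matched in order[:b] else "foreground"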

G. Choice of Parameters

Our approach uses two types of parameters: those used to compute the feature vector, and those pertaining to the generic background subtraction algorithm. The effects of the former are experimentally explained in Sec. V.D, whereas the effects of the latter are common to all background subtraction algorithms, and hence only some theoretical explanations are added here for the sake of completeness. The number of models (r) for a pixel is kept low for uni-modal background distributions and high for multi-modal background distributions. A higher value of r for a pixel results in larger memory requirements and more computation, while a lower value of r may not capture the complete background variations at a pixel; we have chosen an intermediate value of 4 and kept it constant. T_B, which is used to select the background models, should be chosen high for multi-modal background distributions and low for uni-modal background distributions; we chose its value to be 0.7. The background model updating is controlled by two learning rate parameters, α_b and α_w. We have experimentally obtained better results by keeping α_b constant (= 0.05) for all the video sequences. This is not the case for α_w, whose value depends on the type of video sequence, as explained below. The larger the value of α_w, the faster the background variations are incorporated into the background model. Due to this, a newly added model that belongs to the foreground, such as a slowly moving object, may soon reach the top and be recognized as background, creating a huge number of false negatives. But a


lower value of this parameter has the converse effect. We chose a trade-off value for α_w depending upon the type of video sequence: for sequences with fast background changes (EE, WT), α_w is set to a high value (= 0.01), and for the remaining sequences its value is 0.005. T_P, the distance threshold, is chosen empirically; its value in all our experiments is varied within (0.6, 0.8).
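For reference, the operating point described in this subsection can be collected into a single configuration; the dictionary structure and key names are ours, and T_P is tuned per sequence within the quoted range.

    # Parameter values as reported in Sec. IV-G
    AFABS_PARAMS = {
        "r": 4,                            # models per pixel
        "T_B": 0.7,                        # background selection threshold, Eq. (10)
        "alpha_b": 0.05,                   # feature updating learning rate, Eq. (12)
        "alpha_w": {"fast_dynamic": 0.01,  # e.g., EE, WT sequences
                    "default": 0.005},     # weight updating learning rate, Eq. (13)
        "T_P_range": (0.6, 0.8),           # distance threshold, tuned per sequence
    }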


TABLE I
QUANTITATIVE PERFORMANCE COMPARISON OF PIXEL AND MODEL LEVEL FUSION USING ACE FOR CHOQUET INTEGRAL

V. RESULTS AND DISCUSSIONS

This section presents our experiments illustrating the performance of the proposed method on many types of outdoor and indoor video sequences having various challenging situations. The section is composed of five subsections. Section V.A presents the mutual comparison between pixel and model level fusion (AFABS) for both Choquet and Sugeno integrals. Section V.B compares the AFABS approach with the state-of-the-art approaches; in that section, we implicitly compare both Choquet and Sugeno integrals using the AFABS approach. Common parameters such as the learning rates, the number of models, the background model selection threshold, the region size, etc. are kept the same across the competing algorithms, while the distance thresholds of all the algorithms are varied such that their best results are used for comparison. Section V.C demonstrates the effectiveness of AFABS in tackling other canonical background challenges. Experiments on parameter analysis are presented in Sec. V.D. Complexity and computational issues are discussed in Sec. V.E.

To measure the effectiveness of the proposed algorithm quantitatively, four indices, namely, Average Classification Error (ACE), f_0-measure, f_1-measure [38], and f_joint-measure, are used. f_0 and f_1 are the measures of the foreground and the background respectively, whereas f_joint is a combined measure of the foreground and the background. For an algorithm to be efficient, it should have high f-measures and a low average error.

ACE = \frac{FP + FN}{m}, \quad f_0 = \frac{2 P_0 R_0}{P_0 + R_0}, \quad f_1 = \frac{2 P_1 R_1}{P_1 + R_1}, \quad f_{joint} = \frac{2 f_0 f_1}{f_0 + f_1} \qquad (15)

where P_0 = \frac{TN}{TN + FN}, R_0 = \frac{TN}{TN + FP}, P_1 = \frac{TP}{TP + FP}, and R_1 = \frac{TP}{TP + FN}. FP, FN, TP, and TN respectively indicate false positives, false negatives, true positives, and true negatives, and m is the number of ground-truth frames. It may be noted that for all our experiments, post-processing operations such as morphological filtering, median filtering, and shadow elimination are not applied in any of the competing algorithms, to maintain fairness in the comparison.
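The four indices of Eq. (15) are straightforward to compute from binary masks; the sketch below (ours) accumulates the counts over the m ground-truth frames, with True denoting foreground.

    import numpy as np

    def evaluate(masks, truths):
        FP = FN = TP = TN = 0
        for pred, gt in zip(masks, truths):
            TP += int(np.sum(pred & gt))
            TN += int(np.sum(~pred & ~gt))
            FP += int(np.sum(pred & ~gt))
            FN += int(np.sum(~pred & gt))
        m = len(truths)
        ACE = (FP + FN) / m                # average classification error, Eq. (15)
        P0, R0 = TN / (TN + FN), TN / (TN + FP)
        P1, R1 = TP / (TP + FP), TP / (TP + FN)
        f0 = 2 * P0 * R0 / (P0 + R0)
        f1 = 2 * P1 * R1 / (P1 + R1)
        f_joint = 2 * f0 * f1 / (f0 + f1)
        return ACE, f0, f1, f_joint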

A. Performance Comparison Between Pixel and Model Level Fusion (AFABS)

This comparison depicts the superiority of ST features over color (in different spaces), gradient (GR), and LBP features, as well as the superiority of AFABS (model level fusion) over simple feature fusion techniques [33], [34] (pixel level fusion). As part of the feature fusion comparison, the simple fusion algorithm [34] is implemented using various features (R-G-B, R-G-GR, R-G-LBP, I-ST, and I-FST [36]). A study on two more color spaces (Ohta and Y-Cb-Cr) has been added for the Choquet integral. We considered the G color channel to calculate the intensity and the ST features. In the case of dynamic backgrounds (EE, WT, CA, SW¹), foreground aperture (WT), camouflage (SW), and local intensity variations (IND,² WK³), the performance of I-ST fusion is far better than that of R-G-B, R-G-GR, Ohta color, Y-Cb-Cr, and R-G-LBP fusion, as shown visually in Figs. 5 and 6 and numerically in Tables I and II for both Choquet and Sugeno integrals. Using AFABS (fusion at the model level), a significant improvement in segmentation quality over simple I-ST and I-FST fusion (fusion at the pixel level) is obtained in the case of dynamic backgrounds, and a marginal improvement in the case of nearly stationary backgrounds (IND, WK), as may be noted visually in Figs. 5 and 6 and numerically in Tables I and II. As the first frame is used for model initialization, the model is wrongly initialized for the BT sequence due to the existence of foreground objects at t = 0. Hence, all the simple fusion approaches fail to detect the correct moving objects, but AFABS succeeds in this situation for both Choquet and Sugeno integrals. The average classification error (Eq. 15) of all these approaches on different video sequences is shown in Tables I and II for both Choquet and Sugeno integrals, where lower average classification error is achieved for all the sequences by the AFABS approach.

¹ M. Wu and X. Peng, "Spatio-temporal context for codebook-based dynamic background subtraction," Int. J. Electron. Commun., vol. 64, no. 8, pp. 739-747, 2010.
² http://www.cvg.rdg.ac.uk/slides/pets.html
³ http://homepages.inf.ed.ac.uk/rbf/CAVIAR/

Fig. 5. Qualitative performance comparison of pixel ((b) to (h)) and model level fusion (i) using Choquet integral: (a) input frames, (b) R-G-B, (c) R-G-GR, (d) R-G-LBP, (e) Ohta, (f) Y-Cb-Cr, (g) I-ST, (h) I-FST, and (i) AFABS.

B. Performance Comparison of AFABS (S) and AFABS (C) With the State-of-the-Art

AFABS is a multi-feature, model level fuzzy aggregation method whose performance is further compared with single-feature based approaches (color: IMOG [5], CB [7], T2MOG [39],⁴ SOBS [8]⁵; texture: LBPH [14]) and multi-feature based approaches (STBS [21], MCBS [20]) to show its effectiveness. STBS and MCBS are used for comparison to depict the advantages of fuzzy over covariance fusion, as they use covariance-based fusion of multiple features. AFABS is a hybrid method which uses two different types of features, pixel-based (intensity) and region-based (texture), fused at the model level, and it is compared with the approaches (CB, IMOG, T2MOG, SOBS, LBPH) that use only one type of feature to depict the advantages of using two different types of features. Furthermore, IMOG (improved MOG over Stauffer and Grimson [5]) and CB are the most standard methods generally used for comparison in this field. T2MOG is used for comparison since both it and our method use fuzzy concepts, making it particularly relevant. LBPH, STBS, MCBS, and SOBS use neighborhood characteristics for modeling each pixel, similar to our neighbor-based texture computation, and are hence also relevant for comparison.

In dynamic backgrounds (Fig. 7) such as swaying vegetation (EE, SW, CA) and water surfaces (WA), our fuzzy approaches (AFABS(S) and AFABS(C)) segment moving objects with fewer false positives than the others. Being pixel-based models, they outperform LBPH, MCBS, and STBS, which are block-based models.

⁴ http://sites.google.com/site/t2fmog/source
⁵ http://www.na.icar.cnr.it/~maddalena.l/MODLab/SoftwareSOBS.html

In the CA sequence, where the foreground object is hardly visible, our approach is sensitive enough to detect small foreground movements against dynamic backgrounds better than the others. Multiple moving objects in the SW, BT, and IND sequences (Fig. 7) are well segmented by our approaches. In the IND sequence, closely moving objects are detected as separate blobs due to pixel level fuzzy fusion, whereas the block level approaches (LBPH, STBS, MCBS) detect them as a single blob. The performance of our approaches in the presence of shadows (IND, BT, CF, IR⁶ and MSA⁷) (Fig. 7) is noteworthy, whereas the remaining approaches are severely affected by shadows. False negatives are observed inside the detected foreground objects when applying LBPH on the IND, MSA, WA, and CF video sequences, due to texture similarity between the foreground and the background caused by uniform texture; our approach is robust in the case of uniform texture. Against local intensity variations (lower left in the WK sequence), the performance of our algorithms is significantly better. Smaller objects in the WK, FO, DE, and CA sequences are detected with accurate shape by our approaches, at par with the approaches using only pixel-based features and better than the approaches using region-based features (LBPH, STBS, MCBS), even though the majority of the features in our approach are texture-based. In some of these video sequences, there are portions of the moving objects which have chromatic features similar to the background, such as EE (loin), WA (calf), SW (whole body), IR (whole body), CF (shirt), and WK (whole body), causing camouflage. But our approaches extract complete objects without any notable presence of false negatives. In all the sequences, our approach outperforms STBS, thereby indicating the superiority of fuzzy over covariance fusion. With respect to false negatives also, the performance of the AFABS algorithms is quite good in all the sequences. Among our approaches, AFABS(C) outperforms AFABS(S) in all the sequences, especially in dynamic backgrounds, as the Choquet integral is more suitable for cardinal aggregation. To summarize, reasonably better results are obtained across all the sequences by AFABS without using any post-processing operations and without changing the parameters (except T_P) across the sequences.

⁶ http://cvrr.ucsd.edu/aton/shadow/index.html
⁷ http://cvprlab.uniparthenope.it

Fig. 6. Qualitative performance comparison of pixel ((b) to (e)) and model level fusion (f) using Sugeno integral: (a) input frames, (b) R-G-B, (c) R-G-GR, (d) R-G-LBP, (e) I-ST, and (f) AFABS.

TABLE II
QUANTITATIVE PERFORMANCE COMPARISON OF PIXEL AND MODEL LEVEL FUSION USING ACE FOR SUGENO INTEGRAL

The average classification error and f-measures of all the competing approaches are compared in Tables III and IV, respectively. Our approaches (AFABS (S) and AFABS (C)) offer lower average classification error than the others. From Table IV, on an average over all the sequences (AVG), AFABS(S) and AFABS(C) exhibit higher f-measure values than the others, signifying their effectiveness. In both measures, AFABS(C) offers lower average error and higher f-measures than AFABS(S) for all the video sequences used for comparison. On an average, we generated fifty ground truths for each sequence used in these quantitative evaluations.

C. Effectiveness of AFABS in Tackling Other Canonical Background Challenges

In addition to the above experiments, we demonstrate the robustness of AFABS (S) and AFABS (C) against other canonical background problems existing in real environments such as an airport (AP), a shopping mall (SM), a parking lot (PE⁸), and a campus (DE), as shown in Fig. 8. Our approaches adapt well to illumination variations (TD, CU), a relocated background object (MO), and the introduction of a car into the background (PE), as can be noticed by comparing the current frame with the first frame of the video. AFABS also extracts multiple moving objects with better shape localization, and as isolated blobs, in the DE, AP, and SM sequences. Thick shadows in the SM and AP sequences are also well handled by our approaches. Due to the unavailability of ideal background frames in the SM, AP, and BT sequences, we initialized the model with moving objects; after a few frames of adaptation of the model to the true background, correct segmentation results are obtained. AFABS also handles well the curtain movements (CU) in the background, extracting the foreground with little noise.

⁸ http://www.cvg.rdg.ac.uk/slides/pets.html


Fig. 7. Qualitative performance comparison between all the competing approaches: (a) input frames, (b) CB, (c) SOBS, (d) IMOG, (e) T2MOG, (f) LBPH, (g) STBS, (h) MCBS, (i) AFABS (S), and (j) AFABS (C).

Fig. 8. Qualitative evaluation of the proposed approaches for other canonical challenges: (a) first frame, (b) input frames, (c) ground truths, (d) AFABS (S), (e) AFABS (C).

TABLE III
QUANTITATIVE PERFORMANCE COMPARISON BETWEEN ALL THE COMPETING APPROACHES USING ACE

D. Parameter Analysis of Feature Vector

The optimization of the parameters used to construct the feature vector in the proposed algorithm is described using the CA sequence, as follows. This sequence is specifically chosen as it contains relatively small objects moving against dynamic backgrounds, and we show how the parameter selection influences both the shape and the robustness of the segmentation.

1) Larger Region Size (M × N) results in the incorporation of larger-scale information into the co-occurrence matrix. Texture features calculated from such a co-occurrence matrix characterize global properties, and hence the foreground discrimination capability of the algorithm against the background reduces, especially in the case of small moving objects, resulting in an increase in the number of false negatives and a decrease in the number of false positives, as shown in Fig. 9. The smaller moving object is left undetected for the region size of 17 × 17 in Fig. 9. Keeping a smaller region size is essential to capture the local properties in order to obtain accurate shape localization of moving objects; a few false positives may arise due to this, so a trade-off value is chosen. We obtained optimum performance for a region size of 9 × 9.

Fig. 9. Extracted foreground for various values of local region sizes.

2) Co-occurrence Matrix Parameter d affects the shape of the moving object. A larger value of d considers the inter-pixel relation of farther pixels while calculating the co-occurrence matrix; due to this more global information, the shape of the moving objects gets lost. False positives also get reduced with an increase in d up to a certain value (d = 2 in our case), as shown in Fig. 10. Beyond a certain limit, the noise increases, as the inter-pixel relation of two very distant pixels is not stable and reliable when there are background movements.

Fig. 10. Extracted foreground for various values of d.

3) Effects of the Number of Quantized Levels on the segmentation performance are shown in Fig. 11. Using more quantization levels (l) increases the co-occurrence matrix size, thereby increasing the computational complexity, but the amount of texture information is also higher, and as a result a better segmentation is obtained. Using fewer quantization levels increases the quantization noise, especially in dynamic backgrounds where the intensity variations are frequent. As evident from Fig. 11, background noise increases with the reduction in the number of quantization levels. We used a trade-off value of 4, balancing the computational complexity and the noise.

Fig. 11. Extracted foreground for various values of l.

To show the independence of the segmentation performance from the choice of the co-occurrence matrix parameter θ, the sequence is segmented for various values of θ and the results are shown in Fig. 12. The segmented outputs are almost the same for the various values of θ. We have chosen the θ value as 135° in all our experimentations.

Fig. 12. Extracted foreground for various values of θ.

Membership values (μ) for intensity, local homogeneity, energy, and mean are chosen as 0.45, 0.25, 0.15, and 0.25, respectively. These fuzzy measure values were determined after carrying out a sufficient number of experiments on different kinds of video sequences. Giving larger weights to the texture features results in better performance in dynamic backgrounds, but at the cost of shape localization, as the texture features are inherently region-based in nature; the converse is true in the case of intensity. Hence, to satisfy both requirements, we assigned roughly equal total weights to the intensity and texture features (0.45 and 0.65) and kept them constant across all the experimentations reported in this paper.

E. Complexity and Computational Issues

The complexity of the entire algorithm is mainly due to the texture computation, with a complexity of O(l²), where l is the number of quantized intensity levels. For the entire image, it is O(XYl²), where X × Y is the frame size. The computational complexity would have been definitely high,


if we had chosen the intensity levels over the entire dynamic range (that is, 0-255). To reduce the complexity, our texture features are computed over quantized intensity values, which are just four. The local region size used to calculate the texture features for each pixel is also chosen to be very small (9 × 9). Even with only four intensity levels and a smaller region size, we are able to obtain better results. Since the computation of the texture features is independent at each pixel, parallel programming can be employed to speed up the computation; the concept of the integral image can also be applied to speed it up further. The frame rate comparison of all the competing approaches is given in Table V, from which we may claim that our approach's frame rate (fps) is high compared to the other multi-feature based methods. Though the frame rate of our approach is less than that of the pixel-feature based methods, its performance against various challenging situations in comparison with those methods is noteworthy. All the experiments were carried out on a system with an Intel Core 2 Duo processor and 2 GB of RAM. With more recent multi-core processors along with multi-threaded programming, the frame rate can be significantly improved.

TABLE IV
QUANTITATIVE PERFORMANCE COMPARISON BETWEEN ALL THE COMPETING APPROACHES USING f-MEASURES: (a) f_0, (b) f_1, AND (c) f_joint

TABLE V
FRAME RATE (fps) COMPARISON OF ALL THE COMPETING APPROACHES

The WT, BT, CF, TD, and MO sequences can be downloaded from [40]; the CA, AP, SM, CU, and WA sequences can be downloaded from [41].

VI. CONCLUSION

A model level fuzzy aggregation based background subtraction algorithm using intensity and ST features has been presented, and its superiority over pixel level fusion of other features is shown visually and numerically for both Sugeno and Choquet integrals. The models at a pixel are initialized with the neighbors' feature vectors for faster convergence of the model by adapting to the spatially occurring background variations. Qualitative and quantitative experiments are carried out to show the effectiveness of AFABS in handling various challenging situations in comparison with the state-of-the-art. AFABS (C)'s performance is superior to that of AFABS (S). Being a hybrid approach, AFABS inherits the advantages of both pixel- and region-based features, extracting moving objects with accurate shape and with minimum error in dynamic backgrounds. Our approach does not require explicit background model training for model initialization, unlike many other approaches such as MOG, CB, SOBS, etc., and we do not assume any underlying model (for example, Gaussian) for each pixel. Our method is simpler than other multi-feature covariance based fusion methods (STBS, MCBS), as these require additional computations of the mean and


covariance. The effectiveness of model level fuzzy fusion as compared to covariance fusion is also illustrated. Future work on fuzzy aggregation based background subtraction should address further improved performance in heavily dynamic background situations, where AFABS, though better than all the competing state-of-the-art approaches, still shows some limitations in subjective quality.

REFERENCES

[1] M. Cristani, M. Farenzena, D. Bloisi, and V. Murino, "Background subtraction for automated multi-sensor surveillance: A comprehensive review," EURASIP J. Adv. Signal Process., vol. 2010, pp. 20-25, 2010.
[2] T. Bouwmans, F. El Baf, and B. Vachon, "Background modeling using mixture of Gaussians for foreground detection—A survey," Recent Patents Comput. Sci., vol. 1, no. 3, pp. 219-237, 2008.
[3] C. Chiu, M. Ku, and L. Liang, "A robust object segmentation system using a probability-based background extraction algorithm," IEEE Trans. Circuits Syst. Video Technol., vol. 20, no. 4, pp. 518-528, 2010.
[4] C. Stauffer and W. E. L. Grimson, "Adaptive background mixture models for real-time tracking," in Proc. IEEE Conf. CVPR, vol. 2, 1999, pp. 246-252.
[5] Z. Tang and Z. Miao, "Fast background subtraction and shadow elimination using improved Gaussian mixture model," in Proc. IEEE Workshop Haptic, Audio, Visual Environ. Games, 2007, pp. 541-544.
[6] A. Elgammal, D. Harwood, and L. S. Davis, "Non-parametric model for background subtraction," in Proc. ECCV, 2000, pp. 751-767.
[7] K. Kim, T. H. Chalidabhongse, D. Harwood, and L. S. Davis, "Real-time foreground-background segmentation using codebook model," Real-Time Imag., vol. 11, no. 3, pp. 172-185, 2005.
[8] L. Maddalena and A. Petrosino, "A self-organizing approach to background subtraction for visual surveillance applications," IEEE Trans. Image Process., vol. 17, no. 7, pp. 1168-1177, 2008.
[9] A. Mittal and N. Paragios, "Motion-based background subtraction using adaptive kernel density estimation," in Proc. CVPR, vol. 2, 2004, pp. 302-309.
[10] D. Jang, X. Jin, Y. Choi, and T. Kim, "Background subtraction based on local orientation histogram," in Proc. APCHI, 2008, pp. 222-231.
[11] Z. Li, P. Jiang, H. Ma, J. Yang, and D. Tang, "A model for dynamic object segmentation with kernel density estimation based on gradient features," Image Vis. Comput., vol. 27, no. 6, pp. 817-823, 2009.
[12] L. Hu, W. Liu, B. Li, and W. Xing, "Robust motion detection using histogram of oriented gradients for illumination variations," in Proc. 2nd Int. Conf. Ind. Mech. Autom., vol. 2, 2010, pp. 443-447.
[13] S. Zhang, H. Yao, and S. Liu, "Dynamic background modeling and subtraction using spatio-temporal local binary patterns," in Proc. 15th IEEE ICIP, 2008, pp. 1556-1559.
[14] M. Heikkila and M. Pietikainen, "A texture-based method for modeling the background and detecting moving objects," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 4, pp. 657-662, 2006.
[15] L. Wang and C. Pan, "Fast and effective background subtraction based on ELBP," in Proc. ICASSP, 2010, pp. 1394-1397.
[16] L. Wang, H. Wu, and C. Pan, "Adaptive ELBP for background subtraction," in Proc. ACCV, 2010, pp. 560-571.
[17] G. Xue, J. Sun, and L. Song, "Dynamic background subtraction based on spatial extended center-symmetric local binary pattern," in Proc. IEEE ICME, 2010, pp. 1050-1054.
[18] Y. Satoh and K. Sakaue, "Robust background subtraction based on bi-polar radial reach correlation," in Proc. IEEE TENCON, 2005, pp. 998-1003.
[19] K. Yokoi, "Probabilistic BPRRC: Robust change detection against illumination changes and background movements," IEICE Trans. Inf. Syst., vol. E93-D, no. 7, pp. 1700-1707, 2010.
[20] S. Zhang, H. Yao, S. Liu, X. Chen, and W. Gao, "A covariance-based method for dynamic background subtraction," in Proc. ICPR, 2008, pp. 1-4.
[21] P. Chiranjeevi and S. Sengupta, "Moving object detection in the presence of dynamic backgrounds using intensity and textural features," J. Electron. Imag., vol. 20, no. 4, pp. 043009-1–043009-11, 2011.
[22] P. Chiranjeevi and S. Sengupta, "Spatially correlated background subtraction, based on adaptive background maintenance," J. Vis. Commun. Image Represent., vol. 23, no. 6, pp. 948-957, 2012.
[23] S. Jabri, Z. Duric, H. Wechsler, and A. Rosenfeld, "Detection and location of people using adaptive fusion of color and edge information," in Proc. 15th ICPR, vol. 4, 2000, pp. 627-630.
[24] M. Izadi and P. Saeedi, "Robust region-based background subtraction and shadow removing using color and gradient information," in Proc. 19th ICPR, 2008, pp. 1-5.


[25] Q. Wan and Y. Wang, "Background subtraction based on adaptive non-parametric model," in Proc. 7th WCICA, 2008, pp. 5960-5965.
[26] J. Yao and J. M. Odobez, "Multi-layer background subtraction based on color and texture," in Proc. IEEE CVPR, 2007, pp. 1-8.
[27] X. Jian, D. Xiao-qing, W. Sheng-jin, and W. You-shou, "Background subtraction based on a combination of texture, color and intensity," in Proc. IEEE 9th ICSP, 2008, pp. 1400-1405.
[28] J. Yang, J. Wang, and H. Lu, "A hierarchical approach for background modeling and moving objects detection," Int. J. Control, Autom., Syst., vol. 8, no. 5, pp. 940-947, 2010.
[29] Y. Chen, C. Chen, C. Huang, and Y. Hung, "Efficient hierarchical method for background subtraction," Pattern Recognit., vol. 40, no. 10, pp. 2706-2715, 2007.
[30] V. Torra and Y. Narukawa, Modeling Decisions: Information Fusion and Aggregation Operators. New York, NY, USA: Springer-Verlag, 2007.
[31] H. Zhang and D. Xu, "Fusing color and gradient features for background model," in Proc. 8th Int. Conf. Signal Process., vol. 2, 2006, pp. 1-10.
[32] H. Zhang and D. Xu, "Fusing color and texture features for background model," in Proc. 3rd Int. Conf. Fuzzy Syst. Knowl. Discovery, 2006, pp. 887-893.
[33] F. El Baf, T. Bouwmans, and B. Vachon, "A fuzzy approach for background subtraction," in Proc. ICIP, 2008, pp. 2648-2651.
[34] F. El Baf, T. Bouwmans, and B. Vachon, "Fuzzy integral for moving object detection," in Proc. IEEE Int. Conf. Fuzzy Syst., 2008, pp. 1729-1736.
[35] M. M. Azab, H. A. Shedeed, and A. S. Hussein, "A new technique for background modeling and subtraction for motion detection in real-time videos," in Proc. IEEE 17th ICIP, 2010, pp. 3453-3456.
[36] P. Chiranjeevi and S. Sengupta, "New fuzzy texture features for robust detection of moving objects," IEEE Signal Process. Lett., vol. 19, no. 10, pp. 603-606, 2012.
[37] R. M. Haralick, K. Shanmugam, and I. Dinstein, "Textural features for image classification," IEEE Trans. Syst., Man, Cybern., vol. 3, no. 6, pp. 610-621, 1973.
[38] S. Herrero and J. Bescos, "Background subtraction techniques: Systematic evaluation and comparative analysis," in Proc. ACIVS, 2009, pp. 33-42.
[39] F. El Baf, T. Bouwmans, and B. Vachon, "Type-2 fuzzy mixture of Gaussians model: Application to background modeling," in Proc. Int. Symp. Visual Comput., 2008, pp. 772-781.
[40] Wallflower Dataset, 1999. [Online]. Available: http://research.microsoft.com/jckrumm/wallflower/testimages.html
[41] I2R Dataset, 2004. [Online]. Available: http://perception.i2r.a-star.edu.sg/bk_model/bk_index.html

Pojala Chiranjeevi received the B.Tech. degree in electronics and communication engineering from Sree Vidyanikethan Engineering College, affiliated with Jawaharlal Nehru Technological University, Rangampet, India, in 2005, the M.Tech. degree in visual information processing and embedded systems and the Ph.D. degree in computer vision from the Department of Electronics and Electrical Communication Engineering, Indian Institute of Technology Kharagpur, Kharagpur, India, in 2008 and 2012, respectively. Currently, he is with the Samsung Advanced Institute of Technology, Bangalore, India. His current research interests include image processing, pattern recognition, computer vision, machine learning, and multimedia.

Somnath Sengupta received the bachelor’s degree in electronics and telecommunication engineering from Jadavpur University, West Bengal, India, in 1978, the M.Tech. degree in electrical engineering from the Indian Institute of Technology Madras, Chennai, India, in 1980, and the Ph.D. degree in electrical engineering from the Indian Institute of Technology Bombay, Mumbai, India, in 1993. From 1980 to 1991, he was with the Research and Development Laboratories of Tata Electric Companies, Mumbai. In 1991, he joined the faculty of electronics and electrical communication engineering with the Indian Institute of Technology Kharagpur (IIT-KGP), Kharagpur, India, where he is currently a Professor. From 2000 to 2002, he was a Visiting Fellow with the University of Edinburgh, Scotland, U.K. He has authored a number of journal and conference papers of international repute. He has contributed significantly to distant learning education through a number of video and web courses.
