Bio-Medical Materials and Engineering 24 (2014) 763–770 DOI 10.3233/BME-130865 IOS Press


Feature selection using mutual information based uncertainty measures for tumor classification

Lin Sun a,b,∗ and Jiucheng Xu b

a International WIC Institute, Beijing University of Technology, Beijing, China
b College of Computer and Information Engineering, Henan Normal University, Xinxiang, China

Abstract. Feature selection is a key problem in tumor classification and related tasks. This paper presents a tumor classification approach based on neighborhood rough set feature selection. First, uncertainty measures such as neighborhood entropy, conditional neighborhood entropy, neighborhood mutual information and neighborhood conditional mutual information are introduced to evaluate the relevance between genes and the decision in the neighborhood rough set model. Then, important properties and propositions of these measures are investigated, and the relationships among them are established. By improving the minimal-Redundancy-Maximal-Relevance criterion and combining it with a sequential forward greedy search strategy, a novel feature selection algorithm with low time complexity is proposed. Finally, several cancer classification tasks are demonstrated using the proposed approach. Experimental results show that the proposed algorithm is efficient and effective.

Keywords: Feature selection, neighborhood rough set, mutual information, tumor classification

1. Introduction

An emerging medical application domain for microarray gene expression profiling technology is clinical decision support, including tasks such as tumor classification and prediction of clinical outcomes in response to treatment [1]. In recent years, molecular diagnosis of tumors based on gene expression profiles has been a well-pursued topic aimed at precise diagnosis of tumors at their early stages [2]. However, high dimensionality and small sample size challenge tumor classification. Feature (gene) selection is an efficient way to address these problems [3]. Generally, genes are ranked according to their differential expression between normal and tumor samples, and genes scoring above a predefined threshold are considered candidate genes for the cancer under study [4]. In fact, a good gene selection method that can identify key tumor-related genes is of vital importance to tumor classification and to the identification of diagnostic and prognostic signatures for predicting therapeutic responses [5]. The method for extracting relevant genes thus becomes a key issue for cancer diagnosis [6].

∗ Corresponding author. E-mail: [email protected].



In gene selection, a measure of the relevance between genes and cancers is required. Moreover, to select the best subset of genes in terms of this measure, a search strategy also needs to be constructed. Filter and wrapper methods are the two major categories for selecting features from gene expression data [6]. Recently, research on feature selection has mainly focused on two aspects: criteria and search strategies [7,8]. Typically, a criterion tries to measure the discriminating ability of a feature or a subset of features to distinguish the different class labels. A number of measures, such as distance, correlation, and mutual information, have been applied in filters to evaluate the efficacy of a feature [9]. Techniques such as linear discriminant analysis, the fast correlation based filter, and filters using entropy and other information-theoretic concepts have been successfully applied to gene selection in clinical practice. Among these techniques, mutual information has been reported to be effective in selecting features. In the last few years, mutual information [1,6,7,9,10] has been introduced or developed for feature evaluation. Ding and Peng [11] proposed a gene selection approach based on minimal redundancy and maximal relevance (mRMR). Zhang et al. [12] combined mRMR and ReliefF for gene selection. In experimental comparisons, mRMR has been found very effective for feature selection in a number of applications [1,11,12]. Exhaustive evaluation of all possible feature subsets is usually infeasible in practice due to the large amount of computational effort required; as a result, a wide range of heuristic search strategies have been used, including forward selection, backward elimination, hill climbing, branch and bound algorithms, and stochastic algorithms such as simulated annealing and genetic algorithms [7].

At present, two important issues need to be addressed in mRMR-based gene selection. The first is computing the relevance between numerical variables. Gene expressions are numerical, and discretizing the data set usually leads to information loss. To address this, Peng et al. [13] used the Parzen window technique to compute the relevance between numerical features and the decision, and Hu et al. [1] proposed neighborhood mutual information to cope with continuous gene data and evaluate the relevance between features. In gene analysis, it is difficult to estimate the probability and joint probability of features for samples sparsely distributed in a high-dimensional space. Therefore, it is desirable to construct an effective and simple measure of relevance between continuous and discrete features for gene selection; this paper focuses on creating such a solution. The second issue is that mRMR is very time-consuming to compute. Gene expression data sets contain a large number of genes, only a few of which are essential for classification; hence, the existing mRMR is impractical when the gene selection algorithm is time-consuming. Based on these observations, some uncertainty measures are introduced to compute the relevance between genes and cancers in the neighborhood rough set model. These measures are natural generalizations of mutual information to numerical gene spaces. The property of gene data that only a minority of features is related to the task is then exploited, and an improved mRMR-based algorithm for efficient feature selection is proposed and applied to six tumor classification tasks.
Experimental results show that this algorithm is more effective than conventional feature selection approaches.

2. Neighborhood rough set and mutual information

In gene expression data analysis, the data take the form of a collection of real-valued vectors. Assume there are $c$ subclasses of cancers, and let $D = \{d_1, d_2, \ldots, d_s\}$ denote the class labels of $s$ samples, where $d_i = k$ indicates that sample $i$ belongs to cancer $k$, $k = 1, 2, \ldots, c$. Let $U = \{x_1, x_2, \ldots, x_n\}$ be a set of samples and $C = \{g_1, g_2, \ldots, g_m\}$ a set of genes; the corresponding gene expression matrix derived from a microarray can be represented as $M = \{x_{ij} \mid i = 1, 2, \ldots, m,\ j = 1, 2, \ldots, n\}$, where $x_{ij}$ is the measured expression level of gene $g_i$ in sample $x_j$, and usually $m \gg n$. Each row of the matrix corresponds to a gene and each column to a sample.


Let $IS = (U, C \cup D, V, f)$ be an information system for classification learning, where $U = \{x_1, x_2, \ldots, x_n\}$ is a nonempty and finite sample set called the sample space; $C$ is a nonempty and finite set of genes, also called condition attributes, which characterize the samples; $D$ is the set of output variables, also called decision attributes, with $C \cap D = \emptyset$; $V_a$ is the value domain of $a \in C \cup D$, with $V = \cup_{a \in C \cup D} V_a$; and $f: U \times (C \cup D) \to V$ is an information function which associates a unique value of each gene with every sample in $U$.

The neighborhood $\delta_P(x_i)$ of $x_i \in U$ in the subspace $P \subseteq C$ is defined as $\delta_P(x_i) = \{x_j \mid x_j \in U,\ \Delta_P(x_i, x_j) \le \delta\}$, where $\delta$ is a threshold and $\Delta_P(x_i, x_j)$ is a metric function on the subspace $P$. Three metric functions are widely used. Let $x_1$ and $x_2$ be two samples in the $m$-dimensional space $C = \{g_1, g_2, \ldots, g_m\}$, and let $f(x, g_i)$ denote the value of gene $g_i$ in sample $x$. The Minkowsky distance is

$$\Delta_p(x_1, x_2) = \Big( \sum_{i=1}^{m} |f(x_1, g_i) - f(x_2, g_i)|^p \Big)^{1/p}.$$

When $p = 1$, the Minkowsky distance is known as the Manhattan distance $\Delta_1$; when $p = 2$, it is the Euclidean distance $\Delta_2$; and when $p = \infty$, it is the Chebychev distance. The Manhattan distance is used in this paper.

The entropy of a random variable is a measure of its associated uncertainty, while the mutual information of two random variables is the reduction in uncertainty of one variable given knowledge of the other [10]. More specifically, if a random variable $X$ is a set of discrete features, its uncertainty can be measured by $H(X) = -\sum_{x \in X} p(x) \log_2 p(x)$, where $p(x)$ is the probability of the feature combination $x \in X$. Similarly, if $X$ and $Y$ are two discrete random variables, the joint entropy of $X \cup Y$ is $H(X \cup Y) = -\sum_{y \in Y} \sum_{x \in X} p(x, y) \log_2 p(x, y)$, where $p(x, y)$ denotes the joint probability density function of $X$ and $Y$. If $Y$ is given, the remaining uncertainty of $X$, known as the conditional entropy, is $H(X|Y) = -\sum_{y \in Y} p(y) \sum_{x \in X} p(x|y) \log_2 p(x|y)$, where $p(x|y)$ is the posterior probability of $X$ given $Y$. To quantify how much information is shared by two variables $X$ and $Y$, the mutual information is introduced as $I(X; Y) = \sum_{y \in Y} \sum_{x \in X} p(x, y) \log_2 \frac{p(x, y)}{p(x) p(y)}$. For continuous random variables, the entropy and mutual information are defined as $H(X) = -\int p(x) \log_2 p(x)\, dx$ and $I(X; Y) = \int\!\!\int p(x, y) \log_2 \frac{p(x, y)}{p(x) p(y)}\, dx\, dy$. Unfortunately, for real-world problems none of $p(x)$, $p(y)$ and $p(x, y)$ are known in practice, so $I(X; Y)$ cannot be computed directly and has to be estimated from the data set.

3. Mutual information-based uncertainty measures

In gene analysis, with the sample size being limited, it is impossible to precisely estimate the probabilities of genes [1]. Since mutual information is good at quantifying the information shared by two random variables, it is often taken as an evaluation criterion for the relevance between features and class labels. It therefore becomes important to employ mutual information in gene evaluation while overcoming the difficulty of estimating the probability density of genes. To this end, the concept of neighborhood is combined with information theory, and Shannon's entropy is generalized to numerical information for feature selection.

Definition 1. Let $U$ be a set of cancer samples, $P \subseteq C$ a subset of genes, and $\delta_P(x_i)$ the neighborhood of sample $x_i \in U$ in $P$. The neighborhood entropy of $P$ is defined as

$$NH_\delta(P) = -\frac{1}{|U|} \sum_{i=1}^{n} \log_2 \frac{|\delta_P(x_i)|}{|U|}, \qquad (1)$$

where $|X|$ denotes the cardinality of a set $X$. For two subsets of genes $P, Q \subseteq C$, the joint neighborhood entropy of $P \cup Q$ is defined as

$$NH_\delta(P \cup Q) = -\frac{1}{|U|} \sum_{i=1}^{n} \log_2 \frac{|\delta_{P \cup Q}(x_i)|}{|U|}, \qquad (2)$$

where $\delta_{P \cup Q}(x_i)$ is the neighborhood of sample $x_i \in U$ in $P \cup Q$.
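To make Definition 1 concrete, the following Python sketch (not the authors' implementation, which the paper states was written in Matlab) computes δ-neighborhoods under the Manhattan metric adopted above and the neighborhood entropy of Eq. (1). It assumes a samples-by-genes matrix `X` normalized to the unit interval; all function and variable names are illustrative.

```python
import numpy as np

def neighborhood(X, cols, delta):
    """Boolean n x n matrix: entry (i, j) is True iff x_j lies in delta_P(x_i),
    where P is the gene subset indexed by cols and the metric is Manhattan."""
    sub = X[:, list(cols)]
    dist = np.abs(sub[:, None, :] - sub[None, :, :]).sum(axis=2)
    return dist <= delta

def neighborhood_entropy(X, cols, delta):
    """NH_delta(P) of Eq. (1): -(1/|U|) * sum_i log2(|delta_P(x_i)| / |U|)."""
    n = X.shape[0]
    sizes = neighborhood(X, cols, delta).sum(axis=1)  # |delta_P(x_i)|, always >= 1
    return -np.mean(np.log2(sizes / n))
```

The joint entropy of Eq. (2) is obtained by calling `neighborhood_entropy` on the concatenated index sets of $P$ and $Q$.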


Proposition 1. The following properties hold:
(1) $0 \le NH_\delta(C) \le \log_2 |U|$, since $\delta_C(x_i) \subseteq U$ for any $x_i \in U$.
(2) If $\delta \le \delta'$, then $\delta_P(x_i) \subseteq \delta'_P(x_i)$ and $NH_\delta(P) \ge NH_{\delta'}(P)$ for any $x_i \in U$ and any subset of genes $P \subseteq C$.
(3) $NH_\delta(P \cup Q) \ge NH_\delta(P)$ and $NH_\delta(P \cup Q) \ge NH_\delta(Q)$ for any two subsets of genes $P, Q \subseteq C$.

Proposition 2. If $\delta = 0$, then $NH_\delta(P) = H(P)$ and $NH_\delta(P \cup Q) = H(P \cup Q)$, where $P, Q \subseteq C$.

Proof. Suppose $\delta = 0$. Then $\delta_P(x)$ coincides with the equivalence class $[x]_P$ of rough set theory, i.e., $\delta_P(x) = [x]_P$. Thus $\frac{|\delta_P(x_i)|}{|U|}$ is the probability of the feature combination of $x_i \in U$. Hence

$$NH_\delta(P) = -\frac{1}{|U|} \sum_{i=1}^{n} \log_2 \frac{|\delta_P(x_i)|}{|U|} = -\sum_{j} \frac{|[x_j]_P|}{|U|} \log_2 \frac{|[x_j]_P|}{|U|} = H(P).$$

Similarly, $NH_\delta(P \cup Q) = H(P \cup Q)$ can be proved. This completes the proof.

Definition 2. Let $U$ be a set of cancer samples and $P, Q \subseteq C$ two subsets of genes. The conditional neighborhood entropy of $P$ with respect to $Q$ (when $Q$ is given) is defined as

$$NH_\delta(P|Q) = -\frac{1}{|U|} \sum_{i=1}^{n} \log_2 \frac{|\delta_{P \cup Q}(x_i)|}{|\delta_Q(x_i)|}. \qquad (3)$$

If $D$ is the decision of the cancer samples, the conditional neighborhood entropy of $D$ given $P$ is defined as

$$NH_\delta(D|P) = -\frac{1}{|U|} \sum_{i=1}^{n} \log_2 \frac{|\delta_{D \cup P}(x_i)|}{|\delta_P(x_i)|}. \qquad (4)$$

Property 1. Let $U$ be a set of cancer samples, $P \subseteq C$ a subset of genes, and $D$ the decision of the cancer samples. If $\delta_P(x) \subseteq \delta_D(x)$ for any $x \in U$, the decision of sample $x \in U$ is said to be δ-neighborhood consistent.

Proposition 3. Let $U$ be a set of cancer samples and $P, Q \subseteq C$. Then $NH_\delta(P|Q) = NH_\delta(P \cup Q) - NH_\delta(Q)$.

Proof. It follows immediately from Definitions 1 and 2 that

$$NH_\delta(P \cup Q) - NH_\delta(Q) = -\frac{1}{|U|} \sum_{i=1}^{n} \Big( \log_2 \frac{|\delta_{P \cup Q}(x_i)|}{|U|} - \log_2 \frac{|\delta_Q(x_i)|}{|U|} \Big) = -\frac{1}{|U|} \sum_{i=1}^{n} \log_2 \frac{|\delta_{P \cup Q}(x_i)|}{|\delta_Q(x_i)|} = NH_\delta(P|Q).$$

This completes the proof.

Definition 3. Let $U$ be a set of cancer samples and $P, Q \subseteq C$ two subsets of genes. The neighborhood mutual information of $P$ and $Q$ is defined as

$$NMI_\delta(P; Q) = NH_\delta(P) + NH_\delta(Q) - NH_\delta(P \cup Q) = -\frac{1}{|U|} \sum_{i=1}^{n} \log_2 \frac{|\delta_P(x_i)|\, |\delta_Q(x_i)|}{|U|\, |\delta_{P \cup Q}(x_i)|}. \qquad (5)$$

If $D$ is the decision of the cancer samples, the neighborhood mutual information of $D$ and $P$ is defined as

$$NMI_\delta(D; P) = -\frac{1}{|U|} \sum_{i=1}^{n} \log_2 \frac{|\delta_D(x_i)|\, |\delta_P(x_i)|}{|U|\, |\delta_{D \cup P}(x_i)|}. \qquad (6)$$

Property 2. If $\delta_P(x) \subseteq \delta_D(x)$ for any $x \in U$, then for each sample $x$,

$$NMI_\delta(D; P)_x = -\log_2 \frac{|\delta_D(x)|\, |\delta_P(x)|}{|U|\, |\delta_{D \cup P}(x)|} = -\log_2 \frac{|[x]_D|\, |\delta_P(x)|}{|U|\, |\delta_P(x)|} = -\log_2 \frac{|[x]_D|}{|U|} = H(D)_x.$$

Obviously, if $\delta_P(x) \subseteq \delta_D(x)$ for any $x \in U$, then $NMI_\delta(D; P) = H(D)$.

Proposition 4. For any two subsets of genes $P, Q \subseteq C$, the following properties hold:
(1) $NMI_\delta(P; Q) \ge 0$.
(2) $NMI_\delta(P; Q) = NMI_\delta(Q; P)$.
(3) $NMI_\delta(P; Q) = NH_\delta(P) - NH_\delta(P|Q) = NH_\delta(Q) - NH_\delta(Q|P)$.
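Continuing the running sketch, Eqs. (5) and (6) follow directly from the helpers above. For a discrete decision $D$, $\delta_D(x)$ reduces to the decision class $[x]_D$, which the sketch encodes as label equality; `y` is assumed to be a numpy array of class labels, and the names are illustrative.

```python
def nmi(X, p_cols, q_cols, delta):
    """NMI_delta(P;Q) of Eq. (5) via the three neighborhood entropy terms."""
    return (neighborhood_entropy(X, p_cols, delta)
            + neighborhood_entropy(X, q_cols, delta)
            - neighborhood_entropy(X, list(p_cols) + list(q_cols), delta))

def nmi_decision(X, p_cols, y, delta):
    """NMI_delta(D;P) of Eq. (6); y holds the discrete class labels."""
    n = X.shape[0]
    nbr_p = neighborhood(X, p_cols, delta)
    nbr_d = y[:, None] == y[None, :]   # delta_D(x_i) = [x_i]_D
    joint = nbr_p & nbr_d              # delta_{D union P}(x_i)
    return -np.mean(np.log2(nbr_d.sum(1) * nbr_p.sum(1) / (n * joint.sum(1))))
```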


Mutual information is a measure of the generalized correlation between two feature sets, and it can also be interpreted as the amount of information shared by two feature sets, or the quantity of information one set can predict about the other [14]. Note that $NMI_\delta(P; Q)$ may also be called joint mutual information, since in some of the literature mutual information is defined only between two single features or variables.

Definition 4. Let $U$ be a set of cancer samples and $P, Q, S \subseteq C$ three subsets of genes. The neighborhood conditional mutual information between $P$ and $Q$ given the subset of genes $S$ is defined as

$$NCMI_\delta(P; Q|S) = -\frac{1}{|U|} \sum_{i=1}^{n} \log_2 \frac{|\delta_{P \cup S}(x_i)|\, |\delta_{Q \cup S}(x_i)|}{|\delta_S(x_i)|\, |\delta_{P \cup Q \cup S}(x_i)|}. \qquad (7)$$

If $D$ is the decision of the cancer samples, the neighborhood conditional mutual information between $P$ and $D$ given $S$ is defined as

$$NCMI_\delta(P; D|S) = -\frac{1}{|U|} \sum_{i=1}^{n} \log_2 \frac{|\delta_{P \cup S}(x_i)|\, |\delta_{D \cup S}(x_i)|}{|\delta_S(x_i)|\, |\delta_{P \cup D \cup S}(x_i)|}. \qquad (8)$$

Remark. Definition 4 states that $NCMI_\delta(P; Q|S)$ can be interpreted as how much information the feature set $P$ provides about the feature set $Q$ that the feature set $S$ cannot. At one extreme, if $NCMI_\delta(P; Q|S) = 0$, then $P$ and $Q$ are conditionally independent given $S$; that is, there is no extra information $P$ can provide about $Q$ when $S$ is known. In other words, $NCMI_\delta(P; Q|S)$ measures the information $Q$ brings about $P$ that is not already contained in $S$.

Proposition 5. For any three subsets of genes $P, Q, S \subseteq C$, the following properties hold:
(1) $NCMI_\delta(P; Q|S) \ge 0$.
(2) $NCMI_\delta(P; Q|S) = NH_\delta(P|S) - NH_\delta(P|Q \cup S)$.
(3) $NCMI_\delta(P; Q|S) = NCMI_\delta(Q; P|S)$.

Proposition 6. Let $P, S \subseteq C$ be two subsets of genes and $D$ the decision of the cancer samples. If $S$ is given and $\delta_P(x) \subseteq \delta_D(x)$ for any $x \in U$, then for each sample $x$,

$$NCMI_\delta(P; D|S)_x = -\log_2 \frac{|\delta_{P \cup S}(x)|\, |\delta_{D \cup S}(x)|}{|\delta_S(x)|\, |\delta_{P \cup D \cup S}(x)|} = -\log_2 \frac{|\delta_{D \cup S}(x)|}{|\delta_S(x)|} = NH_\delta(D|S)_x,$$

i.e., $NCMI_\delta(P; D|S)_x = NH_\delta(D|S)_x$.

Corollary 1. Let $P, S \subseteq C$ be two subsets of genes and $D$ the decision of the cancer samples. If $S$ is given and $\delta_P(x) \subseteq \delta_D(x)$ for any $x \in U$, then $NCMI_\delta(P; D|S) = NH_\delta(D|S)$.

Remark. Proposition 6 and Corollary 1 state that, when $S \subseteq C$ is given, if the decisions of the samples in gene set $P$ are δ-neighborhood consistent, then the neighborhood conditional mutual information equals the conditional neighborhood entropy generated by the decision classes.
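Eqs. (7) and (8) require only the four joint neighborhood sizes, so the running sketch extends naturally; as before, this is an illustrative reading rather than the authors' code, and `y` again holds discrete class labels.

```python
def ncmi(X, p_cols, q_cols, s_cols, delta):
    """NCMI_delta(P;Q|S) of Eq. (7) from the four joint neighborhood sizes."""
    ps = neighborhood(X, list(p_cols) + list(s_cols), delta).sum(1)
    qs = neighborhood(X, list(q_cols) + list(s_cols), delta).sum(1)
    s = neighborhood(X, s_cols, delta).sum(1)
    pqs = neighborhood(X, list(p_cols) + list(q_cols) + list(s_cols), delta).sum(1)
    return -np.mean(np.log2(ps * qs / (s * pqs)))

def ncmi_decision(X, p_cols, y, s_cols, delta):
    """NCMI_delta(P;D|S) of Eq. (8), with delta_D(x) = [x]_D for discrete labels."""
    nbr_d = y[:, None] == y[None, :]
    nbr_s = neighborhood(X, s_cols, delta)
    nbr_ps = neighborhood(X, list(p_cols) + list(s_cols), delta)
    return -np.mean(np.log2(nbr_ps.sum(1) * (nbr_s & nbr_d).sum(1)
                            / (nbr_s.sum(1) * (nbr_ps & nbr_d).sum(1))))
```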

4. Feature selection algorithm

Genes with greater mutual information with the decision provide more information for cancer classification; therefore, genes with large neighborhood mutual information should be selected. Based on this consideration, the maximal relevance criterion [1,15] is introduced, denoted by

$$Max(S, D) = \frac{1}{|S|} \sum_{g_i \in S} NMI_\delta(D; g_i).$$

Based on the ideas above, the following algorithm is presented; a code sketch follows it.

Algorithm 1. Neighborhood mutual information based feature selection (NMIFS)
Input: Set of genes $C = \{g_1, g_2, \ldots, g_m\}$, decision $D$ of the cancer samples, and $k \in [100, 200]$
Output: Selected gene subset $S$
(1) Initialize: $S \leftarrow \emptyset$
(2) Calculate $NMI_\delta(D; g_i)$ for every $g_i \in C$, $i = 1, 2, \ldots, m$
(3) Construct a descending order of the genes according to $Max(S, D)$
(4) Select the front $k$ genes and let $S = \{g_{i_1}, g_{i_2}, \ldots, g_{i_k}\}$, where $\{i_1, \ldots, i_k\} \subseteq \{1, \ldots, m\}$
(5) Return $S$
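A direct reading of Algorithm 1 as a sketch, reusing the helpers above; the default `k` follows the stated range and `delta` the paper's experimental setting, but both defaults are assumptions of the sketch.

```python
def nmifs(X, y, delta=0.2, k=150):
    """Algorithm 1 (NMIFS): rank all genes by NMI_delta(D; g_i), keep the top k."""
    scores = np.array([nmi_decision(X, [g], y, delta) for g in range(X.shape[1])])
    order = np.argsort(scores)[::-1]   # descending relevance to the decision
    return order[:min(k, X.shape[1])].tolist()
```

For instance, `nmifs(X, y)` returns the column indices of the k most relevant genes, which Algorithm 2 below then refines.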


It is known that ranking-based algorithms cannot remove redundant features, because they neglect the relevance between features [1]. In addition, Figure 1 illustrates the relationships between neighborhood mutual information and neighborhood entropy: after $g_1$ is selected, selecting the next gene $g_2$ makes no new contribution to reducing the uncertainty of $D$.

[Figure 1: diagram relating $NH_\delta(g)$, $NH_\delta(D)$ and $NMI_\delta(D; g)$.]
Fig. 1. Relationships between neighborhood mutual information and neighborhood entropy.

Since the neighborhood conditional mutual information proposed above is a valid measure of relevance, genes whose neighborhood conditional mutual information is less than a threshold can be deemed irrelevant to cancer recognition. These genes can thus be removed; they are recognized by computing the neighborhood conditional mutual information between each gene and the decision. Assume that $S_k = \{g_1, g_2, \ldots, g_k\}$ is the selected gene subset. To better evaluate the relevance and redundancy of the next genes to be selected, the mRMR criterion of [13] is adapted into a new criterion, called improved minimal-Redundancy-Maximal-Relevance:

$$\text{Min-Max}(S, D) = \frac{1}{|S|} \sum_{g' \in S_m - S_k} NCMI_\delta(g_{i+1}; D|g') \;-\; \frac{\beta}{|S|^2} \sum_{g_j \in S_k,\, g' \in S_m - S_k} NCMI_\delta(g_{i+1}; g_j|g'),$$

where $\beta$ regulates the relative importance of relevance and redundancy. Min-Max$(S, D)$ computes the significance of each gene one by one and sorts the features in descending order of significance. A specific classification algorithm is then used to test the best $s$ genes, where $s = 1, 2, \ldots, N$ and $N$ is the number of all candidate genes. Finally, the genes yielding the best performance are selected.

Algorithm 2. Neighborhood conditional mutual information based feature selection (NCMIFS)
Input: Set of genes $C = \{g_1, g_2, \ldots, g_m\}$, decision $D$ of the cancer samples, and $k \in [100, 200]$
Output: Selected gene subset $S$
(1) Initialize: $S \leftarrow \emptyset$
(2) Construct a selected gene subset $S_k = \{g_{i_1}, g_{i_2}, \ldots, g_{i_k}\}$ with Algorithm 1
(3) $S = \{g_{i_1}\}$
(4) Calculate $NCMI_\delta(g_{i+1}; D|S)$ and $NCMI_\delta(g_{i+1}; g_j|S)$ for any $g_i \in S_k$, where $i, j = 1, 2, \ldots, k$
(5) Construct a descending order according to Min-Max$(S, D)$
(6) Employ a specific classification algorithm to evaluate the gene subset series $S_1 = \{g_1\}$, $S_2 = \{g_1, g_2\}$, $\ldots$, $S_k = \{g_1, g_2, \ldots, g_k\}$, and select the subset that produces the best classification performance
(7) Return $S$

With NCMIFS, the time complexity of gene selection is polynomial. Given $N$ candidate features, computing the relevance between the $N$ features and the decision takes $O(N)$. Steps (4) and (5) remove irrelevant features by computing the relevance between features and the classification, a step in which most genes are deleted; these genes are not considered in the subsequent computation. Therefore, the time complexity of NCMIFS is approximately $O(N^2)$, and no more than $O(N^3)$.
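The sketch below realizes steps (2)–(5) of Algorithm 2 under one plausible reading of the Min-Max criterion, whose exact summation structure is hard to pin down from the printed formula: it conditions on the currently selected subset $S$ rather than on each $g'$ individually, to keep the example short. The defaults for `beta` and `n_select`, and the omission of the wrapper evaluation of step (6), are assumptions of the sketch.

```python
def ncmifs(X, y, delta=0.2, k=150, beta=1.0, n_select=20):
    """Sketch of NCMIFS: pre-rank with Algorithm 1, then greedily add the gene
    maximizing conditional relevance to D minus beta * mean conditional
    redundancy, both given the currently selected subset S."""
    cand = nmifs(X, y, delta, k)       # step (2): candidate pool from Algorithm 1
    S = [cand.pop(0)]                  # step (3): seed with the top-ranked gene
    while cand and len(S) < n_select:
        def min_max(g):
            rel = ncmi_decision(X, [g], y, S, delta)
            red = np.mean([ncmi(X, [g], [gj], S, delta) for gj in S])
            return rel - beta * red
        best = max(cand, key=min_max)  # step (5): pick the most significant gene
        S.append(best)
        cand.remove(best)
    return S                           # step (6) would wrap a classifier around
                                       # the nested subsets S_1, ..., S_k
```

Restricting the search to the k pre-ranked candidates is what keeps the cost near $O(N^2)$: the expensive conditional terms are computed only for genes that survive the cheap relevance ranking.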


5. Experimental results

The experiments are performed on a personal computer with Windows XP, an Intel(R) Core(TM) Dual CPU at 3.1 GHz, and 4 GB of memory. The NCMIFS algorithm is implemented in Matlab. To test the proposed algorithm, the six tumor classification data sets shown in Table 1 are collected. In this experiment, we combine the training and test sets, normalize the feature values to the unit interval, and set $\delta = 0.2$ according to the experimental results. To show the effectiveness of the proposed algorithm, NCMIFS is compared with two other feature selection algorithms: the neighborhood rough set algorithm (NRS) and the neighborhood mutual information based algorithm (NMI). Due to limited space, these two algorithms are not described here; their results can be found in [1]. The number of genes selected by the three algorithms is given in Table 1. It is striking that only a few genes are selected for the tumor classification tasks. From Table 1, it can be observed that NCMIFS exhibits the best performance on most of the six data sets, the exception being SRBCT, and outperforms NMI by a small margin, although the two are very close.

Table 1
Number of genes selected with three forward greedy algorithms

Data sets    Genes   Classes   Samples   NRS   NMI   NCMIFS
Breast       9216    5         84        5     3     3
DLBCL        4026    6         88        4     4     4
Leukemia1    7129    3         72        2     2     2
Leukemia2    12582   3         72        2     2     2
Lung         7129    3         96        5     4     4
SRBCT        2308    5         88        3     5     4

[Figure 2: two panels plotting classification accuracy (%) against the number of selected genes (1–19) for NMI-EmRMR and NCMIFS.]
(a) k-nearest-neighbor classifier (KNN)   (b) Linear support vector machine (LSVM)
Fig. 2. Variation of classification accuracy with number of selected genes (Breast).

Next we examine the classification power of the first 20 genes selected by NMI_EmRMR [1] and by NCMIFS. The variation of classification accuracy with the number of selected genes is exhibited in Figs. 2(a) and 2(b), using Breast as an example. The two figures show that the performances of NMI_EmRMR and NCMIFS are very close: there is little difference in the classification accuracy they induce, and NCMIFS performs slightly better than NMI_EmRMR as the number of genes increases. These results show that removing the genes ranked after 100 has no influence on feature selection for tumor classification. Therefore, NCMIFS is a feasible way to accelerate feature selection on tumor classification data sets.


6. Conclusion

Finding minimal tumor-related gene subsets can improve the predictive performance of a classification model. However, the high dimensionality and small sample size of tumor data sets remain challenging. As gene expression information is usually quantitative, some mutual information based uncertainty measures are proposed to compute the relevance between numerical features and the decision. By improving the minimal-Redundancy-Maximal-Relevance strategy and combining it with forward greedy search, an effective gene selection algorithm with low computational complexity and high classification accuracy is constructed. Six tumor classification data sets are used to demonstrate the algorithm. Results show that both NCMIFS and NMI-EmRMR can significantly improve classification performance. Furthermore, the proposed method has many potential applications, which need further investigation.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (60873104, 61040037), the Project of Henan Science and Technology Department (112102210194), the Project of Henan Educational Department (12A520027, 13A520529), the Fund for Youth Key Teachers of Henan Normal University, and the Postgraduate Program of Beijing University of Technology (ykj-2012-6765).

References

[1] Q.H. Hu, W. Pan, S. An, P.J. Ma and J.M. Wei, An efficient gene selection technique for cancer recognition based on neighborhood mutual information, International Journal of Machine Learning and Cybernetics 1 (2010), 63–74.
[2] S.L. Wang, X.L. Li, S.W. Zhang, J. Gui and D.S. Huang, Tumor classification by combining PNN classifier ensemble with neighborhood rough set based gene reduction, Computers in Biology and Medicine 40 (2010), 179–189.
[3] L. Sun, J.C. Xu and Y. Tian, Feature selection using rough entropy-based uncertainty measures in incomplete decision systems, Knowledge-Based Systems 36 (2012), 206–216.
[4] R. Aragues, C. Sander and B. Oliva, Predicting cancer involvement of genes from heterogeneous data, BMC Bioinformatics 9 (2008), 172.
[5] H.Q. Wang and D.S. Huang, Regulation probability method for gene selection, Pattern Recognition Letters 27 (2006), 116–122.
[6] F.F. Xu, D.Q. Miao and L. Wei, Fuzzy-rough attribute reduction via mutual information with an application to cancer classification, Computers and Mathematics with Applications 57 (2009), 1010–1017.
[7] J.J. Huang, Y.Z. Cai and X.M. Xu, A hybrid genetic algorithm for feature selection wrapper based on mutual information, Pattern Recognition Letters 28 (2007), 1825–1844.
[8] J.C. Xu and L. Sun, Knowledge entropy and feature selection in incomplete decision systems, Applied Mathematics & Information Sciences 7 (2013), 829–837.
[9] S.N. Yu and M.Y. Lee, Conditional mutual information-based feature selection for congestive heart failure recognition using heart rate variability, Computer Methods and Programs in Biomedicine 108 (2012), 299–309.
[10] J.B. Yang and C.J. Ong, An effective feature selection method via mutual information estimation, IEEE Transactions on Systems, Man, and Cybernetics—Part B: Cybernetics 42 (2012), 1550–1559.
[11] C. Ding and H. Peng, Minimum redundancy feature selection from microarray gene expression data, in: IEEE Computer Society Conference on Bioinformatics, 2003, pp. 523–528.
[12] Y. Zhang, C. Ding and T. Li, Gene selection algorithm by combining reliefF and mRMR, BMC Genomics 9 (2008), doi:10.1186/1471-2164-9-S2-S27.
[13] H.C. Peng, F.H. Long and C. Ding, Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy, IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (2005), 1226–1238.
[14] Y.S. Zhang and Z.G. Zhang, Feature subset selection with cumulative conditional mutual information minimization, Expert Systems with Applications 39 (2012), 6078–6088.
[15] R. Battiti, Using mutual information for selecting features in supervised neural net learning, IEEE Transactions on Neural Networks 5 (1994), 537–550.
