Abstract

Mutual information (MI) is a powerful concept for correlation-centric applications, and it has been widely used for feature selection from microarray gene expression data. One of the merits of MI is that, unlike many heuristic methods, it rests on a mature theoretical foundation. When applied to microarray data, however, it faces two challenges. First, because of the large number of features (i.e., genes) in microarray data, the true distributions of the expression values of some genes may be distorted by noise. Second, evaluating inter-group mutual information requires estimating multivariate distributions, which is difficult if not impossible. To address these problems, we propose in this paper a new MI-based feature selection approach for microarray data. Our approach relies on two strategies: relevance boosting, which requires a desirable feature to contribute substantial additional relevance to the class labeling beyond that of the already selected features; and feature-interaction enhancing, which probabilistically compensates for the feature interactions missed by simple aggregation-based evaluation. We justify our approach both theoretically and experimentally: a synthetic dataset demonstrates the statistical significance of the proposed strategies, and real-life datasets show the improved performance of our approach over existing methods.
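To make the relevance-boosting idea concrete, the sketch below shows a greedy MI-based selector in the spirit described above: a candidate feature is admitted only if it carries additional relevance with the class labels conditioned on every already-selected feature. This is an illustrative approximation (CMIM-style, using pairwise conditional MI and a `threshold` cutoff of our own choosing), not the exact algorithm proposed in the paper; all function names and parameters here are hypothetical.

```python
import numpy as np
from collections import Counter

def mi(x, y):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # I(X;Y) = sum_{a,b} p(a,b) * log( p(a,b) / (p(a) p(b)) )
    return sum((c / n) * np.log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())

def cond_mi(x, y, z):
    """Conditional MI I(X;Y|Z): MI within each group of Z, weighted by p(z)."""
    n = len(z)
    out = 0.0
    for v, cnt in Counter(z).items():
        idx = [i for i in range(n) if z[i] == v]
        out += (cnt / n) * mi([x[i] for i in idx], [y[i] for i in idx])
    return out

def select_features(X, y, k, threshold=0.0):
    """Greedy selection with a relevance-boosting stop rule: a candidate must
    show MI with y beyond `threshold` even conditioned on each selected feature."""
    n_features = len(X[0])
    cols = [[row[j] for row in X] for j in range(n_features)]
    selected = []
    while len(selected) < k:
        best_j, best_score = None, threshold
        for j in range(n_features):
            if j in selected:
                continue
            if not selected:
                score = mi(cols[j], y)  # plain relevance for the first pick
            else:
                # additional relevance beyond every already-selected feature
                score = min(cond_mi(cols[j], y, cols[s]) for s in selected)
            if score > best_score:
                best_j, best_score = j, score
        if best_j is None:
            break  # no candidate adds enough relevance; stop early
        selected.append(best_j)
    return selected
```

For example, if feature 1 is an exact copy of feature 0, its conditional MI given feature 0 is zero, so the selector skips it in favor of a feature that genuinely adds class information; a plain relevance ranking would have kept the redundant copy.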