A new feature selection approach based on ensemble methods in semi-supervised classification

Settouti Nesma<sup>*</sup>; Chikh Mohamed Amine; Barra Vincent

doi:10.1007/s10044-015-0524-9

Abstract

In computer-aided medical systems, many practical classification applications must cope with the rapid growth in data collection and storage. This is especially the case in areas such as the prediction of medical test efficiency, the classification of tumors, and the detection of cancers. Data with known class labels (labeled data) can be limited, whereas unlabeled data (with unknown class labels) are more readily available. Semi-supervised learning deals with methods that exploit unlabeled data in addition to labeled data to improve performance on the classification task. In this paper, we consider the problem of using a large amount of unlabeled data to improve the efficiency of feature selection in high-dimensional datasets when only a small set of labeled examples is available. We propose a new semi-supervised feature evaluation method called Optimized co-Forest for Feature Selection (OFFS), which combines ideas from co-forest with the embedded feature selection principle of Random Forest, based on permutation of the out-of-bag set. We provide empirical results on several medical and biological benchmark datasets, indicating an overall significant improvement of OFFS over four other feature selection approaches that follow filter, wrapper, and embedded strategies in semi-supervised learning. Our method demonstrates its ability to select features and measure their importance, improving the performance of the hypothesis learned from a small number of labeled samples by exploiting unlabeled samples.
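
The out-of-bag permutation principle mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' OFFS implementation: the co-forest step that pseudo-labels unlabeled samples is omitted, a nearest-centroid classifier stands in for the decision trees of a Random Forest, and the toy data and all function names (`fit_centroids`, `oob_permutation_importance`, `make_toy_data`) are assumptions made only for illustration. The idea shown is that each learner is trained on a bootstrap sample, and a feature's importance is the average drop in accuracy on that learner's out-of-bag samples when the feature's values are shuffled.

```python
import random


def fit_centroids(X, y):
    """Nearest-centroid stand-in for a tree: mean feature vector per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, row):
    """Return the class whose centroid is closest (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[c]))
    return min(centroids, key=dist)


def accuracy(centroids, X, y):
    return sum(predict(centroids, r) == t for r, t in zip(X, y)) / len(y)


def oob_permutation_importance(X, y, n_learners=30, seed=0):
    """Average out-of-bag accuracy drop per feature after shuffling that feature."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    totals = [0.0] * d
    used = 0
    for _ in range(n_learners):
        # Bootstrap: draw n indices with replacement; the rest are out-of-bag.
        in_bag = [rng.randrange(n) for _ in range(n)]
        in_bag_set = set(in_bag)
        oob = [i for i in range(n) if i not in in_bag_set]
        if not oob:
            continue
        model = fit_centroids([X[i] for i in in_bag], [y[i] for i in in_bag])
        X_oob = [X[i] for i in oob]
        y_oob = [y[i] for i in oob]
        base = accuracy(model, X_oob, y_oob)
        for j in range(d):
            # Shuffle column j among the out-of-bag rows only.
            col = [r[j] for r in X_oob]
            rng.shuffle(col)
            X_perm = [r[:j] + [v] + r[j + 1:] for r, v in zip(X_oob, col)]
            totals[j] += base - accuracy(model, X_perm, y_oob)
        used += 1
    return [t / max(used, 1) for t in totals]


def make_toy_data(n=200, seed=1):
    """Hypothetical data: feature 0 separates the classes, feature 1 is noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(3.0 * label, 1.0), rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y


X, y = make_toy_data()
importances = oob_permutation_importance(X, y)
# Feature indices sorted from most to least important.
ranking = sorted(range(len(importances)), key=lambda j: importances[j], reverse=True)
```

In a semi-supervised setting such as the one the abstract describes, the labeled set passed to a function like this would first be enlarged with confidently pseudo-labeled samples; that step is left out here because the abstract does not specify it.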

Publication date: 2017-8
