A novel framework for gene selection

Zhao Wei; Wang Gang*; Wang Hong bin; Chen Hui ling; Dong Hao; Zhao Zheng dong

doi:10.4156/ijact.vol3.issue3.18

Abstract

Microarray data are highly redundant and noisy. Most genes are believed to be uninformative with respect to the classes under study, because only a fraction of genes show distinct expression profiles across different classes of samples. This paper proposes a novel hybrid framework (NHF) for classifying high-dimensional microarray data. NHF combines information gain (IG), F-score, a genetic algorithm (GA), particle swarm optimization (PSO), and support vector machines (SVM). To identify a subset of informative genes within a large data set contaminated by high-dimensional noise, the method proceeds in three stages. In the first stage, IG is used to rank the features, and only the top 10% of the ranked list is passed to the second stage. In the second stage, PSO performs feature selection in combination with an SVM, and the F-score is included as part of PSO's objective function. The candidate feature subsets are filtered according to the ranking list from the first stage, and the resulting subsets are used to initialize the GA. Both SVM parameter optimization and feature selection are carried out dynamically by PSO. In the third stage, the GA initializes its population from the results of the second stage, and the optimal feature subset is obtained by the GA integrated with an SVM. Both SVM parameter optimization and feature selection are again carried out dynamically, this time by the GA. The performance of the proposed method was compared with that of methods based on PSO, GA, ant colony optimization (ACO), and simulated annealing (SA) on five benchmark data sets: leukemia, colon, breast cancer, lung carcinoma, and brain cancer. The numerical results and statistical analysis show that the proposed approach can select a subset of predictive genes from a large, noisy data set and can capture the correlated structure in the data. In addition, NHF achieves significantly higher prediction accuracy than the other methods while using smaller feature subsets.
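The first stage, and the F-score that enters the second stage's objective, can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code. The abstract does not say how continuous expression values are discretized for IG, so binarizing each gene at its median is an assumption. The F-score follows the common two-class definition (squared distance of each class mean from the overall mean, divided by the sum of the within-class variances) rather than a formula given in the paper. `X` is a list of samples, each a list of gene values.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """IG of one gene after binarizing it at its (upper) median.
    The median split is an assumed discretization, not taken from the paper."""
    threshold = sorted(column)[len(column) // 2]
    bins = {0: [], 1: []}
    for value, label in zip(column, labels):
        bins[int(value >= threshold)].append(label)
    n = len(labels)
    # conditional entropy H(Y | X), skipping empty bins
    conditional = sum(len(b) / n * entropy(b) for b in bins.values() if b)
    return entropy(labels) - conditional


def rank_genes(X, y, keep_fraction=0.10):
    """Stage 1: rank genes (columns of X) by IG and keep the top fraction."""
    n_genes = len(X[0])
    scores = [information_gain([row[j] for row in X], y) for j in range(n_genes)]
    order = sorted(range(n_genes), key=lambda j: -scores[j])  # ties keep index order
    return order[:max(1, int(n_genes * keep_fraction))]


def f_score(column, labels, positive=1):
    """Two-class F-score: between-class separation over within-class variance."""
    pos = [v for v, lab in zip(column, labels) if lab == positive]
    neg = [v for v, lab in zip(column, labels) if lab != positive]
    mean = sum(column) / len(column)
    mp, mn = sum(pos) / len(pos), sum(neg) / len(neg)
    numerator = (mp - mean) ** 2 + (mn - mean) ** 2
    denominator = (sum((v - mp) ** 2 for v in pos) / (len(pos) - 1)
                   + sum((v - mn) ** 2 for v in neg) / (len(neg) - 1))
    if denominator == 0:  # constant gene, or zero spread inside each class
        return 0.0 if numerator == 0 else math.inf
    return numerator / denominator
```

With 20 genes, `keep_fraction=0.10` keeps the two highest-ranked genes, which mirrors the "top 10%" cut described in the abstract. On a real data set with thousands of genes, the retained columns would be passed on to the PSO stage.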
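The hand-off from PSO (second stage) to GA (third stage) can be illustrated with a small wrapper sketch. Several parts are substitutions, not the paper's method:

- a leave-one-out nearest-centroid classifier stands in for the SVM;
- the SVM parameter optimization that both PSO and GA perform in NHF is omitted;
- the fitness weights `w_f` and `w_size` are hypothetical;
- the binary PSO uses the common sigmoid transfer function.

`X` is assumed to contain only the genes retained by the IG stage, and `fscores` holds their precomputed F-scores.

```python
import math
import random


def nearest_centroid_loo_accuracy(X, y, mask):
    """Leave-one-out accuracy of a nearest-centroid classifier on the genes
    switched on in `mask`; a stand-in for the SVM used in NHF."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for label in set(y):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            if rows:  # a class with a single sample vanishes when held out
                centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]

        def sq_dist(label):
            return sum((X[i][j] - c) ** 2 for j, c in zip(cols, centroids[label]))

        correct += min(centroids, key=sq_dist) == y[i]
    return correct / len(X)


def fitness(X, y, mask, fscores, w_f=0.1, w_size=0.05):
    """Hypothetical objective: accuracy + F-score bonus - subset-size penalty."""
    chosen = [j for j, bit in enumerate(mask) if bit]
    if not chosen:
        return 0.0
    # 1 - 1/(1+F) maps F in [0, inf] onto [0, 1]
    f_term = sum(1 - 1 / (1 + fscores[j]) for j in chosen) / len(chosen)
    acc = nearest_centroid_loo_accuracy(X, y, mask)
    return acc + w_f * f_term - w_size * len(chosen) / len(mask)


def binary_pso(score, n_bits, rng, n_particles=10, iters=20,
               w=0.7, c1=1.5, c2=1.5, vmax=4.0):
    """Binary PSO with a sigmoid transfer function; returns each particle's
    personal best, which then seeds the GA."""
    pos = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_particles)]
    vel = [[0.0] * n_bits for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [score(p) for p in pos]
    g = max(range(n_particles), key=lambda k: pbest_val[k])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for k in range(n_particles):
            for d in range(n_bits):
                v = (w * vel[k][d]
                     + c1 * rng.random() * (pbest[k][d] - pos[k][d])
                     + c2 * rng.random() * (gbest[d] - pos[k][d]))
                vel[k][d] = max(-vmax, min(vmax, v))  # clamp the velocity
                # sigmoid(velocity) is the probability that this bit becomes 1
                pos[k][d] = int(rng.random() < 1 / (1 + math.exp(-vel[k][d])))
            val = score(pos[k])
            if val > pbest_val[k]:
                pbest[k], pbest_val[k] = pos[k][:], val
            if val > gbest_val:
                gbest, gbest_val = pos[k][:], val
    return pbest


def genetic_algorithm(score, seeds, rng, generations=20, p_mut=0.05):
    """Elitist GA (tournament selection, one-point crossover, bit-flip
    mutation) whose initial population is the PSO output."""
    pop = [s[:] for s in seeds]
    n_bits = len(pop[0])
    for _ in range(generations):
        nxt = [max(pop, key=score)[:]]  # elitism: carry the best mask over
        while len(nxt) < len(pop):
            a, b = (max(rng.sample(pop, 2), key=score) for _ in range(2))
            cut = rng.randint(1, n_bits - 1)
            child = a[:cut] + b[cut:]
            nxt.append([1 - bit if rng.random() < p_mut else bit for bit in child])
        pop = nxt
    return max(pop, key=score)


def select_genes(X, y, fscores, seed=0):
    """Stage 2 (binary PSO) feeding stage 3 (GA); returns selected column indices."""
    rng = random.Random(seed)
    cache = {}

    def score(mask):  # memoized fitness, since both optimizers revisit masks
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(X, y, mask, fscores)
        return cache[key]

    best = genetic_algorithm(score, binary_pso(score, len(X[0]), rng), rng)
    return [j for j, bit in enumerate(best) if bit]
```

Seeding the GA with the PSO personal bests means the GA starts from subsets that already score well. Elitism guarantees that the GA's final answer is never worse than the best subset PSO handed over. Replacing the stand-in classifier with a cross-validated SVM, and appending encoded SVM parameters to each mask, would bring the sketch closer to the framework described above.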

Publication date: 2011
