Abstract

In machine learning and data mining, feature selection seeks a compact and discriminative feature subset of the original feature space. It is usually applied as a preprocessing step to improve the prediction performance, understandability, scalability, and generalization capability of classifiers. A typical gene microarray data set is high-dimensional, has few samples, and consists mostly of irrelevant features, which makes it difficult to discover a compact set of features that truly contribute to the response of the model. In this paper, a score-based criteria fusion feature selection method (SCF) is proposed for cancer prediction, with the aim of improving the prediction performance of the classification model. SCF is evaluated on five open gene microarray data sets and three low-dimensional data sets, and it outperforms many well-known feature selection methods when two classifiers, SVM and KNN, are used to measure the quality of the selected features. Experiments verify that SCF finds more discriminative features than the competing methods and can be combined effectively with other methods as a preprocessing algorithm.