Abstract

Feature scoring is an approach to feature selection that assigns each feature of a classification task a measure of its usefulness. Features are ranked by score, and selection is performed by choosing a small group of high-ranked features. Most existing feature scoring and ranking methods are context-insensitive: they measure the relevance of a single feature to the class labels regardless of the role of other features. This paper proposes a genetic programming (GP)-based method that measures how a set of features contributes jointly to discriminating between classes. Each feature receives its score in the context of the other features participating in a GP program. The scoring mechanism is based on the frequency with which each feature appears in a collection of GP programs and on the fitness of those programs. Our results show that the proposed feature ranking method can detect the important features of a problem, and that a variety of classifiers restricted to just a few of these high-ranked features perform well. The proposed scoring and ranking mechanism can also shrink the search space of feature subsets from size O(2^n) to size O(n), a reduced space that contains points very likely to improve classification performance.
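As a rough illustration of frequency- and fitness-based scoring, the Python sketch below credits each feature with the fitness of every program it appears in, once per appearance, and then ranks features by total score. The abstract does not give the exact formula, so this fitness-weighted count is an assumption, as are the names `score_features`, `rank_features`, and `prefix_subsets` and the toy data. The prefix construction is likewise only one plausible reading of the O(n) search space: the n nested subsets made of the top-1, top-2, ..., top-n ranked features.

```python
from collections import defaultdict


def score_features(programs):
    """Fitness-weighted appearance count (an assumed scoring rule).

    programs: list of (features_used, fitness) pairs, where features_used
    lists the feature indices appearing in one GP program, with repeats.
    """
    scores = defaultdict(float)
    for features_used, fitness in programs:
        for feature in features_used:
            # Each appearance adds the fitness of the program it occurs in.
            scores[feature] += fitness
    return dict(scores)


def rank_features(scores):
    """Sort features by descending score; ties broken by feature index."""
    return sorted(scores, key=lambda f: (-scores[f], f))


def prefix_subsets(ranking):
    """Assumed reading of the O(n) space: the n top-k prefixes of the ranking."""
    return [ranking[:k] for k in range(1, len(ranking) + 1)]


# Hypothetical toy collection of GP programs: (features used, fitness).
programs = [
    ([0, 2, 2], 0.9),
    ([1, 2], 0.6),
    ([0, 3], 0.5),
]

scores = score_features(programs)
ranking = rank_features(scores)
candidates = prefix_subsets(ranking)
print(ranking)
print(candidates)
```

In this toy collection, feature 2 ranks first because it appears most often and in the fittest programs. A classifier would then be evaluated on only the n candidate subsets rather than on all 2^n subsets.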

  • Publication date: 2011