Abstract

The goal of feature selection is to search for the optimal feature subset with respect to an evaluation function. Exhaustively searching all possible feature subsets (2^n − 1 nonempty subsets for n features) is computationally expensive. Suboptimal alternatives are more efficient and practical, but they cannot guarantee globally optimal results. We propose a new feature selection algorithm for continuous features based on distance discriminant and distribution overlapping (HFSDD), which overcomes the drawbacks of both exhaustive search approaches and suboptimal methods. The proposed method finds the optimal feature subset without exhaustive search or a Branch and Bound algorithm. Through a rigorous theoretical proof, the search problem, which is the most difficult part of optimal feature selection, is converted into a feature ranking problem, greatly reducing the computational complexity. Because the distribution of overlapping degrees between each pair of classes provides useful information for feature selection, HFSDD also takes these degrees into account, using a new approach to estimate them. In this sense, HFSDD is a solution based on both distance discriminant and distribution overlapping. HFSDD was compared with ReliefF and mrmrMID on ten data sets, and the experimental results show that HFSDD outperforms both methods.
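To illustrate why turning subset search into feature ranking cuts cost, the following Python sketch scores each feature independently and keeps the top k. With n features, it needs n score evaluations plus a sort, compared with 2^n − 1 candidate subsets for exhaustive search. This sketch rests on several assumptions and is not HFSDD. The per-feature criterion is a swapped-in Fisher-style ratio of between-class to within-class scatter. It does not estimate class overlapping degrees. It also does not reproduce the proof that lets ranking recover the optimal subset in HFSDD. All function names are hypothetical.

```python
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Hypothetical per-feature score (Fisher-style, NOT the HFSDD criterion):
    weighted spread of class means divided by summed within-class variance."""
    overall = mean(values)
    between = 0.0
    within = 0.0
    for c in sorted(set(labels)):
        xs = [v for v, y in zip(values, labels) if y == c]
        between += len(xs) * (mean(xs) - overall) ** 2
        within += len(xs) * pvariance(xs)
    # A feature with zero within-class scatter separates classes perfectly.
    return between / within if within > 0 else float("inf")


def rank_features(X, y):
    """Score every feature once and return feature indices, best first."""
    n_features = len(X[0])
    scores = [fisher_score([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def select_top_k(X, y, k):
    """Ranking-based selection: keep the k highest-scoring features."""
    return rank_features(X, y)[:k]


def num_candidate_subsets(n):
    """Number of nonempty subsets an exhaustive search would evaluate."""
    return 2 ** n - 1


if __name__ == "__main__":
    # Toy data: feature 0 separates the two classes well, features 1 and 2 do not.
    X = [[1.0, 5.0, 0.3], [1.2, 4.0, 0.1], [3.0, 5.5, 0.1], [3.1, 4.5, 0.3]]
    y = [0, 0, 1, 1]
    print(rank_features(X, y))       # [0, 1, 2]
    print(select_top_k(X, y, 1))     # [0]
    print(num_candidate_subsets(3))  # 7
```

In the toy data, feature 0 has well-separated class means and small within-class variance, so it ranks first. Ranking three features costs three score evaluations, whereas exhaustive search would evaluate seven subsets. That gap grows exponentially with n.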