摘要

Dimensionality reduction is a necessary task in data mining when working with high dimensional data. A type of dimensionality reduction is feature selection. Feature selection based on feature ranking has received much attention by researchers. The major reasons are its scalability, ease of use, and fast computation. Feature ranking methods can be divided into different categories and may use different measures for ranking features. Recently, ensemble methods have entered in the field of ranking and achieved more accuracy among others. Accordingly, in this paper a Heterogeneous ensemble based algorithm for feature ranking is proposed. The base ranking methods in this ensemble structure are chosen from different categories like information theoretic, distance based, and statistical methods. The results of the base ranking methods are then fused into a final feature subset by means of genetic algorithm. The diversity of the base methods improves the quality of initial population of the genetic algorithm and thus reducing the convergence time of the genetic algorithm. In most of ranking methods, it%26apos;s the user%26apos;s task to determine the threshold for choosing the appropriate subset of features. It is a problem, which may cause the user to try many different values to select a good one. In the proposed algorithm, the difficulty of determining a proper threshold by the user is decreased. The performance of the algorithm is evaluated on four different text datasets and the experimental results show that the proposed method outperforms all other five feature ranking methods used for comparison. One advantage of the proposed method is that it is independent to the classification method used for classification.

  • 出版日期2013-6

全文