USING STATISTICAL MEASURES FOR FEATURE RANKING

Author: Mansoori Eghbal G*
Source: International Journal of Pattern Recognition and Artificial Intelligence, 2013, 27(1): 1350003.
DOI: 10.1142/S0218001413500031

Abstract

Feature ranking is a fundamental preprocessing step for feature selection, performed before any data mining task. When a problem has too many features, reducing dimensionality by discarding weak features is highly desirable. In this paper, we develop an efficient feature ranking algorithm that selects the more relevant features before classification predictors are derived. Unlike ranking criteria that rely on the training error of a predictor built on a feature, our approach is distance-based and uses only the statistical distribution of the classes in each feature. It uses a scoring function as the ranking criterion to evaluate the correlation between each feature and the classes. This function comprises three measures for each class: the statistical between-class distance, the interclass overlapping measure, and an estimate of class impurity. To compute the statistical parameters used in these measures, a normalized histogram obtained for each class is employed as its a priori probability density. Because the proposed algorithm examines each feature individually, it provides a fast and cost-effective method for feature ranking. We tested the effectiveness of our approach on several high-dimensional benchmark data sets. For this purpose, a number of top-ranked features are selected and used in rule-based classifiers as the target data mining task. Compared with some popular feature ranking methods, the experimental results show that our approach performs better: it identifies the more relevant features, which leads to lower classification error.
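
The abstract does not give the formulas behind the three measures, so the following is only a minimal sketch of the general idea, not the paper's scoring function. It assumes equal-width bins shared by all classes, a one-vs-rest comparison, a Fisher-like mean-difference ratio for the between-class distance, histogram intersection for the overlap, and a per-bin mixing fraction for the impurity. The names `class_histograms`, `feature_score`, `rank_features`, and the parameter `n_bins` are hypothetical, as is the way the three measures are multiplied together.

```python
from collections import defaultdict
from math import sqrt


def class_histograms(values, labels, n_bins=10):
    """Per-class bin counts over shared equal-width bins (assumed binning)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant feature
    counts = defaultdict(lambda: [0] * n_bins)
    for v, y in zip(values, labels):
        b = min(int((v - lo) / width), n_bins - 1)  # clamp the maximum value
        counts[y][b] += 1
    centers = [lo + (i + 0.5) * width for i in range(n_bins)]
    return centers, width, dict(counts)


def moments(centers, hist):
    """Mean and standard deviation of a normalized histogram."""
    mean = sum(c * p for c, p in zip(centers, hist))
    var = sum(p * (c - mean) ** 2 for c, p in zip(centers, hist))
    return mean, sqrt(var)


def feature_score(values, labels, n_bins=10):
    """Illustrative score for one feature; higher means more discriminative."""
    centers, width, counts = class_histograms(values, labels, n_bins)
    totals = [sum(cs[b] for cs in counts.values()) for b in range(n_bins)]
    n = len(values)
    score = 0.0
    for y, cs in counts.items():
        n_c, n_rest = sum(cs), n - sum(cs)
        if n_rest == 0:  # only one class present: nothing to separate
            continue
        # Normalized histograms used as density estimates (class vs. rest).
        hist = [c / n_c for c in cs]
        rest = [(totals[b] - cs[b]) / n_rest for b in range(n_bins)]
        m_c, s_c = moments(centers, hist)
        m_r, s_r = moments(centers, rest)
        # Assumed distance: mean gap scaled by spread, regularized by bin width.
        distance = abs(m_c - m_r) / (s_c + s_r + width)
        # Assumed overlap: histogram intersection, in [0, 1].
        overlap = sum(min(p, q) for p, q in zip(hist, rest))
        # Assumed impurity: share of this class's mass lying in mixed bins.
        impurity = sum(
            p * (1 - cs[b] / totals[b]) for b, p in enumerate(hist) if totals[b]
        )
        score += distance * (1 - overlap) * (1 - impurity)
    return score


def rank_features(rows, labels, n_bins=10):
    """Return feature indices sorted from highest to lowest score."""
    columns = list(zip(*rows))
    scores = [feature_score(col, labels, n_bins) for col in columns]
    return sorted(range(len(scores)), key=lambda j: -scores[j])


# Toy data: feature 0 separates the classes, feature 1 is identical in both.
rows = [[1, 5], [2, 1], [3, 3], [9, 5], [10, 1], [11, 3]]
labels = [0, 0, 0, 1, 1, 1]
ranking = rank_features(rows, labels)
```

Because each feature is scored on its own, the cost grows linearly with the number of features, which matches the abstract's claim of a fast, per-feature procedure. Selecting the top-ranked features then amounts to taking a prefix of `ranking`.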

  • Publication date: 2013-2