摘要

Feature selection has been a significant task for data mining and pattern recognition. It aims to choose the optimal feature subset with the minimum redundancy and the maximum discriminating ability. This paper analyzes the feature selection method from two aspects of data and algorithm. In order to deal with the redundant features and irrelevant features in high-dimensional & low-sample data and low-dimensional & high-sample data, the feature selection algorithm model based on the granular information is presented in this paper. Thus, our research examines experimentally how granularity level affects both the classification accuracy and the size of feature subset for feature selection. First of all, the improved binary genetic algorithm with feature granulation (IBGAFG) is used to select the significant features. Then, the improved neighborhood rough set with sample granulation (INRSG) is proposed under different granular radius, which further improves the quality of the feature subset. Finally, in order to find out the optimal granular radius, granularity lambda optimization based on genetic algorithm (ROGA) is presented. The optimal granularity parameters are found adaptively according to the feedback of classification accuracy. The performance of the proposed algorithms is tested upon eleven publicly available data sets and is compared with other supervisory methods or evolutionary algorithms. Additionally, the ROGA algorithm is applied to the enterprise financial dataset, which can select the features that affect the financial status. Experiment results demonstrate that the approaches are efficient and can provide higher classification accuracy using granular information.