Abstract

In this paper, we introduce weights into the Pawlak rough set model to balance the class distribution of a data set, and we develop a weighted rough set based method for the class imbalance problem. The method consists of three components: a weighted attribute reduction algorithm, which introduces and extends the Guiasu weighted entropy to measure the significance of an attribute; a weighted rule extraction algorithm, which introduces a weighted heuristic strategy into the LEM2 algorithm; and a weighted decision algorithm, which uses several weighted factors to evaluate the extracted rules. To assess its performance, we compare the weighted rough set based method with several popular class imbalance learning methods in experiments on twenty UCI data sets. The comparison indicates that, in terms of AUC and minority class accuracy, the weighted rough set based method outperforms the re-sampling and filtering based methods and is comparable to the decision tree and SVM based methods. We therefore conclude that the weighted rough set based method is effective for class imbalance learning.
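As a rough illustration of the weighted entropy idea mentioned above, the following sketch computes the basic Guiasu weighted entropy, H_w = -sum_i w_i p_i log2 p_i, over the class labels of a data set. It covers only the basic form, not the extension developed in this paper. The function names and the inverse-frequency weighting scheme are assumptions chosen for illustration; the weights actually used by the method may be defined differently.

```python
import math
from collections import Counter


def weighted_entropy(labels, weights):
    """Basic Guiasu weighted entropy: -sum_i w_i * p_i * log2(p_i).

    labels  : sequence of class labels
    weights : dict mapping each class label to a non-negative weight
    """
    n = len(labels)
    counts = Counter(labels)
    return -sum(
        weights[c] * (k / n) * math.log2(k / n) for c, k in counts.items()
    )


def inverse_frequency_weights(labels):
    """Hypothetical weighting scheme (an assumption, not taken from the paper):
    each class gets weight n / (num_classes * class_count), so rarer classes
    receive larger weights."""
    n = len(labels)
    counts = Counter(labels)
    return {c: n / (len(counts) * k) for c, k in counts.items()}


if __name__ == "__main__":
    # Toy imbalanced label set: 9 majority samples, 1 minority sample.
    y = ["maj"] * 9 + ["min"]
    w = inverse_frequency_weights(y)
    print(w)
    print(weighted_entropy(y, w))
```

With all weights equal to 1, the function reduces to the ordinary Shannon entropy, which gives a simple sanity check on the implementation.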