Abstract

Traditional classifiers tend to favor the majority (negative) class when dealing with imbalanced data sets. SMOTE (Synthetic Minority Over-sampling Technique) is an effective approach designed for learning from imbalanced data sets. However, SMOTE suffers from a certain blindness. To avoid this, we present a novel oversampling method, referred to as EDAOS, that synthesizes new instances using the RPCL clustering algorithm and Estimation of Distribution Algorithms (EDAs). While remaining as simple and effective as traditional oversampling algorithms, the new algorithm has the following characteristics: 1) when clustering with the RPCL algorithm, it finds the best number of clusters automatically instead of forcing all instances into a prespecified number of clusters; 2) when synthesizing new instances with EDAs, the synthesized instances are ensured to follow the distribution of the original minority class instances. Experiments were conducted on 12 standard UCI imbalanced data sets, and the results indicate that the instances synthesized by our oversampling approach effectively follow the spatial distribution of the original minority class instances. Compared with the traditional SMOTE oversampling algorithm, our approach performs better under the F-measure and AUC criteria when applied to a classifier.
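The second characteristic, sampling new minority instances from an estimated distribution rather than interpolating between neighbors as SMOTE does, can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the probabilistic model used by the EDA, so a per-cluster Gaussian is assumed here, the cluster labels are taken as given (RPCL itself is not implemented), and the function name `eda_oversample` and its parameters are hypothetical.

```python
import numpy as np

def eda_oversample(X_min, labels, n_new, seed=0):
    """Draw n_new synthetic minority points from per-cluster Gaussians.

    Assumption: each cluster is modeled as a multivariate Gaussian;
    the paper's actual EDA model may differ.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    labels = np.asarray(labels)
    d = X_min.shape[1]

    clusters, counts = np.unique(labels, return_counts=True)
    # Allocate synthetic points to clusters in proportion to cluster size.
    alloc = rng.multinomial(n_new, counts / counts.sum())

    samples = []
    for c, k in zip(clusters, alloc):
        if k == 0:
            continue
        pts = X_min[labels == c]
        mean = pts.mean(axis=0)
        if len(pts) > 1:
            # Estimate the cluster covariance; the small ridge keeps it
            # positive definite when the cluster is tiny or degenerate.
            cov = np.atleast_2d(np.cov(pts, rowvar=False)) + 1e-6 * np.eye(d)
        else:
            # A single point gives no covariance estimate; sample tightly around it.
            cov = 1e-6 * np.eye(d)
        samples.append(rng.multivariate_normal(mean, cov, size=k))

    return np.vstack(samples) if samples else np.empty((0, d))
```

Given minority instances `X_min` and cluster labels from any clustering step, `eda_oversample(X_min, labels, n_new)` returns an array of shape `(n_new, n_features)` that can be appended to the training set with the minority label. Because points are drawn from the estimated distribution of each cluster, they concentrate where the original minority instances lie rather than along line segments between arbitrary neighbors.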

  • Publication date: 2011
