Abstract

In real-world classifier learning, a dataset may contain far more instances of one class than of the others. Such imbalanced datasets require special attention because traditional classifiers generally favor the majority class. Ensemble classifiers have been reported to yield promising results in such cases. Most often, ensembles are designed around data-level preprocessing techniques that balance class proportions by under-sampling and/or over-sampling. Most available studies concentrate on static ensembles designed for different preprocessing techniques. In contrast, dynamic ensembles have become popular thanks to their performance on ill-defined problems (small datasets). A dynamic ensemble includes a dynamic selection module that chooses the best ensemble for a given test instance. This paper experimentally evaluates the claim that dynamic selection combined with a preprocessing technique can achieve higher performance than static ensembles on imbalanced classification problems. For this evaluation, we collect 84 two-class and 26 multi-class datasets with varying degrees of class imbalance. In addition, we consider five variations of preprocessing methods and four dynamic selection methods. We further design an experimental framework to integrate preprocessing and dynamic selection. Our experiments show that the dynamic ensemble improves the F-measure and the G-mean compared to the static ensemble. Moreover, across different levels of imbalance, dynamic selection methods secure higher ranks than the other alternatives.
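
The two metrics named above are commonly preferred over accuracy on imbalanced data because they do not reward a classifier for ignoring the minority class. The following is a minimal sketch of the usual two-class definitions, not the paper's own evaluation code: the function names, the choice of label `1` as the minority (positive) class, and the toy labels are all assumptions made for illustration, and the multi-class variants used in the study may be defined differently.

```python
import math

def binary_counts(y_true, y_pred, positive=1):
    """Return (tp, fn, fp, tn), treating `positive` as the minority class (assumed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fn, fp, tn

def f_measure(y_true, y_pred, positive=1):
    """F1: harmonic mean of precision and recall on the positive class."""
    tp, fn, fp, _ = binary_counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (positive recall) and specificity (negative recall)."""
    tp, fn, fp, tn = binary_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)

# Hypothetical toy data: 2 minority instances, 8 majority instances.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_majority = [0] * 10                      # never predicts the minority class
mixed = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]          # one hit, one miss, one false alarm

accuracy_majority = sum(t == p for t, p in zip(y_true, always_majority)) / len(y_true)
```

On this toy data the always-majority predictor reaches 80% accuracy, yet both its F-measure and its G-mean are zero, which is the behavior that motivates using these metrics for imbalanced problems.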

  • Publication date: 2018-4-19