Abstract

It is well known that active learning can simultaneously improve the quality of the classification model and decrease the complexity of training instances. However, several previous studies have indicated that the performance of active learning is easily disrupted by an imbalanced data distribution. Existing imbalanced active learning approaches also suffer from either low performance or high time consumption. To address these problems, this paper describes an efficient solution based on the extreme learning machine (ELM) classification model, called active online-weighted ELM (AOW-ELM). The main contributions of this paper are as follows: 1) the reasons why active learning can be disrupted by an imbalanced instance distribution, and the factors that influence this disruption, are discussed in detail; 2) the hierarchical clustering technique is adopted to select the initially labeled instances, in order to avoid the missed cluster effect and the cold start phenomenon as far as possible; 3) the weighted ELM (WELM) is selected as the base classifier to guarantee the impartiality of instance selection during active learning, and an efficient online update mode of WELM is derived theoretically; and 4) an early stopping criterion that is similar to, but more flexible than, the margin exhaustion criterion is presented. Experimental results on 32 binary-class data sets with different imbalance ratios demonstrate that the proposed AOW-ELM algorithm is more effective and efficient than several state-of-the-art active learning algorithms that are specifically designed for the class imbalance scenario.
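
For orientation, the batch form of the weighted ELM solution that an online variant would build on can be sketched as follows. This is the commonly cited regularized weighted least-squares form, not necessarily the notation or the online update derived in this paper. The symbols are assumptions introduced here: $H$ is the hidden-layer output matrix, $T$ the target matrix, $W$ a diagonal instance-weight matrix, $C$ the regularization parameter, $\beta$ the output weights, and $n_{c(i)}$ the number of training instances in the class of instance $i$.

```latex
\beta = \left( \frac{I}{C} + H^{\top} W H \right)^{-1} H^{\top} W T,
\qquad
W = \operatorname{diag}(w_{11}, \ldots, w_{NN}),
\quad
w_{ii} = \frac{1}{n_{c(i)}}
```

Weighting each instance inversely to its class size is one common choice; it makes errors on minority-class instances cost more, which is what keeps instance selection from drifting toward the majority class.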