AN EMPIRICAL EVALUATION OF REPETITIVE UNDERSAMPLING TECHNIQUES

Van Hulse Jason*; Khoshgoftaar Taghi M; Napolitano Amri

doi:10.1142/S0218194010004682

Abstract

Class imbalance is a fundamental problem in data mining and knowledge discovery that is encountered in a wide array of application domains. Random undersampling has been widely used to alleviate the harmful effects of imbalance; however, this technique often leads to a substantial loss of information. Repetitive undersampling techniques, which generate an ensemble of models, each trained on a different undersampled subset of the training data, have been proposed to alleviate this difficulty. This work reviews three repetitive undersampling methods currently used to handle imbalance and presents a detailed and comprehensive empirical study using four different learners, four performance metrics, and 15 datasets from various application domains. To our knowledge, this work is the most thorough study of repetitive undersampling techniques.
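The general idea described in the abstract, an ensemble in which each member is trained on a different undersampled subset, can be illustrated with a minimal sketch. This is not one of the three methods the paper reviews, whose details are not given here. The function names, the nearest-centroid base learner, the 1:1 class ratio after undersampling, and the majority vote are all assumptions chosen to keep the example short and dependent only on the Python standard library.

```python
import random
from collections import Counter


def undersample(X, y, minority_label, rng):
    """Keep every minority example plus an equal-sized random draw of majority examples.

    The 1:1 ratio is an assumption made for this sketch.
    """
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    kept = minority + rng.sample(majority, min(len(minority), len(majority)))
    return [X[i] for i in kept], [y[i] for i in kept]


def fit_centroids(X, y):
    """Toy base learner (hypothetical stand-in): one mean vector per class."""
    counts = Counter(y)
    sums = {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_centroids(model, row):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sqdist(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(model, key=lambda label: sqdist(model[label]))


def fit_ensemble(X, y, minority_label, n_models=10, seed=0):
    """Train n_models base learners, each on a fresh undersampled subset."""
    rng = random.Random(seed)
    return [
        fit_centroids(*undersample(X, y, minority_label, rng))
        for _ in range(n_models)
    ]


def predict_ensemble(models, row):
    """Combine the members by simple majority vote (an assumption of this sketch)."""
    votes = Counter(predict_centroids(m, row) for m in models)
    return votes.most_common(1)[0][0]
```

Because each member sees a different random draw of majority examples, the ensemble as a whole uses more of the majority class than a single undersampled model would, which is the information-loss argument the abstract makes.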

Publication date: 2010-3

