Abstract

Many real-world datasets contain more than two categories, and the number of instances in each category differs greatly. In medical diagnostic data, for example, there may be several types of cancer, each with only tens of instances, alongside a much larger number of normal instances. Similarly, pharmaceutical tests may yield very few abnormal samples, yet these can cause great harm. Classifying such data is commonly referred to as imbalanced multi-class classification. Most existing research studies multi-class classification and imbalanced data classification separately; few works address the two in combination, particularly for medical diagnosis data. In this paper, in the context of medical diagnosis and pharmaceutical testing, we propose a divide-and-conquer approach for partitioning multi-class data and a self-adaptive data resampling method for imbalanced data. The proposed methods are tested on 23 UCI datasets from medical, pharmaceutical, and other fields. Experimental results show that the proposed methods outperform the compared methods, particularly on the medical and pharmaceutical datasets.