摘要

In this study, we propose a set of new algorithms to enhance the effectiveness of classification for 5-year survivability of breast cancer patients from a massive data set with imbalanced property. The proposed classifier algorithms are a combination of synthetic minority oversampling technique (SMOTE) and particle swarm optimization (PSO), while integrating some well known classifiers, such as logistic regression, C5 decision tree (C5) model, and 1-nearest neighbor search. To justify the effectiveness for this new set of classifiers, the g-mean and accuracy indices are used as performance indexes; moreover, the proposed classifiers are compared with previous literatures. Experimental results show that the hybrid algorithm of SMOTE + PSO + C5 is the best one for 5-year survivability of breast cancer patient classification among all algorithm combinations. We conclude that, implementing SMOTE in appropriate searching algorithms such as PSO and classifiers such as C5 can significantly improve the effectiveness of classification for massive imbalanced data sets.

  • 出版日期2014-7