Abstract

Unbalanced data, in which minority classes have few samples, arise in many fields. The mean of unbalanced data is difficult to formalize, so traditional algorithms are limited in handling such data. In this paper, a novel algorithm based on analysis of variance (ANOVA), fuzzy C-means (FCM) and bacterial foraging optimization (BFO) is proposed to classify unbalanced data. ANOVA measures the difference between the means of two or more groups by partitioning the observed variance into components attributable to different explanatory variables. FCM is a fuzzy clustering method that allows a data point to belong to two or more clusters. Natural selection tends to eliminate animals with poor foraging strategies and favors the propagation of the genes of animals with successful foraging strategies. BFO models this mechanism of natural selection and has been applied to many problems. The proposed algorithm combines the advantages of all three methods: ANOVA selects beneficial feature subsets, FCM assigns data to clusters with graded membership degrees, and BFO converges quickly toward global optima. Microarray data of ovarian cancer and the zoo dataset are used to test the performance of the proposed algorithm. Simulation results show that the classification accuracy of the proposed algorithm exceeds that of other existing approaches.
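As a rough illustration of two of the building blocks named in the abstract, the sketch below computes a one-way ANOVA F statistic for a single feature and the standard FCM membership degrees of one scalar point with respect to fixed cluster centers. It is not the authors' implementation: the function names `anova_f` and `fcm_memberships`, the fuzzifier default `m = 2`, the restriction to one-dimensional data, and the toy values are all assumptions made for this example, and the BFO search step is not shown.

```python
def anova_f(groups):
    """One-way ANOVA F statistic for one feature.

    `groups` is a list of lists, one list of feature values per class.
    Assumes at least two groups and non-zero within-group variance.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of class means around the grand mean.
    ssb = sum(len(g) * (mu - grand_mean) ** 2 for g, mu in zip(groups, means))
    # Within-group sum of squares: spread of samples around their class mean.
    ssw = sum((x - mu) ** 2 for g, mu in zip(groups, means) for x in g)
    return (ssb / (k - 1)) / (ssw / (n_total - k))


def fcm_memberships(x, centers, m=2.0):
    """FCM membership degrees of scalar `x` for fixed scalar `centers`.

    Uses u_j = 1 / sum_k (d_j / d_k) ** (2 / (m - 1)), with m > 1.
    """
    dists = [abs(x - c) for c in centers]
    if 0.0 in dists:
        # The point coincides with a center: give that center full membership.
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_j / d_k) ** power for d_k in dists) for d_j in dists]


# Toy usage (hypothetical values): a feature whose class means differ strongly
# gets a large F score, and a point near one center gets a high membership there.
f_score = anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
memberships = fcm_memberships(1.0, [0.0, 4.0])
```

In a pipeline of the kind the abstract describes, features would presumably be ranked by a score such as `f_score` before clustering, and the memberships always sum to one for each point; how the paper couples these steps with BFO is not stated in the abstract.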

  • Publication date: 2012-8