Abstract

DNA microarray technology provides an efficient way to diagnose cancer. However, microarray gene expression data pose two challenges: class imbalance and high dimensionality. Class imbalance usually leads to inaccurate results when traditional feature selection and classification algorithms are used. Owing to its fast learning speed and good classification performance, the extreme learning machine (ELM) has become a competitive classification algorithm, and the weighted ELM (WELM) has recently been proposed to deal with class imbalance. However, these methods ignore the negative impact of an imbalanced feature set. This paper proposes a hybrid method based on WELM to handle the multiclass imbalance problem of cancer microarray data at both the feature level and the algorithmic level. At the feature level, a corrected feature subset is searched for each class using a class-oriented feature selection method, so that the features correlated with the minority class are explicitly selected. At the algorithmic level, WELM is further modified to strengthen the input nodes with high discrimination power, and an ensemble model is trained to improve generalization. Specifically, multiple modified WELM models are trained on datasets characterized by different feature subsets; to encourage ensemble diversity, models with low dissimilarity are removed and the retained ones are combined into an ensemble model. Experiments are conducted on eight gene expression datasets with multiple cancer types, and the classification results show that our method significantly outperforms ELM and several recently proposed methods.
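For orientation, the baseline weighted ELM that the proposed method builds on can be sketched as follows. This is a minimal illustration of the standard WELM formulation only, assuming per-sample weights equal to the inverse of the class size, a sigmoid hidden layer, and a regularization constant `C`. It does not include the modifications proposed in this paper (class-oriented feature selection, strengthened input nodes, or ensemble pruning), and the function names `train_welm` and `predict_welm`, as well as the default parameter values, are hypothetical.

```python
import numpy as np

def train_welm(X, y, n_hidden=50, C=1.0, seed=0):
    """Fit a baseline weighted ELM (illustrative sketch, not the paper's model)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y).ravel()
    classes, y_idx, counts = np.unique(y, return_inverse=True, return_counts=True)
    y_idx = y_idx.ravel()
    n_samples = X.shape[0]

    # Input weights and biases are drawn at random and never trained (standard ELM).
    W_in = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W_in + b)))  # sigmoid hidden-layer outputs

    # Targets: +1 for the true class, -1 for every other class.
    T = -np.ones((n_samples, classes.size))
    T[np.arange(n_samples), y_idx] = 1.0

    # Assumed weighting scheme: each sample is weighted by 1 / (size of its class),
    # so minority-class samples contribute more to the fit.
    w = 1.0 / counts[y_idx]

    # Output weights: beta = (I/C + H^T W H)^{-1} H^T W T, with W = diag(w).
    HtW = H.T * w
    beta = np.linalg.solve(np.eye(n_hidden) / C + HtW @ H, HtW @ T)
    return classes, W_in, b, beta

def predict_welm(model, X):
    """Predict class labels with a model returned by train_welm."""
    classes, W_in, b, beta = model
    H = 1.0 / (1.0 + np.exp(-(np.asarray(X, dtype=float) @ W_in + b)))
    return classes[np.argmax(H @ beta, axis=1)]
```

In this sketch, only the output weights `beta` are learned, through a single regularized weighted least-squares solve, which is the source of the fast training that ELM-type models are known for.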