Abstract

High-throughput microarray technology has produced a large volume of expression data in which the number of samples is small relative to the number of genes. In recent years, a growing number of expression datasets pair each cancer tissue sample with a corresponding control tissue sample. Many filter feature selection methods have been proposed over the last decade, but these methods rarely consider the effect of outliers or of paired samples. In this article, we propose an ensemble feature selection method for paired microarray data based on the minimum redundancy maximum relevance (mRMR) method. To increase stability, the method first uses an ensemble strategy to generate diverse subsets of the original dataset. The mRMR method is then applied to each subset to obtain multiple feature lists. Finally, a rank aggregation strategy combines these lists into the final list of selected features. We apply the method to six paired microarray datasets covering different cancer types. Compared with other widely used filter methods, the proposed method performs well, which indicates that it is effective and well suited to feature selection for paired microarray expression data.
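
The three steps described above (subset generation, mRMR ranking on each subset, rank aggregation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the subsets are generated, which relevance and redundancy measures are used, or which aggregation rule is applied. The sketch assumes bootstrap resampling of whole tumor/control pairs, absolute Pearson correlation as a simple stand-in for mutual information, and rank-sum (Borda-style) aggregation. All function and parameter names are hypothetical.

```python
import random
from math import sqrt
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def mrmr_rank(X: List[List[float]], y: List[float]) -> List[int]:
    """Greedy mRMR ordering of all features (rows of X are samples).

    Assumption: |Pearson r| replaces mutual information for both
    relevance (feature vs. label) and redundancy (feature vs. feature).
    """
    n_features = len(X[0])
    cols = [[row[j] for row in X] for j in range(n_features)]
    relevance = [abs(pearson(c, y)) for c in cols]
    selected: List[int] = []
    remaining = list(range(n_features))

    def score(j: int) -> float:
        if not selected:
            return relevance[j]
        redundancy = sum(abs(pearson(cols[j], cols[s])) for s in selected)
        return relevance[j] - redundancy / len(selected)

    while remaining:
        best = max(remaining, key=score)  # ties go to the lowest index
        selected.append(best)
        remaining.remove(best)
    return selected


def ensemble_mrmr(tumor: List[List[float]], control: List[List[float]],
                  k: int, n_subsets: int = 20, seed: int = 0) -> List[int]:
    """Return the k features with the best aggregated rank.

    tumor[i] and control[i] are the two expression profiles of pair i.
    Assumption: each subset is a bootstrap sample of pairs, so a tumor
    sample is never separated from its matched control.
    """
    rng = random.Random(seed)
    n_pairs, n_features = len(tumor), len(tumor[0])
    rank_sum = [0] * n_features
    for _ in range(n_subsets):
        idx = [rng.randrange(n_pairs) for _ in range(n_pairs)]
        X = [tumor[i] for i in idx] + [control[i] for i in idx]
        y = [1.0] * n_pairs + [0.0] * n_pairs
        for rank, j in enumerate(mrmr_rank(X, y)):
            rank_sum[j] += rank  # lower rank sum means more often selected early
    return sorted(range(n_features), key=lambda j: rank_sum[j])[:k]
```

Calling `ensemble_mrmr(tumor, control, k=50)` on two equally sized lists of expression profiles returns the indices of the 50 features with the lowest rank sum across subsets. Resampling pairs rather than individual samples is one way to respect the paired design; other subset-generation and aggregation schemes would fit the same outline.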

  • Publication date: 2014
