Abstract

DNA microarray technology is a high-throughput, parallel technique for genomic investigation, valued for its ability to survey features of large-scale, complex biological data simultaneously. This paper aims to find a feature subset with which to build a classifier for gene expression data analysis. First, the K-means clustering algorithm was run on the yeast cell cycle dataset. Based on the Rand calculation, a statistical method was used to select the data points (genes) for classifier design. Principal component analysis was also applied to help construct the classifier. To validate the resulting classifier and to predict a target subset of genes, discriminant analysis based on partial least squares regression and an artificial neural network was also performed.
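As an illustration of the clustering-agreement step, the following is a minimal sketch that assumes the Rand calculation refers to the standard Rand index, i.e. the fraction of gene pairs on which two clusterings agree. The function name `rand_index` and the two label lists are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Return the fraction of point pairs on which two clusterings agree.

    A pair counts as an agreement when both clusterings put the two points
    in the same cluster, or both put them in different clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same points")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two points")
    agree = 0
    for i, j in pairs:
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a == same_b:
            agree += 1
    return agree / len(pairs)


# Hypothetical cluster assignments for six genes from two K-means runs.
run_1 = [0, 0, 1, 1, 2, 2]
run_2 = [1, 1, 0, 0, 2, 0]

print(rand_index(run_1, run_1))  # identical clusterings -> 1.0
print(rand_index(run_1, run_2))  # 12 of 15 pairs agree -> 0.8
```

Because the index compares pairs rather than label values, it is unaffected by how the clusters happen to be numbered in each run, which is why it suits comparing repeated K-means runs.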
