Abstract

Hierarchical clustering and k-means clustering are two major tools for the unsupervised analysis of microarray datasets. However, both have inherent disadvantages; for example, hierarchical clustering cannot represent distinct clusters with similar expression patterns. Existing hierarchical K-means (HK) algorithms have high time complexity, and most of them are sensitive to noise. To address these problems, a hierarchical K-means clustering algorithm based on silhouette and entropy (HKSE) is proposed. The optimal number of clusters is determined by computing the average Improved Silhouette of the dataset, which reduces the time complexity. Entropy is introduced into HKSE as a similarity measure to reduce sensitivity to noise. Clusters are weighted according to their size to improve clustering quality.
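To illustrate the idea of choosing the number of clusters from an average silhouette score, the following is a minimal sketch only. It assumes the standard silhouette coefficient, not the Improved Silhouette used by HKSE (whose definition is not given in this abstract), and it uses a plain k-means with a deterministic farthest-point initialization rather than the HKSE procedure. The function names `average_silhouette`, `kmeans`, and `choose_k` are hypothetical.

```python
import math


def average_silhouette(points, labels):
    """Mean of the standard silhouette coefficient over all points.

    Assumption: this is the classic silhouette, not the paper's
    Improved Silhouette.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton cluster: s(i) = 0 by convention
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


def kmeans(points, k, iters=50):
    """Plain k-means with farthest-point initialization (illustrative only)."""
    centers = [points[0]]
    while len(centers) < k:
        # next center: the point farthest from its nearest existing center
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments no longer change
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def choose_k(points, k_values):
    """Return the k with the highest average silhouette, plus all scores."""
    scores = {k: average_silhouette(points, kmeans(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy data: three well-separated groups in two dimensions.
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
    best_k, scores = choose_k(data, range(2, 6))
    print(best_k, scores)
```

In this sketch each candidate k requires one clustering run and one silhouette evaluation, and the k with the highest average score is kept. How HKSE reduces the cost of this search is not described in the abstract, so the sketch makes no claim about it.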

Full text