Abstract

Initial clustering center selection plays an important role in sequence clustering. However, traditional optimized sequence clustering algorithms do not account for the fact that sequences differ in importance when clustering centers are selected. This paper presents the DWHTSC (Density and Weighted Huffman Tree Based Sequence Clustering) algorithm. The density of every sequence is computed, and the sequence with the largest density is taken as the center sequence of the sequence database. The weight of each sequence in the database is then derived from its dissimilarity to the center sequence. Using these weights together with the idea of Huffman trees, a novel approach, ICCSWHT (Initial Clustering Center Search Algorithm Based on Weighted Huffman Tree), is proposed to obtain K improved initial clustering centers. Within each cluster, the clustering center is then reselected based on the weights of the sequences in that cluster and the preprocessed vector representation of each sequence. Experimental results and analysis show that DWHTSC not only improves the stability of the clustering results but also reduces the iteration time.
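The pipeline summarized above (densest sequence as center, weights from dissimilarity to that center, a weighted Huffman tree used to pick K initial centers) can be illustrated with a minimal sketch. The abstract does not define the dissimilarity measure, the density, the weight formula, or how the Huffman tree is turned into K centers, so every one of those is an assumption here: dissimilarity is taken as normalized edit distance, density as the number of neighbors within a hypothetical radius `eps`, weight as dissimilarity to the center sequence, and the tree is "cut" into K subtrees by stopping Huffman merging when K nodes remain, with the densest member of each subtree chosen as its initial center. The function names (`edit_distance`, `dissimilarity`, `densities`, `huffman_initial_centers`) are hypothetical and this is not the published ICCSWHT algorithm.

```python
import heapq
from itertools import count


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def dissimilarity(a: str, b: str) -> float:
    """Hypothetical measure: edit distance normalized to [0, 1]."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def densities(seqs: list[str], eps: float) -> list[int]:
    """Hypothetical density: number of other sequences within dissimilarity eps."""
    return [sum(1 for j, t in enumerate(seqs) if j != i and dissimilarity(s, t) <= eps)
            for i, s in enumerate(seqs)]


def huffman_initial_centers(seqs: list[str], k: int, eps: float = 0.5) -> list[int]:
    """Return sorted indices of k initial centers (illustrative sketch only)."""
    if not 1 <= k <= len(seqs):
        raise ValueError("k must be between 1 and len(seqs)")
    dens = densities(seqs, eps)
    center = max(range(len(seqs)), key=dens.__getitem__)  # densest sequence
    # Hypothetical weight: dissimilarity to the center sequence, kept positive.
    weights = [dissimilarity(s, seqs[center]) + 1e-9 for s in seqs]

    # Huffman construction: repeatedly merge the two lightest nodes.
    # Stopping when k nodes remain equals building the full tree and
    # undoing its last k-1 merges, which leaves exactly k subtrees.
    tie = count()  # insertion counter breaks ties between equal weights
    heap = [(w, next(tie), i) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    children: dict[int, tuple[int, int]] = {}
    next_id = len(seqs)  # ids below len(seqs) are leaves (sequence indices)
    while len(heap) > k:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        children[next_id] = (a, b)
        heapq.heappush(heap, (w1 + w2, next(tie), next_id))
        next_id += 1

    centers = []
    for _, _, root in heap:
        # Collect the leaves (sequence indices) of this subtree.
        leaves, stack = [], [root]
        while stack:
            node = stack.pop()
            if node in children:
                stack.extend(children[node])
            else:
                leaves.append(node)
        # Hypothetical rule: densest member wins, ties go to the lower index.
        centers.append(max(leaves, key=lambda i: (dens[i], -i)))
    return sorted(centers)


if __name__ == "__main__":
    seqs = ["ACGTACGT", "ACGTACGA", "ACGTACCT", "TTTTGGGG", "TTTTGGGC", "CCCCAAAA"]
    print(huffman_initial_centers(seqs, k=3))
```

Because the subtrees produced by the cut are disjoint, the K centers are always distinct. The step in which DWHTSC reselects each cluster's center from sequence weights and vector representations is not sketched here, since the abstract gives no detail on those representations.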

  • Publication date: 2014
