Abstract

Most current clustering algorithms for high-attribute-dimensional sparse data can handle only binarized data, set their thresholds subjectively, and lack a method for evaluating clustering results, all of which severely limits their application. To address these problems, this paper proposes a clustering algorithm based on the principle of granularity. Taking into account the characteristics of high-attribute-dimensional sparse continuous data, a dimensional similarity threshold is designed that does not require transforming continuous data into binarized data. Dimensional equivalence granules are then sought discontinuously according to sampled dimensional similarity thresholds. Next, a new method is designed to calculate the sparse similarity, and a re-clustering model based on the indiscernibility degree is designed to refine the result, which makes the algorithm robust to noise. Finally, a new clustering quality evaluation model is proposed. Experimental results on both real-world and synthetic datasets demonstrate that the proposed algorithm is more efficient than existing ones and that its clustering results reflect the characteristics of the data more precisely.
