A unique property of single-link distance and its application in data clustering

Song Yuqing<sup>*</sup>; Jin Shuyuan; Shen Jie

doi:10.1016/j.datak.2011.07.003

Abstract

We prove a unique property of single-link distance and use it to design a data clustering algorithm. The property states that a single-link cluster is a subset with inter-subset distance greater than intra-subset distance, and vice versa. Among the major linkages (single, complete, average, centroid, median, and Ward's), only single-link distance has this property. Based on this property we introduce monotonic sequences of iclusters (i.e., single-link clusters) to model the phenomenon that a natural cluster has a dense kernel and that the density decreases as we move from the kernel to the boundary. A monotonic sequence of iclusters is a sequence of nested iclusters in which each icluster is a dominant child (in terms of size) of the icluster before it. Our data clustering algorithm is based on monotonic sequences. We classify a dataset consisting of one monotonic sequence into two classes by splitting the sequence into two parts: the kernel part and the surrounding part. For a dataset with multiple monotonic sequences, each leaf monotonic sequence represents the kernel of a class, which then "grows" by absorbing nearby non-kernel points. Experiments show that this algorithm compares favorably in effectiveness with other clustering algorithms.
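
The property stated in the abstract can be illustrated with a small check. The sketch below is not the authors' algorithm. It assumes, for illustration only, that the inter-subset distance is the smallest distance from a point in the subset to a point outside it, and that the intra-subset distance is the largest edge of a minimum spanning tree of the subset (the single-link height at which the subset becomes connected). The function names and the one-dimensional sample points are hypothetical, and the points are assumed to be distinct.

```python
def dist(a, b):
    """Distance between two 1-D points (illustrative choice of metric)."""
    return abs(a - b)

def inter_distance(subset, points):
    """Assumed definition: smallest distance from the subset to any outside point."""
    outside = [p for p in points if p not in subset]
    if not outside:
        return float("inf")  # the whole dataset has nothing outside it
    return min(dist(a, b) for a in subset for b in outside)

def intra_distance(subset):
    """Assumed definition: largest edge of a minimum spanning tree of the subset.

    Uses Prim's algorithm; a single point has intra-subset distance 0.
    """
    pts = list(subset)
    if len(pts) < 2:
        return 0.0
    # best[p] = cheapest edge connecting p to the tree built so far
    best = {p: dist(pts[0], p) for p in pts[1:]}
    largest = 0.0
    while best:
        nxt = min(best, key=best.get)
        largest = max(largest, best.pop(nxt))
        for p in best:
            best[p] = min(best[p], dist(nxt, p))
    return largest

def is_icluster(subset, points):
    """True if the subset's inter-subset distance exceeds its intra-subset distance."""
    return inter_distance(subset, points) > intra_distance(subset)

points = [0, 1, 3, 10, 12, 30]
print(is_icluster({0, 1}, points))     # gap to 3 is 2, internal gap is 1
print(is_icluster({0, 1, 3}, points))  # gap to 10 is 7, largest internal gap is 2
print(is_icluster({1, 3}, points))     # 0 is closer to 1 than 1 is to 3
```

On this sample, the subsets that pass the check ({0, 1}, {0, 1, 3}, {10, 12}, and so on) are the groups that single-link agglomeration forms as it merges the nearest pairs first, while {1, 3} fails because an outside point lies closer to it than its own members lie to each other.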

Publication date: 2011-11
Affiliations: Chinese Academy of Sciences; Tianjin University of Technology and Education
