Abstract

How can we find a natural clustering of a "complex" dataset, one that may contain an unknown number of overlapping clusters of arbitrary shape and may be contaminated by noise? This paper proposes a tree-structured framework that purifies such clusters by exploring the structural role of each data object. Within the internal organization of the data, each object plays its own specific role ("centroid", hub, or outlier), determined by its distinctive associations with its neighbors. Adjacent centroids always interact with one another and serve as intermediate nodes of a single tree, which makes them members of the same cluster. Hubs close to some centroid become leaf nodes, which terminate the growth of the tree. Outliers that are only weakly connected to any centroid are often discarded from all trees as global noise. All data objects can thus be labeled by a specified criterion of "centroids"-connected structural consistency (CCSC). Because it requires no domain-specific information, the framework with CCSC can adapt to a wide range of clustering-related applications. Theoretical analysis and experimental results both confirm that the framework is easy to interpret and implement, and that it is efficient and effective for "complex" clustering.
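The role-based labeling described above can be illustrated with a minimal sketch. It is not the paper's CCSC criterion, which the abstract does not specify. As a stand-in, it assumes that roles are assigned from reverse k-nearest-neighbor counts. The function names and the parameters `k`, `core_min`, and `noise_max` are hypothetical and chosen only for illustration.

```python
import math


def knn_lists(points, k):
    """For each point, return the indices of its k nearest neighbors."""
    n = len(points)
    neighbors = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: math.dist(points[i], points[j]))
        neighbors.append(others[:k])
    return neighbors


def assign_roles(neighbors, core_min, noise_max):
    """Assumed rule (not from the paper): role by reverse-neighbor count."""
    indegree = [0] * len(neighbors)
    for nbrs in neighbors:
        for j in nbrs:
            indegree[j] += 1
    roles = []
    for count in indegree:
        if count >= core_min:
            roles.append("centroid")
        elif count <= noise_max:
            roles.append("outlier")
        else:
            roles.append("hub")
    return roles


def tree_cluster(points, k=3, core_min=3, noise_max=0):
    """Return (roles, labels); label -1 marks points left out of every tree."""
    n = len(points)
    neighbors = knn_lists(points, k)
    roles = assign_roles(neighbors, core_min, noise_max)

    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Adjacent centroids are merged into the same tree (cluster).
    for i in range(n):
        if roles[i] != "centroid":
            continue
        for j in neighbors[i]:
            if roles[j] == "centroid":
                parent[find(i)] = find(j)

    labels = [-1] * n
    for i in range(n):
        if roles[i] == "centroid":
            labels[i] = find(i)

    # Hubs attach as leaves to their nearest neighboring centroid, if any.
    for i in range(n):
        if roles[i] == "hub":
            for j in neighbors[i]:
                if roles[j] == "centroid":
                    labels[i] = find(j)
                    break

    # Outliers keep the label -1 and are treated as global noise.
    return roles, labels
```

In this sketch, centroids form the internal nodes of each tree through union-find merging, hubs end a branch because they never propagate a link further, and outliers stay unlabeled. The thresholds would need tuning for real data, and a brute-force neighbor search is used only to keep the example short.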