DENDIS: A new density-based sampling for clustering algorithm

Ros Frederic; Guillaume Serge

doi:10.1016/j.eswa.2016.03.008

Abstract

To deal with large datasets, sampling can be used as a preprocessing step for clustering. In this paper, a hybrid sampling algorithm is proposed. It is density-based while also using distance concepts to ensure space coverage and to fit cluster shapes. At each step, a new item is added to the sample: it is chosen as the item furthest from the representative of the most important group. A constraint on the hypervolume induced by the samples prevents oversampling in high-density areas. The inner structure allows for internal optimization, so only a few distances have to be computed. The algorithm's behavior is investigated on synthetic and real-world datasets and compared with alternative approaches at both the conceptual and the empirical level. The numerical experiments showed that it is more parsimonious, faster and more accurate, according to the Rand Index, with both the k-means and hierarchical clustering algorithms.
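The selection loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' DENDIS code; it rests on several assumptions. The "most important group" is taken to mean the group with the most members. The hypervolume constraint is replaced by a hypothetical `min_radius` threshold: a group is split only if its farthest member lies beyond that distance. The first item serves as the initial representative. The function and parameter names (`density_farthest_sampling`, `max_samples`, `min_radius`) are illustrative. Each step computes one new distance per item, to the newly added representative, and does not reproduce whatever further distance pruning the paper uses.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def density_farthest_sampling(
    data: Sequence[Point], max_samples: int, min_radius: float
) -> List[int]:
    """Return the indices of the selected representatives, in selection order.

    Hypothetical sketch, not the published DENDIS algorithm:
    - group importance = number of members (assumption),
    - `min_radius` stands in for the hypervolume constraint (assumption).
    """
    if not data or max_samples < 1:
        return []
    reps = [0]                                 # first item seeds the sample (assumption)
    owner = [0] * len(data)                    # position in `reps` of each item's group
    d_own = [dist(p, data[0]) for p in data]   # distance to own representative

    while len(reps) < max_samples:
        # Size of every group and its member farthest from the representative.
        size = [0] * len(reps)
        far = [-1] * len(reps)
        for i, g in enumerate(owner):
            size[g] += 1
            if far[g] < 0 or d_own[i] > d_own[far[g]]:
                far[g] = i
        # Radius proxy for the hypervolume constraint: small groups are not split.
        eligible = [g for g in range(len(reps)) if d_own[far[g]] > min_radius]
        if not eligible:
            break
        best = max(eligible, key=lambda g: size[g])   # largest eligible group
        new = far[best]                               # its farthest member
        reps.append(new)
        k = len(reps) - 1
        # Update assignments: one distance per item, to the new representative.
        for i, p in enumerate(data):
            d = dist(p, data[new])
            if d < d_own[i]:
                owner[i], d_own[i] = k, d
    return reps


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(density_farthest_sampling(points, max_samples=10, min_radius=2.0))
```

On the two well-separated blobs above, the sketch keeps one representative per blob and then stops, because both remaining groups fall within `min_radius`. Lowering the threshold lets it keep splitting until every item is selected, which shows why some volume or density constraint is needed to keep the sample small.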

Publication date: 2016-09-01
