A fast clustering algorithm based on pruning unnecessary distance computations in DBSCAN for high-dimensional data

Chen, Yewang<sup>*</sup>; Tang, Shengyu; Bouguila, Nizar; Wang, Cheng; Du, Jixiang; Li, HaiLin

doi:10.1016/j.patcog.2018.05.030

摘要

Clustering is an important technique to deal with large scale data which are explosively created in internet. Most data are high-dimensional with a lot of noise, which brings great challenges to retrieval, classification and understanding. No current existing approach is "optimal" for large scale data. For example, DBSCAN requires O(n(2)) time, Fast-DBSCAN only works well in 2 dimensions, and rho-Approximate DBSCAN runs in O(n) expected time which needs dimension D to be a relative small constant for the linear running time to hold. However, we prove theoretically and experimentally that p-Approximate DBSCAN degenerates to an O(n(2)) algorithm in very high dimension such that 2(D) > > n. In this paper, we propose a novel local neighborhood searching technique, and apply it to improve DBSCAN, named as NQ-DBSCAN, such that a large number of unnecessary distance computations can be effectively reduced. Theoretical analysis and experimental results show that NQ-DBSCAN averagely runs in O(n*log(n)) with the help of indexing technique, and the best case is O(n) if proper parameters are used, which makes it suitable for many realtime data.

出版日期2018-11
单位华侨大学

全文

访问全文

收藏分享被引(116) 浏览

更新时间：2024-04-28 02:51

A fast clustering algorithm based on pruning unnecessary distance computations in DBSCAN for high-dimensional data

摘要

全文

产品服务

站内浏览

服务支持

联系方式

科研之友