A Hybrid and Parameter-Free Clustering Algorithm for Large Data Sets

Shao, Hengkang; Zhang, Ping<sup>*</sup>; Chen, Xinye; Li, Fang; Du, Guanglong

doi:10.1109/ACCESS.2019.2900260

Abstract

As an important unsupervised learning method, clustering can effectively reveal hidden structures in data. As the amount of data grows, clustering large data sets becomes a challenging task. Many clustering algorithms have been developed for small data sets, but they are often inefficient when the data sets are large. Moreover, most clustering algorithms require extra input parameters, which may not be easy to obtain in practical applications. This paper proposes a new clustering algorithm called the hybrid and parameter-free clustering method (HPFCM). HPFCM can rapidly cluster large data sets without knowing the number of clusters in advance. It works in five steps: sampling the large data set (MMRS* sampling), assessing the clustering tendency of the sample (eVAT), determining the number of clusters (EPB), forming partitions (MST tree cutting), and extending the results to the rest of the data set. We compare HPFCM with three other methods that are popular for clustering large data sets. Several numerical and real-world experiments have been conducted to verify the algorithm. The results show the great potential and effectiveness of HPFCM for clustering large data sets.
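The sample, partition, and extend stages named in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: uniform random sampling stands in for MMRS*, the number of clusters `k` is supplied by the caller instead of being estimated by eVAT and EPB, and the nearest-sample-point rule used to extend labels is an assumption. All function names (`mst_edges`, `cut_mst`, `cluster_large`) are hypothetical.

```python
import math
import random


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (weight, i, j) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge linking each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def cut_mst(points, k):
    """Label points by dropping the k-1 longest MST edges (k is assumed known here)."""
    edges = sorted(mst_edges(points))
    kept = edges[:len(edges) - (k - 1)]
    root = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i

    for _, i, j in kept:
        root[find(i)] = find(j)
    ids = {}
    # Map each component root to a compact label 0, 1, 2, ...
    return [ids.setdefault(find(i), len(ids)) for i in range(len(points))]


def cluster_large(data, k, sample_size, seed=0):
    """Cluster a random sample, then give every point its nearest sample point's label."""
    rng = random.Random(seed)
    # Assumption: plain random sampling, standing in for MMRS*.
    sample = [data[i] for i in rng.sample(range(len(data)), sample_size)]
    sample_labels = cut_mst(sample, k)
    labels = []
    for p in data:
        nearest = min(range(sample_size), key=lambda s: math.dist(p, sample[s]))
        labels.append(sample_labels[nearest])
    return labels
```

Only the sample goes through the quadratic-cost MST step; each remaining point costs one pass over the sample, which is the general reason a sample-then-extend design scales to large data sets.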

Publication date: 2019
Affiliation: South China University of Technology

