A data labeling method for clustering categorical data

Cao Fuyuan; Liang Jiye*

doi:10.1016/j.eswa.2010.08.026

Abstract

As data volumes grow at a rapid pace, clustering a very large data set inevitably becomes time-consuming. To improve clustering efficiency, sampling is commonly used to reduce the size of the data set. With sampling, however, allocating the unlabeled objects to the proper clusters is a difficult problem. In this paper, a novel similarity measure for categorical data is proposed, based on the frequency of attribute values within a given cluster and the distribution of attribute values across clusters, to allocate each unlabeled object to the most appropriate cluster. Furthermore, a labeling algorithm for categorical data is presented and its time complexity is analyzed. Experiments on real-world data sets demonstrate the effectiveness of the proposed algorithm.
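
The abstract does not state the similarity formula, so the following is only an illustrative sketch of the general idea, not the authors' measure. It assumes a sample has already been clustered, and it scores an unlabeled object against each cluster by combining two quantities named in the abstract: how frequent each of its attribute values is inside the cluster, and how concentrated that value is in this cluster relative to all clusters. The function names (`build_cluster_profiles`, `similarity`, `label_object`) and the product-and-sum scoring rule are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_cluster_profiles(sample, labels):
    """Count attribute-value occurrences per cluster from a clustered sample."""
    # profiles[cluster][attribute_index][value] -> occurrence count
    profiles = defaultdict(lambda: defaultdict(Counter))
    sizes = Counter(labels)
    for obj, label in zip(sample, labels):
        for attr_index, value in enumerate(obj):
            profiles[label][attr_index][value] += 1
    return profiles, sizes


def similarity(obj, label, profiles, sizes):
    """Hypothetical score: in-cluster frequency weighted by cross-cluster concentration."""
    score = 0.0
    for attr_index, value in enumerate(obj):
        count_here = profiles[label][attr_index][value]
        count_all = sum(profiles[c][attr_index][value] for c in sizes)
        if count_all == 0:
            continue  # value never seen in the sample: contributes nothing
        frequency = count_here / sizes[label]    # frequency within this cluster
        concentration = count_here / count_all   # share of the value held by this cluster
        score += frequency * concentration
    return score


def label_object(obj, profiles, sizes):
    """Assign an unlabeled object to the cluster with the highest score."""
    return max(sizes, key=lambda c: similarity(obj, c, profiles, sizes))


# Toy clustered sample: two categorical attributes, two clusters.
sample = [("a", "x"), ("a", "y"), ("b", "z"), ("b", "z")]
labels = [0, 0, 1, 1]
profiles, sizes = build_cluster_profiles(sample, labels)
assigned = [label_object(o, profiles, sizes) for o in [("a", "x"), ("b", "x")]]
```

In this sketch, labeling one object costs time proportional to the number of attributes times the number of clusters (with a further factor of the cluster count for the unoptimized cross-cluster sum), which is why labeling after sampling can be much cheaper than clustering the full data set directly.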

Publication date: 2011-3
Affiliation: Shanxi University
