A weighted k-modes clustering using new weighting method based on within-cluster and between-cluster impurity measures

Kim Kyoungok*

doi:10.3233/JIPS-16157

Abstract

Partitioning a set of objects into groups or clusters is a fundamental task in data mining, and clustering is a popular approach to implementing partitioning. Among clustering algorithms, the k-means algorithm is well known and widely applied in many areas, but it handles only numerical attributes. The k-modes algorithm extends k-means to categorical variables and has several variants, such as fuzzy approaches. This paper presents a new attribute weighting method for the k-modes algorithm that uses impurity measures such as entropy and Gini impurity. The proposed algorithm considers the distribution of each attribute's categories both within the same cluster and between different clusters. As a result, categorical variables that the new algorithm identifies as more important than others have a greater influence on the similarity calculation, which improves clustering performance, as confirmed by experiments.
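The abstract does not state the paper's weighting formula, so the following Python sketch is not the author's method. It shows a generic weighted k-modes loop (simple-matching dissimilarity, per-attribute modes) in which each attribute's weight is derived from entropy or Gini impurity. The weight used here is a labeled assumption: for attribute j, T is its impurity over all objects, W is the size-weighted mean impurity inside the clusters, and T - W stands in for the between-cluster component, giving a raw weight (T - W) / T that is then normalized. All function names are hypothetical.

```python
from collections import Counter
import math

def entropy(values):
    """Shannon entropy (natural log) of the category distribution in `values`."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())

def gini(values):
    """Gini impurity: 1 minus the sum of squared category proportions."""
    n = len(values)
    return 1.0 - sum((c / n) ** 2 for c in Counter(values).values())

def attribute_weights(data, labels, k, impurity=gini):
    """HYPOTHETICAL weights from within- and between-cluster impurity.

    T = impurity of the whole column, W = size-weighted mean impurity
    inside each cluster, T - W = assumed between-cluster part.
    Raw weight (T - W) / T is 0 if clusters do not separate the
    categories and 1 if every cluster is pure for that attribute.
    """
    n, m = len(data), len(data[0])
    raw = []
    for j in range(m):
        column = [row[j] for row in data]
        total = impurity(column)
        within = 0.0
        for c in range(k):
            members = [row[j] for row, lab in zip(data, labels) if lab == c]
            if members:
                within += len(members) / n * impurity(members)
        # Clamp tiny negative values caused by floating-point rounding.
        raw.append(max(0.0, total - within) / total if total > 0 else 0.0)
    s = sum(raw)
    # Fall back to equal weights if no attribute separates the clusters.
    return [r / s for r in raw] if s > 0 else [1.0 / m] * m

def mode_of(rows):
    """Per-attribute most frequent category (ties go to the first one seen)."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*rows)]

def weighted_mismatch(x, z, w):
    """Weighted simple matching: sum of w_j over attributes where x_j != z_j."""
    return sum(wj for xj, zj, wj in zip(x, z, w) if xj != zj)

def weighted_kmodes(data, init_modes, max_iter=20, impurity=gini):
    k, m = len(init_modes), len(data[0])
    modes = [list(z) for z in init_modes]
    weights = [1.0 / m] * m  # first pass behaves like plain k-modes
    labels = None
    for _ in range(max_iter):
        # Assign each object to the mode with the smallest weighted mismatch.
        new_labels = [
            min(range(k), key=lambda c: weighted_mismatch(x, modes[c], weights))
            for x in data
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:  # keep the old mode if a cluster becomes empty
                modes[c] = mode_of(members)
        weights = attribute_weights(data, labels, k, impurity)
    return labels, modes, weights

data = [
    ["red", "small", "a"], ["red", "small", "b"], ["red", "small", "a"],
    ["blue", "large", "b"], ["blue", "large", "a"], ["blue", "large", "b"],
]
labels, modes, weights = weighted_kmodes(data, init_modes=[data[0], data[3]])
print(labels)   # [0, 0, 0, 1, 1, 1]
print(weights)  # third attribute gets a much smaller weight
```

In this toy data the first two attributes separate the clusters perfectly while the third is mixed in both clusters, so the assumed impurity-based weight shrinks the third attribute's influence on the dissimilarity. Passing `impurity=entropy` swaps Gini impurity for entropy.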

Publication date: 2017
