A k-mean clustering algorithm for mixed numeric and categorical data

Ahmad Amir<sup>*</sup>; Dey Lipika

doi:10.1016/j.datak.2007.03.016

Abstract

Traditional k-mean type algorithms are limited to numeric data. This paper presents a clustering algorithm based on the k-mean paradigm that works well for data with mixed numeric and categorical features. We propose a new cost function and a distance measure based on the co-occurrence of values. The measures also take into account the significance of an attribute to the clustering process. We present a modified description of the cluster center that overcomes the numeric-data-only limitation of the k-mean algorithm and provides a better characterization of clusters. The performance of the algorithm has been studied on real-world data sets. Comparisons with other clustering algorithms illustrate the effectiveness of this approach.
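The abstract does not spell out the co-occurrence-based distance, so the following is only an illustrative sketch of the general idea: two values of a categorical attribute are treated as close when they co-occur with the values of another attribute in similar proportions. The function name `cooccurrence_distance`, the toy `rows` data, and the choice of total variation distance between conditional distributions are all assumptions made for illustration, not definitions taken from the paper.

```python
from collections import Counter, defaultdict

def cooccurrence_distance(rows, i, j, x, y):
    """Hypothetical helper: distance between values x and y of categorical
    column i, judged by how differently they co-occur with column j.

    Assumption: uses the total variation distance between the conditional
    distributions P(col_j | col_i = x) and P(col_j | col_i = y).
    """
    # counts[a][b] = number of rows where column i is a and column j is b
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[i]][row[j]] += 1

    n_x = sum(counts[x].values())
    n_y = sum(counts[y].values())
    if n_x == 0 or n_y == 0:
        raise ValueError("both values must occur in column i")

    # Compare the two conditional distributions value by value
    values = set(counts[x]) | set(counts[y])
    return 0.5 * sum(
        abs(counts[x][v] / n_x - counts[y][v] / n_y) for v in values
    )

# Toy data (made up): column 0 is a colour, column 1 is a shape
rows = [
    ("red", "round"), ("red", "round"), ("red", "square"),
    ("blue", "square"), ("blue", "square"),
    ("green", "round"),
]

d_red_blue = cooccurrence_distance(rows, 0, 1, "red", "blue")
d_red_green = cooccurrence_distance(rows, 0, 1, "red", "green")
print(d_red_blue, d_red_green)
```

In this toy data, "red" mostly co-occurs with "round", as "green" does, so it ends up closer to "green" than to "blue". A distance of this kind gives categorical values a graded notion of similarity instead of a plain match/mismatch test, which is what lets a k-mean style procedure handle them alongside numeric features.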

Publication date: 2007-11

