An efficient data reduction method and its application to cluster analysis

Wang, Jianpei; Yue, Shihong*; Yu, Xiao; Wang, Yaru

doi:10.1016/j.neucom.2017.01.059

Abstract

Data reduction plays an important role in data mining, but existing methods cannot efficiently identify all the major features hidden in large datasets, and in some cases they even lose key features of the original data. This paper develops a new, efficient measure that reduces a given dataset and uncovers its major features by multiplying a defined absolute density by a defined local density for each data point. Both densities are estimated with a fast grid-based bisecting method. To test its performance on feature reduction and sample reduction, a group of synthetic datasets with different features and 24 benchmark datasets were used, with clustering accuracy, runtime, and separability among clusters as the measures. The results show that the proposed method can quickly reduce a dataset and identify its most important features. It can also effectively determine the optimal number of clusters by suppressing noisy data and enhancing the separation among clusters.

Publication date: 2017-5-17
Affiliation: Tianjin University


Last updated: 2021-11-24 15:09
