Abstract

One of the most difficult problems in cluster analysis is identifying the number of groups in a dataset, especially in the presence of missing values. Traditional clustering methods assume that the true number of clusters is known; in real-world applications, however, it is generally not known a priori. Moreover, most clustering methods were developed to analyse complete datasets and cannot be applied to many practical problems, e.g., those involving incomplete data. This paper first presents a fuzzy clustering algorithm called OCS-FSOM. The proposed algorithm is based on a neural network and uses the Optimal Completion Strategy to estimate missing values in incomplete datasets. We then propose an extension of this algorithm, a multi-level OCS-FSOM method, to tackle the problem of estimating the number of clusters. The new algorithm, called Multi-OCSFSOM, finds the optimal number of clusters by using a statistical criterion that measures the quality of the obtained partitions. Experiments on real-life datasets show very encouraging results in terms of exactly determining the optimal number of clusters.
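
To illustrate the Optimal Completion Strategy named in the abstract, the following is a minimal sketch, not the OCS-FSOM algorithm itself. It assumes plain fuzzy c-means in place of the paper's fuzzy self-organizing map, and treats each missing entry as an extra variable that is re-estimated at every iteration as the membership-weighted mean of the cluster centres. The function name `ocs_fcm`, the `None` encoding of missing entries, the column-mean initial fill, the maximin initialisation of the centres and the fuzzifier `m = 2` are all assumptions made for this example.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def ocs_fcm(data, c, m=2.0, iters=100):
    """Fuzzy c-means with optimal-completion imputation (illustrative sketch).

    data: list of rows; a missing entry is None (assumed encoding).
    Assumes every column has at least one observed value.
    Returns (centers, memberships, completed_data); `data` is not modified.
    """
    n, dim = len(data), len(data[0])
    missing = [(k, j) for k in range(n) for j in range(dim) if data[k][j] is None]
    x = [list(row) for row in data]

    # Initial fill: mean of the observed values in the same column.
    for k, j in missing:
        observed = [row[j] for row in data if row[j] is not None]
        x[k][j] = sum(observed) / len(observed)

    # Maximin initialisation (assumption): start from the first point,
    # then repeatedly add the point farthest from the chosen centres.
    centers = [list(x[0])]
    while len(centers) < c:
        far = max(x, key=lambda p: min(dist2(p, v) for v in centers))
        centers.append(list(far))

    u = []
    for _ in range(iters):
        # Membership update: u_ik proportional to d_ik^(-2/(m-1)).
        u = []
        for p in x:
            d = [dist2(p, v) for v in centers]
            if min(d) == 0.0:
                # Point coincides with a centre: full membership there.
                hits = [1.0 if di == 0.0 else 0.0 for di in d]
                u.append([h / sum(hits) for h in hits])
            else:
                inv = [di ** (-1.0 / (m - 1.0)) for di in d]
                u.append([w / sum(inv) for w in inv])

        # Centre update: weighted mean of the completed data.
        for i in range(c):
            w = [u[k][i] ** m for k in range(n)]
            centers[i] = [
                sum(w[k] * x[k][j] for k in range(n)) / sum(w) for j in range(dim)
            ]

        # Optimal completion: re-estimate each missing entry from the centres.
        for k, j in missing:
            w = [u[k][i] ** m for i in range(c)]
            x[k][j] = sum(w[i] * centers[i][j] for i in range(c)) / sum(w)

    return centers, u, x
```

Calling `ocs_fcm(rows, c)` on a list of rows returns the centres, the membership matrix and a completed copy of the data. The sketch takes the number of clusters `c` as given; the multi-level search and the statistical criterion that Multi-OCSFSOM uses to choose it are not specified in the abstract and are not reproduced here.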

  • Publication date: 2014-8