摘要

Clustering is the process of organizing objects into groups whose members are similar in some way. Most of the clustering methods involve numeric data only. However, this representation may not be adequate to model complex information which may be: histogram, distributions, intervals. To deal with these types of data, Symbolic Data Analysis (SDA) was developed. In multivariate data analysis, it is common some variables be more or less relevant than others and less relevant variables can mask the cluster structure. This work proposes a clustering method based on fuzzy approach that produces weighted multivariate memberships for interval-valued data. These memberships can change at each iteration of the algorithm and they are different from one variable to another and from one cluster to another. Furthermore, there is a different relevance weight associated to each variable that may also be different from one cluster to another. The advantage of this method is that it is robust to ambiguous cluster membership assignment since weights represent how important the different variables are to the clusters. Experiments are performed with synthetic data sets to compare the performance of the proposed method against other methods already established by the clustering literature. Also, an application with interval-valued scientific production data is presented in this work. Clustering quality results have shown that the proposed method offers higher accuracy when variables have different variabilities.

  • 出版日期2014-6-1