A Fuzzy Graph Framework for Initializing k-Means

作者:Drakopoulos Georgios; Gourgaris Panagiotis; Kanavos Andreas*; Makris Christos
来源:International Journal on Artificial Intelligence Tools, 2016, 25(6): 1650031.
DOI:10.1142/S0218213016500317

摘要

k-Means is among the most significant clustering algorithms for vectors chosen from an underlying space S. Its applications span a broad range of fields including machine learning, image and signal processing, and Web mining. Since the introduction of k-Means, two of its major design parameters remain open to research. The first is the number of clusters to be formed and the second is the initial vectors. The latter is also inherently related to selecting a density measure for S. This article presents a two-step framework for estimating both parameters. First, the underlying vector space is represented as a fuzzy graph. Afterwards, two algorithms for partitioning a fuzzy graph to non-overlapping communities, namely Fuzzy Walktrap and Fuzzy Newman-Girvan, are executed. The former is a low complexity evolving heuristic, whereas the latter is deterministic and combines a graph communication metric with an exhaustive search principle. Once communities are discovered, their number is taken as an estimate of the true number of clusters. The initial centroids or seeds are subsequently selected based on the density of S. The proposed framework is modular, allowing thus more initialization schemes to be derived. The secondary contributions of this article are HI, a similarity metric for vectors with numerical and categorical entries and the assessment of its stochastic behavior, and TD, a metric for assessing cluster confusion. The aforementioned framework has been implemented mainly in C# and partially in C++ and its performance in terms of efficiency, accuracy, and cluster confusion was experimentally assessed. Post-processing results conducted with MATLAB indicate that the evolving community discovery algorithm approaches the performance of its deterministic counterpart with considerably less complexity.

  • 出版日期2016-12