Abstract

Many clustering methods have been proposed in the field of data mining, but only a few of them address incremental databases. In this paper, a hierarchical clustering algorithm based on fuzzy graph connectedness (FHC) is investigated. The algorithm applies fuzzy set theory to hierarchical clustering in order to discover clusters of arbitrary shape. It first partitions the data set into several sub-clusters using a partitioning method, and then constructs a fuzzy graph of the sub-clusters by analyzing the degree of fuzzy connectedness among them. Computing the cut graph yields the connected components of the fuzzy graph, which form the desired clustering. The algorithm can be applied to high-dimensional data sets and finds clusters of arbitrary shape, such as spherical, linear, elongated, or concave ones. We also present an incremental variant, IFHC, for environments in which data are added periodically. FHC and IFHC handle data with categorical attributes as well as numerical attributes. The results of our experimental study on data sets of arbitrary shape and size are encouraging. An experimental study on web log files is also conducted, in which the method helps discover user access patterns efficiently. The investigation demonstrates that the proposed method generates better-quality clusters than traditional algorithms and scales well to large databases.
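
The last two steps described above (building the fuzzy graph and cutting it) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes the partitioning step has already produced sub-cluster centroids, and it uses a hypothetical exponential-decay function `fuzzy_degree` as a stand-in for the fuzzy-connectedness degree, which the abstract does not define. The names `lambda_cut_components`, `lam`, and `scale` are likewise illustrative.

```python
import math
from itertools import combinations


def fuzzy_degree(a, b, scale=1.0):
    """Hypothetical membership in (0, 1]: closer centroids are more connected.

    Assumption: the paper's actual fuzzy-connectedness measure may differ.
    """
    return math.exp(-math.dist(a, b) / scale)


def lambda_cut_components(centroids, lam, scale=1.0):
    """Group sub-clusters linked by fuzzy-graph edges with membership >= lam."""
    parent = list(range(len(centroids)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Keep only the edges that survive the cut, merging their endpoints.
    for i, j in combinations(range(len(centroids)), 2):
        if fuzzy_degree(centroids[i], centroids[j], scale) >= lam:
            parent[find(i)] = find(j)

    # Collect the connected components of the cut graph.
    groups = {}
    for i in range(len(centroids)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Illustrative centroids: an elongated chain of three, plus a distant pair.
centroids = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 10.0), (11.0, 10.0)]
clusters = lambda_cut_components(centroids, lam=0.3)
print(clusters)
```

In this sketch, sub-clusters 0 and 2 are not directly connected at `lam=0.3`, yet they land in the same component because both connect to sub-cluster 1. Chaining of this kind is what lets a connected-components approach follow elongated or concave shapes. Raising `lam` splits the graph into more, tighter clusters.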