Abstract

Clustering, one of the most important tools in information retrieval, is a data mining method that organizes data by unsupervised learning, so it requires no training data. However, some text clustering algorithms cannot update existing clusters incrementally and must instead recompute the clustering from scratch. To address this, this paper presents a novel bottom-up incremental conceptual hierarchical text clustering approach using a CFu-tree representation (ICHTC-CF), which starts with each item as a separate cluster. Term-based feature extraction is used to summarize each cluster during the process. The Comparison Variation measure criterion is adopted to judge whether the closest pair of clusters can be merged or a previous cluster can be split. In addition, our incremental clustering method is not sensitive to the order of the input data. Experimental results show that our method outperforms k-means, CLIQUE, single linkage clustering, and complete linkage clustering, which indicates that the new technique is efficient and feasible.