Document clustering using locality preserving indexing and support vector machines

Yang, Chengfu; Yi, Zhang*

doi:10.1007/s00500-007-0246-z

Abstract

A document clustering method based on locality preserving indexing (LPI) and support vector machines (SVM) is presented. The document space is generally high-dimensional, and clustering in such a space is often infeasible because of the curse of dimensionality. In this paper, LPI projects the documents into a lower-dimensional semantic space in which documents related to the same semantics are close to each other. An SVM then maps the vectors in the semantic space, by means of a Gaussian kernel, to a high-dimensional feature space in which the minimal enclosing sphere is sought. When mapped back to the semantic space, the sphere can separate into several independent components delimited by the support vectors, each enclosing a separate cluster of documents. Combining LPI and SVM yields not only higher clustering accuracy in a more effective unsupervised way, but also better generalization properties. Extensive demonstrations are performed on the Reuters-21578 and TDT2 data sets.
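The enclosing-sphere step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the toy 2-D points stand in for LPI-projected documents, the kernel width `q` is an arbitrary choice, and the weights `beta` are set uniformly as an assumption, whereas the actual method would obtain them by solving the SVM optimization problem for the minimal enclosing sphere. The function names are hypothetical. The sketch only shows how a Gaussian kernel lets one measure, without leaving the semantic space, how far a point's feature-space image lies from the sphere centre.

```python
import math


def gaussian_kernel(x, y, q=1.0):
    """Gaussian kernel K(x, y) = exp(-q * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-q * sq_dist)


def sphere_distance_sq(x, points, beta, q=1.0):
    """Squared feature-space distance from Phi(x) to a = sum_j beta_j * Phi(x_j)."""
    k_xx = gaussian_kernel(x, x, q)  # equals 1.0 for a Gaussian kernel
    cross = sum(b * gaussian_kernel(p, x, q) for p, b in zip(points, beta))
    centre = sum(
        bi * bj * gaussian_kernel(pi, pj, q)
        for pi, bi in zip(points, beta)
        for pj, bj in zip(points, beta)
    )
    return k_xx - 2.0 * cross + centre


# Toy stand-in for documents already projected into a 2-D semantic space:
# two well-separated groups of three points each.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.2),
          (3.0, 3.0), (3.1, 2.9), (2.9, 3.1)]

# ASSUMPTION: uniform weights replace the coefficients that the SVM
# optimization would normally produce.
beta = [1.0 / len(points)] * len(points)

inside = sphere_distance_sq((0.1, 0.1), points, beta)  # within the first group
gap = sphere_distance_sq((1.5, 1.5), points, beta)     # between the two groups

print(f"distance^2 inside a group: {inside:.3f}")
print(f"distance^2 in the gap:     {gap:.3f}")
```

A point lying between the two groups is farther from the centre in feature space than a point inside a group. With a suitable radius, the region of the semantic space that maps inside the sphere therefore splits into separate components, which is the behaviour the abstract relies on to separate clusters.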

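Publication date: 2008-05
Affiliations: Sichuan University of Arts and Science; University of Electronic Science and Technology of China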

