DoSO: a document self-organizer

Authors: Spanakis Gerasimos*; Siolas Georgios; Stafylopatis Andreas
Source: Journal of Intelligent Information Systems, 2012, 39(3): 577-610.
DOI: 10.1007/s10844-012-0204-9

Abstract

In this paper, we propose a Document Self Organizer (DoSO), an extension of the classic Self Organizing Map (SOM) model, in order to deal more efficiently with a document clustering task. Starting from a document representation model, based on important "concepts" exploiting Wikipedia knowledge, that we have previously developed in order to overcome some of the shortcomings of the Bag-of-Words (BOW) model, we demonstrate how SOM's performance can be boosted by using the most important concepts of the document collection to explicitly initialize the neurons. We also show how a hierarchical approach can be utilized in the SOM model and how this can lead to a more comprehensive final clustering result with hierarchical descriptive labels attached to neurons and clusters. Experiments show that the proposed model (DoSO) yields promising results both in terms of extrinsic and SOM evaluation measures.
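
The idea of initializing SOM neurons from the most important concepts can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the initialization rule, so the code assumes that a concept's importance is its total weight across the collection and that each neuron starts as a unit vector on one top concept. The function names (init_neurons_from_concepts, train_som, assign_clusters), the linear decay schedules, and the toy data are all hypothetical, and the hierarchical extension is not covered.

```python
import numpy as np

def init_neurons_from_concepts(doc_concept, n_neurons):
    """Seed each neuron with one top-ranked concept (assumed scheme, not from the paper)."""
    n_concepts = doc_concept.shape[1]
    if n_neurons > n_concepts:
        raise ValueError("need at least as many concepts as neurons")
    # Assumption: importance of a concept = its summed weight over all documents.
    totals = doc_concept.sum(axis=0)
    top = np.argsort(totals)[::-1][:n_neurons]
    weights = np.zeros((n_neurons, n_concepts))
    weights[np.arange(n_neurons), top] = 1.0
    return weights

def train_som(doc_concept, weights, grid, epochs=20, lr0=0.5, sigma0=1.0):
    """Standard online SOM update; grid[i] is the map coordinate of neuron i."""
    weights = weights.copy()
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                 # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 1e-3    # linearly shrinking neighborhood
        for x in doc_concept:
            # Best matching unit: the neuron closest to the document vector.
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
            # Gaussian neighborhood measured on the map grid, not in concept space.
            dist2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
            h = np.exp(-dist2 / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
    return weights

def assign_clusters(doc_concept, weights):
    """Label each document with the index of its nearest neuron."""
    dists = np.linalg.norm(doc_concept[:, None, :] - weights[None, :, :], axis=2)
    return np.argmin(dists, axis=1)

# Toy collection: 4 documents over 3 concepts (made-up weights).
docs = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.1],
])
grid = np.array([[0.0, 0.0], [0.0, 1.0]])  # a 1x2 map
initial = init_neurons_from_concepts(docs, n_neurons=2)
trained = train_som(docs, initial, grid)
labels = assign_clusters(docs, trained)
```

Because the neurons start on concept axes rather than at random positions, the run is deterministic, and each neuron's strongest concept can serve as a rough descriptive label for its cluster.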

  • Publication date: 2012-12