Abstract

Owing to the rapid development of information and Internet technologies, large amounts of information and documents can now be accessed easily through electronic networks. Besides efficient document acquisition, another central issue in document management is document content extraction. To deliver the critical content of a document to a knowledge requester, accurate content extraction requires a thesaurus that indicates how keywords are correlated. This paper presents an algorithm for automatic keyword correlation analysis based on the frequency and location of keywords in each document in the repository. Building on these keyword correlations, an approach to document keyword extraction is also developed. A platform for establishing an enterprise knowledge center is implemented to demonstrate the feasibility of the proposed methodologies. The algorithms explored in this research can be applied in industry to reduce reliance on knowledge engineers or domain experts for thesaurus construction and document content extraction. In practice, the methodologies can be incorporated into enterprise document and knowledge management systems for efficient and effective document searching, indexing, and content recognition.
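The abstract does not give the paper's correlation formula. The following is therefore only a minimal sketch of one way keyword frequency and keyword location could be combined. It assumes a hypothetical proximity-weighted co-occurrence score: two keywords occurring within `window` tokens of each other contribute `1 / distance`, and the total is normalised by the keywords' repository-wide frequencies. The extraction step then ranks a document's keywords by frequency boosted by their correlation with the other keywords present. The function names, the `window` and `top_k` parameters, and both scoring rules are illustrative assumptions, not the authors' method.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def keyword_positions(tokens, keywords):
    """Map each keyword to the token positions where it occurs (its locations)."""
    positions = defaultdict(list)
    for i, tok in enumerate(tokens):
        if tok in keywords:
            positions[tok].append(i)
    return positions


def correlation_matrix(documents, keywords, window=10):
    """Hypothetical proximity-weighted co-occurrence score for keyword pairs.

    Every pair of occurrences of two distinct keywords that lie within
    `window` tokens of each other adds 1 / distance. Each pair's total is
    divided by sqrt(freq_a * freq_b), using repository-wide frequencies.
    Keys are alphabetically sorted (a, b) tuples.
    """
    freq = defaultdict(int)
    raw = defaultdict(float)
    for tokens in documents:
        pos = keyword_positions(tokens, keywords)
        for kw, locs in pos.items():
            freq[kw] += len(locs)
        for a, b in combinations(sorted(pos), 2):
            for i in pos[a]:
                for j in pos[b]:
                    d = abs(i - j)
                    if 0 < d <= window:
                        raw[(a, b)] += 1.0 / d
    return {pair: score / sqrt(freq[pair[0]] * freq[pair[1]])
            for pair, score in raw.items()}


def extract_keywords(tokens, keywords, corr, top_k=3):
    """Rank a document's keywords by frequency times (1 + correlation support)."""
    pos = keyword_positions(tokens, keywords)
    scores = {}
    for kw, locs in pos.items():
        support = sum(corr.get(tuple(sorted((kw, other))), 0.0)
                      for other in pos if other != kw)
        scores[kw] = len(locs) * (1.0 + support)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda k: (-scores[k], k))[:top_k]


# Toy repository (hypothetical data, for illustration only).
docs = [
    "knowledge management supports document search".split(),
    "document indexing and knowledge management".split(),
]
vocab = {"knowledge", "management", "document", "indexing"}
corr = correlation_matrix(docs, vocab)
top = extract_keywords(docs[1], vocab, corr, top_k=2)
print(top)  # → ['knowledge', 'management']
```

In this toy repository, "knowledge" and "management" are adjacent in both documents, so they receive the strongest correlation. That correlation in turn lifts them above "document" and "indexing" when the second document's keywords are ranked. A real thesaurus-building step would also need tokenisation, stop-word handling, and a keyword vocabulary, none of which the abstract specifies.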

  • Publication date: 2003
