An Adaptive Markov Model for Text Categorization

Authors: Li Jin*; Yue Kun; Liu Weiyi
Source: 3rd International Conference on Intelligent System and Knowledge Engineering, 2008-11-17 to 2008-11-19.

Abstract

Existing methods for text categorization assume that a document is a bag of words. While computationally efficient, such a representation cannot capture sequential information. In this paper, a document is treated as a sequence of characters or words, so preprocessing steps for text categorization, such as word segmentation and feature selection, are not required. Statistical dependencies among the neighboring terms of a sequence are captured by Markov models of different orders. We propose a sequence classification method based on an adaptive Markov model. Our method automatically and effectively blends Markov models of different orders for text categorization. We present an extensive experimental evaluation of our method on an English collection and one Chinese corpus. The results show that our method achieves high recall and precision.
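
The general idea of classifying a document by blending Markov models of different orders can be illustrated with a minimal sketch. The abstract does not specify how the adaptive blending is performed, so the code below substitutes a simple stand-in: each class gets character-level Markov models of orders 0 to `max_order`, their add-alpha smoothed probabilities are averaged with uniform weights, and the class with the highest log-likelihood wins (equal class priors assumed). The class name `BlendedMarkovClassifier`, the parameters `max_order` and `alpha`, and the uniform weighting are all assumptions made for illustration, not the authors' method.

```python
import math
from collections import defaultdict


class BlendedMarkovClassifier:
    """Hypothetical sketch: per-class character Markov models of orders 0..max_order."""

    def __init__(self, max_order=2, alpha=1.0):
        self.max_order = max_order  # highest Markov order to blend (assumption)
        self.alpha = alpha          # add-alpha smoothing constant (assumption)
        self.counts = {}            # label -> list of {context: {next_char: count}}
        self.vocab = set()          # characters seen during training

    def fit(self, docs, labels):
        for text, label in zip(docs, labels):
            if label not in self.counts:
                self.counts[label] = [
                    defaultdict(lambda: defaultdict(int))
                    for _ in range(self.max_order + 1)
                ]
            models = self.counts[label]
            self.vocab.update(text)
            for i, ch in enumerate(text):
                for k in range(self.max_order + 1):
                    if i >= k:
                        # context = the k characters preceding position i
                        models[k][text[i - k:i]][ch] += 1
        return self

    def _prob(self, model, context, ch):
        # Add-alpha smoothed estimate of P(ch | context); one extra slot for unseen chars.
        followers = model.get(context, {})
        total = sum(followers.values())
        size = len(self.vocab) + 1
        return (followers.get(ch, 0) + self.alpha) / (total + self.alpha * size)

    def log_likelihood(self, text, label):
        models = self.counts[label]
        score = 0.0
        for i, ch in enumerate(text):
            orders = [k for k in range(self.max_order + 1) if i >= k]
            # Uniform blend of the available orders; a stand-in for adaptive weights.
            p = sum(self._prob(models[k], text[i - k:i], ch) for k in orders)
            score += math.log(p / len(orders))
        return score

    def predict(self, text):
        # Equal class priors are assumed, so the likelihood alone decides.
        return max(self.counts, key=lambda label: self.log_likelihood(text, label))


# Toy usage on raw strings, with no word segmentation or feature selection.
docs = ["the cat sat on the mat", "the dog ate the bone",
        "el gato come pescado", "el perro bebe agua"]
labels = ["en", "en", "es", "es"]
clf = BlendedMarkovClassifier(max_order=2).fit(docs, labels)
prediction = clf.predict("the dog sat")
```

Because the models operate directly on character sequences, the same code applies unchanged to text without explicit word boundaries, such as Chinese. An adaptive scheme would replace the uniform average with weights chosen per context or per corpus.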