Abstract

Previous approaches to text filtering are challenged by short texts, because the limited length of a short text restricts feature extraction. Based on the textual characteristics of short texts, a word-model-index-based online filtering approach for short texts is proposed. The main idea is to use a word-model index to store labeled short texts. During online training, each new labeled short text is incrementally added to the index. During online classification, the index is first queried with the words in the current unlabeled short text; next, the labeled corpus related to the current short text is retrieved; finally, a classification model is trained on that corpus and applied to predict the label of the current short text. Experimental results on real short message service (SMS) filtering show that the proposed word-model-index-based approach can enhance the content cohesion of the training set to refine the model, and that ensembling the results of multiple fine models can improve filtering performance.
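The index, update, query, retrieve, train, and predict steps described in the abstract can be illustrated with a minimal sketch. Everything named here is hypothetical: the class `WordIndexFilter`, whitespace tokenization, the `default` label, and the choice of a Laplace-smoothed Naive Bayes classifier are assumptions made for illustration, since the abstract does not name the classifier or the index layout. The sketch trains a single local model per query and does not attempt the ensemble of multiple models that the abstract mentions.

```python
import math
from collections import Counter, defaultdict


class WordIndexFilter:
    """Hypothetical sketch: an inverted index over labeled short texts."""

    def __init__(self):
        self.docs = []                 # list of (tokens, label) pairs
        self.index = defaultdict(set)  # word -> ids of labeled texts containing it

    def train(self, text, label):
        """Online training: add one labeled short text to the index."""
        doc_id = len(self.docs)
        tokens = text.lower().split()  # assumption: whitespace tokenization
        self.docs.append((tokens, label))
        for word in set(tokens):
            self.index[word].add(doc_id)

    def retrieve(self, tokens):
        """Return the labeled texts that share at least one word with the query."""
        ids = set()
        for word in set(tokens):
            ids |= self.index.get(word, set())
        return [self.docs[i] for i in sorted(ids)]

    def classify(self, text, default="ham"):
        """Online classification: train a local model on the retrieved corpus."""
        tokens = text.lower().split()
        corpus = self.retrieve(tokens)
        if not corpus:
            return default  # assumption: fall back when nothing is related

        # Train a Naive Bayes model on the retrieved corpus only.
        label_counts = Counter(label for _, label in corpus)
        word_counts = defaultdict(Counter)
        vocab = set()
        for doc_tokens, label in corpus:
            word_counts[label].update(doc_tokens)
            vocab.update(doc_tokens)

        best_label, best_score = default, -math.inf
        for label, n_docs in label_counts.items():
            total = sum(word_counts[label].values())
            score = math.log(n_docs / len(corpus))
            for word in tokens:
                if word in vocab:  # ignore words unseen in the local corpus
                    # Laplace (add-one) smoothing
                    score += math.log(
                        (word_counts[label][word] + 1) / (total + len(vocab))
                    )
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Because the local model sees only texts that share words with the query, its training set is topically closer to the query than the full corpus would be, which is one plausible reading of the "content cohesion" the abstract refers to.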

  • Publication date: 2010
