A Hybrid Model for Microblog Real-Time Filtering

Authors: Han Zhongyuan; Yang Muyun*; Kong Leilei; Qi Haoliang; Li Sheng
Source: Chinese Journal of Electronics, 2016, 25(3): 432-440.
DOI: 10.1049/cje.2016.05.007

Abstract

The task of real-time microblog filtering is to decide whether subsequently posted tweets are relevant to a given query that represents a specific information need. Filters based on a retrieval model or a text classification model are the main solutions for this task. To best exploit the strengths of both, a hybrid model is proposed that uses the retrieval model as prior knowledge to rectify the classification hyperplane. The hybrid filtering model combines a language model with a logistic regression model. Evaluated on the Text REtrieval Conference (TREC) 2012 microblog real-time filtering track dataset, the proposed model performs significantly better than either the logistic regression model or the language model alone. In particular, it outperforms the best method submitted to the TREC 2012 microblog real-time filtering track.
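
The abstract does not state the model's equations, so the following is only a minimal sketch of one plausible reading, not the authors' formulation. It assumes the retrieval score is a Dirichlet-smoothed query likelihood and that it enters the logistic regression as an additive offset on the logit, which shifts the decision hyperplane. The function names, the smoothing parameter mu, the add-one floor on collection probabilities, and the weight alpha are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def query_log_likelihood(query, tweet, collection_tf, collection_len, mu=100.0):
    """Dirichlet-smoothed log P(query | tweet); mu is an assumed setting."""
    tf = Counter(tweet)
    tweet_len = len(tweet)
    vocab_size = len(collection_tf)
    score = 0.0
    for term in query:
        # Add-one floor (an assumption) so unseen terms keep nonzero mass.
        p_coll = (collection_tf.get(term, 0) + 1.0) / (collection_len + vocab_size + 1.0)
        score += math.log((tf[term] + mu * p_coll) / (tweet_len + mu))
    return score


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def hybrid_probability(weights, bias, features, lm_score, alpha=1.0):
    """Logistic regression whose logit is shifted by the retrieval score.

    alpha (hypothetical) controls how strongly the language-model prior
    moves the decision boundary.
    """
    z = bias + sum(w * x for w, x in zip(weights, features)) + alpha * lm_score
    return sigmoid(z)


# Toy data, invented purely for illustration.
query = ["world", "cup"]
relevant_tweet = ["world", "cup", "final", "tonight"]
off_topic_tweet = ["coffee", "morning", "rain", "today"]
collection_tf = Counter(relevant_tweet + off_topic_tweet)
collection_len = sum(collection_tf.values())

s_rel = query_log_likelihood(query, relevant_tweet, collection_tf, collection_len)
s_off = query_log_likelihood(query, off_topic_tweet, collection_tf, collection_len)

# Same classifier features for both tweets, so only the prior differs.
p_rel = hybrid_probability([0.5], 0.0, [1.0], s_rel)
p_off = hybrid_probability([0.5], 0.0, [1.0], s_off)
```

With identical classifier features, the tweet that shares terms with the query receives the higher retrieval score and therefore the higher filtering probability. In a real filter the weights, bias, and alpha would be learned or tuned on labeled tweets, and a threshold on the probability would decide whether to deliver each tweet.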