摘要

A topic detection model based on hierarchical clustering for Chinese microblog is proposed in this paper. In order to minimize the impact of noise, we optimize the feature selection and weight computation method and use a new scoring method to filter out those topic-unrelated tweets. We also give an improved topic detection algorithm which uses a new vector distance calculation method and center vector updating method. It is shown by the experiment that this method can filter out majority of the topic-unrelated tweets and identify microblog topics accurately and efficiently. The study of microblog topic detection method can help users and service providers find out microblog hot topics dynamically.

全文