Abstract

In recent years, a growing number of researchers have studied how to analyze the sentiments and opinions that people express in social media. Sentiment lexicons play a crucial role in most sentiment analysis applications. However, existing thesaurus-based lexicon-building methods suffer from poor coverage when faced with the new words and new meanings that arise in social media. Previous learning-based methods, on the other hand, usually require intensive expert effort to annotate training datasets or design extraction patterns. In this paper, we observe that graphical emoticons serve as good natural sentiment labels for the microblog posts in which they appear, and we propose a word-emoticon mutual reinforcement ranking model that learns a sentiment lexicon from a massive collection of microblog data. We integrate the emoticons and candidate sentiment words in the microblogs into a two-layer graph and run a random walk on it to extract the top-ranked words as the sentiment lexicon. Extensive experiments were conducted on a benchmark dataset covering various topics. The results validate the effectiveness of the proposed method in building a sentiment lexicon from microblog data.
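
To make the ranking idea concrete, the following is a minimal sketch of a random walk with restart on a two-layer word-emoticon graph. It is an illustration under stated assumptions, not the model evaluated in this paper: the co-occurrence edge weights, the `damping` value, the restart on seed emoticons, and the names `rank_words` and `seed_emoticons` are all hypothetical choices made for this example.

```python
from collections import defaultdict


def rank_words(posts, seed_emoticons, damping=0.85, iterations=50):
    """Rank candidate words by a random walk on a word-emoticon graph.

    posts: list of (words, emoticons) pairs, one pair per microblog post.
    seed_emoticons: dict mapping an emoticon to a non-negative prior weight
        (assumption: a few emoticons with known polarity are given).
    """
    # Edge weight = number of posts in which a word and an emoticon co-occur.
    cooc = defaultdict(float)
    words, emoticons = set(), set()
    for post_words, post_emoticons in posts:
        for w in set(post_words):
            for e in set(post_emoticons):
                cooc[(w, e)] += 1.0
                words.add(w)
                emoticons.add(e)

    # Weighted degree of every node, used to normalize transitions.
    word_deg, emo_deg = defaultdict(float), defaultdict(float)
    for (w, e), c in cooc.items():
        word_deg[w] += c
        emo_deg[e] += c

    total = sum(seed_emoticons.get(e, 0.0) for e in emoticons)
    if total <= 0:
        raise ValueError("at least one seed emoticon must occur in the posts")
    prior = {e: seed_emoticons.get(e, 0.0) / total for e in emoticons}

    emo_score = dict(prior)
    word_score = {w: 0.0 for w in words}
    for _ in range(iterations):
        # Emoticon layer -> word layer: each emoticon spreads its score
        # over its neighboring words in proportion to edge weight.
        word_score = {w: 0.0 for w in words}
        for (w, e), c in cooc.items():
            word_score[w] += emo_score[e] * c / emo_deg[e]
        # Word layer -> emoticon layer, with a restart on the seed emoticons.
        new_emo = {e: (1.0 - damping) * prior[e] for e in emoticons}
        for (w, e), c in cooc.items():
            new_emo[e] += damping * word_score[w] * c / word_deg[w]
        emo_score = new_emo

    # Highest-scoring words first.
    return sorted(word_score.items(), key=lambda kv: kv[1], reverse=True)


posts = [
    (["great", "movie"], [":)"]),
    (["great", "day"], [":)"]),
    (["awful", "movie"], [":("]),
]
ranked = rank_words(posts, seed_emoticons={":)": 1.0})
```

In this toy run, words that co-occur mainly with the seeded emoticon rise to the top of `ranked`. Running the same walk with a negative seed set would give a ranking for the opposite polarity, and taking the top-ranked words from each run would yield a small lexicon.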