Abstract

Probabilistic topic models are statistical methods that aim to discover the latent structure in a large collection of documents. The intuition behind them is that, if documents are assumed to be generated from latent topics, the word distribution of each topic can be modelled and the prior distribution over topics can be learned. In this paper we apply this idea to aspect detection in review documents by modelling the topics of sentences, with the goal of improving sentiment analysis systems. Aspect detection helps customers navigate efficiently to detailed information about the features that interest them. The proposed approach assumes that the aspects of the words in a sentence form a Markov chain. The novelty of the model is that it extracts multiword aspects from text while relaxing the bag-of-words assumption. Experimental results show that the model performs this task significantly better than standard topic models.

  • Publication date: 2014-10