A novel sentiment aware dictionary for multi-domain sentiment classification

Authors: Jha Vandana*; Savitha R; Shenoy P Deepa; Venugopal K R; Sangaiah Arun Kumar
Source: Computers & Electrical Engineering, 2018, 69: 585-597.
DOI: 10.1016/j.compeleceng.2017.10.015

Abstract

Sentiment analysis is a subarea of Natural Language Processing (NLP) that extracts users' opinions and classifies them by polarity. The task has many applications, but it is domain dependent, and annotating corpora in every domain of interest before training a classifier is costly. We attempt to solve this problem by creating a sentiment-aware dictionary from multi-domain data. The dictionary is built using labeled data from the source domain and unlabeled data from both the source and target domains. It is then used to classify the unlabeled reviews of the target domain. The work is carried out in Hindi, the official language of India. The number of web pages in Hindi has grown rapidly since the introduction of UTF-8 encoding. Compared with labeling by Hindi SentiWordNet (HSWN), a general lexicon for word polarity, the proposed method labels 23-24% more words of the target domain. For the words that HSWN covers, the labels assigned by our method match the HSWN labels with 76% accuracy.
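The comparison against HSWN described above rests on two quantities: how many target-domain words each lexicon can label, and how often the two lexicons agree on the words they both cover. The following is a minimal sketch of how such figures could be computed. It is not the authors' implementation: the function names, the label strings, and the romanized toy words are all hypothetical, and the abstract does not specify the exact metric definitions.

```python
def coverage(lexicon, vocabulary):
    """Fraction of distinct vocabulary words that the lexicon labels."""
    vocab = set(vocabulary)
    if not vocab:
        return 0.0
    return len(vocab & set(lexicon)) / len(vocab)


def agreement(lexicon_a, lexicon_b):
    """Fraction of words present in both lexicons that carry the same label."""
    shared = set(lexicon_a) & set(lexicon_b)
    if not shared:
        return 0.0
    matches = sum(1 for word in shared if lexicon_a[word] == lexicon_b[word])
    return matches / len(shared)


# Toy data, invented for illustration only (not taken from the paper).
target_vocab = ["accha", "bura", "shandar", "dheema", "kitab"]
proposed_labels = {"accha": "pos", "bura": "neg", "shandar": "pos", "dheema": "neg"}
general_labels = {"accha": "pos", "bura": "neg", "dheema": "pos"}

proposed_cov = coverage(proposed_labels, target_vocab)
general_cov = coverage(general_labels, target_vocab)
label_agreement = agreement(proposed_labels, general_labels)

print(f"proposed coverage: {proposed_cov:.2f}")
print(f"general coverage:  {general_cov:.2f}")
print(f"agreement on shared words: {label_agreement:.2f}")
```

Under these assumed definitions, the difference between the two coverage values corresponds to the "more words labeled" figure, and the agreement value corresponds to the reported matching accuracy.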

  • Publication date: 2018-7