Urdu Sentiment Analysis Using Supervised Machine Learning Approach

Authors: Mukhtar Neelam; Khan Mohammad Abid
Source: International Journal of Pattern Recognition and Artificial Intelligence, 2018, 32(2): 1851001.
DOI: 10.1142/S0218001418510011

Abstract

Over the last decade, sentiment analysis of languages such as English and Chinese has been a particular focus of attention, whereas resource-poor languages such as Urdu have been mostly ignored by the research community; Urdu is the focus of this research. Data are acquired from various blogs covering about 14 different genres and then annotated by human annotators. Three well-known classifiers, namely Support Vector Machine, Decision Tree and k-Nearest Neighbor (k-NN), are tested, their outputs are compared, and their results are improved over several iterations through steps that include stop-word removal, feature extraction, and the identification and extraction of important features. Initially, the performance of the classifiers is not satisfactory, as the accuracy achieved by all three is below 50%. An ensemble of classifiers is also tried, but it does not yield high accuracy. The results are analyzed carefully and improvements are made, including feature extraction, that raise the performance of these classifiers to a satisfactory level. It is further concluded that k-NN performs better than Support Vector Machine and Decision Tree in terms of accuracy, precision, recall and F-measure.
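
The abstract names the pipeline stages but not their details, so the following is only a minimal sketch of the general idea: stop-word removal, bag-of-words feature extraction, k-NN classification, and scoring by accuracy, precision, recall and F-measure. Everything specific here is an assumption made for illustration and is not taken from the paper: the English stand-in tokens (used in place of Urdu text), the stop-word list, the toy corpus, the cosine similarity measure, the choice of k = 3, and all function names.

```python
import math
from collections import Counter

# Hypothetical stop-word list; the paper's Urdu list is not given in the abstract.
STOP_WORDS = {"the", "is", "a", "this"}


def extract_features(text):
    """Return bag-of-words counts after lowercasing and stop-word removal."""
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (assumed metric)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train, text, k=3):
    """Label `text` by majority vote among its k most similar training items."""
    query = extract_features(text)
    ranked = sorted(train, key=lambda ex: cosine(query, ex[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(gold, pred):
    """Fraction of predictions that match the gold labels."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def precision_recall_f1(gold, pred, positive="pos"):
    """Precision, recall and F-measure for one class treated as positive."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy annotated corpus (made up for this sketch, not the paper's data).
labelled = [
    ("this film is good", "pos"),
    ("a good and pleasant story", "pos"),
    ("pleasant music", "pos"),
    ("the plot is bad", "neg"),
    ("bad and boring acting", "neg"),
    ("boring story", "neg"),
]
train = [(extract_features(text), label) for text, label in labelled]

test_texts = ["good music", "boring plot"]
gold = ["pos", "neg"]
pred = [knn_predict(train, text, k=3) for text in test_texts]

print("predictions:", pred)
print("accuracy:", accuracy(gold, pred))
print("precision, recall, F-measure:", precision_recall_f1(gold, pred))
```

The same `train` and `gold` structures could be fed to an SVM or decision-tree implementation to reproduce the kind of side-by-side comparison the abstract describes; that comparison is not attempted here.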

  • Publication date: 2018-2