A novel approach to generate a large scale of supervised data for short text sentiment analysis

Sun, Xiao*; He, Jiajin

doi:10.1007/s11042-018-5748-4

Abstract

Owing to the complexity of language and semantic structure and the relative scarcity of labeled data and context information, sentiment analysis is regarded as a challenging task in Natural Language Processing, especially for short text. Deep learning models need large-scale training data to overcome data sparseness and over-fitting. We propose multi-granularity, text-oriented data augmentation techniques that generate large-scale artificial data for model training, and we compare them with a generative adversarial network (GAN). We also propose a novel hybrid neural network architecture (LSCNN) which, combined with our data augmentation technique, outperforms many single neural network models. The proposed data augmentation method enhances the generalization ability of the proposed model. Experimental results show that the data augmentation method combined with the neural network model achieves strong performance on sentiment analysis and short text classification without any handcrafted features. The approach was validated on a Chinese online comment dataset and a Chinese news headline corpus, where it outperforms many state-of-the-art models. The evidence indicates that the proposed data augmentation technique obtains a more accurate distribution representation from the data for deep learning, which improves the generalization of the extracted features. The combination of the data augmentation technique and the LSCNN fusion model is well suited to short text sentiment analysis, especially on small-scale corpora.

Publication date: 2020-3
Affiliation: Hefei University of Technology


