摘要

As for the complexity of language structure, the semantic structure, and the relative scarcity of labeled data and context information, sentiment analysis has been regarded as a challenging task in Natural Language Processing especially in the field of short-text processing. Deep learning model need a large scale of training data to overcome data sparseness and the over-fitting problem, we propose multi-granularity text-oriented data augmentation technologies to generate large-scale artificial data for training model, which is compared with Generative adversarial network(GAN). In this paper, a novel hybrid neural network model architecture(LSCNN) was proposed with our data augmentation technology, which is can outperforms many single neural network models. The proposed data augmentation method enhances the generalization ability of the proposed model. Experiment results show that the proposed data augmentation method in combination with the neural networks model can achieve astonishing performance without any handcrafted features on sentiment analysis or short text classification. It was validated on a Chinese on-line comment dataset and Chinese news headline corpus, and outperforms many state-of-the-art models. Evidence shows that the proposed data argumentation technology can obtain more accurate distribution representation from data for deep learning, which improves the generalization characteristics of the extracted features. The combination of the data argumentation technology and LSCNN fusion model is well suited to short text sentiment analysis, especially on small scale corpus.