Abstract

User-generated data is rich in opinions, and automatically identifying user opinions plays an important role in many Web applications, such as recommendation systems and business and government intelligence. However, the way users express opinions is domain-dependent, and it is difficult to select the optimal classifier for a specific domain, especially for users who are not familiar with that domain. This paper proposes a three-phase opinion analysis framework based on ensemble learning, in which a set of optimal classifiers is chosen automatically and assembled to generate the final predictions for unlabeled samples. Because the number of possible classifier combinations grows explosively, an approximation algorithm based on classification accuracy and diversity is proposed to select the member classifiers; it can be proven to be a 2-approximation. Finally, extensive experiments on real-world datasets from different domains demonstrate the effectiveness of the proposed framework and algorithms.

Full text