Abstract

In many question classification problems based on statistical learning, unlabeled training examples are readily available, but labeled ones are expensive to obtain. This provides strong motivation to improve question classification accuracy by using large quantities of unlabeled questions. In this paper, a new semi-supervised learning algorithm is proposed for question classification. The algorithm combines Expectation-Maximization (EM) with a modified Bayes classifier. First, the initial parameters of the modified Bayes classifier are estimated from the labeled questions alone. Then, the classifier is used to assign a class label to each unlabeled question, and the model is re-estimated iteratively until convergence. Experiments on a Chinese question system in the tourism domain show that the method can effectively exploit unlabeled examples to improve classification accuracy.
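The procedure outlined above can be illustrated with a minimal sketch. The abstract does not specify how the Bayes classifier is modified, so this sketch assumes a standard multinomial naive Bayes model with Laplace smoothing and soft (probabilistic) labels in the E-step. The function names, the smoothing constant `alpha`, the stopping rule, and the toy English questions are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def train_nb(docs, weights, classes, vocab, alpha=1.0):
    """M-step: estimate log priors and log word likelihoods from class weights."""
    prior = {c: alpha for c in classes}
    counts = {c: Counter() for c in classes}
    for words, w in zip(docs, weights):
        for c in classes:
            prior[c] += w[c]
            for t in words:
                counts[c][t] += w[c]
    total = sum(prior.values())
    log_prior = {c: math.log(prior[c] / total) for c in classes}
    log_like = {}
    for c in classes:
        denom = sum(counts[c].values()) + alpha * len(vocab)
        log_like[c] = {t: math.log((counts[c][t] + alpha) / denom) for t in vocab}
    return log_prior, log_like


def posterior(words, model, classes):
    """E-step for one question: P(class | words) under the current model."""
    log_prior, log_like = model
    scores = {
        c: log_prior[c] + sum(log_like[c][t] for t in words if t in log_like[c])
        for c in classes
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}


def em_naive_bayes(labeled, unlabeled, max_iter=20, tol=1e-6):
    """Initialise from labeled questions, then alternate E- and M-steps."""
    classes = sorted({y for _, y in labeled})
    vocab = sorted({t for ws, _ in labeled for t in ws} | {t for ws in unlabeled for t in ws})
    lab_docs = [ws for ws, _ in labeled]
    hard = [{c: 1.0 if c == y else 0.0 for c in classes} for _, y in labeled]
    model = train_nb(lab_docs, hard, classes, vocab)  # labeled data only
    prev = None
    for _ in range(max_iter):
        soft = [posterior(ws, model, classes) for ws in unlabeled]           # E-step
        model = train_nb(lab_docs + unlabeled, hard + soft, classes, vocab)  # M-step
        flat = [p[c] for p in soft for c in classes]
        # Assumed stopping rule: posteriors on unlabeled questions stop changing.
        if prev is not None and max((abs(a - b) for a, b in zip(flat, prev)), default=0.0) < tol:
            break
        prev = flat
    return model, classes


def classify(words, model, classes):
    post = posterior(words, model, classes)
    return max(classes, key=lambda c: post[c])


# Hypothetical toy data: tokenised tourism-style questions.
labeled = [
    (["where", "is", "the", "museum"], "LOCATION"),
    (["how", "much", "is", "the", "ticket"], "PRICE"),
]
unlabeled = [
    ["where", "is", "the", "lake"],
    ["how", "much", "is", "the", "hotel"],
    ["where", "is", "the", "hotel"],
]
model, classes = em_naive_bayes(labeled, unlabeled)
print(classify(["where", "is", "the", "lake"], model, classes))
print(classify(["how", "much", "ticket"], model, classes))
```

Soft labels are used here rather than hard assignments so that uncertain unlabeled questions contribute to each class in proportion to their posterior. A hard-label variant, which may be closer to the wording of the abstract, would replace each posterior with a one-hot vector for the most probable class.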

Full text