Abstract

In the information retrieval community, the Centroid Classifier has been shown to be a simple yet effective method for text categorization. However, it often suffers from model misfit (or inductive bias) incurred by its modeling assumption. Various methods have been proposed to address this issue, such as Weight Adjustment, Voting, Refinement, and DragPushing. However, existing methods employ only one criterion, namely training-set error. Research in machine learning indicates that methods based solely on training-set error cannot guarantee the generalization capability of base classifiers on unseen examples. To overcome this problem, we propose a novel Model Adjustment algorithm that makes use of training-set errors as well as training-set margins. Furthermore, we prove that for a linearly separable problem, the proposed method converges to the optimal solution after a finite number of updates for any learning rate η (η > 0). The empirical assessment conducted on four benchmark collections indicates that the proposed method performs slightly better than the SVM classifier in prediction accuracy, and beats it in running time.
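The abstract does not give the update rule itself, so the following is only a minimal sketch of the general idea: a centroid classifier whose class centroids are iteratively adjusted using both misclassified training examples and examples with small margins. The learning rate `eta`, the margin threshold `tau`, and the drag/push update form are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np


def l2_normalize(X, eps=1e-12):
    """Row-normalize so that dot products equal cosine similarities."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, eps)


def train_centroids(X, y, n_classes):
    """Plain centroid classifier: one averaged, normalized vector per class."""
    centroids = np.zeros((n_classes, X.shape[1]))
    for c in range(n_classes):
        centroids[c] = X[y == c].mean(axis=0)
    return l2_normalize(centroids)


def adjust_centroids(X, y, centroids, eta=0.1, tau=0.05, max_iter=20):
    """Hypothetical margin-aware adjustment: if an example is misclassified or
    its margin (true-class score minus best rival score) is below tau, drag the
    true centroid toward it and push the rival centroid away."""
    centroids = centroids.copy()
    for _ in range(max_iter):
        updated = False
        for i in range(X.shape[0]):
            s = centroids @ X[i]               # cosine similarity to each class
            true_c = y[i]
            rival = s.copy()
            rival[true_c] = -np.inf
            rival_c = int(np.argmax(rival))
            margin = s[true_c] - s[rival_c]    # negative if misclassified
            if margin < tau:
                centroids[true_c] += eta * X[i]   # drag toward the example
                centroids[rival_c] -= eta * X[i]  # push the rival away
                centroids = l2_normalize(centroids)
                updated = True
        if not updated:                        # no errors and no small margins
            break
    return centroids


def predict(X, centroids):
    return np.argmax(X @ centroids.T, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: two noisy clusters standing in for TF-IDF document vectors.
    X = l2_normalize(np.vstack([rng.normal(loc=m, scale=0.5, size=(50, 20))
                                for m in (0.0, 1.0)]))
    y = np.repeat([0, 1], 50)
    centroids = train_centroids(X, y, n_classes=2)
    centroids = adjust_centroids(X, y, centroids)
    print("training accuracy:", (predict(X, centroids) == y).mean())
```

The margin test `margin < tau` subsumes the pure error criterion (a misclassified example always has a negative margin), which is how a margin-based adjustment can go beyond training-set error alone.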

  • Publication date: 2011-8
  • Affiliations: Chinese Academy of Geological Sciences; Chinese Academy of Sciences