Word sense learning based on feature selection and MDL principle

Ji Donghong*; He Yanxiang; Xiao Guozheng

doi:10.1007/s10579-007-9030-z

Abstract

In this paper, we propose a word sense learning algorithm that performs unsupervised feature selection and cluster number identification. Feature selection is built on an entropy-based filter and formalized as a constraint optimization problem whose output is a set of important features. Cluster number identification is built on a Gaussian mixture model with an MDL-based criterion, and the optimal model order is inferred by minimizing that criterion. To evaluate the closeness between the learned sense clusters and the ground-truth classes, we introduce a weighted F-measure that models the effort needed to reconstruct the classes from the clusters. Experiments show that the algorithm can retrieve important features, roughly estimate the number of classes automatically, and outperform other algorithms in terms of the weighted F-measure. In addition, we apply the algorithm to the specific task of adding new words to a Chinese thesaurus.

Publication date: 2006-12
Affiliation: Wuhan University


