Abstract
This paper introduces the Relative Frequency Ratio (RFR) as a measure of collocation strength. Based on RFR, a word sense disambiguation (WSD) model, RFR-SUM, is proposed for disambiguating polysemous Chinese words. Using nine frequently used polysemous words as examples, the model reaches an average precision of 92.50% in an open test. The model is also compared with the Naive Bayesian Model and the Maximum Entropy Model: the precision of RFR-SUM is 5.95% and 4.48% higher than that of the Naive Bayesian Model and the Maximum Entropy Model, respectively. Finally, the paper experiments with pruning the RFR lists. The results show that retaining only the most important 5% of the collocation information keeps almost the same precision while making disambiguation 20 times faster.
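The abstract does not state the RFR formula or the exact scoring rule, so the following is only a minimal sketch under stated assumptions. It assumes that the RFR of a context word for a sense is its relative frequency in that sense's training contexts divided by its relative frequency across all contexts, that RFR-SUM picks the sense with the largest sum of RFR values over the context words, and that pruning keeps the top-scoring fraction of each list. The function names (`build_rfr_lists`, `disambiguate`, `prune`) and the toy English data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def build_rfr_lists(tagged_contexts):
    """Build one RFR list per sense from (sense, context_words) pairs.

    Assumed definition (not from the paper):
    RFR(w, s) = (count of w in contexts of s / tokens in contexts of s)
                / (count of w in all contexts / tokens in all contexts)
    """
    sense_counts = defaultdict(Counter)
    total_counts = Counter()
    for sense, words in tagged_contexts:
        sense_counts[sense].update(words)
        total_counts.update(words)
    total_n = sum(total_counts.values())

    rfr = {}
    for sense, counts in sense_counts.items():
        sense_n = sum(counts.values())
        rfr[sense] = {
            w: (c / sense_n) / (total_counts[w] / total_n)
            for w, c in counts.items()
        }
    return rfr


def disambiguate(rfr, context):
    """Return the sense whose summed RFR over the context words is largest."""
    scores = {
        sense: sum(table.get(w, 0.0) for w in context)
        for sense, table in rfr.items()
    }
    return max(scores, key=scores.get)


def prune(rfr, keep=0.05):
    """Keep only the top `keep` fraction of each RFR list (at least one entry)."""
    pruned = {}
    for sense, table in rfr.items():
        k = max(1, math.ceil(len(table) * keep))
        top = sorted(table.items(), key=lambda kv: kv[1], reverse=True)[:k]
        pruned[sense] = dict(top)
    return pruned


# Hypothetical toy training data for an ambiguous word with two senses.
train = [
    ("finance", ["money", "loan", "account"]),
    ("finance", ["money", "deposit"]),
    ("river", ["water", "shore", "fish"]),
    ("river", ["water", "mud"]),
]
rfr_lists = build_rfr_lists(train)
print(disambiguate(rfr_lists, ["loan", "money"]))
```

Under these assumptions, a pruned model is used the same way: pass `prune(rfr_lists, keep=0.05)` to `disambiguate` in place of the full lists. Shorter lists mean fewer entries to store and score, which is consistent with the speed-up the abstract reports.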
- Publication date: 2007
- Affiliation: Peking University