A pragmatic model for new Chinese word extraction

Zhang Haijun*; Huang Heyan; Zhu Chaoyong; Shi Shumin

doi:10.1109/NLPKE.2010.5587846

Abstract

This paper proposes a pragmatic model for repeat-based Chinese New Word Extraction (NWE). The model makes two contributions. The first is a formal description of the NWE process, which provides theoretical guidance for feature selection. Building on this description, a Conditional Random Fields (CRF) model is adopted as the statistical framework for solving it. The second is an improved algorithm for computing left (right) entropy, which increases the efficiency of NWE. Compared with the baseline algorithm, the improved algorithm computes entropy remarkably faster. Overall, experiments show that the proposed model is effective: the F-score is 49.72 in the open test and 69.83 in word extraction, an evident improvement over previous similar work.
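In repeat-based NWE, left and right entropy are commonly used to measure how freely a candidate string combines with its neighbours: for a string w, the left entropy is H_L(w) = -Σ_a p(a|w) log p(a|w), where a ranges over the characters seen immediately to the left of w, and the right entropy is defined symmetrically. The abstract does not describe the authors' improved algorithm, so the sketch below shows only a straightforward baseline-style computation by repeated substring search. The function name `branching_entropy`, the use of log base 2, and the treatment of corpus boundaries (no neighbour is counted) are assumptions for illustration.

```python
import math
from collections import Counter


def branching_entropy(corpus: str, word: str, side: str = "left") -> float:
    """Left or right branching entropy of `word` in `corpus` (hypothetical helper).

    H(w) = -sum_a p(a|w) * log2 p(a|w), where a ranges over the characters
    observed immediately to the left (or right) of each occurrence of w.
    Assumption: occurrences touching the corpus boundary contribute no neighbour.
    """
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    neighbours = Counter()
    n = len(word)
    start = corpus.find(word)
    while start != -1:
        # Index of the neighbouring character on the requested side.
        idx = start - 1 if side == "left" else start + n
        if 0 <= idx < len(corpus):
            neighbours[corpus[idx]] += 1
        # Advance by one character so overlapping occurrences are also counted.
        start = corpus.find(word, start + 1)
    total = sum(neighbours.values())
    if total == 0:
        return 0.0  # string never seen with a neighbour on this side
    return sum(-(c / total) * math.log2(c / total) for c in neighbours.values())


if __name__ == "__main__":
    text = "爱北京，去北京，在北京，到北京"
    print(branching_entropy(text, "北京", "left"))   # four distinct left neighbours
    print(branching_entropy(text, "北京", "right"))  # only "，" on the right
```

In this toy corpus, "北京" is preceded by four different characters (left entropy 2.0 bits) but followed only by "，" (right entropy 0.0). Each call rescans the whole corpus, which is the kind of cost that motivates a faster entropy algorithm such as the one the paper reports.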

Publication date: 2010
Affiliation: Beijing Institute of Technology
