Automatic foreign person names extraction from Chinese documents on the web

Gao Hong<sup>*</sup>; Huang Degen; Liu Wei; Yang Yuansheng

Abstract

This paper presents a bootstrapping method for automatically extracting foreign person names (F-names) from Chinese web pages. Starting from a small set of F-name characters, the method iteratively extracts text segments containing F-name characters from the web. A set of context cue words is used to improve extraction efficiency. Statistical information is used to recognize F-names in these text segments. A confidence measure is assigned to each F-name candidate, and a segmentation digraph is constructed to select F-names from the candidates. The method was used to extract 10,000 F-names from the Internet, with a recognition precision of about 87%. The results show that the proposed method is effective.
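
The selection step described in the abstract (confidence-scored candidates resolved over a segmentation digraph) can be illustrated with a minimal sketch. The abstract does not give the confidence formula or the exact graph construction, so everything here is an assumption made for illustration: the function name `select_names`, the `threshold` parameter, the scoring rule (confidence minus threshold), and the example scores are all hypothetical. The sketch treats character positions as nodes, adds one zero-gain edge per plain character and one scored edge per candidate span, and picks the highest-scoring path with dynamic programming.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # [start, end) character offsets into the text


def select_names(text: str,
                 candidates: Dict[Span, float],
                 threshold: float = 0.5) -> List[str]:
    """Pick non-overlapping candidate spans along the best path of a DAG.

    Assumption: a candidate edge gains (confidence - threshold), and a
    plain single-character edge gains 0. This is not the paper's measure.
    """
    n = len(text)

    # Group candidate edges by the position they end at.
    edges_into: Dict[int, List[Tuple[int, float]]] = {}
    for (start, end), conf in candidates.items():
        if 0 <= start < end <= n:
            edges_into.setdefault(end, []).append((start, conf - threshold))

    best = [0.0] * (n + 1)                      # best[j]: top score reaching j
    back: List[Tuple[int, bool]] = [(0, False)] * (n + 1)

    for j in range(1, n + 1):
        # Default: step over one plain character.
        best[j] = best[j - 1]
        back[j] = (j - 1, False)
        # Alternative: arrive via a candidate-name edge.
        for start, gain in edges_into.get(j, []):
            if best[start] + gain > best[j]:
                best[j] = best[start] + gain
                back[j] = (start, True)

    # Walk the back-pointers to recover the selected names in order.
    names: List[str] = []
    j = n
    while j > 0:
        i, is_name = back[j]
        if is_name:
            names.append(text[i:j])
        j = i
    names.reverse()
    return names


if __name__ == "__main__":
    sentence = "总统奥巴马会见了布朗"
    scored = {(2, 5): 0.9, (3, 5): 0.6, (8, 10): 0.8, (1, 3): 0.55}
    print(select_names(sentence, scored))
```

In this toy input, the overlapping candidates covering positions 1 to 5 compete, and the path through the highest-confidence span wins, which is the kind of conflict a segmentation graph is meant to resolve.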

Publication date: 2010

Last updated: 2018-11-13 13:06
