Abstract

In this paper, we present a bootstrapping method for automatically extracting foreign person names (F-names) from Chinese web pages. Starting from a small seed set of F-name characters, the method iteratively extracts text segments containing F-name characters from the web, using a set of context cue words to improve extraction efficiency. Statistical information is then used to recognize F-names within these text segments: each possible F-name candidate is assigned a confidence measure, and a segmentation digraph is constructed to select F-names from the candidates. Using this method, 10,000 F-names were extracted from the Internet with a recognition precision of about 87%, which shows that the proposed method is effective.
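The abstract does not say how the segmentation digraph is weighted or searched. The following is a minimal sketch that assumes a design. Nodes are the gaps between the characters of a text segment. Each F-name candidate spanning characters `start..end-1` becomes an edge weighted by its confidence. Characters not covered by any name are crossed by edges of weight `skip_score`. Selection is the maximum-weight path through the resulting acyclic graph. The function name `select_fnames`, the `skip_score` parameter, and the example confidence values are hypothetical illustrations, not the authors' actual method or confidence measure.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def select_fnames(segment: str,
                  candidates: Dict[Tuple[int, int], float],
                  skip_score: float = 0.0) -> List[str]:
    """Choose non-overlapping F-name candidates via a max-weight path (hypothetical sketch).

    candidates maps a half-open span (start, end) of `segment` to an assumed
    confidence value; skip_score is the assumed weight for leaving one character
    outside any name.
    """
    n = len(segment)
    out_edges: Dict[int, List[Tuple[int, float]]] = defaultdict(list)
    for (start, end), conf in candidates.items():
        if not 0 <= start < end <= n:
            raise ValueError(f"span {(start, end)} is outside the segment")
        out_edges[start].append((end, conf))

    # best[i] = highest total weight of any path from node 0 to node i.
    best = [float("-inf")] * (n + 1)
    # back[i] = (previous node, whether the edge into i is an F-name).
    back: List[Optional[Tuple[int, bool]]] = [None] * (n + 1)
    best[0] = 0.0

    # Every edge points forward, so nodes 0..n are already in topological order.
    # Each node is reachable through skip edges, so best[i] is finite here.
    for i in range(n):
        edges = [(i + 1, skip_score, False)]
        edges += [(end, conf, True) for end, conf in out_edges[i]]
        for end, weight, is_name in edges:
            if best[i] + weight > best[end]:
                best[end] = best[i] + weight
                back[end] = (i, is_name)

    # Walk back from the last node, collecting the spans chosen as F-names.
    names: List[str] = []
    node = n
    while node > 0:
        prev, is_name = back[node]
        if is_name:
            names.append(segment[prev:node])
        node = prev
    return names[::-1]
```

For example, `select_fnames("会见了克林顿和布莱尔", {(3, 6): 0.9, (7, 10): 0.8, (5, 8): 0.3})` returns `["克林顿", "布莱尔"]`. The two high-confidence names outweigh the spurious overlapping span "顿和布". With `skip_score = 0`, any non-overlapping candidate with positive weight is kept. A real system would also need a threshold or a negative base weight to reject weak candidates.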

  • Publication date: 2010
