Abstract

This paper proposes a fuzzy-clustering-based model for extracting five categories of Chinese expert metadata (name, native place, organization, job title, and research interests) from multiple web pages, exploiting the relationships among an expert's pages. First, word, part-of-speech, and expert-page features are selected, and a Conditional Random Fields (CRF) model extracts the five categories of expert metadata from each single page returned by retrieval. Next, features describing the relationships between pages are selected, and a Maximum Entropy model is used to build a page classifier that identifies the group of pages related to each expert. Finally, fuzzy clustering, guided by the related page group, is used to extract more accurate expert metadata across multiple pages. Experiments on extracting the five categories of expert metadata in the natural language processing and machine learning domains show that the fuzzy-clustering-based model extracts expert metadata more effectively: the average accuracy over the five categories increases by 10% compared with the single-page extraction method.
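The final stage can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the page features, the clustering variant, or how clustered pages are merged. The sketch assumes standard fuzzy c-means over hypothetical two-dimensional page feature vectors, and it assumes that conflicting field values from single-page extraction are resolved by summing the cluster memberships of the pages that propose them. The names `memberships`, `fuzzy_c_means`, `merge_field`, `pages`, and `org_candidates` are all illustrative.

```python
import math
from collections import defaultdict


def memberships(points, centers, m=2.0):
    """Fuzzy c-means membership matrix: u[i][j] is the degree of point i in cluster j."""
    exponent = 2.0 / (m - 1.0)
    u = []
    for p in points:
        d = [math.dist(p, c) for c in centers]
        if 0.0 in d:
            # The point coincides with one or more centers: split full membership among them.
            hits = [1.0 if x == 0.0 else 0.0 for x in d]
            u.append([h / sum(hits) for h in hits])
        else:
            u.append([1.0 / sum((dj / dk) ** exponent for dk in d) for dj in d])
    return u


def fuzzy_c_means(points, centers, m=2.0, iters=50):
    """Alternate membership and center updates, starting from the given centers."""
    dim = len(points[0])
    for _ in range(iters):
        u = memberships(points, centers, m)
        new_centers = []
        for j, old in enumerate(centers):
            w = [row[j] ** m for row in u]
            total = sum(w)
            if total == 0.0:
                # No point has any membership in this cluster: keep the old center.
                new_centers.append(list(old))
                continue
            new_centers.append(
                [sum(wi * p[k] for wi, p in zip(w, points)) / total for k in range(dim)]
            )
        centers = new_centers
    return centers, memberships(points, centers, m)


def merge_field(values, u, cluster):
    """Assumed merge rule: pick the value with the largest summed membership in a cluster."""
    score = defaultdict(float)
    for value, row in zip(values, u):
        if value is not None:
            score[value] += row[cluster]
    return max(score, key=score.get) if score else None


# Hypothetical feature vectors for five retrieved pages (made-up numbers).
pages = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05], [1.0, 0.9], [0.9, 1.0]]
# Hypothetical "organization" values extracted from each page on its own.
org_candidates = ["Org A", "Org B", "Org A", "Org C", "Org C"]

centers, u = fuzzy_c_means(pages, [[0.0, 0.0], [1.0, 1.0]])
best_org_cluster0 = merge_field(org_candidates, u, 0)
best_org_cluster1 = merge_field(org_candidates, u, 1)
```

Soft memberships let a page that only weakly belongs to an expert's page group still contribute evidence, in proportion to how strongly it belongs. A hard page assignment would discard that information.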

  • Publication date: 2014
