Abstract

This paper proposes a method based on conditional random fields (CRFs) for automatically extracting hyponymy relations between domain-specific terms. First, drawing on the structured, regular way content is expressed in Baike cards, a feature word dictionary suitable for general-purpose models is compiled through statistical analysis. Second, using word and part-of-speech (POS) tagging features together with the feature word dictionary and punctuation, a CRF is trained to learn the regularities of domain-specific terminological hyponymy, yielding a probabilistic model of how the relation is expressed and the contexts in which it occurs. Finally, the accuracy of the model is verified through a series of contrast experiments, and several improvement schemes are put forward. The experimental results show that the proposed method reaches a hyponymy extraction accuracy of 73.50%.
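To make the feature design concrete, the following is a minimal sketch of how the four feature types named above (word, POS tag, feature word dictionary membership, punctuation) might be assembled per token before CRF training. It is an illustration under stated assumptions, not the authors' implementation: the cue words in `FEATURE_WORDS`, the one-token context window, the feature names, and the toy English sentence are all hypothetical.

```python
import unicodedata

# Hypothetical cue words standing in for the feature word dictionary;
# the actual dictionary compiled from Baike cards is not given here.
FEATURE_WORDS = {"includes", "contains", "comprises"}


def is_punct(token):
    """Return True if every character of the token is Unicode punctuation."""
    return bool(token) and all(
        unicodedata.category(ch).startswith("P") for ch in token
    )


def token_features(sent, i, feature_words=FEATURE_WORDS):
    """Build a feature dict for the i-th (word, pos) pair of a sentence."""
    word, pos = sent[i]
    feats = {
        "word": word,                       # word feature
        "pos": pos,                         # POS tagging feature
        "in_dict": word in feature_words,   # feature word dictionary
        "is_punct": is_punct(word),         # punctuation feature
    }
    # Assumed context window of one token on each side.
    if i > 0:
        prev_word, prev_pos = sent[i - 1]
        feats["-1:word"] = prev_word
        feats["-1:pos"] = prev_pos
        feats["-1:in_dict"] = prev_word in feature_words
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(sent) - 1:
        next_word, next_pos = sent[i + 1]
        feats["+1:word"] = next_word
        feats["+1:pos"] = next_pos
        feats["+1:in_dict"] = next_word in feature_words
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def sentence_features(sent, feature_words=FEATURE_WORDS):
    """Feature dicts for every token of a POS-tagged sentence."""
    return [token_features(sent, i, feature_words) for i in range(len(sent))]


# Toy POS-tagged sentence (tags are illustrative).
sent = [("fruit", "n"), ("includes", "v"), ("apple", "n"), (",", "w"), ("banana", "n")]
feats = sentence_features(sent)
```

Each sentence becomes a list of per-token feature dicts, which is the input shape accepted by CRF toolkits that take dictionary features; the label sequence (for example, tags marking hypernym and hyponym terms) would be supplied separately.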