Abstract

There has been considerable recent work on recognizing named entities in biomedical text. In this paper, we investigate the named entity classification task, an integral part of the named entity extraction task. We focus on the different sources of information that can be utilized for classification and note the extent to which each is effective. To classify a name, we consider features that appear within the name as well as nearby phrases. We also develop a new strategy based on the context of occurrence and show that it improves the performance of the classification system. We show how our work relates to previous work on named entity classification in the biological domain as well as in generic domains. The experiments were conducted on the GENIA corpus Ver. 3.0, developed at the University of Tokyo. We achieve an F-value of 86 in 10-fold cross-validation on this corpus.

  • Publication date: 2004-12