A comparative study for biomedical named entity recognition

Authors: Wang, Xu; Yang, Chen; Guan, Renchu*
Source: International Journal of Machine Learning and Cybernetics, 2018, 9(3): 373-382.
DOI: 10.1007/s13042-015-0426-6

Abstract

With high-throughput technologies applied in biomedical research, the volume of biomedical literature is growing exponentially. It is becoming increasingly important to extract knowledge from manuscripts quickly and accurately, especially in the era of big data. Named entity recognition (NER), which aims to identify chunks of text that refer to specific entities, is essentially the initial step of information extraction. In this paper, we review the three models of biomedical NER and two well-known machine learning methods, the Hidden Markov Model and Conditional Random Fields, which have been widely applied in bioinformatics. Six biomedical NER tools based on these two methods are compared in terms of programming language, feature sets, underlying mathematical methods, post-processing techniques and flowcharts. Experiments with these tools are conducted on two widely used corpora, GENETAG and JNLPBA. The comparison ranges from performance on individual entity types to overall performance. Furthermore, we put forward suggestions on selecting Bio-NER tools for different applications.