Abstract

Objective: Hedging is frequently used in both the biological literature and clinical notes to denote uncertainty or speculation. Text-mining applications need to detect hedge cues and their scope; otherwise, uncertain events are incorrectly identified as factual events. However, due to the complexity of language, identifying hedge cues and their scope in a sentence is not a trivial task. Our objective was to develop an algorithm that would automatically detect hedge cues and their scope in biomedical literature.
Methodology: We used conditional random fields (CRFs), a supervised machine-learning algorithm, to train models to detect hedge cue phrases and their scope in biomedical literature. The models were trained on the publicly available BioScope corpus. We evaluated the performance of the CRF models in identifying hedge cue phrases and their scope by calculating recall, precision, and F1-score. We compared our models with three competitive baseline systems.
Results: Our best CRF-based model performed statistically significantly better than the baseline systems, achieving F1-scores of 88% and 86% in detecting hedge cue phrases and their scope, respectively, in biological literature, and F1-scores of 93% and 90% in detecting hedge cue phrases and their scope, respectively, in clinical notes.
Conclusions: Our approach is robust, as it can identify hedge cues and their scope in both biological and clinical text. To benefit text-mining applications, our system is publicly available as a Java API and as an online application at http://hedgescope.askhermes.org.
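
The recall, precision, and F1-score named in the Methodology can be illustrated with a minimal sketch. It is not the authors' evaluation code: the span representation `(sentence_id, start_token, end_token)`, the function name `precision_recall_f1`, the exact-match criterion, and the toy data are all assumptions made for illustration.

```python
from typing import Set, Tuple

# Hypothetical span representation: (sentence_id, start_token, end_token).
Span = Tuple[int, int, int]


def precision_recall_f1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 under exact span matching (an assumption)."""
    true_positives = len(gold & predicted)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data, invented for illustration only.
gold_spans = {(0, 2, 3), (1, 0, 1), (2, 4, 6)}
predicted_spans = {(0, 2, 3), (1, 0, 1), (3, 1, 2)}

p, r, f = precision_recall_f1(gold_spans, predicted_spans)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

In this toy case two of the three predicted spans match gold spans exactly, so all three measures equal 2/3. A more lenient matching criterion, such as partial overlap, would give different numbers.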

  • Publication date: 2010-12