摘要

Objective: To detect negations of medical entities in free-text pathology reports with different approaches, and evaluate their performances. Methods and material: Three different approaches were applied for negation detection: the lexicon-based approach was a rule-based method, relying on trigger terms and termination clues; the syntax-based approach was also a rule-based method, where the rules and negation patterns were designed using the dependency output from the Stanford parser; the machine-learning-based approach used a support vector machine as a classifier to build models with a number of features. A total of 284 English pathology reports of lymphoma were used for the study. Results: The machine-learning-based approach had the best overall performance on the test set with micro-averaged F-score of 82.56%, while the syntax-based approach performed worst with 78.62% Fscore. The lexicon-based approach attained an overall average precision of 89.74% and recall of 76.09%, which were significantly better than the results achieved by Negation Tagger with a similar approach. Discussion: The lexicon-based approach benefitted from being customized to the corpus more than the other two methods. The errors in negation detection with the syntax-based approach producing poorest performance were mainly due to the poor parsing results, and the errors with the other methods were probably because of the abnormal grammatical structures. Conclusions: A machine-learning-based approach has potential advantages for negation detection, and may be preferable for the task. To improve the overall performance, one of the possible solutions is to apply different approaches to each section in the reports.

  • 出版日期2015-5