Active learning reduces annotation time for clinical concept extraction

Authors: Kholghi Mahnoosh*; Sitbon Laurianne; Zuccon Guido; Nguyen Anthony
Source: International Journal of Medical Informatics, 2017, 106: 25-31.
DOI:10.1016/j.ijmedinf.2017.08.001

Abstract

Objective: To investigate (1) the annotation time savings achieved by various active learning query strategies compared with supervised learning and a random sampling baseline, and (2) the benefit of active learning-assisted pre-annotations in accelerating manual annotation compared with de novo annotation.

Materials and methods: The training and test sets of the concept extraction task in the i2b2/VA 2010 challenge contain 73 and 120 discharge summary reports, respectively, provided by the Beth Israel institute. The 73 training reports were used in user study experiments for manual annotation. First, all sequences within the 73 reports were manually annotated from scratch. Next, active learning models were built to generate pre-annotations for the sequences selected by a query strategy. The annotation/reviewing time per sequence was recorded. The 120 test reports were used to measure the effectiveness of the active learning models.

Results: When annotating from scratch, active learning reduced the annotation time by up to 35% and 28% compared with a fully supervised approach and a random sampling baseline, respectively. Reviewing active learning-assisted pre-annotations reduced the annotation time by a further 20% compared with de novo annotation.

Discussion: The number of concepts that require manual annotation is a good indicator of the annotation time for various active learning approaches, as demonstrated by the high correlation between time rate and concept annotation rate.

Conclusion: Active learning has a key role in reducing the time required to manually annotate domain concepts in clinical free text, whether annotating from scratch or reviewing active learning-assisted pre-annotations.
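For readers unfamiliar with query strategies, the selection step of a pool-based active learning loop can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it contrasts a least-confidence-style strategy with the random sampling baseline, and adds a helper showing how a relative time saving such as the reported 35% is computed. The function names, the sequence ids, and the scoring rule (mean per-token confidence, a simplification of sequence-level least confidence) are all hypothetical.

```python
import random
from typing import Dict, List, Sequence


def least_confidence_query(probs: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Pick the k unlabelled sequences the model is least sure about.

    Hypothetical input: `probs` maps a sequence id to the model's per-token
    confidence (probability of the most likely label for each token).
    Assumed simplification: a sequence is scored by its mean token
    confidence; a lower score means the sequence is more informative.
    """
    def score(sid: str) -> float:
        tokens = probs[sid]
        if not tokens:
            raise ValueError(f"sequence {sid!r} has no tokens")
        return sum(tokens) / len(tokens)

    # Ascending sort: least confident sequences come first.
    return sorted(probs, key=score)[:k]


def random_query(ids: Sequence[str], k: int, seed: int = 0) -> List[str]:
    """Random sampling baseline: pick k distinct sequences uniformly."""
    rng = random.Random(seed)
    return rng.sample(list(ids), k)


def relative_saving(t_method: float, t_baseline: float) -> float:
    """Fraction of baseline annotation time saved by a method."""
    return (t_baseline - t_method) / t_baseline


if __name__ == "__main__":
    # Toy pool of three sequences with made-up token confidences.
    pool = {"s1": [0.99, 0.97], "s2": [0.55, 0.60], "s3": [0.80, 0.70]}
    print(least_confidence_query(pool, 2))  # ['s2', 's3']
    print(random_query(list(pool), 2))
    print(relative_saving(65.0, 100.0))  # about 0.35
```

In a full loop, the selected sequences would be pre-annotated by the current model, reviewed by an annotator, added to the training set, and the model retrained before the next query; the random baseline differs only in how the batch is chosen.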

  • Publication date: 2017-10
  • Affiliation: CSIRO