Automatic Extraction of Concepts to Extend RadLex

Authors: Hazen Rebecca; Van E**roeck Alex P; Mongkolwat Pat; Channin David S*
Source: Journal of Digital Imaging, 2011, 24(1): 165-169.
DOI: 10.1007/s10278-010-9334-1

Abstract

RadLex (TM), the Radiology Lexicon, is a controlled vocabulary of terms used in radiology. It was developed by the Radiological Society of North America in recognition of a lack of coverage of these radiology concepts by other lexicons. There are still additional concepts, particularly those related to imaging observations and imaging observation characteristics, that could be added to the lexicon. We used a free and open-source software system to extract these terms from the medical literature. The system retrieved relevant articles from the PubMed repository and passed them through modules in the Apache Unstructured Information Management Architecture. Imaging observations and imaging observation characteristics were identified through a seven-step process. The system was run on a corpus of 1,128 journal articles and generated lists of 624 imaging observations and 444 imaging observation characteristics. Three domain experts evaluated the top 100 terms in each list and determined a precision of 52% for imaging observations and 26% for imaging observation characteristics. We conclude that candidate terms for inclusion in standardized lexicons may be extracted automatically from the peer-reviewed literature. These terms can then be reviewed for curation into the lexicon.
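
The precision figures above are the fraction of the top-ranked candidate terms that the reviewers judged valid. A minimal sketch of that calculation follows. It assumes the expert judgments have already been reduced to a single set of accepted terms (the abstract does not say how the three reviewers' opinions were combined), and the function name `precision_at_k` and the toy term lists are hypothetical, not taken from the paper.

```python
from typing import Iterable, Sequence


def precision_at_k(ranked_terms: Sequence[str], accepted: Iterable[str], k: int) -> float:
    """Return the fraction of the top-k ranked terms that appear in `accepted`."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_terms[:k]
    if not top_k:
        return 0.0
    # Compare case-insensitively so "Mass" and "mass" count as the same term.
    accepted_set = {term.lower() for term in accepted}
    hits = sum(1 for term in top_k if term.lower() in accepted_set)
    return hits / len(top_k)


# Toy, made-up data for illustration only (not results from the paper).
candidates = ["mass", "nodule", "patient", "opacity", "study"]
expert_accepted = {"mass", "nodule", "opacity"}

score = precision_at_k(candidates, expert_accepted, k=5)
print(f"precision@5 = {score:.2f}")
```

Under this definition, a reported precision of 52% on a top-100 list corresponds to 52 accepted terms out of the 100 evaluated.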