Accurate Identification of Fatty Liver Disease in Data Warehouse Utilizing Natural Language Processing

Authors: Redman Joseph S; Natarajan Yamini*; Hou Jason K; Wang Jingqi; Hanif Muzammil; Feng Hua; Kramer Jennifer R; Desiderio Roxanne; Xu Hua; El Serag Hashem B; Kanwal Fasiha
Source: Digestive Diseases and Sciences, 2017, 62(10): 2713-2718.
DOI: 10.1007/s10620-017-4721-9

Abstract

Natural language processing is a powerful machine learning technique capable of maximizing data extraction from complex electronic medical records. We used this technique to develop algorithms capable of "reading" full-text radiology reports to accurately identify the presence of fatty liver disease. Abdominal ultrasound, computerized tomography, and magnetic resonance imaging reports were retrieved from the Veterans Affairs Corporate Data Warehouse for a random national sample of 652 patients. Radiographic fatty liver disease was determined by manual review by two physicians and verified by an expert radiologist. A split validation method was used for algorithm development. For all three imaging modalities, the algorithms identified fatty liver disease with > 90% recall and precision, and with F-measures > 90%. These algorithms could be used to rapidly screen patient records and establish a large cohort, facilitating epidemiological and clinical studies that examine the clinical course and outcomes of patients with radiographic hepatic steatosis.
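
The recall, precision, and F-measure figures reported above can be illustrated with a minimal Python sketch. It assumes each report carries a binary reference label from manual review and a binary label from an algorithm, and it uses the balanced F-measure (F1); the abstract does not state which F-measure variant was used. The function name `precision_recall_f1` and the sample labels are hypothetical and are not taken from the study.

```python
def precision_recall_f1(gold, predicted):
    """Compute precision, recall, and F1 for binary labels.

    gold: reference labels (True = fatty liver disease present).
    predicted: algorithm labels for the same reports, in the same order.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)        # true positives
    fp = sum(1 for g, p in pairs if not g and p)    # false positives
    fn = sum(1 for g, p in pairs if g and not p)    # false negatives

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Hypothetical labels for five reports (illustrative only, not study data).
gold_labels = [True, True, True, False, False]
algorithm_labels = [True, True, False, True, False]
precision, recall, f1 = precision_recall_f1(gold_labels, algorithm_labels)
```

Precision is the fraction of algorithm-positive reports that the reviewers also judged positive, recall is the fraction of reviewer-positive reports that the algorithm found, and F1 is their harmonic mean.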

  • Publication date: 2017-10