Automated classification of benign and malignant lesions in F-18-NaF PET/CT images using machine learning

Authors: Perk, Timothy; Bradshaw, Tyler; Chen, Song; Im, Hyung-jun; Cho, Steve; Perlman, Scott; Liu, Glenn; Jeraj, Robert*
Source: Physics in Medicine and Biology, 2018, 63(22): 225019.
DOI: 10.1088/1361-6560/aaebd0

Abstract

Purpose. F-18-NaF PET/CT imaging of bone metastases is confounded by tracer uptake in benign diseases, such as osteoarthritis. The goal of this work was to develop an automated algorithm for classifying bone lesions in NaF PET/CT images. Methods. A nuclear medicine physician manually identified and classified 1751 bone lesions in NaF PET/CT images from 37 subjects with metastatic castrate-resistant prostate cancer; the images of 14 of these subjects (598 lesions) were also analyzed by three additional physicians. Lesions were classified on a five-point scale from definite benign to definite metastatic. Classification agreement between physicians was assessed using Fleiss' kappa. To perform fully automated lesion classification, three different lesion detection methods based on thresholding were assessed: SUV > 10 g/ml, SUV > 15 g/ml, and a statistically optimized regional thresholding (SORT) algorithm. For each ROI in the image, 172 different imaging features were extracted, including PET, CT, and spatial probability features. These imaging features were used as inputs into different machine learning algorithms. The impact of different deterministic factors affecting classification performance was assessed. Results. The factors that most impacted classification performance were the machine learning algorithm and the lesion identification method. Random forests (RF) had the highest classification performance. For lesion segmentation, using SORT (AUC = 0.95 [95% CI = 0.94-0.95], sensitivity = 88% [86%-90%], and specificity = 0.89 [0.87-0.90]) resulted in superior classification performance (p < 0.001) compared to SUV > 10 g/ml (AUC = 0.87) and SUV > 15 g/ml (AUC = 0.86). While there was only moderate agreement between physicians in lesion classification (kappa = 0.53 [95% CI = 0.52-0.53]), classification performance was high using any of the four physicians as ground truth (AUC range: 0.91-0.93). Conclusion. We have developed the first whole-body automatic disease classification tool for NaF PET using RF, and demonstrated its ability to replicate different physicians' classification tendencies. This enables fully automated analysis of whole-body NaF PET/CT images.
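
The abstract reports inter-physician agreement as Fleiss' kappa. For readers unfamiliar with the statistic, the following is a minimal sketch of the standard Fleiss' kappa calculation in plain Python. It is not the authors' code: the function name `fleiss_kappa`, the input layout (one row per lesion, one column per rating category, each cell holding the number of raters who chose that category), and the toy ratings are all illustrative assumptions.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of
    raters who assigned item i to category j.

    Assumes every item was rated by the same number of raters (>= 2).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("at least two raters per item are required")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must have the same number of ratings")
    n_categories = len(counts[0])

    # Share of all ratings that fell into each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]

    # Observed agreement per item: fraction of rater pairs that agree.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]

    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_cat)  # agreement expected by chance

    if p_expected == 1.0:
        # All ratings fall in a single category; kappa is undefined.
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical toy data: 5 lesions, 4 raters, five-point scale
# (columns run from "definite benign" to "definite metastatic").
toy_ratings = [
    [4, 0, 0, 0, 0],
    [1, 2, 1, 0, 0],
    [0, 0, 1, 2, 1],
    [0, 0, 0, 1, 3],
    [0, 1, 2, 1, 0],
]
print(f"Fleiss' kappa on toy data: {fleiss_kappa(toy_ratings):.3f}")
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance. Note that Fleiss' kappa treats categories as unordered, so on a five-point scale a disagreement between adjacent grades counts the same as one between the two extremes.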