High quality machine-robust image features: Identification in nonsmall cell lung cancer computed tomography images

Authors: Hunter Luke A; Krafft Shane; Stingo Francesco; Choi Haesun; Martel Mary K; Kry Stephen F; Court Laurence E*
Source: Medical Physics, 2013, 40(12): 121916.
DOI:10.1118/1.4829514

Abstract

Purpose: For nonsmall cell lung cancer (NSCLC) patients, quantitative image features extracted from computed tomography (CT) images can be used to improve tumor diagnosis, staging, and response assessment. For these findings to be applied clinically, image features need high intra- and intermachine reproducibility. The objective of this study is to identify CT image features that are reproducible, nonredundant, and informative across multiple machines.

Methods: Noncontrast-enhanced, test-retest CT image pairs were obtained from 56 NSCLC patients imaged on three CT machines at two institutions. Two machines ("M1" and "M2") used cine 4D-CT, and one machine ("M3") used breath-hold helical 3D-CT. Gross tumor volumes (GTVs) were semiautonomously segmented and then pruned by removing voxels with CT numbers below a prescribed Hounsfield unit (HU) cutoff. Three hundred twenty-eight quantitative image features were extracted from each pruned GTV based on its geometry, intensity histogram, absolute gradient image, co-occurrence matrix, and run-length matrix. For each machine, features with concordance correlation coefficient values greater than 0.90 were considered reproducible. The Dice similarity coefficient (DSC) and the Jaccard index (JI) were used to quantify the agreement between the machines' reproducible feature sets. Multimachine reproducible feature sets were created by taking the intersection of the individual machines' reproducible feature sets. Redundant features were removed through hierarchical clustering based on the average correlation between features across multiple machines.

Results: For all image types, GTV pruning was found to negatively affect reproducibility (reported results use no HU cutoff).
The reproducible feature percentage was highest for average images (M1 = 90.5%, M2 = 94.5%, M1 ∩ M2 = 86.3%), intermediate for end-exhale images (M1 = 75.0%, M2 = 71.0%, M1 ∩ M2 = 52.1%), and lowest for breath-hold images (M3 = 61.0%). Between M1 and M2, the reproducible feature sets generated from end-exhale images were relatively machine-sensitive (DSC = 0.71, JI = 0.55), whereas those generated from average images were relatively machine-insensitive (DSC = 0.90, JI = 0.87). Histograms of feature-pair correlation distances indicated that feature redundancy was sensitive to both the machine and the image type. After hierarchical clustering, 38, 28, and 33 features were found to be reproducible and nonredundant for M1 ∩ M2 (average images), M1 ∩ M2 (end-exhale images), and M3, respectively. When blinded to the presence of test-retest images, hierarchical clustering showed that the selected features were informative: it correctly paired 55 out of 56 test-retest images using only their reproducible, nonredundant feature set values.

Conclusions: Image feature reproducibility and redundancy depended on both the CT machine and the CT image type. For each image type, the authors found a set of cross-machine reproducible, nonredundant, and informative image features that would be useful for future image-based models. Compared with end-exhale 4D-CT and breath-hold 3D-CT, image features derived from average 4D-CT showed superior multimachine reproducibility and are the best candidates for clinical correlation.
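
The per-machine reproducibility screen and the set-agreement measures described in the Methods can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: it assumes Lin's concordance correlation coefficient with population (1/n) moments, feature matrices shaped (patients × features), and the hypothetical helper names `lins_ccc`, `reproducible_features`, `multimachine_set`, `dice`, and `jaccard`.

```python
import numpy as np


def lins_ccc(x, y) -> float:
    """Lin's concordance correlation coefficient between paired test/retest values."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mx, my = x.mean(), y.mean()
    # Population (1/n) variances and covariance: an assumption of this sketch.
    cov = np.mean((x - mx) * (y - my))
    denom = x.var() + y.var() + (mx - my) ** 2
    # Undefined for two identical constant vectors; NaN never passes the threshold.
    return 2.0 * cov / denom if denom > 0 else float("nan")


def reproducible_features(test, retest, names, threshold=0.90):
    """Names of features whose test-retest CCC exceeds `threshold` on one machine.

    `test` and `retest` are arrays of shape (n_patients, n_features).
    """
    test = np.asarray(test, dtype=float)
    retest = np.asarray(retest, dtype=float)
    return {
        name
        for j, name in enumerate(names)
        if lins_ccc(test[:, j], retest[:, j]) > threshold
    }


def multimachine_set(*per_machine_sets):
    """Features reproducible on every machine (set intersection, e.g. M1 ∩ M2)."""
    return set.intersection(*map(set, per_machine_sets))


def dice(a, b) -> float:
    """Dice similarity coefficient 2|A ∩ B| / (|A| + |B|) between two feature sets."""
    a, b = set(a), set(b)
    return 2 * len(a & b) / (len(a) + len(b)) if (a or b) else 1.0


def jaccard(a, b) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B| between two feature sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 1.0
```

Unlike Pearson's r, the CCC penalizes a systematic offset between test and retest: `lins_ccc([1, 2, 3], [2, 3, 4])` is 4/7 (about 0.571) even though the two vectors are perfectly correlated, so such a feature would fail the 0.90 cutoff.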

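The redundancy-removal step can be sketched in the same hedged way. The abstract does not state the correlation type, linkage, cut height, or how a representative is chosen, so this sketch assumes a between-feature distance of 1 − |r| with Pearson r averaged over machines, average linkage, a hypothetical cut height of 0.1 (mean |r| ≥ 0.9), and a hypothetical rule that keeps the lowest-indexed feature in each cluster. In practice `scipy.cluster.hierarchy.linkage(..., method="average")` followed by `fcluster(..., criterion="distance")` would do the clustering; here the Lance-Williams update for average linkage is written out with NumPy so the sketch needs no other dependency. The names `average_linkage_clusters` and `select_nonredundant` are illustrative.

```python
import numpy as np


def average_linkage_clusters(dist, threshold):
    """Agglomerative clustering with average linkage (UPGMA), cut at `threshold`.

    Repeatedly merges the closest pair of clusters until the smallest
    between-cluster distance exceeds `threshold`. Because average linkage is
    monotonic, this matches cutting the full dendrogram at that height.
    """
    d = np.array(dist, dtype=float)
    np.fill_diagonal(d, np.inf)
    sizes = np.ones(d.shape[0])
    members = {i: [i] for i in range(d.shape[0])}
    while len(members) > 1:
        i, j = (int(k) for k in np.unravel_index(np.argmin(d), d.shape))
        if d[i, j] > threshold:
            break
        # Lance-Williams update: the distance to the merged cluster is the
        # size-weighted mean of the distances to its two parts.
        merged = (sizes[i] * d[i] + sizes[j] * d[j]) / (sizes[i] + sizes[j])
        d[i, :] = merged
        d[:, i] = merged
        d[i, i] = np.inf
        # Retire cluster j by making it unreachable.
        d[j, :] = np.inf
        d[:, j] = np.inf
        sizes[i] += sizes[j]
        members[i].extend(members.pop(j))
    return list(members.values())


def select_nonredundant(features_by_machine, names, threshold=0.1):
    """Keep one feature per cluster of mutually correlated features.

    `features_by_machine` is a list of (n_patients, n_features) arrays, one per
    machine, restricted to the multimachine reproducible features.
    """
    # Absolute Pearson correlation between features, averaged over machines.
    mean_abs_r = np.mean(
        [np.abs(np.corrcoef(np.asarray(f, dtype=float), rowvar=False))
         for f in features_by_machine],
        axis=0,
    )
    clusters = average_linkage_clusters(1.0 - mean_abs_r, threshold)
    # Hypothetical representative rule: lowest feature index in each cluster.
    return [names[k] for k in sorted(min(c) for c in clusters)]
```

With the cut at 0.1, any two features whose mean |r| across machines is at least 0.9 fall into the same cluster, so only one of them is kept.
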
  • Publication date: 2013-12