A coupling approach of a predictor and a descriptor for breast cancer prognosis

Authors: Shin Hyunjung*; Nam Yonghyun
Source: BMC Medical Genomics, 2014, 7(Suppl 1): S4.
DOI: 10.1186/1755-8794-7-S1-S4

Abstract

Background: In cancer prognosis research, diverse machine learning models have been applied to the problems of cancer susceptibility (risk assessment), cancer recurrence (redevelopment of cancer after resolution), and cancer survivability, with accuracy (or AUC, the area under the ROC curve) regarded as the primary measure for evaluating model performance. However, to help medical specialists establish a treatment plan using the predicted output of a model, it is more pragmatic to elucidate which variables (markers) have most significantly influenced the resulting cancer outcome, or which patients show similar patterns.

Methods: In this study, a coupling approach of two sub-modules, a predictor and a descriptor, is proposed. The predictor module generates the predicted output for the cancer outcome; a semi-supervised learning co-training algorithm is employed as the predictor. The descriptor module post-processes the results of the predictor module, focusing mainly on which variables rank as more or less significant in describing the prediction results, and on how patients are segmented into groups according to the patterns they share. Decision trees are used as the descriptor.

Results: The proposed approach, 'predictor-descriptor', was tested on the breast cancer survivability problem using the Surveillance, Epidemiology, and End Results (SEER) database for breast cancer. The results present a performance comparison among established machine learning algorithms, the ranks of the prognosis elements for breast cancer, and patient segmentation. In the performance comparison among the predictor candidates, the semi-supervised learning co-training algorithm showed the best performance, producing an average AUC of 0.81. The descriptor module then found the top-tier prognosis markers that significantly affect the classification of patients as survived or dead: 'lymph node involvement', 'stage', 'site-specific surgery', 'number of positive nodes examined', and 'tumor size', among others. A typical example of patient segmentation was also provided: the patients classified as dead were grouped into two segments depending on differences in their prognostic profiles, one with serious results on the pathologic exams and the other with age-related frailty.
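
The abstract evaluates predictors by AUC, the area under the ROC curve. As a minimal sketch of what that number measures (not the authors' code), the AUC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as half. The function name `auc` and the toy labels and scores below are hypothetical and are not taken from the SEER data.

```python
def auc(labels, scores):
    """AUC by pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 class labels (1 = positive class).
    scores: iterable of predicted scores, higher = more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, for illustration only.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
toy_auc = auc(labels, scores)
```

This pairwise form is quadratic in the number of cases, which is fine for illustration; a rank-based formulation would be the usual choice for a dataset the size of SEER.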

  • Publication date: 2014-5-8