Stable biomarker screening and classification by subsampling-based sparse regularization coupled with support vector machines in metabolomics

Fu, Guang-Hui; Zhang, Bing-Yang; Kou, He-Dan; Yi, Lun-Zhao<sup>*</sup>

doi:10.1016/j.chemolab.2016.11.006

Abstract

Partial least squares-discriminant analysis (PLS-DA) is the most widely used statistical tool for classification and biomarker screening in metabolomics. However, PLS-DA tends to overfit the data, and its biomarker selection is often unstable because uninformative variables disturb the principal components. In this paper, we propose an algorithm for stable biomarker screening with optimal generalization performance: biomarkers are identified by sparse regularization variable selection combined with subsampling (SRS), and classification is then performed by a linear support vector machine (SVM) classifier in the selected-variable space to maximize classification accuracy. Two metabolomics datasets measured by gas chromatography-mass spectrometry are used to evaluate the proposed SRS-SVM algorithm and to compare it with existing related algorithms. The results show that the SRS-SVM algorithm outperforms PLS-DA and is competitive with other related algorithms in prediction accuracy, as measured by both internal and external validation. Furthermore, the candidate biomarkers selected by the SRS-SVM algorithm are quite stable, making it a competitive alternative for the analysis of metabolomics data. The R code for implementing the SRS-SVM algorithm is available in the Electronic supplementary material.
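The two-stage idea described in the abstract (subsampled sparse selection, then a linear SVM on the selected variables) can be illustrated with a minimal sketch. This is not the authors' R implementation, which is only available in the supplementary material. It is a standard-library Python illustration under several assumptions: the sparse regularizer is taken to be a lasso fitted by coordinate descent, the SVM is trained by plain subgradient descent on the regularized hinge loss, the data are synthetic with two informative variables, and the penalty `lam=0.3`, the selection-frequency cutoff `0.6`, and all function names are hypothetical choices made for illustration.

```python
import random

def soft_threshold(value, lam):
    """Soft-thresholding operator used in the lasso coordinate update."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def lasso_fit(X, y, lam, sweeps=50):
    """Lasso via cyclic coordinate descent on (1/2n)||y - Xb||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - Xb; b starts at zero
    for _ in range(sweeps):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            rho = sum(v * (r + v * beta[j]) for v, r in zip(col, resid)) / n
            new = soft_threshold(rho, lam) / z
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - v * delta for v, r in zip(col, resid)]
                beta[j] = new
    return beta

def selection_frequencies(X, y, lam, n_subsamples=40, fraction=0.5, seed=0):
    """Fraction of random subsamples in which each variable gets a nonzero coefficient."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    size = int(n * fraction)
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), size)  # subsample without replacement
        beta = lasso_fit([X[i] for i in idx], [y[i] for i in idx], lam)
        for j, b in enumerate(beta):
            if b != 0.0:
                counts[j] += 1
    return [c / n_subsamples for c in counts]

def svm_fit(X, y, lam=0.01, eta=0.1, epochs=200):
    """Linear SVM by full-batch subgradient descent on the regularized hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) < 1:
                gw = [g - yi * xj / n for g, xj in zip(gw, xi)]
                gb -= yi / n
        w = [wj - eta * g for wj, g in zip(w, gw)]
        b -= eta * gb
    return w, b

def svm_predict(w, b, X):
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else -1 for xi in X]

def make_data(n, p=8, seed=1):
    """Synthetic stand-in data: variables 0 and 1 carry the class signal, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        row = [rng.gauss(0, 1) for _ in range(p)]
        row[0] += 2.0 * label
        row[1] -= 2.0 * label
        X.append(row)
        y.append(label)
    return X, y

X_train, y_train = make_data(80, seed=1)
X_test, y_test = make_data(40, seed=2)   # plays the role of external validation

freq = selection_frequencies(X_train, y_train, lam=0.3)
selected = [j for j, f in enumerate(freq) if f >= 0.6]  # hypothetical stability cutoff

w, b = svm_fit([[row[j] for j in selected] for row in X_train], y_train)
pred = svm_predict(w, b, [[row[j] for j in selected] for row in X_test])
accuracy = sum(yhat == t for yhat, t in zip(pred, y_test)) / len(y_test)

print("selection frequencies:", [round(f, 2) for f in freq])
print("selected variables:", selected, "test accuracy:", accuracy)
```

Ranking variables by how often they survive the sparse fit across subsamples, rather than by a single fit, is what makes the screening less sensitive to any one split of the data. In this sketch the cutoff and the penalty are fixed by hand, whereas in practice they would need tuning.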

Publication date: 2017-1-15
Affiliation: Kunming University of Science and Technology
