An ensemble approach to protein fold classification by integration of template-based assignment and support vector machine classifier

Xia, Jiaqi; Peng, Zhenling; Qi, Dawei; Mu, Hongbo*; Yang, Jianyi*

doi:10.1093/bioinformatics/btw768

Abstract

Motivation: Protein fold classification is a critical step in protein structure prediction. There are two ways to classify protein folds: template-based fold assignment, and ab-initio prediction using machine learning algorithms. Combining the two to improve prediction accuracy has not been explored before.

Results: We developed two algorithms, HH-fold and SVM-fold, for protein fold classification. HH-fold is a template-based fold assignment algorithm using the HHsearch program. SVM-fold is a support vector machine-based ab-initio classification algorithm, in which a comprehensive set of features is extracted from three complementary sequence profiles. These two algorithms are then combined, resulting in the ensemble approach TA-fold. We performed a comprehensive assessment of the proposed methods by comparing them with ab-initio methods and template-based threading methods on six benchmark datasets. TA-fold achieved an accuracy of 0.799 on the DD dataset, which consists of proteins from 27 folds. This represents an improvement of 5.4-11.7% over ab-initio methods. After this dataset was updated to include more proteins in the same folds, the accuracy increased to 0.971. In addition, TA-fold achieved an accuracy above 0.9 on a large dataset consisting of 6451 proteins from 184 folds. Experiments on the LE dataset show that TA-fold consistently outperforms other threading methods at the family, superfamily and fold levels. The success of TA-fold is attributed to the combination of template-based fold assignment and ab-initio classification using features from complementary sequence profiles that contain rich evolutionary information.
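The abstract does not say how the template-based and SVM predictions are merged. The sketch below shows one plausible combination rule, assumed here purely for illustration: accept the template-based fold when the top hit is confident enough, and otherwise fall back to the classifier. The names `TemplateHit`, `ensemble_fold` and `cutoff`, and the default cutoff value, are hypothetical and come from neither the paper nor the HHsearch program.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class TemplateHit:
    """Top template hit for a query (hypothetical container)."""
    fold: str           # fold label of the best-matching template
    probability: float  # confidence of the hit, assumed to lie in [0, 1]


def ensemble_fold(
    template_hit: Optional[TemplateHit],
    svm_predict: Callable[[Sequence[float]], str],
    features: Sequence[float],
    cutoff: float = 0.5,  # assumed threshold, not a value from the paper
) -> str:
    """Return the template fold if the hit is confident, else the SVM fold."""
    if template_hit is not None and template_hit.probability >= cutoff:
        return template_hit.fold
    # No template found, or the hit is too weak: use the ab-initio classifier.
    return svm_predict(features)
```

Here `svm_predict` stands in for any trained classifier that maps a feature vector to a fold label. Passing it as a callable keeps the sketch independent of a particular machine learning library.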

Publication date: 2017-3-15
Affiliations: Tianjin University; Northeast Forestry University; Nankai University

Cited by: 36

Last updated: 2024-04-14 15:35
