Analysis and prediction of single-stranded and double-stranded DNA binding proteins based on protein sequences

Wang, Wei*; Sun, Lin; Zhang, Shiguang; Zhang, Hongjun; Shi, Jinling; Xu, Tianhe; Li, Keliang

doi:10.1186/s12859-017-1715-8

Abstract

Background: DNA-binding proteins perform important functions in many biological activities. They can interact with single-stranded DNA (ssDNA) or double-stranded DNA (dsDNA), and can accordingly be categorized as single-stranded DNA-binding proteins (SSBs) or double-stranded DNA-binding proteins (DSBs). Identifying DNA-binding proteins from amino acid sequences can help annotate protein functions and understand binding specificity. In this study, we systematically consider a variety of schemes for representing protein sequences: overall amino acid composition (OAAC) features, dipeptide compositions, position-specific scoring matrix (PSSM) profiles and split amino acid composition (SAA). We then adopt support vector machine (SVM) and random forest (RF) classification models to distinguish SSBs from DSBs.

Results: Our results suggest that some sequence features can significantly differentiate DSBs from SSBs. Evaluated by 10-fold cross-validation on the benchmark datasets, our prediction method achieves an accuracy of 88.7% and an AUC (area under the curve) of 0.919. Moreover, our method performs well in independent testing.

Conclusions: Using various sequence-derived features, a novel method is proposed to distinguish DSBs from SSBs accurately. The method also explores novel features, which could help to uncover the binding specificity of DNA-binding proteins.
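
To make two of the sequence representations named above concrete, the following is a minimal sketch of overall amino acid composition and dipeptide composition. It is not the authors' implementation: the function names, the 20-letter alphabet ordering, and the choice to skip non-standard residues are assumptions made for illustration, and the PSSM and SAA features are not covered.

```python
from itertools import product

# Assumed ordering of the 20 standard amino acids (illustrative choice).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Return a 20-dimensional vector of residue frequencies.

    Residues outside the standard alphabet are ignored (an assumption).
    """
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(aa) / n if n else 0.0 for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return a 400-dimensional vector of adjacent-pair frequencies.

    A pair is counted only if both residues are standard amino acids.
    """
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in DIPEPTIDES]


if __name__ == "__main__":
    example = "MKTAYIAKQR"  # hypothetical sequence, not from the paper's dataset
    features = amino_acid_composition(example) + dipeptide_composition(example)
    print(len(features))  # 20 + 400 features per sequence
```

Concatenated vectors of this kind could then be passed to a standard SVM or random forest classifier; the paper's actual feature definitions, parameters, and datasets are described in its full text.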

Publication date: 2017-6-12
Affiliations: Henan Normal University; Xuchang University
