Abstract

β-Hairpins in enzymes, a special class of proteins with catalytic functions, contain many binding sites that are essential for enzyme function. As the number of known enzyme sequences grows, it is especially important to use bioinformatics techniques to identify β-hairpins in enzymes quickly and accurately, in support of further annotation of enzyme structure and function. In this work, the proposed method was trained and tested on a non-redundant enzyme β-hairpin database containing 2818 β-hairpins and 1098 non-β-hairpins. With 5-fold cross-validation on the training dataset, an overall accuracy of 90.08% and a Matthews correlation coefficient (MCC) of 0.74 were obtained; on the independent test dataset, the overall accuracy was 88.93% and the MCC was 0.76. Furthermore, the method was validated on 845 β-hairpins with ligand binding sites. With 5-fold cross-validation on the training dataset and an independent test on the test dataset, the overall accuracies were 85.82% (MCC of 0.71) and 84.78% (MCC of 0.70), respectively. By integrating minimum redundancy maximum relevance (mRMR) feature selection with a support vector machine (SVM) algorithm, a reasonably high accuracy was achieved, indicating that the method is an effective tool for further studies of β-hairpins in enzyme structures. Additionally, as a novel contribution to enzyme function prediction, β-hairpins with ligand binding sites were predicted. Based on this work, a web server was constructed to predict β-hairpin motifs in enzymes (http://202.207.29.251:8080/).
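
The overall accuracy and Matthews correlation coefficient reported above can both be computed from the four counts of a binary confusion matrix. The following is a minimal Python sketch of the standard definitions, not the authors' implementation; the function names and the example counts are hypothetical and are not taken from this work.

```python
import math


def overall_accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all samples that were classified correctly."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


def matthews_cc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary classifier.

    Returns 0.0 when any marginal sum is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not results from the paper):
    # tp = hairpins predicted as hairpins, tn = non-hairpins predicted as such.
    tp, tn, fp, fn = 90, 80, 20, 10
    print(f"accuracy = {overall_accuracy(tp, tn, fp, fn):.4f}")
    print(f"MCC      = {matthews_cc(tp, tn, fp, fn):.4f}")
```

Because the dataset described here is imbalanced (2818 β-hairpins against 1098 non-β-hairpins), the MCC is a useful complement to accuracy: it uses all four confusion-matrix cells and is not inflated by always predicting the majority class.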