Protein-protein interface hot spots prediction based on a hybrid feature selection strategy

Abstract:

Background: Hot spots are the interface residues that contribute most of the binding affinity in a protein-protein interaction. A compact and relevant feature subset is important for building machine learning models that predict hot spots on protein-protein interfaces. Although various methods have been used to select relevant subsets from the many features that describe interface residues, finding the optimal feature subset for the final model remains a challenge.

Results: In this study, three feature selection methods were compared in order to propose a new hybrid feature selection strategy. The strategy effectively reduced the feature space when building prediction models for identifying hot spot residues. It was tested on eighty-two features, both conventional and newly proposed. Following the strategy, we combined the feature subsets selected individually by a decision tree and by mRMR (maximum Relevance Minimum Redundancy), and then built a model with 6 features using a PSFS (Pseudo Sequential Forward Selection) process. On the independent test set, our model showed better or comparable predictive performance relative to other state-of-the-art methods (F-measure 0.622, recall 0.821). Analysis of the 6 features confirmed that our newly proposed feature CNSV_REL1 was important for the model. The analysis also showed that complementarity between features should be considered an important aspect of feature selection.

Conclusion: Most importantly, a new feature selection strategy was proposed and shown to be effective in selecting the optimal feature subset for building prediction models that predict hot spot residues on protein-protein interfaces. Moreover, two aspects, the generalization of single features and the complementarity between features, were shown to be of great importance and should be considered in feature selection methods. Finally, our study showed that the newly proposed feature CNSV_REL1 is an effective alternative feature for predicting hot spots. Our model is available to users through a web server: http://zhulab.ahu.edu.cn/iPPHOT/.
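
The hybrid idea described in the Results (pool the candidates picked by two selectors, then grow a small subset greedily) can be illustrated with a minimal sketch. This is ordinary greedy forward selection, not the authors' PSFS procedure, whose details the abstract does not give. Apart from CNSV_REL1, every name here is hypothetical: the features `f_a`, `f_b`, `f_c`, the two candidate lists, and the `toy_score` function, which stands in for a cross-validated metric such as F-measure and uses made-up numbers.

```python
from typing import Callable, FrozenSet, Iterable, List


def forward_select(
    candidates: Iterable[str],
    score: Callable[[FrozenSet[str]], float],
    max_features: int = 6,
) -> List[str]:
    """Greedy forward selection: keep adding the feature that raises the score most."""
    pool = sorted(set(candidates))  # sorted for a deterministic tie-break
    selected: List[str] = []
    best = float("-inf")
    while pool and len(selected) < max_features:
        # Score every one-feature extension of the current subset.
        trials = [(score(frozenset(selected + [f])), f) for f in pool]
        top_score, top_feature = max(trials, key=lambda t: t[0])
        if top_score <= best:
            break  # no single addition improves the score, so stop
        selected.append(top_feature)
        pool.remove(top_feature)
        best = top_score
    return selected


# Hypothetical candidate subsets from two selectors (assumed, not from the paper).
tree_subset = ["CNSV_REL1", "f_a", "f_b"]
mrmr_subset = ["CNSV_REL1", "f_c"]

# Made-up per-feature gains; a real score would come from cross-validation.
TOY_GAIN = {"CNSV_REL1": 0.30, "f_a": 0.20, "f_b": 0.05, "f_c": 0.01}


def toy_score(subset: FrozenSet[str]) -> float:
    """Toy stand-in for a validation metric, with a size penalty."""
    value = sum(TOY_GAIN[f] for f in subset)
    if {"f_b", "f_c"} <= subset:
        value += 0.10  # f_b and f_c only help when used together
    return value - 0.06 * len(subset)


# Step 1: pool the two candidate subsets. Step 2: greedy forward selection.
selected = forward_select(set(tree_subset) | set(mrmr_subset), toy_score)
print(selected)
```

With these toy numbers the search stops after two features, because neither `f_b` nor `f_c` improves the score on its own even though the pair would. A purely one-at-a-time search can therefore miss complementary features, which is the aspect the abstract says feature selection should take into account.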