Abstract

Protein interactions play vital roles in biological processes. Studying protein interfaces helps elucidate the mechanisms of protein interaction. However, a large portion of the protein interface data used in current studies is incorrectly collected. In this paper, we propose a novel dataset reconstruction strategy that uses a manifold learning method to deal with noise in interaction interface data, where interfaces are defined by the distances between residues on different chains within protein complexes. Three support vector machine-based predictors are constructed using different protein features to identify the functional sites involved in the formation of protein interfaces. The experimental results demonstrate that our strategy can remove noise and thereby improve the identification of protein interfaces, achieving 77.8% accuracy.