Abstract

In this study, a novel method was developed to predict protein-ligand binding affinity from a comprehensive set of structurally diverse protein-ligand complexes (PLCs). A total of 1300 PLCs with measured binding affinities (493 complexes with K(d) and 807 complexes with K(i)) from the refined set of the PDBbind database (release 2007) were used to develop the predictive models. Each complex was described using calculated descriptors from three blocks: protein sequence, ligand structure, and binding pocket. The PLC data were then rationally split into representative training and test sets, with model validation fully taken into account. The molecular descriptors relevant to binding affinity were selected on the training set using the ReliefF method combined with least squares support vector machine (LS-SVM) modeling. Two final optimized LS-SVM models were developed from the selected descriptors, one to predict K(d) and one to predict K(i). The correlation coefficients (R) for the K(d) model were 0.890 on the training set and 0.833 on the test set; the corresponding values for the K(i) model were 0.922 and 0.742, respectively. The proposed method generalizes better than other recently published methods and can be used as an alternative fast filter in the virtual screening of large chemical databases.
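For readers unfamiliar with LS-SVM regression, the modeling and evaluation steps named above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the kernel, the hyperparameters, or the descriptor values, so the Gaussian (RBF) kernel, the values of `gamma` and `sigma`, the function names, and the one-dimensional toy data are all assumptions made for illustration. The ReliefF descriptor selection step is not shown. The sketch solves the standard LS-SVM linear system for the bias and the support values, then reports the correlation coefficient R between observed and fitted values.

```python
import math


def rbf_kernel(a, b, sigma):
    """Gaussian (RBF) kernel between two descriptor vectors (assumed kernel)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def lssvm_fit(X, y, gamma, sigma):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y] for b and alpha."""
    n = len(X)
    system = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf_kernel(X[i], X[j], sigma) for j in range(n)]
        row[i + 1] += 1.0 / gamma  # regularization term on the diagonal
        system.append(row)
    solution = solve_linear_system(system, [0.0] + list(y))
    return solution[0], solution[1:]


def lssvm_predict(X_train, bias, alphas, x_new, sigma):
    """Predict as bias + sum_i alpha_i * k(x_i, x_new)."""
    return bias + sum(a * rbf_kernel(x, x_new, sigma)
                      for a, x in zip(alphas, X_train))


def pearson_r(observed, predicted):
    """Pearson correlation coefficient R between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


# Synthetic toy data (NOT from PDBbind): one descriptor per "complex".
X_train = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y_train = [1.0, 2.9, 5.2, 7.1, 8.8]

bias, alphas = lssvm_fit(X_train, y_train, gamma=100.0, sigma=1.0)
fitted = [lssvm_predict(X_train, bias, alphas, x, sigma=1.0) for x in X_train]
r_train = pearson_r(y_train, fitted)
print(f"training-set R = {r_train:.3f}")
```

In practice, `gamma` and `sigma` would be tuned on the training set only, and the same `pearson_r` calculation would be repeated on the held-out test set to obtain the second of the two R values reported for each model.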