Abstract

Protein function annotation and rational drug discovery rely on knowledge of binding sites for small organic compounds, yet the quality of existing binding site predictors has never been systematically evaluated. We assess predictions of ten representative geometry-, energy-, threading-, and consensus-based methods on a new benchmark data set that considers apo and holo protein structures with multiple binding sites for biologically relevant ligands. Statistical tests show that threading-based Findsite outperforms other predictors when its templates have high similarity with the input protein. However, Findsite is equivalent or inferior to some geometry-, energy-, and consensus-based methods when the similarity is lower. We demonstrate that geometry-, energy-, and consensus-based predictors benefit from the use of holo structures and that the top four methods, Findsite, Q-SiteFinder, ConCavity, and MetaPocket, perform better for larger binding sites. Predictions from these four methods are complementary, and our simple meta-predictor improves over the best single predictor.