AutoBind: automatic extraction of protein-ligand-binding affinity data from biological literature

Authors: Chang Darby Tien Hao; Ke Chao Hsuan; Lin Jung Hsin; Chiang Jung Hsien*
Source: Bioinformatics, 2012, 28(16): 2162-2168.
DOI: 10.1093/bioinformatics/bts367

Abstract

Motivation: Determination of the binding affinity of a protein-ligand complex is important to quantitatively specify whether a particular small molecule will bind to the target protein. In addition, the collection of comprehensive datasets of protein-ligand complexes and their corresponding binding affinities is crucial for developing accurate scoring functions that predict the binding affinities of previously unknown protein-ligand complexes. In the past decades, several databases of protein-ligand-binding affinities have been created via visual extraction from the literature. However, such approaches are time-consuming, and most of these databases are updated only a few times per year. Hence, there is an immediate demand for an automatic extraction method with high precision for binding affinity collection.

Results: We have created a new database of protein-ligand-binding affinity data, AutoBind, based on automatic information retrieval. We first compiled a collection of 1586 articles in which the binding affinities had been marked manually. Based on this annotated collection, we designed four sentence patterns that are used to scan full-text articles, as well as a scoring function to rank the sentences that match our patterns. The proposed sentence patterns can effectively identify the binding affinities in full-text articles. Our assessment shows that AutoBind achieved 84.22% precision and 79.07% recall on the testing corpus. Currently, 13 616 protein-ligand complexes and the corresponding binding affinities have been deposited in AutoBind from 17 221 articles.
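To make the idea of scanning sentences for binding affinities concrete, the following is a minimal Python sketch. It is not one of the four AutoBind sentence patterns, which this abstract does not specify; the regular expression, the set of affinity types (Ki, Kd, IC50, EC50), the accepted units, and the function name `extract_affinities` are all assumptions chosen for illustration only.

```python
import re

# Hypothetical pattern (an assumption, NOT an AutoBind sentence pattern):
# an affinity type, an optional connector such as "of" or "=",
# a numeric value, and a molar concentration unit.
AFFINITY = re.compile(
    r"\b(?P<kind>Ki|Kd|IC50|EC50)\b"
    r"\s*(?:values?\s*)?(?:of|=|was|is)?\s*"
    r"(?P<value>\d+(?:\.\d+)?)\s*"
    r"(?P<unit>[pnu\u00b5m]M)\b"
)


def extract_affinities(sentence):
    """Return (type, value, unit) tuples for each affinity mention found."""
    return [
        (m.group("kind"), float(m.group("value")), m.group("unit"))
        for m in AFFINITY.finditer(sentence)
    ]


if __name__ == "__main__":
    text = "The inhibitor binds the protease with a Ki of 2.5 nM."
    print(extract_affinities(text))
```

A single regular expression like this would miss many phrasings found in real articles, which is presumably why the authors combine several sentence patterns with a scoring function that ranks the matching sentences.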

  • Publication date: 2012-8-15
  • Affiliation: Chinese Academy of Sciences