AsteriX: A Web Server To Automatically Extract Ligand Coordinates from Figures in PDF Articles

作者:Lounnas V*; Vriend G
来源:Journal of Chemical Information and Modeling, 2012, 52(2): 568-576.
DOI:10.1021/ci2004303

摘要

Coordinates describing the chemical structures of small molecules that are potential ligands for pharmaceutical targets are used at many stages of the drug design process. The coordinates of the vast majority of ligands can be obtained from either publicly accessible or commercial databases. However, interesting ligands sometimes are only available from the scientific literature, in which case their coordinates need to be reconstructed manually-a process that consists of a series of time-consuming steps. We present a Web server that helps reconstruct the three-dimensional (3D) coordinates of ligands for which a two-dimensional (2D) picture is available in a PDF file, The software, called AsteriX, analyses every picture contained in the PDF file and attempts to determine automatically whether or not it contains ligands. Areas in pictures that may contain molecular structures are processed to extract connectivity and atom type information that allow coordinates to be subsequently reconstructed. The AsteriX Web server was tested on a series of articles containing a large diversity in graphical representations. In total, 88% of 3249 ligand structures present in the test set were identified as chemical diagrams. Of these, about half were interpreted correctly as 3D structures, and a further one-third required only minor manual corrections. It is principally impossible to always correctly reconstruct 3D coordinates from pictures because there are many different protocols for drawing a 2D image of a ligand, but more importantly a wide variety of semantic annotations are possible. The AsteriX Web server therefore includes facilities that allow the users to augment partial or partially correct 3D reconstructions. All 3D reconstructions are submitted, checked, and corrected by the users domain at the server and are freely available for everybody The coordinates of the reconstructed ligands are made available in a series of formats commonly used in drug design research. The AsteriX Web server is freely available at http://swift.cmbi.ru.nl/bitmapb/.

  • 出版日期2012-2