Abstract

We introduce TOPS+ strings, a highly abstract string-based model of protein topology that permits efficient structure comparison and can optionally represent ligand information. In this model, we treat loops, as well as helices and strands, as secondary structure elements (SSEs); in addition, we represent ligands as first-class objects. Interactions between SSEs are described by incoming/outgoing arcs, and interactions between SSEs and ligands by ligand arcs; SSEs are annotated with arc interaction direction and type. We are able to abstract away from the ligands themselves, giving a model characterized by a regular grammar rather than the context-sensitive grammar of the original TOPS model. Our TOPS+ strings model is sufficiently descriptive to obtain biologically meaningful results, and has the advantage of permitting fast string-based structure matching and comparison while avoiding the NP-completeness (non-deterministic polynomial time completeness) issues associated with graph problems. Our structure comparison method is computationally more efficient at identifying distantly related proteins than BLAST, CLUSTALW, SSAP and TOPS because of its compact and abstract string-based representation of protein structure, which records both topological and biochemical information, including the functionally important loop regions of the protein structures. The accuracy of our comparison method is comparable with that of TOPS. We have also demonstrated that our TOPS+ strings method outperforms the TOPS method for ligand-dependent protein structures and provides biologically meaningful results.

  • Publication date: 2008-12-1