The Dark Proteome Database

作者:Perdigao Nelson*; Rosa Agostinho C; O'Donoghue Sean I
来源:BioData Mining, 2017, 10(1): 24.
DOI:10.1186/s13040-017-0144-6

摘要

Background: Recently we surveyed the dark-proteome, i.e., regions of proteins never observed by experimental structure determination and inaccessible to homology modelling. Surprisingly, we found that most of the dark proteome could not be accounted for by conventional explanations (e.g., intrinsic disorder, transmembrane domains, and compositional bias), and that nearly half of the dark proteome comprised dark proteins, in which the entire sequence lacked similarity to any known structure. In this paper we will present the Dark Proteome Database (DPD) and associated web services that provide access to updated information about the dark proteome. Results: We assembled DPD from several external web resources (primarily Aquaria and Swiss-Prot) and stored it in a relational database currently containing similar to 10 million entries and occupying similar to 2 GBytes of disk space. This database comprises two key tables: one giving information on the 'darkness' of each protein, and a second table that breaks each protein into dark and non-dark regions. In addition, a second version of the database is created using also information from the Protein Model Portal (PMP) to determine darkness. To provide access to DPD, a web server has been implemented giving access to all underlying data, as well as providing access to functional analyses derived from these data. Conclusions: Availability of this database and its web service will help focus future structural and computational biology efforts to study the dark proteome, thus providing a basis for understanding a wide variety of biological functions that currently remain unknown.