A Topology-Based Metric for Measuring Term Similarity in the Gene Ontology

Authors: Gaston K Mazandu; Nicola J Mulder
Source: Advances in Bioinformatics, 2012, 2012: 1-17.
DOI: 10.1155/2012/975783

Abstract

The wide coverage and biological relevance of the Gene Ontology (GO), confirmed through its successful use in protein function prediction, have led to the growth in its popularity. In order to exploit the extent of biological knowledge that GO offers in describing genes or groups of genes, there is a need for an efficient, scalable similarity measure for GO terms and GO-annotated proteins. While several GO similarity measures exist, none adequately addresses all issues surrounding the design and usage of the ontology. We introduce a new metric for measuring the distance between two GO terms using the intrinsic topology of the GO-DAG, thus enabling the measurement of functional similarities between proteins based on their GO annotations. We assess the performance of this metric using a ROC analysis on human protein-protein interaction datasets and correlation coefficient analysis on the selected set of protein pairs from the CESSM online tool. This metric achieves good performance compared to the existing annotation-based GO measures. We used this new metric to assess functional similarity between orthologues, and show that it is effective at determining whether orthologues are annotated with similar functions and identifying cases where annotation is inconsistent between orthologues.

1. Introduction

Worldwide DNA sequencing efforts have led to a rapid increase in sequence data in the public domain. Unfortunately, this has also yielded a lack of functional annotations for many newly sequenced genes and proteins. From 20% to 50% of genes within a genome [1] are still labeled unknown, uncharacterized, or hypothetical, and this limits our ability to exploit these data. Therefore, automatic genome annotation, which consists of assigning functions to genes and their products, has to be performed to ensure that maximal benefit is derived from these sequencing efforts.
This requires a systematic description of the attributes of genes and proteins using a standardized syntax and semantics in a format that is human readable and understandable, as well as being interpretable computationally. The terms used for describing functional annotations should have definitions and be placed within a structure of relationships. Therefore, an ontology is required in order to represent annotations of known genes and proteins and to use these to predict functional annotations of those which are identified but as yet uncharacterized. By capturing knowledge about a domain in a shareable and computationally accessible form, ontologies can provide defined and computable semantics

  • Publication date: 2012