A statistical approach using network structure in the prediction of protein characteristics

作者:Chen Pao Yang*; Deane Charlotte M; Reinert Gesine
来源:Bioinformatics, 2007, 23(17): 2314-2321.
DOI:10.1093/bioinformatics/btm342

摘要

Motivation: The Majority Vote approach has demonstrated that protein-protein interactions can be used to predict the structure or function of a protein. In this article we propose a novel method for the prediction of such protein characteristics based on frequencies of pairwise interactions. In addition, we study a second new approach using the pattern frequencies of triplets of proteins, thus for the first time taking network structure explicitly into account. Both these methods are extended to jointly consider multiple organisms and multiple characteristics.
Results: Compared to the standard non-network-based method, namely the Majority Vote method, in large networks our predictions tend to be more accurate. For structure prediction, the Frequency-based method reaches up to 71% accuracy, and the Triplet-based method reaches up to 72% accuracy, whereas for function prediction, both the Triplet-based method and the Frequency-based method reach up to 90% accuracy. Function prediction on proteins without homologues showed slightly less but comparable accuracies. Including partially annotated proteins substantially increases the number of proteins for which our methods predict their characteristics with reasonable accuracy. We find that the enhanced Triplet-based method does not currently yield significantly better results than the enhanced Frequency-based method, suggesting that triplets of interactions do not contain substantially more information about protein characteristics than interaction pairs. Our methods offer two main improvements over current approaches first, multiple protein characteristics are considered simultaneously, and second, data is integrated from multiple species. In addition, the Triplet-based method includes network structure more explicitly than the Majority Vote and the Frequency-based method.
Availability: The program is available upon request.
Contact: pchen@stats.ox.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.

  • 出版日期2007-9-1