Abstract

Identifying protein complexes from protein-protein interaction networks (PINs) is fundamental to understanding protein functions and activities in the cell. Based on the assumption that protein complexes correspond to highly connected regions of PINs, many algorithms have been proposed to identify protein complexes from PINs. However, most of these approaches neglect two points: not all proteins in a complex are highly connected, and proteins with different topological properties may form complexes in different ways and should therefore be treated differently. In this paper, we propose PLCluster, a double-layer clustering method based on the power-law distribution. To calculate the centrality scores of nodes, we propose a Dense-Spread Centrality method, whose scores follow a power-law distribution. Based on this distribution, PLCluster divides the nodes into two categories: nodes with very high centrality scores and nodes with lower centrality scores. A different strategy is then applied to each category to detect protein complexes from the PIN. Furthermore, because all proteins in a protein complex should be located in the same subcellular compartment, predicted complexes that violate this constraint are filtered out. Compared with nine existing methods on a highly reliable yeast PIN, PLCluster shows clear advantages in terms of the number of known complexes identified, sensitivity, specificity, F-measure, and the number of perfect matches.
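As a rough illustration of two steps named in the abstract, the following minimal Python sketch splits nodes by centrality score and discards predicted complexes whose proteins share no subcellular compartment. It is not the PLCluster implementation: the function names, the fixed `threshold` cut-off, the dictionary-based localization annotations, and the choice to ignore unannotated proteins are all assumptions made for illustration, and the Dense-Spread Centrality computation itself is not reproduced here.

```python
from typing import Dict, Iterable, List, Set, Tuple


def split_by_centrality(
    scores: Dict[str, float], threshold: float
) -> Tuple[Set[str], Set[str]]:
    """Split nodes into a high-score and a lower-score group.

    `threshold` is a hypothetical cut-off; the abstract does not say how
    the boundary between the two categories is chosen.
    """
    high = {node for node, score in scores.items() if score >= threshold}
    low = set(scores) - high
    return high, low


def share_compartment(
    proteins: Iterable[str], localization: Dict[str, Set[str]]
) -> bool:
    """Return True if all annotated proteins share at least one compartment.

    Assumption: proteins without a localization annotation are ignored
    rather than treated as violations.
    """
    common = None
    for protein in proteins:
        compartments = localization.get(protein)
        if not compartments:
            continue  # unannotated protein: imposes no constraint (assumption)
        common = set(compartments) if common is None else common & compartments
        if not common:
            return False
    return True


def filter_complexes(
    complexes: List[Set[str]], localization: Dict[str, Set[str]]
) -> List[Set[str]]:
    """Keep only predicted complexes consistent with co-localization."""
    return [c for c in complexes if share_compartment(c, localization)]
```

For example, with hypothetical annotations in which protein `A` is nuclear, `B` is nuclear and cytoplasmic, and `C` is mitochondrial, the predicted complex `{A, B}` would be kept while `{A, C}` would be filtered out.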