Hybrid page scoring algorithm based on centrality and PageRank

Authors: Qiao Shaojie*; Peng Jing; Li Tianrui; Li Hong; Li Taiyong; Wang Chao
Source: Journal of Southwest Jiaotong University, 2011, 46(3): 456-460.
DOI: 10.3969/j.issn.0258-2724.2011.03.017

Abstract

To score Web pages effectively, a new page scoring algorithm, CentralRank, is proposed that combines centrality measures (degree, betweenness, and closeness) with the PageRank algorithm. CentralRank computes the importance of pages in Web social networks from the centrality measures and then applies PageRank to score the pages accurately. To evaluate CentralRank, a Web crawler was developed to collect Web pages automatically. The crawler relies on three essential techniques: Web data collection, content analysis, and duplicate page detection. Experiments on real data show that CentralRank has a low time cost and scores Web pages more accurately than the centrality-measure-based page ranking algorithm and the PageRank algorithm, with average improvements of 14.2% and 7.5%, respectively.
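
The abstract does not state how CentralRank combines the centrality measures with PageRank, so the following Python sketch is only one plausible reading and not the authors' method. It assumes that the three centralities are each normalized, averaged, and used as the teleport (personalization) vector of an otherwise standard PageRank iteration. The function names, the equal weighting, the damping factor of 0.85, and the toy graph `web` are all hypothetical choices made for illustration.

```python
from collections import deque

# A graph is a dict: page -> list of pages it links to (every page is a key,
# and link lists contain no duplicates).

def degree_centrality(graph):
    """(in-degree + out-degree) / (n - 1) for each page."""
    indeg = {v: 0 for v in graph}
    for outs in graph.values():
        for w in outs:
            indeg[w] += 1
    denom = max(len(graph) - 1, 1)
    return {v: (indeg[v] + len(graph[v])) / denom for v in graph}

def closeness_centrality(graph):
    """Reachable pages divided by the sum of BFS distances to them."""
    scores = {}
    for s in graph:
        dist, queue = {s: 0}, deque([s])
        while queue:
            v = queue.popleft()
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        scores[s] = (len(dist) - 1) / total if total else 0.0
    return scores

def betweenness_centrality(graph):
    """Unnormalized betweenness via Brandes' algorithm (unweighted, directed)."""
    bc = {v: 0.0 for v in graph}
    for s in graph:
        stack, preds = [], {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        sigma[s] = 1
        dist, queue = {s: 0}, deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v precedes w on a shortest path
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc

def normalize(scores):
    """Scale scores to sum to 1 (uniform if all scores are zero)."""
    total = sum(scores.values())
    return {v: (s / total if total else 1 / len(scores)) for v, s in scores.items()}

def pagerank(graph, teleport, damping=0.85, iters=100):
    """Power iteration; `teleport` must sum to 1. Dangling mass follows it."""
    rank = dict(teleport)
    for _ in range(iters):
        dangling = sum(rank[v] for v in graph if not graph[v])
        new = {v: (1 - damping + damping * dangling) * teleport[v] for v in graph}
        for v, outs in graph.items():
            for w in outs:
                new[w] += damping * rank[v] / len(outs)
        rank = new
    return rank

def central_rank(graph, damping=0.85):
    """ASSUMPTION: average the normalized centralities into the teleport vector."""
    measures = [normalize(f(graph)) for f in
                (degree_centrality, closeness_centrality, betweenness_centrality)]
    teleport = {v: sum(m[v] for m in measures) / len(measures) for v in graph}
    return pagerank(graph, teleport, damping)

# Hypothetical four-page Web graph used only to exercise the sketch.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = central_rank(web)
```

Under this reading, a page that is structurally central receives more teleport probability than it would under uniform PageRank, so the final scores reflect both link-based voting and network position. Other combinations, such as a weighted sum of the centrality scores and the PageRank score, would fit the abstract equally well.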

Full text