Distributed WEB information retrieval based on link partition

Authors: Zhang Gang*; Wang Bin; Wu Li Hui
Source: Pattern Recognition and Artificial Intelligence, 2007, 20(4): 519-524.

Abstract

Distributed information retrieval is an effective approach to large-scale WEB information retrieval. This paper proposes a link-based clustering algorithm (LIBCA) for document partitioning and uses a Bloom filter to improve the efficiency of LIBCA. The CORI collection selection algorithm and the Okapi BM25 ranking function are used in the distributed retrieval process. On the TREC WEB datasets from the three most recent years, link-partition-based distributed retrieval is compared with centralized retrieval and with random-partition-based distributed retrieval. The experiments show that, at P@10, link-partition-based distributed WEB retrieval performs as well as or even better than centralized retrieval. The efficiency experiments show that LIBCA combined with the Bloom filter achieves high system performance and can handle large datasets.
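The abstract does not say exactly how LIBCA uses the Bloom filter. The Python below is a minimal sketch that assumes one common role for it: each partition keeps a compact, probabilistic set of the URLs it already holds, so that link lookups during clustering avoid storing and scanning full URL sets. It shows a standard Bloom filter (sized with the usual formulas for a target false-positive rate) and a hypothetical greedy rule, `assign_partition`, that places a page in the partition containing the most of its link targets. `BloomFilter`, `assign_partition`, and the greedy rule are illustrative assumptions, not the paper's LIBCA.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter: a bit array plus k hash positions per item.

    Membership tests can return false positives but never false negatives.
    """

    def __init__(self, expected_items: int, fp_rate: float = 0.01) -> None:
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.m = max(1, math.ceil(-expected_items * math.log(fp_rate) / (math.log(2) ** 2)))
        self.k = max(1, round(self.m / expected_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step so positions vary
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        # Set every bit the item hashes to.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # The item is "probably present" only if all of its bits are set.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def assign_partition(out_links, partition_filters):
    # Hypothetical greedy rule (NOT the paper's LIBCA): choose the partition whose
    # filter reports the most of this page's link targets; max() keeps the first
    # maximum, so ties go to the lowest partition index.
    scores = [sum(1 for url in out_links if url in bf) for bf in partition_filters]
    return max(range(len(scores)), key=lambda i: scores[i])


if __name__ == "__main__":
    filters = [BloomFilter(1000), BloomFilter(1000)]
    filters[0].add("http://a.example/1")
    filters[0].add("http://a.example/2")
    filters[1].add("http://b.example/1")
    links = ["http://a.example/1", "http://a.example/2", "http://b.example/1"]
    print(assign_partition(links, filters))
```

With `expected_items=1000` and the default 1% target, the filter uses 9,586 bits and 7 hash positions per item, which is far smaller than storing the URL strings themselves; the trade-off is that an occasional false positive can slightly inflate a partition's score.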

  • Publication date: 2007
