An implementation and optimization for scalable DHT crawler

Authors: Zhou Mo*; Zhang JianYu; Dai YaFei
Source: Science China Information Sciences, 2010, 53(4): 769-779.
DOI: 10.1007/s11432-010-0067-z

Abstract

KAD is one of the largest DHTs deployed in a real application, so measuring KAD is a good way to study DHTs. Many active and passive measurements have been made on such systems, and crawlers are a novel approach to active measurement. A crawler starts from a basic set of given nodes and sends node-search requests to the nodes in that set to obtain contact information for further, previously unknown nodes. We design the crawler with three goals in mind: finishing the crawl of the given node set as quickly as possible; retrieving as much node information as possible from the crawl; and obtaining the result while sending as few network packets as possible. These goals are correlated, so optimizing one may affect the others. This paper proposes a basic DHT crawler framework and discusses possible extensions to it. We then exploit the fact that connectivity in the overlay network is universal, so the crawler does not need to cover the whole overlay network space in order to maintain the crawling effect.
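
The crawl loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `crawl`, `find_nodes`, `should_query`, and the toy `OVERLAY` table are assumptions made for illustration, and a dictionary lookup stands in for real KAD node-search requests. The optional `should_query` predicate is one possible way to model crawling only part of the overlay space, under the assumption that nodes outside that part are still recorded when other nodes report them.

```python
from collections import deque


def crawl(seeds, find_nodes, should_query=lambda node: True):
    """Breadth-first crawl from `seeds`.

    find_nodes(node) returns the contacts that `node` reports (hypothetical
    stand-in for a node-search request). should_query(node) decides whether
    a known node is actually sent a request.
    Returns (set of discovered nodes, number of requests sent).
    """
    discovered = set(seeds)
    queue = deque(seeds)
    requests_sent = 0
    while queue:
        node = queue.popleft()
        if not should_query(node):
            continue  # node is known, but lies outside the part we crawl
        requests_sent += 1
        for contact in find_nodes(node):
            if contact not in discovered:
                discovered.add(contact)
                queue.append(contact)
    return discovered, requests_sent


# Toy overlay (made-up data): node ID -> contacts it would return.
OVERLAY = {
    0: [1, 2],
    1: [0, 3],
    2: [3, 4],
    3: [5],
    4: [5, 6],
    5: [7],
    6: [7],
    7: [0],
}


def find_nodes(node):
    return OVERLAY.get(node, [])


# Full crawl: every discovered node is queried.
full, full_cost = crawl([0], find_nodes)

# Partial crawl: only nodes with ID below 4 are queried.
part, part_cost = crawl([0], find_nodes, should_query=lambda n: n < 4)

print(len(full), full_cost)
print(len(part), part_cost)
```

In this toy overlay the partial crawl sends half as many requests as the full crawl while still learning about most nodes, which illustrates the trade-off between coverage and packet count that the abstract names. How well this carries over to a real overlay depends on its connectivity, which is the subject of the paper.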