An Approach to Incremental Deep Web Crawling Based on Incremental Harvest Model

Authors: Huang Qiuyan; Li Qingzhong*; Li Hong; Yan Zhongmin
Source: International Workshop on Information and Electronics Engineering (IWIEE) / International Conference on Information, Computing and Telecommunications (ICICT), 2012-03-10 to 2012-03-11.
DOI: 10.1016/j.proeng.2012.01.093

Abstract

The goal of incremental deep web crawling is to select appropriate queries that retrieve as many incremental records as possible while minimizing communication cost. This paper proposes an effective and efficient approach to this problem. The approach represents the web database with a set covering model; based on this model, an incremental harvest model is learned with a machine learning method and used to select appropriate queries automatically. Extensive experimental evaluations over real web databases validate our techniques.
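
To make the set covering view concrete, the following is a minimal sketch of query selection as plain greedy set cover. It is not the incremental harvest model described in the abstract: the learned model is replaced here by exact knowledge of which records each query returns, which a real crawler would not have. The function name `greedy_query_selection`, the uniform `cost_per_query`, and the sample queries and record ids are all hypothetical.

```python
def greedy_query_selection(query_results, new_record_ids, cost_per_query=1.0):
    """Greedily pick queries until every reachable new record is covered.

    query_results: dict mapping a query string to the set of record ids it
        returns (assumed known here; in practice this would be estimated).
    new_record_ids: ids of the incremental records we want to harvest.
    Returns (selected queries, total cost, records no query can reach).
    """
    uncovered = set(new_record_ids)
    selected = []
    total_cost = 0.0
    while uncovered:
        # Choose the query that returns the most still-uncovered new records.
        best = max(query_results, key=lambda q: len(query_results[q] & uncovered))
        gain = query_results[best] & uncovered
        if not gain:
            break  # the remaining new records are not reachable by any query
        selected.append(best)
        uncovered -= gain
        total_cost += cost_per_query  # one request per issued query
    return selected, total_cost, uncovered


# Hypothetical web database: each query covers a subset of record ids.
query_results = {
    "author=smith": {1, 2, 3},
    "year=2012": {3, 4},
    "topic=web": {4, 5, 6},
    "topic=db": {2, 5},
}
new_records = {2, 3, 4, 5, 6, 7}  # record 7 is not returned by any query

selected, total_cost, unreachable = greedy_query_selection(query_results, new_records)
print(selected, total_cost, unreachable)
```

In this toy instance two queries cover every reachable new record, and record 7 is reported as unreachable. Charging one unit per query is a simplification; a cost that depends on the number of result pages would change which query is best at each step.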

Full text