Abstract
The goal of incremental deep web crawling is to select appropriate queries that retrieve as many incremental records as possible while minimizing communication cost. This paper proposes an effective and efficient approach to this problem. The approach represents the web database with a set covering model. Based on this model, a machine learning method learns an incremental harvest model that selects appropriate queries automatically. Extensive experiments on real web databases validate the proposed techniques.
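The abstract does not specify the learned incremental harvest model. To illustrate only the underlying set covering view, the sketch below swaps in the classic greedy set-cover heuristic instead. Each candidate query is treated as the set of record IDs it returns, and at each step the crawler issues the affordable query that yields the most not-yet-harvested records per unit of cost. This is not the authors' method. The function name `select_queries` is hypothetical. The sketch also assumes that each query's result set is known in advance, which in practice would have to be estimated. Its cost model, a fixed per-query overhead plus one unit per returned record, is likewise an assumption.

```python
from typing import Dict, List, Optional, Set, Tuple


def select_queries(
    results: Dict[str, Set[int]],
    harvested: Set[int],
    budget: float,
    query_overhead: float = 1.0,
) -> Tuple[List[str], Set[int]]:
    """Hypothetical greedy set-cover query selection (not the paper's learned model).

    results:   query -> set of record IDs it returns (assumed known or estimated).
    harvested: record IDs already downloaded by earlier crawls.
    budget:    total communication cost allowed.
    Assumed cost of a query: query_overhead + number of records it returns.
    """
    harvested = set(harvested)        # copy so the caller's set is not mutated
    remaining = dict(results)         # candidate queries not yet issued
    chosen: List[str] = []
    spent = 0.0
    while remaining:
        best_q: Optional[str] = None
        best_ratio = 0.0              # only queries with new records (ratio > 0) qualify
        for q, recs in remaining.items():
            cost = query_overhead + len(recs)
            if spent + cost > budget:  # skip queries that would exceed the budget
                continue
            ratio = len(recs - harvested) / cost  # new records per unit of cost
            if ratio > best_ratio:
                best_q, best_ratio = q, ratio
        if best_q is None:            # no affordable query yields new records
            break
        recs = remaining.pop(best_q)
        spent += query_overhead + len(recs)
        harvested |= recs
        chosen.append(best_q)
    return chosen, harvested
```

For example, take the toy inputs `{"q1": {1, 2, 3, 4}, "q2": {3, 4, 5}, "q3": {6}}`, with records `{1, 2}` already harvested and a budget of 10. The sketch issues `q2` and then `q3`. After that, `q1` adds nothing new and no longer fits the budget. A greedy ratio rule is the standard approximation for weighted set cover. The paper instead learns the selection policy from data.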
- Publication date: 2012
- Affiliation: Shandong University