Abstract

The deep web offers a large amount of well-structured, specialized data stored in back-end databases. The only way to access this information is to submit query instances to the query interfaces provided by web databases, so the first step is to locate the deep web interfaces. This paper presents a novel technique for detecting them. First, we build a domain model for each domain, composed of descriptive attributes and their values. Then a query is automatically submitted to the candidate interface according to the domain model. Finally, we examine the returned report to judge whether the interface is a deep web interface. The main idea is that if the report contains no records, or contains similar records whose hyperlinks point to different websites, we mark the interface as a non-deep-web interface; otherwise we mark it as a deep web interface. Experimental results show that our method achieves high accuracy and precision.
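The decision rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the paper's implementation: the `Record` structure and the `is_deep_web_interface` function are hypothetical names, the report is assumed to be already parsed into records that were judged similar, and "different websites" is approximated by comparing the host part of each record's hyperlink.

```python
from dataclasses import dataclass
from typing import Sequence
from urllib.parse import urlparse


@dataclass
class Record:
    """One result entry extracted from the returned report (hypothetical structure)."""
    text: str
    link: str  # hyperlink attached to the record


def is_deep_web_interface(records: Sequence[Record]) -> bool:
    """Apply the rule from the abstract to an already-parsed report.

    Assumption: `records` holds the similar records found in the report;
    how similarity is measured is not specified here.
    """
    # No record at all: the interface is marked as a non-deep-web interface.
    if not records:
        return False

    # Collect the distinct hosts that the record hyperlinks point to.
    hosts = {urlparse(r.link).netloc.lower() for r in records}

    # Similar records linking to different websites: marked as a
    # non-deep-web interface.
    if len(hosts) > 1:
        return False

    # Otherwise the interface is marked as a deep web interface.
    return True
```

For instance, a report whose records all link to `db.example.com` would be marked as a deep web interface, while a report whose records link to several unrelated hosts would not.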

Full text