Abstract

Current studies show that identifying Deep Web query interfaces is important because it supports further information retrieval from web databases. A critical issue in recognizing these interfaces is how to distinguish Deep Web query interfaces from Surface Web search interfaces. In this paper, we propose PreC-SVM, a cascaded method that automatically identifies Deep Web query interfaces. First, a pre-classification step filters out part of the Surface Web search interfaces from the full set of forms. Then an SVM classification model is applied to identify Deep Web query interfaces among the remaining forms. Form features are extracted and selected using the information gain method. Experiments show that the pre-classification step effectively filters out part of the Surface Web search interfaces, and that PreC-SVM performs better and runs faster than a standalone SVM or a C4.5 decision tree. These results suggest that PreC-SVM is very promising for query interface classification.
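
The two ideas named above, information-gain feature selection and a cascade that pre-classifies before a second-stage classifier runs, can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the dictionary form representation, and the example pre-filter rule are all hypothetical, and the second stage is an arbitrary callable standing in for the trained SVM.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy conditioned on one feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def cascade_classify(form, prefilter, classifier):
    """Two-stage cascade: a cheap rule first, a learned classifier second.

    `prefilter` and `classifier` are hypothetical callables; in the setting
    described above the second stage would be a trained SVM.
    """
    if prefilter(form):
        return "surface"
    return classifier(form)


# Hypothetical rule (an assumption, not taken from the paper): a form with
# one text box and no other controls is treated as a Surface Web search box.
def simple_search_box(form):
    return form["text_inputs"] == 1 and form["other_controls"] == 0
```

A feature that splits the classes perfectly has an information gain equal to the label entropy, while a feature that is independent of the label has a gain of zero, so ranking features by this score keeps the most discriminative ones. In the cascade, any form caught by the pre-filter never reaches the second stage, which is where a speed advantage over running the classifier on every form would come from.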

Full text