Applying ant colony hybrid metaheuristics to wrapper verification

Authors: Fernandez de Viana I; Abad P J; Alvarez J L; Arjona J L
Source: Expert Systems with Applications, 2016, 57: 62-75.
DOI:10.1016/j.eswa.2016.02.022

Abstract

Wrappers are pieces of software used to extract data from websites and structure them for further application processing. Unfortunately, websites evolve continuously and their structure changes without forewarning, which usually causes wrappers to work incorrectly. Wrapper maintenance is therefore necessary to detect whether a wrapper is extracting erroneous data. The solution consists of using verification models that detect whether a wrapper's output is statistically similar to the output the same wrapper produced when it was successfully invoked in the past. Current proposals have some weaknesses: they assume that the data used to build these models are homogeneous, or that the features of the data set can be mapped to an n-dimensional space of independent dimensions, even though these features are correlated. In this paper, a new verification system based on the Best-Worst Ant System (BWAS) is presented to overcome these weaknesses. The experimental results show an accuracy improvement of 7.5% over current solutions.
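The abstract names the Best-Worst Ant System but does not describe how it is applied to verification. As background only, the following is a minimal sketch of the generic BWAS pheromone update as it is commonly described for tour-construction problems: every edge evaporates, edges of the global-best solution are reinforced, and edges of the current worst solution that are not in the global best receive an extra evaporation. It is not the authors' verification system. The deposit amount `1 / best_cost`, the rate `rho`, and the helper names `tour_edges` and `bwas_update` are illustrative assumptions, and the pheromone mutation and restart steps of BWAS are omitted.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical edge representation: undirected edge stored as (smaller, larger) node id.
Edge = Tuple[int, int]


def tour_edges(tour: List[int]) -> Set[Edge]:
    """Return the undirected edges of a closed tour, each as (min, max)."""
    n = len(tour)
    return {
        (min(tour[i], tour[(i + 1) % n]), max(tour[i], tour[(i + 1) % n]))
        for i in range(n)
    }


def bwas_update(
    tau: Dict[Edge, float],
    best_tour: List[int],
    best_cost: float,
    worst_tour: List[int],
    rho: float = 0.2,
) -> None:
    """Apply a Best-Worst style pheromone update to `tau` in place.

    Assumes `tau` already contains every edge used by both tours.
    """
    # 1. Evaporation on every edge.
    for e in tau:
        tau[e] *= 1.0 - rho
    # 2. Reinforce the global-best edges; deposit 1/cost is an assumed choice.
    best = tour_edges(best_tour)
    for e in best:
        tau[e] += 1.0 / best_cost
    # 3. Extra evaporation on worst-solution edges absent from the global best.
    for e in tour_edges(worst_tour) - best:
        tau[e] *= 1.0 - rho
```

For example, on a complete 4-node graph with all trails at 1.0, `bwas_update(tau, [0, 1, 2, 3], 10.0, [0, 2, 1, 3], rho=0.5)` leaves the four best-tour edges at 0.6 and drops the worst-only edges (0, 2) and (1, 3) to 0.25. The extra penalty on the worst solution is what distinguishes BWAS from a plain global-best update.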

  • Publication date: 2016-9-15