A collaborative approach to build evaluated web page datasets

Authors: Barros Ricardo*; Rodrigues Nt Jose A; Xexeo Geraldo B; de Souza Jano M
Source: Future Generation Computer Systems, 2011, 27(1): 119-126.
DOI: 10.1016/j.future.2010.06.007

Abstract

Information retrieval algorithms demand datasets to assess their effectiveness. However, access to such datasets is often difficult and expensive, since building them is a time-consuming and costly task. This work presents a collaborative approach to dataset creation that uses a data quality evaluation technique based on fuzzy theory to help users select suitable web documents for their datasets. These documents are automatically captured by a crawler and evaluated using information derived from their metadata.

  • Publication date: 2011-1