Abstract
Information retrieval algorithms require datasets to evaluate their effectiveness. However, such datasets are often hard and expensive to obtain, because building them is a time-consuming and costly task. This work presents a collaborative approach to dataset creation that uses a data quality evaluation technique based on fuzzy theory to help users select suitable web documents for their datasets. A crawler captures these documents automatically, and each one is evaluated using information derived from its metadata.
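The abstract does not say which metadata attributes, membership functions, or aggregation operator the technique uses. The sketch below is only a generic illustration of fuzzy quality scoring over document metadata, not the authors' method. Several parts are hypothetical assumptions: the two criteria ("recent" and "complete"), the `DocMetadata` fields, the linear membership breakpoints (30/365 days, 50%/100% of fields), the minimum t-norm used as a fuzzy AND, and the 0.5 suitability threshold.

```python
from dataclasses import dataclass


# Hypothetical metadata record: these fields are illustrative assumptions,
# not the attributes used in the original work.
@dataclass
class DocMetadata:
    age_days: float       # days since the document was last modified
    fields_present: int   # number of expected metadata fields that are filled in
    fields_expected: int  # number of metadata fields expected in total


def decreasing(x: float, full: float, zero: float) -> float:
    """Linear membership: 1 at or below `full`, 0 at or above `zero`."""
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    return (zero - x) / (zero - full)


def increasing(x: float, zero: float, full: float) -> float:
    """Linear membership: 0 at or below `zero`, 1 at or above `full`."""
    if x <= zero:
        return 0.0
    if x >= full:
        return 1.0
    return (x - zero) / (full - zero)


def quality(meta: DocMetadata) -> float:
    """Fuzzy quality score in [0, 1] (hypothetical criteria and breakpoints)."""
    # Degree to which the document is "recent": assumed fully recent within
    # 30 days and not recent at all after 365 days.
    recent = decreasing(meta.age_days, full=30.0, zero=365.0)
    # Degree to which the metadata is "complete": assumed 0 at half the
    # expected fields filled, 1 when all of them are filled.
    ratio = meta.fields_present / meta.fields_expected if meta.fields_expected else 0.0
    complete = increasing(ratio, zero=0.5, full=1.0)
    # Fuzzy AND via the minimum t-norm: a document is only as good as its
    # weakest criterion.
    return min(recent, complete)


def is_suitable(meta: DocMetadata, threshold: float = 0.5) -> bool:
    """Alpha-cut: suggest the document to the user if its quality >= threshold."""
    return quality(meta) >= threshold


if __name__ == "__main__":
    fresh = DocMetadata(age_days=10, fields_present=8, fields_expected=10)
    stale = DocMetadata(age_days=200, fields_present=10, fields_expected=10)
    print(round(quality(fresh), 2), is_suitable(fresh))  # → 0.6 True
    print(is_suitable(stale))                            # → False
```

Taking the minimum means a document with complete metadata is still rejected when it is stale. A weighted average would let strong criteria compensate for weak ones instead. Which behavior suits a given dataset is a design decision that the abstract leaves open.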
- Publication date: 2011-1