A Two-Stage Active Learning Method for Learning to Rank

Authors: Silva Rodrigo M*; Goncalves Marcos A; Veloso Adriano
Source: Journal of the Association for Information Science and Technology, 2014, 65(1): 109-128.
DOI: 10.1002/asi.22958

Abstract

Learning to rank (L2R) algorithms use a labeled training set to generate a ranking model that can later be used to rank new query results. These training sets are costly and laborious to produce, requiring human annotators to assess the relevance or order of the documents in relation to a query. Active learning algorithms are able to reduce the labeling effort by selectively sampling an unlabeled set and choosing data instances that maximize a learning function's effectiveness. In this article, we propose a novel two-stage active learning method for L2R that combines and exploits interesting properties of its constituent parts, thus being effective and practical. In the first stage, an association rule active sampling algorithm is used to select a very small but effective initial training set. In the second stage, a query-by-committee strategy trained with the first-stage set is used to iteratively select more examples until a preset labeling budget is met or a target effectiveness is achieved. We test our method with various LETOR benchmarking data sets and compare it with several baselines to show that it achieves good results using only a small portion of the original training sets.
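
The second stage described above can be illustrated with a minimal Python sketch of a generic query-by-committee loop. This is not the authors' implementation. Everything in it is an assumption made for illustration: the names `fit_member`, `score`, `disagreement`, and `qbc_select`, the toy linear scorer, the bootstrap-resampled committee, and the use of score variance as the disagreement measure. The first-stage association rule sampling is not reproduced; its output is simply passed in as `seed_set`, which must be non-empty.

```python
import random
import statistics


def fit_member(sample):
    # Hypothetical toy learner: weight j is the average of label * feature j.
    dim = len(sample[0][0])
    return [sum(y * x[j] for x, y in sample) / len(sample) for j in range(dim)]


def score(weights, x):
    # Linear relevance score of feature vector x under one committee member.
    return sum(w * v for w, v in zip(weights, x))


def disagreement(committee, x):
    # Assumed disagreement measure: variance of the members' scores for x.
    return statistics.pvariance([score(w, x) for w in committee])


def qbc_select(seed_set, pool, oracle, budget, n_members=3, batch=1, rng=None):
    """Grow a labeled set from seed_set by repeatedly labeling the pool
    instances the committee disagrees on most, until the budget is spent."""
    rng = rng or random.Random(0)
    labeled = list(seed_set)   # stands in for the first-stage selection
    pool = list(pool)          # unlabeled feature vectors
    spent = 0
    while pool and spent < budget:
        # Each member is trained on a bootstrap resample of the labeled set.
        committee = [
            fit_member([rng.choice(labeled) for _ in labeled])
            for _ in range(n_members)
        ]
        # Most contested instances first.
        pool.sort(key=lambda x: disagreement(committee, x), reverse=True)
        take = min(batch, budget - spent)
        picked, pool = pool[:take], pool[take:]
        # The oracle plays the role of the human annotator.
        labeled.extend((x, oracle(x)) for x in picked)
        spent += len(picked)
    return labeled
```

The loop stops only on the labeling budget or an exhausted pool. The alternative stopping rule mentioned in the abstract, a target effectiveness, would need a held-out evaluation step that this sketch leaves out.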

  • Publication date: 2014-1