Abstract

A useful ability for search engines is to rank objects for novelty and diversity: the top k documents retrieved should cover the possible intents of a query according to some distribution, contain a diverse set of subtopics related to the user's information need, or contain nuggets of information with little redundancy. Evaluation measures have been introduced to assess the effectiveness of systems at this task, but computing these measures is NP-hard in the worst case. The primary consequence is that there is no ranking principle, akin to the Probability Ranking Principle for document relevance, that provides uniform instruction on how to rank documents for novelty and diversity. We use simulation to investigate the practical implications of this for the optimization and evaluation of retrieval systems.

  • Publication date: 2011-2