摘要

Recently, result diversification has attracted a lot of attention as a means to improve the quality of results retrieved by user queries. In this article, we introduce a novel definition of diversity called DisC diversity. Given a tuning parameter r, which we call radius, we consider two items to be similar if their distance is smaller than or equal to r. A DisC diverse subset of a result contains items such that each item in the result is represented by a similar item in the diverse subset and the items in the diverse subset are dissimilar to each other. We show that locating a minimum DisC diverse subset is an NP-hard problem and provide algorithms for its approximation. We extend our definition to the multiple radii case, where each item is associated with a different radius based on its importance, relevance, or other factors. We also propose adapting DisC diverse subsets to a different degree of diversification by adjusting r; that is, increasing the radius (or zooming-out) and decreasing the radius (or zooming-in). We present efficient implementations of our algorithms based on the M-tree, a spatial index structure, and experimentally evaluate their performance.

  • 出版日期2015