摘要

Rocchio's similarity-based relevance feedback algorithm, one of the most important query reformation methods in information retrieval, is essentially an adaptive supervised learning algorithm from examples. In practice, Rocchio's algorithm often uses a fixed query updating factor. When this is the case, we strengthen the linear Omega(n) lower bound obtained by Chen and Zhu (Inf. Retr. 5:61-86, 2002) and prove that Rocchio's algorithm makes Omega(k(n-k)) mistakes in searching for a collection of documents represented by a monotone disjunction of k relevant features over the n-dimensional binary vector space {0,1} (n) , when the inner product similarity measure is used. A quadratic lower bound is obtained when k is linearly proportional to n. We also prove an O(k(n-k)(3)) upper bound for Rocchio's algorithm with the inner product similarity measure in searching for such a collection of documents with a constant query updating factor and a zero classification threshold.

  • 出版日期2010-2

全文