Abstract

Caching is one of the techniques that Information Retrieval Systems (IRSs) and Web Search Engines (WSEs) use to reduce processing costs and attain faster response times. In this paper we introduce Top-K SCRC (Set Cover Results Cache), a novel results caching technique that aims to maximize cache utilization. Identical queries are treated as in plain results caching (i.e., their evaluation does not require accessing the index), while combinations of cached subqueries are exploited as in posting lists caching; however, the exploited subqueries are not necessarily single-word queries. The problem of finding the right set of cached subqueries to answer an incoming query is an instance of the Exact Set Cover problem. The technique can be applied to any best-match retrieval model that is based on a decomposable scoring function, and we show that several best-match retrieval models (namely VSM, Okapi BM25 and hybrid retrieval models) rely on such scoring functions. To increase the capacity of the cache (in queries), only the top-K results of each cached query are stored, and we introduce metrics for measuring the accuracy of the composed top-K answer. By analyzing queries submitted to real-world WSEs, we verified that a significant proportion of queries have term sets that are the union of the term sets of other queries. A comparative evaluation over traces of real query sets showed that Top-K SCRC is on average two times faster than a plain Top-K RC for the same cache size.
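
The idea of answering a query from an exact cover of cached subqueries can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names `exact_cover` and `answer`, the cache layout (a term set mapped to per-document scores), the brute-force backtracking search, and the assumption that the decomposable scoring function is purely additive over disjoint subqueries are all hypothetical choices made for illustration.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional, Tuple

# Hypothetical cache layout: term set of a cached query -> its stored top-K {doc_id: score}.
Cache = Dict[FrozenSet[str], Dict[str, float]]


def exact_cover(query: FrozenSet[str], cache: Cache) -> Optional[List[FrozenSet[str]]]:
    """Return pairwise-disjoint cached term sets whose union equals `query`, or None."""
    if not query:
        return []
    # Branch on one uncovered term: any exact cover contains exactly one key holding it.
    pivot = min(query)
    # Try larger cached subqueries first (a heuristic assumed here, not taken from the paper).
    for key in sorted(cache, key=lambda k: (-len(k), sorted(k))):
        if pivot in key and key <= query:
            rest = exact_cover(query - key, cache)
            if rest is not None:
                return [key] + rest
    return None


def answer(terms: Iterable[str], cache: Cache, k: int) -> Optional[List[Tuple[str, float]]]:
    """Compose a top-k answer from cached subqueries; None signals a cache miss."""
    cover = exact_cover(frozenset(terms), cache)
    if cover is None:
        return None
    totals: Dict[str, float] = {}
    for key in cover:
        # Assumed additive decomposition: score(d, q) is the sum of the subquery scores.
        for doc, score in cache[key].items():
            totals[doc] = totals.get(doc, 0.0) + score
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]


cache: Cache = {
    frozenset({"web", "search"}): {"d1": 2.0, "d2": 1.5},
    frozenset({"cache"}): {"d2": 1.0, "d3": 0.8},
}
print(answer(["web", "search", "cache"], cache, 2))
```

Because each cached entry holds only its own top-K documents, a document missing from one of the stored lists contributes a partial score, so the composed ranking is an approximation; this is the kind of inaccuracy that the accuracy metrics mentioned in the abstract are meant to quantify.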

  • Publication date: 2015-2