Faster compressed dictionary matching

作者:Hon Wing Kai; Ku Tsung Han*; Shah Rahul; Thankachan Sharma V; Vitter Jeffrey Scott
来源:Theoretical Computer Science, 2013, 475: 113-119.
DOI:10.1016/j.tcs.2012.10.050

摘要

Given a set D of d patterns, the dictionary matching problem is to index D such that for any query text T, we can locate the occurrences of any pattern within T efficiently. When D contains a total of n characters drawn from an alphabet of size a, Hon et al. (2008) [12] gave an nH(k)(D)+o(n log sigma)-bit index which supports a query in O(vertical bar T vertical bar (log(epsilon) n+log d)+occ) time, where epsilon %26gt; 0 and H-k(D) denotes the kth-order entropy of D. Very recently, Belazzougui (2010) [3] has proposed an elegant scheme, which takes n log sigma + O(n) bits of index space and supports a query in optimal O(vertical bar T vertical bar + occ) time. In this paper, we provide connections between Belazzougui%26apos;s index and the XBW compression of Ferragina and Manzini (2005) [8], and show that Belazzougui%26apos;s index can be slightly modified to be stored in nH(k)(D) O(n) bits, while query time remains optimal; this improves the compressed index by Hon et al. (2008) [12] in both space and time.