Compact and hash based variants of the suffix array

Authors: Grabowski S*; Raniszewski M
Source: Bulletin of the Polish Academy of Sciences-Technical Sciences, 2017, 65(4): 407-418.
DOI: 10.1515/bpasts-2017-0046

Abstract

Full-text indexing aims to build a data structure over a given text that can efficiently find arbitrary text patterns while, ideally, requiring little space. We propose two full-text indexes inspired by the suffix array. The first, SA-hash, augments the suffix array with a hash table that significantly narrows the search interval before the binary search phase, which speeds up pattern searches. The second, FBCSA, is a compact data structure similar to Mäkinen's compact suffix array (MakCSA), but it works on fixed-size blocks. Experiments on the widely used Pizza & Chili datasets show that SA-hash is about 2-3 times faster in pattern searches (counts) than the standard suffix array, at the price of 0.2n-1.1n bytes of extra space, where n is the text length. FBCSA, in one of the presented variants, reduces the suffix array size by a factor of about 1.5-2 while coming close to the suffix array in search times, and it is faster than its competitors known from the literature, MakCSA and LCSA.
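
To illustrate the idea behind SA-hash, here is a minimal Python sketch, not the authors' implementation. It assumes a naively built suffix array and uses a Python dict as a stand-in for the paper's hash table; the function names and the prefix length parameter `k` are hypothetical. The table maps each k-symbol prefix to the suffix array range of the suffixes that start with it, so the binary search for a pattern of length at least `k` runs only inside that range.

```python
def build_suffix_array(text):
    # Naive construction by sorting suffixes; adequate only for small inputs.
    return sorted(range(len(text)), key=lambda i: text[i:])


def build_prefix_table(text, sa, k):
    # Map each k-symbol prefix to the half-open suffix array range [lo, hi)
    # of suffixes that start with it. Suffixes shorter than k are skipped,
    # since they cannot match a pattern of length >= k.
    table = {}
    for rank, pos in enumerate(sa):
        prefix = text[pos:pos + k]
        if len(prefix) < k:
            continue
        lo, _ = table.get(prefix, (rank, rank))
        table[prefix] = (lo, rank + 1)
    return table


def _bound(text, sa, pattern, lo, hi, upper):
    # Binary search over sa[lo:hi], comparing the first len(pattern) symbols
    # of each suffix. Returns the first rank whose window is >= pattern
    # (upper=False) or > pattern (upper=True).
    m = len(pattern)
    while lo < hi:
        mid = (lo + hi) // 2
        window = text[sa[mid]:sa[mid] + m]
        if window < pattern or (upper and window == pattern):
            lo = mid + 1
        else:
            hi = mid
    return lo


def count_occurrences(text, sa, table, k, pattern):
    # Count occurrences of a non-empty pattern in text.
    lo, hi = 0, len(sa)
    if len(pattern) >= k:
        # Hash lookup narrows the interval before the binary search.
        interval = table.get(pattern[:k])
        if interval is None:
            return 0
        lo, hi = interval
    # Patterns shorter than k fall back to the whole suffix array.
    first = _bound(text, sa, pattern, lo, hi, upper=False)
    last = _bound(text, sa, pattern, lo, hi, upper=True)
    return last - first
```

In this sketch the extra space is one dict entry per distinct k-symbol prefix, which mirrors the trade-off named in the abstract: more memory in exchange for a shorter binary search. How the paper chooses the prefix length, the hash function, and the handling of short patterns is not stated in the abstract, so those choices here are assumptions.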

  • Publication date: 2017-9

Full text