A Fully Compressed Algorithm for Computing the Edit Distance of Run-Length Encoded Strings

Chen Kuan Yu; Chao Kun Mao<sup>*</sup>

doi:10.1007/s00453-011-9592-4

Abstract

A recent trend in stringology explores the possibility of utilizing text compression to speed up similarity computation between strings. In this line of investigation, run-length encoding is one of the earliest studied compression schemes. Despite its simple coding nature, the only positive result before this work is the computation of the in-del distance (dual of longest common subsequence), which requires O(mn log mn) time, where m and n denote the number of runs of the input strings. The worst-case time complexity of computing the edit distance between two run-length encoded strings still depends on the uncompressed string lengths. In this paper, we break the foundational gap by providing its first "fully compressed" algorithm whose running time depends solely on the compressed string lengths. Specifically, given two strings, compressed into m and n runs, m ≤ n, we present an O(mn^2)-time algorithm for computing the edit distance of the strings. Our approach also yields the first fully compressed solution to approximate matching of a pattern of m runs in a text of n runs in O(mn^2) time.
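To make the terms in the abstract concrete, the following minimal Python sketch shows what a run-length encoding looks like and computes the edit distance with the classic dynamic program on the decompressed strings. This is only the uncompressed baseline, whose running time depends on the full string lengths; it is not the paper's fully compressed algorithm. The function names `rle`, `expand`, and `edit_distance` are hypothetical, and unit costs for insertion, deletion, and substitution are an assumption made for illustration.

```python
from itertools import groupby


def rle(s):
    """Run-length encode s as a list of (character, run_length) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


def expand(runs):
    """Decompress a list of (character, run_length) pairs back into a string."""
    return "".join(ch * k for ch, k in runs)


def edit_distance(a, b):
    """Classic DP on uncompressed strings, O(len(a) * len(b)) time.

    Assumes unit cost for insertion, deletion, and substitution.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = cur
    return prev[-1]


x = rle("aaaabbbcc")  # 3 runs: [('a', 4), ('b', 3), ('c', 2)]
y = rle("aabbbbcc")   # 3 runs: [('a', 2), ('b', 4), ('c', 2)]
print(len(x), len(y), edit_distance(expand(x), expand(y)))
```

Here both inputs have only three runs, yet the baseline still fills a table whose size is the product of the uncompressed lengths. A fully compressed algorithm in the abstract's sense would instead have a running time bounded in terms of the run counts m and n alone.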

Publication date: 2013-02
