摘要

We present a simple and faster solution to the problem of matching a set of patterns with variable length don';t cares. Given an alphabet Sigma, a pattern p is a word p(1)@p(2) ... @p(m), where pi is a string over E called a keyword and @ is not an element of Sigma is a symbol called a variable length don';t care (VLDC) symbol. Pattern p matches a text t if t = u(0)p(1) u(1) ... u(m-1) p(m)u(m) for some u(0) ... u(m) is an element of Sigma*. The problem addressed in this paper is: given a set of patterns P and a text t, determine whether one of the patterns of P matches t. Kucherov and Rusinowitch (1997) [9] presented an algorithm that solves the problem in time 0 ((vertical bar t vertical bar + vertical bar P vertical bar) log vertical bar P vertical bar), where vertical bar P vertical bar is the total length of keywords in every pattern of P. We give a new algorithm based on Aho-Corasick automaton. It uses the solutions of Dynamic Marked Ancestor Problem of Chan et al. (2007) [5]. The algorithm takes O((vertical bar t vertical bar + parallel to P parallel to)log kappa/log log kappa) time, where parallel to P parallel to is the total number of keywords in every pattern of P, and kappa is the number of distinct keywords in P. The algorithm is faster and simpler than the previous approach.

  • 出版日期2010-2-15
  • 单位吉林大学; 吉林工商学院