摘要

Network intrusion detection and prevention systems commonly use regular expression (RE) signatures to represent individual security threats. While the corresponding deterministic finite state automata (DFA) for any one RE is typically small, the DFA that corresponds to the entire set of REs is usually too large to be constructed or deployed. To address this issue, a variety of alternative automata implementations that compress the size of the final automaton have been proposed such as extended finite automata (XFA) and delayed input DFA (D(2)FA). The resulting final automata are typically much smaller than the corresponding DFA. However, the previously proposed automata construction algorithms do suffer from some drawbacks. First, most employ a %26quot;Union then Minimize%26quot; framework where the automata for each RE are first joined before minimization occurs. This leads to an expensive nondeterministic finite automata (NFA) to DFA subset construction on a relatively large NFA. Second, most construct the corresponding large DFA as an intermediate step. In some cases, this DFA is so large that the final automaton cannot be constructed even though the final automaton is small enough to be deployed. In this paper, we propose a %26quot;Minimize then Union%26quot; framework for constructing compact alternative automata focusing on the D(2)FA. We show that we can construct an almost optimal final D(2)FA with small intermediate parsers. The key to our approach is a space- and time-efficient routine for merging two compact D(2)FA into a compact D(2)FA. In our experiments, our algorithm runs on average 155 times faster and uses 1500 times less memory than previous algorithms. For example, we are able to construct a D(2)FA with over 80 000 000 states using only 1 GB of main memory in only 77 min.

  • 出版日期2014-12