A binary decision diagram based approach for mining frequent subsequences

Loekito Elsa; Bailey James*; Pei Jian

doi:10.1007/s10115-009-0252-9

Abstract

Sequential pattern mining is an important problem in data mining. State-of-the-art techniques for mining sequential patterns, such as frequent subsequences, are often based on the pattern-growth approach, which recursively projects conditional databases. Explicitly creating database projections is thought to be a major computational bottleneck, but we show in this paper that it can be beneficial when the appropriate data structure is used. Our technique uses a canonical directed acyclic graph as the sequence database representation, which can be represented as a binary decision diagram (BDD). In this paper, we introduce a new type of BDD, namely a sequence BDD (SeqBDD), and show how it can be used for efficiently mining frequent subsequences. A novel feature of the SeqBDD is its ability to share results between similar intermediate computations and avoid redundant computation. We perform an experimental study to compare the SeqBDD technique with existing pattern-growth techniques that are based on other data structures, such as prefix trees. Our results show that a SeqBDD can be half as large as a prefix tree, especially when many similar sequences exist. In terms of mining time, it can be substantially more efficient when the support is low, the number of patterns is large, or the input sequences are long and highly similar.
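The SeqBDD itself is not reproduced here. As a rough, hedged illustration of the two ideas the abstract relies on (pattern growth by explicit database projection, and reusing the result when two intermediate computations coincide), the following is a minimal Python sketch. It assumes sequences of single items (e.g. strings), takes the support of a pattern to be the number of input sequences containing it as a possibly gapped subsequence, and substitutes an ordinary memoization cache keyed on the canonical (sorted) projected database for the SeqBDD's node sharing. The function name `mine_frequent_subsequences` is hypothetical and this is not the authors' algorithm.

```python
from functools import lru_cache


def mine_frequent_subsequences(database, min_support):
    """Hypothetical sketch: pattern growth with explicit projections.

    Returns ({pattern_tuple: support}, cache statistics). Identical projected
    databases are mined only once; this cache stands in for (and is much
    simpler than) the result sharing a SeqBDD provides.
    """

    @lru_cache(maxsize=None)
    def mine(projected):
        # `projected` is a canonical projected database: a sorted tuple of
        # non-empty suffixes. Patterns returned are relative to the prefix
        # that produced this projection.
        counts = {}
        for suffix in projected:
            for item in set(suffix):  # each sequence supports an item once
                counts[item] = counts.get(item, 0) + 1

        results = {}
        for item, support in counts.items():
            if support < min_support:
                continue
            results[(item,)] = support
            # Project on `item`: keep what follows its first occurrence.
            next_db = []
            for suffix in projected:
                if item in suffix:
                    rest = suffix[suffix.index(item) + 1:]
                    if rest:  # empty suffixes cannot support longer patterns
                        next_db.append(rest)
            # Sorting gives a canonical key, so equal databases share a result.
            for tail, tail_support in mine(tuple(sorted(next_db))).items():
                results[(item,) + tail] = tail_support
        return results

    canonical = tuple(sorted(tuple(seq) for seq in database if seq))
    return dict(mine(canonical)), mine.cache_info()


patterns, stats = mine_frequent_subsequences(["abc", "abd", "acd"], min_support=2)
for pattern, support in sorted(patterns.items()):
    print("".join(pattern), support)
print("reused projections:", stats.hits)
```

On this toy input the projections for prefixes `b`, `c` and `d` turn out identical to those already built for `ab`, `ac` and `ad`, so three of the eight mining calls are answered from the cache. Keying on a sorted tuple only detects exactly equal projected databases; the canonical DAG described in the abstract can additionally share common structure between databases that are merely similar.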

Publication date: 2010-8
