摘要

High utility itemset mining addresses the limitations of frequent itemset mining by introducing measures of interestingness that reflect the significance of an itemset beyond its frequency of occurrence. Among such algorithms, level-wise candidate generation-and-test approaches suffer from the drawbacks of having an immense candidate pool and requiring several database scans. Meanwhile, methods based on pattern growth tend to consume large amounts of memory to store conditional trees. We propose an efficient algorithm, called Index High Utility Itemsets Mine (IHUI-Mine), for application to high utility itemsets. The subsume index, which has been employed to mine frequent itemsets, is extended in IHUI-Mine to the discovery of high utility itemsets. In addition to the enumeration and search strategies inherited from the subsume index, we introduce a new property to specifically accelerate the computation of transaction-weighted utilization for high utility itemsets. Furthermore, given that bitmaps are used for database representation, the real utility of candidates can be verified from the recorded transactions rather than by resorting to the entire database. The computational complexity of IHUI-Mine is analyzed, and tests conducted on publicly available synthetic and real datasets further demonstrate that the proposed algorithm outperforms existing state-of-the-art algorithms.