Abstract

Various association rule mining techniques, such as frequent itemset mining, sequence itemset mining, and high utility itemset mining, have been studied to reveal valuable knowledge hidden in large databases. Among these techniques, high utility itemset mining has been actively researched because, by considering the utility of each item in a given database, it can find more meaningful itemsets than the other approaches. In recent years, mining high utility itemsets over data streams has emerged as an interesting topic because many users want to obtain valuable information from stream data, which are continually generated at rapid rates. However, in these environments, most previous high utility itemset mining methods cannot work efficiently in terms of runtime and memory usage. In addition, since they conduct their mining processes without considering the arrival time of transactions, it is hard for these methods to meet the needs of users who want only up-to-date, relevant information from data streams. In this paper, we propose a new tree-based algorithm that mines recent high utility itemsets over data streams. On the basis of the time decaying model, our algorithm diminishes the utilities of transactions according to their arrival time in order to assign larger weights to recent data than to older data. Moreover, the algorithm regularly updates the utility information in its tree data structure and prunes the nodes whose utility values are less than a user-specified minimum value. In this way, the algorithm keeps memory usage within a reasonable bound by avoiding unnecessary memory use. Experimental results demonstrate that our algorithm can mine recent high utility itemsets from varying stream data while consuming fewer computational resources than the existing algorithms.
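
The time-decay and pruning ideas described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it swaps the tree for a flat per-item table, and the class name `DecayedUtilityTable`, the decay factor `decay`, and the threshold `min_utility` are hypothetical names chosen for illustration. It assumes that every stored utility is multiplied by a fixed decay factor each time a new transaction arrives, which is one common form of the time decaying model.

```python
from dataclasses import dataclass, field


@dataclass
class DecayedUtilityTable:
    """Hypothetical flat stand-in for the tree structure in the abstract."""

    decay: float        # assumed decay factor in (0, 1]; not from the source
    min_utility: float  # assumed user-specified minimum utility value
    utilities: dict = field(default_factory=dict)

    def add_transaction(self, transaction):
        """Age all stored utilities by one step, then add the new transaction.

        `transaction` maps each item to its utility in that transaction.
        """
        # Older data loses weight every time a newer transaction arrives.
        for item in self.utilities:
            self.utilities[item] *= self.decay
        # The newest transaction enters with its full, undecayed utility.
        for item, utility in transaction.items():
            self.utilities[item] = self.utilities.get(item, 0.0) + utility

    def prune(self):
        """Drop entries whose decayed utility fell below the minimum value."""
        self.utilities = {
            item: u for item, u in self.utilities.items()
            if u >= self.min_utility
        }


table = DecayedUtilityTable(decay=0.5, min_utility=3.0)
table.add_transaction({"a": 8.0, "b": 2.0})
table.add_transaction({"b": 2.0})
table.add_transaction({"c": 4.0})
table.prune()
print(table.utilities)
```

In this sketch, items that stop appearing in the stream decay below the threshold and are removed, which is how periodic pruning can keep memory bounded. A real miner would track itemsets rather than single items and would prune on a safe upper bound so that no high utility itemset is lost.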

  • Publication date: 2016