Iterative sampling based frequent itemset mining for big data

Wu, Xian*; Fan, Wei; Peng, Jing; Zhang, Kun; Yu, Yong

doi:10.1007/s13042-015-0345-6

Abstract

Frequent pattern mining has attracted extensive research interest over the past two decades. Examples include mining frequent itemsets from transactions, extracting frequent sequences from bio-arrays, and detecting common subgraphs in molecular structures. In the era of big data, the explosive growth in data volume brings two new challenges to frequent pattern mining. (1) Space complexity: the input data, the intermediate results, and the output patterns can all be too large to fit into memory, which prevents many algorithms from running at all. (2) Time complexity: many existing approaches rely on exhaustive search or complicated data structures to mine frequent patterns, which makes them impractical for big data. To address these two challenges, we propose ISbFIM, an Iterative Sampling based Frequent Itemset Mining method. Rather than processing the entire data set at once, ISbFIM samples computationally manageable subsets and extracts frequent itemsets from these subsets. By repeating this process a sufficient number of times, we show both theoretically and empirically that the frequent itemsets can be enumerated without running into a combinatorial explosion. ISbFIM is easy to parallelize and can be applied to mine itemsets, sequences, or structures. We implement a MapReduce version of ISbFIM to demonstrate its scalability on big data.
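The abstract describes the sample, mine, and repeat idea only at a high level. The sketch below illustrates that general pattern under our own assumptions; it is not the authors' ISbFIM algorithm, and it carries none of their theoretical guarantees. The function names `mine_sample` and `iterative_sampling_fim`, the `max_len` cap on itemset size, and the final verification pass over the full data are all hypothetical choices made for illustration.

```python
import random
from collections import Counter
from itertools import combinations


def mine_sample(sample, min_support, max_len):
    """Hypothetical helper: exhaustively count itemsets of size <= max_len
    in a small, memory-manageable sample and keep those whose relative
    support within the sample reaches min_support."""
    counts = Counter()
    for txn in sample:
        items = sorted(set(txn))  # canonical order so itemsets are comparable tuples
        for k in range(1, max_len + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    n = len(sample)
    return {itemset for itemset, c in counts.items() if c / n >= min_support}


def iterative_sampling_fim(transactions, min_support, sample_size,
                           iterations, max_len=3, seed=0):
    """Hypothetical illustration of iterative sampling: mine many small
    samples, union their frequent itemsets as candidates, then verify each
    candidate's support in one pass over the full data (an assumption,
    added here so that no false positives are returned)."""
    rng = random.Random(seed)
    candidates = set()
    for _ in range(iterations):
        # Each iteration only ever holds one sample in memory.
        sample = rng.sample(transactions, min(sample_size, len(transactions)))
        candidates |= mine_sample(sample, min_support, max_len)

    n = len(transactions)
    txn_sets = [set(t) for t in transactions]
    result = {}
    for itemset in candidates:
        support = sum(1 for t in txn_sets if set(itemset) <= t) / n
        if support >= min_support:
            result[itemset] = support
    return result


if __name__ == "__main__":
    transactions = [
        ["bread", "milk"],
        ["bread", "diaper", "beer", "eggs"],
        ["milk", "diaper", "beer", "cola"],
        ["bread", "milk", "diaper", "beer"],
        ["bread", "milk", "diaper", "cola"],
    ]
    found = iterative_sampling_fim(transactions, min_support=0.6,
                                   sample_size=2, iterations=20)
    for itemset, support in sorted(found.items()):
        print(itemset, support)
```

Because every candidate is re-checked against the full data, this sketch can miss itemsets that no sample happened to surface, but it never reports an infrequent one. The per-iteration loop is also what makes the pattern easy to parallelize: each sample can be mined independently, for example as a map task, with the union and verification steps acting as the reduce.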

Publication date: 2015-12
Affiliation: Shanghai Jiao Tong University
