A distributed frequent itemset mining algorithm using Spark for Big Data analytics

Zhang, Feng; Liu, Min; Gui, Feng; Shen, Weiming; Shami, Abdallah; Ma, Yunlong*

doi:10.1007/s10586-015-0477-1

Abstract

Frequent itemset mining is an essential step in association rule mining. In the big data era, conventional approaches to mining frequent itemsets face significant challenges when computing power and memory are limited. This paper proposes an efficient distributed frequent itemset mining algorithm (DFIMA) that significantly reduces the number of candidate itemsets by applying a matrix-based pruning approach. The algorithm is implemented on Spark to further improve the efficiency of iterative computation. Numerical experiments on standard benchmark datasets, comparing the proposed algorithm with an existing algorithm, parallel FP-growth, show that DFIMA has better efficiency and scalability. In addition, a case study was carried out to validate the feasibility of DFIMA.
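
To make the terms in the abstract concrete, the following is a minimal single-machine sketch of level-wise frequent itemset mining in which support is counted over a boolean transaction matrix and candidates are pruned when any of their subsets is infrequent. It is not the DFIMA algorithm and does not use Spark; the function name `frequent_itemsets`, the parameter `min_support`, and the sample transactions are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Illustrative Apriori-style mining (an assumption, not the paper's DFIMA)."""
    items = sorted({item for t in transactions for item in t})
    # Boolean matrix stored column-wise: columns[item][row] is True
    # when transaction `row` contains `item`.
    columns = {item: [item in t for t in transactions] for item in items}

    def support(itemset):
        # Number of transactions (rows) that contain every item in `itemset`.
        return sum(
            all(columns[item][row] for item in itemset)
            for row in range(len(transactions))
        )

    result = {}
    current = [(item,) for item in items if support((item,)) >= min_support]
    k = 1
    while current:
        for itemset in current:
            result[itemset] = support(itemset)
        frequent = set(current)
        candidate_items = sorted({item for itemset in current for item in itemset})
        # Keep a (k+1)-candidate only if all of its k-subsets are frequent.
        candidates = [
            c
            for c in combinations(candidate_items, k + 1)
            if all(sub in frequent for sub in combinations(c, k))
        ]
        current = [c for c in candidates if support(c) >= min_support]
        k += 1
    return result


# Hypothetical sample data: four market-basket transactions.
sample = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
found = frequent_itemsets(sample, min_support=2)
```

With these sample transactions and a minimum support of 2, the pair `("butter", "milk")` appears in only one transaction, so the three-item candidate is discarded without its support ever being counted. Reducing the candidate set in this general way is what the abstract attributes to DFIMA's pruning step, though the paper's actual matrix construction may differ.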

Publication date: 2015-12
Affiliation: Tongji University
