Abstract

Mining frequent sequential patterns (FSPs) has attracted a great deal of research attention. Although many efficient FSP mining algorithms exist, mining time remains high, especially for large or dense datasets. Parallel processing has been widely applied to speed up many such problems, and several parallel algorithms have been proposed, but most of them suffer from synchronization and load-balancing problems. Targeting multi-core processor architectures, this paper proposes a load-balancing parallel approach called Parallel Dynamic Bit Vector Sequential Pattern Mining (pDBV-SPM) for mining FSPs from huge datasets; it uses the dynamic bit vector data structure to determine support values quickly. In pDBV-SPM, the frequent 1-sequences are sorted in ascending order of support count and then partitioned into parts, each of which is assigned to a task on a processor. With this ordering, most of the nodes in the leftmost branches are infrequent and are pruned during the search, which helps to balance the search tree. Experiments are conducted to verify the effectiveness of pDBV-SPM. The results show that the proposed algorithm outperforms PIB-PRISM in both mining time and memory usage.
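The sort-then-partition step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: plain Python integers stand in for dynamic bit vectors, the round-robin assignment of sorted items to parts is an assumption (the abstract does not state the partitioning rule), and the names `item_bitvectors`, `support`, and `partition_frequent_items` are hypothetical.

```python
from collections import defaultdict


def item_bitvectors(sequences):
    """Map each item to an int whose bit i is set when sequence i contains the item.

    A plain integer bitset is used here as a stand-in for a dynamic bit vector.
    """
    vectors = defaultdict(int)
    for sid, sequence in enumerate(sequences):
        for itemset in sequence:
            for item in itemset:
                vectors[item] |= 1 << sid
    return dict(vectors)


def support(bitvector):
    """Support = number of sequences containing the item = number of set bits."""
    return bin(bitvector).count("1")


def partition_frequent_items(sequences, min_support, n_parts):
    """Sort frequent 1-sequences by ascending support, then split them into parts.

    Round-robin assignment is an assumption made for this sketch.
    """
    vectors = item_bitvectors(sequences)
    frequent = [
        (item, support(bv))
        for item, bv in vectors.items()
        if support(bv) >= min_support
    ]
    # Ascending support; ties are broken by item name to keep the order deterministic.
    frequent.sort(key=lambda pair: (pair[1], pair[0]))
    parts = [[] for _ in range(n_parts)]
    for index, (item, _) in enumerate(frequent):
        parts[index % n_parts].append(item)
    return parts


# Toy sequence database: each sequence is a list of itemsets.
sequences = [
    [{"a"}, {"b", "c"}, {"d"}],
    [{"a"}, {"c"}],
    [{"a", "b"}, {"c"}],
    [{"a"}, {"d"}],
]
parts = partition_frequent_items(sequences, min_support=2, n_parts=2)
print(parts)
```

In a parallel setting, each part would be handed to a separate task that extends its own 1-sequences; that scheduling step is left out of the sketch.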

  • Publication date: 2017-4