Distributed Mining of Contrast Patterns

Authors: Savage David*; Zhang Xiuzhen; Chou Pauline; Yu Xinghou; Wang Qingmai
Source: IEEE Transactions on Parallel and Distributed Systems, 2017, 28(7): 1881-1890.
DOI: 10.1109/TPDS.2016.2637914

Abstract

In this paper, we propose a novel algorithm for mining contrast patterns using a distributed, MapReduce-like framework. Contrast patterns describe differences between contrasted data sets and have previously been used for building highly accurate classifiers. However, mining for contrast patterns is a computationally expensive task, and existing algorithms are designed to run sequentially on a single machine. Consequently, existing approaches are unable to handle dense, high-volume, and high-dimensional databases. Our algorithm addresses this problem by partitioning the search space for contrast patterns into small, independent units. These units can be mined in parallel, providing a scalable solution for mining large data sets. Using three different real-world data sets, we test an implementation of our algorithm on a Spark cluster. Results of these tests indicate that our algorithm achieves a high degree of parallelism and scalability.
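
The idea of splitting the search space into independent units can be illustrated with a small sketch. This is not the algorithm from the paper, which the abstract does not specify. It assumes a simple definition of a contrast pattern (an itemset with support at least `min_sup` in a target data set and at most `max_sup` in a background data set) and a hypothetical partitioning in which each unit holds all itemsets whose smallest item is a given prefix. All function names and thresholds are illustrative, the enumeration is brute force, and a thread pool stands in for a cluster only to show that the units do not depend on one another.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def support(pattern, dataset):
    """Fraction of transactions in `dataset` that contain every item of `pattern`."""
    if not dataset:
        return 0.0
    return sum(1 for t in dataset if pattern <= t) / len(dataset)


def mine_unit(prefix, later_items, target, background, min_sup, max_sup):
    """Mine one unit: all itemsets whose smallest item is `prefix`.

    Assumption: a pattern qualifies when it is frequent in `target`
    and rare in `background`. The unit needs no data from other units.
    """
    found = []
    for k in range(len(later_items) + 1):
        for rest in combinations(later_items, k):
            pattern = frozenset((prefix,) + rest)
            if (support(pattern, target) >= min_sup
                    and support(pattern, background) <= max_sup):
                found.append(pattern)
    return found


def mine_contrast_patterns(target, background, min_sup=0.5, max_sup=0.0, workers=4):
    """Partition the itemset search space by prefix item and mine each unit separately."""
    items = sorted(set().union(*target, *background))
    # Hypothetical partitioning: unit i = (items[i], every item ordered after it).
    units = [(item, items[i + 1:]) for i, item in enumerate(items)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(
            lambda u: mine_unit(u[0], u[1], target, background, min_sup, max_sup),
            units,
        )
        # Merge step: the union of the per-unit results.
        return {p for unit in results for p in unit}
```

Because every itemset has exactly one smallest item, the units cover the search space without overlap, so the final merge is a plain union. On a cluster, the same per-unit function could in principle be applied to a distributed collection of units instead of a local thread pool.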

  • Publication date: 2017-07-01