A MapReduce Framework for Mining Maximal Contiguous Frequent Patterns in Large DNA Sequence Datasets

Karim Md Rezaul*; Hossain Md Azam; Rashid Md Mamunur; Jeong Byeong Soo; Choi Ho Jin

doi:10.4103/0256-4602.95388

Abstract

Current DNA sequence datasets have become extremely large, making it very challenging for single-processor, main-memory-based computing systems to mine interesting patterns. With such limited hardware resources, most Apriori-like algorithms perform inefficiently. However, recent implementations of the MapReduce framework have overcome these limitations. Furthermore, mining maximal contiguous frequent patterns to express the function and structure of DNA sequences is a useful technique, capable of capturing the common data characteristics among related sequences. In this paper, we propose an efficient approach for mining maximal contiguous frequent patterns in large DNA sequence data using a MapReduce framework, which can handle massive DNA sequence datasets with a large number of nodes on a Hadoop platform. Our extensive experimental results show that the proposed approach can mine the complete set of maximal contiguous frequent patterns very efficiently.
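To make the notion of a maximal contiguous frequent pattern concrete, here is a minimal single-process sketch in Python. It is not the authors' algorithm and does not use Hadoop. It only imitates a map phase (each sequence emits its distinct contiguous substrings) and a reduce phase (per-pattern support is summed), then keeps the frequent patterns that cannot be extended by one base on either side and remain frequent. The function names, the `max_len` cap, and the four-letter alphabet are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import chain

ALPHABET = "ACGT"  # assumed DNA alphabet for this sketch


def map_sequence(seq, max_len):
    """Map step (illustrative): emit (pattern, 1) once per distinct
    contiguous substring of length <= max_len in one sequence."""
    seen = set()
    for start in range(len(seq)):
        for end in range(start + 1, min(len(seq), start + max_len) + 1):
            seen.add(seq[start:end])
    return [(pattern, 1) for pattern in seen]


def reduce_counts(pairs):
    """Reduce step (illustrative): sum the counts emitted for each pattern,
    giving the number of sequences that contain it."""
    support = defaultdict(int)
    for pattern, count in pairs:
        support[pattern] += count
    return support


def maximal_contiguous_frequent(sequences, min_support, max_len):
    """Return {pattern: support} for frequent patterns with no frequent
    one-base extension. Patterns of exactly max_len are reported as
    maximal because longer candidates are never counted."""
    pairs = chain.from_iterable(map_sequence(s, max_len) for s in sequences)
    support = reduce_counts(pairs)
    frequent = {p: c for p, c in support.items() if c >= min_support}

    maximal = {}
    for pattern, count in frequent.items():
        # Any longer frequent contiguous super-pattern would make at least
        # one of these one-base extensions frequent as well.
        extensions = [ch + pattern for ch in ALPHABET]
        extensions += [pattern + ch for ch in ALPHABET]
        if not any(ext in frequent for ext in extensions):
            maximal[pattern] = count
    return maximal
```

For the toy reads `ACGTAC`, `TACGTT`, and `GACGTA` with a minimum support of 3, this sketch keeps `ACGT` and `TA`: both occur in all three reads, and every one-base extension of either occurs in at most two. Enumerating all substrings is only workable for tiny inputs; a distributed implementation would need a far more careful candidate-generation strategy.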

Publication date: 2012-04


