Abstract

When association rules are mined in a distributed database, the overhead at each site increases because of its linkage to and dependency on other sites. Each site scans its database not only for itself but also for its neighboring sites. In the most popular algorithms, Count Distribution (CD) and Fast Distributed Mining (FDM), sites generate and scan identical candidate itemsets. In the CD algorithm, sites generate candidate (k + 1)-itemsets from the global frequent k-itemsets; in the FDM algorithm, each site generates them from its own and its neighboring sites' heavy frequent k-itemsets. Most of the itemsets scanned by the CD algorithm are infrequent, and these infrequent itemsets are not scanned in the FDM algorithm. However, in the FDM algorithm some itemsets may be scanned at all sites even though they are frequent at none of them. This paper proposes an efficient framework and an algorithm for mining association rules in a distributed database. In the proposed framework, the overhead that each site incurs in generating and scanning candidate itemsets for its neighboring sites is reduced first. Then, a site either does not scan a candidate k-itemset of a neighboring site or postpones the scan until its (k + 1)-itemsets are scanned.
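
To make the CD step described above concrete, the following is a minimal Python sketch of one textbook-style Count Distribution round: every site generates the same candidate (k + 1)-itemsets from the global frequent k-itemsets, counts them locally, and the local counts are summed. It is not the framework proposed in the paper. The function names (`apriori_gen`, `local_counts`, `cd_step`), the two-site sample data, and the absolute threshold `min_count` are all assumptions made for illustration, and the exchange of counts between sites is simulated by a plain loop.

```python
from collections import Counter
from itertools import combinations


def apriori_gen(frequent_k):
    """Join frequent k-itemsets into (k + 1)-candidates, then prune any
    candidate that has an infrequent k-subset."""
    frequent_k = set(frequent_k)
    candidates = set()
    for a in frequent_k:
        for b in frequent_k:
            union = a | b
            if len(union) != len(a) + 1:
                continue  # a and b do not differ by exactly one item
            if all(frozenset(s) in frequent_k
                   for s in combinations(union, len(a))):
                candidates.add(union)
    return candidates


def local_counts(transactions, candidates):
    """Scan one site's transactions and count each candidate's support."""
    counts = Counter()
    for t in transactions:
        for c in candidates:
            if c <= t:
                counts[c] += 1
    return counts


def cd_step(sites, frequent_k, min_count):
    """One CD round: identical candidates at every site, local scans,
    summed counts, global frequency test."""
    candidates = apriori_gen(frequent_k)
    total = Counter()
    for transactions in sites:  # stands in for the count exchange
        total.update(local_counts(transactions, candidates))
    return {c for c in candidates if total[c] >= min_count}


# Hypothetical data: two sites, three transactions each.
site1 = [frozenset("abc"), frozenset("ab"), frozenset("ac")]
site2 = [frozenset("abc"), frozenset("bc"), frozenset("abc")]
sites = [site1, site2]

frequent_1 = {frozenset("a"), frozenset("b"), frozenset("c")}
frequent_2 = cd_step(sites, frequent_1, min_count=4)
frequent_3 = cd_step(sites, frequent_2, min_count=4)
```

With this sample data, all three 2-itemsets reach the threshold globally, while the single 3-itemset candidate {a, b, c} is scanned at both sites and then discarded. That wasted scan is the kind of cost the abstract attributes to CD.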

  • Publication date: 2018-5