Abstract

A Bloom filter (BF) is a simple but powerful data structure that can check membership in a static set. The trade-off for using a Bloom filter is a configurable risk of false positives; the probability of a false positive can be made very low if the hash bitmap is sufficiently large. Spam is an irrelevant or inappropriate message sent over the internet to a large number of newsgroups or users. Spam words are well-known words that often appear in spam mail. The proposed bin Bloom filter (BBF) groups the words into a number of bins with different false positive rates, based on the weights of the spam words. Cuckoo search (CS) and the bat algorithm are bio-inspired algorithms that imitate the breeding behaviour of cuckoos and the foraging behaviour of microbats, respectively. This paper demonstrates the use of CS and the bat algorithm to minimise the total membership invalidation cost of the BBFs by finding the optimal false positive rate and number of elements stored in each bin. The experimental results demonstrate the application of CS and the bat algorithm for various numbers of bins and strings.
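To make the false-positive trade-off concrete, here is a minimal Python sketch of a standard Bloom filter. It is not the paper's BBF. It assumes SHA-256 with double hashing to derive the k bit positions and uses the textbook approximation (1 - e^(-kn/m))^k for the false positive rate, where m is the bitmap size in bits, n the number of stored words and k the number of hash functions. The names `BloomFilter`, `false_positive_rate`, `optimal_k` and the sample spam words are hypothetical, chosen for illustration.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter over strings (illustrative sketch, not the BBF)."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m                # number of bits in the bitmap
        self.k = k                # number of hash functions
        self.bits = bytearray(m)  # one byte per bit, for simplicity

    def _positions(self, item: str) -> list[int]:
        # Assumption: derive k positions by double hashing two 64-bit
        # halves of a single SHA-256 digest (h1 + i*h2 mod m).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force h2 to be odd
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        # Set every bit the item hashes to.
        for p in self._positions(item):
            self.bits[p] = 1

    def might_contain(self, item: str) -> bool:
        # False means "definitely not stored"; True means "probably stored"
        # (a True answer may be a false positive, never a false negative).
        return all(self.bits[p] for p in self._positions(item))


def false_positive_rate(m: int, n: int, k: int) -> float:
    """Textbook approximation (1 - e^(-kn/m))^k after storing n items."""
    return (1.0 - math.exp(-k * n / m)) ** k


def optimal_k(m: int, n: int) -> int:
    """Number of hash functions minimising the approximate rate: (m/n) ln 2."""
    return max(1, round(m / n * math.log(2)))


if __name__ == "__main__":
    spam_words = ["free", "winner", "lottery", "prize"]  # hypothetical sample
    bf = BloomFilter(m=2048, k=5)
    for w in spam_words:
        bf.add(w)
    print(all(bf.might_contain(w) for w in spam_words))
    # A larger bitmap (same n and k) gives a lower false positive rate.
    for m in (10_000, 20_000):
        print(m, false_positive_rate(m, n=1000, k=7))
```

Doubling m while keeping n and k fixed lowers the estimated false positive rate, which is the effect the abstract describes. A BBF instead assigns a different false positive rate to each bin according to the weights of its words, which is what the CS and bat algorithm searches optimise.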

  • Publication date: 2012