A comparative study of cuckoo search and bat algorithm for Bloom filter optimisation in spam filtering

被引:29
作者
Natarajan, Arulanand [1 ]
Subramanian, S. [2 ]
Premalatha, K. [3 ]
机构
[1] Anna Univ Technol Coimbatore, Coimbatore 641047, Tamil Nadu, India
[2] Sri Krishna Coll Engn & Technol, Coimbatore 641008, Tamil Nadu, India
[3] Bannari Amman Inst Technol, Dept Comp Sci & Engn, Erode Dt 638401, Tamil Nadu, India
关键词
bat algorithm; bin Bloom filter; BBF; Bloom filter; BF; cuckoo search; CS; false positive rate; hash function; spam word; ARCHITECTURE;
D O I
10.1504/IJBIC.2012.047179
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Bloom filter (BF) is a simple but powerful data structure that can check membership to a static set. The trade-off to use Bloom filter is a certain configurable risk of false positives. The odds of a false positive can be made very low if the hash bitmap is sufficiently large. Spam is an irrelevant or inappropriate message sent on the internet to a large number of newsgroups or users. A spam word is a list of well-known words that often appear in spam mails. The proposed system of bin Bloom filter (BBF) groups the words into number of bins with different false positive rates based on the weights of the spam words. Cuckoo search (CS) and bat algorithm are bio-inspired algorithms that imitate the way cuckoo breeding and microbat foraging behaviours respectively. This paper demonstrates the CS and bat algorithm for minimising the total membership invalidation cost of the BBFs by finding the optimal false positive rates and number of elements stored in every bin. The experimental results demonstrate the application of CS and bat algorithm for various numbers of bins and strings.
引用
收藏
页码:89 / 99
页数:11
相关论文
共 36 条
[1]  
[Anonymous], BATS WINGS NIGHT
[2]  
Bauer D, 2004, C LOCAL COMPUT NETW, P6
[3]   SPACE/TIME TRADE/OFFS IN HASH CODING WITH ALLOWABLE ERRORS [J].
BLOOM, BH .
COMMUNICATIONS OF THE ACM, 1970, 13 (07) :422-&
[4]   Beyond bloom filters: From approximate membership checks to approximate state machines [J].
Bonomi, Flavio ;
Mitzenmacher, Michael ;
Panigrahy, Rina ;
Singh, Sushil ;
Varghese, George .
ACM SIGCOMM COMPUTER COMMUNICATION REVIEW, 2006, 36 (04) :315-326
[5]  
Broder Andrei, 2002, Internet mathematics, P636, DOI DOI 10.1080/15427951.2004.10129096
[6]  
Chazelle Bernard, 2004, P 15 ANN ACM SIAM S, P30
[7]  
Chen HongHua Chen HongHua, 2011, China Condiment, P1
[8]  
Cohen S., 2003, P ACM INT C MAN DAT, P241, DOI [10.1145/872757.872787, DOI 10.1145/872757.872787]
[9]  
Cuenca-Acuna FM, 2003, 12TH IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE DISTRIBUTED COMPUTING, PROCEEDINGS, P236
[10]  
Deng F., 2006, SIGMOD 06, P25, DOI DOI 10.1145/1142473.1142477