Bloom Filter With Noisy Coding Framework for Multi-Set Membership Testing

被引:59
作者
Dai, Haipeng [1 ]
Yu, Jun [1 ]
Li, Meng [1 ]
Wang, Wei [1 ]
Liu, Alex X. [2 ]
Ma, Jinghao [1 ]
Qi, Lianyong [3 ]
Chen, Guihai [1 ]
机构
[1] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing 210093, Peoples R China
[2] Ant Financial Serv Grp, Hangzhou 310099, Peoples R China
[3] Qufu Normal Univ, Sch Comp Sci, Jining 273165, Peoples R China
基金
中国国家自然科学基金;
关键词
Encoding; Noise measurement; Testing; Hash functions; Information filters; Costs; Art; Multi-set membership testing; bloom filter; asymmetric error-correcting code; data storage representations; sketch; SKETCH; BOUNDS;
D O I
10.1109/TKDE.2022.3199646
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
This article is on designing a compact data structure for multi-set membership testing that allows fast set querying. Multi-set membership testing is a fundamental operation for computing systems. Most existing schemes for multi-set membership testing are built upon Bloom filter and fall short in either storage space cost or query speed. To address this issue, we propose Noisy Bloom Filter (NBF), Error Corrected Noisy Bloom Filter (NBF-E), and Data-driven Noisy Bloom Filter (NBF-D) in this paper. We optimize their misclassification and false positive rates by theoretical analysis and present criteria for selection between NBF, NBF-E, and NBF-D. The key novelty of the three schemes is to store set ID information in a compact but noisy way that allows fast recording and querying and use a denoising method for querying. Especially, NBF-E incorporates asymmetric error-correcting coding techniques into NBF, and NBF-D encodes set ID basedt membership testing. on their cardinality. To evaluate NBF, NBF-E, and NBF-D in comparison with the prior art, we conducted experiments using real-world network traces. The results show that NBF, NBF-E, and NBF-D significantly advance the state-of-the-art on multi-se
引用
收藏
页码:6710 / 6724
页数:15
相关论文
共 55 条
  • [1] Upper bounds for constant-weight codes
    Agrell, E
    Vardy, A
    Zeger, K
    [J]. IEEE TRANSACTIONS ON INFORMATION THEORY, 2000, 46 (07) : 2373 - 2395
  • [2] Aguilera MK, 2008, PROC VLDB ENDOW, V1, P598
  • [3] [Anonymous], 2005, PROC ALLERTON C
  • [4] [Anonymous], 2010, MOD COD
  • [5] TUTORIAL ON LARGE DEVIATIONS FOR THE BINOMIAL-DISTRIBUTION
    ARRATIA, R
    GORDON, L
    [J]. BULLETIN OF MATHEMATICAL BIOLOGY, 1989, 51 (01) : 125 - 131
  • [6] SPACE/TIME TRADE/OFFS IN HASH CODING WITH ALLOWABLE ERRORS
    BLOOM, BH
    [J]. COMMUNICATIONS OF THE ACM, 1970, 13 (07) : 422 - &
  • [7] The phase transition in inhomogeneous random graphs
    Bollobas, Bela
    Janson, Svante
    Riordan, Oliver
    [J]. RANDOM STRUCTURES & ALGORITHMS, 2007, 31 (01) : 3 - 122
  • [8] Beyond bloom filters: From approximate membership checks to approximate state machines
    Bonomi, Flavio
    Mitzenmacher, Michael
    Panigrahy, Rina
    Singh, Sushil
    Varghese, George
    [J]. ACM SIGCOMM COMPUTER COMMUNICATION REVIEW, 2006, 36 (04) : 315 - 326
  • [9] Technologies and building blocks for fast packet forwarding
    Bux, W
    Denzel, WE
    Engbersen, T
    Herkersdorf, A
    Luijten, RP
    [J]. IEEE COMMUNICATIONS MAGAZINE, 2001, 39 (01) : 70 - 77
  • [10] Chang F, 2004, IEEE INFOCOM SER, P2196