Bloom Filter With Noisy Coding Framework for Multi-Set Membership Testing

被引:59
作者
Dai, Haipeng [1 ]
Yu, Jun [1 ]
Li, Meng [1 ]
Wang, Wei [1 ]
Liu, Alex X. [2 ]
Ma, Jinghao [1 ]
Qi, Lianyong [3 ]
Chen, Guihai [1 ]
机构
[1] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing 210093, Peoples R China
[2] Ant Financial Serv Grp, Hangzhou 310099, Peoples R China
[3] Qufu Normal Univ, Sch Comp Sci, Jining 273165, Peoples R China
基金
中国国家自然科学基金;
关键词
Encoding; Noise measurement; Testing; Hash functions; Information filters; Costs; Art; Multi-set membership testing; bloom filter; asymmetric error-correcting code; data storage representations; sketch; SKETCH; BOUNDS;
D O I
10.1109/TKDE.2022.3199646
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
This article is on designing a compact data structure for multi-set membership testing that allows fast set querying. Multi-set membership testing is a fundamental operation for computing systems. Most existing schemes for multi-set membership testing are built upon Bloom filter and fall short in either storage space cost or query speed. To address this issue, we propose Noisy Bloom Filter (NBF), Error Corrected Noisy Bloom Filter (NBF-E), and Data-driven Noisy Bloom Filter (NBF-D) in this paper. We optimize their misclassification and false positive rates by theoretical analysis and present criteria for selection between NBF, NBF-E, and NBF-D. The key novelty of the three schemes is to store set ID information in a compact but noisy way that allows fast recording and querying and use a denoising method for querying. Especially, NBF-E incorporates asymmetric error-correcting coding techniques into NBF, and NBF-D encodes set ID basedt membership testing. on their cardinality. To evaluate NBF, NBF-E, and NBF-D in comparison with the prior art, we conducted experiments using real-world network traces. The results show that NBF, NBF-E, and NBF-D significantly advance the state-of-the-art on multi-se
引用
收藏
页码:6710 / 6724
页数:15
相关论文
共 55 条
  • [51] Yu Minlan., 2009, Proceedings of the ACM 5th International Conference on Emerging Networking Experiments and Technologies (CoNEXT), P313
  • [52] Two-stage index-based central keyword-ranked searches over encrypted cloud data
    Zhong, Hong
    Li, Zhanfei
    Xu, Yan
    Chen, Zhili
    Cui, Jie
    [J]. SCIENCE CHINA-INFORMATION SCIENCES, 2020, 63 (03)
  • [53] Nonuniform Codes for Correcting Asymmetric Errors in Data Storage
    Zhou, Hongchao
    Jiang, Anxiao
    Bruck, Jehoshua
    [J]. IEEE TRANSACTIONS ON INFORMATION THEORY, 2013, 59 (05) : 2988 - 3002
  • [54] Zhou SQ, 2019, 2019 18TH EUROPEAN CONTROL CONFERENCE (ECC), P1, DOI [10.23919/ECC.2019.8796140, 10.23919/ecc.2019.8796140]
  • [55] Cold Filter: A Meta-Framework for Faster and More Accurate Stream Processing
    Zhou, Yang
    Yang, Tong
    Jiang, Jie
    Cui, Bin
    Yu, Minlan
    Li, Xiaoming
    Uhlig, Steve
    [J]. SIGMOD'18: PROCEEDINGS OF THE 2018 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2018, : 741 - 756