Adaptive Learned Bloom Filters under Incremental Workloads

Cited by: 7
Authors
Bhattacharya, Arindam [1 ]
Bedathur, Srikanta [1 ]
Bagchi, Amitabha [1 ]
Affiliations
[1] IIT Delhi, Dept Comp Sci & Engn, Delhi, India
Source
PROCEEDINGS OF THE 7TH ACM IKDD CODS AND 25TH COMAD (CODS-COMAD 2020) | 2020
DOI
10.1145/3371158.3371171
Chinese Library Classification
TP301 [Theory, Methods];
Discipline Classification Code
081202 ;
Abstract
The recently proposed paradigm of learned Bloom filters (LBF) appears to offer significant advantages over traditional Bloom filters in terms of low memory footprint and overall performance, as evidenced by empirical evaluations over static data. However, its behavior in the presence of updates to the set of keys stored in the filter is not well understood. At the same time, maintaining the false positive rate (FPR) of traditional Bloom filters under dynamic workloads has been studied, and extensions that carefully expand the memory footprint of the filter without sacrificing FPR have been proposed. Building on these, we propose two distinct approaches for handling the data updates encountered in practical uses of LBF: (i) CA-LBF, where we adjust the learned model (e.g., by retraining) to accommodate new, "unseen" data, yielding classifier-adaptive methods; and (ii) IA-LBF, where we replace the traditional Bloom filter with its adaptive version while keeping the learned model unchanged, yielding an index-adaptive method. In this paper, we explore these two approaches in detail under incremental workloads, evaluating them in terms of adaptability, memory footprint, and false positive rate. Our empirical results, using a variety of datasets and learned models of varying complexity, show that the proposed methods handle incremental updates robustly.
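To make the LBF architecture described in the abstract concrete, the following is a minimal Python sketch, not the paper's implementation: a learned model screens membership queries, and a backup Bloom filter stores the keys the model misses, so the combined structure has no false negatives. The `BloomFilter` and `LearnedBloomFilter` classes, the parameters `m`, `k`, `threshold`, and the toy even-number "model" are all illustrative assumptions.

```python
import hashlib

class BloomFilter:
    """Plain Bloom filter: m bits, k hash functions (illustrative sizes)."""
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = [False] * m

    def _positions(self, key):
        # Derive k positions from salted SHA-256 digests of the key.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def __contains__(self, key):
        return all(self.bits[pos] for pos in self._positions(key))

class LearnedBloomFilter:
    """Model screens keys first; a backup Bloom filter catches the
    model's false negatives, so the structure never misses a stored key."""
    def __init__(self, model, threshold, m=1024, k=3):
        self.model, self.threshold = model, threshold
        self.backup = BloomFilter(m, k)

    def add(self, key):
        # Only keys the model would reject go into the backup filter
        # (keys the model accepts are already "covered" by it).
        if self.model(key) < self.threshold:
            self.backup.add(key)

    def __contains__(self, key):
        return self.model(key) >= self.threshold or key in self.backup

# Toy stand-in "model": scores even integers high, pretending the
# key set is mostly even numbers.
model = lambda x: 1.0 if x % 2 == 0 else 0.0
lbf = LearnedBloomFilter(model, threshold=0.5)
for key in [2, 4, 6, 7]:   # 7 is a key the model misclassifies
    lbf.add(key)
assert all(k in lbf for k in [2, 4, 6, 7])  # no false negatives
```

Under this framing, CA-LBF corresponds to retraining `model` when new keys arrive, while IA-LBF keeps `model` fixed and swaps `self.backup` for a dynamically growing Bloom filter (e.g., in the style of Guo et al.'s dynamic Bloom filters, ref. [6]).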
Pages: 107-115
Page count: 9
References
14 in total
[1]  
[Anonymous], 2019, arXiv:1901.06493
[2]  
[Anonymous], 2018, arXiv:1802.00884
[3]   SPACE/TIME TRADE-OFFS IN HASH CODING WITH ALLOWABLE ERRORS [J].
BLOOM, BH .
COMMUNICATIONS OF THE ACM, 1970, 13 (07) :422-426
[4]  
Facebook, 2016, FAC 5 CHECK DAT
[5]   Tracking the Conductance of Rapidly Evolving Topic-Subgraphs [J].
Galhotra, Sainyam ;
Bagchi, Amitabha ;
Bedathur, Srikanta ;
Ramanath, Maya ;
Jain, Vidit .
PROCEEDINGS OF THE VLDB ENDOWMENT, 2015, 8 (13) :2170-2181
[6]   The Dynamic Bloom Filters [J].
Guo, Deke ;
Wu, Jie ;
Chen, Honghui ;
Yuan, Ye ;
Luo, Xueshan .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2010, 22 (01) :120-133
[7]   Deep Residual Learning for Image Recognition [J].
He, Kaiming ;
Zhang, Xiangyu ;
Ren, Shaoqing ;
Sun, Jian .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :770-778
[8]   The Case for Learned Index Structures [J].
Kraska, Tim ;
Beutel, Alex ;
Chi, Ed H. ;
Dean, Jeffrey ;
Polyzotis, Neoklis .
SIGMOD'18: PROCEEDINGS OF THE 2018 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2018, :489-504
[9]  
Krizhevsky Alex, 2009, LEARNING MULTIPLE LAYERS OF FEATURES FROM TINY IMAGES
[10]   Gradient-based learning applied to document recognition [J].
Lecun, Y ;
Bottou, L ;
Bengio, Y ;
Haffner, P .
PROCEEDINGS OF THE IEEE, 1998, 86 (11) :2278-2324