Improving Query Execution Performance in Big Data using Cuckoo Filter

被引:0
作者
Mosharraf, Sharafat Ibn Mollah [1 ]
Adnan, Muhammad Abdullah [1 ]
机构
[1] Bangladesh Univ Engn & Technol, Dhaka, Bangladesh
来源
2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) | 2018年
关键词
Big Data; Query Optimization; Probabilistic Data Structure; Bloom Filter; Cuckoo Filter;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Performance is a critical concern when reading and writing billions of records stored in a Big Data warehouse. To improve performance, Big Data systems utilize the Eventual Consistency model as well as a probabilistic data structure called Bloom Filter. In recent times, cost minimization and privacy concerns have led Big Data systems to regularly delete data besides reading and writing. Bloom Filter has an important limitation of not supporting deletion of elements from within it. This limitation along with constraints of the Eventual Consistency model prevents the system to delete data from disk immediately upon an execution of a DELETE command. Rather data is actually deleted during a compaction process, which is executed very infrequently due to it being resource-intensive. During the time between data deletion and compaction (typically a few days), lookup queries fetch records from disk that includes the deleted rows, then detect and remove the deleted records from the returning result set. This leads to poor query execution performance. In this paper, we propose a scheme to improve performance of lookup queries after data deletion by replacing Bloom Filter with a better probabilistic data structure called Cuckoo Filter that supports deletion of elements from within it. We evaluate our proposed scheme using a popular Big Data database (Cassandra) that uses both the Eventual Consistency model and Bloom Filter, and show that performance of lookup queries executed after data deletion can improve for up to 100%. We also show that our scheme does not degrade performance of other data manipulation queries as a side effect.
引用
收藏
页码:1079 / 1084
页数:6
相关论文
共 29 条
[1]  
Al-hisnawi Mohammad, 2017, 2017 Annual Conference on New Trends in Information & Communications Technology Applications (NTICT), P197, DOI 10.1109/NTICT.2017.7976111
[2]  
Bhushan B, 2014, HANDBOOK OF NANOMATERIALS PROPERTIES, P1, DOI 10.1007/978-3-642-31107-9
[3]   SPACE/TIME TRADE/OFFS IN HASH CODING WITH ALLOWABLE ERRORS [J].
BLOOM, BH .
COMMUNICATIONS OF THE ACM, 1970, 13 (07) :422-&
[4]  
Bonomi F, 2006, LECT NOTES COMPUT SC, V4168, P684
[5]  
Casares J., 2012, MULTIDATACENTER REPL
[6]   Bigtable: A distributed storage system for structured data [J].
Chang, Fay ;
Dean, Jeffrey ;
Ghemawat, Sanjay ;
Hsieh, Wilson C. ;
Wallach, Deborah A. ;
Burrows, Mike ;
Chandra, Tushar ;
Fikes, Andrew ;
Gruber, Robert E. .
ACM TRANSACTIONS ON COMPUTER SYSTEMS, 2008, 26 (02)
[7]  
Chen HM., 2017, SCI TECHNOLOGY NUCL, V2017, P1
[8]   SPACF: A Secure Privacy-Preserving Authentication Scheme for VANET With Cuckoo Filter [J].
Cui, Jie ;
Zhang, Jing ;
Zhong, Hong ;
Xu, Yan .
IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2017, 66 (11) :10283-10295
[9]   Cuckoo Filter: Practically Better Than Bloom [J].
Fan, Bin ;
Andersen, David G. ;
Kaminsky, Michael ;
Mitzenrnacher, Michael D. .
PROCEEDINGS OF THE 2014 CONFERENCE ON EMERGING NETWORKING EXPERIMENTS AND TECHNOLOGIES (CONEXT'14), 2014, :75-87
[10]   Summary cache: A scalable wide-area Web cache sharing protocol [J].
Fan, L ;
Cao, P ;
Almeida, J ;
Broder, AZ .
IEEE-ACM TRANSACTIONS ON NETWORKING, 2000, 8 (03) :281-293