LBFM: Multi-dimensional Membership Index for Block-level Data Skipping

被引:1
|
作者
Wang, Yong [1 ]
Yun, Xiaochun [2 ]
Wang, Xi [1 ]
Wang, Shupeng [1 ]
Wu, Yongshang [3 ]
机构
[1] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[2] CNCERT CC, Beijing, Peoples R China
[3] Nanjing Univ, Sch Software, Nanjing, Jiangsu, Peoples R China
来源
2017 15TH IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING WITH APPLICATIONS AND 2017 16TH IEEE INTERNATIONAL CONFERENCE ON UBIQUITOUS COMPUTING AND COMMUNICATIONS (ISPA/IUCC 2017) | 2017年
关键词
data skipping; membership index; bloom filter; bitmap;
D O I
10.1109/ISPA/IUCC.2017.00056
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Data skipping has been a promising technique to reduce data access in query engines. By maintaining metadata for each block of tuples, a query may skip a block if the metadata indicates that the block does not contain relevant data. Obviously, the key factor is how to build effective metadata by extracting representative features of blocks. In this paper, we propose a multi-dimensional index, Layered Bloom Filter Matrix (LBFM), which adopts a recursively layered framework, and represents the matrix as an ordered hierarchy of hashmap and bitmap to compress space consumption instead of space-consuming bit matrix. Additionally, LBFM supports dimension combination cutting, and optimal indexing strategy could be generated according to it, thus the space efficiency could be further improved. We demonstrate time complexity of LBFM, and theoretically prove that LBFM has lower space consumption than Bloom Filter Matrix algorithm. We proto-typed our index technique on Spark SQL. Our experiments on TPC-H and a real-world workload show that LBFM gains significant improvement in aspect of query response time over traditional methods.
引用
收藏
页码:343 / 351
页数:9
相关论文
共 50 条
  • [1] Data broadcasting with multi-dimensional index on multiple channels
    Liu, CM
    Lin, KF
    INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATIONS AND CONTROL TECHNOLOGIES, VOL 5, PROCEEDINGS, 2004, : 362 - 367
  • [2] Design of Freeway Traffic Block Multi-dimensional Data Warehouse
    He, Lin
    Chen, Yan
    Liu, LanYing
    Men, Ning
    MATERIALS, TRANSPORTATION AND ENVIRONMENTAL ENGINEERING, PTS 1 AND 2, 2013, 779-780 : 954 - +
  • [3] File Versioning for Block-Level Continuous Data Protection
    Lu, Maohua
    Chiueh, Tzi-cker
    2009 29TH IEEE INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS, 2009, : 327 - 334
  • [4] Multi-dimensional systematic block codes
    Yue, DW
    Tao, C
    CHINESE JOURNAL OF ELECTRONICS, 2004, 13 (02): : 334 - 337
  • [5] Design and Analysis of Block-Level Snapshots for Data Protection and Recovery
    Xiao, Weijun
    Yang, Qing
    Ren, Jin
    Xie, Changsheng
    Li, Huaiyang
    IEEE TRANSACTIONS ON COMPUTERS, 2009, 58 (12) : 1615 - 1625
  • [6] Optimal Data Layout for Block-Level Random Accesses to Scratchpad
    Singapura, Shreyas G.
    Kannan, Rajgopal
    Prasanna, Viktor K.
    2017 IEEE HIGH PERFORMANCE EXTREME COMPUTING CONFERENCE (HPEC), 2017,
  • [7] A Hierarchical Storage Strategy Based on Block-Level Data Valuation
    Zhao, Xiaonan
    Li, Zhanhuai
    Zeng, Leijie
    NCM 2008 : 4TH INTERNATIONAL CONFERENCE ON NETWORKED COMPUTING AND ADVANCED INFORMATION MANAGEMENT, VOL 1, PROCEEDINGS, 2008, : 36 - 41
  • [8] Measuring skill: a multi-dimensional index
    Portela, M
    ECONOMICS LETTERS, 2001, 72 (01) : 27 - 32
  • [9] MULTI-DIMENSIONAL PIXEL PURITY INDEX
    Heylen, Rob
    Scheunders, Paul
    2013 5TH WORKSHOP ON HYPERSPECTRAL IMAGE AND SIGNAL PROCESSING: EVOLUTION IN REMOTE SENSING (WHISPERS), 2013,
  • [10] Tsunami: A Learned Multi-dimensional Index for Correlated Data and Skewed Workloads
    Ding, Jialin
    Nathan, Vikram
    Alizadeh, Mohammad
    Kraska, Tim
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2020, 14 (02): : 74 - 86