LBFM: Multi-dimensional Membership Index for Block-level Data Skipping

Cited by: 1
Authors
Wang, Yong [1 ]
Yun, Xiaochun [2 ]
Wang, Xi [1 ]
Wang, Shupeng [1 ]
Wu, Yongshang [3 ]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[2] CNCERT CC, Beijing, Peoples R China
[3] Nanjing Univ, Sch Software, Nanjing, Jiangsu, Peoples R China
Source
2017 15TH IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING WITH APPLICATIONS AND 2017 16TH IEEE INTERNATIONAL CONFERENCE ON UBIQUITOUS COMPUTING AND COMMUNICATIONS (ISPA/IUCC 2017) | 2017
Keywords
data skipping; membership index; bloom filter; bitmap;
DOI
10.1109/ISPA/IUCC.2017.00056
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Data skipping is a promising technique for reducing data access in query engines. By maintaining metadata for each block of tuples, a query engine can skip a block when the metadata indicates that the block contains no relevant data. The key factor is how to build effective metadata by extracting representative features of blocks. In this paper, we propose a multi-dimensional index, the Layered Bloom Filter Matrix (LBFM), which adopts a recursively layered framework and represents the matrix as an ordered hierarchy of hashmaps and bitmaps rather than as a space-consuming bit matrix, thereby reducing space consumption. Additionally, LBFM supports dimension combination cutting, from which an optimal indexing strategy can be generated, further improving space efficiency. We present the time complexity of LBFM and theoretically prove that LBFM has lower space consumption than the Bloom Filter Matrix algorithm. We prototyped our index technique on Spark SQL. Our experiments on TPC-H and a real-world workload show that LBFM significantly improves query response time over traditional methods.
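The general idea of membership-based block skipping described in the abstract can be illustrated with a minimal sketch. This is not the LBFM structure from the paper: it is a plain single-column Bloom filter kept per block, and every name in it (`BloomFilter`, `build_block_index`, `scan_equal`, the bit and hash counts) is a hypothetical choice made for illustration only.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter; a Python int serves as the bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, value):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def build_block_index(blocks, column):
    """Build one Bloom filter per block over a single column (assumed layout)."""
    index = []
    for block in blocks:
        bf = BloomFilter()
        for row in block:
            bf.add(row[column])
        index.append(bf)
    return index


def scan_equal(blocks, index, column, value):
    """Equality scan that skips blocks whose filter rules the value out."""
    matches, blocks_read = [], 0
    for block, bf in zip(blocks, index):
        if not bf.might_contain(value):
            continue  # metadata proves the block has no relevant rows
        blocks_read += 1
        matches.extend(row for row in block if row[column] == value)
    return matches, blocks_read


blocks = [
    [{"city": "Beijing", "qty": 1}, {"city": "Nanjing", "qty": 2}],
    [{"city": "Shanghai", "qty": 3}, {"city": "Shenzhen", "qty": 4}],
    [{"city": "Nanjing", "qty": 5}, {"city": "Wuhan", "qty": 6}],
]
index = build_block_index(blocks, "city")
matches, blocks_read = scan_equal(blocks, index, "city", "Nanjing")
print(matches, blocks_read)
```

A Bloom filter never reports a present value as absent, so skipping cannot drop matching rows; it can only read some blocks unnecessarily when a false positive occurs. A separate filter per dimension does not capture which value combinations occur together in a block, which is the multi-dimensional case the abstract says LBFM addresses.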
Pages: 343 - 351
Number of pages: 9