Learned index for non-key queries

被引:0
作者
Zhu, Rui [1 ]
Wang, Hongzhi [1 ]
Xia, Sheng [1 ]
Zheng, Bo [2 ]
机构
[1] Harbin Inst Technol, Comp Sci & Technol, 92 Xidazhi St, Harbin 150000, Heilongjiang, Peoples R China
[2] CnosDB, Beijing 100000, Peoples R China
基金
中国国家自然科学基金;
关键词
Bloom filter; Learned index; Non-key query; Index;
D O I
10.1007/s10115-024-02233-0
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Learned indexes have attracted a lot of interest lately due to their superior performance over conventional indexes. When there is a lot of data traffic, the learned index efficiently addresses the issue of the standard index's large memory usage. In this paper, we concentrate on a well-known learned index, the recursive model index (RMI). Since the machine learning model is unbiased while calculating, when there are too many non-key queried, the model will calculate the position of the key as if it were positive key, which wastes a lot of time on unnecessary calculations. To deal with this condition, we propose a hierarchical learned index structure based on Bloom filter named HBFdex. HBFdex can effectively prune non-keys, which means most non-key return in layer of BF before they get to machine learning model. By lowering the number of layers traversed by non-key and the time spent looking for non-key within the error bound that is provided by machine learning model, HBFdex decreases the average query time of learned index. We compare HBFdex with B-Tree and RMI, and the results prove that our new structure optimizes the performance of RMI in the case of non-key queries.
引用
收藏
页码:497 / 519
页数:23
相关论文
共 32 条
  • [21] Li P, 2019, ARXIV
  • [22] Stable Learned Bloom Filters for Data Streams
    Liu, Qiyu
    Zheng, Libin
    Shen, Yanyan
    Chen, Lei
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2020, 13 (11): : 2355 - 2367
  • [23] Negative query generation: bridging the gap between query likelihood retrieval models and relevance
    Lv, Yuanhua
    Zhai, ChengXiang
    [J]. INFORMATION RETRIEVAL JOURNAL, 2015, 18 (04): : 359 - 378
  • [24] Macke S, 2018, WORKSH ML SYST NEURI, P1
  • [25] Learning Multi-dimensional Indexes
    Nathan, Vikram
    Ding, Jialin
    Alizadeh, Mohammad
    Kraska, Tim
    [J]. SIGMOD'20: PROCEEDINGS OF THE 2020 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2020, : 985 - 1000
  • [26] Rao J, 1999, PROCEEDINGS OF THE TWENTY-FIFTH INTERNATIONAL CONFERENCE ON VERY LARGE DATA BASES, P78
  • [27] Function Interpolation for Learned Index Structures
    Setiawan, Naufal Fikri
    Rubinstein, Benjamin I. P.
    Borovica-Gajic, Renata
    [J]. DATABASES THEORY AND APPLICATIONS, ADC 2020, 2020, 12008 : 68 - 80
  • [28] A Hybrid B plus -tree as Solution for In-Memory Indexing on CPU-GPU Heterogeneous Computing Platforms
    Shahvarani, Amirhesam
    Jacobsen, Hans-Arno
    [J]. SIGMOD'16: PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2016, : 1523 - 1538
  • [29] XIndex: A Scalable Learned Index for Multicore Data Storage
    Tang, Chuzhe
    Wang, Youyun
    Dong, Zhiyuan
    Hu, Gansen
    Wang, Zhaoguo
    Wang, Minjie
    Chen, Haibo
    [J]. PROCEEDINGS OF THE 25TH ACM SIGPLAN SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING (PPOPP '20), 2020, : 308 - 320
  • [30] Vaidya K, 2020, INT C LEARN REPR