FLEX: A fast and light-weight learned index for kNN search in-dimensional

被引:1
作者
Li, Lingli [1 ]
Han, Ao [1 ]
Cui, Xiaotong [1 ]
Wu, Baohua [1 ]
机构
[1] Heilongjiang Univ, Sch Comp Sci & Technol, Harbin, Heilongjiang, Peoples R China
关键词
Learned index; kNN search; High-dimensional; Deep learning; Data layout; PRODUCT QUANTIZATION; QUERIES;
D O I
10.1016/j.ins.2024.120546
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
The k Nearest Neighbors (kNN) search in high -dimensional space is a fundamental problem with various applications. In this paper, we try to solve this problem by using deep neural networks (DNNs). We apply DNNs to represent complex correlations between high -dimensional objects, enabling us to project similar objects into the same class, thus reducing the search cost. Based on DNNs, we propose two novel techniques to improve query efficiency while keeping accuracy. First, traditional DNNs typically demand extensive training data for achieving high accuracy. To decrease the training size, we design a multi -module DNN framework comprising several small modules. Each module learns to capture part of knowledge for the given query. The collective output of these sub -modules is then seamlessly integrated to form the final result. Second, with machine learning models, the size of candidates are unbounded. Thus, we design a linear -time data layout refinement algorithm, aiming to limit the number of candidates to a small constant. Empirically we find that our approach significantly outperforms the state-of-the-art methods in terms of both time efficiency and space efficiency while still attaining comparable or better accuracy.
引用
收藏
页数:16
相关论文
共 44 条
  • [1] HD-Index: Pushing the Scalability-Accuracy Boundary for Approximate kNN Search in High-Dimensional Spaces
    Arora, Akhil
    Sinha, Sakshi
    Kumar, Piyush
    Bhattacharya, Arnab
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2018, 11 (08): : 906 - 919
  • [2] Dynamic optimization of queries in pivot-based indexing
    Bratsberg, Svein Erik
    Hetland, Magnus Lie
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2012, 60 (02) : 261 - 275
  • [3] Brin S., 1995, VLDB '95. Proceedings of the 21st International Conference on Very Large Data Bases, P574
  • [4] Pivot-based approximate k-NN similarity joins for big high-dimensional data
    Cech, Premysl
    Lokoc, Jakub
    Silva, Yasin N.
    [J]. INFORMATION SYSTEMS, 2020, 87
  • [5] Indexing Metric Spaces for Exact Similarity Search
    Chen, Lu
    Gao, Yunjun
    Song, Xuan
    Li, Zheng
    Zhu, Yifan
    Miao, Xiaoye
    Jensen, Christian S.
    [J]. ACM COMPUTING SURVEYS, 2023, 55 (06)
  • [6] Pivot-based Metric Indexing
    Chen, Lu
    Gao, Yunjun
    Zheng, Baihua
    Jensen, Christian S.
    Yang, Hanyu
    Yang, Keyu
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2017, 10 (10): : 1058 - 1069
  • [7] Ciaccia P, 1997, PROCEEDINGS OF THE TWENTY-THIRD INTERNATIONAL CONFERENCE ON VERY LARGE DATABASES, P426
  • [8] Davitkova A ..., 2020, P 23 INT C EXT DAT T, P407
  • [9] Tsunami: A Learned Multi-dimensional Index for Correlated Data and Skewed Workloads
    Ding, Jialin
    Nathan, Vikram
    Alizadeh, Mohammad
    Kraska, Tim
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2020, 14 (02): : 74 - 86
  • [10] Howard AG, 2017, Arxiv, DOI [arXiv:1704.04861, DOI 10.48550/ARXIV.1704.04861]