A Learning-to-Rank Formulation of Clustering-Based Approximate Nearest Neighbor Search

被引:1
|
作者
Vecchiato, Thomas [1 ]
Lucchese, Claudio [1 ]
Nardini, Franco Maria [2 ]
Bruch, Sebastian [3 ]
机构
[1] Ca Foscari Univ Venice, Venice, Italy
[2] ISTI CNR, Pisa, Italy
[3] Pinecone, New York, NY USA
来源
PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024 | 2024年
关键词
Approximate Nearest Neighbor Search; Inverted File; Learning to; Rank; EFFICIENT;
D O I
10.1145/3626772.3657931
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
A critical piece of the modern information retrieval puzzle is approximate nearest neighbor search. Its objective is to return a set of k . data points that are closest to a query point, with its accuracy measured by the proportion of exact nearest neighbors captured in the returned set. One popular approach to this question is clustering: The indexing algorithm partitions data points into non-overlapping subsets and represents each partition by a point such as its centroid. The query processing algorithm first identifies the nearest clusters-a process known as routing-then performs a nearest neighbor search over those clusters only. In this work, we make a simple observation: The routing function solves a ranking problem. Its quality can therefore be assessed with a ranking metric, making the function amenable to learning-to-rank. Interestingly, ground-truth is often freely available: Given a query distribution in a top-k. configuration, the ground-truth is the set of clusters that contain the exact top-k. vectors. We develop this insight and apply it to Maximum Inner Product Search (MIPS). As we demonstrate empirically on various datasets, learning a simple linear function consistently improves the accuracy of clustering-based MIPS.
引用
收藏
页码:2261 / 2265
页数:5
相关论文
共 50 条
  • [41] M-PCA Binary Embedding For Approximate Nearest Neighbor Search
    Ozan, Ezgi Can
    Kiranyaz, Serkan
    Gabbouj, Moncef
    2015 IEEE TRUSTCOM/BIGDATASE/ISPA, VOL 2, 2015, : 1 - 5
  • [42] Optimized K-means Hashing for Approximate Nearest Neighbor Search
    Guo, Qin-Zhen
    Zeng, Zhi
    Zhang, Shuwu
    Zhang, Yuan
    Zhang, Guixuan
    MATERIAL SCIENCE, CIVIL ENGINEERING AND ARCHITECTURE SCIENCE, MECHANICAL ENGINEERING AND MANUFACTURING TECHNOLOGY II, 2014, 651-653 : 2168 - 2171
  • [43] Efficient Approximate Nearest Neighbor Search by Optimized Residual Vector Quantization
    Ai, Liefu
    Yu, Junqing
    Guan, Tao
    He, Yunfeng
    2014 12TH INTERNATIONAL WORKSHOP ON CONTENT-BASED MULTIMEDIA INDEXING (CBMI), 2014,
  • [44] A review of feature indexing methods for fast approximate nearest neighbor search
    The-Anh Pham
    Van-Hao Le
    Dinh-Nghiep Le
    PROCEEDINGS OF 2018 5TH NAFOSTED CONFERENCE ON INFORMATION AND COMPUTER SCIENCE (NICS 2018), 2018, : 372 - 377
  • [45] Multiple Query-independent Values based Asymmetric Ranking for Approximate Nearest Neighbor Search
    Cao, Yuan
    Qi, Heng
    Li, Keqiu
    Stojmenovic, Milos
    2016 IEEE TRUSTCOM/BIGDATASE/ISPA, 2016, : 1628 - 1635
  • [46] HDIdx: High-dimensional indexing for efficient approximate nearest neighbor search
    Wan, Ji
    Tang, Sheng
    Zhang, Yongdong
    Li, Jintao
    Wu, Pengcheng
    Hoi, Steven C. H.
    NEUROCOMPUTING, 2017, 237 : 401 - 404
  • [47] Approximate Nearest Neighbor Search on High Dimensional Data - Experiments, Analyses, and Improvement
    Li, Wen
    Zhang, Ying
    Sun, Yifang
    Wang, Wei
    Li, Mingjie
    Zhang, Wenjie
    Lin, Xuemin
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2020, 32 (08) : 1475 - 1488
  • [48] Improving Approximate Nearest Neighbor Search through Learned Adaptive Early Termination
    Li, Conglong
    Zhang, Minjia
    Andersen, David G.
    He, Yuxiong
    SIGMOD'20: PROCEEDINGS OF THE 2020 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2020, : 2539 - 2554
  • [49] AnnexML: Approximate Nearest Neighbor Search for Extreme Multi-label Classification
    Tagami, Yukihiro
    KDD'17: PROCEEDINGS OF THE 23RD ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2017, : 455 - 464
  • [50] Private approximate nearest neighbor search for on-chain data based on locality-sensitive hashing
    Shang, Siyuan
    Du, Xuehui
    Wang, Xiaohan
    Liu, Aodi
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2025, 164