A Learning-to-Rank Formulation of Clustering-Based Approximate Nearest Neighbor Search

Cited by: 1
Authors
Vecchiato, Thomas [1 ]
Lucchese, Claudio [1 ]
Nardini, Franco Maria [2 ]
Bruch, Sebastian [3 ]
Affiliations
[1] Ca Foscari Univ Venice, Venice, Italy
[2] ISTI CNR, Pisa, Italy
[3] Pinecone, New York, NY USA
Source
PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024 | 2024
Keywords
Approximate Nearest Neighbor Search; Inverted File; Learning to Rank
DOI
10.1145/3626772.3657931
CLC classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
A critical piece of the modern information retrieval puzzle is approximate nearest neighbor search. Its objective is to return a set of k data points that are closest to a query point, with its accuracy measured by the proportion of exact nearest neighbors captured in the returned set. One popular approach to this problem is clustering: The indexing algorithm partitions data points into non-overlapping subsets and represents each partition by a point such as its centroid. The query processing algorithm first identifies the nearest clusters, a process known as routing, then performs a nearest neighbor search over those clusters only. In this work, we make a simple observation: The routing function solves a ranking problem. Its quality can therefore be assessed with a ranking metric, making the function amenable to learning-to-rank. Interestingly, ground truth is often freely available: Given a query distribution in a top-k configuration, the ground truth is the set of clusters that contain the exact top-k vectors. We develop this insight and apply it to Maximum Inner Product Search (MIPS). As we demonstrate empirically on various datasets, learning a simple linear function consistently improves the accuracy of clustering-based MIPS.
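The pipeline the abstract describes can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: it builds a toy clustering-based index, routes a query with the standard inner-product-to-centroid heuristic (the fixed scorer that the paper proposes to replace with a learned function), searches only the probed clusters, and measures recall of the exact top-k — the ranking-quality metric the routing step is judged by. All sizes and the number of probed clusters are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 2000 points in 16 dimensions; one query; we want the top-k by inner product.
X = rng.normal(size=(2000, 16)).astype(np.float32)
q = rng.normal(size=16).astype(np.float32)
k = 10

# --- Indexing: partition the points with a few rounds of plain k-means. ---
n_clusters = 32
centroids = X[rng.choice(len(X), n_clusters, replace=False)].copy()
for _ in range(10):
    # Assign each point to its nearest centroid (squared Euclidean distance).
    assign = np.argmin(((X[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
    for c in range(n_clusters):
        members = X[assign == c]
        if len(members):
            centroids[c] = members.mean(axis=0)

# --- Routing: rank the clusters by a scoring function of the query. ---
# The common heuristic scores cluster c by <q, centroid_c>; the paper's observation
# is that any such scorer solves a ranking problem, so it can be learned instead.
scores = centroids @ q
n_probe = 4
probed = np.argsort(-scores)[:n_probe]

# --- Search: scan only the points that fall in the probed clusters. ---
cand_idx = np.flatnonzero(np.isin(assign, probed))
cand_scores = X[cand_idx] @ q
approx_topk = cand_idx[np.argsort(-cand_scores)[:k]]

# --- Accuracy: fraction of the exact top-k recovered by the probed clusters. ---
exact_topk = np.argsort(-(X @ q))[:k]
recall = len(set(approx_topk) & set(exact_topk)) / k
print(f"recall@{k} = {recall:.2f}")
```

Note how the routing ground truth the abstract mentions falls out for free here: the clusters `assign[exact_topk]` are exactly the ones a perfect router should rank first, which is what makes supervised learning-to-rank training data cheap to generate.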
Pages: 2261-2265 (5 pages)