Improving Approximate Nearest Neighbor Search through Learned Adaptive Early Termination

Cited by: 16

Authors:

Li, Conglong [1]

Zhang, Minjia [2]

Andersen, David G. [1]

He, Yuxiong [2]

Affiliations:

[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA

[2] Microsoft AI & Res, Bellevue, WA USA

Source:

SIGMOD'20: PROCEEDINGS OF THE 2020 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA | 2020

Funding:

U.S. National Science Foundation (NSF);

Keywords:

information retrieval; approximate nearest neighbor search;

DOI:

10.1145/3318464.3380600

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

In applications ranging from image search to recommendation systems, the problem of identifying a set of "similar" real-valued vectors to a query vector plays a critical role. However, retrieving these vectors and computing the corresponding similarity scores from a large database is computationally challenging. Approximate nearest neighbor (ANN) search relaxes the guarantee of exactness for efficiency by vector compression and/or by only searching a subset of database vectors for each query. Searching a larger subset increases both accuracy and latency. State-of-the-art ANN approaches use fixed configurations that apply the same termination condition (the size of the subset to search) to all queries, which leads to undesirably high latency when trying to achieve the last few percent of accuracy. We find that, due to the index structures and the vector distributions, the number of database vectors that must be searched to find the ground-truth nearest neighbor varies widely among queries. Critically, we further identify that the intermediate search result after a certain amount of search is an important runtime feature that indicates how much more search should be performed. To achieve a better tradeoff between latency and accuracy, we propose a novel approach that adaptively determines search termination conditions for individual queries. To do so, we build and train gradient boosting decision tree models to learn and predict when to stop searching for a given query. These models enable us to achieve the same accuracy with less total search than the fixed configurations. We apply the learned adaptive early termination to state-of-the-art ANN approaches, and evaluate the end-to-end performance on three million- to billion-scale datasets. Compared with fixed configurations, our approach consistently reduces the average end-to-end latency, by up to 7.1 times under the same high accuracy targets.
Our approach is open source at github.com/efficient/faisslearned-termination.
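
The idea in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation. It assumes a toy 2-D dataset and a simple inverted-file style index, and it replaces the gradient boosting decision tree model with a hypothetical stand-in: a table that stores, per feature bin, the largest number of cluster probes any training query needed. The feature (best distance after probing one cluster, divided by the distance to the second-closest centroid) and all function names are likewise assumptions made for illustration.

```python
import math
import random

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def build_index(points, nlist, rng):
    """Toy inverted-file index: assign each point to its nearest centroid."""
    centroids = rng.sample(points, nlist)
    lists = [[] for _ in range(nlist)]
    for p in points:
        c = min(range(nlist), key=lambda i: dist(p, centroids[i]))
        lists[c].append(p)
    return centroids, lists

def probe_order(query, centroids):
    """Cluster ids sorted by centroid distance to the query."""
    return sorted(range(len(centroids)), key=lambda i: dist(query, centroids[i]))

def search(query, centroids, lists, nprobe):
    """Scan the nprobe closest clusters and return the best distance found."""
    best = math.inf
    for c in probe_order(query, centroids)[:nprobe]:
        for p in lists[c]:
            best = min(best, dist(query, p))
    return best

def min_probes_needed(query, centroids, lists):
    """Training label: fewest clusters to probe before the true nearest neighbor appears."""
    true_best = search(query, centroids, lists, len(centroids))
    best = math.inf
    for n, c in enumerate(probe_order(query, centroids), start=1):
        for p in lists[c]:
            best = min(best, dist(query, p))
        if best == true_best:
            return n
    return len(centroids)

def feature(query, centroids, lists):
    """Hypothetical runtime feature built from the intermediate result after one probe."""
    order = probe_order(query, centroids)
    first = search(query, centroids, lists, 1)
    return first / dist(query, centroids[order[1]])

def train(features, labels, nbins=4):
    """Stand-in for a gradient boosting model: per-bin maximum of the labels."""
    ranked = sorted(features)
    cuts = [ranked[len(ranked) * k // nbins] for k in range(1, nbins)]
    table = [1] * nbins
    for f, y in zip(features, labels):
        b = sum(f >= c for c in cuts)
        table[b] = max(table[b], y)
    return cuts, table

def predict(model, f):
    """Predicted number of clusters to probe for a query with feature value f."""
    cuts, table = model
    return table[sum(f >= c for c in cuts)]

rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(1000)]
centroids, lists = build_index(points, nlist=16, rng=rng)
train_q = [(rng.random(), rng.random()) for _ in range(200)]
test_q = [(rng.random(), rng.random()) for _ in range(50)]

labels = [min_probes_needed(q, centroids, lists) for q in train_q]
model = train([feature(q, centroids, lists) for q in train_q], labels)

# Fixed configuration: one nprobe large enough for every training query.
fixed = max(labels)
# Adaptive configuration: a per-query nprobe predicted from the runtime feature.
adaptive = [predict(model, feature(q, centroids, lists)) for q in test_q]

print("fixed nprobe:", fixed)
print("mean adaptive nprobe:", sum(adaptive) / len(adaptive))
```

By construction, no adaptive prediction exceeds the fixed setting, so the average amount of search can only go down. Whether accuracy holds up on unseen queries depends on the model and the features, which is the question the paper evaluates with real indexes and datasets.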

Pages: 2539-2554

Number of pages: 16
