Approximate Nearest Neighbor Search on Standard Search Engines

Cited by: 8
Authors
Carrara, Fabio [1]
Vadicamo, Lucia [1]
Gennaro, Claudio [1]
Amato, Giuseppe [1]
Affiliations
[1] ISTI CNR, Pisa, Italy
Source
SIMILARITY SEARCH AND APPLICATIONS (SISAP 2022) | 2022 / Vol. 13590
Funding
EU Horizon 2020;
Keywords
Surrogate text representation; Inverted index; Approximate search; High-dimensional indexing; Very large databases;
DOI
10.1007/978-3-031-17849-8_17
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Approximate search for high-dimensional vectors is commonly addressed using dedicated techniques often combined with hardware acceleration provided by GPUs, FPGAs, and other custom in-memory silicon. Despite their effectiveness, harmonizing those optimized solutions with other types of searches often poses technological difficulties. For example, to implement a combined text+image multimodal search, we are forced first to query the index of high-dimensional image descriptors and then filter the results based on the textual query or vice versa. This paper proposes a text surrogate technique to translate real-valued vectors into text and index them with a standard textual search engine such as Elasticsearch or Apache Lucene. This technique allows us to perform approximate kNN searches of high-dimensional vectors alongside classical full-text searches natively on a single textual search engine, enabling multimedia queries without sacrificing scalability. Our proposal exploits a combination of vector quantization and scalar quantization. We compared our approach to the existing literature in this field of research, demonstrating a significant improvement in performance through preliminary experimentation.
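The general idea of a surrogate text representation can be illustrated with a minimal sketch. It shows only a plain scalar-quantization encoding, not the combined vector and scalar quantization scheme the paper proposes. The function names, the `f{i}` term naming, the `scale` parameter, and the clipping of negative components to zero are all assumptions made for illustration.

```python
from collections import Counter


def surrogate_text(vector, scale=4):
    """Encode a real-valued vector as a string of synthetic terms.

    Illustrative assumption: component i becomes the term "f{i}",
    repeated floor(scale * value) times; negative values are clipped to 0.
    """
    terms = []
    for i, value in enumerate(vector):
        # Scalar quantization: map the component to an integer term frequency.
        freq = int(max(value, 0.0) * scale)
        terms.extend([f"f{i}"] * freq)
    return " ".join(terms)


def tf_dot(text_a, text_b):
    """Dot product of the term-frequency vectors of two surrogate texts."""
    tf_a, tf_b = Counter(text_a.split()), Counter(text_b.split())
    return sum(count * tf_b[term] for term, count in tf_a.items())


doc = surrogate_text([0.5, 0.0, 0.25])    # "f0 f0 f2"
query = surrogate_text([0.25, 0.5, 0.0])  # "f0 f1 f1"
score = tf_dot(doc, query)                # 2, i.e. scale**2 times the true dot product 0.125
```

Under these assumptions, the surrogate strings could be stored in an ordinary full-text field, and an engine whose score depends on term frequencies would rank documents roughly by inner product with the query vector.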
Pages: 214-221
Page count: 8