Experimental Analysis of Locality Sensitive Hashing Techniques for High-Dimensional Approximate Nearest Neighbor Searches

被引：6

作者：

Jafari, Omid ^{[1
]}

Nagarkar, Parth ^{[1
]}

机构：

[1] New Mexico State Univ, Las Cruces, NM 88003 USA

来源：

DATABASES THEORY AND APPLICATIONS (ADC 2021) | 2021年 / 12610卷

关键词：

Locality Sensitive Hashing; High-dimensional spaces; Approximate nearest neighbor; EFFICIENT; LSH;

D O I：

10.1007/978-3-030-69377-0_6

中图分类号：

TP [自动化技术、计算机技术];

学科分类号：

0812 ;

摘要：

Finding nearest neighbors in high-dimensional spaces is a fundamental operation in many multimedia retrieval applications. Exact tree-based approaches are known to suffer from the notorious curse of dimensionality for high-dimensional data. Approximate searching techniques sacrifice some accuracy while returning good enough results for faster performance. Locality Sensitive Hashing (LSH) is a popular technique for finding approximate nearest neighbors. There are two main benefits of LSH techniques: they provide theoretical guarantees on the query results, and they are highly scalable. The most dominant costs for existing external memory-based LSH techniques are algorithm time and index I/Os required to find candidate points. Existing works do not compare both of these costs in their evaluation. In this experimental survey paper, we show the impact of both these costs on the overall performance. We compare three state-of-the-art techniques on six real-world datasets, and show the importance of comparing these costs to achieve a more fair comparison.

引用

页码：62 / 73

页数：12

共 25 条

[1] Arora A., 2018, VLDB
[2] Efficient Indexing of Billion-Scale datasets of deep descriptors
Babenko, Artem
Lempitsky, Victor
[J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 2055 - 2063
[3] Bawa Mayank, 2005, P 14 INT C WORLD WID, P651, DOI [DOI 10.1145/1060745.1060840, 10.1145/1060745.1060840]
[4] Searching in metric spaces
Chávez, E
Navarro, G
BaezaYates, R
Marroquín, JL
[J]. ACM COMPUTING SURVEYS, 2001, 33 (03) : 273 - 321
[5] Predicting Positive p53 Cancer Rescue Regions Using Most Informative Positive (MIP) Active Learning
Danziger, Samuel A.
Baronio, Roberta
Ho, Lydia
Hall, Linda
Salmon, Kirsty
Hatfield, G. Wesley
Kaiser, Peter
Lathrop, Richard H.
[J]. PLOS COMPUTATIONAL BIOLOGY, 2009, 5 (09)
[6] Datar Mayur, 2004, P 20 ACM S COMP GEOM
[7] Gan Junhao, 2012, P 2012 ACM SIGMOD IN, P541, DOI [10.1145/2213836.2213898, DOI 10.1145/2213836.2213898]
[8] Gionis A, 1999, PROCEEDINGS OF THE TWENTY-FIFTH INTERNATIONAL CONFERENCE ON VERY LARGE DATA BASES, P518
[9] Huang Q, 2015, PROC VLDB ENDOW, V9, P1
[10] Product Quantization for Nearest Neighbor Search
Jegou, Herve
Douze, Matthijs
Schmid, Cordelia
[J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2011, 33 (01) : 117 - 128

← 1 2 3 →