Experimental Analysis of Locality Sensitive Hashing Techniques for High-Dimensional Approximate Nearest Neighbor Searches

被引:6
作者
Jafari, Omid [1 ]
Nagarkar, Parth [1 ]
机构
[1] New Mexico State Univ, Las Cruces, NM 88003 USA
来源
DATABASES THEORY AND APPLICATIONS (ADC 2021) | 2021年 / 12610卷
关键词
Locality Sensitive Hashing; High-dimensional spaces; Approximate nearest neighbor; EFFICIENT; LSH;
D O I
10.1007/978-3-030-69377-0_6
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Finding nearest neighbors in high-dimensional spaces is a fundamental operation in many multimedia retrieval applications. Exact tree-based approaches are known to suffer from the notorious curse of dimensionality for high-dimensional data. Approximate searching techniques sacrifice some accuracy while returning good enough results for faster performance. Locality Sensitive Hashing (LSH) is a popular technique for finding approximate nearest neighbors. There are two main benefits of LSH techniques: they provide theoretical guarantees on the query results, and they are highly scalable. The most dominant costs for existing external memory-based LSH techniques are algorithm time and index I/Os required to find candidate points. Existing works do not compare both of these costs in their evaluation. In this experimental survey paper, we show the impact of both these costs on the overall performance. We compare three state-of-the-art techniques on six real-world datasets, and show the importance of comparing these costs to achieve a more fair comparison.
引用
收藏
页码:62 / 73
页数:12
相关论文
共 25 条
  • [1] Arora A., 2018, VLDB
  • [2] Efficient Indexing of Billion-Scale datasets of deep descriptors
    Babenko, Artem
    Lempitsky, Victor
    [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 2055 - 2063
  • [3] Bawa Mayank, 2005, P 14 INT C WORLD WID, P651, DOI [DOI 10.1145/1060745.1060840, 10.1145/1060745.1060840]
  • [4] Searching in metric spaces
    Chávez, E
    Navarro, G
    BaezaYates, R
    Marroquín, JL
    [J]. ACM COMPUTING SURVEYS, 2001, 33 (03) : 273 - 321
  • [5] Predicting Positive p53 Cancer Rescue Regions Using Most Informative Positive (MIP) Active Learning
    Danziger, Samuel A.
    Baronio, Roberta
    Ho, Lydia
    Hall, Linda
    Salmon, Kirsty
    Hatfield, G. Wesley
    Kaiser, Peter
    Lathrop, Richard H.
    [J]. PLOS COMPUTATIONAL BIOLOGY, 2009, 5 (09)
  • [6] Datar Mayur, 2004, P 20 ACM S COMP GEOM
  • [7] Gan Junhao, 2012, P 2012 ACM SIGMOD IN, P541, DOI [10.1145/2213836.2213898, DOI 10.1145/2213836.2213898]
  • [8] Gionis A, 1999, PROCEEDINGS OF THE TWENTY-FIFTH INTERNATIONAL CONFERENCE ON VERY LARGE DATA BASES, P518
  • [9] Huang Q, 2015, PROC VLDB ENDOW, V9, P1
  • [10] Product Quantization for Nearest Neighbor Search
    Jegou, Herve
    Douze, Matthijs
    Schmid, Cordelia
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2011, 33 (01) : 117 - 128