PUFFINN: Parameterless and Universally Fast FInding of Nearest Neighbors

被引:10
作者
Aumuller, Martin [1 ]
Christiani, Tobias [1 ]
Pagh, Rasmus [1 ,2 ]
Vesterli, Michael [1 ]
机构
[1] IT Univ Copenhagen, Copenhagen, Denmark
[2] BARC, Copenhagen, Denmark
来源
27TH ANNUAL EUROPEAN SYMPOSIUM ON ALGORITHMS (ESA 2019) | 2019年 / 144卷
基金
欧洲研究理事会;
关键词
Nearest Neighbor Search; Locality-Sensitive Hashing; Adaptive Similarity Search; SEARCH;
D O I
10.4230/LIPIcs.ESA.2019.10
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
We present PUFFINN, a parameterless LSH-based index for solving the k-nearest neighbor problem with probabilistic guarantees. By parameterless we mean that the user is only required to specify the amount of memory the index is supposed to use and the result quality that should be achieved. The index combines several heuristic ideas known in the literature. By small adaptions to the query algorithm, we make heuristics rigorous. We perform experiments on real-world and synthetic inputs to evaluate implementation choices and show that the implementation satisfies the quality guarantees while being competitive with other state-of-the-art approaches to nearest neighbor search. We describe a novel synthetic data set that is difficult to solve for almost all existing nearest neighbor search approaches, and for which PUFFINN significantly outperform previous methods.
引用
收藏
页数:16
相关论文
共 31 条
[21]  
Iwasaki Masajiro, 2018, ARXIV181007355
[22]   Theory and Applications of b-Bit Minwise Hashing [J].
Li, Ping ;
Koenig, Arnd Christian .
COMMUNICATIONS OF THE ACM, 2011, 54 (08) :101-109
[23]  
Mikolov Tomas, 2013, P 1 INT C LEARN REPR
[24]  
Muja M, 2009, VISAPP 2009: PROCEEDINGS OF THE FOURTH INTERNATIONAL CONFERENCE ON COMPUTER VISION THEORY AND APPLICATIONS, VOL 1, P331
[25]  
Pagh Rasmus, 2019, ENCY BIG DATA TECHNO, DOI [10.1007/978-3-319-63962-8_58-1, DOI 10.1007/978-3-319-63962-8_58-1]
[26]  
Pennington Jeffrey, 2014, P 2014 C EMP METH NA, P1532
[27]  
Rahimi A., 2007, Advances in Neural Information Processing Systems, P1177
[28]   Bayesian Locality Sensitive Hashing for Fast Similarity Search [J].
Satuluri, Venu ;
Parthasarathy, Srinivasan .
PROCEEDINGS OF THE VLDB ENDOWMENT, 2012, 5 (05) :430-441
[29]  
Terasawa K, 2007, LECT NOTES COMPUT SC, V4619, P27
[30]  
YIANILOS PN, 1993, PROCEEDINGS OF THE FOURTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, P311