Fast spectral analysis for approximate nearest neighbor search

被引:1
作者
Wang, Jing [1 ]
Shen, Jie [2 ]
机构
[1] Amazon, New York, NY 10001 USA
[2] Stevens Inst Technol, Hoboken, NJ 07030 USA
关键词
Approximate nearest neighbor search; Spectral analysis; Hashing; Noise; Subspace; QUANTIZATION; ALGORITHMS;
D O I
10.1007/s10994-021-06124-1
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In large-scale machine learning, of central interest is the problem of approximate nearest neighbor (ANN) search, where the goal is to query particular points that are close to a given object under certain metric. In this paper, we develop a novel data-driven ANN search algorithm where the data structure is learned by fast spectral technique based on s landmarks selected by approximate ridge leverage scores. We show that with overwhelming probability, our algorithm returns the (1 + epsilon/4)-ANN for any approximation parameter epsilon is an element of (0, 1). A remarkable feature of our algorithm is that it is computationally efficient. Specifically, learning k-length hash codes requires O((s(3) + ns(2)) log n) running time and O(d(2)) extra space, and returning the (1 + epsilon/4)-ANN of the query needs O(k log n) running time. The experimental results on computer vision and natural language understanding tasks demonstrate the significant advantage of our algorithm compared to state-of-the-art methods.
引用
收藏
页码:2297 / 2322
页数:26
相关论文
共 50 条
[31]  
Liu W, 2011, SER INF MANAGE SCI, V10, P1
[32]  
Liu Wei, 2014, P INT C NEUR INF PRO
[33]   VECTOR QUANTIZATION IN SPEECH CODING [J].
MAKHOUL, J ;
ROUCOS, S ;
GISH, H .
PROCEEDINGS OF THE IEEE, 1985, 73 (11) :1551-1588
[34]  
Mitzenmacher M., 2017, Probability and computing: Randomization and probabilistic techniques in algorithms and data analysis
[35]  
Musco, 2015, ADV NEURAL INFORM PR, P1396
[36]  
Musco C, 2017, ADV NEUR IN, V30
[37]  
Pennington J., 2014, P 2014 C EMP METH NA, P1532, DOI 10.3115/v1/D14-1162
[38]  
Rajpurkar P., 2016, P 2016 C EMPIRICAL M, DOI DOI 10.18653/V1/D16-1264
[39]  
Reimers N, 2019, 57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), P567
[40]  
Samet Hanan, 2006, Foundations of multidimensional and metric data structures