Point-to-Hyperplane Nearest Neighbor Search Beyond the Unit Hypersphere

Cited by: 9
Authors
Huang, Qiang [1 ]
Lei, Yifan [1 ]
Tung, Anthony K. H. [1 ]
Affiliations
[1] Natl Univ Singapore, Sch Comp, Singapore, Singapore
Source
SIGMOD '21: PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA | 2021
Funding
National Research Foundation, Singapore
Keywords
Locality-Sensitive Hashing; Nearest Neighbor Search; Furthest Neighbor Search; Asymmetric Transformation; Active Learning; Furthest Neighbor; Approximate; Algorithms; Queries; Object
DOI
10.1145/3448016.3457240
Chinese Library Classification (CLC)
TP [Automation Technology; Computer Technology]
Discipline Code
0812
Abstract
Point-to-Hyperplane Nearest Neighbor Search (P2HNNS) is a fundamental yet challenging problem with many applications across diverse fields. Existing hyperplane hashing schemes enjoy sublinear query time and achieve excellent performance in applications such as large-scale active learning with Support Vector Machines (SVMs). However, they handle this problem only under the strong assumption that all data objects are normalized to lie on the unit hypersphere, and their performance can be arbitrarily bad when this assumption does not hold. In this paper, we introduce a new asymmetric transformation and develop the first two provable hyperplane hashing schemes, Nearest Hyperplane hashing (NH) and Furthest Hyperplane hashing (FH), for high-dimensional P2HNNS beyond the unit hypersphere. With this asymmetric transformation, we demonstrate that the hash functions of NH and FH are locality-sensitive to hyperplane queries, and both schemes enjoy a quality guarantee on query results. Moreover, we propose a data-dependent multi-partition strategy to boost the search performance of FH. NH answers hyperplane queries in sublinear time, while FH enjoys better practical performance. We evaluate NH and FH on five real-life datasets and show that they are around 3~100x faster than the best competitor on four of the five datasets, especially for recalls in the range [20%, 80%].
Pages: 777-789 (13 pages)
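For readers unfamiliar with the problem setting described in the abstract, the following is a minimal Python sketch of the exact linear-scan baseline for P2HNNS. The hyperplane encoding [q; b], the function name, and the random data are illustrative assumptions rather than the paper's API or parameterization; NH and FH aim to replace this O(nd) scan with sublinear locality-sensitive hyperplane hashing.

# Brute-force Point-to-Hyperplane NNS sketch (illustrative only).
import numpy as np

def p2h_nns_bruteforce(data: np.ndarray, query: np.ndarray) -> int:
    """Return the index of the data point closest to the hyperplane
    q^T x + b = 0, where query = [q_1, ..., q_d, b].

    `data` is an (n, d) array of points that are NOT assumed to lie on
    the unit hypersphere (the setting targeted by NH and FH).
    """
    q, b = query[:-1], query[-1]
    # Point-to-hyperplane distance: |q . x + b| / ||q||
    dists = np.abs(data @ q + b) / np.linalg.norm(q)
    return int(np.argmin(dists))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(10_000, 64))   # unnormalized data points
    query = rng.normal(size=65)            # random hyperplane [q; b]
    print(p2h_nns_bruteforce(data, query))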