Query-aware locality-sensitive hashing scheme for lp\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$l_p$$\end{document} norm

被引:0
作者
Qiang Huang
Jianlin Feng
Qiong Fang
Wilfred Ng
Wei Wang
机构
[1] Sun Yat-Sen Univeisity,School of Data and Computer Science
[2] South China University of Technology,School of Software Engineering
[3] Hong Kong University of Science and Technology,Department of Computer Science and Engineering
[4] The University of New South Wales,School of Computer Science and Engineering
关键词
Query-aware; Locality-sensitive hashing; Norm; Nearest neighbor search; -Stable distributions;
D O I
10.1007/s00778-017-0472-7
中图分类号
学科分类号
摘要
The problem of c-Approximate Nearest Neighbor (c-ANN) search in high-dimensional space is fundamentally important in many applications, such as image database and data mining. Locality-Sensitive Hashing (LSH) and its variants are the well-known indexing schemes to tackle the c-ANN search problem. Traditionally, LSH functions are constructed in a query-oblivious manner, in the sense that buckets are partitioned before any query arrives. However, objects closer to a query may be partitioned into different buckets, which is undesirable. Due to the use of query-oblivious bucket partition, the state-of-the-art LSH schemes for external memory, namely C2LSH and LSB-Forest, only work with approximation ratio of integer c≥2\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$c \ge 2$$\end{document}. In this paper, we introduce a novel concept of query-aware bucket partition which uses a given query as the “anchor” for bucket partition. Accordingly, a query-aware LSH function under a specific lp\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$l_p$$\end{document} norm with p∈(0,2]\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p \in (0,2]$$\end{document} is a random projection coupled with query-aware bucket partition, which removes random shift required by traditional query-oblivious LSH functions. The query-aware bucket partitioning strategy can be easily implemented so that query performance is guaranteed. For each lp\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$l_p$$\end{document} norm (p∈(0,2])\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(p \in (0,2])$$\end{document}, based on the corresponding p-stable distribution, we propose a novel LSH scheme named query-aware LSH (QALSH) for c-ANN search over external memory. Our theoretical studies show that QALSH enjoys a guarantee on query quality. The use of query-aware LSH function enables QALSH to work with any approximation ratio c>1\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$c > 1$$\end{document}. In addition, we propose a heuristic variant named QALSH+\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$^+$$\end{document} to improve the scalability of QALSH. Extensive experiments show that QALSH and QALSH+\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$^+$$\end{document} outperform the state-of-the-art schemes, especially in high-dimensional space. Specifically, by using a ratio c<2\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$c < 2$$\end{document}, QALSH can achieve much better query quality.
引用
收藏
页码:683 / 708
页数:25
相关论文
共 48 条
[1]  
Chambers JM(1976)A method for simulating stable random variables JASA 71 340-344
[2]  
Mallows CL(1999)Similarity search in high dimensions via hashing VLDB 99 518-529
[3]  
Stuck B(2012)Approximate nearest neighbor: towards removing the curse of dimensionality Theory Comput. 8 321-350
[4]  
Gionis A(1963)Probability inequalities for sums of bounded random variables JASA 58 13-30
[5]  
Indyk P(2015)Query-aware locality-sensitive hashing for approximate nearest neighbor search Proc. VLDB 9 1-12
[6]  
Motwani R(2005)idistance: An adaptive b+-tree based indexing method for nearest neighbor search TODS 30 364-397
[7]  
Har-Peled S(2011)Product quantization for nearest neighbor search IEEE TPAMI 33 117-128
[8]  
Indyk P(1997)The sr-tree: an index structure for high-dimensional nearest neighbor queries ACM SIGMOD Rec. 26 369-380
[9]  
Motwani R(1994)The tv-tree: an index structure for high-dimensional data VLDBJ 3 517-542
[10]  
Hoeffding W(2014)Sk-lsh: an efficient index structure for approximate nearest neighbor search Proc. VLDB 7 745-756