Fast Approximate Nearest Neighbor Search With The Navigating Spreading-out Graph

被引:131
作者
Fu, Cong [1 ,2 ]
Xiang, Chao [1 ,2 ]
Wang, Changxu [1 ,2 ]
Cai, Deng [1 ,2 ]
机构
[1] Zhejiang Univ, Coll Comp Sci, State Key Lab CAD&CG, Hangzhou, Zhejiang, Peoples R China
[2] Alibaba Zhejiang Univ, Fabu Inc, Joint Inst Frontier Technol, Hangzhou, Zhejiang, Peoples R China
来源
PROCEEDINGS OF THE VLDB ENDOWMENT | 2019年 / 12卷 / 05期
关键词
PRODUCT QUANTIZATION; SMALL WORLD; ALGORITHM; LOCALITY;
D O I
10.14778/3303753.3303754
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Approximate nearest neighbor search (ANNS) is a fundamental problem in databases and data mining. A scalable ANNS algorithm should be both memory-efficient and fast. Some early graph-based approaches have shown attractive theoretical guarantees on search time complexity, but they all suffer from the problem of high indexing time complexity. Recently, some graph-based methods have been proposed to reduce indexing complexity by approximating the traditional graphs; these methods have achieved revolutionary performance on million-scale datasets. Yet, they still can not scale to billion-node databases. In this paper, to further improve the search-efficiency and scalability of graph-based methods, we start by introducing four aspects: (1) ensuring the connectivity of the graph; (2) lowering the average out-degree of the graph for fast traversal; (3) shortening the search path; and (4) reducing the index size. Then, we propose a novel graph structure called Monotonic Relative Neighborhood Graph (MRNG) which guarantees very low search complexity (close to logarithmic time). To further lower the indexing complexity and make it practical for billion-node ANNS problems, we propose a novel graph structure named Navigating Spreading-out Graph (NSG) by approximating the MRNG. The NSG takes the four aspects into account simultaneously. Extensive experiments show that NSG outperforms all the existing algorithms significantly. In addition, NSG shows superior performance in the E-commercial scenario of Taobao (Alibaba Group) and has been integrated into their billion-scale search engine.
引用
收藏
页码:461 / 474
页数:14
相关论文
共 42 条
[1]  
André F, 2015, PROC VLDB ENDOW, V9, P288
[2]  
[Anonymous], 2008, Introduction to information retrieval
[3]  
[Anonymous], 1990, P 1990 ACM SIGMOD IN, DOI DOI 10.1145/93597.98741
[4]  
Arjen P.de Vries., 2002, SIGMOD '02: Proceedings of the 2002 ACM SIGMOD international conference on Management of data, P322
[5]   HD-Index: Pushing the Scalability-Accuracy Boundary for Approximate kNN Search in High-Dimensional Spaces [J].
Arora, Akhil ;
Sinha, Sakshi ;
Kumar, Piyush ;
Bhattacharya, Arnab .
PROCEEDINGS OF THE VLDB ENDOWMENT, 2018, 11 (08) :906-919
[6]  
ARYA S, 1993, PROCEEDINGS OF THE FOURTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, P271
[7]  
AURENHAMMER F, 1991, COMPUT SURV, V23, P345, DOI 10.1145/116873.116880
[8]   Efficient Indexing of Billion-Scale datasets of deep descriptors [J].
Babenko, Artem ;
Lempitsky, Victor .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :2055-2063
[9]   MULTIDIMENSIONAL BINARY SEARCH TREES USED FOR ASSOCIATIVE SEARCHING [J].
BENTLEY, JL .
COMMUNICATIONS OF THE ACM, 1975, 18 (09) :509-517
[10]   Navigability of complex networks [J].
Boguna, Marian ;
Krioukov, Dmitri ;
Claffy, K. C. .
NATURE PHYSICS, 2009, 5 (01) :74-80