Learning to Distribute Vocabulary Indexing for Scalable Visual Search

被引:80
作者
Ji, Rongrong [1 ]
Duan, Ling-Yu [1 ]
Chen, Jie [1 ]
Xie, Lexing [2 ]
Yao, Hongxun [3 ]
Gao, Wen [1 ]
机构
[1] Peking Univ, Inst Digital Media, Beijing 100871, Peoples R China
[2] Australian Natl Univ, Sch Comp Sci, Canberra, ACT 0200, Australia
[3] Harbin Inst Technol, Dept Comp Sci, Harbin 150001, Peoples R China
基金
美国国家科学基金会;
关键词
Distributed search; inverted indexing; parallel computing; visual search; visual vocabulary;
D O I
10.1109/TMM.2012.2225035
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
In recent years, there is an ever-increasing research focus on Bag-of-Words based near duplicate visual search paradigm with inverted indexing. One fundamental yet unexploited challenge is how to maintain the large indexing structures within a single server subject to its memory constraint, which is extremely hard to scale up to millions or even billions of images. In this paper, we propose to parallelize the near duplicate visual search architecture to index millions of images over multiple servers, including the distribution of both visual vocabulary and the corresponding indexing structure. We optimize the distribution of vocabulary indexing from a machine learning perspective, which provides a "memory light" search paradigm that leverages the computational power across multiple servers to reduce the search latency. Especially, our solution addresses two essential issues: "What to distribute" and "How to distribute". "What to distribute" is addressed by a "lossy" vocabulary Boosting, which discards both frequent and indiscriminating words prior to distribution. "How to distribute" is addressed by learning an optimal distribution function, which maximizes the uniformity of assigning the words of a given query to multiple servers. We validate the distributed vocabulary indexing scheme in a real world location search system over 10 million landmark images. Comparing to the state-of-the-art alternatives of single-server search [5], [6], [16] and distributed search [23], our scheme has yielded a significant gain of about 200% speedup at comparable precision by distributing only 5% words. We also report excellent robustness even when partial servers crash.
引用
收藏
页码:153 / 166
页数:14
相关论文
共 43 条
[1]  
[Anonymous], 2009, IEEE TPAMI
[2]  
[Anonymous], P INT C COMP VIS
[3]  
[Anonymous], 2009, P INT C COMP VIS
[4]  
[Anonymous], 2010, P IEEE C COMP VIS PA
[5]  
[Anonymous], 2006, P IEEE COMPUTER SOC
[6]  
[Anonymous], P EUR C COMP VIS
[7]  
[Anonymous], 2007, CVPR
[8]  
[Anonymous], P IEEE C COMP VIS PA
[9]   Distributed query processing using partitioned inverted files [J].
Badue, C ;
Ribeiro-Neto, B ;
Baeza-Yates, R ;
Ziviani, N .
EIGHTH SYMPOSIUM ON STRING PROCESSING AND INFORMATION RETRIEVAL, PROCEEDINGS, 2001, :10-20
[10]   Web search for a planet:: The Google cluster architecture [J].
Barroso, LA ;
Dean, J ;
Hölzle, U .
IEEE MICRO, 2003, 23 (02) :22-28