Learning to Distribute Vocabulary Indexing for Scalable Visual Search

Cited by: 80
Authors
Ji, Rongrong [1]
Duan, Ling-Yu [1]
Chen, Jie [1]
Xie, Lexing [2]
Yao, Hongxun [3]
Gao, Wen [1]
Affiliations
[1] Peking Univ, Inst Digital Media, Beijing 100871, Peoples R China
[2] Australian Natl Univ, Sch Comp Sci, Canberra, ACT 0200, Australia
[3] Harbin Inst Technol, Dept Comp Sci, Harbin 150001, Peoples R China
Funding
US National Science Foundation;
Keywords
Distributed search; inverted indexing; parallel computing; visual search; visual vocabulary;
DOI
10.1109/TMM.2012.2225035
Chinese Library Classification number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
In recent years, there has been an ever-increasing research focus on the Bag-of-Words based near-duplicate visual search paradigm with inverted indexing. One fundamental yet largely unexplored challenge is how to maintain the large indexing structures within a single server subject to its memory constraint, a setting that is extremely hard to scale up to millions or even billions of images. In this paper, we propose to parallelize the near-duplicate visual search architecture to index millions of images over multiple servers, including the distribution of both the visual vocabulary and the corresponding indexing structure. We optimize the distribution of vocabulary indexing from a machine learning perspective, which provides a "memory light" search paradigm that leverages the computational power across multiple servers to reduce the search latency. In particular, our solution addresses two essential issues: "What to distribute" and "How to distribute". "What to distribute" is addressed by a "lossy" vocabulary Boosting, which discards both frequent and non-discriminative words prior to distribution. "How to distribute" is addressed by learning an optimal distribution function, which maximizes the uniformity of assigning the words of a given query to multiple servers. We validate the distributed vocabulary indexing scheme in a real-world location search system over 10 million landmark images. Compared with the state-of-the-art alternatives of single-server search [5], [6], [16] and distributed search [23], our scheme yields a speedup of about 200% at comparable precision by distributing only 5% of the words. We also report excellent robustness even when some of the servers crash.
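The two questions raised in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the "lossy" vocabulary Boosting is replaced here by plain document-frequency thresholds, and the learned distribution function is replaced by a greedy load-balancing heuristic. All names (`prune_vocabulary`, `assign_words_greedy`, `query_spread`) and the threshold values are hypothetical and chosen only for illustration.

```python
def prune_vocabulary(doc_freq, num_images, max_df_ratio=0.5, min_df=2):
    """'What to distribute' (assumed stand-in): keep only words whose
    document frequency is neither too high nor too low."""
    max_df = max_df_ratio * num_images
    return {w for w, df in doc_freq.items() if min_df <= df <= max_df}


def assign_words_greedy(doc_freq, words, num_servers):
    """'How to distribute' (assumed stand-in): place each kept word on the
    currently lightest server, heaviest posting lists first."""
    load = [0] * num_servers
    assignment = {}
    # Sort by descending frequency, then by word for a deterministic order.
    for w in sorted(words, key=lambda w: (-doc_freq[w], w)):
        server = min(range(num_servers), key=lambda i: load[i])
        assignment[w] = server
        load[server] += doc_freq[w]
    return assignment, load


def query_spread(query_words, assignment, num_servers):
    """Count how many of a query's kept words land on each server;
    a flatter result means the query is spread more uniformly."""
    counts = [0] * num_servers
    for w in query_words:
        if w in assignment:  # pruned words are simply skipped
            counts[assignment[w]] += 1
    return counts


# Toy vocabulary: word -> number of images containing it (made-up values).
doc_freq = {"a": 9, "b": 5, "c": 4, "d": 3, "e": 1, "f": 2}
kept = prune_vocabulary(doc_freq, num_images=10)
assignment, load = assign_words_greedy(doc_freq, kept, num_servers=2)
spread = query_spread(["a", "b", "c", "d", "f"], assignment, num_servers=2)
print(sorted(kept), load, spread)
```

In this toy run the very frequent word "a" and the rare word "e" are dropped before distribution, and the remaining posting lists end up balanced across the two servers. A learned distribution function, as the abstract describes, would instead optimize the per-query uniformity directly rather than the total posting-list load.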
Pages: 153-166
Page count: 14
Related papers
  • [1] VOCABULARY TREE INCREMENTAL INDEXING FOR SCALABLE LOCATION RECOGNITION
    Ji, Rongrong
    Xie, Xing
    Yao, Hongxun
    Wo, Yongjian
    Ma, Wei-Ying
    2008 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOLS 1-4, 2008, : 869 - +
  • [2] The visual indexing vocabulary: Developing a thesaurus for indexing images across diverse domains
    Jörgensen, C
    ASIST 2004: PROCEEDINGS OF THE 67TH ASIS&T ANNUAL MEETING, VOL 41, 2004: MANAGING AND ENHANCING INFORMATION: CULTURES AND CONFLICTS, 2004, 41 : 287 - 293
  • [3] Learning to Distribute Queries into Web Search Nodes
    Mendoza, Marcelo
    Marin, Mauricio
    Ferrarotti, Flavio
    Poblete, Barbara
    ADVANCES IN INFORMATION RETRIEVAL, PROCEEDINGS, 2010, 5993 : 281 - 292
  • [4] Scalable Multimodal Search with Distributed Indexing by Sparse Hashing
    Mourao, Andre
    Magalhaes, Joao
    ICMR'15: PROCEEDINGS OF THE 2015 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 2015, : 283 - 290
  • [5] Toward Scalable Indexing and Search on Distributed and Unstructured Data
    Orhean, Alexandru Iulian
    Ijagbone, Itua
    Raicu, Ioan
    Chard, Kyle
    Zhao, Dongfang
    2017 IEEE 6TH INTERNATIONAL CONGRESS ON BIG DATA (BIGDATA CONGRESS 2017), 2017, : 31 - 38
  • [6] Vocabulary Hierarchy Optimization and Transfer for Scalable Image Search
    Ji, Rongrong
    Yao, Hongxun
    Xie, Xing
    Tian, Qi
    IEEE MULTIMEDIA, 2011, 18 (03) : 66 - 76
  • [7] VisMed: A visual vocabulary approach for medical image indexing and retrieval
    Lim, JH
    Chevallet, JP
    INFORMATION RETRIEVAL TECHNOLOGY, PROCEEDINGS, 2005, 3689 : 84 - 96
  • [8] LEARNING VOCABULARY WITH VISUAL HELPS
    Unknown
    JOURNAL OF READING, 1987, 31 (03): : 279 - 279
  • [9] Adaptive vocabulary forests for dynamic indexing and category learning
    Yeh, Tom
    Lee, John
    Darrell, Trevor
    2007 IEEE 11TH INTERNATIONAL CONFERENCE ON COMPUTER VISION, VOLS 1-6, 2007, : 1759 - 1766
  • [10] Efficient Indexing for Large Scale Visual Search
    Zhang, Xiao
    Li, Zhiwei
    Zhang, Lei
    Ma, Wei-Ying
    Shum, Heung-Yeung
    2009 IEEE 12TH INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2009, : 1103 - 1110