Collective Reconstructive Embeddings for Cross-Modal Hashing

被引:126
作者
Hu, Mengqiu [1 ,2 ]
Yang, Yang [1 ,2 ]
Shen, Fumin [1 ,2 ]
Xie, Ning [1 ,2 ]
Hong, Richang [3 ]
Shen, Heng Tao [1 ,2 ]
机构
[1] Univ Elect Sci & Technol China, Ctr Future Media, Chengdu 611731, Sichuan, Peoples R China
[2] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 611731, Sichuan, Peoples R China
[3] Hefei Univ Technol, Sch Comp & Informat, Hefei 230009, Anhui, Peoples R China
基金
中国国家自然科学基金;
关键词
Cross-modal hashing; reconstructive embeddings; cross-modal retrieval; BINARY-CODES; QUANTIZATION; ROBUST;
D O I
10.1109/TIP.2018.2890144
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In this paper, we study the problem of cross-modal retrieval by hashing-based approximate nearest neighbor search techniques. Most existing cross-modal hashing works mainly address the issue of multi-modal integration complexity using the same mapping and similarity calculation for data from different media types. Nonetheless, this may cause information loss during the mapping process due to overlooking the specifics of each individual modality. In this paper, we propose a simple yet effective cross-modal hashing approach, termed collective reconstructive embeddings (CRE), which can simultaneously solve the heterogeneity and integration complexity of multi-modal data. To address the heterogeneity challenge, we propose to process heterogeneous types of data using different modality-specific models. Specifically, we model textual data with cosine similarity-based reconstructive embedding to alleviate the data sparsity to the greatest extent, while for image data, we utilize the Euclidean distance to characterize the relationships of the projected hash codes. Meanwhile, we unify the projections of text and image to the Hamming space into a common reconstructive embedding through rigid mathematical reformulation, which not only reduces the optimization complexity significantly but also facilitates the inter-modal similarity preservation among different modalities. We further incorporate the code balance and uncorrelation criteria into the problem and devise an efficient iterative algorithm for optimization. Comprehensive experiments on four widely used multimodal benchmarks show that the proposed CRE can achieve a superior performance compared with the state of the art on several challenging cross-modal tasks.
引用
收藏
页码:2770 / 2784
页数:15
相关论文
共 70 条
[1]   Describing Video With Attention-Based Bidirectional LSTM [J].
Bin, Yi ;
Yang, Yang ;
Shen, Fumin ;
Xie, Ning ;
Shen, Heng Tao ;
Li, Xuelong .
IEEE TRANSACTIONS ON CYBERNETICS, 2019, 49 (07) :2631-2641
[2]  
Bronstein MM, 2010, PROC CVPR IEEE, P3594, DOI 10.1109/CVPR.2010.5539928
[3]   Deep Visual-Semantic Hashing for Cross-Modal Retrieval [J].
Cao, Yue ;
Long, Mingsheng ;
Wang, Jianmin ;
Yang, Qiang ;
Yu, Philip S. .
KDD'16: PROCEEDINGS OF THE 22ND ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2016, :1445-1454
[4]  
Chua T.-S., 2009, P ACM INT C IM VID R
[5]   On the Role of Correlation and Abstraction in Cross-Modal Multimedia Retrieval [J].
Costa Pereira, Jose ;
Coviello, Emanuele ;
Doyle, Gabriel ;
Rasiwasia, Nikhil ;
Lanckriet, Gert R. G. ;
Levy, Roger ;
Vasconcelos, Nuno .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2014, 36 (03) :521-535
[6]   Triplet-Based Deep Hashing Network for Cross-Modal Retrieval [J].
Deng, Cheng ;
Chen, Zhaojia ;
Liu, Xianglong ;
Gao, Xinbo ;
Tao, Dacheng .
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2018, 27 (08) :3893-3903
[7]   Collective Matrix Factorization Hashing for Multimodal Data [J].
Ding, Guiguang ;
Guo, Yuchen ;
Zhou, Jile .
2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014, :2083-2090
[8]  
Gong Y., 2012, NIPS, P1196
[9]   Iterative Quantization: A Procrustean Approach to Learning Binary Codes for Large-Scale Image Retrieval [J].
Gong, Yunchao ;
Lazebnik, Svetlana ;
Gordo, Albert ;
Perronnin, Florent .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2013, 35 (12) :2916-2929
[10]   Canonical correlation analysis: An overview with application to learning methods [J].
Hardoon, DR ;
Szedmak, S ;
Shawe-Taylor, J .
NEURAL COMPUTATION, 2004, 16 (12) :2639-2664