Unsupervised Multi-modal Hashing for Cross-Modal Retrieval

Cited by: 8
Authors
Yu, Jun [1 ,2 ]
Wu, Xiao-Jun [1 ,2 ]
Zhang, Donglin [1 ,2 ]
Affiliations
[1] Jiangnan Univ, Sch Artificial Intelligence & Comp Sci, Wuxi 214122, Jiangsu, Peoples R China
[2] Jiangnan Univ, Jiangsu Prov Engn Lab Pattern Recognit & Computat, Wuxi 214122, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multimodal hashing; Cross-modal retrieval; Unsupervised learning; Manifold preserving; SIMILARITY;
DOI
10.1007/s12559-021-09847-4
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The explosive growth of multimedia data on the Internet has magnified the challenge of information retrieval. Multimedia data usually emerge in different modalities, such as image, text, video, and audio. Unsupervised cross-modal hashing techniques, which support searching across multi-modal data, have gained importance in large-scale retrieval tasks because of their low storage cost and high efficiency. Current methods learn hash functions by transforming high-dimensional data into discrete hash codes. However, the original manifold structure and semantic correlation are not preserved well in compact hash codes. We propose a novel unsupervised cross-modal hashing method that copes with this problem from two perspectives. On the one hand, the semantic correlation in the textual space and the locally geometric structure in the visual space are reconstructed seamlessly and simultaneously by unified hashing features. On the other hand, ℓ2,1-norm penalties are imposed on the projection matrices separately to learn relevant and discriminative hash codes. The experimental results indicate that the proposed method achieves improvements of 1%, 6%, 9%, and 2% over the best comparison method on four publicly available datasets (Wiki, PASCAL-VOC, UCI Handwritten Digit, and NUS-WIDE), respectively. In conclusion, the proposed framework, which combines hash-function learning and multimodal graph embedding, is effective in learning hash codes and achieves superior retrieval performance compared to state-of-the-art methods.
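For context on the ℓ2,1-norm penalty and graph-embedding terms mentioned in the abstract, the sketch below gives the standard definition of the norm and an illustrative objective of this family of methods. The specific objective form, the symbols (B for unified hash codes, X^{(v)} and X^{(t)} for visual and textual features, P_v and P_t for the projection matrices, L^{(v)} and L^{(t)} for the graph Laplacians), and the trade-off parameters alpha and beta are assumptions for illustration, not the paper's exact formulation.

% The l_{2,1}-norm of a projection matrix P with rows p_i is the sum of row-wise l_2 norms;
% it drives whole rows of P toward zero and thereby selects relevant, discriminative features.
\[
  \|P\|_{2,1} = \sum_{i=1}^{d} \|p_i\|_2 = \sum_{i=1}^{d} \sqrt{\sum_{j=1}^{r} P_{ij}^2}
\]
% Illustrative objective (assumed form, not the paper's): reconstruct the unified codes B from
% each modality, preserve each modality's graph structure via its Laplacian L^{(m)}, and
% regularize each projection P_m with the l_{2,1}-norm.
\[
  \min_{B,\,P_v,\,P_t} \;\sum_{m \in \{v,t\}} \Big( \|B - X^{(m)} P_m\|_F^2
  + \alpha\, \mathrm{tr}\big(B^{\top} L^{(m)} B\big)
  + \beta\, \|P_m\|_{2,1} \Big)
  \quad \text{s.t.}\ B \in \{-1,+1\}^{n \times r}
\]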
Pages: 1159-1171
Page count: 13