Deep fused two-step cross-modal hashing with multiple semantic supervision

Cited by: 0
Authors
Peipei Kang
Zehang Lin
Zhenguo Yang
Alexander M. Bronstein
Qing Li
Wenyin Liu
Affiliations
[1] Guangdong University of Technology,School of Computer Science and Technology
[2] Technion - Israel Institute of Technology,Computer Science Department
[3] Hong Kong Polytechnic University,Department of Computing
Source
Multimedia Tools and Applications | 2022, Vol. 81
Keywords
Cross-modal hashing; Deep fusion network; Semantic reconstruction; Two-step learning; Supervised learning;
DOI: not available
Abstract
Existing cross-modal hashing methods ignore informative multimodal joint information and fail to fully exploit semantic labels. In this paper, we propose a deep fused two-step cross-modal hashing (DFTH) framework with multiple semantic supervision. In the first step, DFTH learns unified hash codes for instances through a fusion network; semantic label reconstruction and similarity reconstruction are introduced to obtain binary codes that are informative, discriminative, and semantic-similarity preserving. In the second step, two modality-specific hash networks are learned under the supervision of common hash-code reconstruction, label reconstruction, and intra-modal and inter-modal semantic similarity reconstruction, so that they can generate semantics-preserving binary codes for out-of-sample queries. To deal with the vanishing gradients caused by binarization, the continuous, differentiable tanh function is introduced to approximate the discrete sign function, allowing the networks to back-propagate via automatic gradient computation. Extensive experiments on MIRFlickr25K and NUS-WIDE show the superiority of DFTH over state-of-the-art methods.
Pages: 15653-15670
Page count: 17