Modality-Invariant Asymmetric Networks for Cross-Modal Hashing

Cited by: 75
Authors
Zhang, Zheng [1 ,2 ]
Luo, Haoyang [3 ]
Zhu, Lei [4 ]
Lu, Guangming [5 ]
Shen, Heng Tao [2 ]
Affiliations
[1] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen 518055, Peoples R China
[2] Peng Cheng Lab, Shenzhen 518055, Peoples R China
[3] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen 518055, Peoples R China
[4] Shandong Normal Univ, Sch Informat Sci & Engn, Jinan 250358, Peoples R China
[5] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen 610054, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Semantics; Binary codes; Electronic mail; Training; Representation learning; Measurement; Feature extraction; Deep asymmetric learning; modality-alignment network; binary code learning; cross-modal hashing;
DOI
10.1109/TKDE.2022.3144352
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Cross-modal hashing has garnered considerable attention and achieved great success in many cross-media similarity search applications due to its prominent computational efficiency and low storage overhead. However, it remains challenging to jointly bridge the semantic and heterogeneity gaps across different modalities by effectively exploiting multilevel semantics over the entire database. In this paper, we propose a novel Modality-Invariant Asymmetric Networks (MIAN) architecture, which explores asymmetric intra- and inter-modal similarity preservation under a probabilistic modality-alignment framework. Specifically, an intra-modal asymmetric network is conceived to capture the query-vs-all internal pairwise similarities for each modality in a probabilistic asymmetric learning manner. Moreover, an inter-modal asymmetric network is deployed to fully harness the cross-modal semantic similarities, supported by a maximum inner product search formulation between two distinct hash embeddings. In particular, the pairwise, piecewise, and transformed semantics are jointly considered in one unified semantic-preserving hash code learning scheme. Furthermore, we construct a modality-alignment network to distill redundancy-free visual features and maximize the conditional bottleneck information between different modalities. Such a network can close the heterogeneity and domain-shift gaps across modalities and yield discriminative modality-invariant hash codes. Extensive experiments demonstrate that our MIAN approach outperforms state-of-the-art cross-modal hashing methods.
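The inter-modal network described above scores cross-modal relevance via maximum inner product search between hash embeddings: database items are quantized to binary codes, while the query side keeps a relaxed (real-valued) embedding, making the scoring asymmetric. A minimal sketch of this idea, not the authors' implementation; the function names, code length, and data below are illustrative assumptions:

```python
import numpy as np

def to_binary(x):
    """Quantize real-valued embeddings to {-1, +1} hash codes."""
    return np.where(x >= 0, 1, -1)

def asymmetric_scores(query_embedding, db_codes):
    """Asymmetric similarity: inner products between one relaxed
    (real-valued) query embedding and binary database codes.
    Higher score means more similar; retrieval takes the argmax."""
    return db_codes @ query_embedding

# Toy database of three 4-bit hash codes (illustrative only).
db = to_binary(np.array([[ 1.0, -1.0,  1.0, -1.0],
                         [ 1.0,  1.0,  1.0,  1.0],
                         [-1.0, -1.0, -1.0,  1.0]]))

# A relaxed query embedding whose signs match database row 1.
q = np.array([0.9, 0.8, 0.7, 1.1])

scores = asymmetric_scores(q, db)   # one score per database item
best = int(np.argmax(scores))       # index of the retrieved item
```

Keeping the query real-valued avoids the quantization loss on the query side while the database still enjoys compact binary storage, which is the usual motivation for asymmetric hashing schemes.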
Pages: 5091-5104 (14 pages)