Multilevel Deep Semantic Feature Asymmetric Network for Cross-Modal Hashing Retrieval

Times Cited: 0
Authors
Jiang, Xiaolong [1 ]
Fan, Jiabao [1 ]
Zhang, Jie [1 ]
Lin, Ziyong [1 ]
Li, Mingyong [1 ]
Affiliations
[1] Chongqing Normal Univ, Sch Comp Technol & Informat Sci, Chongqing 401331, Peoples R China
Keywords
Semantics; Feature extraction; Data mining; Accuracy; Deep learning; Hash functions; Binary codes; Cross-modal hashing; cross-modal retrieval; multi-feature method; graph convolutional network
DOI
10.1109/TLA.2024.10620388
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Cross-modal hashing retrieval is widely used because of its efficiency and low storage overhead. In supervised cross-modal hashing retrieval, existing methods fall short in refining data features, so the extracted semantic information lacks detail and data similarity is reflected inaccurately. The challenge is to exploit the multilevel deep semantic features of the data to generate more refined hash representations, thereby narrowing the semantic gap and heterogeneity caused by different modalities. To address this problem, we propose a multilevel deep semantic feature asymmetric network (MDSAN). First, this architecture explores the multilevel deep features of the data, generating more accurate hash representations under the guidance of richer supervised information. Second, we investigate preserving asymmetric similarity within and across modalities, which allows the multilevel deep features to be used more comprehensively to bridge the gap among data from diverse modalities. Our network architecture effectively improves model accuracy and robustness. Extensive experiments on three datasets show that MDSAN achieves significant improvements over current methods.
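To make the abstract's core ideas concrete, the following is a minimal, self-contained sketch of cross-modal hashing with an asymmetric similarity objective. It is an illustration only, not the paper's MDSAN method: the random features, the projections `W_img`/`W_txt`, the code length `r`, and the label-similarity matrix `S` are all hypothetical stand-ins for what a trained network would produce.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy continuous features for two modalities (e.g., image and text), 4 items each.
img_feat = rng.normal(size=(4, 16))
txt_feat = rng.normal(size=(4, 16))

# Hypothetical learned projections mapping each modality into a shared 8-bit code space.
W_img = rng.normal(size=(16, 8))
W_txt = rng.normal(size=(16, 8))

def to_codes(feat, W):
    """Binarize projected features into {-1, +1} hash codes via sign()."""
    return np.sign(feat @ W)

B_img = to_codes(img_feat, W_img)
B_txt = to_codes(txt_feat, W_txt)

def hamming(a, b):
    """Hamming distance between {-1, +1} code matrices: (bits - inner product) / 2."""
    return (a.shape[-1] - a @ b.T) / 2

# 4x4 cross-modal distance matrix: retrieval ranks text items per image query.
D = hamming(B_img, B_txt)

# Asymmetric similarity preservation (common in asymmetric hashing, sketched here):
# compare the *real-valued* network output of one modality against the *binary*
# codes of the other, pushing inner products toward r * S, where S encodes
# pairwise label similarity and r is the code length.
S = np.eye(4)                # assume item i shares labels only with its own counterpart
r = 8                        # code length in bits
real_img = img_feat @ W_img  # continuous outputs before binarization
loss = np.mean((real_img @ B_txt.T - r * S) ** 2)
```

Using binary codes turns similarity search into cheap Hamming-distance comparisons, which is what gives hashing retrieval its efficiency and low storage overhead; the asymmetric term lets gradients flow through the continuous outputs while the other modality stays quantized.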
Pages: 621-631
Page Count: 11