Soft Contrastive Cross-Modal Retrieval

Times Cited: 0
Authors
Song, Jiayu [1]
Hu, Yuxuan [1]
Zhu, Lei [2]
Zhang, Chengyuan [3]
Zhang, Jian [1]
Zhang, Shichao [1]
Affiliations
[1] Cent South Univ, Sch Comp Sci & Engn, Changsha 410083, Peoples R China
[2] Hunan Agr Univ, Coll Informat & Intelligence, Changsha 410128, Peoples R China
[3] Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410082, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2024, Vol. 14, Issue 5
Funding
National Natural Science Foundation of China;
Keywords
cross-modal retrieval; soft contrastive learning; smooth label learning; common subspace; deep learning; NEURAL-NETWORKS; REPRESENTATION;
DOI
10.3390/app14051944
Chinese Library Classification
O6 [Chemistry];
Discipline Classification Code
0703;
Abstract
Cross-modal retrieval plays a key role in Natural Language Processing; it aims to efficiently retrieve items in one modality using a query from another. Despite the notable achievements of existing cross-modal retrieval methods, the complexity of the embedding space grows as models become more complex, yielding representations that are less interpretable and prone to overfitting. Moreover, most existing methods achieve strong results on datasets free of errors and noise, an idealized setting that leaves the trained models lacking robustness. To address these problems, this paper proposes a novel approach, Soft Contrastive Cross-Modal Retrieval (SCCMR), which integrates a deep cross-modal model with soft contrastive learning and smooth-label cross-entropy learning to strengthen common-subspace embedding and to improve the generalizability and robustness of the model. To verify the performance and effectiveness of SCCMR, we conduct extensive experiments against 12 state-of-the-art methods on three multi-modal datasets, using image-text retrieval as a showcase. The experimental results show that the proposed method outperforms the baselines.
Pages: 18
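
For illustration only, and not taken from the paper itself: the abstract names soft contrastive learning with smooth-label cross-entropy over a common embedding space. Below is a minimal Python (PyTorch) sketch of that general idea, an InfoNCE-style image-text contrastive loss whose hard one-hot targets are replaced by smoothed soft labels. The function name soft_contrastive_loss and the default temperature and smoothing values are assumptions, not the authors' formulation.

# Minimal sketch (assumed formulation, not SCCMR's exact loss): an image-text
# contrastive objective in which the one-hot targets of the softmax over
# in-batch candidates are replaced by smoothed soft labels.
import torch
import torch.nn.functional as F

def soft_contrastive_loss(img_emb, txt_emb, temperature=0.07, smoothing=0.1):
    # img_emb, txt_emb: (batch, dim) embeddings of matched image-text pairs.
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature  # cosine similarities / temperature
    n = logits.size(0)
    # Soft targets: (1 - smoothing) on the matched pair, the remainder spread
    # uniformly over the other candidates in the batch.
    targets = torch.full_like(logits, smoothing / (n - 1))
    targets.fill_diagonal_(1.0 - smoothing)
    loss_i2t = -(targets * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
    loss_t2i = -(targets * F.log_softmax(logits.t(), dim=1)).sum(dim=1).mean()
    return 0.5 * (loss_i2t + loss_t2i)

# Example with random embeddings standing in for image and text encoder outputs.
img = torch.randn(8, 256)
txt = torch.randn(8, 256)
print(soft_contrastive_loss(img, txt))

Spreading part of the target mass over non-matching candidates is standard label smoothing applied to the contrastive softmax; it softens the penalty on noisy or partially matching pairs, which is consistent with the robustness motivation in the abstract, though the paper's actual loss may differ.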