Soft Contrastive Cross-Modal Retrieval

Cited by: 0
Authors
Song, Jiayu [1]
Hu, Yuxuan [1]
Zhu, Lei [2]
Zhang, Chengyuan [3]
Zhang, Jian [1]
Zhang, Shichao [1]
Affiliations
[1] Cent South Univ, Sch Comp Sci & Engn, Changsha 410083, Peoples R China
[2] Hunan Agr Univ, Coll Informat & Intelligence, Changsha 410128, Peoples R China
[3] Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410082, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 05
Funding
National Natural Science Foundation of China;
Keywords
cross-modal retrieval; soft contrastive learning; smooth label learning; common subspace; deep learning; NEURAL-NETWORKS; REPRESENTATION;
DOI
10.3390/app14051944
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703 ;
Abstract
Cross-modal retrieval, which aims to efficiently retrieve items in one modality given a query in another, plays a key role in natural language processing. Despite the notable achievements of existing cross-modal retrieval methods, the embedding space grows more complex as models become more complex, yielding representations that are less interpretable and prone to overfitting. Moreover, most existing methods report strong results on datasets assumed to be free of errors and noise; this assumption is idealized, so the trained models lack robustness. To address these problems, we propose Soft Contrastive Cross-Modal Retrieval (SCCMR), a novel approach that integrates a deep cross-modal model with soft contrastive learning and smooth label cross-entropy learning to improve common subspace embedding and the generalizability and robustness of the model. To evaluate the performance and effectiveness of SCCMR, we conduct extensive experiments against 12 state-of-the-art methods on three multi-modal datasets, using image-text retrieval as a showcase. The experimental results show that the proposed method outperforms the baselines.
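The abstract names smooth label cross-entropy learning but does not give its formula. As a rough illustration only, the sketch below shows the generic label-smoothing technique: the one-hot target is mixed with a uniform distribution before the cross-entropy is computed. The function names, the epsilon value of 0.1, and the uniform smoothing scheme are assumptions made for this example and may differ from the formulation actually used in SCCMR.

```python
import math

def smooth_labels(true_class, num_classes, epsilon=0.1):
    """Smoothed target: (1 - epsilon) on the true class plus epsilon
    spread uniformly over all classes (assumed scheme, not from the paper)."""
    uniform = epsilon / num_classes
    return [(1.0 - epsilon if k == true_class else 0.0) + uniform
            for k in range(num_classes)]

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def smooth_cross_entropy(logits, true_class, epsilon=0.1):
    """Cross-entropy between the smoothed target and the softmax of logits."""
    targets = smooth_labels(true_class, len(logits), epsilon)
    log_probs = log_softmax(logits)
    return -sum(t * lp for t, lp in zip(targets, log_probs))

# Hypothetical logits for a 3-class example; class 0 is the true label.
loss_hard = smooth_cross_entropy([4.0, 1.0, 0.5], true_class=0, epsilon=0.0)
loss_soft = smooth_cross_entropy([4.0, 1.0, 0.5], true_class=0, epsilon=0.1)
```

With epsilon set to 0 the function reduces to ordinary cross-entropy; a positive epsilon penalizes over-confident predictions, which is the usual motivation for applying label smoothing when labels may be noisy.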
Pages: 18