Learning Cross-Modal Retrieval with Noisy Labels

Cited by: 73
Authors
Hu, Peng [1 ,2 ]
Peng, Xi [1 ]
Zhu, Hongyuan [2 ]
Zhen, Liangli [3 ]
Lin, Jie [2 ]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
[2] Agcy Sci Technol & Res, Inst Infocomm Res, Singapore, Singapore
[3] Agcy Sci Technol & Res, Inst High Performance Comp, Singapore, Singapore
Source
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021 | 2021
Funding
National Key Research and Development Program of China;
Keywords
HASHING NETWORK;
DOI
10.1109/CVPR46437.2021.00536
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, cross-modal retrieval has been emerging with the help of deep multimodal learning. However, even for unimodal data, collecting large-scale, well-annotated data is expensive and time-consuming, to say nothing of the additional challenges posed by multiple modalities. Although crowd-sourced annotation, e.g., Amazon Mechanical Turk, can be utilized to mitigate the labeling cost, non-expert annotating inevitably introduces noise into the labels. To tackle this challenge, this paper presents a general Multimodal Robust Learning framework (MRL) for learning with multimodal noisy labels, which mitigates noisy samples and correlates distinct modalities simultaneously. Specifically, we propose a Robust Clustering loss (RC) that makes the deep networks focus on clean samples instead of noisy ones. Besides, a simple yet effective multimodal loss function, called Multimodal Contrastive loss (MC), is proposed to maximize the mutual information between different modalities, thus alleviating the interference of noisy samples and the cross-modal discrepancy. Extensive experiments are conducted on four widely used multimodal datasets to demonstrate the effectiveness of the proposed approach by comparison with 14 state-of-the-art methods.
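The abstract describes the Multimodal Contrastive loss only at the level of maximizing cross-modal mutual information; its exact formulation is not given in this record. As a rough illustration, the sketch below implements a generic InfoNCE-style cross-modal contrastive objective in PyTorch. The function name, the temperature value, and the symmetric two-direction form are assumptions for illustration, not the authors' code.

    import torch
    import torch.nn.functional as F

    def multimodal_contrastive_loss(img_emb, txt_emb, temperature=0.1):
        # Illustrative sketch, not the paper's MC loss: matched image-text
        # pairs sit on the diagonal of the similarity matrix and act as
        # positives; all other in-batch pairs act as negatives.
        img_emb = F.normalize(img_emb, dim=1)  # L2-normalize so dot products
        txt_emb = F.normalize(txt_emb, dim=1)  # become cosine similarities
        logits = img_emb @ txt_emb.t() / temperature
        targets = torch.arange(img_emb.size(0), device=img_emb.device)
        # Symmetrize over both retrieval directions (image->text, text->image).
        loss_i2t = F.cross_entropy(logits, targets)
        loss_t2i = F.cross_entropy(logits.t(), targets)
        return 0.5 * (loss_i2t + loss_t2i)

Minimizing this InfoNCE-style cross-entropy is a standard surrogate for maximizing a lower bound on the mutual information between the two modalities' embeddings, which matches the stated goal of the MC loss.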
Pages: 5399-5409
Page count: 11