Learning Cross-Modal Retrieval with Noisy Labels

Cited by: 73
Authors
Hu, Peng [1 ,2 ]
Peng, Xi [1 ]
Zhu, Hongyuan [2 ]
Zhen, Liangli [3 ]
Lin, Jie [2 ]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
[2] Agcy Sci Technol & Res, Inst Infocomm Res, Singapore, Singapore
[3] Agcy Sci Technol & Res, Inst High Performance Comp, Singapore, Singapore
Source
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021 | 2021
Funding
National Key Research and Development Program of China;
Keywords
HASHING NETWORK;
DOI
10.1109/CVPR46437.2021.00536
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, cross-modal retrieval has been emerging with the help of deep multimodal learning. However, even for unimodal data, collecting large-scale, well-annotated data is expensive and time-consuming, to say nothing of the additional challenges posed by multiple modalities. Although crowd-sourced annotation, e.g., Amazon Mechanical Turk, can be utilized to mitigate the labeling cost, non-expert annotating inevitably introduces noise into the labels. To tackle this challenge, this paper presents a general Multimodal Robust Learning framework (MRL) for learning with multimodal noisy labels, which mitigates noisy samples and correlates distinct modalities simultaneously. Specifically, we propose a Robust Clustering loss (RC) that makes the deep networks focus on clean samples instead of noisy ones. Besides, a simple yet effective multimodal loss function, called Multimodal Contrastive loss (MC), is proposed to maximize the mutual information between different modalities, thus alleviating the interference of noisy samples and the cross-modal discrepancy. Extensive experiments are conducted on four widely used multimodal datasets to demonstrate the effectiveness of the proposed approach by comparison with 14 state-of-the-art methods.
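The abstract describes the Multimodal Contrastive loss only at the level of maximizing cross-modal mutual information; its exact formulation is not given in this record. As a rough illustration, the sketch below implements a generic InfoNCE-style cross-modal contrastive objective in PyTorch. The function name, the temperature value, and the symmetric two-direction form are assumptions for illustration, not the authors' code.

    import torch
    import torch.nn.functional as F

    def multimodal_contrastive_loss(img_emb, txt_emb, temperature=0.1):
        # Illustrative sketch, not the paper's MC loss: matched image-text
        # pairs sit on the diagonal of the similarity matrix and act as
        # positives; all other in-batch pairs act as negatives.
        img_emb = F.normalize(img_emb, dim=1)  # L2-normalize so dot products
        txt_emb = F.normalize(txt_emb, dim=1)  # become cosine similarities
        logits = img_emb @ txt_emb.t() / temperature
        targets = torch.arange(img_emb.size(0), device=img_emb.device)
        # Symmetrize over both retrieval directions (image->text, text->image).
        loss_i2t = F.cross_entropy(logits, targets)
        loss_t2i = F.cross_entropy(logits.t(), targets)
        return 0.5 * (loss_i2t + loss_t2i)

Minimizing this InfoNCE-style cross-entropy is a standard surrogate for maximizing a lower bound on the mutual information between the two modalities' embeddings, which matches the stated goal of the MC loss.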
Pages: 5399-5409
Page count: 11