Learning Cross-Modal Retrieval with Noisy Labels

Cited by: 73
Authors
Hu, Peng [1 ,2 ]
Peng, Xi [1 ]
Zhu, Hongyuan [2 ]
Zhen, Liangli [3 ]
Lin, Jie [2 ]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
[2] Agcy Sci Technol & Res, Inst Infocomm Res, Singapore, Singapore
[3] Agcy Sci Technol & Res, Inst High Performance Comp, Singapore, Singapore
Source
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021 | 2021
Funding
National Key Research and Development Program of China;
Keywords
HASHING NETWORK;
DOI
10.1109/CVPR46437.2021.00536
CLC Number
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, cross-modal retrieval has been emerging with the help of deep multimodal learning. However, collecting large-scale well-annotated data is expensive and time-consuming even for unimodal data, to say nothing of the additional challenges posed by multiple modalities. Although crowd-sourced annotation, e.g., Amazon's Mechanical Turk, can be utilized to mitigate the labeling cost, non-expert annotation inevitably introduces noisy labels. To tackle this challenge, this paper presents a general Multimodal Robust Learning framework (MRL) for learning with multimodal noisy labels, which mitigates noisy samples and correlates distinct modalities simultaneously. Specifically, we propose a Robust Clustering loss (RC) to make the deep networks focus on clean samples instead of noisy ones. In addition, a simple yet effective multimodal loss function, called Multimodal Contrastive loss (MC), is proposed to maximize the mutual information between different modalities, thus alleviating the interference of noisy samples and the cross-modal discrepancy. Extensive experiments are conducted on four widely-used multimodal datasets to demonstrate the effectiveness of the proposed approach in comparison with 14 state-of-the-art methods.
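The Multimodal Contrastive loss described above maximizes mutual information between modalities by pulling matched image/text pairs together and pushing mismatched ones apart. A minimal sketch of such an objective, in the common InfoNCE style, is shown below; the paper's exact MC formulation may differ, and the function name, the `temperature` parameter, and the NumPy implementation are illustrative assumptions, not the authors' code.

```python
import numpy as np

def multimodal_contrastive_loss(img_emb, txt_emb, temperature=0.1):
    """InfoNCE-style contrastive loss over paired image/text embeddings.

    The i-th image and i-th text form a positive pair; every other text in
    the batch acts as a negative. Minimizing this loss maximizes a lower
    bound on the mutual information between the two modalities.
    """
    # L2-normalize so dot products become cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (batch, batch) similarity matrix
    # Row-wise log-softmax, then cross-entropy against the diagonal
    # (the matched pairs)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))
```

As a sanity check, perfectly aligned embeddings yield a lower loss than embeddings whose pairing has been shuffled, which is the behavior a cross-modal contrastive objective must exhibit.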
Pages: 5399-5409
Page count: 11