Learning Cross-Modal Retrieval with Noisy Labels

Cited by: 73
Authors
Hu, Peng [1 ,2 ]
Peng, Xi [1 ]
Zhu, Hongyuan [2 ]
Zhen, Liangli [3 ]
Lin, Jie [2 ]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
[2] Agcy Sci Technol & Res, Inst Infocomm Res, Singapore, Singapore
[3] Agcy Sci Technol & Res, Inst High Performance Comp, Singapore, Singapore
Source
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021 | 2021
Funding
National Key Research and Development Program of China
Keywords
HASHING NETWORK;
DOI
10.1109/CVPR46437.2021.00536
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Cross-modal retrieval has recently been flourishing with the help of deep multimodal learning. However, even for unimodal data, collecting large-scale well-annotated data is expensive and time-consuming, to say nothing of the additional challenges introduced by multiple modalities. Although crowdsourced annotation, e.g., Amazon Mechanical Turk, can be utilized to reduce the labeling cost, it inevitably introduces label noise because the annotators are non-experts. To tackle this challenge, this paper presents a general Multimodal Robust Learning framework (MRL) for learning with multimodal noisy labels, which mitigates the effect of noisy samples and correlates distinct modalities simultaneously. To be specific, we propose a Robust Clustering loss (RC) to make the deep networks focus on clean samples instead of noisy ones. Besides, a simple yet effective multimodal loss function, called Multimodal Contrastive loss (MC), is proposed to maximize the mutual information between different modalities, thus alleviating the interference of noisy samples and the cross-modal discrepancy. Extensive experiments are conducted on four widely-used multimodal datasets to demonstrate the effectiveness of the proposed approach by comparing it with 14 state-of-the-art methods.
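The abstract names the two losses but gives no formulas. The following is a minimal PyTorch sketch of what they might look like: an InfoNCE-style objective for the MC idea (InfoNCE is a standard lower bound on mutual information) and a bounded generalized-cross-entropy term for the RC idea. The function names, the temperature parameter, and both concrete loss forms are illustrative assumptions, not the paper's exact RC/MC definitions.

```python
# Illustrative sketch only, NOT the paper's exact losses: the InfoNCE
# form for MC and the generalized-cross-entropy form for RC are
# assumptions chosen to illustrate the two ideas in the abstract.
import torch
import torch.nn.functional as F

def multimodal_contrastive_loss(img_emb, txt_emb, temperature=0.1):
    """MC idea: pull matched image/text pairs together and push apart all
    other in-batch pairs; InfoNCE lower-bounds cross-modal mutual info."""
    img = F.normalize(img_emb, dim=1)
    txt = F.normalize(txt_emb, dim=1)
    logits = img @ txt.t() / temperature          # (B, B) cosine similarities
    targets = torch.arange(img.size(0), device=img.device)
    # Symmetric over image-to-text and text-to-image retrieval directions.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

def robust_clustering_loss(logits, labels, q=0.7):
    """RC idea: a bounded loss whose gradient shrinks on low-confidence
    (likely mislabeled) samples, so training focuses on clean ones.
    Shown here as generalized cross-entropy, L_q = (1 - p_y^q) / q."""
    p_y = F.softmax(logits, dim=1).gather(1, labels.unsqueeze(1)).squeeze(1)
    return ((1.0 - p_y.clamp_min(1e-8) ** q) / q).mean()
```

In a setup like the paper describes, the two terms would be combined during training so that the robust term tempers label noise while the contrastive term aligns the modalities in a common space.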
Pages: 5399-5409
Page count: 11