RONO: Robust Discriminative Learning with Noisy Labels for 2D-3D Cross-Modal Retrieval

Cited by: 11
Authors
Feng, Yanglin [1]
Zhu, Hongyuan [2]
Peng, Dezhong [1, 3, 4]
Peng, Xi [1]
Hu, Peng [1]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu, Peoples R China
[2] ASTAR, Inst Infocomm Res I2R, Singapore, Singapore
[3] Sichuan Zhiqian Technol Co Ltd, Chengdu, Peoples R China
[4] Chengdu Ruibei Yingte Informat Technol Co Ltd, Chengdu, Peoples R China
Source
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2023
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China
DOI
10.1109/CVPR52729.2023.01117
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recently, with the advent of Metaverse and AI Generated Content, cross-modal retrieval becomes popular with a burst of 2D and 3D data. However, this problem is challenging given the heterogeneous structure and semantic discrepancies. Moreover, imperfect annotations are ubiquitous given the ambiguous 2D and 3D content, thus inevitably producing noisy labels to degrade the learning performance. To tackle the problem, this paper proposes a robust 2D-3D retrieval framework (RONO) to robustly learn from noisy multimodal data. Specifically, one novel Robust Discriminative Center Learning mechanism (RDCL) is proposed in RONO to adaptively distinguish clean and noisy samples for respectively providing them with positive and negative optimization directions, thus mitigating the negative impact of noisy labels. Besides, we present a Shared Space Consistency Learning mechanism (SSCL) to capture the intrinsic information inside the noisy data by minimizing the cross-modal and semantic discrepancy between common space and label space simultaneously. Comprehensive mathematical analyses are given to theoretically prove the noise tolerance of the proposed method. Furthermore, we conduct extensive experiments on four 3D-model multimodal datasets to verify the effectiveness of our method by comparing it with 15 state-of-the-art methods. Code is available at https://github.com/penghu-cs/RONO.
Pages: 11610-11619
Page count: 10
Related papers
45 in total
  • [1] Andrew G., 2013, Proceedings of the 30th International Conference on Machine Learning, P1247
  • [2] Arazo E, 2019, PR MACH LEARN RES, V97
  • [3] Arpit D, 2017, PR MACH LEARN RES, V70
  • [4] Bekker AJ, 2016, INT CONF ACOUST SPEE, P2682, DOI 10.1109/ICASSP.2016.7472164
  • [5] Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection
    Belhumeur, PN
    Hespanha, JP
    Kriegman, DJ
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1997, 19 (07) : 711 - 720
  • [6] Roboscan: a combined 2D and 3D vision system for improved speed and flexibility in pick-and-place operation
    Bellandi, Paolo
    Docchio, Franco
    Sansoni, Giovanna
    [J]. INTERNATIONAL JOURNAL OF ADVANCED MANUFACTURING TECHNOLOGY, 2013, 69 (5-8) : 1873 - 1886
  • [7] Webly Supervised Learning of Convolutional Networks
    Chen, Xinlei
    Gupta, Abhinav
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015 : 1431 - 1439
  • [8] Cong Bai, 2020, ICMR '20: Proceedings of the 2020 International Conference on Multimedia Retrieval, P525, DOI 10.1145/3372278.3390711
  • [9] Ghosh A, 2017, AAAI CONF ARTIF INTE, P1919
  • [10] Goodfellow Ian, 2020, Communications of the ACM, V63, P3