LongReMix: Robust learning with high confidence samples in a noisy label environment

Cited by: 45
Authors
Cordeiro, Filipe R. [3 ]
Sachdeva, Ragav [2 ]
Belagiannis, Vasileios [5 ]
Reid, Ian [1 ]
Carneiro, Gustavo [1 ,4 ]
Affiliations
[1] Australian Inst Machine Learning, Sch Comp Sci, Adelaide, Australia
[2] Univ Oxford, Dept Engn Sci, Visual Geometry Grp, Oxford, England
[3] Univ Fed Rural Pernambuco, Dept Comp, Visual Comp Lab, Recife, Brazil
[4] Univ Surrey, Ctr Vis Speech & Signal Proc, Guildford, England
[5] Otto Guericke Univ Magdeburg, Magdeburg, Germany
Funding
Australian Research Council;
Keywords
Noisy label learning; Deep learning; Empirical vicinal risk; Semi-supervised learning;
DOI
10.1016/j.patcog.2022.109013
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
State-of-the-art noisy-label learning algorithms rely on unsupervised learning to classify training samples as clean or noisy, followed by semi-supervised learning (SSL) that minimises the empirical vicinal risk using a labelled set formed by samples classified as clean, and an unlabelled set of samples classified as noisy. The classification accuracy of such noisy-label learning methods depends on the precision of the unsupervised classification of clean and noisy samples, and on the robustness of SSL to small clean sets. We address these points with a new noisy-label training algorithm, called LongReMix, which improves the precision of the unsupervised classification of clean and noisy samples and the robustness of SSL to small clean sets with a two-stage learning process. Stage one of LongReMix finds a small but precise high-confidence clean set, and stage two augments this high-confidence clean set with new clean samples and oversamples the clean data to increase the robustness of SSL to small clean sets. We test LongReMix on CIFAR-10 and CIFAR-100 with synthetic noisy labels, and on the real-world noisy-label benchmarks CNWL (Red Mini-ImageNet), WebVision, Clothing1M, and Food101-N. The results show that LongReMix produces significantly better classification accuracy than competing approaches, particularly in high-noise-rate problems, and achieves state-of-the-art performance on most datasets. The code is available at https://github.com/filipe-research/LongReMix. (c) 2022 Elsevier Ltd. All rights reserved.
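The two-stage idea in the abstract can be sketched in a few lines. This is a minimal, illustrative simplification, not the paper's exact procedure: it assumes a small-loss criterion (samples a network fits with consistently low loss across epochs tend to be clean), and the function names, thresholds, and the voting-over-epochs rule are hypothetical stand-ins for LongReMix's actual high-confidence selection.

```python
import numpy as np

def find_high_confidence_clean_set(losses, threshold=0.5, min_epochs=3):
    """Stage one (simplified): keep samples whose normalised per-sample loss
    stays below `threshold` in at least `min_epochs` epochs.
    `losses` has shape (n_epochs, n_samples)."""
    norm = (losses - losses.min()) / (losses.max() - losses.min() + 1e-8)
    votes = (norm < threshold).sum(axis=0)          # per-sample count of "looks clean" epochs
    return np.where(votes >= min_epochs)[0]          # indices of the high-confidence clean set

def oversample_clean_set(indices, target_size, rng):
    """Stage two (simplified): oversample the clean set so the SSL stage sees
    `target_size` labelled samples even when few samples are clean."""
    extra = rng.choice(indices, size=max(0, target_size - len(indices)))
    return np.concatenate([indices, extra])

# Toy usage: 5 low-loss (clean-looking) and 5 high-loss (noisy-looking) samples over 4 epochs.
rng = np.random.default_rng(0)
losses = np.tile(np.concatenate([np.full(5, 0.1), np.full(5, 2.0)]), (4, 1))
clean = find_high_confidence_clean_set(losses)
labelled = oversample_clean_set(clean, target_size=8, rng=rng)
```

In the full method, the selected set feeds the labelled side of the SSL objective and the remaining samples form the unlabelled side; the oversampling step is what keeps SSL stable when the clean set is small.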
Pages: 14