LongReMix: Robust learning with high confidence samples in a noisy label environment

Cited by: 60
Authors
Cordeiro, Filipe R. [3 ]
Sachdeva, Ragav [2 ]
Belagiannis, Vasileios [5 ]
Reid, Ian [1 ]
Carneiro, Gustavo [1 ,4 ]
Affiliations
[1] Australian Inst Machine Learning, Sch Comp Sci, Adelaide, Australia
[2] Univ Oxford, Dept Engn Sci, Visual Geometry Grp, Oxford, England
[3] Univ Fed Rural Pernambuco, Dept Comp, Visual Comp Lab, Recife, Brazil
[4] Univ Surrey, Ctr Vis Speech & Signal Proc, Guildford, England
[5] Otto Guericke Univ Magdeburg, Magdeburg, Germany
Funding
Australian Research Council
Keywords
Noisy label learning; Deep learning; Empirical vicinal risk; Semi-supervised learning;
DOI
10.1016/j.patcog.2022.109013
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
State-of-the-art noisy-label learning algorithms rely on an unsupervised learning stage to classify training samples as clean or noisy, followed by a semi-supervised learning (SSL) stage that minimises the empirical vicinal risk using a labelled set formed by the samples classified as clean and an unlabelled set of the samples classified as noisy. The classification accuracy of such noisy-label learning methods depends on the precision of the unsupervised classification of clean and noisy samples, and on the robustness of SSL to small clean sets. We address both points with a new noisy-label training algorithm, called LongReMix, which improves the precision of the unsupervised clean/noisy classification and the robustness of SSL to small clean sets through a two-stage learning process. Stage one of LongReMix finds a small but precise high-confidence clean set; stage two augments this high-confidence clean set with new clean samples and oversamples the clean data to increase the robustness of SSL to small clean sets. We test LongReMix on CIFAR-10 and CIFAR-100 with introduced synthetic noisy labels, and on the real-world noisy-label benchmarks CNWL (Red Mini-ImageNet), WebVision, Clothing1M, and Food101-N. The results show that LongReMix produces significantly better classification accuracy than competing approaches, particularly in high-noise-rate problems, and achieves state-of-the-art performance on most datasets. The code is available at https://github.com/filipe-research/LongReMix. © 2022 Elsevier Ltd. All rights reserved.
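To make the two-stage idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation (see the linked repository for that): stage one keeps only the lowest-loss samples as a small but precise high-confidence clean set, and stage two promotes previously-noisy samples whose loss drops below a threshold, then oversamples the clean data so the SSL stage sees a labelled set of a target size. The function names, the small-loss selection rule, and all numeric values here are illustrative assumptions.

```python
import random

def split_high_confidence(losses, clean_fraction=0.1):
    # Stage 1 (sketch): rank samples by per-sample loss and treat the
    # lowest-loss `clean_fraction` as the high-confidence clean set;
    # everything else is considered noisy. Illustrative rule only.
    ranked = sorted(range(len(losses)), key=lambda i: losses[i])
    k = max(1, int(clean_fraction * len(losses)))
    return ranked[:k], ranked[k:]

def expand_and_oversample(clean_idx, noisy_idx, losses, threshold, target_size):
    # Stage 2 (sketch): promote noisy samples whose current loss fell
    # below `threshold`, then oversample the clean set with replacement
    # so SSL trains on a labelled set of at least `target_size` samples.
    promoted = [i for i in noisy_idx if losses[i] < threshold]
    clean = list(clean_idx) + promoted
    extra = [random.choice(clean) for _ in range(max(0, target_size - len(clean)))]
    return clean + extra

# Toy run with deterministic, strictly increasing losses.
random.seed(0)
losses = [0.01 * i for i in range(20)]
clean, noisy = split_high_confidence(losses, clean_fraction=0.1)
labelled = expand_and_oversample(clean, noisy, losses, threshold=0.05, target_size=12)
print(sorted(clean), len(labelled))  # → [0, 1] 12
```

In the toy run, stage one keeps the two lowest-loss samples, stage two promotes samples 2-4 (loss below 0.05), and oversampling pads the labelled set to 12 entries, mirroring the paper's motivation of keeping SSL robust when the clean set is small.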
Pages: 14