Symmetric Cross Entropy for Robust Learning with Noisy Labels

Times cited: 730
Authors
Wang, Yisen [1]
Ma, Xingjun [2]
Chen, Zaiyi [3]
Luo, Yuan [1]
Yi, Jinfeng [4]
Bailey, James [2]
Affiliations
[1] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[2] Univ Melbourne, Melbourne, Vic, Australia
[3] Cainiao AI, Hangzhou, Peoples R China
[4] JD AI, Nanjing, Peoples R China
Source
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019) | 2019
DOI
10.1109/ICCV.2019.00041
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Training accurate deep neural networks (DNNs) in the presence of noisy labels is an important and challenging task. Though a number of approaches have been proposed for learning with noisy labels, many open issues remain. In this paper, we show that DNN learning with Cross Entropy (CE) exhibits overfitting to noisy labels on some classes ("easy" classes), but more surprisingly, it also suffers from significant under learning on some other classes ("hard" classes). Intuitively, CE requires an extra term to facilitate learning of hard classes, and more importantly, this term should be noise tolerant, so as to avoid overfitting to noisy labels. Inspired by the symmetric KL-divergence, we propose the approach of Symmetric cross entropy Learning (SL), boosting CE symmetrically with a noise robust counterpart Reverse Cross Entropy (RCE). Our proposed SL approach simultaneously addresses both the under learning and overfitting problem of CE in the presence of noisy labels. We provide a theoretical analysis of SL and also empirically show, on a range of benchmark and real-world datasets, that SL outperforms state-of-the-art methods. We also show that SL can be easily incorporated into existing methods in order to further enhance their performance.
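The combination of CE with a Reverse Cross Entropy term described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The weights `alpha` and `beta`, the constant `log_zero` that stands in for log(0) on the one-hot label, and the function names are all assumptions made for illustration; the abstract does not state them.

```python
import numpy as np


def softmax(logits):
    # Row-wise softmax, shifted by the row maximum for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def symmetric_cross_entropy(logits, labels, alpha=1.0, beta=1.0,
                            log_zero=-4.0, eps=1e-7):
    """Illustrative loss: alpha * CE + beta * RCE, averaged over the batch.

    alpha, beta and log_zero are assumed hyperparameters, not values
    taken from the abstract.
    """
    num_classes = logits.shape[1]
    p = softmax(logits)                 # predicted distribution
    q = np.eye(num_classes)[labels]     # one-hot label distribution

    # Cross Entropy: -sum_k q_k * log(p_k); p is clipped to avoid log(0).
    ce = -(q * np.log(np.clip(p, eps, 1.0))).sum(axis=1)

    # Reverse Cross Entropy: -sum_k p_k * log(q_k). log(q_k) is 0 on the
    # labelled class and undefined elsewhere, so log(0) is replaced by
    # the assumed finite constant log_zero.
    log_q = np.where(q > 0, 0.0, log_zero)
    rce = -(p * log_q).sum(axis=1)

    return float((alpha * ce + beta * rce).mean())


logits = np.array([[2.0, 0.0, 0.0]])
labels = np.array([0])
loss = symmetric_cross_entropy(logits, labels)
```

With a one-hot label, the reverse term in this sketch reduces to `-log_zero * (1 - p_label)`, so it stays bounded however confident a wrong prediction is. That boundedness is one way to read the noise tolerance the abstract attributes to RCE.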
Pages: 322-330
Page count: 9