Deep k-NN for Noisy Labels

Cited by: 0
Authors
Bahri, Dara [1]
Jiang, Heinrich [1]
Gupta, Maya [1]
Affiliation
[1] Google Res, Mountain View, CA 94043 USA
Source
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119 | 2020 / Vol. 119
Keywords
CONSISTENCY; RATES;
DOI
Not available
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Modern machine learning models are often trained on examples with noisy labels that hurt performance and are hard to identify. In this paper, we provide an empirical study showing that a simple k-nearest-neighbor-based filtering approach on the logit layer of a preliminary model can remove mislabeled training data and produce more accurate models than many recently proposed methods. We also provide new statistical guarantees for its efficacy.
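The filtering idea the abstract describes can be sketched in a few lines: embed each training example via the preliminary model's logits, then flag examples whose label disagrees with the plurality label of their k nearest neighbors in logit space. This is a minimal NumPy illustration of that general recipe, not the paper's exact algorithm; the function name, tie-handling, and brute-force distance computation are assumptions for clarity.

```python
import numpy as np

def knn_filter(logits, labels, k=5):
    """Return a boolean mask: True for examples whose label matches the
    plurality label among their k nearest neighbors in logit space."""
    # pairwise squared Euclidean distances between logit vectors (brute force)
    d2 = ((logits[:, None, :] - logits[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)  # never count a point as its own neighbor
    keep = np.empty(len(labels), dtype=bool)
    for i in range(len(labels)):
        nbrs = np.argpartition(d2[i], k)[:k]  # indices of k nearest neighbors
        votes = np.bincount(labels[nbrs], minlength=labels.max() + 1)
        # keep the example if its own label (at least ties for) the plurality vote
        keep[i] = votes[labels[i]] == votes.max()
    return keep

# toy usage: two well-separated clusters in "logit space", one flipped label
logits = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.2, 0.0], [0.0, 0.2],
                   [10.0, 10.0], [10.1, 10.0], [10.0, 10.1], [10.1, 10.1], [10.2, 10.0], [10.0, 10.2]])
labels = np.array([0, 0, 0, 0, 0, 1,   # the sixth point is mislabeled as class 1
                   1, 1, 1, 1, 1, 1])
keep = knn_filter(logits, labels, k=5)  # keep[5] is False; all others are True
```

In practice one would use an indexed nearest-neighbor search (e.g. a KD-tree) rather than the O(n^2) distance matrix above, and retrain the model on the surviving examples.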
Pages: 11