Deep k-NN for Noisy Labels

Cited by: 0
Authors
Bahri, Dara [1]
Jiang, Heinrich [1]
Gupta, Maya [1]
Affiliation
[1] Google Res, Mountain View, CA 94043 USA
Source
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119 | 2020 / Vol. 119
Keywords
CONSISTENCY; RATES;
DOI
Not available
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Modern machine learning models are often trained on examples with noisy labels that hurt performance and are hard to identify. In this paper, we provide an empirical study showing that a simple k-nearest-neighbor-based filtering approach on the logit layer of a preliminary model can remove mislabeled training data and produce more accurate models than many recently proposed methods. We also provide new statistical guarantees for its efficacy.
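The filtering idea the abstract describes can be sketched in a few lines: embed each training example via the preliminary model's logits, then flag examples whose label disagrees with the plurality label of their k nearest neighbors in logit space. This is a minimal NumPy illustration of that general recipe, not the paper's exact algorithm; the function name, tie-handling, and brute-force distance computation are assumptions for clarity.

```python
import numpy as np

def knn_filter(logits, labels, k=5):
    """Return a boolean mask: True for examples whose label matches the
    plurality label among their k nearest neighbors in logit space."""
    # pairwise squared Euclidean distances between logit vectors (brute force)
    d2 = ((logits[:, None, :] - logits[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)  # never count a point as its own neighbor
    keep = np.empty(len(labels), dtype=bool)
    for i in range(len(labels)):
        nbrs = np.argpartition(d2[i], k)[:k]  # indices of k nearest neighbors
        votes = np.bincount(labels[nbrs], minlength=labels.max() + 1)
        # keep the example if its own label (at least ties for) the plurality vote
        keep[i] = votes[labels[i]] == votes.max()
    return keep

# toy usage: two well-separated clusters in "logit space", one flipped label
logits = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.2, 0.0], [0.0, 0.2],
                   [10.0, 10.0], [10.1, 10.0], [10.0, 10.1], [10.1, 10.1], [10.2, 10.0], [10.0, 10.2]])
labels = np.array([0, 0, 0, 0, 0, 1,   # the sixth point is mislabeled as class 1
                   1, 1, 1, 1, 1, 1])
keep = knn_filter(logits, labels, k=5)  # keep[5] is False; all others are True
```

In practice one would use an indexed nearest-neighbor search (e.g. a KD-tree) rather than the O(n^2) distance matrix above, and retrain the model on the surviving examples.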
Pages: 11