On the noise estimation statistics

Times Cited: 5
Authors
Gao, Wei [1]
Zhang, Teng [1]
Yang, Bin-Bin [1]
Zhou, Zhi-Hua [1]
Affiliation
[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Collaborat Innovat Ctr Novel Software Technol & I, Nanjing 210093, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
Machine learning; Classification; Random noise; Noise estimation; Label noise; Consistency;
DOI
10.1016/j.artint.2021.103451
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Learning with noisy labels has attracted much attention during the past few decades. A fundamental problem is how to estimate noise proportions from corrupted data. Previous studies on this issue resort to estimating class distributions, conditional distributions, or kernel embeddings of distributions. In this paper, we present another simple and effective approach to noise estimation. The basic idea is to exploit the first- and second-order statistics of the observed data together with the positive semi-definiteness of covariance matrices, which yields an upper bound on the noise proportion without additional assumptions on the data distribution. Based on this idea and the locality property of random noise, we develop the Noise Estimation Statistics with Clusters (NESC) method, which first clusters the corrupted data with the k-means algorithm and then estimates the noise proportions from the clusters using first- and second-order statistics. We present existence, uniqueness, and convergence analyses of our noise estimation, and empirical studies verify the effectiveness of the NESC method. (C) 2021 Elsevier B.V. All rights reserved.
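The abstract only outlines the two-stage structure of NESC. The snippet below is a minimal illustrative sketch of that pipeline, not the paper's method: the function name nesc_style_sketch, the use of the per-cluster minority-label fraction as a noise proxy, and all parameter choices are assumptions made for illustration, and the paper's actual estimator and its PSD-based upper bound are not reproduced here.

```python
# Illustrative sketch only: k-means clustering of corrupted data, followed by
# per-cluster first- and second-order statistics, as stated in the abstract.
# The noise proxy below is a naive stand-in, NOT the paper's NESC estimator.
import numpy as np
from sklearn.cluster import KMeans


def nesc_style_sketch(X, y_noisy, n_clusters=10, random_state=0):
    """Cluster corrupted data, then collect simple per-cluster statistics.

    X       : (n, d) NumPy feature matrix
    y_noisy : (n,) observed, possibly corrupted binary labels in {0, 1}
    Returns one (mean, covariance, min eigenvalue, naive noise proxy) tuple
    per sufficiently large cluster.
    """
    cluster_ids = KMeans(n_clusters=n_clusters,
                         random_state=random_state).fit_predict(X)

    stats = []
    for k in range(n_clusters):
        mask = cluster_ids == k
        if mask.sum() < 2:
            continue  # need at least two points for a covariance estimate
        Xk, yk = X[mask], y_noisy[mask]

        mean_k = Xk.mean(axis=0)                         # first-order statistic
        cov_k = np.atleast_2d(np.cov(Xk, rowvar=False))  # second-order statistic
        # Covariance matrices are positive semi-definite, so the smallest
        # eigenvalue is non-negative up to numerical error; the paper exploits
        # this property analytically rather than numerically.
        min_eig = float(np.linalg.eigvalsh(cov_k).min())

        # Naive stand-in: if most points in a cluster share the same true label
        # (the locality property of random noise), the minority-label fraction
        # is a rough proxy for the local noise proportion.
        p = float(yk.mean())
        noise_proxy = min(p, 1.0 - p)

        stats.append((mean_k, cov_k, min_eig, noise_proxy))

    return stats
```

A call such as nesc_style_sketch(X, y_noisy, n_clusters=10) returns one tuple of (cluster mean, covariance, smallest eigenvalue, naive noise proxy) per cluster with at least two points; the actual NESC bound from the paper would replace the naive proxy.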
Pages: 21