Trusted-Data-Guided Label Enhancement on Noisy Labels

Times cited: 17
Authors
Xu, Ning [1, 2]
Li, Jia-Yu [1, 2]
Liu, Yun-Peng [1, 2]
Geng, Xin [1, 2]
Affiliations
[1] Southeast Univ, Sch Comp Sci & Engn, Nanjing 211189, Peoples R China
[2] Southeast Univ, Key Lab Comp Network & Informat Integrat, Minist Educ, Nanjing 211189, Peoples R China
Funding
U.S. National Science Foundation; China Postdoctoral Science Foundation;
Keywords
Noise measurement; Training; Probabilistic logic; Labeling; Training data; Task analysis; Supervised learning; Label distribution learning (LDL); label enhancement (LE); noisy labels; trusted data;
DOI
10.1109/TNNLS.2022.3162316
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Label distribution covers a certain number of labels, representing the degree to which each label describes the instance. Label enhancement (LE) is a procedure of recovering the label distribution from the logical labels in the training data, the purpose of which is to better depict the label ambiguity through label distribution. However, data annotation inevitably introduces label noise, and it is extremely challenging to implement LE on corrupted labels. To deal with this problem, one way to recover the label distribution from the corrupted labels is to be guided by a small batch of trusted data. In this article, a novel LE method named TALEN is proposed via recovering and progressively refining label distribution guided by trusted data. Specifically, an LE process is applied to the untrusted data to select samples with a clean label. In addition, a combined loss function is designed to train the predictive model for classification. Experiments on datasets with synthetic label noise validate the feasibility of identifying clean labels via the recovered label distribution. Furthermore, experimental results on both synthetic label noise and real-world label noise on image datasets and additional experiments on text datasets show a clear advantage of TALEN over several existing noise-robust learning methods.
Pages: 9940-9951
Number of pages: 12