Label distribution similarity-based noise correction for crowdsourcing

Cited by: 0
Authors
Lijuan Ren
Liangxiao Jiang
Wenjun Zhang
Chaoqun Li
Affiliations
[1] China University of Geosciences, School of Computer Science
[2] Ministry of Education, Key Laboratory of Artificial Intelligence
[3] China University of Geosciences, School of Mathematics and Physics
Source
Frontiers of Computer Science | 2024 / Vol. 18
Keywords
crowdsourcing learning; noise correction; label distribution similarity; Kullback-Leibler divergence
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
In crowdsourcing scenarios, we can obtain each instance’s multiple noisy labels from different crowd workers and then infer its integrated label via label aggregation. In spite of the effectiveness of label aggregation methods, there still remains a certain level of noise in the integrated labels. Thus, some noise correction methods have been proposed to reduce the impact of noise in recent years. However, to the best of our knowledge, existing methods rarely consider an instance’s information from both its features and multiple noisy labels simultaneously when identifying a noise instance. In this study, we argue that the more distinguishable an instance’s features but the noisier its multiple noisy labels, the more likely it is a noise instance. Based on this premise, we propose a label distribution similarity-based noise correction (LDSNC) method. To measure whether an instance’s features are distinguishable, we obtain each instance’s predicted label distribution by building multiple classifiers using instances’ features and their integrated labels. To measure whether an instance’s multiple noisy labels are noisy, we obtain each instance’s multiple noisy label distribution using its multiple noisy labels. Then, we use the Kullback-Leibler (KL) divergence to calculate the similarity between the predicted label distribution and multiple noisy label distribution and define the instance with the lower similarity as a noise instance. The extensive experimental results on 34 simulated and four real-world crowdsourced datasets validate the effectiveness of our method.
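The noise-identification step described in the abstract can be illustrated with a minimal sketch. This is not the authors' LDSNC implementation. It assumes the predicted label distributions are already available (the paper obtains them from multiple classifiers trained on features and integrated labels), and the Laplace smoothing, the direction of the KL divergence, the fixed threshold, and all function names are hypothetical choices made only for illustration.

```python
import math
from collections import Counter


def noisy_label_distribution(worker_labels, num_classes, smoothing=1.0):
    """Distribution of one instance's crowd labels (Laplace-smoothed; an assumption)."""
    counts = Counter(worker_labels)
    total = len(worker_labels) + smoothing * num_classes
    return [(counts.get(c, 0) + smoothing) / total for c in range(num_classes)]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def flag_noise_instances(predicted, crowd_labels, num_classes, threshold=0.3):
    """Flag instances whose two distributions diverge by more than a threshold.

    A larger divergence means a lower similarity. The threshold value is a
    hypothetical parameter, not one taken from the paper.
    """
    flags = []
    for p, labels in zip(predicted, crowd_labels):
        q = noisy_label_distribution(labels, num_classes)
        flags.append(kl_divergence(p, q) > threshold)
    return flags


# Two instances with the same confident prediction for class 0.
predicted = [[0.9, 0.1], [0.9, 0.1]]
# Workers mostly agree on the first instance and disagree on the second.
crowd_labels = [[0, 0, 0, 0, 1], [0, 1, 1, 0, 1]]
flags = flag_noise_instances(predicted, crowd_labels, num_classes=2)
print(flags)
```

In this toy input the second instance has a confident predicted distribution but conflicting crowd labels, which is the situation the abstract associates with a likely noise instance.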