Label similarity-based weighted soft majority voting and pairing for crowdsourcing

Cited by: 0
Authors
Fangna Tao
Liangxiao Jiang
Chaoqun Li
Affiliations
[1] China University of Geosciences, School of Computer Science
[2] China University of Geosciences, School of Mathematics and Physics
Source
Knowledge and Information Systems | 2020 / Vol. 62
Keywords
Crowdsourcing; Label aggregation; Specific quality; Overall quality; Label similarity
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
Crowdsourcing services provide an efficient and relatively inexpensive approach to obtain substantial amounts of labeled data by employing crowd workers. It is obvious that the labeling qualities of crowd workers directly affect the quality of the labeled data. However, existing label aggregation strategies seldom consider the differences in the quality of workers labeling different instances. In this paper, we argue that a single worker may even have different labeling qualities on different instances. Based on this premise, we propose four new strategies by assigning different weights to workers when labeling different instances. In our proposed strategies, we first use the similarity among worker labels to estimate the specific quality of the worker on different instances, and then we build a classifier to estimate the overall quality of the worker across all instances. Finally, we combine these two qualities to define the weight of the worker labeling a particular instance. Extensive experimental results show that our proposed strategies significantly outperform other existing state-of-the-art label aggregation strategies.
Pages: 2521-2538
Page count: 17