Label augmented and weighted majority voting for crowdsourcing

Cited by: 45
Authors
Chen, Ziqi [1]
Jiang, Liangxiao [1, 2]
Li, Chaoqun [3]
Affiliations
[1] China Univ Geosci, Sch Comp Sci, Wuhan 430074, Peoples R China
[2] Minist Educ, Key Lab Artificial Intelligence, Shanghai 200240, Peoples R China
[3] China Univ Geosci, Sch Math & Phys, Wuhan 430074, Peoples R China
Keywords
Crowdsourcing learning; Label integration; Label augmentation; Label weighting; Majority voting; MODEL QUALITY; IMPROVING DATA
DOI
10.1016/j.ins.2022.05.066
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Crowdsourcing provides an efficient way to obtain multiple noisy labels from different crowd workers for each unlabeled instance. Label integration methods are designed to infer the unknown true label of each instance from its multiple noisy label set. We argue that when label quality is better than random classification, label integration methods perform better as the number of labels per instance grows. However, in real-world crowdsourcing scenarios, each instance cannot obtain enough labels because costs must be kept low. To solve this problem, this paper proposes a novel label integration method called label augmented and weighted majority voting (LAWMV). First, LAWMV uses the K-nearest neighbors (KNN) algorithm to find each instance's K nearest neighbors (including itself) and merges their multiple noisy label sets to obtain its augmented multiple noisy label set. Then, the labels from different neighbors are weighted by the distances and the label similarities between each instance and its neighbors. Finally, the integrated label of each instance is inferred by weighted majority voting (MV). Experimental results on 34 simulated and two real-world crowdsourced datasets show that LAWMV significantly outperforms all the other state-of-the-art label integration methods. (C) 2022 Elsevier Inc. All rights reserved.
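The three steps described in the abstract (neighbor-based label augmentation, neighbor weighting, weighted majority voting) can be illustrated with a minimal sketch. The abstract does not give the weighting formulas, so the choices below are assumptions made only for illustration: Euclidean distance, an inverse-distance weight `1 / (1 + d)`, and cosine similarity between per-instance label distributions. The function name `lawmv_sketch`, its parameters, and the `-1` missing-label convention are likewise hypothetical and are not taken from the paper.

```python
import numpy as np

def lawmv_sketch(X, L, n_classes, k=5):
    """Illustrative label augmentation plus weighted majority voting.

    X: (n, d) feature matrix.
    L: (n, w) integer labels from w workers; -1 marks a missing label.
    Returns an (n,) array of integrated labels.
    """
    n = X.shape[0]
    # Per-instance label counts, shape (n, n_classes); -1 entries are ignored.
    counts = np.stack([(L == c).sum(axis=1) for c in range(n_classes)],
                      axis=1).astype(float)
    totals = counts.sum(axis=1, keepdims=True)
    # Per-instance label distribution; uniform if an instance has no labels.
    dist = np.divide(counts, totals,
                     out=np.full_like(counts, 1.0 / n_classes),
                     where=totals > 0)
    # Pairwise Euclidean distances (ASSUMPTION: the metric is not stated).
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=2))

    integrated = np.empty(n, dtype=int)
    for i in range(n):
        # Step 1: K nearest neighbors, forcing the instance itself to be included.
        d = D[i].copy()
        d[i] = -1.0
        nbrs = np.argsort(d)[:k]
        # Step 2a: ASSUMED distance weight, closer neighbors count more.
        w_dist = 1.0 / (1.0 + D[i, nbrs])
        # Step 2b: ASSUMED label similarity, cosine between label distributions.
        num = dist[nbrs] @ dist[i]
        den = np.linalg.norm(dist[nbrs], axis=1) * np.linalg.norm(dist[i])
        w_sim = num / den
        # Step 3: weighted majority vote over the augmented label set.
        votes = (w_dist * w_sim) @ counts[nbrs]
        integrated[i] = int(np.argmax(votes))
    return integrated

# Toy usage: two well-separated clusters, three workers, binary labels.
X = np.array([[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]])
L = np.array([[0, 0, 1], [0, 0, 0], [1, 1, 0],
              [1, 1, 1], [1, 1, 0], [1, 0, 1]])
print(lawmv_sketch(X, L, n_classes=2, k=3))
```

In this toy input the third instance's own labels favor class 1, while pooling the labels of its two nearest neighbors shifts the weighted vote to class 0, which is the effect label augmentation is meant to produce when an instance has few labels.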
Pages: 397-409
Number of pages: 13