Learning from crowdsourced labeled data: a survey

Cited by: 0
Authors
Jing Zhang
Xindong Wu
Victor S. Sheng
Affiliations
[1] Nanjing University of Science and Technology, School of Computer Science and Engineering
[2] Hefei University of Technology, School of Computer Science and Information Engineering
[3] University of Central Arkansas, Department of Computer Science
[4] Nanjing University of Information Science and Technology, Jiangsu Engineering Center of Network Monitoring
Source
Artificial Intelligence Review | 2016 / Vol. 46
Keywords
Crowdsourcing; Learning from crowds; Multiple noisy labeling; Label quality; Learning model quality; Ground truth inference
DOI
Not available
Abstract
With the rapid growth of crowdsourcing systems, many applications based on the supervised learning paradigm can easily obtain massive amounts of labeled data at relatively low cost. However, because the reliability of crowdsourced labelers varies, learning procedures face great challenges. Improving the quality of both labels and learning models therefore plays a key role in learning from crowdsourced labeled data. In this survey, we first introduce the basic concepts of label quality and learning model quality. Then, by reviewing recently proposed models and algorithms for ground truth inference and learning models, we analyze the connections and distinctions among these techniques and clarify the state of progress of related research. To facilitate studies in this field, we also introduce openly accessible real-world data sets collected from crowdsourcing systems, as well as open source libraries and tools. Finally, we discuss some potential issues for future studies.
Pages: 543-576
Number of pages: 33