Active learning with confidence-based answers for crowdsourcing labeling tasks

Times cited: 28
Authors
Song, Jinhua [1]
Wang, Hao [1]
Gao, Yang [1]
An, Bo [2]
Affiliations
[1] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing 210093, Jiangsu, Peoples R China
[2] Nanyang Technol Univ, Sch Comp Engn, Blk N4-02c-110,Nanyang Ave, Singapore 639798, Singapore
Funding
National Natural Science Foundation of China
Keywords
Confidence-based answer; Active learning; Crowdsourcing; Labeling task; Beta regression
DOI
10.1016/j.knosys.2018.07.010
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Collecting labels for data is important for many practical applications (e.g., data mining). However, this process can be expensive and time-consuming because it requires extensive effort from domain experts. To reduce the cost, many recent works acquire labeled datasets by combining crowdsourcing, which outsources labeling tasks (usually in the form of questions) to a large group of non-expert workers, with active learning, which actively selects the best instances to be labeled. However, for difficult tasks where workers are uncertain about their answers, asking for discrete labels might lead to poor performance because the labels are of low quality. In this paper, we design questions that elicit continuous worker responses, which are more informative and contain workers' labels as well as their confidence. As crowd workers may make mistakes, multiple workers are hired to answer each question. We then propose a new aggregation method to integrate the responses. By considering workers' confidence information, the accuracy of the integrated labels is improved. Furthermore, based on the new answers, we propose a novel active learning framework to iteratively select instances for "labeling". We define a score function for instance selection by combining the uncertainty derived from the classifier model and the uncertainty derived from the answer sets. The uncertainty derived from uncertain answers is more effective than that derived from labels. We also propose batch methods, which select multiple instances at a time, to further improve the efficiency of our approach. Experimental studies on both simulated and real data show that our methods are effective in increasing the labeling accuracy and achieve significantly better performance than existing methods.
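The abstract does not give the aggregation rule or the score function, so the following is only a rough sketch of the general idea under simplifying assumptions, not the authors' method. It assumes binary labels, worker responses in [0, 1] (0.5 meaning "completely unsure"), aggregation by a plain mean, and a score that linearly mixes classifier uncertainty with answer-set uncertainty. All function names, the `weight` parameter, and the candidate data are hypothetical.

```python
def aggregate(responses):
    """Assumed rule: average the continuous responses, then threshold at 0.5."""
    mean = sum(responses) / len(responses)
    label = 1 if mean >= 0.5 else 0
    return mean, label


def answer_uncertainty(responses):
    """1.0 when the aggregated response sits at 0.5, 0.0 when it sits at 0 or 1."""
    mean, _ = aggregate(responses)
    return 1.0 - abs(2.0 * mean - 1.0)


def model_uncertainty(p_positive):
    """Same margin-style measure applied to the classifier's predicted probability."""
    return 1.0 - abs(2.0 * p_positive - 1.0)


def score(p_positive, responses, weight=0.5):
    """Hypothetical linear mix of the two uncertainty sources."""
    return (weight * model_uncertainty(p_positive)
            + (1.0 - weight) * answer_uncertainty(responses))


def select_batch(candidates, batch_size):
    """Batch selection: return the ids of the highest-scoring instances."""
    ranked = sorted(
        candidates,
        key=lambda c: score(c["p_positive"], c["responses"]),
        reverse=True,
    )
    return [c["id"] for c in ranked[:batch_size]]


# Made-up candidates: classifier probability plus the current answer set.
candidates = [
    {"id": "a", "p_positive": 0.95, "responses": [0.9, 0.8, 1.0]},
    {"id": "b", "p_positive": 0.55, "responses": [0.6, 0.4, 0.5]},
    {"id": "c", "p_positive": 0.70, "responses": [0.8, 0.7, 0.9]},
]
print(select_batch(candidates, 2))  # selects "b" and "c", the two least certain instances
```

In this sketch, instance "a" is skipped because both the classifier and the workers are already confident about it; a discrete-label scheme would not expose the workers' hesitation on "b" as directly.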
Pages: 244-258
Page count: 15