SLADE: A Smart Large-Scale Task Decomposer in Crowdsourcing

被引:81
作者
Tong, Yongxin [1 ,2 ]
Chen, Lei [3 ]
Zhou, Zimu [4 ]
Jagadish, H. V. [5 ]
Shou, Lidan [6 ]
Lv, Weifeng [1 ,2 ]
机构
[1] Beihang Univ, State Key Lab Software Dev Environm, Sch Comp Sci & Engn, Beijing 100083, Peoples R China
[2] Beihang Univ, Int Res Inst Multidisciplinary Sci, Beijing 100083, Peoples R China
[3] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Kowloon, Hong Kong, Peoples R China
[4] Swiss Fed Inst Technol, Comp Engn & Networks Lab TIK, CH-8092 Zurich, Switzerland
[5] Univ Michigan, Dept Elect Engn & Comp Sci, 2260 Hayward Ave, Ann Arbor, MI 48109 USA
[6] Zhejiang Univ, Key Lab CAD & CG, Hangzhou 310027, Zhejiang, Peoples R China
关键词
Crowdsourcing; task decomposition;
D O I
10.1109/TKDE.2018.2797962
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Crowdsourcing has been shown to be effective in a wide range of applications, and is seeing increasing use. A large-scale crowdsourcing task often consists of thousands or millions of atomic tasks, each of which is usually a simple task such as binary choice or simple voting. To distribute a large-scale crowdsourcing task to limited crowd workers, a common practice is to pack a set of atomic tasks into a task bin and send to a crowd worker in a batch. It is challenging to decompose a large-scale crowdsourcing task and execute batches of atomic tasks, which ensures reliable answers at a minimal total cost. Large batches lead to unreliable answers of atomic tasks, while small batches incur unnecessary cost. In this paper, we investigate a general crowdsourcing task decomposition problem, called the Smart Large-scAle task DEcomposer (SLADE) problem, which aims to decompose a large-scale crowdsourcing task to achieve the desired reliability at a minimal cost. We prove the NP-hardness of the SLADE problem and propose solutions in both homogeneous and heterogeneous scenarios. For the homogeneous SLADE problem, where all the atomic tasks share the same reliability requirement, we propose a greedy heuristic algorithm and an efficient and effective approximation framework using an optimal priority queue (OPQ) structure with provable approximation ratio. For the heterogeneous SLADE problem, where the atomic tasks can have different reliability requirements, we extend the OPQ-based framework leveraging a partition strategy, and also prove its approximation guarantee. Finally, we verify the effectiveness and efficiency of the proposed solutions through extensive experiments on representative crowdsourcing platforms.
引用
收藏
页码:1588 / 1601
页数:14
相关论文
共 40 条
[1]  
Adamczyk P. D., 2014, P SIGCHI C HUM FACT, P271
[2]  
Amsterdamer Yael., 2013, P ACM SIGMOD INT C M, P241
[3]  
[Anonymous], 2011, P 2011 ACM SIGMOD IN
[4]  
[Anonymous], 2011, NIPS
[5]  
[Anonymous], 2011, P 2011 ACM SIGMOD IN
[6]  
[Anonymous], 2015, P 18 INT C EXT DAT T
[7]  
Cao CC, 2013, 19TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING (KDD'13), P455
[8]   Whom to Ask? Jury Selection for Decision Making Tasks on Micro-blog Services [J].
Cao, Caleb Chen ;
She, Jieying ;
Tong, Yongxin ;
Chen, Lei .
PROCEEDINGS OF THE VLDB ENDOWMENT, 2012, 5 (11) :1495-1506
[9]   A Survey of General-Purpose Crowdsourcing Techniques [J].
Chittilappilly, Anand Inasu ;
Chen, Lei ;
Amer-Yahia, Sihem .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2016, 28 (09) :2246-2266
[10]  
Cormen T. H., 2009, INTRO ALGORITHMS 3 V