A game-based framework for crowdsourced data labeling

Cited by: 5
Authors
Yang, Jingru [1]
Fan, Ju [1]
Wei, Zhewei [1]
Li, Guoliang [2]
Liu, Tongyu [1]
Du, Xiaoyong [1]
Affiliations
[1] Renmin Univ China, Beijing 100872, Peoples R China
[2] Tsinghua Univ, Beijing 100084, Peoples R China
Keywords
Crowdsourcing; Data labeling; Labeling rules;
DOI
10.1007/s00778-020-00613-w
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Data labeling, which assigns data to multiple classes, is indispensable for many applications, such as machine learning and data integration. However, existing labeling solutions either incur high cost on large datasets or produce noisy results. This paper introduces a cost-effective labeling approach and focuses on the labeling rule generation problem, which aims to generate high-quality rules that greatly reduce labeling cost while preserving quality. To address the problem, we first generate candidate rules and then devise a game-based crowdsourcing approach, CrowdGame, to select high-quality rules by considering coverage and accuracy. CrowdGame employs two groups of crowd workers: one group answers rule validation tasks (whether a rule is valid) and plays the role of rule generator, while the other group answers tuple checking tasks (whether the label of a data tuple is correct) and plays the role of rule refuter. We let the two groups play a two-player game: the rule generator identifies high-quality rules with large coverage, while the rule refuter tries to refute its opponent by checking tuples that provide enough evidence to reject rules with low accuracy. This paper studies three challenges in CrowdGame. The first is balancing the trade-off between coverage and accuracy; we define the loss of a rule in terms of these two factors. The second is rule accuracy estimation; we use Bayesian estimation to combine the results of rule validation and tuple checking tasks. The third is selecting crowdsourcing tasks that fulfill the game-based framework and minimize the loss; we introduce a minimax strategy and develop efficient task selection algorithms. We also develop a hybrid crowd-machine method for effective label assignment under budget-constrained crowdsourcing settings. We conduct experiments on entity matching and relation extraction, and the results show that our method outperforms state-of-the-art solutions.
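The abstract names Bayesian accuracy estimation and a coverage/accuracy loss without giving formulas. The sketch below is one possible reading, not the paper's formulation: it assumes a Beta-Bernoulli model in which a rule validation answer counts as a few pseudo-observations and each tuple check counts as one, and it assumes a simple weighted loss over uncovered tuples and expected labeling errors. The names `RuleBelief`, `toy_loss`, `weight`, and `gamma` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RuleBelief:
    """Beta(alpha, beta) belief over one rule's accuracy (illustrative only)."""
    alpha: float = 1.0  # uniform prior: one pseudo-success
    beta: float = 1.0   # uniform prior: one pseudo-failure

    def observe_rule_validation(self, says_valid: bool, weight: float = 2.0) -> None:
        # Assumption: a rule-level answer is worth `weight` pseudo-observations.
        if says_valid:
            self.alpha += weight
        else:
            self.beta += weight

    def observe_tuple_check(self, label_correct: bool) -> None:
        # Each checked tuple is a single Bernoulli observation of the rule.
        if label_correct:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def accuracy(self) -> float:
        # Posterior mean of the Beta distribution.
        return self.alpha / (self.alpha + self.beta)


def toy_loss(beliefs, coverage, n_tuples, gamma=0.5):
    """Hypothetical loss: weighted share of uncovered tuples plus expected errors."""
    covered = set().union(*coverage.values())
    uncovered = n_tuples - len(covered)
    expected_errors = sum(
        (1.0 - beliefs[rule].accuracy) * len(tuples)
        for rule, tuples in coverage.items()
    )
    return gamma * uncovered / n_tuples + (1.0 - gamma) * expected_errors / n_tuples


# One rule endorsed by a validation task, then refuted by two tuple checks.
belief = RuleBelief()
belief.observe_rule_validation(True)
belief.observe_tuple_check(False)
belief.observe_tuple_check(False)
loss = toy_loss({"r1": belief}, {"r1": {0, 1, 2, 3}}, n_tuples=10)
```

In this toy setting the two refuting tuple checks pull the estimated accuracy back down after the initial endorsement, which is the kind of interplay between the two task types that the abstract describes. The minimax task selection is not modeled here.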
Pages: 1311 - 1336
Page count: 26
Related papers
50 in total
  • [21] A matching game-based crowdsourcing framework for last-mile delivery: Ground-vehicles and Unmanned-Aerial Vehicles
    Abualola, Huda
    Mizouni, Rabeb
    Otrok, Hadi
    Singh, Shakti
    Barada, Hassan
    JOURNAL OF NETWORK AND COMPUTER APPLICATIONS, 2023, 213
  • [22] Learning from biased crowdsourced labeling with deep clustering
    Wu, Ming
    Li, Qianmu
    Yang, Fei
    Zhang, Jing
    Sheng, Victor S.
    Hou, Jun
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 211
  • [23] Identifying Unreliable and Adversarial Workers in Crowdsourced Labeling Tasks
    Jagabathula, Srikanth
    Subramaniam, Lakshminarayanan
    Venkataraman, Ashwin
    JOURNAL OF MACHINE LEARNING RESEARCH, 2017, 18
  • [24] A Two-layer Game-based Incentive Mechanism for Decentralized Crowdsourcing
    Han, Rong
    Liang, Xueqin
    Yan, Zheng
    2022 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM 2022), 2022, : 927 - 933
  • [25] Policy for sustainable entrepreneurship: A crowdsourced framework
    Watson, Rosina
    Nielsen, Kristian Roed
    Wilson, Hugh N.
    Macdonald, Emma K.
    Mera, Christine
    Reisch, Lucia
    JOURNAL OF CLEANER PRODUCTION, 2023, 383
  • [26] Improving the Quality of Crowdsourced Image Labeling via Label Similarity
    Fang, Yi-Li
    Sun, Hai-Long
    Chen, Peng-Peng
    Deng, Ting
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2017, 32 (05) : 877 - 889
  • [27] Improving the Quality of Crowdsourced Image Labeling via Label Similarity
    Yi-Li Fang
    Hai-Long Sun
    Peng-Peng Chen
    Ting Deng
    Journal of Computer Science and Technology, 2017, 32 : 877 - 889
  • [28] Applying Rapid Crowdsourced Playtesting to a Human Computation Game
    Paranthaman, Pratheep Kumar
    Sarkar, Anurag
    Cooper, Seth
    PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON THE FOUNDATIONS OF DIGITAL GAMES, FDG 2021, 2021,
  • [29] Crowdsourced Data Management: A Survey
    Li, Guoliang
    Wang, Jiannan
    Zheng, Yudian
    Franklin, Michael J.
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2016, 28 (09) : 2296 - 2319
  • [30] Automated Validation of Crowdsourced Data
    Ibrahim, Zailani
    Aris, Hazleen
    Mansur, Aishah
    2018 IEEE STUDENT CONFERENCE ON RESEARCH AND DEVELOPMENT (SCORED), 2018,