A game-based framework for crowdsourced data labeling

Cited by: 5
Authors
Yang, Jingru [1]
Fan, Ju [1]
Wei, Zhewei [1]
Li, Guoliang [2]
Liu, Tongyu [1]
Du, Xiaoyong [1]
Affiliations
[1] Renmin Univ China, Beijing 100872, Peoples R China
[2] Tsinghua Univ, Beijing 100084, Peoples R China
Keywords
Crowdsourcing; Data labeling; Labeling rules
DOI
10.1007/s00778-020-00613-w
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Data labeling, which assigns data to one of multiple classes, is indispensable for many applications, such as machine learning and data integration. However, existing labeling solutions either incur high cost on large datasets or produce noisy results. This paper introduces a cost-effective labeling approach and focuses on the labeling rule generation problem, which aims to generate high-quality rules that greatly reduce labeling cost while preserving quality. To address the problem, we first generate candidate rules and then devise a game-based crowdsourcing approach, CrowdGame, to select high-quality rules by considering coverage and accuracy. CrowdGame employs two groups of crowd workers: one group answers rule validation tasks (whether a rule is valid) and plays the role of rule generator, while the other group answers tuple checking tasks (whether the label of a data tuple is correct) and plays the role of rule refuter. We let the two groups play a two-player game: the rule generator identifies high-quality rules with large coverage, while the rule refuter tries to refute its opponent by checking tuples that provide enough evidence to reject rules with low accuracy. This paper studies three challenges in CrowdGame. The first is to balance the trade-off between coverage and accuracy; we define the loss of a rule in terms of these two factors. The second is rule accuracy estimation; we use Bayesian estimation to combine the answers to both rule validation and tuple checking tasks. The third is to select crowdsourcing tasks that fulfill the game-based framework while minimizing the loss; we introduce a minimax strategy and develop efficient task selection algorithms. We also develop a hybrid crowd-machine method for effective label assignment under budget-constrained crowdsourcing settings. We conduct experiments on entity matching and relation extraction, and the results show that our method outperforms state-of-the-art solutions.
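As a rough illustration of how rule validation and tuple checking answers could be combined in a Bayesian estimate, the sketch below uses a Beta prior whose mean depends on the validation answer and updates it with tuple checking counts. The abstract does not specify the paper's actual model, so the Beta-Bernoulli form, the `RuleEvidence` structure, and all parameter values here are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class RuleEvidence:
    """Crowd evidence for one candidate labeling rule (hypothetical structure)."""
    validated: bool      # crowd answer to the rule validation task
    tuples_correct: int  # checked tuples whose rule-assigned label was judged correct
    tuples_wrong: int    # checked tuples whose rule-assigned label was judged wrong


def estimate_accuracy(ev: RuleEvidence,
                      prior_strength: float = 4.0,
                      p_if_valid: float = 0.8,
                      p_if_invalid: float = 0.3) -> float:
    """Posterior mean of rule accuracy under an assumed Beta prior.

    The validation answer sets the prior mean; each tuple checking answer
    is treated as one Bernoulli observation of the rule's accuracy.
    All default values are illustrative assumptions, not from the paper.
    """
    prior_mean = p_if_valid if ev.validated else p_if_invalid
    alpha = prior_mean * prior_strength + ev.tuples_correct
    beta = (1.0 - prior_mean) * prior_strength + ev.tuples_wrong
    return alpha / (alpha + beta)


# A rule that passed validation but is contradicted by most checked tuples.
refuted = estimate_accuracy(RuleEvidence(validated=True, tuples_correct=1, tuples_wrong=5))
# The same rule before any tuple has been checked.
unchecked = estimate_accuracy(RuleEvidence(validated=True, tuples_correct=0, tuples_wrong=0))
print(f"unchecked: {unchecked:.2f}, after refutation: {refuted:.2f}")
```

In this sketch, a rule accepted in validation starts with a high estimated accuracy, and each tuple judged wrong pulls the estimate down, which mirrors the refuter's role of gathering evidence against low-accuracy rules.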
Pages: 1311-1336
Page count: 26