Practical Lessons for Gathering Quality Labels at Scale

Cited by: 9
Author
Alonso, Omar [1]
Affiliation
[1] Microsoft Corp, Redmond, WA 98052 USA
Source
SIGIR 2015: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval | 2015
Keywords
Labeling; crowdsourcing; inter-rater agreement; debugging; Captchas; worker reliability; experimental design
DOI
10.1145/2766462.2776778
CLC number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Information retrieval researchers and engineers use human computation as a mechanism to produce labeled data sets for product development, research, and experimentation. To gather useful results, a successful labeling task relies on many different elements: clear instructions, user interface design, representative high-quality datasets, appropriate inter-rater agreement metrics, work quality checks, and channels for worker feedback. Furthermore, designing and implementing tasks that produce and use thousands or millions of labels differs from conducting small-scale research investigations. In this paper, we present a perspective for collecting high-quality labels with an emphasis on practical problems and scalability. We focus on three main topics: programming crowds, debugging tasks with low agreement, and algorithms for quality control. We show examples from an industrial setting.
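The abstract highlights inter-rater agreement metrics as one element of a successful labeling task. As an illustration only (the paper discusses agreement metrics in general; the choice of pairwise Cohen's kappa and the toy worker data below are assumptions, not the paper's method), a minimal sketch in Python:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two annotators labeling the same items.

    labels_a, labels_b: equal-length sequences of categorical labels.
    """
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e) if p_e < 1 else 1.0

# Hypothetical example: two crowd workers judging five query-document pairs.
worker_1 = ["relevant", "relevant", "not_relevant", "relevant", "not_relevant"]
worker_2 = ["relevant", "not_relevant", "not_relevant", "relevant", "not_relevant"]
print(cohens_kappa(worker_1, worker_2))  # about 0.62 for this toy data
```

Low kappa values of this kind are what the paper's "debugging tasks with low agreement" topic addresses; the metric itself says nothing about whether instructions or data quality caused the disagreement.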
Pages: 1089-1092
Page count: 4