Using crowdsourcing for TREC relevance assessment

Cited by: 86
Authors
Alonso, Omar [2 ]
Mizzaro, Stefano [1 ]
Affiliations
[1] Univ Udine, Dept Maths & Comp Sci, I-33100 Udine, Italy
[2] Microsoft Corp, Mountain View, CA 94043 USA
Keywords
IR evaluation; Test collections; Relevance assessment; Crowdsourcing; TREC; Amazon Mechanical Turk; Experimental design; JUDGMENTS;
DOI
10.1016/j.ipm.2012.01.004
Chinese Library Classification (CLC)
TP [automation technology, computer technology]
Discipline code
0812
Abstract
Crowdsourcing has recently gained a lot of attention as a tool for conducting different kinds of relevance evaluations. At a very high level, crowdsourcing describes the outsourcing of tasks to a large group of people instead of assigning such tasks to an in-house employee. This approach makes it possible to conduct information retrieval experiments extremely fast, with good results and at low cost. This paper reports on the first attempts to combine crowdsourcing and TREC: our aim is to validate the use of crowdsourcing for relevance assessment. To this aim, we use the Amazon Mechanical Turk crowdsourcing platform to run experiments on TREC data, evaluate the outcomes, and discuss the results. We emphasize experiment design, execution, and quality control to gather useful results, with particular attention to the issue of agreement among assessors. Our position, supported by the experimental results, is that crowdsourcing is a cheap, quick, and reliable alternative for relevance assessment. (C) 2012 Elsevier Ltd. All rights reserved.
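The abstract singles out agreement among assessors as a central quality-control concern. As a minimal illustration of that idea (not code from the paper), the sketch below computes Cohen's kappa between a TREC assessor's binary relevance labels and the majority vote of crowd workers; the function and all judgment data are hypothetical.

```python
# Illustrative sketch: chance-corrected agreement (Cohen's kappa) between
# an original TREC assessor and an aggregated crowd label, per topic-document
# pair. Labels are binary: 1 = relevant, 0 = not relevant. Data is made up.
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length label sequences."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed proportion of pairs on which the two assessors agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if the two assessors labeled independently.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c]
                   for c in set(labels_a) | set(labels_b)) / (n * n)
    if expected == 1:  # degenerate case: both assessors always give one label
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical judgments: TREC assessor vs. majority vote of crowd workers.
trec_labels  = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
crowd_labels = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]
print(f"Cohen's kappa: {cohens_kappa(trec_labels, crowd_labels):.3f}")  # 0.800
```

A kappa near 1 indicates agreement well beyond chance, while values near 0 suggest the crowd labels add little beyond random judgment; this is one simple way to operationalize the agreement question the paper studies.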
Pages: 1053-1066
Page count: 14