Exploiting User Disagreement for Web Search Evaluation: An Experimental Approach

Cited by: 6
Authors
Demeester, Thomas [1 ]
Aly, Robin [2 ]
Hiemstra, Djoerd [2 ]
Nguyen, Dong [2]
Trieschnigg, Dolf [2 ]
Develder, Chris [1 ]
Affiliations
[1] Univ Ghent, iMinds, Ghent, Belgium
[2] Univ Twente, Enschede, Netherlands
Source
WSDM'14: Proceedings of the 7th ACM International Conference on Web Search and Data Mining | 2014
Keywords
User disagreement; graded relevance; evaluation; relevance judgments
DOI
10.1145/2556195.2556268
CLC number (Chinese Library Classification)
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Graded relevance levels can be used in the evaluation of search results to express a more nuanced notion of relevance than binary judgments. Especially in Web search, users strongly prefer top results over less relevant ones, yet they often disagree on which results are the top ones for a given information need. Whereas previous work has generally treated disagreement as a negative effect, this paper proposes a method to exploit user disagreement by integrating it into the evaluation procedure. First, we present experiments that investigate user disagreement. We argue that, under high disagreement, lower relevance levels may need to be promoted more than when there is global consensus on the top results. We formalize this by introducing the User Disagreement Model, which yields a weighting of the relevance levels with a probabilistic interpretation. We provide a validity analysis and explain how to integrate the model with well-established evaluation metrics. Finally, we discuss a specific application of the model: estimating suitable weights for the combined relevance of Web search snippets and pages.
Pages: 33-42
Page count: 10
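To make the abstract's weighting idea concrete, the sketch below shows one plausible reading in Python, not the paper's exact User Disagreement Model: per-level gains are estimated as the empirical probability that a randomly chosen assessor rates a document of a given consensus level as top-relevant, and those gains are then plugged into DCG. The helper names (`disagreement_weights`, `weighted_dcg`), the majority-vote consensus rule, and the sample data are all illustrative assumptions.

```python
import math
from collections import Counter

def disagreement_weights(judgments_per_doc, top_level):
    """Estimate a weight for each consensus relevance level as the empirical
    probability that an assessor judges a document of that level at
    `top_level`. Hypothetical helper, not the paper's exact formulation."""
    totals = Counter()  # consensus level -> number of individual judgments
    tops = Counter()    # consensus level -> judgments equal to top_level
    for labels in judgments_per_doc.values():
        # Majority vote as a simple consensus rule (ties broken arbitrarily).
        consensus = Counter(labels).most_common(1)[0][0]
        totals[consensus] += len(labels)
        tops[consensus] += sum(1 for lab in labels if lab == top_level)
    return {lvl: tops[lvl] / totals[lvl] for lvl in totals}

def weighted_dcg(ranked_levels, weights):
    """DCG over a ranked list of consensus levels, using the
    disagreement-derived weights as gain values."""
    return sum(weights.get(lvl, 0.0) / math.log2(rank + 2)
               for rank, lvl in enumerate(ranked_levels))

# Three assessors grade each document on levels 0 (not relevant) to 2 (top).
judgments = {
    "d1": [2, 2, 1],
    "d2": [1, 2, 0],   # disagreement: a level-1 document sometimes rated top
    "d3": [0, 0, 0],
}
weights = disagreement_weights(judgments, top_level=2)
print(weights)                           # e.g. {2: 0.667, 1: 0.333, 0: 0.0}
print(weighted_dcg([2, 1, 0], weights))  # DCG of a ranking with those gains
```

In this toy data, high disagreement on level-1 documents gives level 1 a nonzero gain, illustrating the abstract's point that disagreement can promote lower relevance levels relative to a strict-consensus weighting.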