Offline Evaluation by Maximum Similarity to an Ideal Ranking

Cited by: 16
Authors
Clarke, Charles L. A. [1]
Smucker, Mark D. [2]
Vtyurina, Alexandra [1]
Affiliations
[1] Univ Waterloo, Comp Sci, Waterloo, ON, Canada
[2] Univ Waterloo, Management Sci, Waterloo, ON, Canada
Source
CIKM '20: PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT | 2020
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
RETRIEVAL
DOI
10.1145/3340531.3411915
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
NDCG and similar measures remain standard for the offline evaluation of search, recommendation, question answering and similar systems. These measures require definitions for two or more relevance levels, which human assessors then apply to judge individual documents. Due to this dependence on a definition of relevance, it can be difficult to extend these measures to account for factors beyond relevance. Rather than propose extensions to these measures, we instead propose a radical simplification to replace them. For each query, we define a set of ideal rankings and compute the maximum rank similarity between members of this set and an actual ranking generated by a system. This maximum similarity to an ideal ranking becomes our effectiveness measure, replacing NDCG and similar measures. We propose rank biased overlap (RBO) to compute this rank similarity, since it was specifically created to address the requirements of rank similarity between search results. As examples, we explore ideal rankings that account for document length, diversity, and correctness.
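The measure the abstract describes — compute a rank similarity between the system's ranking and each member of a set of ideal rankings, and take the maximum — can be sketched in a few lines. The sketch below uses a truncated lower-bound form of RBO, RBO(S, T, p) = (1 − p) Σ_d p^(d−1) A_d, where A_d is the fraction of items shared by the two top-d prefixes; the function names, the persistence parameter default p = 0.9, and the evaluation depth are illustrative assumptions, not details taken from the paper.

```python
def rbo(ranking, ideal, p=0.9, depth=100):
    """Rank-biased overlap, truncated at `depth` (a lower bound on full RBO).

    RBO(S, T, p) = (1 - p) * sum_{d=1..depth} p^(d-1) * A_d,
    where A_d = |top-d of S  intersect  top-d of T| / d.
    """
    seen_sys, seen_ideal = set(), set()
    score = 0.0
    for d in range(1, depth + 1):
        # Grow each prefix by one item, if that ranking is long enough.
        if d <= len(ranking):
            seen_sys.add(ranking[d - 1])
        if d <= len(ideal):
            seen_ideal.add(ideal[d - 1])
        agreement = len(seen_sys & seen_ideal) / d
        score += (1 - p) * p ** (d - 1) * agreement
    return score


def max_rbo(ranking, ideal_rankings, p=0.9, depth=100):
    """Effectiveness as the paper defines it: the maximum rank similarity
    between the system ranking and any member of the ideal-ranking set."""
    return max(rbo(ranking, ideal, p, depth) for ideal in ideal_rankings)
```

With a singleton ideal set this reduces to plain RBO against one ideal ranking; with a larger set (e.g. several orderings that are all acceptable under a diversity or correctness criterion), a system is credited for matching whichever ideal it comes closest to.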
Pages: 225-234
Page count: 10