Offline Evaluation by Maximum Similarity to an Ideal Ranking

Cited by: 16
Authors
Clarke, Charles L. A. [1 ]
Smucker, Mark D. [2 ]
Vtyurina, Alexandra [1 ]
Affiliations
[1] Univ Waterloo, Comp Sci, Waterloo, ON, Canada
[2] Univ Waterloo, Management Sci, Waterloo, ON, Canada
Source
CIKM '20: PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT | 2020
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
RETRIEVAL;
DOI
10.1145/3340531.3411915
Chinese Library Classification
TP [Automation and Computer Technology]
Subject Classification Code
0812
Abstract
NDCG and similar measures remain standard for the offline evaluation of search, recommendation, question answering and similar systems. These measures require definitions for two or more relevance levels, which human assessors then apply to judge individual documents. Due to this dependence on a definition of relevance, it can be difficult to extend these measures to account for factors beyond relevance. Rather than propose extensions to these measures, we instead propose a radical simplification to replace them. For each query, we define a set of ideal rankings and compute the maximum rank similarity between members of this set and an actual ranking generated by a system. This maximum similarity to an ideal ranking becomes our effectiveness measure, replacing NDCG and similar measures. We propose rank biased overlap (RBO) to compute this rank similarity, since it was specifically created to address the requirements of rank similarity between search results. As examples, we explore ideal rankings that account for document length, diversity, and correctness.
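The abstract's core idea — score a system ranking by its maximum rank similarity to any member of a set of ideal rankings — can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: it uses a simple truncated form of rank-biased overlap (without the tail extrapolation defined in the original RBO formulation), and the function names and parameters (`p`, `depth`) are chosen here for illustration.

```python
def rbo(ranking, ideal, p=0.9, depth=10):
    """Truncated rank-biased overlap between two rankings.

    At each depth d, measure the fraction of items the two top-d
    prefixes share, and weight that agreement geometrically by
    p**(d-1). Note: without tail extrapolation, even identical
    rankings score below 1 when depth exceeds their length.
    """
    score = 0.0
    for d in range(1, depth + 1):
        overlap = len(set(ranking[:d]) & set(ideal[:d]))
        score += p ** (d - 1) * (overlap / d)
    return (1 - p) * score


def max_similarity(ranking, ideal_rankings, p=0.9, depth=10):
    """Effectiveness as the paper proposes: the maximum RBO between
    the system ranking and any member of the set of ideal rankings."""
    return max(rbo(ranking, ideal, p, depth) for ideal in ideal_rankings)
```

With several ideal rankings (e.g. length-adjusted, diversified, or correctness-filtered orderings of the judged documents), a system is credited for matching whichever ideal it comes closest to, with no relevance levels required.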
Pages: 225-234
Page count: 10