Preference-Based Offline Evaluation

Cited by: 7
Authors
Clarke, Charles L. A. [1]
Diaz, Fernando [2]
Arabzadeh, Negar [1]
Affiliations
[1] Univ Waterloo, Waterloo, ON, Canada
[2] Google, Montreal, PQ, Canada
Source
PROCEEDINGS OF THE SIXTEENTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING, WSDM 2023, VOL 1 | 2023
Keywords
offline evaluation; preferences; search; tutorial
DOI
10.1145/3539597.3572725
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A core step in production model research and development involves the offline evaluation of a system before production deployment. Traditional offline evaluation of search, recommender, and other systems involves gathering item relevance labels from human editors. These labels can then be used to assess system performance using offline evaluation metrics. Unfortunately, this approach does not work when evaluating highly effective ranking systems, such as those emerging from the advances in machine learning. Recent work demonstrates that moving away from pointwise item and metric evaluation can be a more effective approach to the offline evaluation of systems. This tutorial, intended for both researchers and practitioners, reviews early work in preference-based evaluation and covers recent developments in detail.
Pages: 1248-1251
Page count: 4