Preference-Based Offline Evaluation

Cited by: 7
Authors
Clarke, Charles L. A. [1]
Diaz, Fernando [2]
Arabzadeh, Negar [1]
Affiliations
[1] Univ Waterloo, Waterloo, ON, Canada
[2] Google, Montreal, PQ, Canada
Source
PROCEEDINGS OF THE SIXTEENTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING, WSDM 2023, VOL 1 | 2023
Keywords
offline evaluation; preferences; search; tutorial
DOI
10.1145/3539597.3572725
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
A core step in production model research and development involves the offline evaluation of a system before production deployment. Traditional offline evaluation of search, recommender, and other systems involves gathering item relevance labels from human editors. These labels can then be used to assess system performance using offline evaluation metrics. Unfortunately, this approach does not work when evaluating highly effective ranking systems, such as those emerging from the advances in machine learning. Recent work demonstrates that moving away from pointwise item and metric evaluation can be a more effective approach to the offline evaluation of systems. This tutorial, intended for both researchers and practitioners, reviews early work in preference-based evaluation and covers recent developments in detail.
Pages: 1248-1251
Page count: 4