Fidelity, Soundness, and Efficiency of Interleaved Comparison Methods

Cited by: 26
Authors
Hofmann, Katja
Whiteson, Shimon [1]
De Rijke, Maarten [1]
Affiliations
[1] Univ Amsterdam, ISLA, NL-1012 WX Amsterdam, Netherlands
Keywords
Algorithms; Information retrieval; interleaved comparison; interleaving; clicks; online evaluation; importance sampling; CLICK;
DOI
10.1145/2536736.2536737
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Ranker evaluation is central to the research into search engines, be it to compare rankers or to provide feedback for learning to rank. Traditional evaluation approaches do not scale well because they require explicit relevance judgments of document-query pairs, which are expensive to obtain. A promising alternative is the use of interleaved comparison methods, which compare rankers using click data obtained when interleaving their rankings. In this article, we propose a framework for analyzing interleaved comparison methods. An interleaved comparison method has fidelity if the expected outcome of ranker comparisons properly corresponds to the true relevance of the ranked documents. It is sound if its estimates of that expected outcome are unbiased and consistent. It is efficient if those estimates are accurate with only little data. We analyze existing interleaved comparison methods and find that, while sound, none meet our criteria for fidelity. We propose a probabilistic interleave method, which is sound and has fidelity. We show empirically that, by marginalizing out variables that are known, it is more efficient than existing interleaved comparison methods. Using importance sampling we derive a sound extension that is able to reuse historical data collected in previous comparisons of other ranker pairs.
Pages: 43