Non-stationary Dueling Bandits for Online Learning to Rank

被引:1
作者
Lu, Shiyin [1 ]
Miao, Yuan [2 ]
Yang, Ping [2 ]
Hu, Yao [2 ]
Zhang, Lijun [1 ]
机构
[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing 210023, Peoples R China
[2] Alibaba Grp, Hangzhou 311121, Peoples R China
来源
WEB AND BIG DATA, PT II, APWEB-WAIM 2022 | 2023年 / 13422卷
关键词
Online learning to rank; Dueling bandits; Non-stationary environments;
D O I
10.1007/978-3-031-25198-6_13
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We study online learning to rank (OL2R), where a parameterized ranking model is optimized based on sequential feedback from users. A natural and popular approach for OL2R is to formulate it as a multi-armed dueling bandits problem, where each arm corresponds to a ranker, i.e., the ranking model with a specific parameter configuration. While the dueling bandits and its application to OL2R have been extensively studied in the literature, existing works focus on static environments where the preference order over rankers is assumed to be stationary. However, this assumption is often violated in real-world OL2R applications as user preference typically changes with time and so does the optimal ranker. To address this problem, we propose non-stationary dueling bandits where the preference order over rankers is modeled by a time-variant function. We develop an efficient and adaptive method for non-stationary dueling bandits with strong theoretical guarantees. The main idea of our method is to run multiple dueling bandits gradient descent (DBGD) algorithms with different step sizes in parallel and employ a meta algorithm to dynamically combine these DBGD algorithms according to their real-time performance. With straightforward extensions, our method can also apply to existing DBGD-type algorithms.
引用
收藏
页码:166 / 174
页数:9
相关论文
共 17 条
[1]  
[Anonymous], 2013, Ph.D. Dissertation
[2]  
Cesa-Bianchi N., 2006, Prediction, learning, and games, DOI DOI 10.1017/CBO9780511546921
[3]   Online Learning to Rank for Information Retrieval SIGIR 2016 Tutorial [J].
Grotov, Artem ;
de Rijke, Maarten .
SIGIR'16: PROCEEDINGS OF THE 39TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2016, :1215-1218
[4]  
Hofmann K., 2011, CIKM, P249
[5]   Balancing exploration and exploitation in listwise and pairwise online learning to rank for information retrieval [J].
Hofmann, Katja ;
Whiteson, Shimon ;
de Rijke, Maarten .
INFORMATION RETRIEVAL, 2013, 16 (01) :63-90
[6]   Differentiable Unbiased Online Learning to Rank [J].
Oosterhuis, Harrie ;
de Rijke, Maarten .
CIKM'18: PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2018, :1293-1302
[7]  
Radlinski F., 2008, P 17 ACM C INF KNOWL, P43, DOI DOI 10.1145/1458082.1458092
[8]   Multileave Gradient Descent for Fast Online Learning to Rank [J].
Schuth, Anne ;
Oosterhuis, Harrie ;
Whiteson, Shimon ;
de Rijke, Maarten .
PROCEEDINGS OF THE NINTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING (WSDM'16), 2016, :457-466
[9]   Learning to rank for Information Retrieval [J].
Liu, Tie-Yan .
Foundations and Trends in Information Retrieval, 2009, 3 (03) :225-231
[10]  
van Erven T, 2016, ADV NEUR IN, V29