Incentivized Exploration of Non-Stationary Stochastic Bandits

Cited by: 0
Authors
Chakraborty, Sourav [1]
Chen, Lijun [1]
Affiliation
[1] Univ Colorado, Comp Sci Dept, Boulder, CO 80309 USA
Source
2024 AMERICAN CONTROL CONFERENCE, ACC 2024 | 2024
Keywords
(none listed)
DOI
Not available
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
We study incentivized exploration for the multi-armed bandit (MAB) problem with non-stationary reward distributions, where players receive compensation for exploring arms other than the greedy choice and may provide biased feedback on the reward. We consider two different non-stationary environments, abruptly-changing and continuously-changing, and propose incentivized exploration algorithms for each. We show that the proposed algorithms achieve sublinear regret and compensation over time, and thus effectively incentivize exploration despite the non-stationarity and the biased or drifted feedback.
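The paper's own algorithms are not reproduced in this record. As an illustration of the problem setting only, the following is a minimal sketch of a sliding-window UCB learner (a standard tool for abruptly-changing bandits) that pays compensation whenever the recommended arm differs from the myopic greedy arm. The function name, the `window` and `c` parameters, and the compensation rule (the gap in empirical means) are assumptions for this sketch, not the authors' method.

```python
import math
import random
from collections import deque

def sw_ucb_incentivized(reward_fns, horizon, window=200, c=2.0, seed=0):
    """Sliding-window UCB with exploration compensation (illustrative sketch).

    reward_fns: one callable per arm mapping round t -> mean reward, so the
    environment may change over time (non-stationary). Whenever the UCB
    recommendation differs from the greedy arm, the player is compensated by
    the gap in windowed empirical means. Returns total compensation paid.
    """
    rng = random.Random(seed)
    k = len(reward_fns)
    history = deque()          # (arm, reward) pairs inside the sliding window
    total_compensation = 0.0
    for t in range(horizon):
        # Empirical statistics restricted to the window
        counts = [0] * k
        sums = [0.0] * k
        for arm, r in history:
            counts[arm] += 1
            sums[arm] += r
        if t < k:
            choice = t         # play each arm once to initialize
        else:
            means = [sums[a] / counts[a] if counts[a] else 0.0 for a in range(k)]
            ucbs = [means[a] + math.sqrt(c * math.log(min(t, window)) / counts[a])
                    if counts[a] else float("inf") for a in range(k)]
            greedy = max(range(k), key=lambda a: means[a])
            choice = max(range(k), key=lambda a: ucbs[a])
            if choice != greedy:
                # Pay the player the apparent loss from non-greedy play
                total_compensation += means[greedy] - means[choice]
        reward = reward_fns[choice](t) + rng.gauss(0.0, 0.1)
        history.append((choice, reward))
        if len(history) > window:
            history.popleft()  # forget stale observations
    return total_compensation
```

Because the window discards old samples, the empirical means track an abrupt change in the arm means, so compensation stops growing linearly once the learner re-identifies the best arm after a change point.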
Pages: 3563-3569 (7 pages)