Learning by Repetition: Stochastic Multi-armed Bandits under Priming Effect

Cited: 0
Authors
Agrawal, Priyank [1]
Tulabandhula, Theja [2]
Affiliations
[1] Univ Illinois, Champaign, IL 61820 USA
[2] Univ Illinois, Chicago, IL 60680 USA
Source
CONFERENCE ON UNCERTAINTY IN ARTIFICIAL INTELLIGENCE (UAI 2020) | 2020 / Vol. 124
Keywords
DOI
Not available
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We study the effect of persistence of engagement on learning in a stochastic multi-armed bandit setting. In advertising and recommendation systems, the repetition effect includes a wear-in period, where the user's propensity to reward the platform via a click or purchase depends on how frequently they have seen the recommendation in the recent past. It also includes a counteracting wear-out period, where the user's propensity to respond positively is dampened if the recommendation was shown too many times recently. The priming effect can be naturally modelled as a temporal constraint on the strategy space, since the reward for the current action depends on the historical actions taken by the platform. We provide novel algorithms that achieve regret sublinear in time and in the relevant wear-in/wear-out parameters. The effect of priming on the regret upper bound is additive, and when there is no priming effect we recover guarantees matching those of popular algorithms such as UCB1 and Thompson sampling. Our work complements recent work on modeling time-varying rewards, delays and corruptions in bandits, and extends the use of rich behavior models in sequential decision-making settings.
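The record contains no code; purely as an illustration of the setting described in the abstract, here is a minimal sketch (not the paper's algorithm) of a standard UCB1 learner facing Bernoulli arms whose success probabilities are modulated by how often each arm was pulled in a recent sliding window — a toy stand-in for the wear-in/wear-out priming effect. The function names, the window size, and the specific `priming_factor` shape are all hypothetical choices for this sketch.

```python
import math
import random
from collections import deque

def priming_factor(recent_pulls, wear_in=2, wear_out=6):
    """Hypothetical priming curve: propensity is dampened before wear-in
    (too few recent exposures) and after wear-out (too many)."""
    if recent_pulls < wear_in:
        return 0.5   # wear-in: user not yet primed
    if recent_pulls > wear_out:
        return 0.3   # wear-out: user over-exposed
    return 1.0       # fully primed

def ucb1_with_priming(mus, horizon, window=10, seed=0):
    """UCB1 on Bernoulli arms whose base success probability mus[i] is
    scaled by priming_factor of the arm's pull count in the last
    `window` rounds."""
    rng = random.Random(seed)
    k = len(mus)
    counts = [0] * k              # pulls per arm
    sums = [0.0] * k              # cumulative reward per arm
    history = deque(maxlen=window)  # sliding window of recent actions
    for t in range(horizon):
        if t < k:
            arm = t               # play each arm once to initialize
        else:
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t + 1) / counts[i]),
            )
        recent = sum(1 for a in history if a == arm)
        p = mus[arm] * priming_factor(recent)
        reward = 1.0 if rng.random() < p else 0.0
        counts[arm] += 1
        sums[arm] += reward
        history.append(arm)
    return counts, sums
```

Note that because the realized reward already includes the priming distortion, the learner's empirical means absorb it automatically; the paper's contribution (per the abstract) is algorithms with regret guarantees under such temporal constraints, which this toy simulation does not attempt to reproduce.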
Pages: 470-479
Number of pages: 10
Related Papers
50 records in total
  • [1] Active Learning in Multi-armed Bandits
    Antos, Andras
    Grover, Varun
    Szepesvari, Csaba
    ALGORITHMIC LEARNING THEORY, PROCEEDINGS, 2008, 5254: 287+
  • [2] Stochastic Multi-Armed Bandits with Control Variates
    Verma, Arun
    Hanawal, Manjesh K.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021,
  • [3] Stochastic Multi-armed Bandits in Constant Space
    Liau, David
    Price, Eric
    Song, Zhao
    Yang, Ger
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 84, 2018, 84
  • [4] Quantum Reinforcement Learning for Multi-Armed Bandits
    Liu, Yi-Pei
    Li, Kuo
    Cao, Xi
    Jia, Qing-Shan
    Wang, Xu
    2022 41ST CHINESE CONTROL CONFERENCE (CCC), 2022: 5675-5680
  • [5] TRANSFER LEARNING FOR CONTEXTUAL MULTI-ARMED BANDITS
    Cai, Changxiao
    Cai, T. Tony
    Li, Hongzhe
    ANNALS OF STATISTICS, 2024, 52(01): 207-232
  • [6] Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions
    Lancewicki, Tal
    Segal, Shahar
    Koren, Tomer
    Mansour, Yishay
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [7] Robust Stochastic Multi-Armed Bandits with Historical Data
    Yacobi, Sarah Boufelja
    Bouneffouf, Djallel
    COMPANION OF THE WORLD WIDE WEB CONFERENCE, WWW 2023, 2023: 959-965
  • [8] Networked Stochastic Multi-Armed Bandits with Combinatorial Strategies
    Tang, Shaojie
    Zhou, Yaqin
    Han, Kai
    Zhang, Zhao
    Yuan, Jing
    Wu, Weili
    2017 IEEE 37TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS 2017), 2017: 786-793
  • [9] Anytime optimal algorithms in stochastic multi-armed bandits
    Degenne, Remy
    Perchet, Vianney
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
  • [10] Parametrized Stochastic Multi-armed Bandits with Binary Rewards
    Jiang, Chong
    Srikant, R.
    2011 AMERICAN CONTROL CONFERENCE, 2011: 119-124