Learning by Repetition: Stochastic Multi-armed Bandits under Priming Effect

Cited: 0
Authors
Agrawal, Priyank [1]
Tulabandhula, Theja [2]
Affiliations
[1] Univ Illinois, Champaign, IL 61820 USA
[2] Univ Illinois, Chicago, IL 60680 USA
Source
CONFERENCE ON UNCERTAINTY IN ARTIFICIAL INTELLIGENCE (UAI 2020) | 2020 / Vol. 124
Keywords
DOI
Not available
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We study the effect of persistence of engagement on learning in a stochastic multi-armed bandit setting. In advertising and recommendation systems, the repetition effect includes a wear-in period, where the user's propensity to reward the platform via a click or purchase depends on how frequently they have seen the recommendation in the recent past. It also includes a counteracting wear-out period, where the user's propensity to respond positively is dampened if the recommendation was shown too many times recently. The priming effect can be naturally modelled as a temporal constraint on the strategy space, since the reward for the current action depends on the historical actions taken by the platform. We provide novel algorithms that achieve regret sublinear in time and in the relevant wear-in/wear-out parameters. The effect of priming on the regret upper bound is additive, and when there is no priming effect we recover guarantees matching those of popular algorithms such as UCB1 and Thompson sampling. Our work complements recent work on modeling time-varying rewards, delays and corruptions in bandits, and extends the use of rich behavior models in sequential decision-making settings.
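The record contains no code; purely as an illustration of the setting described in the abstract, here is a minimal sketch (not the paper's algorithm) of a standard UCB1 learner facing Bernoulli arms whose success probabilities are modulated by how often each arm was pulled in a recent sliding window — a toy stand-in for the wear-in/wear-out priming effect. The function names, the window size, and the specific `priming_factor` shape are all hypothetical choices for this sketch.

```python
import math
import random
from collections import deque

def priming_factor(recent_pulls, wear_in=2, wear_out=6):
    """Hypothetical priming curve: propensity is dampened before wear-in
    (too few recent exposures) and after wear-out (too many)."""
    if recent_pulls < wear_in:
        return 0.5   # wear-in: user not yet primed
    if recent_pulls > wear_out:
        return 0.3   # wear-out: user over-exposed
    return 1.0       # fully primed

def ucb1_with_priming(mus, horizon, window=10, seed=0):
    """UCB1 on Bernoulli arms whose base success probability mus[i] is
    scaled by priming_factor of the arm's pull count in the last
    `window` rounds."""
    rng = random.Random(seed)
    k = len(mus)
    counts = [0] * k              # pulls per arm
    sums = [0.0] * k              # cumulative reward per arm
    history = deque(maxlen=window)  # sliding window of recent actions
    for t in range(horizon):
        if t < k:
            arm = t               # play each arm once to initialize
        else:
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t + 1) / counts[i]),
            )
        recent = sum(1 for a in history if a == arm)
        p = mus[arm] * priming_factor(recent)
        reward = 1.0 if rng.random() < p else 0.0
        counts[arm] += 1
        sums[arm] += reward
        history.append(arm)
    return counts, sums
```

Note that because the realized reward already includes the priming distortion, the learner's empirical means absorb it automatically; the paper's contribution (per the abstract) is algorithms with regret guarantees under such temporal constraints, which this toy simulation does not attempt to reproduce.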
Pages: 470-479
Number of pages: 10
Related Papers
50 records in total
  • [1] Active Learning in Multi-armed Bandits
    Antos, Andras
    Grover, Varun
    Szepesvari, Csaba
    ALGORITHMIC LEARNING THEORY, PROCEEDINGS, 2008, 5254: 287+
  • [2] Stochastic Multi-Armed Bandits with Control Variates
    Verma, Arun
    Hanawal, Manjesh K.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021,
  • [3] Stochastic Multi-armed Bandits in Constant Space
    Liau, David
    Price, Eric
    Song, Zhao
    Yang, Ger
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 84, 2018, 84
  • [4] Quantum Reinforcement Learning for Multi-Armed Bandits
    Liu, Yi-Pei
    Li, Kuo
    Cao, Xi
    Jia, Qing-Shan
    Wang, Xu
    2022 41ST CHINESE CONTROL CONFERENCE (CCC), 2022: 5675-5680
  • [5] TRANSFER LEARNING FOR CONTEXTUAL MULTI-ARMED BANDITS
    Cai, Changxiao
    Cai, T. Tony
    Li, Hongzhe
    ANNALS OF STATISTICS, 2024, 52(01): 207-232
  • [6] Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions
    Lancewicki, Tal
    Segal, Shahar
    Koren, Tomer
    Mansour, Yishay
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [7] Robust Stochastic Multi-Armed Bandits with Historical Data
    Yacobi, Sarah Boufelja
    Bouneffouf, Djallel
    COMPANION OF THE WORLD WIDE WEB CONFERENCE, WWW 2023, 2023: 959-965
  • [8] Networked Stochastic Multi-Armed Bandits with Combinatorial Strategies
    Tang, Shaojie
    Zhou, Yaqin
    Han, Kai
    Zhang, Zhao
    Yuan, Jing
    Wu, Weili
    2017 IEEE 37TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS 2017), 2017: 786-793
  • [9] Anytime optimal algorithms in stochastic multi-armed bandits
    Degenne, Remy
    Perchet, Vianney
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
  • [10] Parametrized Stochastic Multi-armed Bandits with Binary Rewards
    Jiang, Chong
    Srikant, R.
    2011 AMERICAN CONTROL CONFERENCE, 2011: 119-124