A Risk-Averse Framework for Non-Stationary Stochastic Multi-Armed Bandits

Citations: 0
Authors
Alami, Reda [1 ]
Mahfoud, Mohammed [2 ]
Achab, Mastane [1 ]
Affiliations
[1] Technology Innovation Institute, Masdar City, United Arab Emirates
[2] Montreal Institute for Learning Algorithms, Montreal, QC, Canada
Keywords
Non-stationary environments; risk-averse bandits; change-point detection
DOI
10.1109/ICDMW60847.2023.00040
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
In a typical stochastic multi-armed bandit problem, the objective is often to maximize the expected sum of rewards over some time horizon T. While a strategy that accomplishes this is optimal when no additional information is available, it is no longer so when additional environment-specific knowledge is provided. In particular, in areas of high volatility like healthcare or finance, a naive reward-maximization approach often does not accurately capture the complexity of the learning problem and results in unreliable solutions. To tackle problems of this nature, we propose a framework of adaptive risk-aware strategies that operate in non-stationary environments. Our framework incorporates various risk measures prevalent in the literature to map multiple families of multi-armed bandit algorithms into a risk-sensitive setting. In addition, we equip the resulting algorithms with the Restarted Bayesian Online Change-Point Detection (R-BOCPD) algorithm and impose a (tunable) forced exploration strategy to detect local (per-arm) switches. We provide finite-time theoretical guarantees and an asymptotic regret bound of $\tilde{O}(\sqrt{K_T T})$ up to time horizon T, with K_T the total number of change-points. In practice, our framework compares favorably to the state-of-the-art in both synthetic and real-world environments and manages to perform efficiently with respect to both risk-sensitivity and non-stationarity.
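To make the abstract's recipe concrete, below is a minimal, hypothetical sketch rather than the paper's actual algorithm: arms are ranked by an empirical CVaR risk measure, a tunable forced-exploration probability is applied, and a crude sliding-window mean-shift test stands in for the R-BOCPD change-point detector to trigger local (per-arm) restarts. All function and parameter names (empirical_cvar, explore_prob, window, threshold, etc.) are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def empirical_cvar(samples, alpha=0.05):
    # Empirical CVaR_alpha: average of the worst alpha-fraction of observed rewards.
    s = np.sort(np.asarray(samples, dtype=float))
    k = max(1, int(np.ceil(alpha * len(s))))
    return s[:k].mean()


def mean_shift_detected(samples, window=30, threshold=0.25):
    # Crude change test: compare the mean of the most recent window to the one before it.
    # This is only a stand-in for R-BOCPD, which tracks a Bayesian run-length posterior.
    if len(samples) < 2 * window:
        return False
    recent = np.mean(samples[-window:])
    older = np.mean(samples[-2 * window:-window])
    return abs(recent - older) > threshold


def risk_aware_switching_bandit(reward_fn, n_arms, horizon,
                                alpha=0.05, explore_prob=0.05, seed=None):
    # CVaR-greedy arm selection with forced exploration and local (per-arm) restarts.
    rng = np.random.default_rng(seed)
    history = [[] for _ in range(n_arms)]   # per-arm rewards since the last restart
    pulls = []
    for t in range(horizon):
        unexplored = [a for a in range(n_arms) if not history[a]]
        if unexplored:
            arm = unexplored[0]                          # pull each arm at least once
        elif rng.random() < explore_prob:
            arm = int(rng.integers(n_arms))              # tunable forced exploration
        else:
            arm = int(np.argmax([empirical_cvar(h, alpha) for h in history]))
        reward = reward_fn(arm, t)
        history[arm].append(reward)
        if mean_shift_detected(history[arm]):
            history[arm] = history[arm][-1:]             # restart this arm's statistics
        pulls.append(arm)
    return pulls


if __name__ == "__main__":
    # Toy non-stationary Bernoulli environment: the best arm switches at t = 1000.
    def bernoulli_rewards(arm, t):
        means = [0.6, 0.4] if t < 1000 else [0.3, 0.7]
        return float(np.random.random() < means[arm])

    print(risk_aware_switching_bandit(bernoulli_rewards, n_arms=2, horizon=2000, seed=0)[-20:])
```

With CVaR as the index, arms whose reward distributions have heavy lower tails are penalized, which is the risk-averse behaviour the abstract targets; swapping in another risk measure (e.g., mean-variance) only changes the index function, while the detector and forced-exploration components are unaffected.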
Pages: 272-280
Page count: 9
Related Papers
50 in total
  • [11] Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs
    Lin, Yifan
    Wang, Yuhao
    Zhou, Enlu
    JOURNAL OF SYSTEMS SCIENCE AND SYSTEMS ENGINEERING, 2023, 32 (03) : 267 - 288
  • [14] Stochastic Multi-Armed Bandits with Control Variates
    Verma, Arun
    Hanawal, Manjesh K.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021,
  • [15] Stochastic Multi-armed Bandits in Constant Space
    Liau, David
    Price, Eric
    Song, Zhao
    Yang, Ger
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 84, 2018, 84
  • [16] DYNAMIC SPECTRUM ACCESS WITH NON-STATIONARY MULTI-ARMED BANDIT
    Alaya-Feki, Afef Ben Hadj
    Moulines, Eric
    LeCornec, Alain
    2008 IEEE 9TH WORKSHOP ON SIGNAL PROCESSING ADVANCES IN WIRELESS COMMUNICATIONS, VOLS 1 AND 2, 2008, : 416 - 420
  • [17] Risk-Averse Biased Human Policies with a Robot Assistant in Multi-Armed Bandit Settings
    Koller, Michael
    Patten, Timothy
    Vincze, Markus
    THE 14TH ACM INTERNATIONAL CONFERENCE ON PERVASIVE TECHNOLOGIES RELATED TO ASSISTIVE ENVIRONMENTS, PETRA 2021, 2021, : 483 - 488
  • [18] Risk-Averse Multi-Armed Bandit Problems Under Mean-Variance Measure
    Vakili, Sattar
    Zhao, Qing
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2016, 10 (06) : 1093 - 1111
  • [19] Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions
    Lancewicki, Tal
    Segal, Shahar
    Koren, Tomer
    Mansour, Yishay
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [20] Robust Stochastic Multi-Armed Bandits with Historical Data
    Yacobi, Sarah Boufelja
Bouneffouf, Djallel
    COMPANION OF THE WORLD WIDE WEB CONFERENCE, WWW 2023, 2023, : 959 - 965