A Risk-Averse Framework for Non-Stationary Stochastic Multi-Armed Bandits

Cited by: 0
Authors
Alami, Reda [1 ]
Mahfoud, Mohammed [2 ]
Achab, Mastane [1 ]
Affiliations
[1] Technology Innovation Institute, Masdar City, United Arab Emirates
[2] Montreal Institute for Learning Algorithms, Montreal, QC, Canada
Keywords
Non-stationary environments; risk-averse bandits; change-point detection
DOI
10.1109/ICDMW60847.2023.00040
CLC Classification
TP18 [Artificial intelligence theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
In a typical stochastic multi-armed bandit problem, the objective is often to maximize the expected sum of rewards over some time horizon T. While a strategy that accomplishes this is optimal when no additional information is available, this is no longer the case when additional environment-specific knowledge is provided. In particular, in highly volatile areas such as healthcare or finance, a naive reward-maximization approach often fails to capture the complexity of the learning problem and results in unreliable solutions. To tackle problems of this nature, we propose a framework of adaptive risk-aware strategies that operate in non-stationary environments. Our framework incorporates various risk measures prevalent in the literature to map multiple families of multi-armed bandit algorithms into a risk-sensitive setting. In addition, we equip the resulting algorithms with the Restarted Bayesian Online Change-Point Detection (R-BOCPD) algorithm and impose a (tunable) forced-exploration strategy to detect local (per-arm) switches. We provide finite-time theoretical guarantees and an asymptotic regret bound of Õ(√(K_T · T)) up to time horizon T, where K_T is the total number of change-points. In practice, our framework compares favorably to the state of the art in both synthetic and real-world environments, and performs efficiently with respect to both risk-sensitivity and non-stationarity.
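To make the ingredients of the abstract concrete, the following is a minimal illustrative sketch of the general recipe it describes: rank arms by an empirical risk measure (here mean-variance, one of several options), keep a per-arm forced-exploration rate, and restart an arm's statistics locally when a change is detected. The change detector below is a crude mean-shift test on a sliding window, not the paper's R-BOCPD, and all class/parameter names (`RiskAwareRestartBandit`, `rho`, `epsilon`, `window`, `threshold`) are hypothetical choices for this sketch.

```python
import random
import statistics

def mean_variance_index(rewards, rho=1.0):
    """Empirical mean-variance risk measure: mean - rho * variance.
    Higher is better for a risk-averse learner."""
    if len(rewards) < 2:
        return statistics.mean(rewards)
    return statistics.mean(rewards) - rho * statistics.variance(rewards)

class RiskAwareRestartBandit:
    """Illustrative sketch (not the paper's algorithm): risk-adjusted arm
    selection + per-arm local restarts + tunable forced exploration."""

    def __init__(self, n_arms, rho=1.0, epsilon=0.05, window=20, threshold=0.5):
        self.n_arms = n_arms
        self.rho = rho              # risk-aversion weight
        self.epsilon = epsilon      # forced-exploration rate
        self.window = window        # recent window for the mean-shift test
        self.threshold = threshold  # shift magnitude that triggers a restart
        self.history = [[] for _ in range(n_arms)]

    def select_arm(self):
        # Forced exploration keeps change detection alive on every arm,
        # including arms that currently look risk-suboptimal.
        if random.random() < self.epsilon:
            return random.randrange(self.n_arms)
        # Play each arm once before trusting the risk indices.
        for a in range(self.n_arms):
            if not self.history[a]:
                return a
        indices = [mean_variance_index(h, self.rho) for h in self.history]
        return max(range(self.n_arms), key=lambda a: indices[a])

    def update(self, arm, reward):
        h = self.history[arm]
        h.append(reward)
        # Crude local change-point check: recent window vs. older samples.
        # R-BOCPD would make this decision from a Bayesian run-length posterior.
        if len(h) >= 2 * self.window:
            recent = statistics.mean(h[-self.window:])
            older = statistics.mean(h[:-self.window])
            if abs(recent - older) > self.threshold:
                # Local (per-arm) restart: drop pre-change samples only here.
                self.history[arm] = h[-self.window:]
```

In a simulation where one arm's reward distribution switches mid-run, the local restart lets the learner discard stale samples for that arm alone, after which the risk-adjusted index re-ranks the arms; the forced-exploration rate trades off detection delay against the regret paid on suboptimal arms.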
Pages: 272-280 (9 pages)