A Risk-Averse Framework for Non-Stationary Stochastic Multi-Armed Bandits

Citations: 0
Authors
Alami, Reda [1 ]
Mahfoud, Mohammed [2 ]
Achab, Mastane [1 ]
Affiliations
[1] Technology Innovation Institute, Masdar City, United Arab Emirates
[2] Montreal Institute for Learning Algorithms, Montreal, QC, Canada
Keywords
Non-stationary environments; risk-averse bandits; change-point detection
DOI
10.1109/ICDMW60847.2023.00040
Chinese Library Classification
TP18 (Artificial Intelligence Theory)
Discipline codes
081104; 0812; 0835; 1405
Abstract
In a typical stochastic multi-armed bandit problem, the objective is often to maximize the expected sum of rewards over some time horizon T. While the choice of a strategy that accomplishes this is optimal with no additional information, it is no longer the case when additional environment-specific knowledge is available. In particular, in areas of high volatility like healthcare or finance, a naive reward-maximization approach often does not accurately capture the complexity of the learning problem and results in unreliable solutions. To tackle problems of this nature, we propose a framework of adaptive risk-aware strategies that operate in non-stationary environments. Our framework incorporates various risk measures prevalent in the literature to map multiple families of multi-armed bandit algorithms into a risk-sensitive setting. In addition, we equip the resulting algorithms with the Restarted Bayesian Online Change-Point Detection (R-BOCPD) algorithm and impose a (tunable) forced exploration strategy to detect local (per-arm) switches. We provide finite-time theoretical guarantees and an asymptotic regret bound of Õ(√(K_T T)) up to time horizon T, where K_T is the total number of change-points. In practice, our framework compares favorably to the state-of-the-art in both synthetic and real-world environments and manages to perform efficiently with respect to both risk-sensitivity and non-stationarity.
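To make the abstract's ingredients concrete, the sketch below combines a risk-averse selection rule (empirical CVaR over recent rewards), a tunable forced-exploration rate, and a per-arm restart rule. This is an illustrative assumption-laden toy, not the authors' implementation: the `RiskAverseRestartBandit` class, the sliding window, and the crude mean-shift test standing in for the R-BOCPD detector are all hypothetical choices made for the example.

```python
import random
import statistics

def empirical_cvar(samples, alpha=0.3):
    """Average of the worst alpha-fraction of observed rewards (empirical CVaR)."""
    tail = sorted(samples)[: max(1, int(len(samples) * alpha))]
    return sum(tail) / len(tail)

class RiskAverseRestartBandit:
    """CVaR-greedy arm selection with tunable forced exploration and a
    per-arm restart rule (a simple mean-shift test, used here as a
    stand-in for the paper's R-BOCPD change-point detector)."""

    def __init__(self, n_arms, alpha=0.3, explore_prob=0.05,
                 window=100, drift_threshold=0.5):
        self.n_arms = n_arms
        self.alpha = alpha                    # CVaR tail level (risk aversion)
        self.explore_prob = explore_prob      # tunable forced-exploration rate
        self.window = window                  # per-arm sliding window of rewards
        self.drift_threshold = drift_threshold
        self.history = [[] for _ in range(n_arms)]

    def select_arm(self, rng):
        # Forced exploration keeps every arm's change detector supplied with data.
        if rng.random() < self.explore_prob:
            return rng.randrange(self.n_arms)
        # Otherwise play the arm whose recent rewards have the best empirical
        # CVaR; unseen arms score +inf so each arm is tried at least once.
        def score(a):
            s = self.history[a]
            return empirical_cvar(s, self.alpha) if s else float("inf")
        return max(range(self.n_arms), key=score)

    def update(self, arm, reward):
        samples = self.history[arm]
        samples.append(reward)
        del samples[:-self.window]            # keep only the recent window
        # Local (per-arm) change test: a mean shift between the two window
        # halves restarts this arm's statistics only, leaving other arms intact.
        if len(samples) >= 20:
            half = len(samples) // 2
            if abs(statistics.mean(samples[:half])
                   - statistics.mean(samples[half:])) > self.drift_threshold:
                self.history[arm] = samples[half:]
```

On a two-armed simulation where one arm's mean drops mid-stream, the degraded arm is abandoned within a few dozen pulls of the switch: the new low rewards dominate the CVaR tail of its window well before the restart test even fires, which is the intended interplay between the risk measure and the per-arm detector.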
Pages: 272-280
Page count: 9