A Risk-Averse Framework for Non-Stationary Stochastic Multi-Armed Bandits

Citations: 0
Authors
Alami, Reda [1 ]
Mahfoud, Mohammed [2 ]
Achab, Mastane [1 ]
Affiliations
[1] Technol Innovat Inst, Masdar City, U Arab Emirates
[2] Montreal Inst Learning Algorithms, Montreal, PQ, Canada
Keywords
Non-stationary environments; risk-averse bandits; change-point detection
DOI
10.1109/ICDMW60847.2023.00040
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
In a typical stochastic multi-armed bandit problem, the objective is often to maximize the expected sum of rewards over some time horizon T. While a strategy that accomplishes this is optimal when no additional information is available, that is no longer the case when environment-specific knowledge is provided. In particular, in highly volatile domains such as healthcare or finance, a naive reward-maximization approach often fails to capture the complexity of the learning problem and yields unreliable solutions. To tackle problems of this nature, we propose a framework of adaptive risk-aware strategies that operate in non-stationary environments. Our framework incorporates various risk measures prevalent in the literature to map multiple families of multi-armed bandit algorithms into a risk-sensitive setting. In addition, we equip the resulting algorithms with the Restarted Bayesian Online Change-Point Detection (R-BOCPD) algorithm and impose a (tunable) forced-exploration strategy to detect local (per-arm) switches. We provide finite-time theoretical guarantees and an asymptotic regret bound of Õ(√(K_T T)) up to time horizon T, where K_T is the total number of change-points. In practice, our framework compares favorably to the state of the art in both synthetic and real-world environments and performs efficiently with respect to both risk sensitivity and non-stationarity.
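The recipe the abstract describes — risk-sensitive arm selection, tunable forced exploration, and a per-arm restart when a local change-point is detected — can be sketched in a few lines. The following is an illustrative toy, not the paper's algorithm: the class `RiskAverseRestartBandit`, its parameters, and its crude mean-shift test (a stand-in for R-BOCPD) are all hypothetical, and empirical CVaR is used as one example of a risk measure.

```python
import random
from collections import deque

def empirical_cvar(samples, alpha=0.25):
    """Empirical CVaR_alpha: mean of the worst alpha-fraction of rewards.
    Maximizing this quantity is a risk-averse objective."""
    k = max(1, int(len(samples) * alpha))
    return sum(sorted(samples)[:k]) / k

class RiskAverseRestartBandit:
    """Illustrative sketch only. Picks arms by empirical CVaR, forces
    exploration with a tunable probability, and restarts an arm's
    statistics when a crude mean-shift test fires (a simplistic
    stand-in for the R-BOCPD detector used in the paper)."""

    def __init__(self, n_arms, explore_prob=0.05, window=50, shift_thresh=0.3):
        self.n_arms = n_arms
        self.explore_prob = explore_prob   # tunable forced exploration
        self.window = window               # recent-sample window size
        self.shift_thresh = shift_thresh   # mean-shift trigger level
        self.history = [deque() for _ in range(n_arms)]

    def select_arm(self):
        # Play any untried arm first; otherwise force exploration
        # with probability explore_prob.
        untried = [a for a in range(self.n_arms) if not self.history[a]]
        if untried:
            return random.choice(untried)
        if random.random() < self.explore_prob:
            return random.randrange(self.n_arms)
        # Exploit: arm with the largest empirical CVaR of observed rewards.
        return max(range(self.n_arms),
                   key=lambda a: empirical_cvar(list(self.history[a])))

    def update(self, arm, reward):
        h = self.history[arm]
        h.append(reward)
        # Local (per-arm) change check: compare the recent window's mean
        # against the older samples; on a large shift, restart the arm's
        # statistics and keep only the post-change window.
        if len(h) >= 2 * self.window:
            recent = list(h)[-self.window:]
            older = list(h)[:-self.window]
            if abs(sum(recent) / len(recent)
                   - sum(older) / len(older)) > self.shift_thresh:
                self.history[arm] = deque(recent)  # per-arm restart
```

Replacing the mean-shift test with a principled detector such as R-BOCPD, and the CVaR estimate with confidence-bound-based indices, is what separates this toy from the analyzed framework.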
Pages: 272-280
Page count: 9