AN ASYMPTOTICALLY OPTIMAL HEURISTIC FOR GENERAL NONSTATIONARY FINITE-HORIZON RESTLESS MULTI-ARMED, MULTI-ACTION BANDITS

被引：20

作者：

Zayas-Caban, Gabriel ^{[1
]}

Jasin, Stefanus ^{[2
]}

Wang, Guihua ^{[2
]}

机构：

[1] Univ Wisconsin, Mech Engn Bldg, 1513 Univ Ave,Room 3011, Madison, WI 53706 USA

[2] Univ Michigan, Stephen M Ross Sch Business, 701 Tappan St, Ann Arbor, MI 48109 USA

来源：

ADVANCES IN APPLIED PROBABILITY | 2019年 / 51卷 / 03期

关键词：

Restless bandit; asymptotic optimality; finite horizon; nonstationary; arm acquiring; nonindexable bandit; ARMED BANDITS; INDEX;

D O I：

10.1017/apr.2019.29

中图分类号：

O21 [概率论与数理统计]; C8 [统计学];

学科分类号：

020208 ; 070103 ; 0714 ;

摘要：

We propose an asymptotically optimal heuristic, which we term randomized assignment control (RAC) for a restless multi-armed bandit problem with discrete-time and finite states. It is constructed using a linear programming relaxation of the original stochastic control formulation. In contrast to most of the existing literature, we consider a finitehorizon problem with multiple actions and time-dependent (i.e. nonstationary) upper bound on the number of bandits that can be activated at each time period; indeed, our analysis can also be applied in the setting with nonstationary transition matrix and nonstationary cost function. The asymptotic setting is obtained by letting the number of bandits and other related parameters grow to infinity. Our main contribution is that the asymptotic optimality of RAC in this general setting does not require indexability properties or the usual stability conditions of the underlying Markov chain (e.g. unichain) or fluid approximation (e.g. global stable attractor). Moreover, our multi-action setting is not restricted to the usual dominant action concept. Finally, we show that RAC is also asymptotically optimal for a dynamic population, where bandits can randomly arrive and depart the system.

引用

页码：745 / 772

页数：28

共 50 条

[1] An asymptotically optimal heuristic for general nonstationary finite-horizon restless multi-armed, multi-action bandits (vol 51, pg 745, 2019)
Zayas-Caban, Gabriel
Liang, Jiaxin
Jasin, Stefanus
Wang, Guihua
ADVANCES IN APPLIED PROBABILITY, 2023, 55 (03) : 1075 - 1083
[2] Generic Asymptotically Optimal Algorithms for Multi-Armed Bandits
Combes, Richard
Magureanu, Stefan
Proutiere, Alexandre
2018 56TH ANNUAL ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING (ALLERTON), 2018, : 152 - 152
[3] Reinforcement Learning Augmented Asymptotically Optimal Index Policy for Finite-Horizon Restless Bandits
Xiong, Guojun
Li, Jian
Singh, Rahul
THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 8726 - 8734
[4] Finite-time Analysis of Globally Nonstationary Multi-Armed Bandits
Komiyama, Junpei
Fouche, Edouard
Honda, Junya
JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25 : 1 - 56
[5] On Optimal Foraging and Multi-armed Bandits
Srivastava, Vaibhav
Reverdy, Paul
Leonard, Naomi E.
2013 51ST ANNUAL ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING (ALLERTON), 2013, : 494 - 499
[6] On a Class of Restless Multi-armed Bandits with Deterministic Policies
Jhunjhunwala, Prakirt Raj
Moharir, Sharayu
Manjunath, D.
Gopalan, Aditya
2018 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS (SPCOM 2018), 2018, : 487 - 491
[7] Optimal Algorithms for Multiplayer Multi-Armed Bandits
Wang, Po-An
Proutiere, Alexandre
Ariu, Kaito
Jedra, Yassir
Russo, Alessio
INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108
[8] Optimal Streaming Algorithms for Multi-Armed Bandits
Jin, Tianyuan
Huang, Keke
Tang, Jing
Xiao, Xiaokui
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
[9] Multi-armed Bandits: Competing with Optimal Sequences
Anava, Oren
Karnin, Zohar
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
[10] Best Arm Identification in Restless Markov Multi-Armed Bandits
Karthik, P. N.
Reddy, Kota Srinivas
Tan, Vincent Y. F.
IEEE TRANSACTIONS ON INFORMATION THEORY, 2023, 69 (05) : 3240 - 3262

← 1 2 3 4 5 →