AN ASYMPTOTICALLY OPTIMAL HEURISTIC FOR GENERAL NONSTATIONARY FINITE-HORIZON RESTLESS MULTI-ARMED, MULTI-ACTION BANDITS

被引:20
|
作者
Zayas-Caban, Gabriel [1 ]
Jasin, Stefanus [2 ]
Wang, Guihua [2 ]
机构
[1] Univ Wisconsin, Mech Engn Bldg, 1513 Univ Ave,Room 3011, Madison, WI 53706 USA
[2] Univ Michigan, Stephen M Ross Sch Business, 701 Tappan St, Ann Arbor, MI 48109 USA
关键词
Restless bandit; asymptotic optimality; finite horizon; nonstationary; arm acquiring; nonindexable bandit; ARMED BANDITS; INDEX;
D O I
10.1017/apr.2019.29
中图分类号
O21 [概率论与数理统计]; C8 [统计学];
学科分类号
020208 ; 070103 ; 0714 ;
摘要
We propose an asymptotically optimal heuristic, which we term randomized assignment control (RAC) for a restless multi-armed bandit problem with discrete-time and finite states. It is constructed using a linear programming relaxation of the original stochastic control formulation. In contrast to most of the existing literature, we consider a finitehorizon problem with multiple actions and time-dependent (i.e. nonstationary) upper bound on the number of bandits that can be activated at each time period; indeed, our analysis can also be applied in the setting with nonstationary transition matrix and nonstationary cost function. The asymptotic setting is obtained by letting the number of bandits and other related parameters grow to infinity. Our main contribution is that the asymptotic optimality of RAC in this general setting does not require indexability properties or the usual stability conditions of the underlying Markov chain (e.g. unichain) or fluid approximation (e.g. global stable attractor). Moreover, our multi-action setting is not restricted to the usual dominant action concept. Finally, we show that RAC is also asymptotically optimal for a dynamic population, where bandits can randomly arrive and depart the system.
引用
收藏
页码:745 / 772
页数:28
相关论文
共 50 条
  • [1] An asymptotically optimal heuristic for general nonstationary finite-horizon restless multi-armed, multi-action bandits (vol 51, pg 745, 2019)
    Zayas-Caban, Gabriel
    Liang, Jiaxin
    Jasin, Stefanus
    Wang, Guihua
    ADVANCES IN APPLIED PROBABILITY, 2023, 55 (03) : 1075 - 1083
  • [2] Generic Asymptotically Optimal Algorithms for Multi-Armed Bandits
    Combes, Richard
    Magureanu, Stefan
    Proutiere, Alexandre
    2018 56TH ANNUAL ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING (ALLERTON), 2018, : 152 - 152
  • [3] Reinforcement Learning Augmented Asymptotically Optimal Index Policy for Finite-Horizon Restless Bandits
    Xiong, Guojun
    Li, Jian
    Singh, Rahul
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 8726 - 8734
  • [4] Finite-time Analysis of Globally Nonstationary Multi-Armed Bandits
    Komiyama, Junpei
    Fouche, Edouard
    Honda, Junya
    JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25 : 1 - 56
  • [5] On Optimal Foraging and Multi-armed Bandits
    Srivastava, Vaibhav
    Reverdy, Paul
    Leonard, Naomi E.
    2013 51ST ANNUAL ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING (ALLERTON), 2013, : 494 - 499
  • [6] On a Class of Restless Multi-armed Bandits with Deterministic Policies
    Jhunjhunwala, Prakirt Raj
    Moharir, Sharayu
    Manjunath, D.
    Gopalan, Aditya
    2018 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS (SPCOM 2018), 2018, : 487 - 491
  • [7] Optimal Algorithms for Multiplayer Multi-Armed Bandits
    Wang, Po-An
    Proutiere, Alexandre
    Ariu, Kaito
    Jedra, Yassir
    Russo, Alessio
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108
  • [8] Optimal Streaming Algorithms for Multi-Armed Bandits
    Jin, Tianyuan
    Huang, Keke
    Tang, Jing
    Xiao, Xiaokui
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [9] Multi-armed Bandits: Competing with Optimal Sequences
    Anava, Oren
    Karnin, Zohar
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
  • [10] Best Arm Identification in Restless Markov Multi-Armed Bandits
    Karthik, P. N.
    Reddy, Kota Srinivas
    Tan, Vincent Y. F.
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2023, 69 (05) : 3240 - 3262