On a Class of Restless Multi-armed Bandits with Deterministic Policies

Cited by: 0
Authors
Jhunjhunwala, Prakirt Raj [1 ]
Moharir, Sharayu [1 ]
Manjunath, D. [1 ]
Gopalan, Aditya [2 ]
Affiliations
[1] Indian Inst Technol, Bombay, Maharashtra, India
[2] Indian Inst Sci Bangalore, Bangalore, Karnataka, India
Source
2018 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS (SPCOM 2018), 2018
Keywords
OPPORTUNISTIC ACCESS; OPTIMALITY; INDEX
DOI
Not available
Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronic and communication technology]
Discipline Classification Codes
0808; 0809
Abstract
We describe and analyze a restless multi-armed bandit (RMAB) in which, in each time-slot, the instantaneous reward from playing an arm depends on the time since the arm was last played. This model is motivated by recommendation systems, where the payoff from a recommendation depends on the recommendation history. For an RMAB with N arms, and known reward functions for each arm that have a finite support (akin to a maximum memory) of M steps, we characterize the optimal policy that maximizes the infinite-horizon time-average of the reward. Specifically, using a weighted-graph representation of the system evolution, we show that a periodic policy is optimal. Further, we show that the optimal periodic policy can be obtained using an algorithm with polynomial time and space complexity. Some extensions to the basic model are also presented; several more are possible. RMABs with such large state spaces for the arms have not been previously considered.
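To make the model concrete, here is a minimal Python sketch (not from the paper) of a two-arm instance in which each arm's reward depends only on the capped time since it was last played, together with a brute-force evaluation of short periodic schedules. The reward tables, memory M = 3, horizon, and schedule lengths are assumptions chosen for illustration, and the exhaustive enumeration stands in for, rather than reproduces, the polynomial-time algorithm the abstract refers to.

import itertools

# Assumed example: two arms, memory M = 3, made-up reward tables.
# REWARD_TABLE[i][k] is the reward for playing arm i when it was last
# played k slots ago, with k capped at M (the finite support / maximum
# memory of the reward function).
M = 3
REWARD_TABLE = {
    0: [0.0, 0.2, 0.5, 0.9],   # arm 0
    1: [0.0, 0.6, 0.7, 0.8],   # arm 1
}
N_ARMS = len(REWARD_TABLE)

def reward(arm, tau):
    return REWARD_TABLE[arm][min(tau, M)]

def average_reward(schedule, horizon=1200):
    # Time-average reward of repeating `schedule` forever, approximated
    # over a long finite horizon. last_played[i] counts slots since arm i
    # was last played, capped at M.
    last_played = [M] * N_ARMS   # start as if every arm has been idle >= M slots
    total = 0.0
    for t in range(horizon):
        arm = schedule[t % len(schedule)]
        total += reward(arm, last_played[arm])
        last_played[arm] = 0
        last_played = [min(x + 1, M) for x in last_played]
    return total / horizon

# Exhaustive search over short periodic schedules. This enumeration is
# exponential in the period length and only illustrates that the object
# being optimized is a repeating sequence of arm pulls.
best = max(
    (s for length in range(1, 5)
       for s in itertools.product(range(N_ARMS), repeat=length)),
    key=average_reward,
)
print("best periodic schedule:", best,
      "average reward:", round(average_reward(best), 3))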
Pages: 487-491
Page count: 5