On a Class of Restless Multi-armed Bandits with Deterministic Policies

Cited by: 0
Authors
Jhunjhunwala, Prakirt Raj [1 ]
Moharir, Sharayu [1 ]
Manjunath, D. [1 ]
Gopalan, Aditya [2 ]
Affiliations
[1] Indian Inst Technol, Bombay, Maharashtra, India
[2] Indian Inst Sci Bangalore, Bangalore, Karnataka, India
Source
2018 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS (SPCOM 2018) | 2018
Keywords
OPPORTUNISTIC ACCESS; OPTIMALITY; INDEX;
DOI
Not available
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline Classification Codes
0808; 0809
Abstract
We describe and analyze a restless multi-armed bandit (RMAB) in which, in each time-slot, the instantaneous reward from playing an arm depends on the time since the arm was last played. This model is motivated by recommendation systems in which the payoff from a recommendation depends on the recommendation history. For an RMAB with N arms, and known reward functions for each arm that have a finite support (akin to a maximum memory) of M steps, we characterize the optimal policy that maximizes the infinite-horizon time-average of the reward. Specifically, using a weighted-graph representation of the system evolution, we show that a periodic policy is optimal. Further, we show that the optimal periodic policy can be obtained using an algorithm with polynomial time and space complexity. Some extensions to the basic model are also presented; several more are possible. RMABs with such large state spaces for the arms have not been previously considered.
Pages: 487-491
Number of Pages: 5
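
As an illustration of the model described in the abstract, the following Python sketch simulates delay-dependent rewards and brute-forces the best periodic policy over short cycles. All reward values, the saturation of r_i(d) at d = M, and the period bound of 2M are assumptions made for this toy example only; the paper's polynomial-time, weighted-graph algorithm is not reproduced here.

```python
import itertools

# A minimal sketch of the model in the abstract, with hypothetical reward
# values: playing arm i after d idle slots yields r_i(min(d, M)), i.e. we
# assume rewards saturate once an arm has rested for M steps (the "maximum
# memory"). The brute-force search below stands in for the paper's
# polynomial-time graph-based algorithm purely for illustration.

N, M = 2, 3                       # number of arms and memory horizon (toy values)
r = [
    [1.0, 4.0, 5.0],              # arm 0: gains a lot from resting (hypothetical)
    [3.0, 3.5, 3.6],              # arm 1: nearly memoryless (hypothetical)
]

def reward(arm, delay):
    """Instantaneous reward from playing `arm` after `delay` idle slots."""
    return r[arm][min(delay, M) - 1]

def avg_reward(cycle):
    """Long-run time-average reward of repeating `cycle` forever."""
    L = len(cycle)
    total = 0.0
    for t, arm in enumerate(cycle):
        # Steady-state delay = gap back to this arm's previous play,
        # wrapping around the cycle (gap = L if the arm appears only once).
        gap = next(k for k in range(1, L + 1) if cycle[(t - k) % L] == arm)
        total += reward(arm, gap)
    return total / L

# Enumerate all cycles up to an assumed period bound and keep the best one.
candidates = (c for L in range(1, 2 * M + 1)
                for c in itertools.product(range(N), repeat=L))
best = max(candidates, key=avg_reward)
print(best, avg_reward(best))
```

The average-reward computation exploits the fact that, once a cycle has been repeated at least once, the delay of each play equals its wrapped-around gap to the previous play of the same arm, so the long-run average can be read off a single period.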