Dynamic Control of Birth-and-Death Restless Bandits: Application to Resource-Allocation Problems

被引:28
作者
Larranaga, Maialen [1 ]
Ayesta, Urtzi [2 ,3 ,4 ,5 ]
Verloop, Ina Maria [5 ,6 ]
机构
[1] Cent Supelec, CNRS, Signaux & Syst Lab, F-91190 Gif Sur Yvette, France
[2] CNRS, Lab Anal & Architecture Syst, F-31400 Toulouse, France
[3] Basque Fdn Sci, Ikerbasque, Bilbao 48011, Spain
[4] Univ Basque Country, UPV EHU, Donostia San Sebastian 20018, Spain
[5] Univ Toulouse, Lab Anal & Architecture Syst, Inst Natl Polytech, F-31400 Toulouse, France
[6] CNRS, Inst Rech Informat Toulouse, F-31071 Toulouse, France
关键词
Whittle's index; restless bandits; birth-and-death processes; fluid approximation; Lagrangian relaxation; index policies; INDEX POLICIES; QUEUE; PERFORMANCE; RULE;
D O I
10.1109/TNET.2016.2562564
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
We develop a unifying framework to obtain efficient index policies for restless multi-armed bandit problems with birth-and-death state evolution. This is a broad class of stochastic resource allocation problems whose objective is to determine efficient policies to share resources among competing projects. In a seminal work, Whittle developed a methodology to derive well-performing (Whittle's) index policies that are obtained by solving a relaxed version of the original problem. Our first main contribution is the derivation of a closed-form expression for Whittle's index as a function of the steady-state probabilities. In some particular cases, qualitative insights can be obtained from its expression; nevertheless, it requires several technical conditions to be verified. We, therefore, formulate a fluid version of the relaxed optimization problem, and in our second main contribution, we develop a fluid index policy. The latter does provide qualitative insights and it is equivalent to Whittle's index policy in the light-traffic regime. The applicability of our approach is illustrated by two important problems: optimal class selection and optimal load balancing. Allowing state-dependent capacities, we can model important phenomena, e.g., power-aware server-farms and opportunistic scheduling in wireless systems. Whittle's index and our fluid index policy show remarkably good performance in numerical simulations.
引用
收藏
页码:3812 / 3825
页数:14
相关论文
共 29 条
[1]  
Aalto Samuli, 2015, ACM SIGMETRICS Performance Evaluation Review, V43, P57, DOI 10.1145/2745844.2745851
[2]   Whittle's index policy for a multi-class queueing system with convex holding costs [J].
P. S. Ansell ;
K. D. Glazebrook ;
J. Niño-Mora ;
M. O'Keeffe .
Mathematical Methods of Operations Research, 2003, 57 (1) :21-39
[3]   The cμ/θ Rule for Many-Server Queues with Abandonment [J].
Atar, Rami ;
Giat, Chanit ;
Shimkin, Nahum .
OPERATIONS RESEARCH, 2010, 58 (05) :1427-1439
[4]  
Avram F., 1994, STOCHASTIC NETWORKS, V71, P199
[5]  
Bertsekas D., 2012, Dynamic Programming and Optimal Control
[6]   The single-server scheduling problem with convex costs [J].
Bispo, Carlos F. .
QUEUEING SYSTEMS, 2013, 73 (03) :261-294
[7]   User-level performance of channel-aware scheduling algorithms in wireless data networks [J].
Borst, S .
IEEE-ACM TRANSACTIONS ON NETWORKING, 2005, 13 (03) :636-647
[8]   THE C-MU RULE REVISITED [J].
BUYUKKOC, C ;
VARAIYA, P ;
WALRAND, J .
ADVANCES IN APPLIED PROBABILITY, 1985, 17 (01) :237-238
[9]   Dynamic control of a queue with adjustable service rate [J].
George, JM ;
Harrison, JM .
OPERATIONS RESEARCH, 2001, 49 (05) :720-731
[10]  
Gittins J., 2011, Multi-armed bandit allocation indices