Indexability of Restless Bandit Problems and Optimality of Whittle Index for Dynamic Multichannel Access

被引:229
作者
Liu, Keqin [1 ]
Zhao, Qing [1 ]
机构
[1] Univ Calif Davis, Dept Elect & Comp Engn, Davis, CA 95616 USA
基金
美国国家科学基金会;
关键词
Dynamic channel selection; indexability; myopic policy; opportunistic access; restless multiarmed bandit (RMAB); Whittle index; OPPORTUNISTIC SPECTRUM ACCESS; PRIORITY ALLOCATION; MARKOV-PROCESSES; POLICY; NETWORKS; CHANNEL;
D O I
10.1109/TIT.2010.2068950
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
In this paper, we consider a class of restless multiarmed bandit processes (RMABs) that arises in dynamic multichannel access, user/server scheduling, and optimal activation in multiagent systems. For this class of RMABs, we establish the indexability and obtain Whittle index in closed form for both discounted and average reward criteria. These results lead to a direct implementation of Whittle index policy with remarkably low complexity. When arms are stochastically identical, we show that Whittle index policy is optimal under certain conditions. Furthermore, it has a semiuniversal structure that obviates the need to know the Markov transition probabilities. The optimality and the semiuniversal structure result from the equivalence between Whittle index policy and the myopic policy established in this work. For nonidentical arms, we develop efficient algorithms for computing a performance upper bound given by Lagrangian relaxation. The tightness of the upper bound and the near-optimal performance of Whittle index policy are illustrated with simulation examples.
引用
收藏
页码:5547 / 5567
页数:21
相关论文
共 36 条
[1]   Multi-channel Opportunistic Access: A Case of Restless Bandits with Multiple Plays [J].
Ahmad, Sahand Haji Ali ;
Liu, Mingyan .
2009 47TH ANNUAL ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING, VOLS 1 AND 2, 2009, :1361-1368
[2]   Optimality of Myopic Sensing in Multichannel Opportunistic Access [J].
Ahmad, Sahand Haji Ali ;
Liu, Mingyan ;
Javidi, Tara ;
Zhao, Qing ;
Krishnamachari, Bhaskar .
IEEE TRANSACTIONS ON INFORMATION THEORY, 2009, 55 (09) :4040-4050
[3]  
[Anonymous], P INF
[4]   Whittle's index policy for a multi-class queueing system with convex holding costs [J].
P. S. Ansell ;
K. D. Glazebrook ;
J. Niño-Mora ;
M. O'Keeffe .
Mathematical Methods of Operations Research, 2003, 57 (1) :21-39
[5]   DISCRETE-TIME CONTROLLED MARKOV-PROCESSES WITH AVERAGE COST CRITERION - A SURVEY [J].
ARAPOSTATHIS, A ;
BORKAR, VS ;
FERNANDEZGAUCHERAND, E ;
GHOSH, MK ;
MARCUS, SI .
SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 1993, 31 (02) :282-344
[6]   Joint design and separation principle for opportunistic spectrum access in the presence of sensing errors [J].
Chen, Yunxia ;
Zhao, Qing ;
Swami, Ananthram .
IEEE TRANSACTIONS ON INFORMATION THEORY, 2008, 54 (05) :2053-2071
[7]   WHAT DO DISCOUNTED OPTIMA CONVERGE TO - A THEORY OF DISCOUNT RATE ASYMPTOTICS IN ECONOMIC-MODELS [J].
DUTTA, PK .
JOURNAL OF ECONOMIC THEORY, 1991, 55 (01) :64-94
[8]  
Ehsan N., 2004, P INFOCOM 2004, V3, P1974
[9]  
Gallager R. G., 1995, Discrete Stochastic Processes
[10]   CAPACITY OF A BURST-NOISE CHANNEL [J].
GILBERT, EN .
BELL SYSTEM TECHNICAL JOURNAL, 1960, 39 (05) :1253-1265