Whittle Index for Partially Observed Binary Markov Decision Processes

被引:7
作者
Borkar, Vivek S. [1 ]
机构
[1] Indian Inst Technol, Dept Elect Engn, Mumbai 400076, Maharashtra, India
关键词
Binary bandits; dynamic programming; ergodic cost; partially observed bandits; threshold policies; Whittle index; THEOREMS;
D O I
10.1109/TAC.2017.2715329
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
We consider the problem of dynamically scheduling M out of N binary Markov chains when only noisy observations of state are available, with ergodic (equivalently, long run average) reward. By passing on to the equivalent problem of controlling the conditional distribution of state given observations and controls, it is cast as a restless bandit problem and its Whittle indexability is established.
引用
收藏
页码:6614 / 6618
页数:5
相关论文
共 17 条
  • [1] Structural properties of optimal transmission policies over a randomly varying channel
    Agarwal, Mukul
    Borkar, Vivek S.
    Karandikar, Abhay
    [J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2008, 53 (06) : 1476 - 1491
  • [2] Optimality of Myopic Sensing in Multichannel Opportunistic Access
    Ahmad, Sahand Haji Ali
    Liu, Mingyan
    Javidi, Tara
    Zhao, Qing
    Krishnamachari, Bhaskar
    [J]. IEEE TRANSACTIONS ON INFORMATION THEORY, 2009, 55 (09) : 4040 - 4050
  • [3] [Anonymous], 1991, PITMAN RES NOTES MAT
  • [4] Congestion control of TCP flows in Internet routers by means of index policy
    Avrachenkov, K.
    Ayesta, U.
    Doncel, J.
    Jacko, P.
    [J]. COMPUTER NETWORKS, 2013, 57 (17) : 3463 - 3478
  • [5] Dynamic programming for ergodic control of Markov chains under partial observations (vol 39, pg 673, 2000)
    Borkar, Vivek S.
    [J]. SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2007, 45 (06) : 2299 - 2304
  • [6] OPTIMAL-CONTROL FOR PARTIALLY OBSERVED DIFFUSIONS
    FLEMING, WH
    PARDOUX, E
    [J]. SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 1982, 20 (02) : 261 - 285
  • [7] Fryer Jr R. G., 2013, W19043 NAT BUR EC RE
  • [8] Multi-UAV dynamic routing with partial observations using restless bandit allocation indices
    Le Ny, Jerome
    Dahleh, Munther
    Feron, Eric
    [J]. 2008 AMERICAN CONTROL CONFERENCE, VOLS 1-12, 2008, : 4220 - +
  • [9] Meshram R., 2016, WHITTLE INDEX RESTLE
  • [10] Envelope theorems for arbitrary choice sets
    Milgrom, P
    Segal, I
    [J]. ECONOMETRICA, 2002, 70 (02) : 583 - 601