Learning Optimal Scheduling Policy for Remote State Estimation Under Uncertain Channel Condition

Cited by: 30

Authors:

Wu, Shuang [1]

Ren, Xiaoqiang [2]

Jia, Qing-Shan [3]

Johansson, Karl Henrik [4]

Shi, Ling [1]

Affiliations:

[1] Hong Kong Univ Sci & Technol, Dept Elect & Comp Engn, Hong Kong, Peoples R China

[2] Shanghai Univ, Sch Mechatron Engn & Automat, Shanghai 200444, Peoples R China

[3] Tsinghua Univ, Beijing Natl Res Ctr Informat Sci & Technol, Dept Automat, Ctr Intelligent & Networked Syst, Beijing 100084, Peoples R China

[4] KTH Royal Inst Technol, Sch Elect Engn & Comp Sci, S-11428 Stockholm, Sweden

Source:

IEEE TRANSACTIONS ON CONTROL OF NETWORK SYSTEMS | 2020 / Vol. 7 / No. 2

Funding:

National Natural Science Foundation of China; Swedish Research Council

Keywords:

Learning algorithm; scheduling; state estimation; threshold structure;

DOI:

10.1109/TCNS.2019.2959162

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

We consider optimal sensor scheduling with unknown communication channel statistics. We formulate two types of scheduling problems, in which the communication rate is a soft or a hard constraint, respectively. We first present structural results on the optimal scheduling policy using dynamic programming, assuming that the channel statistics are known. We prove that the Q-factor is monotonic and submodular, which leads to threshold-like structures in both problems. We then develop stochastic approximation and parameter learning frameworks to deal with the two scheduling problems when the channel statistics are unknown. We exploit the threshold-like structures to design specialized learning algorithms and prove the convergence of these algorithms. Numerical examples show the performance improvement over the standard Q-learning algorithm; they also discuss an alternative method based on recursive estimation of the channel quality.


Pages: 579-591

Number of pages: 13
