Distributed Reinforcement Learning for Cyber-Physical System With Multiple Remote State Estimation Under DoS Attacker

Cited by: 43
Authors
Dai, Pengcheng [1]
Yu, Wenwu [1, 2]
Wang, He [3]
Wen, Guanghui [1, 4]
Lv, Yuezu [1]
Affiliations
[1] Southeast Univ, Sch Math, Nanjing 210096, Peoples R China
[2] Southeast Univ, Sch Automat, Nanjing 210096, Peoples R China
[3] Australian Natl Univ, Coll Engn & Comp Sci, Canberra, ACT, Australia
[4] RMIT Univ, Sch Engn, Melbourne, Vic 3001, Australia
Source
IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING | 2020 / Vol. 7 / No. 4
Funding
National Natural Science Foundation of China
Keywords
Learning (artificial intelligence); Games; State estimation; Sensor systems; Nash equilibrium; Channel estimation; Cyber-physical system; DoS attack; infinite time-horizon; distributed reinforcement learning; SENSOR;
DOI
10.1109/TNSE.2020.3018871
Chinese Library Classification (CLC) number
T [Industrial Technology]
Discipline classification code
08
Abstract
In this paper, we consider a cyber-physical system (CPS) with multiple remote state estimation under a denial-of-service (DoS) attack over an infinite time horizon. The sensors monitor the system and send their local state estimates to the remote estimators over local channels chosen in "State 0" or "State 1". The sensors aim to find policies for choosing the local channel state used for transmission so as to minimize the total estimation error covariance while accounting for energy saving over the infinite time horizon. The DoS attacker pursues the opposite goal by choosing whether or not to attack each channel. The games between the sensors and the DoS attacker are investigated under two structures of public information: the open-loop case (where the sensors and the attacker cannot observe each other's behavior) and the closed-loop case (where the sensors and the attacker can observe each other's behavior causally). For the open-loop case, under the assumption that the DoS attacker can obtain the information sent from the remote estimators to the sensors, distributed reinforcement learning algorithms based on local information are proposed for the sensors and the attacker to find their respective Nash equilibrium policies. For the closed-loop case, we further consider the setting in which the DoS attacker cannot obtain the information sent from the remote estimators to the sensors, which leads to asymmetric information between the sensors and the attacker. To derive Nash equilibrium policies for the sensors and the attacker, we convert the original game into a belief-based continuous-state stochastic game. The convergence of the distributed reinforcement learning method is proved, and simulations are presented to demonstrate its effectiveness.
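To make the notion of a Nash equilibrium policy concrete, the following is a minimal sketch of a heavily simplified, single-stage version of the sensor-attacker interaction: one sensor picks a channel state (0 or 1), one attacker picks whether to attack (0 = idle, 1 = attack), and the sensor's cost is the attacker's reward. It is not the paper's algorithm, which addresses an infinite-horizon stochastic game with distributed learning. The function name `solve_2x2_zero_sum` and the cost matrix are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def solve_2x2_zero_sum(cost):
    """Equilibrium of a 2x2 zero-sum game (row minimizes, column maximizes).

    cost[i][j]: cost paid by the row player when row plays i and column plays j.
    Returns (p, q, value), where p is the probability that row plays action 0
    and q is the probability that column plays action 0.
    """
    m = [[Fraction(x) for x in row] for row in cost]

    # Pure-strategy saddle point: an entry that is the maximum of its row
    # (the column player cannot gain by deviating) and the minimum of its
    # column (the row player cannot gain by deviating).
    for i in range(2):
        for j in range(2):
            if m[i][j] == max(m[i]) and m[i][j] == min(m[0][j], m[1][j]):
                return Fraction(1 - i), Fraction(1 - j), m[i][j]

    # No saddle point: each player mixes so that the opponent is indifferent.
    (a, b), (c, d) = m
    denom = a - b - c + d
    p = (d - c) / denom
    q = (d - b) / denom
    value = (a * d - b * c) / denom
    return p, q, value


# Hypothetical stage costs (estimation error plus energy), not from the paper.
# Rows: sensor uses "State 0" / "State 1"; columns: attacker idle / attacks.
HYPOTHETICAL_COST = [[2, 5], [4, 3]]

p, q, value = solve_2x2_zero_sum(HYPOTHETICAL_COST)
print(p, q, value)  # 1/4 1/2 7/2
```

With these illustrative numbers, neither player has a dominant pure choice, so the sensor uses "State 0" with probability 1/4 and the attacker stays idle with probability 1/2. Exact rational arithmetic via `fractions.Fraction` is used here only so the indifference conditions can be checked by hand.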
Pages: 3212-3222
Number of pages: 11