Sensor Activation Policy Optimization for Opacity Enforcement Based on Reinforcement Learning

Cited by: 1
Authors
He, Jiahan [1]
Wang, Deguang [1]
Yang, Ming [1]
Liang, Chengbin [1]
Affiliations
[1] Guizhou University, School of Electrical Engineering, Guiyang, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Sensors; Heuristic algorithms; Sensor systems; Optimization; Intelligent sensors; Q-learning; Costs; Switches; Monitoring; Supervisory control; Discrete-event system (DES); dynamic sensor activation; numerical optimization; opacity enforcement; reinforcement learning (RL); discrete-event systems
DOI
10.1109/JSEN.2024.3471931
CLC Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Subject Classification
0808; 0809
Abstract
As a confidentiality property, opacity characterizes whether an external intruder can infer the secret information of a system. Opacity can be enforced by dynamic sensor activation, which manages the observability of events: by controlling which sensors are active and, hence, which events are observable, the system prevents the exposure of sensitive information and keeps the confidential parts of its behavior opaque. In practice, the event hiding and sensor switching involved in dynamic sensor activation are costly operations. This study addresses the numerical optimization of a sensor activation policy (SAP) that enforces opacity using reinforcement learning (RL). A most permissive observer (MPO) is used to encode all valid SAPs that ensure opacity. The quantitative objective is to minimize the maximum discounted total cost. A systematic procedure converts the MPO into a Markov game, enabling the use of RL techniques, and minimax Q-learning is then applied to derive an optimal sensor activation/deactivation policy from the convergent Q-table. Finally, the effectiveness and applicability of the proposed method are demonstrated on a location-tracking problem in a smart-factory setting.
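The abstract only names the method, but the minimax Q-learning step it describes can be sketched concretely. The snippet below is a minimal illustration, not the authors' implementation: it assumes the MPO-derived Markov game is turn-based and zero-sum, with the sensor-activation player minimizing the discounted total cost of event hiding and sensor switching while the opposing (environment/intruder) side maximizes it. The `game` interface (`reset`, `actions`, `is_min_state`, `is_terminal`, `step`) and all hyperparameter values are hypothetical placeholders.

```python
import random
from collections import defaultdict

# Minimax Q-learning sketch for a turn-based zero-sum Markov game
# (assumed structure; `game` and its methods are hypothetical).
GAMMA = 0.9       # discount factor (assumed)
ALPHA = 0.1       # learning rate (assumed)
EPSILON = 0.2     # exploration probability (assumed)
EPISODES = 5000

Q = defaultdict(float)  # Q[(state, action)] -> estimated discounted cost

def minimax_value(game, state):
    # Backup value: min over actions at sensor-player states,
    # max over actions at environment/intruder states.
    vals = [Q[(state, a)] for a in game.actions(state)]
    return min(vals) if game.is_min_state(state) else max(vals)

def choose_action(game, state):
    # Epsilon-greedy exploration for whichever player moves at `state`.
    acts = game.actions(state)
    if random.random() < EPSILON:
        return random.choice(acts)
    best = min if game.is_min_state(state) else max
    return best(acts, key=lambda a: Q[(state, a)])

def train(game):
    for _ in range(EPISODES):
        state = game.reset()
        while not game.is_terminal(state):
            action = choose_action(game, state)
            nxt, cost = game.step(state, action)  # cost of hiding/switching
            future = 0.0 if game.is_terminal(nxt) else GAMMA * minimax_value(game, nxt)
            # Temporal-difference update toward the minimax target.
            Q[(state, action)] += ALPHA * (cost + future - Q[(state, action)])
            state = nxt
    return Q
```

After convergence, an optimal SAP in this sketch is read off the Q-table by taking, at each state where the sensor-activation player moves, the action with the smallest Q-value.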
Pages: 38429-38439 (11 pages)