Explainable and Safety Aware Deep Reinforcement Learning-Based Control of Nonlinear Discrete-Time Systems Using Neural Network Gradient Decomposition

Times Cited: 0
Authors
Farzanegan, Behzad [1 ]
Jagannathan, Sarangapani [1 ]
Affiliations
[1] Missouri Univ Sci & Technol, Dept Elect & Comp Engn, Rolla, MO 65401 USA
Keywords
Safety; Multi-layer neural network; Optimal control; Artificial neural networks; Trajectory; Cost function; Tuning; Computational modeling; Training; Steady-state; Deep reinforcement learning; approximate dynamic programming; singular value decomposition-based weight tuning; online safety-aware control; explainability
DOI
10.1109/TASE.2025.3554431
CLC Number
TP [Automation Technology and Computer Technology];
Discipline Code
0812;
Abstract
This paper presents an explainable deep reinforcement learning (DRL)-based safety-aware optimal adaptive tracking (SOAT) scheme for a class of nonlinear discrete-time (DT) affine systems subject to state inequality constraints. The DRL-based SOAT scheme uses a multilayer neural network (MNN) actor-critic to estimate the cost function and the optimal policy, with the MNN update laws tuned at each layer using both the singular value decomposition (SVD) of the activation-function gradient, to mitigate the vanishing-gradient issue, and the safety-aware Bellman error. An approximate safety-aware optimal policy is derived from the Karush-Kuhn-Tucker (KKT) conditions by incorporating a higher-order control barrier function (HOCBF) into the Hamiltonian through a Lagrange multiplier. The resulting safety-aware Bellman error enables safe exploration both during the online learning phase and at steady state, without any explicit changes to the actor-critic MNN update laws. To study explainability and gain insight, we employ the Shapley Additive Explanations (SHAP) method to construct an explainer model for the DRL-based SOAT scheme and identify the features most important in determining the optimal policy. The overall stability is established. Finally, the effectiveness of the proposed method is demonstrated on a Shipboard Power System (SPS), achieving more than a 35% reduction in cumulative cost compared with the existing actor-critic MNN optimal control policy.

Note to Practitioners: In practical control systems, meeting safety constraints is often critical, since violating them can degrade performance or damage equipment. This paper addresses this challenge with a safe DRL-based control approach that not only optimizes performance but also provides robust safety assurances. Our DRL-based SOAT scheme specifically targets nonlinear discrete-time systems that must satisfy state inequality constraints. The proposed controller's successful performance in simulations on a Shipboard Power System demonstrates its potential for practical applications. DRL-based SOAT employs an MNN actor-critic framework for continuous learning and policy adaptation. Integrating HOCBFs directly into the optimization ensures safe operation even during online learning, which is critical for real-time applications. The addition of SHAP enhances transparency by identifying the key features that influence control decisions. Future work could adapt this framework to other constrained environments, such as autonomous vehicles, robotics, and industrial automation, where safety, optimality, and explainability are essential.
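The paper's exact SVD-based update law is not reproduced in this record, but the general idea of conditioning a layer's gradient through its singular values to counter vanishing gradients can be sketched as follows. The flooring rule, learning rate, and `eps` threshold below are illustrative assumptions, not the authors' tuning law:

```python
import numpy as np

def svd_conditioned_step(W, grad, lr=0.05, eps=1e-3):
    """Illustrative SVD-conditioned gradient step (hypothetical, not the
    paper's exact update law).

    The layer gradient is decomposed as grad = U @ diag(s) @ Vt, and tiny
    singular values are floored at `eps`, so directions along which the
    gradient has nearly vanished still receive a usable update.
    """
    U, s, Vt = np.linalg.svd(grad, full_matrices=False)
    s = np.maximum(s, eps)                 # floor vanishing singular values
    return W - lr * (U @ np.diag(s) @ Vt)  # descend along conditioned gradient

# Example: a rank-1 gradient that has all but vanished
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))
grad = np.outer([1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0]) * 1e-9
W_new = svd_conditioned_step(W, grad)
```

A plain gradient step of size `lr` on this `grad` would move the weights by roughly 5e-11; the floored-SVD step still makes measurable progress.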
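The KKT-based derivation that embeds the HOCBF in the Hamiltonian is not reproduced here; as a minimal illustration of how a discrete-time barrier condition constrains a control input, consider filtering a nominal input through the one-step condition h(x_{k+1}) >= (1 - gamma) h(x_k). The scalar dynamics, barrier, and gamma below are made-up examples, not the paper's SPS model:

```python
# Toy scalar system and barrier (illustrative assumptions, not the paper's model)
def f(x, u):               # dynamics: x_{k+1} = x_k + u_k
    return x + u

def h(x):                  # barrier: safe set is {x : h(x) >= 0}, i.e. x <= 1
    return 1.0 - x

def safety_filter(x, u_nom, gamma=0.2):
    """Return the input closest to u_nom that satisfies the discrete-time
    CBF condition h(f(x, u)) >= (1 - gamma) * h(x).

    For these affine choices the condition reduces to u <= gamma * h(x),
    so minimizing (u - u_nom)^2 under it is a simple clamp.
    """
    u_max = gamma * h(x)   # largest input preserving the barrier condition
    return min(u_nom, u_max)

u_safe = safety_filter(x=0.9, u_nom=0.5)   # nominal step would leave the safe set
```

Here the unfiltered input drives the state to 1.4, outside the safe set, while the filtered input 0.02 keeps the barrier condition satisfied.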
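SHAP is used in the paper to identify the features that drive the optimal policy; as a self-contained illustration of the attribution idea behind it, exact Shapley values for a small model can be computed by enumerating feature coalitions. The toy linear "policy" and zero baseline below are assumptions for illustration only:

```python
from itertools import combinations
from math import factorial

def shapley_values(policy, x, baseline):
    """Exact Shapley attribution of policy(x) across its input features.

    For each feature i, average the marginal effect of switching feature i
    from its baseline value to x[i] over all coalitions S of the remaining
    features, using the standard Shapley weights |S|!(n-|S|-1)!/n!.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(n):
            for S in combinations(others, r):
                w = factorial(r) * factorial(n - r - 1) / factorial(n)
                z = list(baseline)
                for j in S:
                    z[j] = x[j]
                without_i = policy(z)   # coalition S, feature i at baseline
                z[i] = x[i]
                with_i = policy(z)      # coalition S plus feature i
                phi[i] += w * (with_i - without_i)
    return phi

# Toy linear policy: attributions should recover each coefficient exactly
policy = lambda z: 2.0 * z[0] + 3.0 * z[1]
phi = shapley_values(policy, x=[1.0, 1.0], baseline=[0.0, 0.0])
```

For a linear model the Shapley values equal the coefficient-weighted feature deviations, and by the efficiency property they always sum to policy(x) minus policy(baseline); the SHAP library approximates this computation efficiently for large models.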
Pages: 13556-13568
Page count: 13