Continual Reinforcement Learning Formulation for Zero-Sum Game-Based Constrained Optimal Tracking

Cited by: 10
Authors
Farzanegan, Behzad [1 ]
Jagannathan, Sarangapani [1 ]
Affiliations
[1] Missouri Univ Sci & Technol, Dept Elect & Comp Engn, Rolla, MO 65409 USA
Source
IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS | 2023, Vol. 53, No. 12
Keywords
Barrier Lyapunov function; experience replay; hybrid learning; lifelong learning; optimal tracking control; safety; zero-sum game (ZSG);
DOI
10.1109/TSMC.2023.3299556
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
This study presents a novel reinforcement-learning-based optimal tracking control scheme for partially uncertain nonlinear discrete-time (DT) systems with state constraints using a zero-sum game (ZSG) formulation. To address optimal tracking, a novel augmented system consisting of the tracking error and its integral, along with an uncertain desired trajectory, is constructed. A barrier function (BF) with a tradeoff factor is incorporated into the cost function to keep the state trajectories within a compact set and to balance safety with optimality. Next, using the modified value functional, the ZSG formulation is introduced, wherein an actor-critic neural network (NN) framework is employed to approximate the value functional, the optimal control input, and the worst-case disturbance. The critic NN weights are tuned once at each sampling instant and then iteratively within each sampling interval. Using control input errors, the actor NN weights are adjusted once per sampling instant. A concurrent learning term in the critic weight tuning law removes the need for the persistency of excitation (PE) condition. Further, a weight consolidation scheme is incorporated into the critic update law to attain lifelong learning by overcoming catastrophic forgetting. Finally, a numerical example supports the analytical claims.
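The abstract names several concrete ingredients: a barrier-augmented ZSG stage cost with a tradeoff factor, a concurrent-learning term in the critic update, and a weight consolidation term for lifelong learning. The Python sketch below illustrates how such pieces can fit together on a scalar toy system. It is a minimal illustration under assumed forms, not the paper's algorithm: the log-barrier shape, the polynomial feature basis, the EWC-style consolidation penalty, the plant x+ = 0.8x + u + d, and all gains (A_BOUND, ALPHA, GAMMA, lr, lam) are assumptions of this sketch.

```python
import numpy as np

A_BOUND = 2.0   # assumed state constraint |x| < A_BOUND
ALPHA   = 0.1   # assumed barrier tradeoff factor
GAMMA   = 5.0   # assumed disturbance attenuation level in the ZSG cost

def barrier(x):
    """Log-type barrier that grows unbounded as |x| -> A_BOUND (assumed form)."""
    return np.log(A_BOUND**2 / (A_BOUND**2 - x**2))

def stage_cost(x, u, d):
    """ZSG stage cost: state/control penalties minus a disturbance reward,
    plus the barrier term weighted by the tradeoff factor."""
    return x**2 + u**2 - GAMMA**2 * d**2 + ALPHA * barrier(x)

def features(x):
    """Assumed critic basis: polynomial/barrier features of a scalar state."""
    return np.array([x**2, x**4, barrier(x)])

def td_error(W, x, x_next, u, d):
    """Bellman residual of the linearly parameterized value W @ features(x)."""
    return stage_cost(x, u, d) + W @ features(x_next) - W @ features(x)

def critic_update(W, sample, replay, lr=0.01, lam=1.0, omega=None, W_star=None):
    """One critic step: residual-gradient descent on the current sample, a
    concurrent-learning sum over recorded samples (standing in for the PE
    condition), and an EWC-style consolidation penalty pulling toward
    previously consolidated weights W_star. All three terms are assumptions."""
    x, x_next, u, d = sample
    grad = td_error(W, x, x_next, u, d) * (features(x_next) - features(x))
    for (xs, xn, us, ds) in replay:                  # concurrent-learning term
        grad += td_error(W, xs, xn, us, ds) * (features(xn) - features(xs))
    if omega is not None and W_star is not None:
        grad += lam * omega * (W - W_star)           # weight consolidation term
    return W - lr * grad

# Tiny usage example on the assumed stable scalar plant x+ = 0.8x + u + d.
rng = np.random.default_rng(0)
W, replay, x = np.zeros(3), [], 0.5
for k in range(200):
    u = -0.2 * x                        # placeholder actor policy
    d = 0.05 * rng.standard_normal()    # placeholder disturbance player
    x_next = 0.8 * x + u + d
    replay = (replay + [(x, x_next, u, d)])[-20:]
    W = critic_update(W, (x, x_next, u, d), replay)
    x = x_next
```

In the paper itself the critic is an NN tuned in a hybrid fashion (once at each sampling instant, then iteratively within the interval) and the actor and disturbance players are updated from control input errors; the sketch replaces both players with fixed placeholder policies so that the critic mechanics stay visible.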
Pages: 7744-7757
Page count: 14