Model-free incremental adaptive dynamic programming based approximate robust optimal regulation

Cited by: 12
Authors
Li, Cong [1 ]
Wang, Yongchao [1 ]
Liu, Fangzhou [1 ]
Liu, Qingchen [2 ]
Buss, Martin [1 ]
Affiliations
[1] Tech Univ Munich, Automat Control Engn, Theresienstr 90, D-80333 Munich, Germany
[2] Tech Univ Munich, Informat Oriented Control, Munich, Germany
Keywords
incremental adaptive dynamic programming; reinforcement learning; robust optimal regulation; time delay estimation; NONLINEAR-SYSTEMS; DESIGN; ALGORITHM; DIFFERENTIATION; STABILITY; TRACKING;
DOI
10.1002/rnc.5964
CLC Classification Number
TP [Automation technology; computer technology]
Subject Classification Code
0812
Abstract
This article presents a new formulation for model-free robust optimal regulation of continuous-time nonlinear systems. The proposed reinforcement-learning-based approach, referred to as incremental adaptive dynamic programming (IADP), uses measured input-state data to design an approximately optimal incremental control strategy that stabilizes the controlled system incrementally under model uncertainties, environmental disturbances, and input saturation. Leveraging the time delay estimation (TDE) technique, we first use sensor data to relax the requirement for complete knowledge of the system dynamics: measured input-state data are used to construct an incremental model that captures the system evolution in incremental form. The resulting incremental dynamics then serves to design the approximately optimal incremental control strategy via adaptive dynamic programming, implemented as a simplified single-critic structure that approximates the value function of the Hamilton-Jacobi-Bellman equation. Furthermore, experience data are used to design an off-policy weight update law for the critic neural network with guaranteed weight convergence. Importantly, a term related to the TDE error bound is incorporated into the cost function, so that the TDE error introduced by the estimation is attenuated during the optimization process. Proofs of system stability and weight convergence are provided. Numerical simulations validate the effectiveness and superiority of the proposed IADP, particularly its reduced control energy expenditure and enhanced robustness.
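To make the two mechanisms described in the abstract concrete, the sketch below is a minimal, assumption-laden illustration rather than the authors' implementation. It shows (i) a TDE step that replaces the unknown drift dynamics with recently measured input-state data, yielding an incremental prediction of the state derivative, and (ii) a single-critic weight update driven by the Bellman (HJB) residual evaluated over replayed experience tuples. All names and parameters (phi, g_bar, alpha, the quadratic running cost) are illustrative assumptions, not quantities defined in the paper.

```python
# Minimal sketch of TDE-based incremental dynamics and a single-critic
# weight update over replayed experience. Illustrative only; the actual
# IADP formulation, gains, and basis functions are given in the paper.
import numpy as np

def tde_incremental_dynamics(x_dot_prev, u_prev, u_now, g_bar):
    """TDE step: with a small sampling delay, the unknown drift at the
    current instant is approximated by what was measured one step earlier,
    giving the incremental prediction
        x_dot(t) ~= x_dot(t - L) + g_bar (u(t) - u(t - L)),
    where g_bar is an assumed constant estimate of the input gain."""
    return x_dot_prev + g_bar @ (u_now - u_prev)

def phi(x):
    """Assumed critic basis: quadratic features of a 2-state example."""
    x1, x2 = x
    return np.array([x1 * x1, x1 * x2, x2 * x2])

def dphi(x):
    """Jacobian of phi with respect to x (3 features x 2 states)."""
    x1, x2 = x
    return np.array([[2 * x1, 0.0],
                     [x2,     x1],
                     [0.0, 2 * x2]])

def critic_update(W, replay, alpha=0.5, Q=None, R=None):
    """One normalized-gradient step on the squared Bellman residual,
    averaged over stored (x, u, x_dot) experience tuples (off-policy reuse)."""
    Q = np.eye(2) if Q is None else Q
    R = np.eye(1) if R is None else R
    dW = np.zeros_like(W)
    for x, u, x_dot in replay:
        sigma = dphi(x) @ x_dot                      # regressor: d/dt phi(x)
        r = x @ Q @ x + u @ R @ u                    # running cost r(x, u)
        e = W @ sigma + r                            # Bellman/HJB residual
        dW -= alpha * e * sigma / (1.0 + sigma @ sigma) ** 2
    return W + dW / max(len(replay), 1)
```

In the paper's formulation the critic update is further coupled to the incremental control law and to a TDE-error-bound term in the cost function; those couplings are omitted here for brevity.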
Pages: 2662-2682
Number of pages: 21