Approximate Optimal Stabilization Control of Servo Mechanisms Based on Reinforcement Learning Scheme

Cited by: 16
Authors
Lv, Yongfeng [1 ]
Ren, Xuemei [1 ]
Hu, Shuangyi [1 ]
Xu, Hao [1 ]
Affiliations
[1] Beijing Inst Technol, Sch Automat, Beijing 100081, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Adaptive dynamic programming; neural networks; optimal control; reinforcement learning; servomechanisms; ZERO-SUM GAMES; TRACKING CONTROL; ROBUST-CONTROL; MOTION CONTROL; TIME-SYSTEMS; PERFORMANCE; DYNAMICS;
DOI
10.1007/s12555-018-0551-6
Chinese Library Classification (CLC)
TP [automation technology, computer technology]
Discipline Classification Code
0812
Abstract
A reinforcement learning (RL) based adaptive dynamic programming (ADP) scheme is developed to learn the approximate optimal stabilization input of servo mechanisms, where the unknown system dynamics are approximated with a three-layer neural network (NN) identifier. First, the servo mechanism model is constructed, and a three-layer NN identifier is used to approximate the unknown servo system; the NN weights of both the hidden layer and the output layer are tuned synchronously with an adaptive gradient law. An RL-based three-layer critic NN is then used to learn the optimal cost function: the NN weights of its first layer are set as constants, while the NN weights of its second layer are updated by minimizing the squared Hamilton-Jacobi-Bellman (HJB) error. The approximate optimal stabilization input of the servo mechanism is obtained from the three-layer NN identifier and the RL-based critic NN, and it drives the motor speed from its initial value to the given value. Moreover, the convergence of the identifier and of the RL-based critic NN is proved, and the stability of the closed-loop system under the proposed optimal input is analyzed. Finally, a servo mechanism model and a complex system are provided to verify the effectiveness of the proposed method.
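The record contains no code; as a rough, non-authoritative sketch of the critic-only ADP loop the abstract describes, the Python fragment below learns a value function V(x) ≈ W^T φ(x) by gradient descent on the squared HJB residual and applies the resulting control u = -(1/2) R^{-1} g(x)^T ∇V(x). Everything concrete here is an assumption for illustration: the toy drift f, input gain g, polynomial basis φ, and the gains Q, R, alpha are placeholders rather than the authors' servo model, and the dynamics are taken as known, whereas the paper identifies them online with a separate three-layer NN.

```python
import numpy as np

# Hypothetical second-order plant standing in for the servo mechanism
# (an assumption for illustration, not the authors' model).
def f(x):                                   # drift dynamics
    return np.array([-x[0] + x[1],
                     -0.5 * x[1] - x[0] * np.cos(x[0])])

def g(x):                                   # input gain vector
    return np.array([0.0, 1.0])

Q, R, alpha = np.eye(2), 1.0, 0.8           # cost weights and critic gain (assumed)

def phi(x):                                 # critic basis: quadratic polynomials
    return np.array([x[0] ** 2, x[0] * x[1], x[1] ** 2])

def dphi(x):                                # Jacobian of phi with respect to x
    return np.array([[2 * x[0], 0.0],
                     [x[1], x[0]],
                     [0.0, 2 * x[1]]])

W = np.zeros(3)                             # tunable critic weights
x, dt = np.array([1.0, 0.0]), 1e-3          # initial state and step size

for _ in range(100_000):
    gradV = dphi(x).T @ W                   # estimate of grad V(x)
    u = -0.5 / R * g(x) @ gradV             # u = -(1/2) R^{-1} g^T grad V
    xdot = f(x) + g(x) * u
    r = x @ Q @ x + R * u ** 2              # running cost x'Qx + u'Ru
    sigma = dphi(x) @ xdot                  # regressor: d/dt of phi(x(t))
    e = sigma @ W + r                       # HJB (Bellman) residual
    # normalized gradient descent on the squared residual e**2
    W -= dt * alpha * e * sigma / (1.0 + sigma @ sigma) ** 2
    x = x + dt * xdot                       # Euler step of the closed loop
```

The normalized gradient step (division by (1 + σ^T σ)^2) is a common choice in ADP critic tuning to keep the update bounded; the paper's exact update law, basis, and convergence conditions may differ.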
Pages: 2655-2665
Page count: 11