Reinforcement Learning-Based Control of Nonlinear Systems Using Lyapunov Stability Concept and Fuzzy Reward Scheme

Cited by: 20
Authors
Chen, Ming [1 ]
Lam, Hak Keung [1 ]
Shi, Qian [1 ]
Xiao, Bo [2 ,3 ]
Affiliations
[1] King's College London, Department of Engineering, London WC2R 2LS, England
[2] Imperial College London, Hamlyn Centre for Robotic Surgery, London SW7 2AZ, England
[3] Imperial College London, Department of Computing, London SW7 2AZ, England
Keywords
Optimization; fuzzy logic; Lyapunov methods; circuit stability; stability analysis; aerospace electronics; circuits and systems; proximal policy optimization (PPO); adjustable policy learning rate (APLR); Lyapunov reward system; fuzzy reward system; cart-pole inverted pendulum; design
DOI
10.1109/TCSII.2019.2947682
CLC classification
TM (electrical technology); TN (electronic technology, communication technology)
Discipline codes
0808; 0809
Abstract
In this brief, a reinforcement learning-based control approach for nonlinear systems is presented. The proposed approach offers a design scheme for an adjustable policy learning rate (APLR) that reduces the influence of negative or large advantages, improving the learning stability of the proximal policy optimization (PPO) algorithm. In addition, this brief puts forward a Lyapunov-fuzzy reward system to further improve learning efficiency: the Lyapunov stability concept is incorporated into the design of the Lyapunov reward system, and a dedicated fuzzy reward system is constructed from knowledge of the cart-pole inverted pendulum using a fuzzy inference system (FIS). The merits of the proposed approach are validated through simulation examples.
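The abstract in this record does not give the paper's exact APLR rule or fuzzy reward formulas, so the following is only an illustrative sketch of the two ideas it names: a policy learning rate that shrinks as the advantage estimate grows in magnitude (so one extreme advantage cannot destabilise a PPO update), and a fuzzy reward for the cart-pole task built from a triangular membership function on the pole angle. All names and constants here (`adjusted_lr`, `fuzzy_reward`, the `scale` factor, the 0.21 rad failure angle) are assumptions for illustration, not the authors' design.

```python
def adjusted_lr(base_lr, advantage, scale=1.0):
    # Hypothetical APLR-style rule (not the paper's formula): shrink the
    # policy learning rate as the advantage becomes large in magnitude,
    # damping the effect of extreme or noisy advantage estimates on the
    # PPO policy update.
    return base_lr / (1.0 + scale * abs(advantage))


def tri(x, a, b, c):
    # Triangular membership function with support [a, c] and peak at b.
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def fuzzy_reward(theta, theta_max=0.21):
    # Illustrative fuzzy reward for the cart-pole inverted pendulum: the
    # reward is the membership of the pole angle theta (rad) in an
    # "upright" fuzzy set, so states near vertical earn close to 1 and
    # states near the assumed failure angle earn close to 0.
    return tri(theta, -theta_max, 0.0, theta_max)
```

In a full FIS, several such membership functions (on angle, angular velocity, cart position) would be combined by inference rules; the single-input version above only shows the shape of a fuzzy, graded reward as opposed to the usual binary survival reward.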
Pages: 2059-2063 (5 pages)