Reinforcement Learning-Based Control of Nonlinear Systems Using Lyapunov Stability Concept and Fuzzy Reward Scheme

Cited by: 20
Authors
Chen, Ming [1 ]
Lam, Hak Keung [1 ]
Shi, Qian [1 ]
Xiao, Bo [2 ,3 ]
Affiliations
[1] King's College London, Department of Engineering, London WC2R 2LS, England
[2] Imperial College London, Hamlyn Centre for Robotic Surgery, London SW7 2AZ, England
[3] Imperial College London, Department of Computing, London SW7 2AZ, England
Keywords
Optimization; fuzzy logic; Lyapunov methods; circuit stability; stability analysis; aerospace electronics; circuits and systems; proximal policy optimization (PPO); adjustable policy learning rate (APLR); Lyapunov reward system; fuzzy reward system; cart-pole inverted pendulum; design
DOI
10.1109/TCSII.2019.2947682
CLC classification
TM (electrical technology); TN (electronic technology, communication technology)
Discipline codes
0808; 0809
Abstract
In this brief, a reinforcement learning-based control approach for nonlinear systems is presented. The proposed approach offers a design scheme for an adjustable policy learning rate (APLR) that reduces the influence of negative or large advantages, improving the learning stability of the proximal policy optimization (PPO) algorithm. In addition, this brief puts forward a Lyapunov-fuzzy reward system to further improve learning efficiency: the Lyapunov stability concept is incorporated into the design of the Lyapunov reward system, and a dedicated fuzzy reward system is constructed from knowledge of the cart-pole inverted pendulum using a fuzzy inference system (FIS). The merits of the proposed approach are validated through simulation examples.
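The abstract in this record does not give the paper's exact APLR rule or fuzzy reward formulas, so the following is only an illustrative sketch of the two ideas it names: a policy learning rate that shrinks as the advantage estimate grows in magnitude (so one extreme advantage cannot destabilise a PPO update), and a fuzzy reward for the cart-pole task built from a triangular membership function on the pole angle. All names and constants here (`adjusted_lr`, `fuzzy_reward`, the `scale` factor, the 0.21 rad failure angle) are assumptions for illustration, not the authors' design.

```python
def adjusted_lr(base_lr, advantage, scale=1.0):
    # Hypothetical APLR-style rule (not the paper's formula): shrink the
    # policy learning rate as the advantage becomes large in magnitude,
    # damping the effect of extreme or noisy advantage estimates on the
    # PPO policy update.
    return base_lr / (1.0 + scale * abs(advantage))


def tri(x, a, b, c):
    # Triangular membership function with support [a, c] and peak at b.
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def fuzzy_reward(theta, theta_max=0.21):
    # Illustrative fuzzy reward for the cart-pole inverted pendulum: the
    # reward is the membership of the pole angle theta (rad) in an
    # "upright" fuzzy set, so states near vertical earn close to 1 and
    # states near the assumed failure angle earn close to 0.
    return tri(theta, -theta_max, 0.0, theta_max)
```

In a full FIS, several such membership functions (on angle, angular velocity, cart position) would be combined by inference rules; the single-input version above only shows the shape of a fuzzy, graded reward as opposed to the usual binary survival reward.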
Pages: 2059-2063 (5 pages)