Principled reward shaping for reinforcement learning via Lyapunov stability theory

Cited by: 45
Authors
Dong, Yunlong [1]
Tang, Xiuchuan [1]
Yuan, Ye [1]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Artificial Intelligence & Automat, Key Lab Imaging Proc & Intelligent Control, State Key Lab Digital Mfg Equipments & Technol, Wuhan 430074, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Reinforcement learning; Principled reward shaping; Lyapunov stability theory; Stochastic approximation;
DOI
10.1016/j.neucom.2020.02.008
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Reinforcement learning (RL) suffers from the difficulty of designing a reward function and from the large number of training iterations needed to reach convergence, so accelerating the training process is vital. In this paper, we propose a Lyapunov-function-based approach to shaping the reward function that can effectively accelerate training. Furthermore, the shaped reward function comes with a convergence guarantee via stochastic approximation, an invariant optimality condition derived from the Bellman equation, and an asymptotically unbiased policy. Experiments on a range of RL benchmarks demonstrate the effectiveness of the proposed method: it substantially accelerates convergence and also improves performance in terms of a higher accumulated reward. © 2020 Elsevier B.V. All rights reserved.
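The abstract does not state the shaping formula, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the standard potential-based shaping form F(s, s') = gamma * phi(s') - phi(s), with the potential phi taken as the negative of a Lyapunov candidate V. The quadratic V, the goal at the origin, the discount value, and all function names are hypothetical choices made for illustration.

```python
GAMMA = 0.99  # discount factor (assumed value, not from the paper)


def lyapunov(state):
    """Hypothetical Lyapunov candidate: squared distance to a goal at the origin."""
    return sum(x * x for x in state)


def potential(state):
    """Potential is the negated Lyapunov value, so approaching the goal raises it."""
    return -lyapunov(state)


def shaped_reward(reward, state, next_state, gamma=GAMMA):
    """Potential-based shaping (assumed form): r + gamma * phi(s') - phi(s)."""
    return reward + gamma * potential(next_state) - potential(state)


# A step toward the goal earns a positive bonus; a step away is penalized.
toward = shaped_reward(0.0, (2.0, 0.0), (1.0, 0.0))
away = shaped_reward(0.0, (1.0, 0.0), (2.0, 0.0))
```

Under this assumed form, the shaping terms telescope along a trajectory, which is why potential-based shaping is generally known to leave the optimal policy unchanged; whether the paper uses exactly this form is not confirmed by the abstract.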
Pages: 83-90
Page count: 8
Related papers
42 in total