Deep Recurrent Deterministic Policy Gradient for Physical Control

Times cited: 3

Authors:

Zhang, Lei [1]

Han, Shuai [2,3]

Zhang, Zhiruo [1]

Li, Lefan [1]

Lu, Shuai [2,3]

Affiliations:

[1] Jilin Univ, Coll Software, Changchun 130012, Peoples R China

[2] Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Peoples R China

[3] Jilin Univ, Key Lab Symbol Computat & Knowledge Engn, Minist Educ, Changchun 130012, Peoples R China

Source:

ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2020, PT II | 2020 / Vol. 12397

Funding:

National Natural Science Foundation of China; National Key R&D Program of China

Keywords:

Reinforcement learning; Neural networks; Deep learning

DOI:

10.1007/978-3-030-61616-8_21

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Observable states play a significant role in reinforcement learning (RL), and the performance of RL is also strongly associated with the quality of the inferred hidden states. Accurately extracting hidden states is challenging because they often depend on the histories of both the environment and the agent, and extracting them requires substantial domain knowledge. In this work, we aim to leverage history information to improve the agent's performance. First, we argue that both neglecting history information and processing it in the usual way harm the agent's performance. Second, we propose a novel model that combines the advantages of supervised learning and RL. Specifically, we extend the classical policy gradient framework and propose extracting history information with recurrent neural networks. Third, we evaluate our model in simulated physical control environments, where it outperforms state-of-the-art models and performs markedly better on more challenging tasks. Finally, we analyze the reasons behind these results and suggest possible approaches for extending and scaling up the model.
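The abstract describes the core idea only at a high level: a recurrent network summarizes the observation history, and a deterministic policy acts on that summary rather than on the current observation alone. The following is a minimal forward-pass sketch of that idea, not the authors' architecture. It assumes a plain Elman RNN cell, tanh-bounded actions, and randomly initialized weights; the class and method names (`RecurrentDeterministicActor`, `step`, `act`) are hypothetical, and no training (critic, replay buffer, or policy gradient update) is shown.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


class RecurrentDeterministicActor:
    """Hypothetical sketch: h_t = tanh(W_x o_t + W_h h_{t-1} + b_h), a_t = tanh(W_a h_t + b_a).

    A plain Elman RNN stands in for whatever recurrent cell the paper uses (assumption).
    """

    def __init__(self, obs_dim: int, hidden_dim: int, act_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def matrix(rows: int, cols: int) -> Matrix:
            # Small uniform initialization scaled by fan-in.
            scale = 1.0 / math.sqrt(cols)
            return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        self.w_x = matrix(hidden_dim, obs_dim)     # input-to-hidden weights
        self.w_h = matrix(hidden_dim, hidden_dim)  # hidden-to-hidden (recurrent) weights
        self.b_h = [0.0] * hidden_dim
        self.w_a = matrix(act_dim, hidden_dim)     # hidden-to-action weights
        self.b_a = [0.0] * act_dim

    @staticmethod
    def _matvec(m: Matrix, v: Sequence[float]) -> Vector:
        return [sum(w * x for w, x in zip(row, v)) for row in m]

    def step(self, h: Vector, obs: Sequence[float]) -> Vector:
        """Fold one observation into the hidden state (the learned history summary)."""
        pre = [
            a + b + c
            for a, b, c in zip(self._matvec(self.w_x, obs), self._matvec(self.w_h, h), self.b_h)
        ]
        return [math.tanh(p) for p in pre]

    def act(self, history: Sequence[Sequence[float]]) -> Vector:
        """Map an observation history o_1..o_t to a deterministic action in [-1, 1]."""
        h = [0.0] * self.hidden_dim  # empty history -> zero hidden state
        for obs in history:
            h = self.step(h, obs)
        pre = [a + b for a, b in zip(self._matvec(self.w_a, h), self.b_a)]
        return [math.tanh(p) for p in pre]


if __name__ == "__main__":
    actor = RecurrentDeterministicActor(obs_dim=3, hidden_dim=8, act_dim=2)
    # Two histories that end in the same observation but differ earlier.
    history_a = [[0.1, 0.0, -0.2], [0.5, 0.3, 0.1]]
    history_b = [[-0.4, 0.2, 0.9], [0.5, 0.3, 0.1]]
    print(actor.act(history_a))
    print(actor.act(history_b))
```

Because the hidden state is carried across steps, two histories that end in the same observation generally yield different actions, which a feedforward policy conditioned only on the latest observation cannot do.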

Pages: 257-268

Page count: 12

Number of references: 25