GAN-Based Planning Model in Deep Reinforcement Learning

Cited: 0
Authors
Chen, Song [1 ]
Jiang, Junpeng [1 ]
Zhang, Xiaofang [1 ,2 ]
Wu, Jinjin [1 ]
Lu, Gongzheng [2 ]
Affiliations
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou, Peoples R China
[2] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Peoples R China
Source
ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2020, PT II | 2020 / Vol. 12397
Funding
National Natural Science Foundation of China
Keywords
Deep reinforcement learning; Model-based; Planning; Generative Adversarial Net
DOI
10.1007/978-3-030-61616-8_26
CLC classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Deep reinforcement learning methods have achieved unprecedented success in many high-dimensional, large-scale sequential decision-making tasks. Among these methods, model-based approaches rely on planning as their primary component, while model-free approaches rely primarily on learning. However, the accuracy of the environment model has a significant impact on the learned policy: when the model is incorrect, the planning process is likely to compute a suboptimal policy. To obtain a more accurate environment model, this paper introduces the GAN-Based Planning Model (GBPM), which exploits the strong expressive power of the Generative Adversarial Net (GAN) to learn to simulate the environment from experience and to perform implicit planning. The GBPM is trained on real transition samples experienced by the agent; the agent can then use the GBPM to produce simulated experience or trajectories that improve the learned policy. Because the GBPM can also serve as a form of experience replay, it can be applied to both model-based and model-free methods such as Dyna, DQN, and ACER. Experimental results indicate that the GBPM improves data efficiency and algorithm performance on the Maze and Atari 2600 game domains.
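To illustrate the adversarial model-learning idea the abstract describes, below is a minimal PyTorch sketch of training a GAN as a transition model: a generator maps (state, action, noise) to a predicted (next state, reward), while a discriminator learns to separate generated transitions from real ones. All names (Generator, Discriminator, gan_model_step) and architectural details are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """G(s, a, z) -> (s', r): samples a predicted next state and reward."""
    def __init__(self, s_dim, a_dim, z_dim=8, hidden=64):
        super().__init__()
        self.z_dim = z_dim
        self.net = nn.Sequential(
            nn.Linear(s_dim + a_dim + z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, s_dim + 1))

    def forward(self, s, a):
        z = torch.randn(s.size(0), self.z_dim)   # noise makes the model stochastic
        out = self.net(torch.cat([s, a, z], dim=1))
        return out[:, :-1], out[:, -1:]          # predicted next state, reward

class Discriminator(nn.Module):
    """D(s, a, s', r) -> probability that the transition is real."""
    def __init__(self, s_dim, a_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * s_dim + a_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid())

    def forward(self, s, a, s2, r):
        return self.net(torch.cat([s, a, s2, r], dim=1))

def gan_model_step(G, D, opt_g, opt_d, s, a, s2, r):
    """One adversarial update on a batch of real transitions (s, a, r, s')."""
    bce = nn.BCELoss()
    ones = torch.ones(s.size(0), 1)
    zeros = torch.zeros(s.size(0), 1)
    # Discriminator step: separate real transitions from generated ones.
    fake_s2, fake_r = G(s, a)
    d_loss = (bce(D(s, a, s2, r), ones) +
              bce(D(s, a, fake_s2.detach(), fake_r.detach()), zeros))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator step: produce transitions the discriminator accepts as real.
    fake_s2, fake_r = G(s, a)
    g_loss = bce(D(s, a, fake_s2, fake_r), ones)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```

In a Dyna-style agent, gan_model_step would be called on minibatches drawn from the real replay buffer, while planning steps would resample (state, action) pairs from that buffer and let the generator supply imagined next states and rewards for additional policy updates.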
Pages: 323-334
Page count: 12