Applications of asynchronous deep reinforcement learning based on dynamic updating weights

Cited by: 32
Authors
Zhao, Xingyu [1 ]
Ding, Shifei [1 ]
An, Yuexuan [1 ]
Jia, Weikuan [2 ]
Affiliations
[1] China Univ Min & Technol, Sch Comp Sci & Technol, Xuzhou 221116, Jiangsu, Peoples R China
[2] Shandong Normal Univ, Sch Informat Sci & Engn, Jinan 250358, Shandong, Peoples R China
Keywords
Deep reinforcement learning; Asynchronous; Dynamic updating weights; Multithreading; Parallel reinforcement learning; GAME; GO;
DOI
10.1007/s10489-018-1296-x
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Asynchronous deep reinforcement learning is a recent class of reinforcement learning that uses multithreading to let multiple agents explore different parts of the state space and push parameter updates asynchronously to a shared global network. As a result, agents no longer require experience replay and can update parameters online, and the asynchronous scheme greatly improves both the convergence speed and the convergence performance of the algorithms. Asynchronous deep reinforcement learning algorithms, especially the asynchronous advantage actor-critic (A3C) algorithm, are effective on practical problems and have been widely used. However, existing asynchronous algorithms apply a uniform learning rate when each thread pushes its update to the global parameters, ignoring the fact that different threads transmit different information at each update. When an agent's push mainly reflects failed exploration, it contributes little to improving the parameters of the learning system. We therefore introduce dynamic weights into asynchronous deep reinforcement learning and propose a new algorithm, asynchronous advantage actor-critic with dynamic updating weights (DWA3C). When the information pushed by an agent clearly helps improve system performance, the update magnitude is enhanced; otherwise, it is weakened. In this way, the convergence efficiency and convergence performance of asynchronous deep reinforcement learning algorithms are significantly improved. We also validate the algorithm experimentally; the results show that, within the same running time, the proposed algorithm converges faster and to better solutions than existing algorithms.
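The abstract describes the mechanism only at a high level: each worker thread's push to the shared parameters is scaled by a dynamic weight that grows when the pushed information appears helpful and shrinks otherwise. The following minimal Python sketch illustrates that idea under stated assumptions; the weighting rule (a clipped return-improvement signal) and all names here are illustrative, not the authors' exact DWA3C formulation.

# Minimal sketch of dynamically weighted asynchronous updates, assuming
# a simplified A3C-style worker with NumPy parameter vectors. The
# weighting rule below is a hypothetical stand-in for the paper's formula.
import numpy as np

class GlobalParams:
    def __init__(self, dim):
        self.theta = np.zeros(dim)   # shared policy/value parameters
        self.lr = 1e-3               # base learning rate

    def apply(self, grad, weight):
        # Each worker's push is scaled by its dynamic weight instead of
        # using one uniform learning rate for every thread.
        self.theta -= self.lr * weight * grad

def dynamic_weight(episode_return, baseline, lo=0.5, hi=2.0):
    # Enhance updates that improve on the worker's running baseline and
    # weaken those that do not (illustrative clipped linear rule).
    advantage = episode_return - baseline
    return float(np.clip(1.0 + advantage, lo, hi))

def worker_step(global_params, grad, episode_return, baseline):
    # Push a gradient to the global parameters with a dynamic weight,
    # then refresh the worker's baseline as an exponential moving average.
    w = dynamic_weight(episode_return, baseline)
    global_params.apply(grad, w)
    return 0.9 * baseline + 0.1 * episode_return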
Pages: 581-591
Page count: 11