A Dynamic Adjusting Reward Function Method for Deep Reinforcement Learning with Adjustable Parameters

Cited by: 21
Authors
Hu, Zijian [1]
Wan, Kaifang [1]
Gao, Xiaoguang [1]
Zhai, Yiwei [1]
Affiliations
[1] Northwestern Polytech Univ, Sch Elect & Informat, Xian 710129, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
GAME; GO;
DOI
10.1155/2019/7619483
Chinese Library Classification (CLC) number
T [Industrial Technology]
Discipline classification code
08
Abstract
In deep reinforcement learning, network convergence is often slow, and training easily converges to locally optimal solutions. For environments with reward saltation, we propose a magnify saltatory reward (MSR) algorithm with variable parameters from the perspective of sample usage. MSR dynamically adjusts the rewards of experiences with reward saltation in the experience pool, thereby increasing an agent's utilization of these experiences. We conducted experiments in a simulated obstacle avoidance search environment for an unmanned aerial vehicle and compared the results of deep Q-network (DQN), double DQN, and dueling DQN after adding MSR. The experimental results demonstrate that, after adding MSR, the algorithms exhibit faster network convergence and can obtain the global optimal solution easily.
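The abstract does not give the MSR update rule, so the following is only a minimal sketch of the general idea (magnifying the stored reward of experiences where the reward jumps abruptly), not the authors' implementation. The class name `ReplayPool`, the jump test against the previous step's reward, and the parameters `threshold` and `gain` are all assumptions introduced for illustration.

```python
import random
from collections import deque


class ReplayPool:
    """Hypothetical experience pool that magnifies saltatory rewards.

    Assumption: a "reward saltation" is detected when the reward differs
    from the previous step's reward by more than `threshold`; the stored
    reward is then multiplied by `gain`. Neither rule is from the paper.
    """

    def __init__(self, capacity, threshold, gain):
        self.buffer = deque(maxlen=capacity)
        self.threshold = threshold
        self.gain = gain  # adjustable; may be changed during training
        self._prev_reward = None

    def add(self, state, action, reward, next_state, done):
        prev = self._prev_reward
        is_saltation = prev is not None and abs(reward - prev) > self.threshold
        stored = reward * self.gain if is_saltation else reward
        self.buffer.append((state, action, stored, next_state, done))
        # Reset the reference reward at episode boundaries.
        self._prev_reward = None if done else reward

    def sample(self, batch_size):
        # Uniform sampling without replacement, as in a plain DQN replay buffer.
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))


pool = ReplayPool(capacity=100, threshold=5.0, gain=2.0)
pool.add(0, 0, -1.0, 1, False)
pool.add(1, 0, -1.0, 2, False)
pool.add(2, 1, 10.0, 3, True)   # abrupt jump: stored reward is magnified
stored_rewards = [t[2] for t in pool.buffer]
```

Because `gain` is a plain attribute, it can be varied over the course of training, which is one possible reading of "variable parameters" in the abstract.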
Pages: 10