Policy Return: A New Method for Reducing the Number of Experimental Trials in Deep Reinforcement Learning

Cited by: 0
Authors
Liu, Feng [1 ]
Dai, Shuling [1 ,2 ]
Zhao, Yongjia [1 ]
Affiliations
[1] Beihang Univ BUAA, State Key Lab VR Technol & Syst, Beijing 100191, Peoples R China
[2] Beihang Univ BUAA, Jiangxi Res Inst, Beijing 100191, Peoples R China
Source
IEEE ACCESS | 2020, Vol. 8
Keywords
Reinforcement learning; Training; Task analysis; Neural networks; Licenses; Machine learning algorithms; Correlation; Deep reinforcement learning; policy return method; fewer trials; stochastic data; GAME; GO;
DOI
10.1109/ACCESS.2020.3045835
CLC Classification Number
TP [Automation technology; computer technology];
Subject Classification Number
0812 ;
Abstract
Even with the same algorithm and hyperparameter configuration, deep reinforcement learning (DRL) can yield drastically different results across multiple experimental trials, and most of these results are unsatisfactory. Because of this instability, researchers must run many trials to validate an algorithm or a set of hyperparameters in DRL. In this article, we present the policy return method, a new design for reducing the number of trials needed when training a DRL model. This method allows the learned policy to return to a previous state when it becomes divergent or stagnant at any stage of training. Upon returning, a certain percentage of stochastic data is added to the weights of the neural networks to prevent a repeated decline. Extensive experiments on challenging tasks with various target scores demonstrate that the policy return method reduces the required number of trials by about 10% to 40% compared with the corresponding original algorithm, and by 10% to 30% compared with state-of-the-art algorithms.
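The checkpoint-and-return mechanism described in the abstract can be sketched roughly as follows. This is a minimal illustration under assumptions, not the authors' implementation: the class name `PolicyReturn`, the `patience` threshold for detecting stagnation, and the Gaussian form of the weight perturbation are all hypothetical choices made for the sketch.

```python
import copy
import random

class PolicyReturn:
    """Illustrative sketch (hypothetical API): checkpoint the best policy
    weights seen so far and, when training diverges or stagnates, return to
    that checkpoint with a small stochastic perturbation of the weights."""

    def __init__(self, noise_scale=0.05, patience=10):
        self.noise_scale = noise_scale  # relative scale of stochastic data mixed in
        self.patience = patience        # evaluations without improvement => stagnant
        self.best_score = float("-inf")
        self.best_weights = None
        self.stale_steps = 0

    def update(self, weights, score):
        """Record the latest evaluation; return the weights to train with next."""
        if score > self.best_score:
            # New best policy: store a checkpoint of its weights.
            self.best_score = score
            self.best_weights = copy.deepcopy(weights)
            self.stale_steps = 0
            return weights
        self.stale_steps += 1
        if self.stale_steps >= self.patience:
            # Policy return: restore the checkpoint, perturbed with noise so
            # that the policy does not simply repeat the same decline.
            self.stale_steps = 0
            return [w + random.gauss(0.0, self.noise_scale) * abs(w)
                    for w in self.best_weights]
        return weights
```

In a real DRL training loop the weights would be a network's parameter tensors rather than a flat list, but the control flow (checkpoint on improvement, return-with-noise on stagnation) is the part the paper's title refers to.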
Pages: 228099-228107
Page count: 9