Policy Return: A New Method for Reducing the Number of Experimental Trials in Deep Reinforcement Learning

Cited by: 0
Authors
Liu, Feng [1 ]
Dai, Shuling [1 ,2 ]
Zhao, Yongjia [1 ]
Affiliations
[1] Beihang Univ BUAA, State Key Lab VR Technol & Syst, Beijing 100191, Peoples R China
[2] Beihang Univ BUAA, Jiangxi Res Inst, Beijing 100191, Peoples R China
Source
IEEE ACCESS | 2020 / Vol. 8
Keywords
Reinforcement learning; Training; Task analysis; Neural networks; Licenses; Machine learning algorithms; Correlation; Deep reinforcement learning; policy return method; fewer trials; stochastic data; GAME; GO;
DOI
10.1109/ACCESS.2020.3045835
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Even with the same algorithm and hyperparameter configuration, deep reinforcement learning (DRL) can yield drastically different results across multiple experimental trials, and most of these results are unsatisfactory. Because of this instability, researchers must perform many trials to validate an algorithm or a set of hyperparameters in DRL. In this article, we present the policy return method, a new design for reducing the number of trials needed when training a DRL model. This method allows the learned policy to return to a previous state whenever training becomes divergent or stagnant. On returning, a certain percentage of stochastic noise is added to the weights of the neural networks to prevent a repeated decline. Extensive experiments on challenging tasks and various target scores demonstrate that the policy return method reduces the required number of trials by roughly 10% to 40% compared with the corresponding original algorithm, and by 10% to 30% compared with state-of-the-art algorithms.
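The mechanism the abstract describes — snapshot the policy while it improves, and on divergence or stagnation roll back to the snapshot with a small random perturbation of the weights — can be sketched as follows. This is a minimal illustration under assumed conventions, not the authors' implementation: the function name `policy_return_step`, the `patience` trigger for detecting stagnation, and the `noise_scale` multiplicative perturbation are all hypothetical choices for the sake of example.

```python
import numpy as np

def policy_return_step(weights, checkpoint, score, best_score,
                       patience_counter, patience=10, noise_scale=0.05,
                       rng=None):
    """One monitoring step of a checkpoint-and-perturb scheme.

    weights / checkpoint are dicts of name -> ndarray.
    Returns the updated (weights, checkpoint, best_score, patience_counter).
    """
    rng = rng or np.random.default_rng()
    if score > best_score:
        # Training is progressing: snapshot the current weights.
        best_score = score
        checkpoint = {k: v.copy() for k, v in weights.items()}
        patience_counter = 0
    else:
        patience_counter += 1
        if patience_counter >= patience:
            # Stagnant or divergent: return to the saved policy, and
            # perturb each weight by a small random percentage so the
            # run does not simply repeat the same decline.
            weights = {k: v * (1.0 + noise_scale * rng.standard_normal(v.shape))
                       for k, v in checkpoint.items()}
            patience_counter = 0
    return weights, checkpoint, best_score, patience_counter
```

In a training loop this would be called once per evaluation interval; the perturbation percentage plays the role of the "stochastic data" added to the weights in the paper's description.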
Pages: 228099-228107
Page count: 9
Related Papers
50 records total
  • [31] A Data-Efficient Training Method for Deep Reinforcement Learning
    Feng, Wenhui
    Han, Chongzhao
    Lian, Feng
    Liu, Xia
    ELECTRONICS, 2022, 11 (24)
  • [32] Learning to Drive Like Human Beings: A Method Based on Deep Reinforcement Learning
    Tian, Yantao
    Cao, Xuanhao
    Huang, Kai
    Fei, Cong
    Zheng, Zhu
    Ji, Xuewu
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2022, 23 (07) : 6357 - 6367
  • [33] Deep reinforcement learning with shallow controllers: An experimental application to PID tuning
    Lawrence, Nathan P.
    Forbes, Michael G.
    Loewen, Philip D.
    McClement, Daniel G.
    Backstrom, Johan U.
    Gopaluni, R. Bhushan
    CONTROL ENGINEERING PRACTICE, 2022, 121
  • [34] Off-Policy Deep Reinforcement Learning Based on Steffensen Value Iteration
    Cheng, Yuhu
    Chen, Lin
    Chen, C. L. Philip
    Wang, Xuesong
    IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2021, 13 (04) : 1023 - 1032
  • [35] Reliability assessment of off-policy deep reinforcement learning: A benchmark for aerodynamics
    Berger, Sandrine
    Ramo, Andrea Arroyo
    Guillet, Valentin
    Lahire, Thibault
    Martin, Brice
    Jardin, Thierry
    Rachelson, Emmanuel
    DATA-CENTRIC ENGINEERING, 2024, 5
  • [36] Deep reinforcement learning finds a new strategy for vortex-induced vibration control
    Ren, Feng
    Wang, Chenglei
    Song, Jian
    Tang, Hui
    JOURNAL OF FLUID MECHANICS, 2024, 990
  • [37] Robot grasping method optimization using improved deep deterministic policy gradient algorithm of deep reinforcement learning
    Zhang, Hongxu
    Wang, Fei
    Wang, Jianhui
    Cui, Ben
    REVIEW OF SCIENTIFIC INSTRUMENTS, 2021, 92 (02)
  • [38] Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement
    Barreto, Andre
    Borsa, Diana
    Quan, John
    Schaul, Tom
    Silver, David
    Hessel, Matteo
    Mankowitz, Daniel
    Zidek, Augustin
    Munos, Remi
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [39] Efficient Bayesian Policy Reuse With a Scalable Observation Model in Deep Reinforcement Learning
    Liu, Jinmei
    Wang, Zhi
    Chen, Chunlin
    Dong, Daoyi
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (10) : 14797 - 14809
  • [40] Learning new attack vectors from misuse cases with deep reinforcement learning
    Veith, Eric M. S. P.
    Wellssow, Arlena
    Uslar, Mathias
    FRONTIERS IN ENERGY RESEARCH, 2023, 11