Policy Return: A New Method for Reducing the Number of Experimental Trials in Deep Reinforcement Learning

Cited by: 0
Authors
Liu, Feng [1]
Dai, Shuling [1, 2]
Zhao, Yongjia [1]
Affiliations
[1] Beihang Univ BUAA, State Key Lab VR Technol & Syst, Beijing 100191, Peoples R China
[2] Beihang Univ BUAA, Jiangxi Res Inst, Beijing 100191, Peoples R China
Source
IEEE ACCESS | 2020 / Vol. 8
Keywords
Reinforcement learning; Training; Task analysis; Neural networks; Licenses; Machine learning algorithms; Correlation; Deep reinforcement learning; policy return method; fewer trials; stochastic data; GAME; GO;
DOI
10.1109/ACCESS.2020.3045835
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Even with the same algorithm and hyperparameter configuration, deep reinforcement learning (DRL) can produce drastically different results across experimental trials, and most of these results are unsatisfactory. Because of this instability, researchers have to run many trials to validate an algorithm or a set of hyperparameters in DRL. In this article, we present the policy return method, a new design for reducing the number of trials needed when training a DRL model. The method allows the learned policy to return to a previous state when it becomes divergent or stagnant at any stage of training. When returning, a certain percentage of stochastic data is added to the weights of the neural networks to prevent a repeated decline. Extensive experiments on challenging tasks and various target scores demonstrate that the policy return method reduces the required number of trials by 10% to 40% compared with the corresponding original algorithm, and by 10% to 30% compared with state-of-the-art algorithms.
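The return-and-perturb idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `PolicyReturn`, the parameters `patience`, `drop_tol`, `fraction`, and `scale`, and the specific divergence and stagnation tests are all assumptions made for illustration, and "a certain percentage of stochastic data" is interpreted here as Gaussian noise added to a random fraction of the weights.

```python
import random


def perturb(weights, fraction, scale, rng):
    """Copy `weights`, adding Gaussian noise to roughly `fraction` of the entries."""
    out = list(weights)
    for i in range(len(out)):
        if rng.random() < fraction:
            out[i] += rng.gauss(0.0, scale)
    return out


class PolicyReturn:
    """Hypothetical helper: keeps the best weights seen so far and returns
    to a perturbed copy of them when training diverges or stagnates."""

    def __init__(self, patience=3, drop_tol=1.0, fraction=0.1, scale=0.01, seed=0):
        self.patience = patience    # evaluations without improvement => stagnant (assumed rule)
        self.drop_tol = drop_tol    # score drop below the best => divergent (assumed rule)
        self.fraction = fraction    # share of weights that receive noise (assumed meaning)
        self.scale = scale          # standard deviation of the added noise
        self.rng = random.Random(seed)
        self.best_score = None
        self.best_weights = None
        self.stale = 0

    def step(self, weights, score):
        """Return (weights to continue training from, whether a return happened)."""
        if self.best_score is None or score > self.best_score:
            # New best policy: remember it as the state to return to.
            self.best_score = score
            self.best_weights = list(weights)
            self.stale = 0
            return weights, False
        self.stale += 1
        diverged = score < self.best_score - self.drop_tol
        stagnant = self.stale >= self.patience
        if diverged or stagnant:
            self.stale = 0
            restored = perturb(self.best_weights, self.fraction, self.scale, self.rng)
            return restored, True
        return weights, False
```

In a training loop, `step` would be called after each evaluation with the current flattened weights and the evaluation score, and training would continue from whatever weights it returns. Adding noise on return is what keeps the restored policy from retracing exactly the same declining trajectory.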
Pages: 228099 - 228107
Page count: 9
Related papers
50 in total
  • [41] A maximum entropy deep reinforcement learning method for sequential well placement optimization using multi-discrete action spaces
    Zhang, Kai
    Sun, Zifeng
    Zhang, Liming
    Xin, Guojing
    Wang, Zhongzheng
    Zhang, Wenjuan
    Liu, Piyang
    Yan, Xia
    Zhang, Huaqing
    Yang, Yongfei
    Sun, Hai
    GEOENERGY SCIENCE AND ENGINEERING, 2024, 240
  • [42] Deep Reinforcement Learning-Based Tie-Line Power Adjustment Method for Power System Operation State Calculation
    Xu, Huating
    Yu, Zhihong
    Zheng, Qingping
    Hou, Jinxiu
    Wei, Yawei
    Zhang, Zhijian
    IEEE ACCESS, 2019, 7 : 156160 - 156174
  • [43] Explainability of Deep Reinforcement Learning Method with Drones
    Cetin, Ender
    Barrado, Cristina
    Pastor, Enric
    2023 IEEE/AIAA 42ND DIGITAL AVIONICS SYSTEMS CONFERENCE, DASC, 2023,
  • [44] Learning to Maximize Return in a Stag Hunt Collaborative Scenario through Deep Reinforcement Learning
    Nica, Andrei
    Berariu, Tudor
    Gogianu, Florin
    Florea, Adina Magda
    2017 19TH INTERNATIONAL SYMPOSIUM ON SYMBOLIC AND NUMERIC ALGORITHMS FOR SCIENTIFIC COMPUTING (SYNASC 2017), 2017, : 188 - 195
  • [45] Model Predictive Control Based on Deep Reinforcement Learning Method with Discrete-Valued Input
    Tange, Yoshio
    Kiryu, Satoshi
    Matsui, Tetsuro
    2019 3RD IEEE CONFERENCE ON CONTROL TECHNOLOGY AND APPLICATIONS (IEEE CCTA 2019), 2019, : 308 - 313
  • [46] A deep reinforcement learning method to control chaos synchronization between two identical chaotic systems
    Cheng, Haoxin
    Li, Haihong
    Dai, Qionglin
    Yang, Junzhong
    CHAOS SOLITONS & FRACTALS, 2023, 174
  • [47] Deep reinforcement learning based integrated evasion and impact hierarchical intelligent policy of exo-atmospheric vehicles
    Ren, Leliang
    Guo, Weilin
    Xian, Yong
    Liu, Zhenyu
    Zhang, Daqiao
    Li, Shaopeng
    CHINESE JOURNAL OF AERONAUTICS, 2025, 38 (01)
  • [48] A Stock Prediction Method Based on Deep Reinforcement Learning and Sentiment Analysis
    Du, Sha
    Shen, Hailong
    APPLIED SCIENCES-BASEL, 2024, 14 (19):
  • [49] Power System Fault Diagnosis Method Based on Deep Reinforcement Learning
    Wang, Zirui
    Zhang, Ziqi
    Zhang, Xu
    Du, Mingxuan
    Zhang, Huiting
    Liu, Bowen
    ENERGIES, 2022, 15 (20)
  • [50] A hidden anti-jamming method based on deep reinforcement learning
    Wang, Yifan
    Liu, Xin
    Wang, Mei
    Yu, Yu
    KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2021, 15 (09): : 3444 - 3457