Policy Return: A New Method for Reducing the Number of Experimental Trials in Deep Reinforcement Learning

Cited by: 0
Authors
Liu, Feng [1 ]
Dai, Shuling [1 ,2 ]
Zhao, Yongjia [1 ]
Affiliations
[1] Beihang Univ BUAA, State Key Lab VR Technol & Syst, Beijing 100191, Peoples R China
[2] Beihang Univ BUAA, Jiangxi Res Inst, Beijing 100191, Peoples R China
Source
IEEE ACCESS | 2020 / Vol. 8
Keywords
Reinforcement learning; Training; Task analysis; Neural networks; Licenses; Machine learning algorithms; Correlation; Deep reinforcement learning; policy return method; fewer trials; stochastic data; GAME; GO;
DOI
10.1109/ACCESS.2020.3045835
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Even with the same algorithm and hyperparameter configuration, deep reinforcement learning (DRL) can yield drastically different results across multiple experimental trials, and most of these results are unsatisfactory. Because of this instability, researchers must perform many trials to validate an algorithm or a set of hyperparameters in DRL. In this article, we present the policy return method, a new design for reducing the number of trials needed when training a DRL model. This method allows the learned policy to return to a previous state whenever training becomes divergent or stagnant. On returning, a certain percentage of stochastic noise is added to the weights of the neural networks to prevent a repeated decline. Extensive experiments on challenging tasks and various target scores demonstrate that the policy return method reduces the required number of trials by roughly 10% to 40% compared with the corresponding original algorithm, and by 10% to 30% compared with state-of-the-art algorithms.
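The mechanism the abstract describes — snapshot the policy while it improves, and on divergence or stagnation roll back to the snapshot with a small random perturbation of the weights — can be sketched as follows. This is a minimal illustration under assumed conventions, not the authors' implementation: the function name `policy_return_step`, the `patience` trigger for detecting stagnation, and the `noise_scale` multiplicative perturbation are all hypothetical choices for the sake of example.

```python
import numpy as np

def policy_return_step(weights, checkpoint, score, best_score,
                       patience_counter, patience=10, noise_scale=0.05,
                       rng=None):
    """One monitoring step of a checkpoint-and-perturb scheme.

    weights / checkpoint are dicts of name -> ndarray.
    Returns the updated (weights, checkpoint, best_score, patience_counter).
    """
    rng = rng or np.random.default_rng()
    if score > best_score:
        # Training is progressing: snapshot the current weights.
        best_score = score
        checkpoint = {k: v.copy() for k, v in weights.items()}
        patience_counter = 0
    else:
        patience_counter += 1
        if patience_counter >= patience:
            # Stagnant or divergent: return to the saved policy, and
            # perturb each weight by a small random percentage so the
            # run does not simply repeat the same decline.
            weights = {k: v * (1.0 + noise_scale * rng.standard_normal(v.shape))
                       for k, v in checkpoint.items()}
            patience_counter = 0
    return weights, checkpoint, best_score, patience_counter
```

In a training loop this would be called once per evaluation interval; the perturbation percentage plays the role of the "stochastic data" added to the weights in the paper's description.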
Pages: 228099-228107
Page count: 9
Related Papers
50 records total
  • [31] A Data-Efficient Training Method for Deep Reinforcement Learning
    Feng, Wenhui
    Han, Chongzhao
    Lian, Feng
    Liu, Xia
    ELECTRONICS, 2022, 11 (24)
  • [32] Learning to Drive Like Human Beings: A Method Based on Deep Reinforcement Learning
    Tian, Yantao
    Cao, Xuanhao
    Huang, Kai
    Fei, Cong
    Zheng, Zhu
    Ji, Xuewu
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2022, 23 (07) : 6357 - 6367
  • [33] Deep reinforcement learning with shallow controllers: An experimental application to PID tuning
    Lawrence, Nathan P.
    Forbes, Michael G.
    Loewen, Philip D.
    McClement, Daniel G.
    Backstrom, Johan U.
    Gopaluni, R. Bhushan
    CONTROL ENGINEERING PRACTICE, 2022, 121
  • [34] Off-Policy Deep Reinforcement Learning Based on Steffensen Value Iteration
    Cheng, Yuhu
    Chen, Lin
    Chen, C. L. Philip
    Wang, Xuesong
    IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2021, 13 (04) : 1023 - 1032
  • [35] Reliability assessment of off-policy deep reinforcement learning: A benchmark for aerodynamics
    Berger, Sandrine
    Ramo, Andrea Arroyo
    Guillet, Valentin
    Lahire, Thibault
    Martin, Brice
    Jardin, Thierry
    Rachelson, Emmanuel
    DATA-CENTRIC ENGINEERING, 2024, 5
  • [36] Deep reinforcement learning finds a new strategy for vortex-induced vibration control
    Ren, Feng
    Wang, Chenglei
    Song, Jian
    Tang, Hui
    JOURNAL OF FLUID MECHANICS, 2024, 990
  • [37] Robot grasping method optimization using improved deep deterministic policy gradient algorithm of deep reinforcement learning
    Zhang, Hongxu
    Wang, Fei
    Wang, Jianhui
    Cui, Ben
    REVIEW OF SCIENTIFIC INSTRUMENTS, 2021, 92 (02)
  • [38] Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement
    Barreto, Andre
    Borsa, Diana
    Quan, John
    Schaul, Tom
    Silver, David
    Hessel, Matteo
    Mankowitz, Daniel
    Zidek, Augustin
    Munos, Remi
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [39] Efficient Bayesian Policy Reuse With a Scalable Observation Model in Deep Reinforcement Learning
    Liu, Jinmei
    Wang, Zhi
    Chen, Chunlin
    Dong, Daoyi
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (10) : 14797 - 14809
  • [40] Learning new attack vectors from misuse cases with deep reinforcement learning
    Veith, Eric M. S. P.
    Wellssow, Arlena
    Uslar, Mathias
    FRONTIERS IN ENERGY RESEARCH, 2023, 11