Policy Return: A New Method for Reducing the Number of Experimental Trials in Deep Reinforcement Learning

Cited by: 0

Authors:

Liu, Feng [1]

Dai, Shuling [1,2]

Zhao, Yongjia [1]

Affiliations:

[1] Beihang Univ BUAA, State Key Lab VR Technol & Syst, Beijing 100191, Peoples R China

[2] Beihang Univ BUAA, Jiangxi Res Inst, Beijing 100191, Peoples R China

Source:

IEEE ACCESS | 2020 / Volume 8

Keywords:

Reinforcement learning; Training; Task analysis; Neural networks; Licenses; Machine learning algorithms; Correlation; Deep reinforcement learning; policy return method; fewer trials; stochastic data; GAME; GO;

DOI:

10.1109/ACCESS.2020.3045835

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Using the same algorithm and hyperparameter configurations, deep reinforcement learning (DRL) will derive drastically different results from multiple experimental trials, and most of these results are unsatisfactory. Because of the instability of the results, researchers have to perform many trials to confirm an algorithm or a set of hyperparameters in DRL. In this article, we present the policy return method, which is a new design for reducing the number of trials when training a DRL model. This method allows the learned policy to return to a previous state when it becomes divergent or stagnant at any stage of training. When returning, a certain percentage of stochastic data is added to the weights of the neural networks to prevent a repeated decline. Extensive experiments on challenging tasks and various target scores demonstrate that the policy return method can bring about a 10% to 40% reduction in the required number of trials compared with that of the corresponding original algorithm, and a 10% to 30% reduction compared with the state-of-the-art algorithms.
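The return mechanism described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class name `PolicyReturn`, the helper `perturb`, the parameters `patience`, `drop_tolerance`, and `noise_fraction`, and the specific divergence and stagnation tests are all hypothetical choices made for illustration. In particular, "a certain percentage of stochastic data" is interpreted here, as an assumption, as uniform noise scaled relative to each weight's magnitude.

```python
import random
from typing import List, Optional


def perturb(weights: List[float], noise_fraction: float,
            rng: random.Random) -> List[float]:
    """Return a copy of `weights` with relative uniform noise added.

    Assumption: each weight w is shifted by at most noise_fraction * |w|.
    """
    return [w + noise_fraction * abs(w) * rng.uniform(-1.0, 1.0)
            for w in weights]


class PolicyReturn:
    """Hypothetical sketch: revert to the best saved weights when the
    evaluation score diverges or stagnates, adding noise on return."""

    def __init__(self, patience: int = 3, drop_tolerance: float = 0.2,
                 noise_fraction: float = 0.05, seed: int = 0) -> None:
        self.patience = patience              # evaluations without improvement
        self.drop_tolerance = drop_tolerance  # relative drop counted as divergence
        self.noise_fraction = noise_fraction  # relative size of the added noise
        self.rng = random.Random(seed)
        self.best_score: Optional[float] = None
        self.best_weights: Optional[List[float]] = None
        self.stale_evals = 0

    def step(self, weights: List[float], score: float) -> List[float]:
        """Return the weights that training should continue from."""
        if self.best_score is None or score > self.best_score:
            # New best: save a snapshot and keep training from here.
            self.best_score = score
            self.best_weights = list(weights)
            self.stale_evals = 0
            return weights

        self.stale_evals += 1
        threshold = self.best_score - self.drop_tolerance * abs(self.best_score)
        diverged = score < threshold
        stagnant = self.stale_evals >= self.patience
        if diverged or stagnant:
            # Return to the saved state, perturbed to avoid repeating the decline.
            self.stale_evals = 0
            return perturb(self.best_weights, self.noise_fraction, self.rng)
        return weights
```

In a training loop, `step` would be called after each periodic evaluation, and the returned list would be loaded back into the network before training continues. Real DRL code would operate on framework tensors rather than flat Python lists; the flat list keeps the sketch self-contained.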


Pages: 228099-228107

Page count: 9
