Policy Return: A New Method for Reducing the Number of Experimental Trials in Deep Reinforcement Learning

Cited by: 0
Authors
Liu, Feng [1 ]
Dai, Shuling [1 ,2 ]
Zhao, Yongjia [1 ]
Affiliations
[1] Beihang Univ BUAA, State Key Lab VR Technol & Syst, Beijing 100191, Peoples R China
[2] Beihang Univ BUAA, Jiangxi Res Inst, Beijing 100191, Peoples R China
Source
IEEE ACCESS | 2020, Vol. 8
Keywords
Reinforcement learning; Training; Task analysis; Neural networks; Licenses; Machine learning algorithms; Correlation; Deep reinforcement learning; policy return method; fewer trials; stochastic data; GAME; GO;
DOI
10.1109/ACCESS.2020.3045835
CLC Classification Number
TP [Automation technology; computer technology];
Subject Classification Number
0812 ;
Abstract
Even with the same algorithm and hyperparameter configuration, deep reinforcement learning (DRL) can yield drastically different results across multiple experimental trials, and most of these results are unsatisfactory. Because of this instability, researchers must run many trials to validate an algorithm or a set of hyperparameters in DRL. In this article, we present the policy return method, a new design for reducing the number of trials needed when training a DRL model. This method allows the learned policy to return to a previous state when it becomes divergent or stagnant at any stage of training. Upon returning, a certain percentage of stochastic data is added to the weights of the neural networks to prevent a repeated decline. Extensive experiments on challenging tasks with various target scores demonstrate that the policy return method reduces the required number of trials by about 10% to 40% compared with the corresponding original algorithm, and by 10% to 30% compared with state-of-the-art algorithms.
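The checkpoint-and-return mechanism described in the abstract can be sketched roughly as follows. This is a minimal illustration under assumptions, not the authors' implementation: the class name `PolicyReturn`, the `patience` threshold for detecting stagnation, and the Gaussian form of the weight perturbation are all hypothetical choices made for the sketch.

```python
import copy
import random

class PolicyReturn:
    """Illustrative sketch (hypothetical API): checkpoint the best policy
    weights seen so far and, when training diverges or stagnates, return to
    that checkpoint with a small stochastic perturbation of the weights."""

    def __init__(self, noise_scale=0.05, patience=10):
        self.noise_scale = noise_scale  # relative scale of stochastic data mixed in
        self.patience = patience        # evaluations without improvement => stagnant
        self.best_score = float("-inf")
        self.best_weights = None
        self.stale_steps = 0

    def update(self, weights, score):
        """Record the latest evaluation; return the weights to train with next."""
        if score > self.best_score:
            # New best policy: store a checkpoint of its weights.
            self.best_score = score
            self.best_weights = copy.deepcopy(weights)
            self.stale_steps = 0
            return weights
        self.stale_steps += 1
        if self.stale_steps >= self.patience:
            # Policy return: restore the checkpoint, perturbed with noise so
            # that the policy does not simply repeat the same decline.
            self.stale_steps = 0
            return [w + random.gauss(0.0, self.noise_scale) * abs(w)
                    for w in self.best_weights]
        return weights
```

In a real DRL training loop the weights would be a network's parameter tensors rather than a flat list, but the control flow (checkpoint on improvement, return-with-noise on stagnation) is the part the paper's title refers to.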
Pages: 228099-228107
Page count: 9