Policy Return: A New Method for Reducing the Number of Experimental Trials in Deep Reinforcement Learning

Cited by: 0
Authors
Liu, Feng [1]
Dai, Shuling [1,2]
Zhao, Yongjia [1]
Affiliations
[1] Beihang University (BUAA), State Key Laboratory of VR Technology & Systems, Beijing 100191, People's Republic of China
[2] Beihang University (BUAA), Jiangxi Research Institute, Beijing 100191, People's Republic of China
Source
IEEE ACCESS, 2020, Vol. 8
Keywords
Reinforcement learning; Training; Task analysis; Neural networks; Licenses; Machine learning algorithms; Correlation; Deep reinforcement learning; policy return method; fewer trials; stochastic data; GAME; GO;
DOI
10.1109/ACCESS.2020.3045835
Chinese Library Classification (CLC): TP [Automation Technology, Computer Technology]
Subject Classification Code: 0812
Abstract
With the same algorithm and the same hyperparameter configuration, deep reinforcement learning (DRL) can produce drastically different results across multiple experimental trials, and most of these results are unsatisfactory. Because of this instability, researchers must run many trials to validate an algorithm or a set of hyperparameters in DRL. In this article, we present the policy return method, a new design for reducing the number of trials needed to train a DRL model. The method allows the learned policy to return to a previous state when it becomes divergent or stagnant at any stage of training. On returning, a certain percentage of stochastic data is added to the weights of the neural networks to prevent a repeated decline. Extensive experiments on challenging tasks with various target scores demonstrate that the policy return method reduces the required number of trials by about 10% to 40% compared with the corresponding original algorithm, and by 10% to 30% compared with state-of-the-art algorithms.
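The abstract describes the mechanism only at a high level. Below is a minimal sketch of what such a return-and-perturb loop could look like, assuming policy weights are kept as a dict of NumPy arrays; the class name `PolicyReturn`, the `noise_scale` and `stagnation_window` parameters, and the simple no-improvement stagnation test are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of the policy-return idea, assuming policy weights are
# kept as a dict of NumPy arrays. Class/parameter names and the simple
# stagnation test are illustrative assumptions, not the paper's code.
import copy
import numpy as np

class PolicyReturn:
    """Snapshot the best policy weights seen so far; when training
    stagnates or diverges, return to that snapshot with a small
    percentage of stochastic noise mixed in to avoid a repeated decline."""

    def __init__(self, noise_scale=0.05, stagnation_window=20):
        self.noise_scale = noise_scale              # assumed fraction of noise added on return
        self.stagnation_window = stagnation_window  # evaluations without improvement before returning
        self.best_score = -np.inf
        self.best_weights = None
        self.since_improvement = 0

    def update(self, weights, score, rng):
        """Call after each evaluation; returns the weights to train from next."""
        if score > self.best_score:                 # new best: take a snapshot
            self.best_score = score
            self.best_weights = copy.deepcopy(weights)
            self.since_improvement = 0
            return weights
        self.since_improvement += 1
        if self.since_improvement < self.stagnation_window:
            return weights                          # not stagnant yet, keep training
        # Stagnant or divergent: return to the snapshot, perturbing each
        # weight array by a percentage of its own magnitude.
        self.since_improvement = 0
        return {
            name: w + rng.standard_normal(w.shape) * self.noise_scale * np.abs(w)
            for name, w in self.best_weights.items()
        }
```

In use, `update` would be called once per evaluation interval inside the training loop, with `rng = np.random.default_rng(seed)`. Scaling the noise by each weight's magnitude is one plausible reading of "a certain percentage of stochastic data"; an absolute noise scale would be an equally reasonable variant.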
Pages: 228099-228107
Page count: 9
Related Papers (50 in total)
  • [1] Meng, Wenjia; Zheng, Qian; Shi, Yue; Pan, Gang. An Off-Policy Trust Region Policy Optimization Method With Monotonic Improvement Guarantee for Deep Reinforcement Learning. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33(5): 2223-2235
  • [2] Badr, Hirchoua; Ouhbi, Brahim; Frikh, Bouchra. Rules Based Policy for Stock Trading: A New Deep Reinforcement Learning Method. PROCEEDINGS OF 2020 5TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND ARTIFICIAL INTELLIGENCE: TECHNOLOGIES AND APPLICATIONS (CLOUDTECH'20), 2020: 61-66
  • [3] Livne, Dor; Cohen, Kobi. PoPS: Policy Pruning and Shrinking for Deep Reinforcement Learning. IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2020, 14(4): 789-801
  • [4] Rehman, Hafiz Muhammad Raza Ur; On, Byung-Won; Ningombam, Devarani Devi; Yi, Sungwon; Choi, Gyu Sang. QSOD: Hybrid Policy Gradient for Deep Multi-agent Reinforcement Learning. IEEE ACCESS, 2021, 9: 129728-129741
  • [5] Guo, Jiale; Liu, Qiang; Chen, Enqing. A Deep Reinforcement Learning Method For Multimodal Data Fusion in Action Recognition. IEEE SIGNAL PROCESSING LETTERS, 2022, 29: 120-124
  • [6] Wang, Xuesong; Gu, Yang; Cheng, Yuhu; Liu, Aiping; Chen, C. L. Philip. Approximate Policy-Based Accelerated Deep Reinforcement Learning. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2020, 31(6): 1820-1830
  • [7] Wang, Jiaguo; Li, Wenheng; Lei, Chao; Yang, Meng; Pei, Yang. An efficient and robust gradient reinforcement learning: Deep comparative policy. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2024, 46(2): 3773-3788
  • [8] Zuo, Xuan; Xue, Hui-Feng; Wang, Xiao-Yin; Du, Wan-Ru; Tian, Tao; Gao, Shan; Zhang, Pu. A Deep Reinforcement Learning Method based on Deterministic Policy Gradient for Multi-Agent Cooperative Competition. CONTROL ENGINEERING AND APPLIED INFORMATICS, 2021, 23(3): 88-98
  • [9] Ling, Zhengxuan; Zhang, Yu; Chen, Xi. A Deep Reinforcement Learning Based Real-Time Solution Policy for the Traveling Salesman Problem. IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2023, 24(6): 5871-5882
  • [10] Mo, Kanghua; Tang, Weixuan; Li, Jin; Yuan, Xu. Attacking Deep Reinforcement Learning With Decoupled Adversarial Policy. IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, 2023, 20(1): 758-768