Policy Return: A New Method for Reducing the Number of Experimental Trials in Deep Reinforcement Learning

Cited by: 0
Authors
Liu, Feng [1]
Dai, Shuling [1,2]
Zhao, Yongjia [1]
Affiliations
[1] Beihang University (BUAA), State Key Laboratory of VR Technology & Systems, Beijing 100191, People's Republic of China
[2] Beihang University (BUAA), Jiangxi Research Institute, Beijing 100191, People's Republic of China
Source
IEEE ACCESS, 2020, Vol. 8
Keywords
Reinforcement learning; Training; Task analysis; Neural networks; Licenses; Machine learning algorithms; Correlation; Deep reinforcement learning; policy return method; fewer trials; stochastic data; GAME; GO;
DOI
10.1109/ACCESS.2020.3045835
Chinese Library Classification (CLC): TP [Automation Technology, Computer Technology]
Subject Classification Code: 0812
Abstract
With the same algorithm and the same hyperparameter configuration, deep reinforcement learning (DRL) can produce drastically different results across multiple experimental trials, and most of these results are unsatisfactory. Because of this instability, researchers must run many trials to validate an algorithm or a set of hyperparameters in DRL. In this article, we present the policy return method, a new design for reducing the number of trials needed to train a DRL model. The method allows the learned policy to return to a previous state when it becomes divergent or stagnant at any stage of training. On returning, a certain percentage of stochastic data is added to the weights of the neural networks to prevent a repeated decline. Extensive experiments on challenging tasks with various target scores demonstrate that the policy return method reduces the required number of trials by about 10% to 40% compared with the corresponding original algorithm, and by 10% to 30% compared with state-of-the-art algorithms.
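The abstract describes the mechanism only at a high level. Below is a minimal sketch of what such a return-and-perturb loop could look like, assuming policy weights are kept as a dict of NumPy arrays; the class name `PolicyReturn`, the `noise_scale` and `stagnation_window` parameters, and the simple no-improvement stagnation test are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of the policy-return idea, assuming policy weights are
# kept as a dict of NumPy arrays. Class/parameter names and the simple
# stagnation test are illustrative assumptions, not the paper's code.
import copy
import numpy as np

class PolicyReturn:
    """Snapshot the best policy weights seen so far; when training
    stagnates or diverges, return to that snapshot with a small
    percentage of stochastic noise mixed in to avoid a repeated decline."""

    def __init__(self, noise_scale=0.05, stagnation_window=20):
        self.noise_scale = noise_scale              # assumed fraction of noise added on return
        self.stagnation_window = stagnation_window  # evaluations without improvement before returning
        self.best_score = -np.inf
        self.best_weights = None
        self.since_improvement = 0

    def update(self, weights, score, rng):
        """Call after each evaluation; returns the weights to train from next."""
        if score > self.best_score:                 # new best: take a snapshot
            self.best_score = score
            self.best_weights = copy.deepcopy(weights)
            self.since_improvement = 0
            return weights
        self.since_improvement += 1
        if self.since_improvement < self.stagnation_window:
            return weights                          # not stagnant yet, keep training
        # Stagnant or divergent: return to the snapshot, perturbing each
        # weight array by a percentage of its own magnitude.
        self.since_improvement = 0
        return {
            name: w + rng.standard_normal(w.shape) * self.noise_scale * np.abs(w)
            for name, w in self.best_weights.items()
        }
```

In use, `update` would be called once per evaluation interval inside the training loop, with `rng = np.random.default_rng(seed)`. Scaling the noise by each weight's magnitude is one plausible reading of "a certain percentage of stochastic data"; an absolute noise scale would be an equally reasonable variant.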
Pages: 228099-228107
Page count: 9
Related Papers (50 in total)
  • [1] Meng, Wenjia; Zheng, Qian; Shi, Yue; Pan, Gang. An Off-Policy Trust Region Policy Optimization Method With Monotonic Improvement Guarantee for Deep Reinforcement Learning. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33(5): 2223-2235
  • [2] Badr, Hirchoua; Ouhbi, Brahim; Frikh, Bouchra. Rules Based Policy for Stock Trading: A New Deep Reinforcement Learning Method. PROCEEDINGS OF 2020 5TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND ARTIFICIAL INTELLIGENCE: TECHNOLOGIES AND APPLICATIONS (CLOUDTECH'20), 2020: 61-66
  • [3] Livne, Dor; Cohen, Kobi. PoPS: Policy Pruning and Shrinking for Deep Reinforcement Learning. IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2020, 14(4): 789-801
  • [4] Rehman, Hafiz Muhammad Raza Ur; On, Byung-Won; Ningombam, Devarani Devi; Yi, Sungwon; Choi, Gyu Sang. QSOD: Hybrid Policy Gradient for Deep Multi-agent Reinforcement Learning. IEEE ACCESS, 2021, 9: 129728-129741
  • [5] Guo, Jiale; Liu, Qiang; Chen, Enqing. A Deep Reinforcement Learning Method For Multimodal Data Fusion in Action Recognition. IEEE SIGNAL PROCESSING LETTERS, 2022, 29: 120-124
  • [6] Wang, Xuesong; Gu, Yang; Cheng, Yuhu; Liu, Aiping; Chen, C. L. Philip. Approximate Policy-Based Accelerated Deep Reinforcement Learning. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2020, 31(6): 1820-1830
  • [7] Wang, Jiaguo; Li, Wenheng; Lei, Chao; Yang, Meng; Pei, Yang. An efficient and robust gradient reinforcement learning: Deep comparative policy. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2024, 46(2): 3773-3788
  • [8] Zuo, Xuan; Xue, Hui-Feng; Wang, Xiao-Yin; Du, Wan-Ru; Tian, Tao; Gao, Shan; Zhang, Pu. A Deep Reinforcement Learning Method based on Deterministic Policy Gradient for Multi-Agent Cooperative Competition. CONTROL ENGINEERING AND APPLIED INFORMATICS, 2021, 23(3): 88-98
  • [9] Ling, Zhengxuan; Zhang, Yu; Chen, Xi. A Deep Reinforcement Learning Based Real-Time Solution Policy for the Traveling Salesman Problem. IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2023, 24(6): 5871-5882
  • [10] Mo, Kanghua; Tang, Weixuan; Li, Jin; Yuan, Xu. Attacking Deep Reinforcement Learning With Decoupled Adversarial Policy. IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, 2023, 20(1): 758-768