Gradient Monitored Reinforcement Learning

Cited by: 4
Authors
Abdul Hameed, Mohammed Sharafath [1 ]
Chadha, Gavneet Singh [1 ]
Schwung, Andreas [1 ]
Ding, Steven X. [2 ]
Affiliations
[1] South Westphalia Univ Appl Sci, Dept Automat Technol, D-59494 Soest, Germany
[2] Univ Duisburg Essen, Dept Automat Control & Complex Syst, D-47057 Duisburg, Germany
Funding
US National Institutes of Health
Keywords
Training; Monitoring; Neural networks; Reinforcement learning; Optimization; Games; Task analysis; Atari games; deep neural networks (DNNs); gradient monitoring (GM); MuJoCo; multirobot coordination; OpenAI GYM; reinforcement learning (RL);
DOI
10.1109/TNNLS.2021.3119853
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
This article presents a novel neural network training approach for faster convergence and better generalization abilities in deep reinforcement learning (RL). In particular, we focus on enhancing the training and evaluation performance of RL algorithms by systematically reducing the variance of the gradients and thereby providing a more targeted learning process. The proposed method, termed gradient monitoring (GM), steers the learning of the weight parameters of a neural network based on the dynamic development of, and feedback from, the training process itself. We propose several variants of the GM method and show that they increase the underlying performance of the model. One of the proposed variants, momentum with GM (M-WGM), allows for continuous adjustment of the quantum of backpropagated gradients in the network based on certain learning parameters. We further enhance the method with adaptive M-WGM (AM-WGM), which automatically trades off focused learning of certain weights against more dispersed learning, depending on the feedback from the rewards collected. As a by-product, it also allows the required deep network size to be derived automatically during training, since the method automatically freezes trained weights. The method is applied to two discrete control tasks (a real-world multirobot coordination problem and Atari games) and one continuous control task (MuJoCo), using advantage actor-critic (A2C) and proximal policy optimization (PPO), respectively. The results particularly underline the applicability of the methods and their performance improvements in terms of generalization capability.
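The core idea described above (backpropagating only selected gradients while freezing the remaining weights) can be sketched as follows. This is an illustrative simplification only, not the paper's exact M-WGM/AM-WGM procedure, which additionally adapts the mask using momentum terms and collected rewards; the function name `monitored_gradient_step` and the `keep_ratio` parameter are hypothetical names introduced here.

```python
import numpy as np

def monitored_gradient_step(weights, grads, lr=0.01, keep_ratio=0.5):
    """One illustrative gradient-monitored update: apply only the
    largest-magnitude fraction of the gradients and freeze the rest."""
    flat = np.abs(grads).ravel()
    k = max(1, int(keep_ratio * flat.size))
    # Threshold at the k-th largest absolute gradient value.
    threshold = np.partition(flat, -k)[-k]
    mask = (np.abs(grads) >= threshold).astype(grads.dtype)
    # Parameters whose gradients fall below the threshold stay frozen
    # for this step, concentrating learning on the remaining weights.
    return weights - lr * mask * grads, mask

# Usage: with keep_ratio=0.5, half the parameters are updated, half frozen.
w = np.zeros(4)
g = np.array([0.1, -2.0, 0.01, 1.5])
w_new, mask = monitored_gradient_step(w, g, lr=0.1, keep_ratio=0.5)
```

In the adaptive variant, a fixed `keep_ratio` would be replaced by a quantity adjusted online from the training feedback (e.g., the rewards collected), so that the fraction of active weights itself becomes a learned property of the run.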
Pages: 4106-4119
Page count: 14