Z-Score Experience Replay in Off-Policy Deep Reinforcement Learning

Cited by: 0
Authors
Yang, Yana [1 ]
Xi, Meng [1 ]
Dai, Huiao [1 ]
Wen, Jiabao [1 ]
Yang, Jiachen [1 ]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China;
Keywords
deep reinforcement learning; off policy; priority experience replay; z-score;
DOI
10.3390/s24237746
Chinese Library Classification
O65 [Analytical Chemistry];
Discipline codes
070302 ; 081704 ;
Abstract
Reinforcement learning, a machine learning method that requires no pre-collected training data, seeks an optimal policy through continuous interaction between an agent and its environment, and is an important approach to sequential decision-making problems. Combined with deep learning, deep reinforcement learning gains powerful perception and decision-making capabilities and has been widely applied across domains to tackle complex decision problems. Off-policy reinforcement learning separates exploration from exploitation by storing and replaying interaction experiences, making it easier to find globally optimal solutions; how those stored experiences are utilized is therefore crucial to the sample efficiency of off-policy algorithms. To address this problem, this paper proposes Z-Score Prioritized Experience Replay, which improves experience utilization and thereby the performance and convergence speed of the algorithm. A series of ablation experiments demonstrates that the proposed method significantly improves the effectiveness of deep reinforcement learning algorithms.
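The abstract does not spell out the prioritization rule, but the keywords suggest combining standard prioritized experience replay with z-score normalization of TD errors. A minimal sketch of that idea follows; the class name, the shift-to-positive step, and the alpha/importance-weight handling are assumptions carried over from conventional PER, not the paper's exact method.

```python
import numpy as np

class ZScorePrioritizedReplayBuffer:
    """Sketch of a replay buffer whose sampling probabilities come from
    z-score-normalized |TD errors| (assumed design, not the paper's exact rule)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha        # how strongly priority skews sampling
        self.eps = eps            # keeps priorities strictly positive
        self.buffer = []          # stored transitions
        self.td_errors = []       # |TD error| per transition

    def add(self, transition, td_error):
        if len(self.buffer) >= self.capacity:   # drop oldest when full
            self.buffer.pop(0)
            self.td_errors.pop(0)
        self.buffer.append(transition)
        self.td_errors.append(abs(td_error))

    def _priorities(self):
        errs = np.asarray(self.td_errors, dtype=np.float64)
        # z-score normalization of TD errors across the whole buffer
        z = (errs - errs.mean()) / (errs.std() + self.eps)
        # shift so all priorities are positive, then apply the PER exponent
        p = (z - z.min() + self.eps) ** self.alpha
        return p / p.sum()

    def sample(self, batch_size):
        probs = self._priorities()
        idx = np.random.choice(len(self.buffer), size=batch_size, p=probs)
        # importance-sampling weights correct the non-uniform sampling bias
        weights = 1.0 / (len(self.buffer) * probs[idx])
        weights /= weights.max()
        return [self.buffer[i] for i in idx], idx, weights

    def update(self, idx, td_errors):
        # refresh priorities after the learner recomputes TD errors
        for i, e in zip(idx, td_errors):
            self.td_errors[i] = abs(e)
```

Normalizing by the buffer-wide mean and standard deviation makes priorities scale-invariant: a transition is prioritized by how unusual its TD error is relative to the current buffer, rather than by its raw magnitude.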
Pages: 17
Related papers
50 records
  • [41] Multi-agent Gradient-Based Off-Policy Actor-Critic Algorithm for Distributed Reinforcement Learning
    Ren, Jineng
    INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2024, 17 (01)
  • [42] Identification and off-policy learning of multiple objectives using adaptive clustering
    Karimpanal, Thommen George
    Wilhelm, Erik
    NEUROCOMPUTING, 2017, 263 : 39 - 47
  • [43] Synchronous optimal control method for nonlinear systems with saturating actuators and unknown dynamics using off-policy integral reinforcement learning
    Zhang, Zenglian
    Song, Ruizhuo
    Cao, Min
    NEUROCOMPUTING, 2019, 356 : 162 - 169
  • [44] Data-Driven Robust Control of Discrete-Time Uncertain Linear Systems via Off-Policy Reinforcement Learning
    Yang, Yongliang
    Guo, Zhishan
    Xiong, Haoyi
    Ding, Da-Wei
    Yin, Yixin
    Wunsch, Donald C.
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2019, 30 (12) : 3735 - 3747
  • [45] Off-Policy Integral Reinforcement Learning Method to Solve Nonlinear Continuous-Time Multiplayer Nonzero-Sum Games
    Song, Ruizhuo
    Lewis, Frank L.
    Wei, Qinglai
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2017, 28 (03) : 704 - 713
  • [46] Adaptive optimal consensus of nonlinear multi-agent systems with unknown dynamics using off-policy integral reinforcement learning
    Yan, Lei
    Liu, Zhi
    Chen, C. L. Philip
    Zhang, Yun
    Wu, Zongze
    NEUROCOMPUTING, 2025, 621
  • [47] Robust hierarchical games of linear discrete-time systems based on off-policy model-free reinforcement learning
    Ma, Xiao
    Yuan, Yuan
    JOURNAL OF THE FRANKLIN INSTITUTE-ENGINEERING AND APPLIED MATHEMATICS, 2024, 361 (07):
  • [48] Unified Intrinsically Motivated Exploration for Off-Policy Learning in Continuous Action Spaces
    Saglam, Baturay
    Mutlu, Furkan B.
    Dalmaz, Onat
    Kozat, Suleyman S.
    2022 30TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, SIU, 2022,
  • [49] Batch process control based on reinforcement learning with segmented prioritized experience replay
    Xu, Chen
    Ma, Junwei
    Tao, Hongfeng
    MEASUREMENT SCIENCE AND TECHNOLOGY, 2024, 35 (05)
  • [50] A Path Planning Method Based on Deep Reinforcement Learning with Improved Prioritized Experience Replay for Human-Robot Collaboration
    Sun, Deyu
    Wen, Jingqian
    Wang, Jingfei
    Yang, Xiaonan
    Hu, Yaoguang
    HUMAN-COMPUTER INTERACTION, PT II, HCI 2024, 2024, 14685 : 196 - 206