Z-Score Experience Replay in Off-Policy Deep Reinforcement Learning

Authors
Yang, Yana [1]
Xi, Meng [1]
Dai, Huiao [1]
Wen, Jiabao [1]
Yang, Jiachen [1]
Affiliations
[1] Tianjin University, School of Electrical and Information Engineering, Tianjin 300072, People's Republic of China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China
Keywords
deep reinforcement learning; off policy; priority experience replay; z-score
DOI
10.3390/s24237746
Chinese Library Classification (CLC) Number
O65 [Analytical Chemistry]
Discipline Classification Codes
070302; 081704
Abstract
Reinforcement learning, as a machine learning method that does not require pre-training data, seeks the optimal policy through the continuous interaction between an agent and its environment. It is an important approach to solving sequential decision-making problems. By combining it with deep learning, deep reinforcement learning possesses powerful perception and decision-making capabilities and has been widely applied to various domains to tackle complex decision problems. Off-policy reinforcement learning separates exploration and exploitation by storing and replaying interaction experiences, making it easier to find global optimal solutions. Understanding how to utilize experiences is crucial for improving the efficiency of off-policy reinforcement learning algorithms. To address this problem, this paper proposes Z-Score Prioritized Experience Replay, which enhances the utilization of experiences and improves the performance and convergence speed of the algorithm. A series of ablation experiments demonstrate that the proposed method significantly improves the effectiveness of deep reinforcement learning algorithms.
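The abstract does not say how Z-Score Prioritized Experience Replay computes its priorities. Purely as an illustration of what z-score-based prioritization could look like, the sketch below standardizes the absolute temporal-difference (TD) errors of stored transitions and samples in proportion to the exponentiated z-scores. The use of TD errors, the exponential mapping, and the names `z_score_priorities` and `sample_indices` are all assumptions made for this example; this is not the authors' algorithm.

```python
import math
import random
import statistics


def z_score_priorities(td_errors, eps=1e-8):
    """Turn TD errors into positive sampling weights via z-scores.

    Assumption for illustration only: priority is derived from the
    standardized magnitude of the TD error.
    """
    magnitudes = [abs(e) for e in td_errors]
    mean = statistics.fmean(magnitudes)
    std = statistics.pstdev(magnitudes)  # population standard deviation
    # eps avoids division by zero when all errors are identical.
    z = [(m - mean) / (std + eps) for m in magnitudes]
    # Exponentiate so every weight is positive; transitions with
    # above-average error receive a weight greater than 1.
    return [math.exp(v) for v in z]


def sample_indices(td_errors, batch_size, rng=random):
    """Sample buffer indices (with replacement) in proportion to the weights."""
    weights = z_score_priorities(td_errors)
    return rng.choices(range(len(td_errors)), weights=weights, k=batch_size)


if __name__ == "__main__":
    errors = [0.1, -0.5, 2.0, 0.0]  # hypothetical TD errors
    print(z_score_priorities(errors))
    print(sample_indices(errors, batch_size=8, rng=random.Random(0)))
```

Standardizing before weighting makes the priorities independent of the absolute scale of the errors, which is one plausible motivation for a z-score; the paper itself should be consulted for the actual formulation.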
Pages: 17