Z-Score Experience Replay in Off-Policy Deep Reinforcement Learning

Cited by: 0

Authors:

Yang, Yana [1]

Xi, Meng [1]

Dai, Huiao [1]

Wen, Jiabao [1]

Yang, Jiachen [1]

Affiliations:

[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China

Source:

SENSORS | 2024 / Volume 24 / Issue 23

Funding:

China Postdoctoral Science Foundation; National Natural Science Foundation of China;

Keywords:

deep reinforcement learning; off policy; priority experience replay; z-score;

DOI:

10.3390/s24237746

Chinese Library Classification (CLC) number:

O65 [Analytical Chemistry];

Discipline classification codes:

070302; 081704;

Abstract:

Reinforcement learning, as a machine learning method that does not require pre-training data, seeks the optimal policy through the continuous interaction between an agent and its environment. It is an important approach to solving sequential decision-making problems. By combining it with deep learning, deep reinforcement learning possesses powerful perception and decision-making capabilities and has been widely applied to various domains to tackle complex decision problems. Off-policy reinforcement learning separates exploration and exploitation by storing and replaying interaction experiences, making it easier to find global optimal solutions. Understanding how to utilize experiences is crucial for improving the efficiency of off-policy reinforcement learning algorithms. To address this problem, this paper proposes Z-Score Prioritized Experience Replay, which enhances the utilization of experiences and improves the performance and convergence speed of the algorithm. A series of ablation experiments demonstrate that the proposed method significantly improves the effectiveness of deep reinforcement learning algorithms.


Pages: 17

