Z-Score Experience Replay in Off-Policy Deep Reinforcement Learning

Cited by: 0
Authors
Yang, Yana [1 ]
Xi, Meng [1 ]
Dai, Huiao [1 ]
Wen, Jiabao [1 ]
Yang, Jiachen [1 ]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China;
Keywords
deep reinforcement learning; off policy; priority experience replay; z-score;
DOI
10.3390/s24237746
Chinese Library Classification
O65 [Analytical Chemistry];
Discipline codes
070302 ; 081704 ;
Abstract
Reinforcement learning, a machine learning method that requires no pre-collected training data, seeks an optimal policy through continuous interaction between an agent and its environment, and is an important approach to sequential decision-making problems. Combined with deep learning, deep reinforcement learning gains powerful perception and decision-making capabilities and has been widely applied across domains to tackle complex decision problems. Off-policy reinforcement learning separates exploration from exploitation by storing and replaying interaction experiences, making it easier to find globally optimal solutions; how those stored experiences are utilized is therefore crucial to the sample efficiency of off-policy algorithms. To address this problem, this paper proposes Z-Score Prioritized Experience Replay, which improves experience utilization and thereby the performance and convergence speed of the algorithm. A series of ablation experiments demonstrates that the proposed method significantly improves the effectiveness of deep reinforcement learning algorithms.
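The abstract does not spell out the prioritization rule, but the keywords suggest combining standard prioritized experience replay with z-score normalization of TD errors. A minimal sketch of that idea follows; the class name, the shift-to-positive step, and the alpha/importance-weight handling are assumptions carried over from conventional PER, not the paper's exact method.

```python
import numpy as np

class ZScorePrioritizedReplayBuffer:
    """Sketch of a replay buffer whose sampling probabilities come from
    z-score-normalized |TD errors| (assumed design, not the paper's exact rule)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha        # how strongly priority skews sampling
        self.eps = eps            # keeps priorities strictly positive
        self.buffer = []          # stored transitions
        self.td_errors = []       # |TD error| per transition

    def add(self, transition, td_error):
        if len(self.buffer) >= self.capacity:   # drop oldest when full
            self.buffer.pop(0)
            self.td_errors.pop(0)
        self.buffer.append(transition)
        self.td_errors.append(abs(td_error))

    def _priorities(self):
        errs = np.asarray(self.td_errors, dtype=np.float64)
        # z-score normalization of TD errors across the whole buffer
        z = (errs - errs.mean()) / (errs.std() + self.eps)
        # shift so all priorities are positive, then apply the PER exponent
        p = (z - z.min() + self.eps) ** self.alpha
        return p / p.sum()

    def sample(self, batch_size):
        probs = self._priorities()
        idx = np.random.choice(len(self.buffer), size=batch_size, p=probs)
        # importance-sampling weights correct the non-uniform sampling bias
        weights = 1.0 / (len(self.buffer) * probs[idx])
        weights /= weights.max()
        return [self.buffer[i] for i in idx], idx, weights

    def update(self, idx, td_errors):
        # refresh priorities after the learner recomputes TD errors
        for i, e in zip(idx, td_errors):
            self.td_errors[i] = abs(e)
```

Normalizing by the buffer-wide mean and standard deviation makes priorities scale-invariant: a transition is prioritized by how unusual its TD error is relative to the current buffer, rather than by its raw magnitude.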
Pages: 17
Related papers
50 records
  • [41] Multi-agent Gradient-Based Off-Policy Actor-Critic Algorithm for Distributed Reinforcement Learning
    Ren, Jineng
    INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2024, 17 (01)
  • [42] Identification and off-policy learning of multiple objectives using adaptive clustering
    Karimpanal, Thommen George
    Wilhelm, Erik
    NEUROCOMPUTING, 2017, 263 : 39 - 47
  • [43] Synchronous optimal control method for nonlinear systems with saturating actuators and unknown dynamics using off-policy integral reinforcement learning
    Zhang, Zenglian
    Song, Ruizhuo
    Cao, Min
    NEUROCOMPUTING, 2019, 356 : 162 - 169
  • [44] Data-Driven Robust Control of Discrete-Time Uncertain Linear Systems via Off-Policy Reinforcement Learning
    Yang, Yongliang
    Guo, Zhishan
    Xiong, Haoyi
    Ding, Da-Wei
    Yin, Yixin
    Wunsch, Donald C.
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2019, 30 (12) : 3735 - 3747
  • [45] Off-Policy Integral Reinforcement Learning Method to Solve Nonlinear Continuous-Time Multiplayer Nonzero-Sum Games
    Song, Ruizhuo
    Lewis, Frank L.
    Wei, Qinglai
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2017, 28 (03) : 704 - 713
  • [46] Adaptive optimal consensus of nonlinear multi-agent systems with unknown dynamics using off-policy integral reinforcement learning
    Yan, Lei
    Liu, Zhi
    Chen, C. L. Philip
    Zhang, Yun
    Wu, Zongze
    NEUROCOMPUTING, 2025, 621
  • [47] Robust hierarchical games of linear discrete-time systems based on off-policy model-free reinforcement learning
    Ma, Xiao
    Yuan, Yuan
    JOURNAL OF THE FRANKLIN INSTITUTE-ENGINEERING AND APPLIED MATHEMATICS, 2024, 361 (07):
  • [48] Unified Intrinsically Motivated Exploration for Off-Policy Learning in Continuous Action Spaces
    Saglam, Baturay
    Mutlu, Furkan B.
    Dalmaz, Onat
    Kozat, Suleyman S.
    2022 30TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, SIU, 2022,
  • [49] Batch process control based on reinforcement learning with segmented prioritized experience replay
    Xu, Chen
    Ma, Junwei
    Tao, Hongfeng
    MEASUREMENT SCIENCE AND TECHNOLOGY, 2024, 35 (05)
  • [50] A Path Planning Method Based on Deep Reinforcement Learning with Improved Prioritized Experience Replay for Human-Robot Collaboration
    Sun, Deyu
    Wen, Jingqian
    Wang, Jingfei
    Yang, Xiaonan
    Hu, Yaoguang
    HUMAN-COMPUTER INTERACTION, PT II, HCI 2024, 2024, 14685 : 196 - 206