Z-Score Experience Replay in Off-Policy Deep Reinforcement Learning

Citations: 0

Authors:

Yang, Yana [1]

Xi, Meng [1]

Dai, Huiao [1]

Wen, Jiabao [1]

Yang, Jiachen [1]

Affiliation:

[1] Tianjin University, School of Electrical and Information Engineering, Tianjin 300072, People's Republic of China

Source:

SENSORS | 2024 / Volume 24 / Issue 23

Funding:

China Postdoctoral Science Foundation; National Natural Science Foundation of China;

Keywords:

deep reinforcement learning; off policy; priority experience replay; z-score;

DOI:

10.3390/s24237746

Chinese Library Classification (CLC) number:

O65 [Analytical Chemistry];

Discipline classification code:

070302 ; 081704 ;

Abstract:

Reinforcement learning, as a machine learning method that does not require pre-training data, seeks the optimal policy through the continuous interaction between an agent and its environment. It is an important approach to solving sequential decision-making problems. By combining it with deep learning, deep reinforcement learning possesses powerful perception and decision-making capabilities and has been widely applied to various domains to tackle complex decision problems. Off-policy reinforcement learning separates exploration and exploitation by storing and replaying interaction experiences, making it easier to find global optimal solutions. Understanding how to utilize experiences is crucial for improving the efficiency of off-policy reinforcement learning algorithms. To address this problem, this paper proposes Z-Score Prioritized Experience Replay, which enhances the utilization of experiences and improves the performance and convergence speed of the algorithm. A series of ablation experiments demonstrate that the proposed method significantly improves the effectiveness of deep reinforcement learning algorithms.


Pages: 17

50 items in total

[1] Research on Experience Replay of Off-policy Deep Reinforcement Learning: A Review
Hu Z.-J.
Gao X.-G.
Wan K.-F.
Zhang L.-T.
Wang Q.-L.
Neretin E.
Zidonghua Xuebao/Acta Automatica Sinica, 2023, 49 (11) : 2237 - 2256
[2] Enhanced Off-Policy Reinforcement Learning With Focused Experience Replay
Kong, Seung-Hyun
Nahrendra, I. Made Aswin
Paek, Dong-Hee
IEEE ACCESS, 2021, 9 (09) : 93152 - 93164
[3] High-Value Prioritized Experience Replay for Off-policy Reinforcement Learning
Cao, Xi
Wan, Huaiyu
Lin, Youfang
Han, Sheng
2019 IEEE 31ST INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2019), 2019 : 1510 - 1514
[4] Reliability assessment of off-policy deep reinforcement learning: A benchmark for aerodynamics
Berger, Sandrine
Ramo, Andrea Arroyo
Guillet, Valentin
Lahire, Thibault
Martin, Brice
Jardin, Thierry
Rachelson, Emmanuel
DATA-CENTRIC ENGINEERING, 2024, 5
[5] Off-Policy Correction for Deep Deterministic Policy Gradient Algorithms via Batch Prioritized Experience Replay
Cicek, Dogan C.
Duran, Enes
Saglam, Baturay
Mutlu, Furkan B.
Kozat, Suleyman S.
2021 IEEE 33RD INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2021), 2021 : 1255 - 1262
[6] Off-Policy Deep Reinforcement Learning Based on Steffensen Value Iteration
Cheng, Yuhu
Chen, Lin
Chen, C. L. Philip
Wang, Xuesong
IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2021, 13 (04) : 1023 - 1032
[7] Off-Policy Differentiable Logic Reinforcement Learning
Zhang, Li
Li, Xin
Wang, Mingzhong
Tian, Andong
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2021: RESEARCH TRACK, PT II, 2021, 12976 : 617 - 632
[8] Model-Based Off-Policy Deep Reinforcement Learning With Model-Embedding
Tan, Xiaoyu
Qu, Chao
Xiong, Junwu
Zhang, James
Qiu, Xihe
Jin, Yaochu
IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2024, 8 (04) : 2974 - 2986
[9] Relative importance sampling for off-policy actor-critic in deep reinforcement learning
Mahammad Humayoo
Gengzhong Zheng
Xiaoqing Dong
Liming Miao
Shuwei Qiu
Zexun Zhou
Peitao Wang
Zakir Ullah
Naveed Ur Rehman Junejo
Xueqi Cheng
Scientific Reports, 15 (1)
[10] An Off-Policy Trust Region Policy Optimization Method With Monotonic Improvement Guarantee for Deep Reinforcement Learning
Meng, Wenjia
Zheng, Qian
Shi, Yue
Pan, Gang
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (05) : 2223 - 2235
