Sample-Efficient Deep Reinforcement Learning with Directed Associative Graph

Times Cited: 0
Authors
Yang, Dujia [1 ,2 ]
Qin, Xiaowei [1 ,2 ]
Xu, Xiaodong [1 ,2 ]
Li, Chensheng [1 ,2 ]
Wei, Guo [1 ,2 ]
Affiliations
[1] Univ Sci & Technol China, Hefei 230026, Peoples R China
[2] CAS Key Lab Wireless Opt Commun, Hefei 230027, Peoples R China
Keywords
directed associative graph; sample efficiency; deep reinforcement learning;
DOI
Not available
CLC Number
TN [Electronic Technology, Communication Technology]
Discipline Code
0809
Abstract
Reinforcement learning can be modeled mathematically as a Markov decision process. Consequently, the interaction samples and the connection relations between them are the two main sources of information for learning. However, most recent work on deep reinforcement learning treats samples independently, either within their own episode or across episodes. In this paper, to exploit more of the sample information, we propose an additional learning system based on a directed associative graph (DAG). The DAG is built over all trajectories in real time and captures the full connection relations among samples across all episodes. By planning along the directed edges of the DAG, we obtain another perspective for estimating state-action pairs, especially those unknown to the deep neural network (DNN) and to episodic memory (EM). A mixed loss function combining the three learning systems (DNN, EM, and DAG) improves the efficiency of the parameter updates in the proposed algorithm. We show that our algorithm significantly outperforms the state of the art in performance and sample efficiency on the test environments. Furthermore, the convergence of the algorithm is proved in the appendix, and its long-term performance as well as the effects of the DAG are verified.
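The abstract describes the mechanism only at a high level, so the following is a minimal illustrative sketch, not the authors' implementation. It assumes hashable (e.g., discretized) states and deterministic transitions; the names DirectedAssociativeGraph, add_transition, plan_value, and mixed_loss, as well as the loss weighting, are all hypothetical. It shows the two ideas the abstract names: a graph of directed edges built in real time from every transition across episodes, and a bounded-depth backup along those edges that yields a value estimate even for state-action pairs the DNN and EM have not covered, which can then enter a mixed loss.

from collections import defaultdict

class DirectedAssociativeGraph:
    """Records every observed transition (s, a) -> (r, s') across all
    episodes, so returns can be backed up along directed edges."""

    def __init__(self, gamma=0.99):
        self.gamma = gamma
        # edges[s][a] = (next_state, reward); one entry per pair assumes
        # deterministic dynamics (a simplifying assumption of this sketch).
        self.edges = defaultdict(dict)

    def add_transition(self, s, a, r, s_next):
        # Built in real time: each environment step adds one directed edge,
        # linking samples within and across episodes that share states.
        self.edges[s][a] = (s_next, r)

    def plan_value(self, s, depth=10):
        # Bounded-depth greedy backup along directed edges; returns an
        # estimate even for states the DNN/EM have not generalized to.
        if depth == 0 or s not in self.edges:
            return 0.0
        return max(r + self.gamma * self.plan_value(s_next, depth - 1)
                   for s_next, r in self.edges[s].values())

def mixed_loss(q_pred, td_target, em_target, dag_target,
               w_td=1.0, w_em=0.5, w_dag=0.5):
    # One plausible form of the mixed loss: a weighted sum of squared
    # errors against the three targets (DNN bootstrap, episodic memory,
    # DAG planning). The paper's actual combination may differ.
    return (w_td * (q_pred - td_target) ** 2
            + w_em * (q_pred - em_target) ** 2
            + w_dag * (q_pred - dag_target) ** 2)

# Usage: record a two-step trajectory, then query a planned value.
dag = DirectedAssociativeGraph(gamma=0.9)
dag.add_transition("s0", "a0", 1.0, "s1")
dag.add_transition("s1", "a0", 2.0, "s2")
print(dag.plan_value("s0"))  # 1.0 + 0.9 * 2.0 = 2.8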
Pages: 100-113
Page Count: 14
Related Papers
50 records (first 10 shown)
  • [1] Sample-Efficient Deep Reinforcement Learning with Directed Associative Graph
    Yang, Dujia
    Qin, Xiaowei
    Xu, Xiaodong
    Li, Chensheng
    Wei, Guo
    CHINA COMMUNICATIONS, 2021, 18 (06): 100 - 113
  • [2] Sample-Efficient Deep Reinforcement Learning via Episodic Backward Update
    Lee, Su Young
    Choi, Sungik
    Chung, Sae-Young
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [3] Sample-Efficient Reinforcement Learning of Undercomplete POMDPs
    Jin, Chi
    Kakade, Sham M.
    Krishnamurthy, Akshay
    Liu, Qinghua
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33 (NEURIPS 2020), 2020, 33
  • [4] Sample-Efficient and Safe Deep Reinforcement Learning via Reset Deep Ensemble Agents
    Kim, Woojun
    Shin, Yongjae
    Park, Jongeui
    Sung, Youngchul
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [5] Sample-efficient Reinforcement Learning in Robotic Table Tennis
    Tebbe, Jonas
    Krauch, Lukas
    Gao, Yapeng
    Zell, Andreas
    2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021, : 4171 - 4178
  • [6] Sample-Efficient Reinforcement Learning in the Presence of Exogenous Information
    Efroni, Yonathan
    Foster, Dylan J.
    Misra, Dipendra
    Krishnamurthy, Akshay
    Langford, John
    CONFERENCE ON LEARNING THEORY, VOL 178, 2022, 178
  • [7] Sample-efficient reinforcement learning for CERN accelerator control
    Kain, Verena
    Hirlander, Simon
    Goddard, Brennan
    Velotti, Francesco Maria
    Porta, Giovanni Zevi Della
    Bruchon, Niky
    Valentino, Gianluca
    PHYSICAL REVIEW ACCELERATORS AND BEAMS, 2020, 23 (12)
  • [8] A New Sample-Efficient PAC Reinforcement Learning Algorithm
    Zehfroosh, Ashkan
    Tanner, Herbert G.
    2020 28TH MEDITERRANEAN CONFERENCE ON CONTROL AND AUTOMATION (MED), 2020, : 788 - 793
  • [9] Conditional Abstraction Trees for Sample-Efficient Reinforcement Learning
    Dadvar, Mehdi
    Nayyar, Rashmeet Kaur
    Srivastava, Siddharth
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2023, 216 : 485 - 495
  • [10] Sample-efficient Deep Reinforcement Learning with Imaginary Rollouts for Human-Robot Interaction
    Thabet, Mohammad
    Patacchiola, Massimiliano
    Cangelosi, Angelo
    2019 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2019, : 5079 - 5085