Sample-Efficient Deep Reinforcement Learning with Directed Associative Graph

Times cited: 0

Authors:

Yang, Dujia [1,2]

Qin, Xiaowei [1,2]

Xu, Xiaodong [1,2]

Li, Chensheng [1,2]

Wei, Guo [1,2]

Affiliations:

[1] Univ Sci & Technol China, Hefei 230026, Peoples R China

[2] CAS Key Lab Wireless Opt Commun, Hefei 230027, Peoples R China

Source:

CHINA COMMUNICATIONS | 2021 / Vol. 18 / No. 6

Keywords:

directed associative graph; sample efficiency; deep reinforcement learning

DOI:

Not available

Chinese Library Classification (CLC) number:

TN [Electronic technology, communication technology]

Discipline code:

0809

Abstract:

Mathematically, reinforcement learning can be modeled as a Markov decision process. Consequently, the interaction samples and the connections between them are the two main types of information available for learning. However, most recent work on deep reinforcement learning treats samples independently, whether within an episode or across episodes. In this paper, to exploit more of the information in the samples, we propose an additional learning system based on a directed associative graph (DAG). The DAG is built from all trajectories in real time and captures the complete connectivity of all samples across all episodes. By planning along the directed edges of the DAG, we obtain an alternative estimate of the value of state-action pairs, especially pairs that are unknown to the deep neural network (DNN) and to the episodic memory (EM). The proposed algorithm uses a mixed loss function that combines the three learning systems (DNN, EM, and DAG) to make parameter updates more efficient. We show that our algorithm significantly outperforms the state-of-the-art algorithm in both performance and sample efficiency in the test environments. Furthermore, the convergence of our algorithm is proved in the appendix, and its long-term performance and the effects of the DAG are verified.
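The abstract describes three value estimators fed by the same samples. The sketch below illustrates the general idea only and is not the authors' algorithm or loss. It assumes hashable states and deterministic dynamics (one outgoing edge per state-action pair), uses plain value iteration over the graph as the "planning" step, keeps the best discounted Monte Carlo return per pair as the episodic memory, and adds the EM and DAG estimates to a squared TD error as extra regression terms. The names `DirectedAssociativeGraph`, `update_episodic_memory`, `mixed_loss`, and the weights `w_em` and `w_dag` are hypothetical.

```python
from collections import defaultdict


class DirectedAssociativeGraph:
    """Hypothetical sketch: one directed graph over every transition seen so far.

    Assumptions (not from the paper): states are hashable and dynamics are
    deterministic, so each (state, action) pair keeps a single outgoing edge.
    """

    def __init__(self, gamma=0.99):
        self.gamma = gamma
        # edges[s][a] = (reward, next_state, done)
        self.edges = defaultdict(dict)

    def add_transition(self, s, a, r, s_next, done):
        # All episodes share one graph, so a state visited in two different
        # episodes links their trajectories.
        self.edges[s][a] = (r, s_next, done)

    def plan(self, sweeps=50):
        """Value iteration over graph edges: Q(s, a) = r + gamma * max_b Q(s', b)."""
        q = {(s, a): 0.0 for s, acts in self.edges.items() for a in acts}
        for _ in range(sweeps):
            new_q = {}
            for s, a in q:
                r, s_next, done = self.edges[s][a]
                if done or s_next not in self.edges:
                    # Terminal or not-yet-expanded successor: no bootstrap
                    # (simplifying assumption: such leaves are valued at 0).
                    new_q[(s, a)] = r
                else:
                    best = max(q[(s_next, b)] for b in self.edges[s_next])
                    new_q[(s, a)] = r + self.gamma * best
            q = new_q
        return q


def update_episodic_memory(memory, episode, gamma=0.99):
    """Keep the best discounted return observed for each (s, a) in one episode."""
    g = 0.0
    for s, a, r in reversed(episode):
        g = r + gamma * g
        memory[(s, a)] = max(memory.get((s, a), float("-inf")), g)
    return memory


def mixed_loss(q_pred, td_target, q_em=None, q_dag=None, w_em=0.1, w_dag=0.1):
    """Squared TD error plus weighted pulls toward the EM and DAG estimates."""
    loss = (q_pred - td_target) ** 2
    if q_em is not None:
        loss += w_em * (q_pred - q_em) ** 2
    if q_dag is not None:
        loss += w_dag * (q_pred - q_dag) ** 2
    return loss


# Two toy episodes that meet at state "B".
# Episode 1: A --0--> B --0--> C (terminal, reward 0)
# Episode 2: E --0--> B --1--> D (terminal, reward 1)
ep1 = [("A", 0, 0.0, "B", False), ("B", 0, 0.0, "C", True)]
ep2 = [("E", 0, 0.0, "B", False), ("B", 1, 1.0, "D", True)]

graph = DirectedAssociativeGraph(gamma=0.9)
memory = {}
for ep in (ep1, ep2):
    for s, a, r, s_next, done in ep:
        graph.add_transition(s, a, r, s_next, done)
    update_episodic_memory(memory, [(s, a, r) for s, a, r, _, _ in ep], gamma=0.9)

q_dag = graph.plan()
print(memory[("A", 0)], q_dag[("A", 0)])  # prints "0.0 0.9"
```

In the toy run, episodic memory only knows the return that episode 1 actually obtained from ("A", 0), which is 0. Planning on the graph follows the edge into "B" and reuses the rewarding action discovered in episode 2, giving 0.9. This cross-episode reuse is the kind of connection information the abstract says independent-sample methods ignore.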


Pages: 100-113

Number of pages: 14
