Sample-Efficient Deep Reinforcement Learning with Directed Associative Graph

Times Cited: 0
Authors
Yang, Dujia [1 ,2 ]
Qin, Xiaowei [1 ,2 ]
Xu, Xiaodong [1 ,2 ]
Li, Chensheng [1 ,2 ]
Wei, Guo [1 ,2 ]
Affiliations
[1] Univ Sci & Technol China, Hefei 230026, Peoples R China
[2] CAS Key Lab Wireless Opt Commun, Hefei 230027, Peoples R China
Keywords
directed associative graph; sample efficiency; deep reinforcement learning;
DOI
Not available
CLC Number
TN [Electronic Technology, Communication Technology]
Discipline Code
0809
Abstract
Reinforcement learning can be modeled mathematically as a Markov decision process. Consequently, the interaction samples and the connection relations between them are the two main sources of information for learning. However, most recent work on deep reinforcement learning treats samples independently, either within their own episode or across episodes. In this paper, to exploit more of the sample information, we propose an additional learning system based on a directed associative graph (DAG). The DAG is built over all trajectories in real time and captures the full connection relations among samples across all episodes. By planning along the directed edges of the DAG, we obtain another perspective for estimating state-action pairs, especially those unknown to the deep neural network (DNN) and to episodic memory (EM). A mixed loss function combining the three learning systems (DNN, EM, and DAG) improves the efficiency of the parameter updates in the proposed algorithm. We show that our algorithm significantly outperforms the state of the art in performance and sample efficiency on the test environments. Furthermore, the convergence of the algorithm is proved in the appendix, and its long-term performance as well as the effects of the DAG are verified.
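The abstract describes the mechanism only at a high level, so the following is a minimal illustrative sketch, not the authors' implementation. It assumes hashable (e.g., discretized) states and deterministic transitions; the names DirectedAssociativeGraph, add_transition, plan_value, and mixed_loss, as well as the loss weighting, are all hypothetical. It shows the two ideas the abstract names: a graph of directed edges built in real time from every transition across episodes, and a bounded-depth backup along those edges that yields a value estimate even for state-action pairs the DNN and EM have not covered, which can then enter a mixed loss.

from collections import defaultdict

class DirectedAssociativeGraph:
    """Records every observed transition (s, a) -> (r, s') across all
    episodes, so returns can be backed up along directed edges."""

    def __init__(self, gamma=0.99):
        self.gamma = gamma
        # edges[s][a] = (next_state, reward); one entry per pair assumes
        # deterministic dynamics (a simplifying assumption of this sketch).
        self.edges = defaultdict(dict)

    def add_transition(self, s, a, r, s_next):
        # Built in real time: each environment step adds one directed edge,
        # linking samples within and across episodes that share states.
        self.edges[s][a] = (s_next, r)

    def plan_value(self, s, depth=10):
        # Bounded-depth greedy backup along directed edges; returns an
        # estimate even for states the DNN/EM have not generalized to.
        if depth == 0 or s not in self.edges:
            return 0.0
        return max(r + self.gamma * self.plan_value(s_next, depth - 1)
                   for s_next, r in self.edges[s].values())

def mixed_loss(q_pred, td_target, em_target, dag_target,
               w_td=1.0, w_em=0.5, w_dag=0.5):
    # One plausible form of the mixed loss: a weighted sum of squared
    # errors against the three targets (DNN bootstrap, episodic memory,
    # DAG planning). The paper's actual combination may differ.
    return (w_td * (q_pred - td_target) ** 2
            + w_em * (q_pred - em_target) ** 2
            + w_dag * (q_pred - dag_target) ** 2)

# Usage: record a two-step trajectory, then query a planned value.
dag = DirectedAssociativeGraph(gamma=0.9)
dag.add_transition("s0", "a0", 1.0, "s1")
dag.add_transition("s1", "a0", 2.0, "s2")
print(dag.plan_value("s0"))  # 1.0 + 0.9 * 2.0 = 2.8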
Pages: 100-113
Page Count: 14
Related Papers
50 records (first 10 shown)
  • [1] Sample-Efficient Deep Reinforcement Learning with Directed Associative Graph
    Yang, Dujia
    Qin, Xiaowei
    Xu, Xiaodong
    Li, Chensheng
    Wei, Guo
    CHINA COMMUNICATIONS, 2021, 18 (06): 100 - 113
  • [2] Sample-Efficient Deep Reinforcement Learning via Episodic Backward Update
    Lee, Su Young
    Choi, Sungik
    Chung, Sae-Young
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [3] Sample-Efficient Reinforcement Learning of Undercomplete POMDPs
    Jin, Chi
    Kakade, Sham M.
    Krishnamurthy, Akshay
    Liu, Qinghua
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33 (NEURIPS 2020), 2020, 33
  • [4] Sample-Efficient and Safe Deep Reinforcement Learning via Reset Deep Ensemble Agents
    Kim, Woojun
    Shin, Yongjae
    Park, Jongeui
    Sung, Youngchul
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [5] Sample-efficient Reinforcement Learning in Robotic Table Tennis
    Tebbe, Jonas
    Krauch, Lukas
    Gao, Yapeng
    Zell, Andreas
    2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021, : 4171 - 4178
  • [6] Sample-Efficient Reinforcement Learning in the Presence of Exogenous Information
    Efroni, Yonathan
    Foster, Dylan J.
    Misra, Dipendra
    Krishnamurthy, Akshay
    Langford, John
    CONFERENCE ON LEARNING THEORY, VOL 178, 2022, 178
  • [7] Sample-efficient reinforcement learning for CERN accelerator control
    Kain, Verena
    Hirlander, Simon
    Goddard, Brennan
    Velotti, Francesco Maria
    Porta, Giovanni Zevi Della
    Bruchon, Niky
    Valentino, Gianluca
    PHYSICAL REVIEW ACCELERATORS AND BEAMS, 2020, 23 (12)
  • [8] A New Sample-Efficient PAC Reinforcement Learning Algorithm
    Zehfroosh, Ashkan
    Tanner, Herbert G.
    2020 28TH MEDITERRANEAN CONFERENCE ON CONTROL AND AUTOMATION (MED), 2020, : 788 - 793
  • [9] Conditional Abstraction Trees for Sample-Efficient Reinforcement Learning
    Dadvar, Mehdi
    Nayyar, Rashmeet Kaur
    Srivastava, Siddharth
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2023, 216 : 485 - 495
  • [10] Sample-efficient Deep Reinforcement Learning with Imaginary Rollouts for Human-Robot Interaction
    Thabet, Mohammad
    Patacchiola, Massimiliano
    Cangelosi, Angelo
    2019 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2019, : 5079 - 5085