Guided Cooperation in Hierarchical Reinforcement Learning via Model-Based Rollout

Times Cited: 0
Authors
Wang, Haoran [1 ]
Tang, Zeshen [1 ]
Sun, Yaoru [1 ]
Wang, Fang [2 ]
Zhang, Siyu [1 ]
Chen, Yeming [1 ]
Affiliations
[1] Tongji Univ, Coll Elect & Informat Engn, Dept Comp Sci & Technol, Shanghai 201804, Peoples R China
[2] Brunel Univ London, Dept Comp Sci, Uxbridge UB8 3PH, England
Keywords
Planning; Task analysis; Reinforcement learning; Robustness; Learning systems; Vehicle dynamics; Deep reinforcement learning (DRL); goal conditioning; hierarchical reinforcement learning (HRL); interlevel cooperation; model-based rollout
DOI
10.1109/TNNLS.2024.3425809
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Goal-conditioned hierarchical reinforcement learning (HRL) presents a promising approach for enabling effective exploration in complex, long-horizon reinforcement learning (RL) tasks through temporal abstraction. Empirically, heightened interlevel communication and coordination can induce more stable and robust policy improvement in hierarchical systems. Yet, most existing goal-conditioned HRL algorithms have focused primarily on subgoal discovery, neglecting interlevel cooperation. Here, we propose a novel goal-conditioned HRL framework named Guided Cooperation via Model-Based Rollout (GCMR; code is available at https://github.com/HaoranWang-TJ/GCMR_ACLG_official), which aims to bridge interlevel information synchronization and cooperation by exploiting forward dynamics. First, GCMR mitigates the state-transition error within off-policy correction via model-based rollout, thereby enhancing sample efficiency. Second, to prevent disruption by unseen subgoals and states, lower-level Q-function gradients are constrained using a gradient penalty with a model-inferred upper bound, leading to a more stable behavioral policy conducive to effective exploration. Third, we propose one-step rollout-based planning that uses the higher-level critic to guide the lower-level policy: we estimate the value of future states of the lower-level policy using the higher-level critic function, thereby transmitting global task information downward to avoid local pitfalls. Together, these three components are expected to significantly facilitate interlevel cooperation. Experimental results demonstrate that incorporating the proposed GCMR framework with a disentangled variant of hierarchical reinforcement learning guided by landmarks (HIGL), namely, adjacency constraint and landmark-guided planning (ACLG), yields more stable and robust policy improvement than various baselines and significantly outperforms previous state-of-the-art (SOTA) algorithms.
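The third component described in the abstract, one-step rollout-based planning, can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation (see the linked repository for that); the network architectures, dimensions, and the weighting coefficient `beta` below are illustrative assumptions. The core idea it captures: roll the lower-level action forward through a learned dynamics model and score the predicted next state with the higher-level critic, so that global task information flows downward into the lower-level actor's objective.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions and modules; all names are illustrative assumptions.
S, G, A = 30, 3, 8  # state, goal, and action dimensions

dynamics = nn.Sequential(      # learned forward model: (s, a) -> predicted s'
    nn.Linear(S + A, 256), nn.ReLU(), nn.Linear(256, S))
low_actor = nn.Sequential(     # goal-conditioned lower-level policy: (s, subgoal) -> a
    nn.Linear(S + G, 256), nn.ReLU(), nn.Linear(256, A), nn.Tanh())
low_critic = nn.Sequential(    # lower-level Q-function: (s, subgoal, a) -> value
    nn.Linear(S + G + A, 256), nn.ReLU(), nn.Linear(256, 1))
high_critic = nn.Sequential(   # higher-level critic: (s, task goal) -> value
    nn.Linear(S + G, 256), nn.ReLU(), nn.Linear(256, 1))

def guided_low_actor_loss(state, subgoal, task_goal, beta=0.1):
    """Augment the usual off-policy actor objective with a one-step model
    rollout scored by the higher-level critic, transmitting global task
    information downward (beta is a hypothetical guidance weight)."""
    action = low_actor(torch.cat([state, subgoal], dim=-1))
    q_low = low_critic(torch.cat([state, subgoal, action], dim=-1))
    next_state = dynamics(torch.cat([state, action], dim=-1))  # one-step rollout
    guidance = high_critic(torch.cat([next_state, task_goal], dim=-1))
    return -(q_low + beta * guidance).mean()

# Usage on a random batch, just to show the shapes line up.
batch = 4
loss = guided_low_actor_loss(torch.randn(batch, S), torch.randn(batch, G),
                             torch.randn(batch, G))
loss.backward()
```

Under these assumptions, setting `beta` to zero recovers a standard DDPG/TD3-style actor update, so the guidance term is a strictly additive modification to the lower-level objective.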
Pages: 15
Related Papers
50 records in total
  • [1] Multiphase Autonomous Docking via Model-Based and Hierarchical Reinforcement Learning
    Aborizk, Anthony
    Fitz-Coy, Norman
    JOURNAL OF SPACECRAFT AND ROCKETS, 2024, 61 (04) : 993 - 1005
  • [2] Model-based hierarchical reinforcement learning and human action control
    Botvinick, Matthew
    Weinstein, Ari
    PHILOSOPHICAL TRANSACTIONS OF THE ROYAL SOCIETY B-BIOLOGICAL SCIENCES, 2014, 369 (1655)
  • [3] Model-Based Transfer Reinforcement Learning Based on Graphical Model Representations
    Sun, Yuewen
    Zhang, Kun
    Sun, Changyin
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (02) : 1035 - 1048
  • [4] A survey on model-based reinforcement learning
    Luo, Fan-Ming
    Xu, Tian
    Lai, Hang
    Chen, Xiong-Hui
    Zhang, Weinan
    Yu, Yang
    SCIENCE CHINA-INFORMATION SCIENCES, 2024, 67 (02)
  • [5] Model-Based Reinforcement Learning via Stochastic Hybrid Models
    Abdulsamad, Hany
    Peters, Jan
    IEEE OPEN JOURNAL OF CONTROL SYSTEMS, 2023, 2 : 155 - 170
  • [6] Model-Based Reinforcement Learning via Proximal Policy Optimization
    Sun, Yuewen
    Yuan, Xin
    Liu, Wenzhang
    Sun, Changyin
2019 CHINESE AUTOMATION CONGRESS (CAC2019), 2019: 4736 - 4740
  • [7] Model-based reinforcement learning for alternating Markov games
    Mellor, D
    AI 2003: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2003, 2903 : 520 - 531
  • [8] Uncertainty-Aware Model-Based Offline Reinforcement Learning for Automated Driving
    Diehl, Christopher
    Sievernich, Timo Sebastian
    Kruger, Martin
    Hoffmann, Frank
    Bertram, Torsten
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2023, 8 (02) : 1167 - 1174
  • [9] Coalition Game of Radar Network for Multitarget Tracking via Model-Based Multiagent Reinforcement Learning
    Xiong, Kui
    Zhang, Tianxian
    Cui, Guolong
    Wang, Shiyuan
    Kong, Lingjiang
    IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS, 2023, 59 (03) : 2123 - 2140
  • [10] Data-Efficient Task Generalization via Probabilistic Model-Based Meta Reinforcement Learning
    Bhardwaj, Arjun
    Rothfuss, Jonas
    Sukhija, Bhavya
    As, Yarden
    Hutter, Marco
    Coros, Stelian
    Krause, Andreas
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2024, 9 (04) : 3918 - 3925