Guided Cooperation in Hierarchical Reinforcement Learning via Model-Based Rollout

Cited: 0
Authors
Wang, Haoran [1 ]
Tang, Zeshen [1 ]
Sun, Yaoru [1 ]
Wang, Fang [2 ]
Zhang, Siyu [1 ]
Chen, Yeming [1 ]
Affiliations
[1] Tongji Univ, Coll Elect & Informat Engn, Dept Comp Sci & Technol, Shanghai 201804, Peoples R China
[2] Brunel Univ London, Dept Comp Sci, Uxbridge UB8 3PH, England
Keywords
Planning; Task analysis; Reinforcement learning; Robustness; Learning systems; Vehicle dynamics; Deep reinforcement learning (DRL); goal conditioning; hierarchical reinforcement learning (HRL); interlevel cooperation; model-based rollout
DOI
10.1109/TNNLS.2024.3425809
CLC Classification
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Goal-conditioned hierarchical reinforcement learning (HRL) is a promising approach for enabling effective exploration in complex, long-horizon reinforcement learning (RL) tasks through temporal abstraction. Empirically, heightened interlevel communication and coordination can induce more stable and robust policy improvement in hierarchical systems. Yet, most existing goal-conditioned HRL algorithms have focused primarily on subgoal discovery, neglecting interlevel cooperation. Here, we propose a novel goal-conditioned HRL framework named Guided Cooperation via Model-Based Rollout (GCMR; code is available at https://github.com/HaoranWang-TJ/GCMR_ACLG_official), which aims to bridge interlevel information synchronization and cooperation by exploiting forward dynamics. First, GCMR mitigates the state-transition error within off-policy correction via model-based rollout, thereby enhancing sample efficiency. Second, to prevent disruption by unseen subgoals and states, lower-level Q-function gradients are constrained using a gradient penalty with a model-inferred upper bound, leading to a more stable behavioral policy conducive to effective exploration. Third, we propose one-step rollout-based planning that uses the higher-level critic to guide the lower-level policy: the value of future states visited by the lower-level policy is estimated with the higher-level critic function, thereby transmitting global task information downward to avoid local pitfalls. Together, these three components are expected to facilitate interlevel cooperation significantly. Experimental results demonstrate that incorporating the proposed GCMR framework into a disentangled variant of hierarchical reinforcement learning guided by landmarks (HIGL), namely adjacency constraint and landmark-guided planning (ACLG), yields more stable and robust policy improvement than various baselines and significantly outperforms previous state-of-the-art (SOTA) algorithms.
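The abstract's second and third components (the gradient penalty and the one-step rollout-based guidance) lend themselves to a compact sketch. The PyTorch-style code below is a minimal illustration only, assuming hypothetical callables low_actor, low_critic, high_critic, and dynamics_model and illustrative coefficients grad_bound, guide_coef, and gp_coef; it is not the paper's actual implementation (see the linked repository for that).

```python
import torch

def gcmr_low_level_losses(state, subgoal, task_goal,
                          low_actor, low_critic, high_critic, dynamics_model,
                          grad_bound=1.0, guide_coef=0.1, gp_coef=10.0):
    """Sketch of a GCMR-style lower-level actor objective.

    Assumed interfaces (illustrative, not the paper's API):
      low_actor(s, g) -> action        low_critic(s, g, a) -> Q-value
      high_critic(s, goal) -> value    dynamics_model(s, a) -> next state
    """
    action = low_actor(state, subgoal)

    # Standard goal-conditioned actor term: maximize the low-level Q-value.
    actor_loss = -low_critic(state, subgoal, action).mean()

    # (ii) Gradient penalty: bound the Q-function's sensitivity to its
    # (state, subgoal) inputs. A constant `grad_bound` stands in for the
    # model-inferred upper bound described in the abstract.
    s_in = state.detach().clone().requires_grad_(True)
    g_in = subgoal.detach().clone().requires_grad_(True)
    q_val = low_critic(s_in, g_in, action.detach())
    grads = torch.autograd.grad(q_val.sum(), [s_in, g_in], create_graph=True)
    grad_norm = torch.cat([g.flatten(1) for g in grads], dim=-1).norm(dim=-1)
    grad_penalty = torch.relu(grad_norm - grad_bound).pow(2).mean()

    # (iii) One-step rollout-based guidance: imagine the next state with the
    # learned dynamics model and score it with the higher-level critic, so
    # global task information flows down to the low-level policy.
    next_state = dynamics_model(state, action)
    guidance_loss = -high_critic(next_state, task_goal).mean()

    return actor_loss + gp_coef * grad_penalty + guide_coef * guidance_loss
```

The labels (ii) and (iii) match the abstract's second and third components; the first component, model-based rollout inside off-policy correction, concerns the higher-level relabeling step rather than this loss and is omitted from the sketch.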
Pages: 15