Guided Cooperation in Hierarchical Reinforcement Learning via Model-Based Rollout

Cited by: 0
Authors
Wang, Haoran [1 ]
Tang, Zeshen [1 ]
Sun, Yaoru [1 ]
Wang, Fang [2 ]
Zhang, Siyu [1 ]
Chen, Yeming [1 ]
Affiliations
[1] Tongji Univ, Coll Elect & Informat Engn, Dept Comp Sci & Technol, Shanghai 201804, Peoples R China
[2] Brunel Univ London, Dept Comp Sci, Uxbridge UB8 3PH, England
Keywords
Planning; Task analysis; Reinforcement learning; Robustness; Sun; Learning systems; Vehicle dynamics; Deep reinforcement learning (DRL); goal conditioning; hierarchical reinforcement learning (HRL); interlevel cooperation; model-based rollout
DOI
10.1109/TNNLS.2024.3425809
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Goal-conditioned hierarchical reinforcement learning (HRL) is a promising approach for enabling effective exploration in complex, long-horizon reinforcement learning (RL) tasks through temporal abstraction. Empirically, heightened interlevel communication and coordination can induce more stable and robust policy improvement in hierarchical systems. Yet most existing goal-conditioned HRL algorithms have focused primarily on subgoal discovery, neglecting interlevel cooperation. Here, we propose a novel goal-conditioned HRL framework named Guided Cooperation via Model-Based Rollout (GCMR; code is available at https://github.com/HaoranWang-TJ/GCMR_ACLG_official), which aims to bridge interlevel information synchronization and cooperation by exploiting forward dynamics. First, GCMR mitigates the state-transition error within off-policy correction via model-based rollout, thereby enhancing sample efficiency. Second, to prevent disruption by unseen subgoals and states, lower-level Q-function gradients are constrained using a gradient penalty with a model-inferred upper bound, leading to a more stable behavioral policy conducive to effective exploration. Third, we propose one-step rollout-based planning that uses the higher-level critic to guide the lower-level policy: the value of future states of the lower-level policy is estimated with the higher-level critic function, thereby transmitting global task information downward to avoid local pitfalls. Together, these three components are expected to facilitate interlevel cooperation significantly. Experimental results demonstrate that combining the proposed GCMR framework with a disentangled variant of HIGL (hierarchical reinforcement learning guided by landmarks), namely ACLG (adjacency constraint and landmark-guided planning), yields more stable and robust policy improvement than various baselines and significantly outperforms previous state-of-the-art (SOTA) algorithms.
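The abstract describes its three mechanisms only in prose; the short sketch below illustrates two of them under explicit assumptions. It is a minimal illustration, not the authors' released implementation (see the linked GitHub repository for that): DynamicsModel, gradient_penalty, and guided_low_level_loss are hypothetical names, the residual forward model is an assumption, and the gradient bound is supplied by hand here rather than inferred from the learned model as the paper describes.

    # Minimal sketch of two GCMR ideas from the abstract, under assumptions:
    # (1) constrain lower-level Q-function input gradients with a penalty
    #     (here the upper bound `grad_bound` is passed in by hand; the
    #     paper infers it from the learned dynamics model), and
    # (2) one-step rollout-based planning: roll the lower-level action one
    #     step through a learned forward model and score the predicted
    #     next state with the higher-level critic as a guidance signal.
    # All names below are illustrative, not the paper's API.
    import torch
    import torch.nn as nn

    class DynamicsModel(nn.Module):
        """Learned forward model: (state, action) -> predicted next state."""

        def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, state_dim),
            )

        def forward(self, state, action):
            # Residual prediction: model the state delta, not the raw next state.
            return state + self.net(torch.cat([state, action], dim=-1))

    def gradient_penalty(low_q, state, subgoal, action, grad_bound: float):
        """Penalize lower-level Q input gradients whose norm exceeds `grad_bound`."""
        inputs = torch.cat([state, subgoal, action], dim=-1).requires_grad_(True)
        q = low_q(inputs)
        grads = torch.autograd.grad(q.sum(), inputs, create_graph=True)[0]
        excess = (grads.norm(dim=-1) - grad_bound).clamp(min=0.0)
        return (excess ** 2).mean()

    def guided_low_level_loss(state, subgoal, low_policy, dynamics, high_critic,
                              guidance_weight: float = 0.1):
        """One-step rollout guidance: prefer lower-level actions whose
        model-predicted next state the higher-level critic values highly."""
        action = low_policy(torch.cat([state, subgoal], dim=-1))
        next_state = dynamics(state, action)   # one-step model rollout
        high_value = high_critic(next_state)   # global task value of that state
        # Negative sign: minimizing this loss maximizes the higher-level
        # value, transmitting global task information down to the low level.
        return -guidance_weight * high_value.mean()

In a training loop, guided_low_level_loss would be added to the lower-level actor's usual objective and gradient_penalty to the lower-level critic's loss, with weights tuned per task; how GCMR actually schedules and weights these terms is specified in the paper, not here.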
Pages: 15
Related papers
50 records in total
  • [31] Learnable Weighting Mechanism in Model-based Reinforcement Learning
    Huang W.-Z.
    Yin Q.-Y.
    Zhang J.-G.
    Huang K.-Q.
Ruan Jian Xue Bao/Journal of Software, 2023, 34 (06): 2765 - 2775
  • [32] Exploration in Relational Domains for Model-based Reinforcement Learning
    Lang, Tobias
    Toussaint, Marc
    Kersting, Kristian
    JOURNAL OF MACHINE LEARNING RESEARCH, 2012, 13 : 3725 - 3768
  • [33] A Model-based Factored Bayesian Reinforcement Learning Approach
    Wu, Bo
    Feng, Yanpeng
    Zheng, Hongyan
    APPLIED SCIENCE, MATERIALS SCIENCE AND INFORMATION TECHNOLOGIES IN INDUSTRY, 2014, 513-517 : 1092 - 1095
  • [34] Free Will Belief as a Consequence of Model-Based Reinforcement Learning
    Rehn, Erik M.
    ARTIFICIAL GENERAL INTELLIGENCE, AGI 2022, 2023, 13539 : 353 - 363
  • [35] Offline Model-Based Reinforcement Learning for Tokamak Control
    Char, Ian
    Abbate, Joseph
    Bardoczi, Laszlo
    Boyer, Mark D.
    Chung, Youngseog
    Conlin, Rory
    Erickson, Keith
    Mehta, Viraj
    Richner, Nathan
    Kolemen, Egemen
    Schneider, Jeff
    LEARNING FOR DYNAMICS AND CONTROL CONFERENCE, VOL 211, 2023, 211
  • [36] Guided Reinforcement Learning via Sequence Learning
    Ramamurthy, Rajkumar
    Sifa, Rafet
    Luebbering, Max
    Bauckhage, Christian
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2020, PT II, 2020, 12397 : 335 - 345
  • [37] Wind Turbine Fault-Tolerant Control via Incremental Model-Based Reinforcement Learning
    Xie, Jingjie
    Dong, Hongyang
    Zhao, Xiaowei
    Lin, Shuyue
    IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2025, 22 : 1958 - 1969
  • [39] Intelligent Trainer for Dyna-Style Model-Based Deep Reinforcement Learning
    Dong, Linsen
    Li, Yuanlong
    Zhou, Xin
    Wen, Yonggang
    Guan, Kyle
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2021, 32 (06) : 2758 - 2771
  • [40] Estimating Lyapunov Region of Attraction for Robust Model-Based Reinforcement Learning USV
    Xia, Lei
    Cui, Yunduan
    Yi, Zhengkun
    Li, Huiyun
    Wu, Xinyu
    IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2025, 22 : 8898 - 8911