Guided Cooperation in Hierarchical Reinforcement Learning via Model-Based Rollout

Cited by: 0
Authors
Wang, Haoran [1 ]
Tang, Zeshen [1 ]
Sun, Yaoru [1 ]
Wang, Fang [2 ]
Zhang, Siyu [1 ]
Chen, Yeming [1 ]
Affiliations
[1] Tongji Univ, Coll Elect & Informat Engn, Dept Comp Sci & Technol, Shanghai 201804, Peoples R China
[2] Brunel Univ London, Dept Comp Sci, Uxbridge UB8 3PH, England
Keywords
Planning; Task analysis; Reinforcement learning; Robustness; Sun; Learning systems; Vehicle dynamics; Deep reinforcement learning (DRL); goal conditioning; hierarchical reinforcement learning (HRL); interlevel cooperation; model-based rollout
DOI
10.1109/TNNLS.2024.3425809
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Goal-conditioned hierarchical reinforcement learning (HRL) presents a promising approach for enabling effective exploration in complex, long-horizon reinforcement learning (RL) tasks through temporal abstraction. Empirically, heightened interlevel communication and coordination can induce more stable and robust policy improvement in hierarchical systems. Yet, most existing goal-conditioned HRL algorithms have focused primarily on subgoal discovery, neglecting interlevel cooperation. Here, we propose a novel goal-conditioned HRL framework named Guided Cooperation via Model-Based Rollout (GCMR; code is available at https://github.com/HaoranWang-TJ/GCMR_ACLG_official), which bridges interlevel information synchronization and cooperation by exploiting forward dynamics. First, GCMR mitigates the state-transition error within off-policy correction via model-based rollout, thereby enhancing sample efficiency. Second, to prevent disruption by unseen subgoals and states, lower-level Q-function gradients are constrained using a gradient penalty with a model-inferred upper bound, leading to a more stable behavioral policy conducive to effective exploration. Third, we propose one-step rollout-based planning that uses higher-level critics to guide the lower-level policy: the value of future states visited by the lower-level policy is estimated with the higher-level critic function, thereby transmitting global task information downward and avoiding local pitfalls. Together, these three components are expected to facilitate interlevel cooperation significantly. Experimental results demonstrate that combining the proposed GCMR framework with a disentangled variant of hierarchical reinforcement learning guided by landmarks (HIGL), namely, adjacency constraint and landmark-guided planning (ACLG), yields more stable and robust policy improvement than various baselines and significantly outperforms previous state-of-the-art (SOTA) algorithms.
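To make the first component concrete, the following is a minimal sketch (PyTorch; not the authors' released code) of how a learned dynamics model can replace replayed states during HIRO-style off-policy subgoal relabeling: each candidate subgoal is scored by rolling the current lower-level policy forward through the learned dynamics and comparing the imagined trajectory against the logged one. The names dynamics, low_policy, and the squared-error scoring rule are assumptions for illustration.

    import torch

    def relabel_subgoal(dynamics, low_policy, states, actions, candidates):
        # states:     (T+1, state_dim) logged state segment
        # actions:    (T, action_dim)  logged lower-level actions
        # candidates: (K, goal_dim)    candidate subgoals to score
        scores = []
        for g in candidates:
            s, err = states[0], 0.0
            for t in range(actions.shape[0]):
                a = low_policy(s, g)  # action the *current* policy would take
                # Compare the imagined (state, action) pair with the logged one.
                err = err + ((a - actions[t]) ** 2).sum() \
                          + ((s - states[t]) ** 2).sum()
                s = dynamics(s, a)    # model-based rollout step, instead of
                                      # replaying stored states verbatim
            scores.append(-err)
        # Best-matching candidate replaces the stored subgoal in the
        # higher-level replay buffer.
        return candidates[torch.stack(scores).argmax()]

With dynamics and low_policy as any tensor-to-tensor callables (e.g., small nn.Module networks), the function runs as written; rolling the learned model rather than replaying stored states is what addresses the state-transition error the abstract refers to.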
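For the second component, the sketch below is an assumption about the loss form rather than the paper's exact objective: it penalizes the norm of the lower-level Q-function's gradient with respect to the subgoal input wherever it exceeds an upper bound. In GCMR that bound would be inferred from the forward model; here it is simply passed in as a scalar.

    import torch

    def goal_gradient_penalty(q_net, state, action, goal, upper_bound):
        # Differentiate Q w.r.t. the subgoal input only.
        goal = goal.clone().requires_grad_(True)
        q = q_net(state, action, goal).sum()
        (grad,) = torch.autograd.grad(q, goal, create_graph=True)
        grad_norm = grad.norm(2, dim=-1)
        # Penalize only gradients above the (model-inferred) bound, so Q stays
        # smooth around unseen subgoals without being over-constrained.
        return torch.clamp(grad_norm - upper_bound, min=0.0).pow(2).mean()

The create_graph=True flag keeps the penalty differentiable with respect to the Q-network's parameters, so it can be weighted and added to the usual critic loss.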
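The third component can be sketched as an auxiliary actor loss: the learned dynamics imagines the next state one step ahead, and the higher-level critic values that imagined state against the final task goal, transmitting global task information downward. All names, and the exact way the higher-level critic is queried, are again assumptions.

    def rollout_guidance_loss(dynamics, high_critic, low_policy,
                              state, goal, task_goal):
        a = low_policy(state, goal)
        next_state = dynamics(state, a)  # one-step model-based rollout
        # Maximizing the higher-level value of the imagined state steers the
        # lower-level policy away from locally attractive but globally poor
        # moves; in practice this term is weighted into the usual actor loss.
        return -high_critic(next_state, task_goal).mean()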
Pages: 15