Guided Cooperation in Hierarchical Reinforcement Learning via Model-Based Rollout

Cited by: 0
Authors
Wang, Haoran [1 ]
Tang, Zeshen [1 ]
Sun, Yaoru [1 ]
Wang, Fang [2 ]
Zhang, Siyu [1 ]
Chen, Yeming [1 ]
Affiliations
[1] Tongji Univ, Coll Elect & Informat Engn, Dept Comp Sci & Technol, Shanghai 201804, Peoples R China
[2] Brunel Univ London, Dept Comp Sci, Uxbridge UB8 3PH, England
Keywords
Planning; Task analysis; Reinforcement learning; Robustness; Sun; Learning systems; Vehicle dynamics; Deep reinforcement learning (DRL); goal conditioning; hierarchical reinforcement learning (HRL); interlevel cooperation; model-based rollout
DOI
10.1109/TNNLS.2024.3425809
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Goal-conditioned hierarchical reinforcement learning (HRL) presents a promising approach for enabling effective exploration in complex, long-horizon reinforcement learning (RL) tasks through temporal abstraction. Empirically, heightened interlevel communication and coordination can induce more stable and robust policy improvement in hierarchical systems. Yet, most existing goal-conditioned HRL algorithms have focused primarily on subgoal discovery, neglecting interlevel cooperation. Here, we propose a novel goal-conditioned HRL framework named Guided Cooperation via Model-Based Rollout (GCMR; code is available at https://github.com/HaoranWang-TJ/GCMR_ACLG_official), which bridges interlevel information synchronization and cooperation by exploiting forward dynamics. First, GCMR mitigates the state-transition error within off-policy correction via model-based rollout, thereby enhancing sample efficiency. Second, to prevent disruption by unseen subgoals and states, lower-level Q-function gradients are constrained using a gradient penalty with a model-inferred upper bound, leading to a more stable behavioral policy conducive to effective exploration. Third, we propose one-step rollout-based planning that uses higher-level critics to guide the lower-level policy: the value of future states visited by the lower-level policy is estimated with the higher-level critic function, thereby transmitting global task information downward and avoiding local pitfalls. Together, these three components are expected to facilitate interlevel cooperation significantly. Experimental results demonstrate that combining the proposed GCMR framework with a disentangled variant of hierarchical reinforcement learning guided by landmarks (HIGL), namely, adjacency constraint and landmark-guided planning (ACLG), yields more stable and robust policy improvement than various baselines and significantly outperforms previous state-of-the-art (SOTA) algorithms.
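To make the first component concrete, the following is a minimal sketch (PyTorch; not the authors' released code) of how a learned dynamics model can replace replayed states during HIRO-style off-policy subgoal relabeling: each candidate subgoal is scored by rolling the current lower-level policy forward through the learned dynamics and comparing the imagined trajectory against the logged one. The names dynamics, low_policy, and the squared-error scoring rule are assumptions for illustration.

    import torch

    def relabel_subgoal(dynamics, low_policy, states, actions, candidates):
        # states:     (T+1, state_dim) logged state segment
        # actions:    (T, action_dim)  logged lower-level actions
        # candidates: (K, goal_dim)    candidate subgoals to score
        scores = []
        for g in candidates:
            s, err = states[0], 0.0
            for t in range(actions.shape[0]):
                a = low_policy(s, g)  # action the *current* policy would take
                # Compare the imagined (state, action) pair with the logged one.
                err = err + ((a - actions[t]) ** 2).sum() \
                          + ((s - states[t]) ** 2).sum()
                s = dynamics(s, a)    # model-based rollout step, instead of
                                      # replaying stored states verbatim
            scores.append(-err)
        # Best-matching candidate replaces the stored subgoal in the
        # higher-level replay buffer.
        return candidates[torch.stack(scores).argmax()]

With dynamics and low_policy as any tensor-to-tensor callables (e.g., small nn.Module networks), the function runs as written; rolling the learned model rather than replaying stored states is what addresses the state-transition error the abstract refers to.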
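For the second component, the sketch below is an assumption about the loss form rather than the paper's exact objective: it penalizes the norm of the lower-level Q-function's gradient with respect to the subgoal input wherever it exceeds an upper bound. In GCMR that bound would be inferred from the forward model; here it is simply passed in as a scalar.

    import torch

    def goal_gradient_penalty(q_net, state, action, goal, upper_bound):
        # Differentiate Q w.r.t. the subgoal input only.
        goal = goal.clone().requires_grad_(True)
        q = q_net(state, action, goal).sum()
        (grad,) = torch.autograd.grad(q, goal, create_graph=True)
        grad_norm = grad.norm(2, dim=-1)
        # Penalize only gradients above the (model-inferred) bound, so Q stays
        # smooth around unseen subgoals without being over-constrained.
        return torch.clamp(grad_norm - upper_bound, min=0.0).pow(2).mean()

The create_graph=True flag keeps the penalty differentiable with respect to the Q-network's parameters, so it can be weighted and added to the usual critic loss.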
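The third component can be sketched as an auxiliary actor loss: the learned dynamics imagines the next state one step ahead, and the higher-level critic values that imagined state against the final task goal, transmitting global task information downward. All names, and the exact way the higher-level critic is queried, are again assumptions.

    def rollout_guidance_loss(dynamics, high_critic, low_policy,
                              state, goal, task_goal):
        a = low_policy(state, goal)
        next_state = dynamics(state, a)  # one-step model-based rollout
        # Maximizing the higher-level value of the imagined state steers the
        # lower-level policy away from locally attractive but globally poor
        # moves; in practice this term is weighted into the usual actor loss.
        return -high_critic(next_state, task_goal).mean()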
Pages: 15