Goal-Conditioned Hierarchical Reinforcement Learning With High-Level Model Approximation

Cited by: 2
Authors
Luo, Yu [1]
Ji, Tianying [1]
Sun, Fuchun [1]
Liu, Huaping [1]
Zhang, Jianwei [2]
Jing, Mingxuan [3]
Huang, Wenbing [4]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100190, Peoples R China
[2] Univ Hamburg, Dept Informat, D-20148 Hamburg, Germany
[3] Chinese Acad Sci, Inst Software, Sci & Technol Integrated Informat Syst Lab, Beijing 100190, Peoples R China
[4] Renmin Univ China, Gaoling Sch Artificial Intelligence, Beijing 100872, Peoples R China
Keywords
Task analysis; Heuristic algorithms; Convergence; Predictive models; Trajectory; Reinforcement learning; Navigation; Hierarchical reinforcement learning (HRL); model-based prediction; neural network approximation; regret bounds; robot locomotion and navigation
DOI
10.1109/TNNLS.2024.3354061
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Hierarchical reinforcement learning (HRL) exhibits remarkable potential in addressing large-scale and long-horizon complex tasks. However, a fundamental challenge, which arises from the inherently entangled nature of hierarchical policies, has not been understood well, consequently compromising the training stability and exploration efficiency of HRL. In this article, we propose a novel HRL algorithm, high-level model approximation (HLMA), presenting both theoretical foundations and practical implementations. In HLMA, a Planner constructs an innovative high-level dynamic model to predict the k-step transition of the Controller in a subtask. This allows for the estimation of the evolving performance of the Controller. At low level, we leverage the initial state of each subtask, transforming absolute states into relative deviations by a designed operator as Controller input. This approach facilitates the reuse of subtask domain knowledge, enhancing data efficiency. With this designed structure, we establish the local convergence of each component within HLMA and subsequently derive regret bounds to ensure global convergence. Abundant experiments conducted on complex locomotion and navigation tasks demonstrate that HLMA surpasses other state-of-the-art single-level RL and HRL algorithms in terms of sample efficiency and asymptotic performance. In addition, thorough ablation studies validate the effectiveness of each component of HLMA.
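The low-level idea in the abstract, feeding the Controller deviations from each subtask's initial state rather than absolute states, can be illustrated with a minimal sketch. The abstract does not specify the operator, so plain subtraction is assumed here, and the names `to_relative` and `split_into_subtasks` are hypothetical rather than taken from the paper.

```python
import numpy as np

def to_relative(state, subtask_start):
    # Assumed operator (not from the paper): deviation of a state
    # from the initial state of its subtask.
    return np.asarray(state, dtype=float) - np.asarray(subtask_start, dtype=float)

def split_into_subtasks(trajectory, k):
    # Cut a trajectory of absolute states into consecutive k-step subtasks
    # and re-express each one relative to its own first state.
    trajectory = np.asarray(trajectory, dtype=float)
    segments = []
    for start in range(0, len(trajectory), k):
        segment = trajectory[start:start + k]
        segments.append(to_relative(segment, segment[0]))
    return segments

# Two subtasks that perform the same motion from different start positions.
trajectory = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0],
              [5.0, 5.0], [6.0, 5.0], [7.0, 5.0]]
segments = split_into_subtasks(trajectory, k=3)
```

Under this assumption the two segments become identical once expressed as deviations, which is one way to read the abstract's claim that the transformation lets subtask knowledge be reused.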
Pages: 2705-2719
Number of pages: 15