Goal-Conditioned Hierarchical Reinforcement Learning With High-Level Model Approximation

Cited: 2
Authors
Luo, Yu [1 ]
Ji, Tianying [1 ]
Sun, Fuchun [1 ]
Liu, Huaping [1 ]
Zhang, Jianwei [2 ]
Jing, Mingxuan [3 ]
Huang, Wenbing [4 ]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100190, Peoples R China
[2] Univ Hamburg, Dept Informat, D-20148 Hamburg, Germany
[3] Chinese Acad Sci, Inst Software, Sci & Technol Integrated Informat Syst Lab, Beijing 100190, Peoples R China
[4] Renmin Univ China, Gaoling Sch Artificial Intelligence, Beijing 100872, Peoples R China
Keywords
Task analysis; Heuristic algorithms; Convergence; Predictive models; Trajectory; Reinforcement learning; Navigation; Hierarchical reinforcement learning (HRL); model-based prediction; neural network approximation; regret bounds; robot locomotion and navigation
DOI
10.1109/TNNLS.2024.3354061
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Hierarchical reinforcement learning (HRL) exhibits remarkable potential for addressing large-scale, long-horizon complex tasks. However, a fundamental challenge arising from the inherently entangled nature of hierarchical policies remains poorly understood, compromising the training stability and exploration efficiency of HRL. In this article, we propose a novel HRL algorithm, high-level model approximation (HLMA), presenting both theoretical foundations and practical implementations. In HLMA, a Planner constructs an innovative high-level dynamic model to predict the k-step transition of the Controller within a subtask, which allows the Planner to estimate the Controller's evolving performance. At the low level, we leverage the initial state of each subtask, transforming absolute states into relative deviations via a designed operator that serves as the Controller's input. This approach facilitates the reuse of subtask domain knowledge and enhances data efficiency. With this structure, we establish the local convergence of each component of HLMA and then derive regret bounds to ensure global convergence. Extensive experiments on complex locomotion and navigation tasks demonstrate that HLMA surpasses state-of-the-art single-level RL and HRL algorithms in both sample efficiency and asymptotic performance. Thorough ablation studies further validate the effectiveness of each component of HLMA.
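The abstract describes two concrete mechanisms: a Planner-side dynamic model that predicts the Controller's k-step transition within a subtask, and an operator that feeds the Controller subtask-relative states (deviations from the subtask's initial state). The Python sketch below illustrates both ideas under stated assumptions; every name here (relative_state, HighLevelModel, the linear stand-in predictor) is hypothetical and not the authors' implementation.

```python
import numpy as np

def relative_state(state: np.ndarray, subtask_init: np.ndarray) -> np.ndarray:
    """The abstract's 'designed operator', assumed here to be a simple
    difference: map an absolute state to its deviation from the subtask's
    initial state, so the Controller sees subtask-relative input."""
    return state - subtask_init

class HighLevelModel:
    """Planner-side dynamic model: predicts where the Controller lands after
    k low-level steps toward a subgoal (one high-level transition)."""
    def __init__(self, k: int):
        self.k = k
        self.weights = np.eye(2)  # stand-in for a learned neural predictor

    def predict(self, state: np.ndarray, subgoal: np.ndarray) -> np.ndarray:
        # Crude assumption: after k steps the agent covers a fraction of the
        # distance to the subgoal; a learned network would replace this rule.
        alpha = min(1.0, 0.1 * self.k)
        return state + alpha * (self.weights @ (subgoal - state))

# Usage: the Planner scores a candidate subgoal by its predicted k-step
# outcome, while the Controller always sees subtask-relative states.
model = HighLevelModel(k=5)
s0 = np.array([0.0, 0.0])            # subtask initial state
subgoal = np.array([2.0, 1.0])
s_pred = model.predict(s0, subgoal)  # predicted state after k Controller steps
print(relative_state(s_pred, s0))    # Controller input: deviation from s0
```

The relative-state operator is what enables knowledge reuse across subtasks: two subtasks that start from different absolute positions but require the same displacement produce identical Controller inputs.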
Pages: 2705-2719
Page count: 15
Related Papers
50 records in total
  • [21] Goal-Conditioned Reinforcement Learning With Disentanglement-Based Reachability Planning
    Qian, Zhifeng
    You, Mingyu
    Zhou, Hongjun
    Xu, Xuanhui
    He, Bin
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2023, 8 (8): 4721-4728
  • [22] Learn Goal-Conditioned Policy with Intrinsic Motivation for Deep Reinforcement Learning
    Liu, Jinxin
    Wang, Donglin
    Tian, Qiangxing
    Chen, Zhengyu
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022: 7558-7566
  • [23] Goal-conditioned offline reinforcement learning through state space partitioning
    Wang, Mianchu
    Jin, Yue
    Montana, Giovanni
    MACHINE LEARNING, 2024, 113 (5): 2435-2465
  • [24] Highly valued subgoal generation for efficient goal-conditioned reinforcement learning
    Li, Yao
    Wang, YuHui
    Tan, XiaoYang
    NEURAL NETWORKS, 2025, 181
  • [26] Instructing Goal-Conditioned Reinforcement Learning Agents with Temporal Logic Objectives
    Qiu, Wenjie
    Mao, Wensen
    Zhu, He
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [27] Provably Efficient Offline Goal-Conditioned Reinforcement Learning with General Function Approximation and Single-Policy Concentrability
    Zhu, Hanlin
    Zhang, Amy
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [28] Magnetic Field-Based Reward Shaping for Goal-Conditioned Reinforcement Learning
    Ding, Hongyu
    Tang, Yuanze
    Wu, Qing
    Wang, Bo
    Chen, Chunlin
    Wang, Zhi
    IEEE/CAA JOURNAL OF AUTOMATICA SINICA, 2023, 10 (12): 2233-2247
  • [29] Successor Feature Landmarks for Long-Horizon Goal-Conditioned Reinforcement Learning
    Hoang, Christopher
    Sohn, Sungryull
    Choi, Jongwook
    Carvalho, Wilka
    Lee, Honglak
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021
  • [30] Autotelic Agents with Intrinsically Motivated Goal-Conditioned Reinforcement Learning: A Short Survey
    Colas, Cedric
    Karch, Tristan
    Sigaud, Olivier
    Oudeyer, Pierre-Yves
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2022, 74: 1159-1199