Goal-Conditioned Hierarchical Reinforcement Learning With High-Level Model Approximation

Cited by: 6
Authors
Luo, Yu [1 ]
Ji, Tianying [1 ]
Sun, Fuchun [1 ]
Liu, Huaping [1 ]
Zhang, Jianwei [2 ]
Jing, Mingxuan [3 ]
Huang, Wenbing [4 ]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100190, Peoples R China
[2] Univ Hamburg, Dept Informat, D-20148 Hamburg, Germany
[3] Chinese Acad Sci, Inst Software, Sci & Technol Integrated Informat Syst Lab, Beijing 100190, Peoples R China
[4] Renmin Univ China, Gaoling Sch Artificial Intelligence, Beijing 100872, Peoples R China
Keywords
Task analysis; Heuristic algorithms; Convergence; Predictive models; Trajectory; Reinforcement learning; Navigation; Hierarchical reinforcement learning (HRL); model-based prediction; neural network approximation; regret bounds; robot locomotion and navigation
DOI
10.1109/TNNLS.2024.3354061
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Hierarchical reinforcement learning (HRL) exhibits remarkable potential for addressing large-scale, long-horizon complex tasks. However, a fundamental challenge arising from the inherently entangled nature of hierarchical policies remains poorly understood and consequently compromises the training stability and exploration efficiency of HRL. In this article, we propose a novel HRL algorithm, high-level model approximation (HLMA), presenting both theoretical foundations and practical implementations. In HLMA, a Planner constructs an innovative high-level dynamics model to predict the k-step transition of the Controller within a subtask, which allows the Controller's evolving performance to be estimated. At the low level, we leverage the initial state of each subtask, transforming absolute states into relative deviations via a designed operator before they are fed to the Controller. This approach facilitates the reuse of subtask domain knowledge and enhances data efficiency. With this designed structure, we establish the local convergence of each component of HLMA and subsequently derive regret bounds to ensure global convergence. Extensive experiments on complex locomotion and navigation tasks demonstrate that HLMA surpasses other state-of-the-art single-level RL and HRL algorithms in both sample efficiency and asymptotic performance. In addition, thorough ablation studies validate the effectiveness of each component of HLMA.
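As a rough illustration of the two mechanisms the abstract describes, the sketch below shows (i) a relative-state operator that re-expresses absolute states as deviations from the subtask's initial state, and (ii) a Planner-side model that predicts the outcome of k low-level steps in a single forward pass. This is a minimal sketch based only on the abstract; all names (relative_state, HighLevelKStepModel, the placeholder linear predictor) are hypothetical assumptions, not the paper's actual interfaces.

```python
import numpy as np

def relative_state(state: np.ndarray, subtask_init_state: np.ndarray) -> np.ndarray:
    """Low-level input transform (assumed form): express the absolute state
    as its deviation from the state at which the current subtask began,
    so the Controller can reuse knowledge across subtasks."""
    return state - subtask_init_state

class HighLevelKStepModel:
    """Hypothetical stand-in for the Planner's high-level dynamics model:
    maps (subtask initial state, subgoal) directly to a prediction of the
    state after the Controller has acted for k low-level steps."""

    def __init__(self, predictor, k: int):
        self.predictor = predictor  # any regressor, e.g., a small network
        self.k = k

    def predict(self, init_state: np.ndarray, subgoal: np.ndarray) -> np.ndarray:
        # One forward pass replaces rolling out k environment steps,
        # giving the Planner an estimate of the Controller's evolving performance.
        return self.predictor(np.concatenate([init_state, subgoal]))

# Toy usage: score candidate subgoals by predicted distance to the final goal.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(4, 8))  # placeholder linear "model" for illustration
    model = HighLevelKStepModel(lambda x: W @ x, k=10)
    s0, goal = rng.normal(size=4), rng.normal(size=4)
    candidates = [rng.normal(size=4) for _ in range(5)]
    best = min(candidates,
               key=lambda g: np.linalg.norm(model.predict(s0, g) - goal))
```

Under this reading, the quality of the Planner's subgoal choices hinges on how accurately the predictor approximates the true k-step transition; the paper's actual model class, training objective, and regret analysis are not specified by the abstract.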
Pages: 2705-2719
Number of pages: 15