Hierarchical Reinforcement Learning: A Comprehensive Survey

Cited by: 242
Authors
Pateria, Shubham [1]
Subagdja, Budhitama [2]
Tan, Ah-hwee [2]
Quek, Chai [1]
Affiliations
[1] Nanyang Technol Univ, Sch Comp Sci & Engn, 50 Nanyang Ave, Singapore 639798, Singapore
[2] Singapore Management Univ, Sch Comp & Informat Syst, 80 Stamford Rd, Singapore 178902, Singapore
Funding
National Research Foundation, Singapore
Keywords
Hierarchical reinforcement learning; subtask discovery; skill discovery; hierarchical reinforcement learning survey; hierarchical reinforcement learning taxonomy; TEMPORAL ABSTRACTION; FRAMEWORK; OPTIONS;
DOI
10.1145/3453160
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Hierarchical Reinforcement Learning (HRL) enables autonomous decomposition of challenging long-horizon decision-making tasks into simpler subtasks. During the past years, the landscape of HRL research has grown profoundly, resulting in copious approaches. A comprehensive overview of this vast landscape is necessary to study HRL in an organized manner. We provide a survey of the diverse HRL approaches concerning the challenges of learning hierarchical policies, subtask discovery, transfer learning, and multi-agent learning using HRL. The survey is presented according to a novel taxonomy of the approaches. Based on the survey, a set of important open problems is proposed to motivate the future research in HRL. Furthermore, we outline a few suitable task domains for evaluating the HRL approaches and a few interesting examples of the practical applications of HRL in the Supplementary Material.
Pages: 35