Hierarchical Reinforcement Learning: A Survey and Open Research Challenges

Cited by: 51
Authors
Hutsebaut-Buysse, Matthias [1 ]
Mets, Kevin [1 ]
Latre, Steven [1 ]
Affiliations
[1] University of Antwerp - imec, Department of Computer Science, Sint-Pietersvliet 7, B-2000 Antwerp, Belgium
Keywords
hierarchical reinforcement learning; deep reinforcement learning; reinforcement learning; deep neural networks; level; abstraction; framework; options
DOI
10.3390/make4010009
CLC Classification: TP18 [Artificial Intelligence Theory]
Discipline Codes: 081104; 0812; 0835; 1405
Abstract
Reinforcement learning (RL) allows an agent to solve sequential decision-making problems by interacting with an environment in a trial-and-error fashion. When these environments are very complex, pure random exploration of possible solutions often fails, or is so sample-inefficient that it requires an unreasonable amount of interaction with the environment. Hierarchical reinforcement learning (HRL) utilizes forms of temporal and state abstraction to tackle these challenges, while simultaneously paving the way for behavior reuse and increased interpretability of RL systems. In this survey paper, we first introduce a selection of problem-specific approaches, which provide insight into how often handcrafted abstractions can be utilized in specific task settings. We then introduce the Options framework, which provides a more generic approach in which abstractions can be discovered and learned semi-automatically. Afterwards, we introduce the goal-conditional approach, which allows sub-behaviors to be embedded in a continuous space. Finally, to further advance the development of HRL agents capable of simultaneously learning abstractions and how to use them, solely from interaction with complex, high-dimensional environments, we identify a set of promising research directions.
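The abstract names two formalisms that the body of the survey develops: the Options framework, in which a temporally extended behavior is a triple (I, pi, beta) of an initiation set, an intra-option policy, and a termination condition, and the goal-conditional approach, in which a single policy pi(a | s, g) is additionally conditioned on a goal g. Below is a minimal sketch of the first of these, assuming the standard Sutton, Precup and Singh (1999) definition; the toy corridor environment and all names in it are illustrative, not code from the paper.

```python
import random
from dataclasses import dataclass
from typing import Callable, Set

# Illustrative sketch of an "option" as a triple (I, pi, beta).
# Names and the toy environment are assumptions, not from the survey.

State = int
Action = int

@dataclass
class Option:
    initiation_set: Set[State]             # I: states where the option may be invoked
    policy: Callable[[State], Action]      # pi: intra-option (low-level) policy
    termination: Callable[[State], float]  # beta: per-state termination probability

def run_option(option: Option, state: State,
               step: Callable[[State, Action], State],
               max_steps: int = 100) -> State:
    """Execute one option to completion: a single temporally extended
    (semi-MDP) step as seen by the higher-level policy."""
    assert state in option.initiation_set, "option not applicable in this state"
    for _ in range(max_steps):
        state = step(state, option.policy(state))
        if random.random() < option.termination(state):
            break
    return state

# Toy usage: in a 1-D corridor, an option that walks right until state 5.
go_right = Option(
    initiation_set={0, 1, 2, 3, 4},
    policy=lambda s: 1,                           # always move right
    termination=lambda s: 1.0 if s >= 5 else 0.0, # terminate at the subgoal
)
print(run_option(go_right, 0, step=lambda s, a: s + a))  # -> 5
```

A goal-conditional agent would instead collapse many such hand-specified options into one policy by passing the subgoal (here, "reach state 5") as an input g to a single goal-conditioned network.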
Pages: 172-221
Page count: 50