Hierarchical Adversarial Inverse Reinforcement Learning

Cited by: 3

Authors:

Chen, Jiayu [1]

Lan, Tian [2]

Aggarwal, Vaneet [1,3]

Affiliations:

[1] Purdue Univ, Sch Ind Engn, W Lafayette, IN 47907 USA

[2] George Washington Univ, Dept Elect & Comp Engn, Washington, DC 20052 USA

[3] KAUST, Comp Sci Dept, Thuwal 23955, Saudi Arabia

Source:

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS | 2024 / Vol. 35 / No. 12

Keywords:

Inverse reinforcement learning (IRL); hierarchical imitation learning (HIL); robotic learning;

DOI:

10.1109/TNNLS.2023.3305983

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Imitation learning (IL) has been proposed to recover the expert policy from demonstrations. However, it is difficult to learn a single monolithic policy for highly complex long-horizon tasks, for which the expert policy usually contains subtask hierarchies. Therefore, hierarchical IL (HIL) has been developed to learn a hierarchical policy from expert demonstrations by explicitly modeling the activity structure in a task with the option framework. Existing HIL methods either overlook the causal relationship between the subtask structure and the learned policy, or fail to learn the high-level and low-level policies in the hierarchical framework in conjunction, which leads to suboptimality. In this work, we propose a novel HIL algorithm, hierarchical adversarial inverse reinforcement learning (H-AIRL), which extends a state-of-the-art (SOTA) IL algorithm, AIRL, with the one-step option framework. Specifically, we redefine the AIRL objectives on the extended state and action spaces, and further introduce a directed information term to the objective function to enhance the causality between the low-level policy and its corresponding subtask. Moreover, we propose an expectation-maximization (EM) adaptation of our algorithm so that it can be applied to expert demonstrations that come without subtask annotations and are therefore more accessible in practice. Theoretical justifications of our algorithm design and evaluations on challenging robotic control tasks are provided to show the superiority of our algorithm compared with SOTA HIL baselines. The code is available at https://github.com/LucasCJYSDL/HierAIRL.


Pages: 17549-17558

Page count: 10

