Error Bounds of Imitating Policies and Environments for Reinforcement Learning

Cited by: 22
Authors
Xu, Tian [1 ]
Li, Ziniu [2 ]
Yu, Yang [1 ,3 ]
Affiliations
[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing 210023, Peoples R China
[2] Chinese Univ Hong Kong, Shenzhen Res Inst Big Data, Shenzhen 518172, Peoples R China
[3] Pazhou Lab, Guangzhou 510330, Peoples R China
Funding
National Key R&D Program of China;
Keywords
Planning; Reinforcement learning; Cloning; Complexity theory; Supervised learning; Decision making; Upper bound; Imitation learning; behavioral cloning; generative adversarial imitation; model-based reinforcement learning; NEURAL-NETWORKS; GO;
DOI
10.1109/TPAMI.2021.3096966
CLC number
TP18 [Artificial intelligence theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
In sequential decision-making, imitation learning (IL) trains a policy efficiently by mimicking expert demonstrations. Although various imitation methods have been proposed and empirically evaluated, their theoretical understanding requires further study; among the open issues, the compounding error in long-horizon decisions is a major one. In this paper, we first analyze the value gap between the expert policy and policies learned by two imitation methods, behavioral cloning (BC) and generative adversarial imitation. The results show that generative adversarial imitation can reduce the compounding error compared to BC. Furthermore, we establish lower bounds for IL under two settings, highlighting the significance of environment interactions in IL. By viewing the environment transition model as a dual agent, IL can also be used to learn the environment model. Therefore, based on the bounds for imitating policies, we further analyze the performance of imitating environments. The results show that environment models can be imitated more effectively by generative adversarial imitation than by BC. In particular, we obtain a policy evaluation error that is linear in the effective planning horizon w.r.t. the model bias, suggesting a novel application of adversarial imitation for model-based reinforcement learning (MBRL). We hope these results will inspire future advances in IL and MBRL.
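The compounding-error phenomenon the abstract refers to can be illustrated with a toy simulation (a sketch for intuition, not a construction from the paper): assume the imitator deviates from the expert with an independent probability eps at each step, and that after the first deviation it leaves the expert's state distribution and collects no further reward, a worst-case absorbing failure. Under these assumptions the value gap of such a BC-style policy grows roughly like eps * H^2 / 2 in the horizon H, i.e., quadratically rather than linearly.

```python
# Toy model of compounding error in behavioral cloning (illustrative
# assumptions only): the expert earns reward 1 per step; the imitator
# makes an independent per-step mistake with probability eps, and any
# mistake is absorbing, ending all future reward.

def expert_return(H):
    """Expert acts correctly at every step and earns reward 1 per step."""
    return float(H)

def bc_return(H, eps):
    """Expected imitator return when each step fails independently w.p. eps."""
    total = 0.0
    survive = 1.0  # probability that no mistake has occurred so far
    for _ in range(H):
        survive *= (1.0 - eps)  # must also survive the current step
        total += survive        # reward 1 is earned only while on-distribution
    return total

if __name__ == "__main__":
    eps = 0.01
    for H in (10, 100, 1000):
        gap = expert_return(H) - bc_return(H, eps)
        # For small eps * H, gap is close to eps * H^2 / 2 (quadratic in H).
        print(f"H={H:5d}  value gap={gap:.2f}")
```

Running this shows the value gap growing superlinearly with H for fixed eps, which is the intuition behind the paper's contrast with adversarial imitation, whose matching of state-action distributions avoids this worst-case quadratic blow-up.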
Pages: 6968-6980
Page count: 13