Error Bounds of Imitating Policies and Environments for Reinforcement Learning

Cited: 22
Authors
Xu, Tian [1 ]
Li, Ziniu [2 ]
Yu, Yang [1 ,3 ]
Affiliations
[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing 210023, Peoples R China
[2] Chinese Univ Hong Kong, Shenzhen Res Inst Big Data, Shenzhen 518172, Peoples R China
[3] Pazhou Lab, Guangzhou 510330, Peoples R China
Funding
National Key R&D Program of China;
Keywords
Planning; Reinforcement learning; Cloning; Complexity theory; Supervised learning; Decision making; Upper bound; Imitation learning; behavioral cloning; generative adversarial imitation; model-based reinforcement learning; NEURAL-NETWORKS; GO;
DOI
10.1109/TPAMI.2021.3096966
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In sequential decision-making, imitation learning (IL) trains a policy efficiently by mimicking expert demonstrations. Various imitation methods have been proposed and empirically evaluated; meanwhile, their theoretical understanding requires further study, with the compounding error in long-horizon decisions being a major issue. In this paper, we first analyze the value gap between the expert policy and policies imitated by two methods, behavioral cloning (BC) and generative adversarial imitation. The results show that generative adversarial imitation can reduce the compounding error compared to BC. Furthermore, we establish lower bounds of IL under two settings, suggesting the significance of environment interactions in IL. By viewing the environment transition model as a dual agent, IL can also be used to learn the environment model. Therefore, based on the bounds for imitating policies, we further analyze the performance of imitating environments. The results show that environment models can be imitated more effectively by generative adversarial imitation than by BC. In particular, we obtain a policy evaluation error that is linear in the effective planning horizon w.r.t. the model bias, suggesting a novel application of adversarial imitation for model-based reinforcement learning (MBRL). We hope these results can inspire future advances in IL and MBRL.
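For orientation, the quadratic-versus-linear compounding error contrasted in the abstract can be sketched in the discounted infinite-horizon setting. The notation below is a reconstruction for illustration, not quoted from the paper: $V^{\pi}$ denotes the discounted value of policy $\pi$, $\epsilon$ a one-step imitation error, and $\gamma$ the discount factor (so the effective horizon is $1/(1-\gamma)$); the paper's exact constants and error metrics differ.

```latex
% Sketch of the value-gap scaling (assumed notation, not the
% paper's exact statement): BC compounds quadratically in the
% effective horizon, adversarial imitation only linearly.
V^{\pi_E} - V^{\pi_{\mathrm{BC}}}
  \;\lesssim\; \frac{\epsilon}{(1-\gamma)^{2}},
\qquad
V^{\pi_E} - V^{\pi_{\mathrm{GAIL}}}
  \;\lesssim\; \frac{\epsilon}{1-\gamma}.
```

Writing $H = 1/(1-\gamma)$ for the effective horizon, these read as $O(H^2\epsilon)$ versus $O(H\epsilon)$, which is the sense in which distribution-matching (adversarial) imitation mitigates the compounding error of behavioral cloning.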
Pages: 6968 - 6980 (13 pages)
Related Papers
50 records in total
  • [1] Deep Reinforcement Learning for Autonomous Driving: A Survey
    Kiran, B. Ravi
    Sobh, Ibrahim
    Talpaert, Victor
    Mannion, Patrick
    Al Sallab, Ahmad A.
    Yogamani, Senthil
    Perez, Patrick
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2022, 23 (06) : 4909 - 4926
  • [2] Reinforcement learning for imitating constrained reaching movements
    Guenter, Florent
    Hersch, Micha
    Calinon, Sylvain
    Billard, Aude
    ADVANCED ROBOTICS, 2007, 21 (13) : 1521 - 1544
  • [3] Cooperative Deep Reinforcement Learning Policies for Autonomous Navigation in Complex Environments
    Tran, Van Manh
    Kim, Gon-Woo
    IEEE ACCESS, 2024, 12 : 101053 - 101065
  • [4] A unified framework to control estimation error in reinforcement learning
    Zhang, Yujia
    Li, Lin
    Wei, Wei
    Lv, Yunpeng
    Liang, Jiye
    NEURAL NETWORKS, 2024, 178
  • [5] More Human-Like Gameplay by Blending Policies From Supervised and Reinforcement Learning
    Ogawa, Tatsuyoshi
    Hsueh, Chu-Hsuan
    Ikeda, Kokolo
    IEEE TRANSACTIONS ON GAMES, 2024, 16 (04) : 831 - 843
  • [6] Enforcing Hard State-Dependent Action Bounds on Deep Reinforcement Learning Policies
    De Cooman, Bram
    Suykens, Johan
    Ortseifen, Andreas
    MACHINE LEARNING, OPTIMIZATION, AND DATA SCIENCE, LOD 2022, PT II, 2023, 13811 : 193 - 218
  • [7] Learning Curriculum Policies for Reinforcement Learning
    Narvekar, Sanmit
    Stone, Peter
    AAMAS '19: PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2019, : 25 - 33
  • [8] Lower Bounds on the Generalization Error of Nonlinear Learning Models
    Seroussi, Inbar
    Zeitouni, Ofer
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2022, 68 (12) : 7956 - 7970
  • [9] Reinforcement Learning for Motion Policies in Mobile Relaying Networks
    Evmorfos, Spilios
    Diamantaras, Konstantinos I.
    Petropulu, Athina P.
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2022, 70 : 850 - 861