Error Bounds of Imitating Policies and Environments for Reinforcement Learning

Cited: 22
Authors
Xu, Tian [1 ]
Li, Ziniu [2 ]
Yu, Yang [1 ,3 ]
Affiliations
[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing 210023, Peoples R China
[2] Chinese Univ Hong Kong, Shenzhen Res Inst Big Data, Shenzhen 518172, Peoples R China
[3] Pazhou Lab, Guangzhou 510330, Peoples R China
Funding
National Key R&D Program of China;
Keywords
Planning; Reinforcement learning; Cloning; Complexity theory; Supervised learning; Decision making; Upper bound; Imitation learning; behavioral cloning; generative adversarial imitation; model-based reinforcement learning; NEURAL-NETWORKS; GO;
DOI
10.1109/TPAMI.2021.3096966
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In sequential decision-making, imitation learning (IL) trains a policy efficiently by mimicking expert demonstrations. Various imitation methods have been proposed and empirically evaluated; meanwhile, their theoretical understanding requires further study, with the compounding error in long-horizon decisions being a major issue. In this paper, we first analyze the value gap between the expert policy and policies imitated by two methods, behavioral cloning (BC) and generative adversarial imitation. The results show that generative adversarial imitation can reduce the compounding error compared to BC. Furthermore, we establish lower bounds of IL under two settings, suggesting the significance of environment interactions in IL. By viewing the environment transition model as a dual agent, IL can also be used to learn the environment model. Therefore, based on the bounds for imitating policies, we further analyze the performance of imitating environments. The results show that environment models can be imitated more effectively by generative adversarial imitation than by BC. In particular, we obtain a policy evaluation error that is linear in the effective planning horizon w.r.t. the model bias, suggesting a novel application of adversarial imitation for model-based reinforcement learning (MBRL). We hope these results can inspire future advances in IL and MBRL.
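For orientation, the quadratic-versus-linear compounding error contrasted in the abstract can be sketched in the discounted infinite-horizon setting. The notation below is a reconstruction for illustration, not quoted from the paper: $V^{\pi}$ denotes the discounted value of policy $\pi$, $\epsilon$ a one-step imitation error, and $\gamma$ the discount factor (so the effective horizon is $1/(1-\gamma)$); the paper's exact constants and error metrics differ.

```latex
% Sketch of the value-gap scaling (assumed notation, not the
% paper's exact statement): BC compounds quadratically in the
% effective horizon, adversarial imitation only linearly.
V^{\pi_E} - V^{\pi_{\mathrm{BC}}}
  \;\lesssim\; \frac{\epsilon}{(1-\gamma)^{2}},
\qquad
V^{\pi_E} - V^{\pi_{\mathrm{GAIL}}}
  \;\lesssim\; \frac{\epsilon}{1-\gamma}.
```

Writing $H = 1/(1-\gamma)$ for the effective horizon, these read as $O(H^2\epsilon)$ versus $O(H\epsilon)$, which is the sense in which distribution-matching (adversarial) imitation mitigates the compounding error of behavioral cloning.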
Pages: 6968 - 6980 (13 pages)
Related Papers
50 records in total
  • [1] Deep Reinforcement Learning for Autonomous Driving: A Survey
    Kiran, B. Ravi
    Sobh, Ibrahim
    Talpaert, Victor
    Mannion, Patrick
    Al Sallab, Ahmad A.
    Yogamani, Senthil
    Perez, Patrick
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2022, 23 (06) : 4909 - 4926
  • [2] Reinforcement learning for imitating constrained reaching movements
    Guenter, Florent
    Hersch, Micha
    Calinon, Sylvain
    Billard, Aude
    ADVANCED ROBOTICS, 2007, 21 (13) : 1521 - 1544
  • [3] Cooperative Deep Reinforcement Learning Policies for Autonomous Navigation in Complex Environments
    Tran, Van Manh
    Kim, Gon-Woo
    IEEE ACCESS, 2024, 12 : 101053 - 101065
  • [4] A unified framework to control estimation error in reinforcement learning
    Zhang, Yujia
    Li, Lin
    Wei, Wei
    Lv, Yunpeng
    Liang, Jiye
    NEURAL NETWORKS, 2024, 178
  • [5] More Human-Like Gameplay by Blending Policies From Supervised and Reinforcement Learning
    Ogawa, Tatsuyoshi
    Hsueh, Chu-Hsuan
    Ikeda, Kokolo
    IEEE TRANSACTIONS ON GAMES, 2024, 16 (04) : 831 - 843
  • [6] Enforcing Hard State-Dependent Action Bounds on Deep Reinforcement Learning Policies
    De Cooman, Bram
    Suykens, Johan
    Ortseifen, Andreas
    MACHINE LEARNING, OPTIMIZATION, AND DATA SCIENCE, LOD 2022, PT II, 2023, 13811 : 193 - 218
  • [7] Learning Curriculum Policies for Reinforcement Learning
    Narvekar, Sanmit
    Stone, Peter
    AAMAS '19: PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2019, : 25 - 33
  • [8] Lower Bounds on the Generalization Error of Nonlinear Learning Models
    Seroussi, Inbar
    Zeitouni, Ofer
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2022, 68 (12) : 7956 - 7970
  • [9] Reinforcement Learning for Motion Policies in Mobile Relaying Networks
    Evmorfos, Spilios
    Diamantaras, Konstantinos I.
    Petropulu, Athina P.
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2022, 70 : 850 - 861