Error Bounds of Imitating Policies and Environments for Reinforcement Learning

Cited by: 22
Authors
Xu, Tian [1 ]
Li, Ziniu [2 ]
Yu, Yang [1 ,3 ]
Affiliations
[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing 210023, Peoples R China
[2] Chinese Univ Hong Kong, Shenzhen Res Inst Big Data, Shenzhen 518172, Peoples R China
[3] Pazhou Lab, Guangzhou 510330, Peoples R China
Funding
National Key R&D Program of China;
Keywords
Planning; Reinforcement learning; Cloning; Complexity theory; Supervised learning; Decision making; Upper bound; Imitation learning; behavioral cloning; generative adversarial imitation; model-based reinforcement learning; NEURAL-NETWORKS; GO;
DOI
10.1109/TPAMI.2021.3096966
CLC number
TP18 [Artificial intelligence theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
In sequential decision-making, imitation learning (IL) trains a policy efficiently by mimicking expert demonstrations. Although various imitation methods have been proposed and empirically evaluated, their theoretical understanding requires further study; among the open issues, the compounding error in long-horizon decisions is a major one. In this paper, we first analyze the value gap between the expert policy and policies learned by two imitation methods, behavioral cloning (BC) and generative adversarial imitation. The results show that generative adversarial imitation can reduce the compounding error compared to BC. Furthermore, we establish lower bounds for IL under two settings, highlighting the significance of environment interactions in IL. By viewing the environment transition model as a dual agent, IL can also be used to learn the environment model. Therefore, based on the bounds for imitating policies, we further analyze the performance of imitating environments. The results show that environment models can be imitated more effectively by generative adversarial imitation than by BC. In particular, we obtain a policy evaluation error that is linear in the effective planning horizon w.r.t. the model bias, suggesting a novel application of adversarial imitation for model-based reinforcement learning (MBRL). We hope these results will inspire future advances in IL and MBRL.
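The compounding-error phenomenon the abstract refers to can be illustrated with a toy simulation (a sketch for intuition, not a construction from the paper): assume the imitator deviates from the expert with an independent probability eps at each step, and that after the first deviation it leaves the expert's state distribution and collects no further reward, a worst-case absorbing failure. Under these assumptions the value gap of such a BC-style policy grows roughly like eps * H^2 / 2 in the horizon H, i.e., quadratically rather than linearly.

```python
# Toy model of compounding error in behavioral cloning (illustrative
# assumptions only): the expert earns reward 1 per step; the imitator
# makes an independent per-step mistake with probability eps, and any
# mistake is absorbing, ending all future reward.

def expert_return(H):
    """Expert acts correctly at every step and earns reward 1 per step."""
    return float(H)

def bc_return(H, eps):
    """Expected imitator return when each step fails independently w.p. eps."""
    total = 0.0
    survive = 1.0  # probability that no mistake has occurred so far
    for _ in range(H):
        survive *= (1.0 - eps)  # must also survive the current step
        total += survive        # reward 1 is earned only while on-distribution
    return total

if __name__ == "__main__":
    eps = 0.01
    for H in (10, 100, 1000):
        gap = expert_return(H) - bc_return(H, eps)
        # For small eps * H, gap is close to eps * H^2 / 2 (quadratic in H).
        print(f"H={H:5d}  value gap={gap:.2f}")
```

Running this shows the value gap growing superlinearly with H for fixed eps, which is the intuition behind the paper's contrast with adversarial imitation, whose matching of state-action distributions avoids this worst-case quadratic blow-up.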
Pages: 6968-6980
Page count: 13