Bayesian inverse reinforcement learning for demonstrations of an expert in multiple dynamics: Toward estimation of transferable reward

Cited by: 0
Authors
Yusuke N. [1]
Sachiyo A. [2]
Affiliations
[1] Department of Urban Environment Systems, Graduate School of Science and Engineering, Chiba University
[2] Department of Urban Environment Systems, Graduate School of Engineering, Chiba University
Keywords
Bayesian inference; Inverse reinforcement learning; Markov decision processes; Reinforcement learning;
DOI
10.1527/tjsai.G-J73
Abstract
Though the reinforcement learning framework has achieved numerous successes, it requires careful shaping of a reward function that represents the objective of a task. There is a class of tasks in which an expert can demonstrate the optimal behavior, yet it is difficult to design a proper reward function. For these tasks, an inverse reinforcement learning approach is useful because it makes it possible to estimate a reward function from the expert's demonstrations. Most existing inverse reinforcement learning algorithms assume that the expert gives demonstrations in a single environment. However, an expert can also provide demonstrations of the same task in other environments that share the same objective. For example, though it is hard to represent the objective of a driving task explicitly, a driver can give demonstrations under multiple traffic situations. In such cases, it is natural to use these demonstrations across multiple environments to estimate the expert's reward function. We formulate this problem as a Bayesian inverse reinforcement learning problem and propose a Markov chain Monte Carlo method to solve it. Experimental results show that the proposed method quantitatively outperforms existing methods. © 2020, Japanese Society for Artificial Intelligence. All rights reserved.
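The abstract describes the method only at a high level. As a rough illustration of the idea, the Python sketch below samples a reward shared across several tabular environments with a random-walk Metropolis sampler, scoring each candidate reward by a Boltzmann likelihood of the demonstrations under each environment's dynamics. The Boltzmann temperature beta, the Gaussian prior, the state-only reward, and all names here (q_values, log_likelihood, mcmc_birl) are illustrative assumptions, not the authors' exact formulation.

import numpy as np

def q_values(reward, P, gamma=0.95, iters=200):
    # Tabular value iteration; P[a, s, s'] are transition probabilities and
    # reward is a state-only reward vector (an assumption of this sketch).
    n_states, n_actions = P.shape[1], P.shape[0]
    q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        v = q.max(axis=1)
        q = reward[:, None] + gamma * np.einsum('ast,t->sa', P, v)
    return q

def log_likelihood(reward, demos_per_env, dynamics_per_env, beta=5.0):
    # Boltzmann (softmax) likelihood of (state, action) demonstration pairs,
    # summed over every environment: the reward is shared, the dynamics are not.
    total = 0.0
    for demos, P in zip(demos_per_env, dynamics_per_env):
        q = beta * q_values(reward, P)
        log_pi = q - np.logaddexp.reduce(q, axis=1, keepdims=True)
        total += sum(log_pi[s, a] for s, a in demos)
    return total

def mcmc_birl(demos_per_env, dynamics_per_env, n_states,
              n_samples=2000, step=0.1, sigma=1.0, seed=0):
    # Random-walk Metropolis over reward vectors with a Gaussian prior.
    rng = np.random.default_rng(seed)
    def log_post(r):
        return (log_likelihood(r, demos_per_env, dynamics_per_env)
                - 0.5 * np.sum(r ** 2) / sigma ** 2)
    r = np.zeros(n_states)
    lp = log_post(r)
    samples = []
    for _ in range(n_samples):
        r_new = r + step * rng.standard_normal(n_states)  # symmetric proposal
        lp_new = log_post(r_new)
        if np.log(rng.random()) < lp_new - lp:            # MH accept/reject
            r, lp = r_new, lp_new
        samples.append(r.copy())
    return np.array(samples)

A point estimate can be taken as, e.g., the mean of the retained samples. Because each candidate reward is scored jointly against all environments, the sampler favors rewards that explain the expert's behavior under every dynamics, which is the sense in which the estimated reward is intended to transfer.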
Related papers
50 records in total
[41]   Heterogeneous formation control of multiple rotorcrafts with unknown dynamics by reinforcement learning [J].
Liu, Hao ;
Peng, Fachun ;
Modares, Hamidreza ;
Kiumarsi, Bahare .
INFORMATION SCIENCES, 2021, 558 :194-207
[42]   MA-TREX: Multi-agent Trajectory-Ranked Reward Extrapolation via Inverse Reinforcement Learning [J].
Huang, Sili ;
Yang, Bo ;
Chen, Hechang ;
Piao, Haiyin ;
Sun, Zhixiao ;
Chang, Yi .
KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT (KSEM 2020), PT II, 2020, 12275 :3-14
[43]   Imitation of piping warm-up operation and estimation of operational intention by inverse reinforcement learning [J].
Nakagawa, Yosuke ;
Ono, Hitoi ;
Hazui, Yusuke ;
Arai, Sachiyo .
JOURNAL OF PROCESS CONTROL, 2023, 122 :41-48
[44]   A state-based inverse reinforcement learning approach to model activity-travel choices behavior with reward function recovery [J].
Song, Yuchen ;
Li, Dawei ;
Ma, Zhenliang ;
Liu, Dongjie ;
Zhang, Tong .
TRANSPORTATION RESEARCH PART C-EMERGING TECHNOLOGIES, 2024, 158
[45]   Heads for learning, tails for memory: reward, reinforcement and a role of dopamine in determining behavioral relevance across multiple timescales [J].
Baudonnat, Mathieu ;
Huber, Anna ;
David, Vincent ;
Walton, Mark E. .
FRONTIERS IN NEUROSCIENCE, 2013, 7
[46]   Personalized origin-destination travel time estimation with active adversarial inverse reinforcement learning and Transformer [J].
Liu, Shan ;
Zhang, Ya ;
Wang, Zhengli ;
Liu, Xiang ;
Yang, Hai .
TRANSPORTATION RESEARCH PART E-LOGISTICS AND TRANSPORTATION REVIEW, 2025, 193
[47]   Toward Proactive-Aware Autonomous Driving: A Reinforcement Learning Approach Utilizing Expert Priors During Unprotected Turns [J].
Fan, Jialin ;
Ni, Ying ;
Zhao, Donghu ;
Hang, Peng ;
Sun, Jian .
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2025, 26 (03) :3700-3712
[48]   Dealing with multiple experts and non-stationarity in inverse reinforcement learning: an application to real-life problems [J].
Likmeta, Amarildo ;
Metelli, Alberto Maria ;
Ramponi, Giorgia ;
Tirinzoni, Andrea ;
Giuliani, Matteo ;
Restelli, Marcello .
MACHINE LEARNING, 2021, 110 (09) :2541-2576
[50]   Heterarchical reinforcement-learning model for integration of multiple cortico-striatal loops: fMRI examination in stimulus-action-reward association learning [J].
Haruno, Masahiko ;
Kawato, Mitsuo .
NEURAL NETWORKS, 2006, 19 (08) :1242-1254