Model-based inverse reinforcement learning for deterministic systems

Cited by: 29
Authors
Self, Ryan [1]
Abudia, Moad [1]
Mahmud, S. M. Nahid [1]
Kamalapurkar, Rushikesh [1]
Affiliations
[1] Oklahoma State Univ, Sch Mech & Aerosp Engn, Stillwater, OK 74078 USA
Funding
National Science Foundation (USA)
Keywords
Inverse reinforcement learning; Inverse optimal control; System identification; State estimation; Adaptive control; Continuous time
DOI
10.1016/j.automatica.2022.110242
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper focuses on the development of an online data-driven model-based inverse reinforcement learning (MBIRL) technique for linear and nonlinear deterministic systems. Input and output trajectories of an agent under observation, attempting to optimize an unknown reward function, are used to estimate the reward function and the corresponding unknown optimal value function, online and in real time. To achieve MBIRL using limited data, a novel feedback-driven approach to MBIRL is developed. The feedback policy and the dynamic model of the agent under observation are estimated from the measured data, and the estimates are used to generate synthetic data to drive MBIRL. Theoretical guarantees for ultimate boundedness of the estimation errors in general, and convergence of the estimation errors to zero in special cases, are derived using Lyapunov techniques. Proof-of-concept numerical experiments demonstrate the utility of the developed method to solve linear and nonlinear inverse reinforcement learning problems. (C) 2022 Elsevier Ltd. All rights reserved.
Pages: 13