Model-based inverse reinforcement learning for deterministic systems

Cited by: 29
Authors
Self, Ryan [1]
Abudia, Moad [1]
Mahmud, S. M. Nahid [1]
Kamalapurkar, Rushikesh [1]
Affiliation
[1] Oklahoma State Univ, Sch Mech & Aerosp Engn, Stillwater, OK 74078 USA
Funding
U.S. National Science Foundation;
Keywords
Inverse reinforcement learning; Inverse optimal control; System identification; State estimation; Adaptive control; Continuous time;
DOI
10.1016/j.automatica.2022.110242
CLC Number (Chinese Library Classification)
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
This paper focuses on the development of an online, data-driven, model-based inverse reinforcement learning (MBIRL) technique for linear and nonlinear deterministic systems. Input and output trajectories of an agent under observation, attempting to optimize an unknown reward function, are used to estimate the reward function and the corresponding unknown optimal value function, online and in real time. To achieve MBIRL using limited data, a novel feedback-driven approach is developed: the feedback policy and the dynamic model of the agent under observation are estimated from the measured data, and the estimates are used to generate synthetic data that drive MBIRL. Theoretical guarantees for ultimate boundedness of the estimation errors in general, and for convergence of the estimation errors to zero in special cases, are derived using Lyapunov techniques. Proof-of-concept numerical experiments demonstrate the utility of the developed method on linear and nonlinear inverse reinforcement learning problems. (C) 2022 Elsevier Ltd. All rights reserved.
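The feedback-driven MBIRL loop described in the abstract (identify the model and the feedback policy from measured trajectories, then recover the reward and value function from synthetic data) can be illustrated in a linear-quadratic setting. The sketch below is a minimal, assumption-laden rendition, not the authors' algorithm: it replaces the paper's online Lyapunov-based estimators with batch least squares, fixes R = I to resolve the usual IRL scale ambiguity, and all names (A_hat, K_hat, sym_feat, ...) are invented for illustration.

```python
# Hypothetical sketch of the feedback-driven MBIRL idea, specialized to LQR.
# Batch least squares stands in for the paper's online Lyapunov-based update.
import numpy as np
from scipy.linalg import solve_continuous_are

rng = np.random.default_rng(0)

# Ground truth, unknown to the observer: the agent runs an optimal LQR policy.
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
Q_true, R_true = np.diag([2.0, 1.0]), np.eye(1)
P_true = solve_continuous_are(A, B, Q_true, R_true)
K_true = np.linalg.solve(R_true, B.T @ P_true)

# Step 1: estimate the model (A, B) and feedback policy K from observed data.
X = rng.standard_normal((200, 2))                         # observed states
U = -X @ K_true.T + 0.01 * rng.standard_normal((200, 1))  # noisy observed inputs
Xdot = X @ A.T + U @ B.T                                  # measured derivatives
AB, *_ = np.linalg.lstsq(np.hstack([X, U]), Xdot, rcond=None)
A_hat, B_hat = AB[:2].T, AB[2:].T
K_hat = np.linalg.lstsq(X, -U, rcond=None)[0].T

# Step 2: recover (Q, P) from synthetic data generated with the estimates,
# fixing R = I since rewards are only identifiable up to scale.
def sym_feat(x, y):
    """Features f with x' M y = f @ m for symmetric M (m = upper-tri entries)."""
    i, j = np.triu_indices(len(x))
    return x[i] * y[j] + np.where(i == j, 0.0, x[j] * y[i])

n, m = 2, 1
nq = n * (n + 1) // 2
M_cl = A_hat - B_hat @ K_hat            # estimated closed-loop dynamics
rows, rhs = [], []
for _ in range(50):                     # synthetic states, no new measurements
    x = rng.standard_normal(n)
    u = -K_hat @ x
    # HJB residual: x'Qx + u'Ru + 2 x'P(Ax + Bu) = 0 along the estimated policy
    rows.append(np.hstack([sym_feat(x, x), 2.0 * sym_feat(x, M_cl @ x)]))
    rhs.append(-u @ u)
for k in range(m):                      # Hamiltonian stationarity: B'P = R K = K
    for j in range(n):
        rows.append(np.hstack([np.zeros(nq), sym_feat(B_hat[:, k], np.eye(n)[:, j])]))
        rhs.append(K_hat[k, j])
sol, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)

def unvec(v, n):
    """Rebuild a symmetric matrix from its upper-triangular entries."""
    S = np.zeros((n, n))
    i, j = np.triu_indices(n)
    S[i, j] = v
    return S + S.T - np.diag(np.diag(S))

Q_hat, P_hat = unvec(sol[:nq], n), unvec(sol[nq:], n)
print("HJB residual:", np.linalg.norm(
    Q_hat + K_hat.T @ K_hat + P_hat @ M_cl + M_cl.T @ P_hat))
print("stationarity error:", np.linalg.norm(B_hat.T @ P_hat - K_hat))
print("recovered Q:\n", Q_hat)          # one member of the reward equivalence class
```

Because many (Q, R) pairs explain the same behavior, the recovered Q above certifies the optimality conditions rather than reproducing Q_true exactly; the paper's contribution is performing this kind of recovery online, with ultimate-boundedness guarantees obtained through Lyapunov analysis.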
Pages: 13