Triangle Inequality for Inverse Optimal Control

Cited by: 0
Authors
Mitsuhashi, Sho [1]
Ishii, Shin [1, 2]
Affiliations
[1] Kyoto Univ, Grad Sch Informat, Dept Syst Sci, Kyoto 6068501, Japan
[2] Adv Telecommun Res Inst Int ATR, Seika 6190288, Japan
Funding
Japan Society for the Promotion of Science (JSPS); Japan Science and Technology Agency (JST)
Keywords
Cost estimation; imitation learning; inverse optimal control; inverse reinforcement learning; continuous-time; model
DOI
10.1109/ACCESS.2023.3327426
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Inverse optimal control (IOC) is the problem of estimating a cost function from the behavior of an expert who acts optimally with respect to that cost function. The Hamilton-Jacobi-Bellman (HJB) equation for the value function, which evaluates the temporal integral of the cost function, provides a necessary condition for the optimality of expert behavior, but the HJB equation alone is insufficient for solving the IOC problem. In this study, we propose a triangle inequality that is useful for estimating a better representation of the value function, along with a new IOC method that incorporates this inequality. Through several IOC problems and imitation learning problems involving time-dependent control behaviors, we show that our IOC method performs substantially better than an existing IOC method. By showing that our method is also applicable to imitating expert control of a 2-link manipulator, we demonstrate its applicability to real-world problems.
Pages: 119187-119199
Page count: 13