Energy-Based Continuous Inverse Optimal Control

Cited by: 2
Authors
Xu, Yifei [1]
Xie, Jianwen [2]
Zhao, Tianyang [1]
Baker, Chris [3]
Zhao, Yibiao [3]
Wu, Ying Nian [1]
Affiliations
[1] Univ Calif Los Angeles, Dept Stat, Los Angeles, CA 90095 USA
[2] Baidu Res, Cognit Comp Lab, Bellevue, WA 98004 USA
[3] iSee Inc, Cambridge, MA 02139 USA
Keywords
Trajectory; Cost function; Optimal control; Heuristic algorithms; Generators; Autonomous vehicles; Maximum likelihood estimation; Cooperative learning; energy-based models (EBMs); inverse optimal control (IOC); Langevin dynamics; 3D SHAPE SYNTHESIS; MODELS; NETWORKS; FRAME
DOI
10.1109/TNNLS.2022.3168795
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Continuous inverse optimal control (over a finite time horizon) is the problem of learning an unknown cost function over a sequence of continuous control variables from expert demonstrations. In this article, we study this fundamental problem in the framework of energy-based models (EBMs), where the observed expert trajectories are assumed to be random samples from a probability density function defined as the exponential of the negative cost function up to a normalizing constant. The parameters of the cost function are learned by maximum likelihood via an "analysis by synthesis" scheme, which iterates two steps: 1) synthesis step: sample synthesized trajectories from the current probability density using Langevin dynamics via backpropagation through time; and 2) analysis step: update the model parameters based on the statistical difference between the synthesized trajectories and the observed trajectories. Because an efficient optimization algorithm is usually available for an optimal control problem, we also consider a convenient approximation of the above learning method, in which the sampling in the synthesis step is replaced by optimization. Moreover, to make the sampling or optimization more efficient, we propose training the EBM simultaneously with a top-down trajectory generator via cooperative learning, where the trajectory generator is used to quickly initialize the synthesis step of the EBM. We demonstrate the proposed methods on autonomous driving tasks and show that they can learn suitable cost functions for optimal control.
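The analysis-by-synthesis loop described in the abstract can be illustrated with a small NumPy sketch. This is a toy under stated assumptions, not the authors' implementation: the cost is taken to be linear in two hand-picked features of a scalar control sequence (effort and roughness), it is defined directly on the controls with no vehicle dynamics (so the gradient is written analytically instead of using backpropagation through time), and all function names, step sizes, and the lower clip on the parameters are hypothetical choices. For a density proportional to exp(-cost), the log-likelihood gradient with respect to the parameters is the difference between the average cost gradient over synthesized trajectories and over observed ones, which for a linear cost reduces to a difference of feature means.

```python
import numpy as np

def features(u):
    """Per-trajectory features for controls u of shape (n, T): effort and roughness."""
    effort = np.sum(u ** 2, axis=1)
    rough = np.sum(np.diff(u, axis=1) ** 2, axis=1)
    return np.stack([effort, rough], axis=1)  # shape (n, 2)

def cost(u, theta):
    """Assumed toy cost: linear in the two features."""
    return features(u) @ theta  # shape (n,)

def grad_cost(u, theta):
    """Analytic gradient of the toy cost with respect to the controls."""
    d = np.diff(u, axis=1)
    zeros = np.zeros((u.shape[0], 1))
    g_rough = 2.0 * (np.hstack([zeros, d]) - np.hstack([d, zeros]))
    return theta[0] * 2.0 * u + theta[1] * g_rough

def langevin(u, theta, rng, steps=30, step_size=0.1):
    """Synthesis step: Langevin updates targeting the density exp(-cost)."""
    for _ in range(steps):
        noise = rng.standard_normal(u.shape)
        u = u - 0.5 * step_size ** 2 * grad_cost(u, theta) + step_size * noise
    return u

def fit(observed, rng, iters=200, lr=0.01):
    """Alternate synthesis and analysis steps (approximate maximum likelihood)."""
    theta = np.array([1.0, 1.0])
    for _ in range(iters):
        init = rng.standard_normal(observed.shape)  # noise initialization (assumption)
        synth = langevin(init, theta, rng)
        # Analysis step: synthesized feature mean minus observed feature mean.
        grad = features(synth).mean(axis=0) - features(observed).mean(axis=0)
        # Clip keeps the toy density normalizable (an assumption of this sketch).
        theta = np.maximum(theta + lr * grad, 1e-3)
    return theta
```

A call such as `fit(observed, np.random.default_rng(0))`, with `observed` an `(n, T)` array of demonstrated control sequences, returns the two learned weights. In this sketch each synthesis step starts from noise; the cooperative-learning variant mentioned in the abstract would instead start from the output of a learned trajectory generator.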
Pages: 10563-10577
Page count: 15