Model-Free Trajectory-based Policy Optimization with Monotonic Improvement

Cited: 0
Authors
Akrour, Riad [1 ]
Abdolmaleki, Abbas [2 ]
Abdulsamad, Hany [1 ]
Peters, Jan [1 ,3 ]
Neumann, Gerhard [1 ,4 ]
Affiliations
[1] Tech Univ Darmstadt, CLAS IAS, Hsch Str 10, D-64289 Darmstadt, Germany
[2] DeepMind, London N1C 4AG, England
[3] Max Planck Inst Intelligent Syst, Max Planck Ring 4, Tubingen, Germany
[4] Univ Lincoln, L CAS, Lincoln LN6 7TS, England
Funding
EU Horizon 2020;
Keywords
Reinforcement Learning; Policy Optimization; Trajectory Optimization; Robotics;
DOI
None available
Chinese Library Classification (CLC)
TP [automation and computer technology];
Discipline code
0812;
Abstract
Many recent trajectory optimization algorithms alternate between a linear approximation of the system dynamics around the mean trajectory and a conservative policy update. One way of constraining the policy change is to bound the Kullback-Leibler (KL) divergence between successive policies. These approaches have already demonstrated great experimental success on challenging problems such as end-to-end control of physical systems. However, the linear approximation of the system dynamics can introduce a bias in the policy update and prevent convergence to the optimal policy. In this article, we propose a new model-free trajectory-based policy optimization algorithm with guaranteed monotonic improvement. Instead of a model of the system dynamics, the algorithm backpropagates a local, quadratic and time-dependent Q-function learned from trajectory data. Our policy update ensures exact KL-constraint satisfaction without simplifying assumptions on the system dynamics. We experimentally demonstrate on highly non-linear control tasks that our algorithm improves on approaches that linearize the system dynamics. To show the monotonic improvement of our algorithm, we additionally conduct a theoretical analysis of our policy update scheme and derive a lower bound on the change in policy return between successive iterations.
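The kind of update the abstract describes — a Gaussian policy reweighted by a local quadratic Q-function under an exact KL bound — admits a closed form that can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the hand-picked quadratic model Q(a) = -0.5 aᵀAa + bᵀa, and the log-space bisection on the temperature are all assumptions made for the example.

```python
import numpy as np

def kl_gauss(mu1, S1, mu0, S0):
    """KL( N(mu1, S1) || N(mu0, S0) ) between multivariate Gaussians."""
    d = len(mu0)
    P0 = np.linalg.inv(S0)
    diff = mu0 - mu1
    return 0.5 * (np.trace(P0 @ S1) + diff @ P0 @ diff - d
                  + np.log(np.linalg.det(S0) / np.linalg.det(S1)))

def reweighted_policy(mu, Sigma, A, b, eta):
    """Closed-form Gaussian update pi_new proportional to pi_old * exp(Q/eta)
    for the quadratic model Q(a) = -0.5 a^T A a + b^T a."""
    P = np.linalg.inv(Sigma)
    P_new = P + A / eta                  # precisions add under the product
    S_new = np.linalg.inv(P_new)
    mu_new = S_new @ (P @ mu + b / eta)  # linear terms add likewise
    return mu_new, S_new

def kl_constrained_update(mu, Sigma, A, b, eps, iters=60):
    """Bisect the temperature eta so that KL(pi_new || pi_old) <= eps holds
    exactly at convergence (KL is monotonically decreasing in eta)."""
    lo, hi = 1e-6, 1e6
    for _ in range(iters):
        eta = np.sqrt(lo * hi)           # bisection in log-space
        mu_n, S_n = reweighted_policy(mu, Sigma, A, b, eta)
        if kl_gauss(mu_n, S_n, mu, Sigma) > eps:
            lo = eta                     # step too greedy: raise eta
        else:
            hi = eta                     # constraint slack: lower eta
    return reweighted_policy(mu, Sigma, A, b, hi)
```

Because the old policy itself is feasible under the KL constraint, the constrained optimizer can only raise the expected Q-value, which is the intuition behind the monotonic-improvement guarantee the abstract refers to.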
Pages: 25