Value iteration with deep neural networks for optimal control of input-affine nonlinear systems

Cited by: 1
Authors
Beppu H. [1 ,2 ]
Maruta I. [1 ]
Fujimoto K. [1 ]
Affiliations
[1] Department of Aeronautics and Astronautics, Graduate School of Engineering, Kyoto University, Kyoto
[2] Japan Society for the Promotion of Science, Tokyo
Keywords
convergence analysis; deep neural networks; input-affine nonlinear systems; optimal control; value iteration
DOI: 10.1080/18824889.2021.1936817
Abstract
This paper proposes a new algorithm that uses deep neural networks to solve optimal control problems for continuous-time input-affine nonlinear systems, based on a value iteration algorithm. The proposed algorithm employs the networks to approximate the value functions and control inputs at each iteration. Consequently, the partial differential equations of the original algorithm reduce to optimization problems over the network parameters. Although the conventional algorithm can obtain the optimal control through iterative computation, each iteration must be carried out exactly, which is difficult to achieve in practice. The proposed method instead offers a practical scheme built on deep neural networks and overcomes this difficulty by exploiting a property of the networks; under this property, our convergence analysis shows that the proposed algorithm attains the minimum of the value function and the corresponding optimal controller. Two numerical simulations demonstrate the effectiveness of the proposed method even with reasonable computational resources. © 2021 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group.
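The abstract only outlines the scheme, so the following is a minimal PyTorch sketch of one plausible reading, not the authors' exact algorithm: a value-iteration sweep in which a network fitted at step k+1 regresses onto Bellman targets built from the previous value network V_k and the greedy input u = -(1/2)R^{-1} g(x)^T ∇V_k(x) that the input-affine structure admits. The Euler discretization with step h, the toy pendulum-like dynamics f and g, the quadratic cost q, and all network sizes and hyperparameters are illustrative assumptions.

```python
# Sketch of value iteration with a neural value function for an input-affine
# system  dx/dt = f(x) + g(x) u  with cost  integral( q(x) + R u^2 ) dt.
# All specifics below (h, dynamics, sampling box, sizes) are assumptions.
import torch

torch.manual_seed(0)
h, R = 0.05, 1.0                                  # assumed step length and input weight

def f(x):                                         # drift term (toy pendulum)
    return torch.stack([x[:, 1], torch.sin(x[:, 0])], dim=1)

def g(x):                                         # input matrix g(x), constant here
    return torch.tensor([[0.0], [1.0]]).expand(x.shape[0], 2, 1)

def q(x):                                         # state cost q(x) = |x|^2
    return (x ** 2).sum(dim=1)

def make_net():                                   # small MLP approximating V(x)
    return torch.nn.Sequential(
        torch.nn.Linear(2, 64), torch.nn.Tanh(),
        torch.nn.Linear(64, 64), torch.nn.Tanh(),
        torch.nn.Linear(64, 1))

V = make_net()                                    # V_0, initialized at random
for k in range(10):                               # value-iteration sweeps
    V_next = make_net()
    opt = torch.optim.Adam(V_next.parameters(), lr=1e-3)
    for _ in range(500):                          # fit V_{k+1} by regression
        x = 4 * torch.rand(256, 2) - 2            # states sampled from [-2, 2]^2
        x.requires_grad_(True)
        grad_V = torch.autograd.grad(V(x).sum(), x)[0]   # gradient of V_k
        with torch.no_grad():
            # greedy input for input-affine dynamics: u = -(1/2) R^{-1} g^T grad V
            u = -0.5 / R * torch.bmm(g(x).transpose(1, 2),
                                     grad_V.unsqueeze(2)).squeeze(2)
            # one-step Bellman target with an explicit Euler step of length h
            x_next = x + h * (f(x) + torch.bmm(g(x), u.unsqueeze(2)).squeeze(2))
            target = h * (q(x) + R * (u ** 2).sum(dim=1)) + V(x_next).squeeze(1)
        loss = ((V_next(x.detach()).squeeze(1) - target) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    V = V_next                                    # the fitted network becomes V_{k+1}
```

Under this reading, each PDE solve of the original algorithm is replaced by the inner regression loop, which is the reduction to parameter optimization that the abstract describes.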
Pages: 140–149
Number of pages: 9