Action Mapping: A Reinforcement Learning Method for Constrained-Input Systems

Times cited: 6
Authors
Yuan, Xin [1 ,2 ]
Wang, Yuanda [1 ,2 ]
Liu, Jian [1 ,2 ]
Sun, Changyin [1 ,2 ]
Affiliations
[1] Southeast Univ, Sch Automat, Nanjing 210096, Peoples R China
[2] Southeast Univ, Minist Educ, Key Lab Measurement & Control Complex Syst Engn, Nanjing 210096, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Optimal control; Cost function; Reinforcement learning; Convergence; Aerospace electronics; TV; System dynamics; Constrained-input systems; neural network; optimal control; reinforcement learning (RL); DISCRETE-TIME-SYSTEMS; NONLINEAR-SYSTEMS; EXPERIENCE REPLAY; TRACKING CONTROL;
DOI
10.1109/TNNLS.2021.3138924
Chinese Library Classification (CLC) number
TP18 [Theory of Artificial Intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Existing approaches to constrained-input optimal control problems mainly focus on systems with input saturation, whereas other constraints, such as combined inequality constraints and state-dependent constraints, are seldom discussed. In this article, a reinforcement learning (RL)-based algorithm is developed for constrained-input optimal control of discrete-time (DT) systems. The deterministic policy gradient (DPG) is introduced to iteratively search for the optimal solution to the Hamilton-Jacobi-Bellman (HJB) equation. To handle input constraints, an action mapping (AM) mechanism is proposed. This mechanism transforms the exploration space from the subspace generated by the given inequality constraints to the standard Cartesian product space, which existing algorithms can search effectively. With the proposed architecture, the learned policy outputs control signals that satisfy the given constraints, and the original reward function can be kept unchanged. A convergence analysis is given, showing that the iterative algorithm converges to the optimal solution of the HJB equation. In addition, the continuity of the iteratively estimated Q-function is investigated. Two numerical examples demonstrate the effectiveness of the approach.
Pages: 7145-7157
Number of pages: 13