Action Mapping: A Reinforcement Learning Method for Constrained-Input Systems

Times cited: 6
Authors
Yuan, Xin [1 ,2 ]
Wang, Yuanda [1 ,2 ]
Liu, Jian [1 ,2 ]
Sun, Changyin [1 ,2 ]
Affiliations
[1] Southeast Univ, Sch Automat, Nanjing 210096, Peoples R China
[2] Southeast Univ, Minist Educ, Key Lab Measurement & Control Complex Syst Engn, Nanjing 210096, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Optimal control; Cost function; Reinforcement learning; Convergence; Aerospace electronics; TV; System dynamics; Constrained-input systems; neural network; optimal control; reinforcement learning (RL); DISCRETE-TIME-SYSTEMS; NONLINEAR-SYSTEMS; EXPERIENCE REPLAY; TRACKING CONTROL;
DOI
10.1109/TNNLS.2021.3138924
Chinese Library Classification (CLC) number
TP18 [Theory of Artificial Intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Existing approaches to constrained-input optimal control problems mainly focus on systems with input saturation, whereas other constraints, such as combined inequality constraints and state-dependent constraints, are seldom discussed. In this article, a reinforcement learning (RL)-based algorithm is developed for constrained-input optimal control of discrete-time (DT) systems. The deterministic policy gradient (DPG) is introduced to iteratively search for the optimal solution to the Hamilton-Jacobi-Bellman (HJB) equation. To handle input constraints, an action mapping (AM) mechanism is proposed. This mechanism transforms the exploration space from the subspace generated by the given inequality constraints to the standard Cartesian product space, which existing algorithms can search effectively. With the proposed architecture, the learned policy outputs control signals that satisfy the given constraints, and the original reward function can be kept unchanged. A convergence analysis is given, showing that the iterative algorithm converges to the optimal solution of the HJB equation. In addition, the continuity of the iteratively estimated Q-function is investigated. Two numerical examples demonstrate the effectiveness of the approach.
Pages: 7145-7157
Number of pages: 13