Adjustable Iterative Q-Learning Schemes for Model-Free Optimal Tracking Control

Cited: 9
Authors
Qiao, Junfei [1 ,2 ]
Zhao, Mingming [1 ,2 ]
Wang, Ding [1 ,2 ]
Ha, Mingming [3 ]
Affiliations
[1] Beijing Univ Technol, Fac Informat Technol, Beijing Key Lab Computat Intelligence & Intelligen, Beijing Lab Smart Environm Protect, Beijing 100124, Peoples R China
[2] Beijing Univ Technol, Beijing Inst Artificial Intelligence, Beijing 100124, Peoples R China
[3] Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Beijing 100083, Peoples R China
Source
IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS | 2024, Vol. 54, No. 2
Funding
National Natural Science Foundation of China
关键词
Adaptive critic control; adaptive dynamic programming (ADP); convergence speed; optimal tracking; Q-learning; STABILITY ANALYSIS;
DOI
10.1109/TSMC.2023.3324215
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
This article focuses on a deterministic value-iteration-based Q-learning (VIQL) algorithm with adjustable convergence speed and verifies it on trajectory tracking for completely unknown nonaffine systems. By introducing learning rates into the iterative updates, the convergence speed becomes adjustable, and a new convergence criterion for the VIQL framework is established. The merit of the adjustable VIQL scheme is that it accelerates learning and reduces the number of iterations, thereby lowering the computational burden. To implement the model-free VIQL algorithm, offline data of system states and reference trajectories are collected to construct the reference control, the tracking error, and the tracking control, which drive the parameter updates of the adjustable VIQL algorithm via an off-policy learning scheme. The resulting convergent optimal tracking policy guarantees that the system tracks the desired trajectory from an arbitrary initial state and completely eliminates the terminal tracking error. Finally, numerical simulations demonstrate the validity of the designed tracking control algorithm.
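The learning-rate-adjusted value iteration described in the abstract replaces the standard Bellman backup Q_{k+1} = T(Q_k) with the relaxed update Q_{k+1} = (1 - λ)Q_k + λ T(Q_k), where λ = 1 recovers plain value iteration and other rates change the convergence speed. The following is only an illustrative sketch, not the paper's method: it assumes a known scalar linear tracking-error system e_{k+1} = a·e_k + b·u_k with quadratic cost (the paper treats completely unknown nonaffine systems), so the Q-function reduces to a 2×2 matrix H and the backup can be written in closed form. All parameter values (a = 0.9, b = 1, λ) are assumptions chosen for the sketch.

```python
import numpy as np

def adjustable_viql(a=0.9, b=1.0, lam=1.0, tol=1e-10, max_iter=500):
    """Sketch of learning-rate-adjusted value-iteration Q-learning.

    Assumed toy setting: tracking-error dynamics e_{k+1} = a*e_k + b*u_k
    with stage cost U(e, u) = e^2 + u^2. The Q-function is quadratic,
    Q(e, u) = [e u] H [e u]^T, so one Bellman backup maps H to
        T(H) = [[1 + p*a^2, p*a*b], [p*a*b, 1 + p*b^2]],
    where p = H[0,0] - H[0,1]^2 / H[1,1] gives min_u Q(e, u) = p*e^2.
    The adjustable scheme relaxes the backup with a learning rate lam:
        H <- (1 - lam)*H + lam*T(H);   lam = 1 is standard value iteration.
    """
    H = np.eye(2)            # initial Q-function Q_0(e, u) = e^2 + u^2
    p_old = np.inf
    for k in range(1, max_iter + 1):
        p = H[0, 0] - H[0, 1] ** 2 / H[1, 1]   # induced value V(e) = p*e^2
        T = np.array([[1 + p * a**2, p * a * b],
                      [p * a * b,    1 + p * b**2]])
        H = (1 - lam) * H + lam * T            # learning-rate-adjusted update
        if abs(p - p_old) < tol:               # value function has converged
            break
        p_old = p
    gain = -H[0, 1] / H[1, 1]  # greedy tracking policy u = gain * e
    return p, gain, k
```

With these assumed parameters, running `adjustable_viql(lam=1.2)` reaches the same fixed point (the solution of the scalar Riccati equation p^2 = 1 + a^2·p for b = 1) in fewer iterations than `adjustable_viql(lam=1.0)`, illustrating the claimed trade-off: a suitable learning rate quickens convergence and reduces the iteration count.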
Pages: 1202-1213
Number of pages: 12