Dichotomy value iteration with parallel learning design towards discrete-time zero-sum games

Cited by: 3
Authors
Wang, Jiangyu [1 ,2 ,3 ,4 ]
Wang, Ding [1 ]
Li, Xin [1 ,2 ,3 ,4 ]
Qiao, Junfei [1 ,2 ,3 ,4 ]
Affiliations
[1] Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
[2] Beijing Univ Technol, Key Lab Computat Intelligence & Intelligent Syst, Beijing 100124, Peoples R China
[3] Beijing Univ Technol, Beijing Inst Artificial Intelligence, Beijing 100124, Peoples R China
[4] Beijing Univ Technol, Beijing Lab Smart Environm Protect, Beijing 100124, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Adaptive critic; Artificial neural networks; Nonlinear systems; Parallel learning; Value iteration; Zero-sum games; ADAPTIVE CRITIC DESIGNS; STABILITY ANALYSIS;
DOI
10.1016/j.neunet.2023.09.009
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, a novel parallel learning framework is developed to solve zero-sum games for discrete-time nonlinear systems. Briefly, the purpose of this study is to determine a tentative function according to the prior knowledge of the value iteration (VI) algorithm. The learning process of the parallel controllers can be guided by the tentative function. That is to say, the neighborhood of the optimal cost function can be compressed within a small range via two typical exploration policies. Based on the parallel learning framework, a novel dichotomy VI algorithm is established to accelerate the learning speed. It is shown that the parallel controllers will converge to the optimal policy from contrary initial policies. Finally, two typical systems are used to demonstrate the learning performance of the constructed dichotomy VI algorithm. (c) 2023 Elsevier Ltd. All rights reserved.
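The bracketing idea in the abstract, i.e. running value iteration from two "contrary" initializations so that the optimal value function is squeezed between the two iterates, can be illustrated on a toy problem. The sketch below is not the paper's dichotomy VI algorithm: it uses a hypothetical turn-based zero-sum game with random rewards and deterministic transitions, and standard value-iteration backups started from a pessimistic lower bound and an optimistic upper bound on the optimal value.

```python
# Illustrative sketch only: bracketing value iteration on a toy turn-based
# zero-sum game. The game, rewards, and discount factor are hypothetical.
import numpy as np

GAMMA = 0.9
N_STATES, N_ACTIONS = 4, 2
rng = np.random.default_rng(0)
R = rng.uniform(-1.0, 1.0, size=(N_STATES, N_ACTIONS))     # stage rewards in [-1, 1]
P = rng.integers(0, N_STATES, size=(N_STATES, N_ACTIONS))  # deterministic transitions

def backup(v):
    """One value-iteration sweep: maximizer acts in even states, minimizer in odd."""
    q = R + GAMMA * v[P]                 # Q(s, a) = r(s, a) + gamma * V(s')
    out = np.empty(N_STATES)
    out[0::2] = q[0::2].max(axis=1)      # maximizing player's turn
    out[1::2] = q[1::2].min(axis=1)      # minimizing player's turn
    return out

# Two contrary initializations: guaranteed lower and upper bounds on V*,
# since stage rewards are bounded by 1 in absolute value.
v_lo = np.full(N_STATES, -1.0 / (1.0 - GAMMA))
v_hi = np.full(N_STATES, +1.0 / (1.0 - GAMMA))

# The backup is a monotone gamma-contraction, so the gap between the two
# iterates shrinks geometrically and both meet at the unique fixed point V*.
while (v_hi - v_lo).max() > 1e-8:
    v_lo, v_hi = backup(v_lo), backup(v_hi)

print(np.allclose(v_lo, v_hi, atol=1e-6))
```

Because the backup operator is monotone and a gamma-contraction, the two iterates bracket the optimal value at every step, which gives a computable stopping criterion; the paper's parallel/dichotomy construction for nonlinear systems is considerably more involved than this finite toy.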
Pages: 751-762 (12 pages)
Related papers
50 items in total
  • [31] Event-triggered optimal control for discrete-time multi-player non-zero-sum games using parallel control
    Lu, Jingwei
    Wei, Qinglai
    Wang, Ziyang
    Zhou, Tianmin
    Wang, Fei-Yue
    INFORMATION SCIENCES, 2022, 584 : 519 - 535
  • [32] Discrete-Time Local Value Iteration Adaptive Dynamic Programming: Convergence Analysis
    Wei, Qinglai
    Lewis, Frank L.
    Liu, Derong
    Song, Ruizhuo
    Lin, Hanquan
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2018, 48 (06): : 875 - 891
  • [33] Primal-Dual Reinforcement Learning for Zero-Sum Games in the Optimal Tracking Control
    Que, Xuejie
    Wang, Zhenlei
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2024, 71 (06) : 3146 - 3150
  • [34] Optimal tracking control for non-zero-sum games of linear discrete-time systems via off-policy reinforcement learning
    Wen, Yinlei
    Zhang, Huaguang
    Su, Hanguang
    Ren, He
    OPTIMAL CONTROL APPLICATIONS & METHODS, 2020, 41 (04) : 1233 - 1250
  • [35] Value Iteration Adaptive Dynamic Programming for Optimal Control of Discrete-Time Nonlinear Systems
    Wei, Qinglai
    Liu, Derong
    Lin, Hanquan
    IEEE TRANSACTIONS ON CYBERNETICS, 2016, 46 (03) : 840 - 853
  • [36] Discrete-Time Nonzero-Sum Games for Multiplayer Using Policy-Iteration-Based Adaptive Dynamic Programming Algorithms
    Zhang, Huaguang
    Jiang, He
    Luo, Chaomin
    Xiao, Geyang
    IEEE TRANSACTIONS ON CYBERNETICS, 2017, 47 (10) : 3331 - 3340
  • [37] Discrete-Time Local Value Iteration Adaptive Dynamic Programming: Admissibility and Termination Analysis
    Wei, Qinglai
    Liu, Derong
    Lin, Qiao
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2017, 28 (11) : 2490 - 2502
  • [38] A Nearer Optimal and Faster Trained Value Iteration ADP for Discrete-Time Nonlinear Systems
    Hu, Junping
    Yang, Gen
    Hou, Zhicheng
    Zhang, Gong
    Yang, Wenlin
    Wang, Weijun
    IEEE ACCESS, 2021, 9 : 14933 - 14944
  • [39] Robust value iteration for optimal control of discrete-time linear systems
    Lai, Jing
    Xiong, Junlin
    AUTOMATICA, 2025, 174
  • [40] Off-policy synchronous iteration IRL method for multi-player zero-sum games with input constraints
    Ren, He
    Zhang, Huaguang
    Mu, Yunfei
    Duan, Jie
    NEUROCOMPUTING, 2020, 378 : 413 - 421