Dichotomy value iteration with parallel learning design towards discrete-time zero-sum games

Cited by: 3
Authors
Wang, Jiangyu [1, 2, 3, 4]
Wang, Ding [1]
Li, Xin [1, 2, 3, 4]
Qiao, Junfei [1, 2, 3, 4]
Affiliations
[1] Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
[2] Beijing Univ Technol, Key Lab Computat Intelligence & Intelligent Syst, Beijing 100124, Peoples R China
[3] Beijing Univ Technol, Beijing Inst Artificial Intelligence, Beijing 100124, Peoples R China
[4] Beijing Univ Technol, Beijing Lab Smart Environm Protect, Beijing 100124, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Adaptive critic; Artificial neural networks; Nonlinear systems; Parallel learning; Value iteration; Zero-sum games; ADAPTIVE CRITIC DESIGNS; STABILITY ANALYSIS
DOI
10.1016/j.neunet.2023.09.009
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, a novel parallel learning framework is developed to solve zero-sum games for discrete-time nonlinear systems. Briefly, the purpose of this study is to determine a tentative function according to the prior knowledge of the value iteration (VI) algorithm. The tentative function guides the learning process of the parallel controllers. That is to say, the neighborhood of the optimal cost function can be compressed within a small range via two typical exploration policies. Based on the parallel learning framework, a novel dichotomy VI algorithm is established to accelerate learning. It is shown that the parallel controllers converge to the optimal policy from contrary initial policies. Finally, two typical systems are used to demonstrate the learning performance of the constructed dichotomy VI algorithm. (c) 2023 Elsevier Ltd. All rights reserved.
Pages: 751-762
Number of pages: 12