Online solving Nash equilibrium solution of N-player nonzero-sum differential games via recursive least squares

Cited by: 4
Authors
Song, Ruizhuo [1 ]
Yang, Gaofu [1 ]
Affiliations
[1] Univ Sci & Technol Beijing, Sch Automat & Elect Engn, 30 Xueyuan Rd, Beijing 100083, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Neural network (NN); Recursive least squares (RLS); Reinforcement learning (RL); Nonzero-sum game (NZS); SYSTEMS; DESIGN;
DOI
10.1007/s00500-023-08934-y
CLC number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A novel online critic neural network weight-tuning algorithm is proposed in this paper by combining policy iteration with recursive least squares (RLS) to solve the optimal control problem for players in nonlinear nonzero-sum games. The interaction among players and the nonlinearity of the system make the Hamiltonian function difficult to solve directly. From a linear regression perspective, this paper regards any admissible control and its corresponding value function as a perturbed input-output pair. By using RLS to process the current data and adjusting the weights through the covariance matrix that summarizes the historical data, the computation is greatly simplified: both the space cost of storing large amounts of historical data and the time cost of data collection are avoided. When accumulating the error, a discount factor is introduced to prevent the total error from tending to infinity and to limit the influence of past policy evaluations, under which the covariance matrix would otherwise tend to zero and lose its tuning effect. When the error of the Hamiltonian equation is zero, the proposed tuning law also tends to zero. The stability of the covariance matrix is then analysed, and the convergence of the weight errors is proved. Finally, two simulations verify the effectiveness of the proposed RLS-based algorithm for solving the Nash equilibrium online.
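The abstract's core mechanism is standard recursive least squares with a forgetting (discount) factor: each new regressor/target pair updates the weights through a gain computed from the stored covariance matrix, and the factor keeps the covariance from shrinking to zero so the tuning law stays responsive. The sketch below is a generic illustration of that mechanism only, not the paper's critic-network implementation; the function names, the quadratic test problem, and the factor value 0.99 are illustrative assumptions.

```python
import numpy as np

def rls_update(w, P, x, y, lam=0.99):
    """One recursive least squares step with forgetting factor lam (0 < lam <= 1).

    w : current weight estimate, shape (n,)
    P : covariance matrix, shape (n, n)
    x : regressor vector, shape (n,)
    y : scalar target
    """
    Px = P @ x
    k = Px / (lam + x @ Px)          # gain vector
    e = y - w @ x                    # a priori prediction error
    w_new = w + k * e                # weight update driven by the error
    P_new = (P - np.outer(k, Px)) / lam  # discount old data so P does not vanish
    return w_new, P_new

# Identify y = x^T w_true from a stream of noisy samples, one pair at a time;
# no batch of historical data is ever stored.
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0, 0.5])
w = np.zeros(3)
P = 1e3 * np.eye(3)  # large initial covariance = low confidence in w
for _ in range(500):
    x = rng.normal(size=3)
    y = x @ w_true + 0.01 * rng.normal()
    w, P = rls_update(w, P, x, y)
print(np.round(w, 2))  # close to w_true
```

With lam < 1 the update discounts stale samples, which mirrors the paper's use of a discount factor to bound the accumulated error and keep the covariance matrix from losing its adjustment effect.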
Pages: 16659-16673
Page count: 15