The Design of ϵ-Optimal Strategy for Two-Person Zero-Sum Markov Games

Cited by: 0
Authors
Xie, Kaiyun [1]
Xiong, Junlin [1]
Affiliations
[1] Univ Sci & Technol China, Dept Automat, Hefei 230022, Peoples R China
Source
IEEE CONTROL SYSTEMS LETTERS | 2024 / Vol. 8
Funding
National Natural Science Foundation of China;
Keywords
Games; Costs; Nash equilibrium; Convergence; Heuristic algorithms; Standards; Q-learning; Estimation; Approximation algorithms; Vectors; Gauss-Seidel iteration; Markov game; receding horizon; imprecision; ϵ-optimal strategy;
DOI
10.1109/LCSYS.2024.3474057
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
This letter focuses on designing approximate Nash strategies for two-person zero-sum Markov games. Using the receding horizon method, ϵ-optimal strategies are designed to approximate Nash strategies by executing a finite number of Gauss-Seidel iterations. The relationship between the approximation value ϵ and the number of iterations is also analyzed. Additionally, ϵ-optimal strategies are designed for two scenarios with imprecise parameters. For the scenario with imprecise values, ϵ is determined from the errors between the imprecise values and the iteration values; this provides a theoretical basis for efficiently designing ϵ-optimal strategies using heuristic algorithms or approximate dynamic programming. For the scenario with imprecise transition probabilities, ϵ is determined from the errors between the estimated and actual transition probabilities; this enables the use of pattern recognition or other methods to estimate the actual transition probabilities when designing ϵ-optimal strategies.
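As a rough illustration of the finite Gauss-Seidel iteration mentioned in the abstract, the sketch below runs in-place value sweeps on a toy discounted zero-sum Markov game. It is not the letter's algorithm: the discounted-reward setup, the restriction to two actions per player (so each stage game has a closed-form value), the convention that the row player maximizes, and every name and number (`solve_2x2`, `gauss_seidel_sweep`, `iterate`, `REWARD`, `TRANS`, `GAMMA`) are assumptions made for illustration. The receding horizon strategy design and the ϵ bounds are not reproduced here.

```python
# Illustrative sketch only: toy discounted zero-sum Markov game with
# 2 states and 2 actions per player. All data below are hypothetical.

def solve_2x2(m):
    """Value of a 2x2 zero-sum matrix game (row player maximizes)."""
    (a, b), (c, d) = m
    lower = max(min(a, b), min(c, d))  # best guaranteed payoff, pure rows
    upper = min(max(a, c), max(b, d))  # best guaranteed cap, pure columns
    if lower == upper:                 # pure saddle point exists
        return lower
    # No saddle point: both players mix; standard closed-form value.
    return (a * d - b * c) / ((a - b) + (d - c))


def gauss_seidel_sweep(v, reward, trans, gamma):
    """One in-place sweep over the states.

    States updated earlier in the sweep are reused immediately by later
    states, which is what distinguishes Gauss-Seidel from a Jacobi sweep.
    """
    for s in range(len(v)):
        q = [[reward[s][i][j]
              + gamma * sum(p * v[t] for t, p in enumerate(trans[s][i][j]))
              for j in range(2)]
             for i in range(2)]
        v[s] = solve_2x2(q)
    return v


def iterate(reward, trans, gamma, sweeps):
    """Run a fixed, finite number of Gauss-Seidel sweeps from v = 0."""
    v = [0.0] * len(reward)
    for _ in range(sweeps):
        gauss_seidel_sweep(v, reward, trans, gamma)
    return v


# REWARD[s][i][j]: stage reward in state s for row action i, column action j.
REWARD = [
    [[1.0, -1.0], [-1.0, 1.0]],
    [[2.0, 0.0], [0.0, 1.0]],
]
# TRANS[s][i][j]: probabilities of moving to state 0 and state 1.
TRANS = [
    [[[0.5, 0.5], [1.0, 0.0]], [[0.0, 1.0], [0.5, 0.5]]],
    [[[1.0, 0.0], [0.5, 0.5]], [[0.5, 0.5], [0.0, 1.0]]],
]
GAMMA = 0.5  # assumed discount factor

if __name__ == "__main__":
    for k in (1, 5, 20):
        v_k = iterate(REWARD, TRANS, GAMMA, k)
        v_next = iterate(REWARD, TRANS, GAMMA, k + 1)
        gap = max(abs(x - y) for x, y in zip(v_k, v_next))
        print(k, v_k, gap)
```

The printed gap between successive sweeps shrinks as the number of sweeps grows, which gives an informal feel for why a finite iteration count can be tied to an approximation level. The precise relationship between ϵ and the number of iterations is the one analyzed in the letter, not this gap.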
Pages: 2349-2354
Page count: 6