A policy-improvement type algorithm for solving zero-sum two-person stochastic games of perfect information

Cited by: 10
Authors
Raghavan, TES [1]
Syed, Z [1]
Affiliations
[1] Univ Illinois, Dept Math Stat & Comp Sci, Chicago, IL 60680 USA
Keywords
stochastic games; MDP; perfect information; policy iteration
DOI
10.1007/s10107-002-0312-3
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
We give a policy-improvement type algorithm to locate an optimal pure stationary strategy for discounted stochastic games with perfect information. A graph theoretic motivation for our algorithm is presented as well.
Pages: 513-532
Page count: 20
Related papers
30 in total
[21]   POLICY IMPROVEMENT FOR PERFECT INFORMATION ADDITIVE REWARD AND ADDITIVE TRANSITION STOCHASTIC GAMES WITH DISCOUNTED AND AVERAGE PAYOFFS [J].
Bourque, Matthew ;
Raghavan, T. E. S. .
JOURNAL OF DYNAMICS AND GAMES, 2014, 1 (03) :347-361
[22]   CONTINUITY PROPERTIES OF VALUE FUNCTIONS IN INFORMATION STRUCTURES FOR ZERO-SUM AND GENERAL GAMES AND STOCHASTIC TEAMS* [J].
Hogeboom-Burr, Ian ;
Yuksel, Serdar .
SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2023, 61 (02) :398-414
[23]   LP formulation of stochastic Bayesian two-player zero-sum games with long horizon [J].
Orpa, Nabiha Nasir ;
Li, Lichun .
INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL, 2023,
[25]   Existence of the Limit Value of Two Person Zero-Sum Discounted Repeated Games via Comparison Theorems [J].
Sorin, Sylvain ;
Vigeral, Guillaume .
JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS, 2013, 157 (02) :564-576
[27]   Perfect aggregation of information in two-person multistage games with fixed sequence of moves and aggregated information on partner's choice [J].
Aliev, V. S. .
AUTOMATION AND REMOTE CONTROL, 2010, 71 (06) :1240-1246
[28]   Relaxed Policy Iteration Algorithm for Nonlinear Zero-Sum Games With Application to H-Infinity Control [J].
Li, Jie ;
Li, Shengbo Eben ;
Duan, Jingliang ;
Lyu, Yao ;
Zou, Wenjun ;
Guan, Yang ;
Yin, Yuming .
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2024, 69 (01) :426-433
[29]   Online Solution of Nonlinear Two-Player Zero-Sum Games Using Synchronous Policy Iteration [J].
Vamvoudakis, Kyriakos G. ;
Lewis, F. L. .
49TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2010, :3040-3047
[30]   Decentralized Learning in Two-Player Zero-Sum Games: A LR-I Lagging Anchor Algorithm [J].
Lu, Xiaosong ;
Schwartz, Howard M. .
2011 AMERICAN CONTROL CONFERENCE, 2011, :107-112