A policy-improvement type algorithm for solving zero-sum two-person stochastic games of perfect information

Cited by: 10

Authors:

Raghavan, TES [1]

Syed, Z [1]

Affiliations:

[1] Univ Illinois, Dept Math Stat & Comp Sci, Chicago, IL 60680 USA

Source:

MATHEMATICAL PROGRAMMING | 2003 / Vol. 95 / No. 03

Keywords:

stochastic games; MDP; perfect information; policy iteration

DOI:

10.1007/s10107-002-0312-3

Chinese Library Classification number:

TP31 [Computer software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

We give a policy-improvement type algorithm to locate an optimal pure stationary strategy for discounted stochastic games with perfect information. A graph theoretic motivation for our algorithm is presented as well.

Pages: 513-532

Page count: 20

30 items in total

[21] POLICY IMPROVEMENT FOR PERFECT INFORMATION ADDITIVE REWARD AND ADDITIVE TRANSITION STOCHASTIC GAMES WITH DISCOUNTED AND AVERAGE PAYOFFS [J].

Bourque, Matthew ;

Raghavan, T. E. S. .

JOURNAL OF DYNAMICS AND GAMES, 2014, 1 (03) :347-361

[22] CONTINUITY PROPERTIES OF VALUE FUNCTIONS IN INFORMATION STRUCTURES FOR ZERO-SUM AND GENERAL GAMES AND STOCHASTIC TEAMS* [J].

Hogeboom-Burr, Ian ;

Yuksel, Serdar .

SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2023, 61 (02) :398-414

[23] LP formulation of stochastic Bayesian two-player zero-sum games with long horizon [J].

Orpa, Nabiha Nasir ;

Li, Lichun .

INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL, 2023,

[24] Existence of the Limit Value of Two Person Zero-Sum Discounted Repeated Games via Comparison Theorems [J].

Sylvain Sorin ;

Guillaume Vigeral .

Journal of Optimization Theory and Applications, 2013, 157 :564-576

[25] Existence of the Limit Value of Two Person Zero-Sum Discounted Repeated Games via Comparison Theorems [J].

Sorin, Sylvain ;

Vigeral, Guillaume .

JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS, 2013, 157 (02) :564-576

[26] Perfect aggregation of information in two-person multistage games with fixed sequence of moves and aggregated information on partner’s choice [J].

V. S. Aliev .

Automation and Remote Control, 2010, 71 :1240-1246

[27] Perfect aggregation of information in two-person multistage games with fixed sequence of moves and aggregated information on partner's choice [J].

Aliev, V. S. .

AUTOMATION AND REMOTE CONTROL, 2010, 71 (06) :1240-1246

[28] Relaxed Policy Iteration Algorithm for Nonlinear Zero-Sum Games With Application to H-Infinity Control [J].

Li, Jie ;

Li, Shengbo Eben ;

Duan, Jingliang ;

Lyu, Yao ;

Zou, Wenjun ;

Guan, Yang ;

Yin, Yuming .

IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2024, 69 (01) :426-433

[29] Online Solution of Nonlinear Two-Player Zero-Sum Games Using Synchronous Policy Iteration [J].

Vamvoudakis, Kyriakos G. ;

Lewis, F. L. .

49TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2010, :3040-3047

[30] Decentralized Learning in Two-Player Zero-Sum Games: A LR-I Lagging Anchor Algorithm [J].

Lu, Xiaosong ;

Schwartz, Howard M. .

2011 AMERICAN CONTROL CONFERENCE, 2011, :107-112
