Adversarial multi-armed bandit approach to two-person zero-sum Markov games

Authors
Chang, Hyeong Soo [1 ]
Fu, Michael C. [2 ]
Marcus, Steven I. [3 ]
Affiliations
[1] Sogang Univ, Dept Comp Sci & Engn, Seoul, South Korea
[2] Univ Maryland, Sch Business & Inst Syst Res, College Pk, MD 20742 USA
[3] Univ Maryland, Dept Elect & Comp Engn, College Pk, MD 20742 USA
Source
PROCEEDINGS OF THE 46TH IEEE CONFERENCE ON DECISION AND CONTROL, VOLS 1-14 | 2007
Funding
U.S. National Science Foundation;
Keywords
Markov game; Markov decision process; sample average approximation; sampling
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
The authors recently presented a sampling-based algorithm for solving stochastic optimization problems, based on Auer et al.'s Exp3 algorithm for "adversarial multi-armed bandit problems." In particular, the Exp3-based algorithm was recursively extended to solve finite-horizon Markov decision processes (MDPs), and its finite-iteration performance was analyzed in terms of the expected bias relative to the maximum value of the "recursive sample-average-approximation (SAA)" problem induced by the sampling process of the algorithm. The upper bound on the expected bias approaches zero as the sampling size per sampled state in each stage goes to infinity, which yields convergence to the optimal value of the original MDP problem in the limit. As a sequel to that work, this paper further extends the idea to two-person zero-sum Markov games (MGs), providing a finite-iteration bound relative to the equilibrium value of the induced "recursive SAA game" problem and establishing asymptotic convergence to the true equilibrium value. Because the complexity of the recursively extended algorithm depends on the sampling parameters rather than on the size of the state space, it can be used to break the curse of dimensionality.
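As a point of reference for the abstract, the following is a minimal Python sketch of plain (non-recursive) Exp3, the adversarial bandit algorithm of Auer et al. that the paper builds on. It is not the authors' recursive MDP/MG algorithm; the names exp3, reward_fn, and gamma are illustrative choices, and rewards are assumed to lie in [0, 1].

import math
import random

def exp3(num_arms, reward_fn, num_rounds, gamma=0.1):
    """Exp3 for adversarial multi-armed bandits (Auer et al.).

    reward_fn(arm, t) must return a reward in [0, 1]; gamma mixes in
    uniform exploration. Returns the final sampling distribution over
    arms, which the recursive scheme described in the abstract would
    read off at each sampled state.
    """
    weights = [1.0] * num_arms
    for t in range(num_rounds):
        total = sum(weights)
        probs = [(1.0 - gamma) * w / total + gamma / num_arms
                 for w in weights]
        # Draw an arm from the exploration-mixed distribution.
        arm = random.choices(range(num_arms), weights=probs)[0]
        reward = reward_fn(arm, t)
        # Importance-weighted estimate keeps the update unbiased.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / num_arms)
        # Rescale to avoid float overflow; probabilities are unchanged.
        m = max(weights)
        weights = [w / m for w in weights]
    total = sum(weights)
    return [(1.0 - gamma) * w / total + gamma / num_arms for w in weights]

# Toy usage: arm 1 pays more on average, so Exp3 concentrates mass on it.
dist = exp3(num_arms=2,
            reward_fn=lambda arm, t: random.random() * (0.4 if arm == 0 else 0.9),
            num_rounds=5000)
print(dist)

In the paper's setting, each sampled state of the MDP or MG runs such a bandit over the available actions, with the (estimated) downstream value serving as the reward signal, applied recursively stage by stage.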
Pages: 238-243
Number of pages: 6