An Efficient Deep Reinforcement Learning Algorithm for Solving Imperfect Information Extensive-Form Games

Cited by: 0
Authors
Meng, Linjian [1 ]
Ge, Zhenxing [1 ]
Tian, Pinzhuo [2 ]
An, Bo [3 ]
Gao, Yang [1 ]
Affiliations
[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing, Peoples R China
[2] Shanghai Univ, Sch Comp Engn & Sci, Shanghai, Peoples R China
[3] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore, Singapore
Source
THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 5 | 2023
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Among the most popular methods for learning a Nash equilibrium (NE) in large-scale imperfect-information extensive-form games (IIEFGs) are the neural variants of counterfactual regret minimization (CFR). CFR is a special case of Follow-The-Regularized-Leader (FTRL). At each iteration, the neural variants of CFR update the agent's strategy via the estimated counterfactual regrets. They then use neural networks to approximate the new strategy, which incurs an approximation error. These approximation errors accumulate, since the counterfactual regrets at iteration t are estimated from the agent's past approximated strategies, and the accumulated error degrades performance. To address this accumulated approximation error, we propose a novel FTRL algorithm called FTRL-ORW, which does not rely on the agent's past strategies to choose the next-iteration strategy. More importantly, FTRL-ORW can update its strategy from trajectories sampled from the game, which makes it suitable for solving large-scale IIEFGs, where sampling multiple actions at each information set is prohibitively expensive. However, it remains unclear which algorithm should be used to compute the next-iteration strategy for FTRL-ORW when only such sampled trajectories are revealed at iteration t. To address this problem and scale FTRL-ORW to large games, we provide a model-free method called Deep FTRL-ORW, which computes the next-iteration strategy using model-free maximum-entropy deep reinforcement learning. Experimental results on two-player zero-sum IIEFGs show that Deep FTRL-ORW significantly outperforms existing model-free neural methods and OS-MCCFR.
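For context, a standard FTRL update of the kind referenced in the abstract picks the next strategy by maximizing the cumulative observed payoff (or regret) estimates plus a regularizer. The sketch below uses generic notation assumed for illustration only; it is not the paper's FTRL-ORW update or its notation.

% Generic FTRL update (illustrative sketch; the symbols X, g_s, R, and eta are assumptions, not the paper's notation).
% x       : a strategy in the feasible strategy set \mathcal{X}
% g_s     : the payoff/regret estimate observed at iteration s
% R       : a strongly convex regularizer (e.g., negative entropy)
% \eta    : a step-size parameter
x_{t+1} \in \operatorname*{arg\,max}_{x \in \mathcal{X}} \Big\{ \sum_{s=1}^{t} \langle g_s, x \rangle - \tfrac{1}{\eta} R(x) \Big\}

Because the cumulative term \sum_{s=1}^{t} g_s is built from regrets estimated under the agent's past approximated strategies, approximation errors from earlier iterations propagate into later updates; this is the accumulation problem the abstract describes, and FTRL-ORW is designed so that the next-iteration strategy does not depend on those past strategies.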
Pages: 5823-5831 (9 pages)