Value functions for depth-limited solving in zero-sum imperfect-information games

Times cited: 3

Authors:

Kovarik, Vojtech [1]
Seitz, Dominik [1]
Lisy, Viliam [1]
Rudolf, Jan [1]
Sun, Shuo [1]
Ha, Karel [1]

Affiliation:

[1] Czech Tech Univ, Artificial Intelligence Ctr, FEE, Prague, Czech Republic

Source:

ARTIFICIAL INTELLIGENCE | 2023 / Vol. 314

Keywords:

Imperfect information game; Multiagent reinforcement learning; Extensive form game; Partially observable stochastic game; Depth limited game; Depth limited solving; Value function; Counterfactual regret minimization;

DOI:

10.1016/j.artint.2022.103805

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We provide a formal definition of depth-limited games together with an accessible and rigorous explanation of the underlying concepts, both of which were previously missing in imperfect-information games. The definition works for an arbitrary (perfect-recall) extensive-form game and is not tied to any specific game-solving algorithm. Moreover, this framework unifies and significantly extends three approaches to depth-limited solving that previously existed in extensive-form games and multiagent reinforcement learning but were not known to be compatible. A key ingredient of these depth-limited games is value functions. Focusing on two-player zero-sum imperfect-information games, we show how to obtain optimal value functions and prove that public information provides both necessary and sufficient context for computing them. We provide a domain-independent encoding of the domains that allows value functions to be approximated even by simple feed-forward neural networks, which are then able to generalize to unseen parts of the game. We use the resulting value network to implement a depth-limited version of counterfactual regret minimization (CFR). In three distinct domains, we show that the algorithm's exploitability depends roughly linearly on the value network's quality and that it is not difficult to train a value network with which depth-limited CFR performs as well as CFR with access to the full game.

© 2022 Published by Elsevier B.V.
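The abstract builds on counterfactual regret minimization (CFR) and measures results by exploitability. As a hedged illustration only, the sketch below shows regret matching, the per-decision update that CFR applies at every information set, run in self-play on a single-decision two-player zero-sum matrix game, together with an exploitability measure for the resulting average strategies. It does not implement depth limits, value functions, or the paper's method; the function names and example payoff matrices are hypothetical choices for illustration.

```python
import numpy as np


def regret_matching(cum_regret):
    """Play actions in proportion to positive cumulative regret (uniform if none)."""
    positive = np.maximum(cum_regret, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.full(len(cum_regret), 1.0 / len(cum_regret))


def solve_matrix_game(payoff, iterations=10000):
    """Self-play regret matching on a zero-sum matrix game.

    payoff[i, j] is the row player's payoff; the column player receives -payoff[i, j].
    Returns the average strategies, which approach a Nash equilibrium.
    """
    n_rows, n_cols = payoff.shape
    regret_row, regret_col = np.zeros(n_rows), np.zeros(n_cols)
    sum_row, sum_col = np.zeros(n_rows), np.zeros(n_cols)
    for _ in range(iterations):
        x = regret_matching(regret_row)
        y = regret_matching(regret_col)
        sum_row += x
        sum_col += y
        # Expected payoff of each pure action against the opponent's current strategy.
        row_values = payoff @ y
        col_values = -(x @ payoff)
        # Regret = action value minus the value of the strategy actually played.
        regret_row += row_values - x @ row_values
        regret_col += col_values - y @ col_values
    return sum_row / iterations, sum_col / iterations


def exploitability(payoff, x, y):
    """Sum of both players' best-response values against the other's fixed strategy.

    Zero exactly at a Nash equilibrium; some authors divide this sum by 2.
    """
    best_response_row = np.max(payoff @ y)
    best_response_col = np.max(-(x @ payoff))
    return best_response_row + best_response_col


if __name__ == "__main__":
    # A hypothetical asymmetric matching-pennies variant.
    game = np.array([[2.0, -1.0], [-1.0, 1.0]])
    x_avg, y_avg = solve_matrix_game(game)
    print(x_avg, y_avg, exploitability(game, x_avg, y_avg))
```

In CFR proper, the same `regret_matching` update runs at every information set of the game tree, with regrets weighted by counterfactual reach probabilities; the depth-limited variant described in the abstract additionally replaces the subtree below the depth limit with a value function.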

Pages: 51
