RLCFR: Minimize counterfactual regret by deep reinforcement learning

Cited by: 5
Authors
Li, Huale [1 ]
Wang, Xuan [1 ,2 ]
Jia, Fengwei [1 ]
Wu, Yulin [1 ]
Zhang, Jiajia [1 ]
Qi, Shuhan [1 ,2 ]
Affiliations
[1] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen 518055, Peoples R China
[2] Peng Cheng Lab, Shenzhen 518000, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Counterfactual regret minimization; Decision-making; Imperfect information; Reinforcement learning; GO; SHOGI; CHESS; POKER; GAME;
DOI
10.1016/j.eswa.2021.115953
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Counterfactual regret minimization (CFR) is a popular method for solving decision-making problems in two-player zero-sum games with imperfect information. Unlike previous studies, which mostly focused on solving large-scale problems or accelerating convergence, we propose a framework, RLCFR, that aims to improve the generalization ability of the CFR method. In RLCFR, the game strategy is solved by CFR-based methods within a reinforcement learning (RL) framework. The dynamic procedure of iterative interactive strategy updating is modeled as a Markov decision process (MDP). Our method then learns a policy for selecting the appropriate regret-updating method at each step of the iteration process. In addition, a stepwise reward function, proportional to how well the iteration strategy performs at each step, is formulated to learn the action policy. Extensive experiments on various games show that the generalization ability of our method is significantly improved compared with existing state-of-the-art methods.
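The core idea in the abstract — treating the CFR iteration loop as an MDP in which an RL agent picks a regret-updating rule and receives a stepwise reward for how much the iterate improves — can be caricatured in a few lines. Everything below (the action set, the per-rule improvement rates, and the simple bandit-style value learner) is an illustrative assumption for exposition, not the authors' implementation:

```python
import random

# Candidate regret-updating rules the learned policy chooses among
# (illustrative names; the paper's actual action set may differ).
ACTIONS = ["cfr", "cfr_plus", "linear_cfr"]

# Toy stand-in for one CFR iteration: each rule lowers the exploitability
# of the average strategy by a fixed amount (real improvement varies by game).
IMPROVEMENT = {"cfr": 0.01, "linear_cfr": 0.02, "cfr_plus": 0.03}

def train(episodes=200, horizon=20, eps=0.2, seed=0):
    rng = random.Random(seed)
    q = {a: 0.0 for a in ACTIONS}   # action values (state-free for brevity)
    n = {a: 0 for a in ACTIONS}     # visit counts for incremental means
    for _ in range(episodes):
        expl = 1.0                  # exploitability of the current iterate
        for _ in range(horizon):
            # Epsilon-greedy choice of which regret-updating rule to apply.
            a = rng.choice(ACTIONS) if rng.random() < eps else max(q, key=q.get)
            new_expl = max(0.0, expl - IMPROVEMENT[a])
            reward = expl - new_expl   # stepwise reward: improvement this step
            n[a] += 1
            q[a] += (reward - q[a]) / n[a]  # incremental-mean value update
            expl = new_expl
    return q

q_values = train()
# The greedy policy ends up preferring the rule that shrinks
# exploitability fastest in this toy model.
```

In the paper the state would carry features of the iteration process and the reward is tied to the quality of the iterated strategy; this sketch collapses all of that into a stateless bandit purely to show the MDP framing and the stepwise-reward signal.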
Pages: 15