Policy Similarity Measure for Two-Player Zero-Sum Games

Cited by: 0
Authors
Tang, Hongsong [1 ]
Xiang, Liuyu [2 ]
He, Zhaofeng [2 ]
Affiliations
[1] Beijing Univ Posts & Telecommun, Sch Sci, Beijing 100876, Peoples R China
[2] Beijing Univ Posts & Telecommun, Sch Artificial Intelligence, Beijing 100876, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2025, Vol. 15, Issue 05
Keywords
game theory; reinforcement learning; multi-agent systems; policy diversity; GO; LEVEL;
DOI
10.3390/app15052815
Chinese Library Classification
O6 [Chemistry];
Discipline Classification Code
0703;
Abstract
Policy space response oracles (PSRO) is an important algorithmic framework for approximating Nash equilibria in two-player zero-sum games. Enhancing policy diversity has been shown to significantly improve the performance of PSRO in this approximation process. However, existing diversity metrics are often prone to redundancy, which can hinder convergence to the optimal strategy. In this paper, we introduce the policy similarity measure (PSM), a novel approach that combines Gaussian and cosine similarity measures to assess policy similarity. We further incorporate the PSM into the PSRO framework as a regularization term, effectively fostering a more diverse policy population. We demonstrate the effectiveness of our method in two distinct game environments: a non-transitive mixture model and Leduc poker. Experimental results show that the PSM-augmented PSRO reduces exploitability by approximately 7% relative to baseline methods and exhibits greater policy diversity in visual analysis. Ablation studies further validate the benefit of combining Gaussian and cosine similarities in cultivating more diverse policy sets. This work provides a valuable method for measuring and improving policy diversity in two-player zero-sum games.
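The abstract describes PSM as a combination of Gaussian and cosine similarity, incorporated into PSRO as a regularization term. The record does not give the exact formulation, so the sketch below is only illustrative: the policy representation (a payoff or feature vector per policy), the kernel bandwidth `sigma`, the mixing weight `alpha`, and the function names `policy_similarity` and `similarity_penalty` are all assumptions for demonstration, not the paper's definitions.

```python
import numpy as np

def policy_similarity(p, q, sigma=1.0, alpha=0.5):
    """Illustrative sketch of a policy similarity measure (PSM).

    Combines a Gaussian (RBF) similarity with cosine similarity between
    two policy representations p and q. sigma (kernel bandwidth) and
    alpha (mixing weight) are assumed hyperparameters; the paper's exact
    combination rule is not given in this record.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Gaussian similarity: approaches 1 as the two policies coincide.
    gaussian = np.exp(-np.sum((p - q) ** 2) / (2.0 * sigma ** 2))
    # Cosine similarity: measures directional alignment, ignoring magnitude.
    cosine = np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q) + 1e-12)
    return alpha * gaussian + (1.0 - alpha) * cosine

def similarity_penalty(candidate, population, **kwargs):
    """Mean PSM between a candidate policy and the current population.

    Subtracting this term (scaled by a regularization coefficient) from
    the best-response objective would push PSRO's oracle toward policies
    dissimilar to those already in the population.
    """
    return float(np.mean([policy_similarity(candidate, p, **kwargs)
                          for p in population]))
```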
Pages: 16