PPO-ACT: Proximal policy optimization with adversarial curriculum transfer for spatial public goods games

Times Cited: 0
Authors
Yang, Zhaoqilin [1 ,2 ]
Li, Chanchan [3 ]
Wang, Xin [4 ]
Tian, Youliang [2 ,5 ]
Affiliations
[1] Guizhou Univ, Coll Comp Sci & Technol, State Key Lab Publ Big Data, Guiyang 550025, Guizhou, Peoples R China
[2] Guizhou Univ, Inst Cryptog & Data Secur, Guiyang 550025, Guizhou, Peoples R China
[3] Guizhou Univ, Coll Math & Stat, State Key Lab Publ Big Data, Guiyang 550025, Guizhou, Peoples R China
[4] Beijing Jiaotong Univ, Sch Math & Stat, Beijing 100044, Peoples R China
[5] Guizhou Univ, Coll Big Data & Informat Engn, State Key Lab Publ Big Data, Guiyang 550025, Guizhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Public goods game; Deep reinforcement learning; Proximal policy optimization; Adversarial curriculum transfer; EVOLUTIONARY GAMES; DYNAMICS;
DOI
10.1016/j.chaos.2025.116762
Chinese Library Classification
O1 [Mathematics];
Discipline Code
0701; 070101;
Abstract
This study investigates cooperation evolution mechanisms in the spatial public goods game. A novel deep reinforcement learning framework, Proximal Policy Optimization with Adversarial Curriculum Transfer (PPO-ACT), is proposed to model agent strategy optimization in dynamic environments. Traditional evolutionary game models often exhibit limitations in modeling long-term decision-making processes: imitation-based rules (e.g., Fermi) lack strategic foresight, while tabular methods (e.g., Q-learning) fail to capture spatial-temporal correlations. Deep reinforcement learning effectively addresses these limitations by bridging policy gradient methods with evolutionary game theory. Our study pioneers the application of proximal policy optimization's continuous strategy optimization capability to public goods games through a two-stage adversarial curriculum transfer training paradigm. The experimental results show that PPO-ACT performs better in critical enhancement-factor regimes. Compared with standard proximal policy optimization, Q-learning, and the Fermi update rule, PPO-ACT achieves earlier cooperation phase transitions and maintains stable cooperative equilibria. The framework also exhibits better robustness in challenging scenarios such as all-defector initial conditions. Systematic comparisons reveal the unique advantage of policy gradient methods in population-scale cooperation, i.e., achieving spatiotemporal payoff coordination through value function propagation. Our work provides a new computational framework for studying cooperation emergence in complex systems, algorithmically validating the "punishment promotes cooperation" hypothesis while offering methodological insights for multi-agent system strategy design.
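The abstract names two mechanisms: PPO's clipped policy-gradient update and a two-stage curriculum that transfers a policy trained under easy conditions to a harder target regime. The sketch below is a minimal illustration of both on a spatial public goods game, assuming PyTorch, a 20x20 periodic lattice with von Neumann groups of five, a shared actor-critic network, and an illustrative curriculum over the enhancement factor r (pre-train at a high r, then transfer to a lower r starting from an all-defector lattice). The network sizes, hyperparameters, payoff normalization, and curriculum schedule are assumptions for illustration, not the paper's actual implementation.

```python
# Minimal sketch (illustrative assumptions throughout, not the paper's code):
# PPO clipped-surrogate updates for lattice agents in a spatial public goods
# game, followed by a two-stage curriculum over the enhancement factor r.
import torch
import torch.nn as nn
from torch.distributions import Categorical

L = 20            # lattice side length (illustrative)
COST = 1.0        # contribution cost of a cooperator
EPS_CLIP = 0.2    # PPO clipping parameter
SHIFTS = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]  # self + von Neumann neighbors

class PolicyValueNet(nn.Module):
    """Shared-body actor-critic over a 5-cell neighborhood observation."""
    def __init__(self, obs_dim=5, hidden=64, n_actions=2):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.pi = nn.Linear(hidden, n_actions)  # action logits: 0=defect, 1=cooperate
        self.v = nn.Linear(hidden, 1)           # state-value head

    def forward(self, obs):
        h = self.body(obs)
        return self.pi(h), self.v(h).squeeze(-1)

def pgg_payoffs(actions, r):
    """Payoff of every agent from the five overlapping groups it joins
    (standard spatial PGG accounting on a periodic square lattice)."""
    a = actions.float().view(L, L)
    # number of cooperators in the group centered at each site
    group_c = sum(torch.roll(a, shifts=s, dims=(0, 1)) for s in SHIFTS)
    # each agent collects the shared pool of the 5 groups it belongs to
    pool = sum(torch.roll(group_c, shifts=s, dims=(0, 1)) for s in SHIFTS)
    payoff = r * COST * pool / 5.0 - 5.0 * COST * a  # cooperators pay 5 costs
    return payoff.view(-1)

def observe(actions):
    """Each agent observes its own and its four neighbors' previous actions."""
    a = actions.float().view(L, L)
    neigh = [torch.roll(a, shifts=s, dims=(0, 1)) for s in SHIFTS]
    return torch.stack(neigh, dim=-1).view(L * L, 5)

def ppo_update(net, opt, obs, actions, old_logp, advantages, returns):
    """One clipped-surrogate PPO step (policy loss + value loss + entropy bonus)."""
    logits, values = net(obs)
    dist = Categorical(logits=logits)
    ratio = torch.exp(dist.log_prob(actions) - old_logp)
    clipped = torch.clamp(ratio, 1.0 - EPS_CLIP, 1.0 + EPS_CLIP)
    policy_loss = -torch.min(ratio * advantages, clipped * advantages).mean()
    value_loss = (returns - values).pow(2).mean()
    loss = policy_loss + 0.5 * value_loss - 0.01 * dist.entropy().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

def train_stage(net, opt, r, steps, actions):
    """Self-play on the lattice at a fixed enhancement factor r."""
    for _ in range(steps):
        obs = observe(actions)               # state: last round's local actions
        with torch.no_grad():
            logits, values = net(obs)
            dist = Categorical(logits=logits)
            actions = dist.sample()
            old_logp = dist.log_prob(actions)
        rewards = pgg_payoffs(actions, r)
        adv = rewards - values               # one-step advantage estimate
        adv = (adv - adv.mean()) / (adv.std() + 1e-8)
        ppo_update(net, opt, obs, actions, old_logp, adv, rewards)
    return actions

# Two-stage curriculum transfer (illustrative schedule): pre-train where
# cooperation is easy (high r), then transfer the same policy to the harder
# target regime, restarting from an all-defector configuration.
net = PolicyValueNet()
opt = torch.optim.Adam(net.parameters(), lr=3e-4)
actions = torch.randint(0, 2, (L * L,))                # stage 1: random lattice
actions = train_stage(net, opt, r=5.0, steps=200, actions=actions)
actions = torch.zeros(L * L, dtype=torch.long)         # stage 2: all defectors
actions = train_stage(net, opt, r=3.5, steps=200, actions=actions)
print("final cooperation level:", actions.float().mean().item())
```

The PPO ingredient is the clipped probability ratio in ppo_update, which bounds how far a single gradient step can move the policy; the curriculum transfer is simply the order of the two train_stage calls, with the second stage reusing the pre-trained network under the harder enhancement factor and the all-defector start the abstract mentions.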
Pages: 12