Stackelberg Actor-Critic: Game-Theoretic Reinforcement Learning Algorithms

Cited by: 0
Authors
Zheng, Liyuan [1 ]
Fiez, Tanner [1 ]
Alumbaugh, Zane [2 ]
Chasnov, Benjamin [1 ]
Ratliff, Lillian J. [1 ]
Affiliations
[1] Univ Washington, Seattle, WA 98195 USA
[2] Univ Calif Santa Cruz, Santa Cruz, CA 95064 USA
Source
THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2022
Keywords
DOI
Not available
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The hierarchical interaction between the actor and critic in actor-critic-based reinforcement learning algorithms naturally lends itself to a game-theoretic interpretation. We adopt this viewpoint and model the actor and critic interaction as a two-player general-sum game with a leader-follower structure known as a Stackelberg game. Given this abstraction, we propose a meta-framework for Stackelberg actor-critic algorithms in which the leader player follows the total derivative of its objective instead of the usual individual gradient. From a theoretical standpoint, we develop a policy gradient theorem for the refined update and provide a local convergence guarantee for Stackelberg actor-critic algorithms to a local Stackelberg equilibrium. From an empirical standpoint, we demonstrate via simple examples that the learning dynamics we study mitigate cycling and accelerate convergence compared to the usual gradient dynamics given cost structures induced by actor-critic formulations. Finally, experiments on OpenAI Gym environments show that Stackelberg actor-critic algorithms always perform at least as well as, and often significantly outperform, their standard actor-critic counterparts.
Pages: 9217-9224
Number of pages: 8
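For intuition, the leader's total-derivative update described in the abstract can be sketched in a few lines. The following is a minimal, hypothetical JAX example on a toy quadratic game, not code from the paper: f1 and f2 are placeholder leader and follower losses (standing in for the critic and actor objectives, whichever player is designated the leader), and the leader corrects its individual gradient by the follower's first-order implicit response obtained from the implicit function theorem.

```python
import jax
import jax.numpy as jnp

# Hypothetical placeholder losses (not from the paper): a simple quadratic
# two-player general-sum game standing in for the leader/follower objectives.
def f1(x, y):
    # leader loss
    return 0.5 * jnp.sum((x - y) ** 2) + 0.1 * jnp.sum(x ** 2)

def f2(x, y):
    # follower loss
    return 0.5 * jnp.sum(y ** 2) - jnp.dot(x, y)

def stackelberg_leader_grad(x, y):
    """Total derivative of f1 with respect to x:
    grad_x f1 - (D_yx f2)^T (D_yy f2)^{-1} grad_y f1,
    i.e. the individual gradient plus the correction from the follower's
    implicit best response dy*/dx = -(D_yy f2)^{-1} D_yx f2."""
    g1x = jax.grad(f1, argnums=0)(x, y)                          # individual leader gradient
    g1y = jax.grad(f1, argnums=1)(x, y)                          # leader's sensitivity to follower
    d_yy = jax.hessian(f2, argnums=1)(x, y)                      # follower Hessian, n x n
    d_yx = jax.jacfwd(jax.grad(f2, argnums=1), argnums=0)(x, y)  # mixed partials, n x m
    implicit = jnp.linalg.solve(d_yy, g1y)                       # (D_yy f2)^{-1} grad_y f1
    return g1x - d_yx.T @ implicit

# One joint step: the leader follows the total derivative, the follower its
# own individual gradient, mirroring the meta-framework sketched in the abstract.
x, y = jnp.ones(3), jnp.zeros(3)
lr = 0.1
x_new = x - lr * stackelberg_leader_grad(x, y)
y_new = y - lr * jax.grad(f2, argnums=1)(x, y)
```

In this sketch the inverse Hessian is never formed explicitly; a direct linear solve suffices for the low-dimensional toy problem, and larger instances would typically use an iterative solve for the same quantity.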