Decentralized Learning for Optimality in Stochastic Dynamic Teams and Games With Local Control and Global State Information

Cited by: 17
Authors
Yongacoglu, Bora [1 ]
Arslan, Gurdal [2 ]
Yuksel, Serdar [1 ]
Affiliations
[1] Queens Univ, Dept Math & Stat, Kingston, ON K7L 3N6, Canada
[2] Univ Hawaii, Dept Elect Engn, Honolulu, HI 96822 USA
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
Games; Stochastic processes; Costs; Convergence; Heuristic algorithms; Reinforcement learning; Q-factor; Cooperative control; game theory; machine learning; stochastic games; stochastic optimal control; APPROXIMATION;
DOI
10.1109/TAC.2021.3121228
CLC Classification
TP [Automation Technology; Computer Technology];
Discipline Code
0812;
Abstract
Stochastic dynamic teams and games are rich models for decentralized systems and challenging testing grounds for multiagent learning. Previous work guaranteeing team optimality assumed stateless dynamics, relied on an explicit coordination mechanism, or required agents to share their joint controls. In this article, we present an algorithm with guarantees of convergence to team-optimal policies in teams and common-interest games. The algorithm is a two-timescale method that uses a variant of Q-learning on the finer timescale to perform policy evaluation while exploring the policy space on the coarser timescale. Agents following this algorithm are "independent learners": they use only local controls, local cost realizations, and global state information, without access to the controls of other agents. To the best of our knowledge, the results presented here are the first formal guarantees of convergence to team optimality using independent learners in stochastic dynamic teams and common-interest games.
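The two-timescale scheme described in the abstract can be sketched on a toy problem. Everything below — the two-state common-interest game, the step sizes, and the alternating policy-revision rule — is an illustrative assumption, not the paper's exact algorithm: on the finer timescale each agent runs Q-learning using only the global state, its own control, and the common cost; on the coarser timescale baseline policies are revised.

```python
import random

# Hypothetical toy setup: 2 agents, 2 global states, 2 local actions,
# common (shared) cost that is zero iff the agents coordinate.
N_STATES, N_ACTIONS, GAMMA, ALPHA, EPS = 2, 2, 0.9, 0.1, 0.1

def step(state, a0, a1):
    """Toy common-interest dynamics: zero cost iff the agents coordinate."""
    cost = 0.0 if a0 == a1 else 1.0            # common cost, seen by both
    return (state + a0 + a1) % N_STATES, cost

def learn(t_explore=3000, revisions=20, seed=1):
    rng = random.Random(seed)
    # Baseline policies: for each agent, a map state -> local action.
    pols = [[rng.randrange(N_ACTIONS) for _ in range(N_STATES)]
            for _ in range(2)]
    for k in range(revisions):                 # coarse timescale
        Q = [[[0.0] * N_ACTIONS for _ in range(N_STATES)] for _ in range(2)]
        s = 0
        for _ in range(t_explore):             # fine timescale: Q-learning
            acts = [a if rng.random() > EPS else rng.randrange(N_ACTIONS)
                    for a in (pols[0][s], pols[1][s])]
            s2, c = step(s, *acts)
            for i in range(2):                 # each agent updates only from
                q = Q[i][s]                    # its own action + common cost
                q[acts[i]] += ALPHA * (c + GAMMA * min(Q[i][s2]) - q[acts[i]])
            s = s2
        i = k % 2                              # revise one agent per round
        for st in range(N_STATES):             # greedy (cost-minimizing) act
            pols[i][st] = min(range(N_ACTIONS), key=lambda a: Q[i][st][a])
    return pols

pols = learn()
# Roll out the learned baseline policies and measure the average cost.
s, total = 0, 0.0
for _ in range(100):
    s, c = step(s, pols[0][s], pols[1][s])
    total += c
print(total / 100)
```

On this toy instance the rollout cost should fall well below the expected cost of uncoordinated play; the alternating revision rule used here is a simplification of the paper's randomized, inertia-based policy search.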
Pages: 5230-5245
Page count: 16