Decentralized Learning for Optimality in Stochastic Dynamic Teams and Games With Local Control and Global State Information

Cited by: 17
Authors
Yongacoglu, Bora [1 ]
Arslan, Gurdal [2 ]
Yuksel, Serdar [1 ]
Affiliations
[1] Queens Univ, Dept Math & Stat, Kingston, ON K7L 3N6, Canada
[2] Univ Hawaii, Dept Elect Engn, Honolulu, HI 96822 USA
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Games; Stochastic processes; Costs; Convergence; Heuristic algorithms; Reinforcement learning; Q-factor; Cooperative control; game theory; machine learning; stochastic games; stochastic optimal control; APPROXIMATION;
DOI
10.1109/TAC.2021.3121228
CLC Number
TP [Automation Technology; Computer Technology];
Subject Classification Code
0812;
Abstract
Stochastic dynamic teams and games are rich models for decentralized systems and challenging testing grounds for multiagent learning. Previous work that guaranteed team optimality assumed stateless dynamics, an explicit coordination mechanism, or the sharing of joint controls among agents. In this article, we present an algorithm with guarantees of convergence to team optimal policies in teams and common interest games. The algorithm is a two-timescale method that uses a variant of Q-learning on the finer timescale to perform policy evaluation while exploring the policy space on the coarser timescale. Agents following this algorithm are "independent learners": they use only local controls, local cost realizations, and global state information, without access to the controls of other agents. The results presented here are, to the best of our knowledge, the first to give formal guarantees of convergence to team optimality using independent learners in stochastic dynamic teams and common interest games.
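The abstract describes a two-timescale scheme: a Q-learning variant performs policy evaluation on the fine timescale, while each agent searches its own policy space on the coarse timescale, using only its local controls, local cost realizations, and the global state. The Python sketch below illustrates that structure only; the class name, the parameter values (discount beta, exploration rate rho, tolerance, inertia), and the decreasing step size are assumptions made for illustration, not the paper's exact algorithm or constants.

# Illustrative sketch of one independent learner in the spirit of the abstract.
# Fine timescale: Q-factor updates from local costs and the global state.
# Coarse timescale: occasional policy switches toward near-best replies.
# All parameter names and values below are assumptions, not from the paper.
import numpy as np

class IndependentLearner:
    def __init__(self, n_states, n_actions, beta=0.9, rho=0.2,
                 tolerance=0.05, inertia=0.8, rng=None):
        self.nS, self.nA = n_states, n_actions
        self.beta = beta            # discount factor (assumed value)
        self.rho = rho              # exploration probability (assumed value)
        self.tolerance = tolerance  # near-optimality tolerance (assumed value)
        self.inertia = inertia      # prob. of keeping the current policy (assumed)
        self.rng = rng or np.random.default_rng()
        self.Q = np.zeros((n_states, n_actions))
        self.visits = np.zeros((n_states, n_actions))
        # Baseline policy: deterministic map from global state to local control.
        self.policy = self.rng.integers(n_actions, size=n_states)

    def act(self, state):
        # Fine timescale: mostly follow the baseline policy, explore with prob. rho.
        if self.rng.random() < self.rho:
            return int(self.rng.integers(self.nA))
        return int(self.policy[state])

    def learn(self, s, a, cost, s_next):
        # Q-factor update using only the local control, the local cost
        # realization, and the global state (costs are minimized, hence min()).
        self.visits[s, a] += 1
        alpha = 1.0 / self.visits[s, a]  # decreasing step size (assumed schedule)
        target = cost + self.beta * self.Q[s_next].min()
        self.Q[s, a] += alpha * (target - self.Q[s, a])

    def update_policy(self):
        # Coarse timescale: if the baseline policy is not a near-best reply to
        # the learned Q-factors, switch (subject to inertia) to one that is.
        for s in range(self.nS):
            if self.Q[s, self.policy[s]] > self.Q[s].min() + self.tolerance:
                if self.rng.random() > self.inertia:
                    near_best = np.flatnonzero(
                        self.Q[s] <= self.Q[s].min() + self.tolerance)
                    self.policy[s] = int(self.rng.choice(near_best))

In a training loop, each agent would call act() and learn() at every stage and call update_policy() only at the end of a long exploration phase; the paper's actual algorithm additionally coordinates the phase lengths and the randomized policy updates so that the agents' baseline policies jointly converge to a team optimal policy.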
Pages: 5230-5245
Number of pages: 16