An actor-critic algorithm for multi-agent learning in queue-based stochastic games

Cited by: 9
Authors
Sundar, D. Krishna [1]
Ravikumar, K. [1]
Affiliations
[1] Indian Inst Management Bangalore, Bangalore 560076, Karnataka, India
Keywords
Service markets; Queues; Dynamic pricing; Stochastic games; Learning in games; Reinforcement learning
DOI
10.1016/j.neucom.2013.07.020
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We consider state-dependent pricing in a two-player service-market stochastic game in which the state of the game and its transition dynamics are modeled using a semi-Markovian queue. We propose a multi-time-scale, actor-critic-based reinforcement learning algorithm for multi-agent learning under self-play and provide experimental results on Nash convergence. © 2013 Elsevier B.V. All rights reserved.
Pages: 258-265
Page count: 8
Related papers
23 in total
  • [1] A Multiagent Reinforcement Learning Algorithm with Non-linear Dynamics
    Abdallah, Sherief
    Lesser, Victor
    [J]. JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2008, 33 : 521 - 549
  • [2] Akchurina N., 2009, Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems - Volume 2, AAMAS '09, (Richland, SC), V2, P725
  • [3] [Anonymous], 1994, P 11 INT C INT C MAC
  • [4] [Anonymous], 2000, UAI
  • [5] Neuronlike adaptive elements that can solve difficult learning control problems
    Barto, AG
    Sutton, RS
    Anderson, CW
    [J]. IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS, 1983, 13 (05) : 834 - 846
  • [6] Borkar V., 2002, ADV COMPLEX SYST, V5, P55
  • [7] Bowling M., 2001, P INT C MACH LEARN, P27
  • [8] A comprehensive survey of multiagent reinforcement learning
    Busoniu, Lucian
    Babuska, Robert
    De Schutter, Bart
    [J]. IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART C-APPLICATIONS AND REVIEWS, 2008, 38 (02) : 156 - 172
  • [9] Uncoupled dynamics do not lead to Nash equilibrium
    Hart, S
    Mas-Colell, A
    [J]. AMERICAN ECONOMIC REVIEW, 2003, 93 (05) : 1830 - 1836
  • [10] Junling Hu, 1998, Machine Learning. Proceedings of the Fifteenth International Conference (ICML'98), P242