An actor-critic algorithm for multi-agent learning in queue-based stochastic games

Cited by: 9

Authors:

Sundar, D. Krishna [1]

Ravikumar, K. [1]

Affiliations:

[1] Indian Inst Management Bangalore, Bangalore 560076, Karnataka, India

Source:

NEUROCOMPUTING | 2014 / Vol. 127

Keywords:

Service markets; Queues; Dynamic pricing; Stochastic games; Learning in games; Reinforcement learning;

DOI:

10.1016/j.neucom.2013.07.020

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We consider state-dependent pricing in a two-player service market stochastic game in which the state of the game and its transition dynamics are modeled using a semi-Markovian queue. We propose a multi-time-scale, actor-critic-based reinforcement learning algorithm for multi-agent learning under self-play and provide experimental results on Nash convergence. (C) 2013 Elsevier B.V. All rights reserved.


Pages: 258-265

Number of pages: 8
