Multi-Agent Advisor Q-Learning

Cited by: 0

Authors:

Subramanian S.G. ^{[1,2]}

Taylor M.E. ^{[3,4]}

Larson K. ^{[1]}

Crowley M. ^{[1]}

Affiliations:

[1] University of Waterloo, 200 University Ave W, Waterloo, N2L 3G1, ON

[2] Vector Institute, 661 University Ave Suite 710, Toronto, M5G 1M1, ON

[3] University of Alberta, 116 Street and 85 Avenue, Edmonton, T6G 2R3, AB

[4] Alberta Machine Intelligence Institute (Amii), 10065 Jasper Ave, Edmonton, T5J 3B1, AB

Source:

Journal of Artificial Intelligence Research | 2022 / Volume 74

Funding:

Natural Sciences and Engineering Research Council of Canada

Keywords:

Decision making - Fertilizers - Game theory - Heuristic methods - Intelligent agents - Learning algorithms - Multi agent systems - Stochastic systems;

DOI:

10.1613/jair.1.13445

Abstract:

In the last decade, there have been significant advances in multi-agent reinforcement learning (MARL), but numerous challenges, such as high sample complexity and slow convergence to stable policies, still need to be overcome before widespread deployment is possible. However, many real-world environments already deploy sub-optimal or heuristic approaches for generating policies in practice. An interesting question that arises is how best to use such approaches as advisors to help improve reinforcement learning in multi-agent domains. In this paper, we provide a principled framework for incorporating action recommendations from online sub-optimal advisors in multi-agent settings. We describe the problem of ADvising Multiple Intelligent Reinforcement Agents (ADMIRAL) in nonrestrictive general-sum stochastic game environments and present two novel Q-learning-based algorithms: ADMIRAL - Decision Making (ADMIRAL-DM), which improves learning by appropriately incorporating advice from an advisor, and ADMIRAL - Advisor Evaluation (ADMIRAL-AE), which evaluates the effectiveness of an advisor. We analyze the algorithms theoretically and provide fixed-point guarantees regarding their learning in general-sum stochastic games. Furthermore, extensive experiments illustrate that these algorithms can be used in a variety of environments, have performance that compares favourably to other related baselines, can scale to large state-action spaces, and are robust to poor advice from advisors. ©2022 AI Access Foundation. All rights reserved.

Pages: 1-74

Page count: 73
