Multi-Agent Advisor Q-Learning

Cited by: 0
Authors
Subramanian S.G. [1, 2]
Taylor M.E. [3, 4]
Larson K. [1 ]
Crowley M. [1 ]
Affiliations
[1] University of Waterloo, 200 University Ave W, Waterloo, N2L 3G1, ON
[2] Vector Institute, 661 University Ave Suite 710, Toronto, M5G 1M1, ON
[3] University of Alberta, 116 Street and 85 Avenue, Edmonton, T6G 2R3, AB
[4] Alberta Machine Intelligence Institute (Amii), 10065 Jasper Ave, Edmonton, T5J 3B1, AB
Source
Journal of Artificial Intelligence Research | 2022 / Vol. 74
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Decision making - Fertilizers - Game theory - Heuristic methods - Intelligent agents - Learning algorithms - Multi agent systems - Stochastic systems;
DOI
10.1613/jair.1.13445
Abstract
In the last decade, there have been significant advances in multi-agent reinforcement learning (MARL), but numerous challenges, such as high sample complexity and slow convergence to stable policies, must still be overcome before widespread deployment is possible. However, many real-world environments already deploy sub-optimal or heuristic approaches for generating policies in practice. An interesting question that arises is how best to use such approaches as advisors to help improve reinforcement learning in multi-agent domains. In this paper, we provide a principled framework for incorporating action recommendations from online sub-optimal advisors in multi-agent settings. We describe the problem of ADvising Multiple Intelligent Reinforcement Agents (ADMIRAL) in nonrestrictive general-sum stochastic game environments and present two novel Q-learning based algorithms: ADMIRAL - Decision Making (ADMIRAL-DM), which improves learning by appropriately incorporating advice from an advisor, and ADMIRAL - Advisor Evaluation (ADMIRAL-AE), which evaluates the effectiveness of an advisor. We analyze the algorithms theoretically and provide fixed point guarantees regarding their learning in general-sum stochastic games. Furthermore, extensive experiments illustrate that these algorithms can be used in a variety of environments, perform favourably compared with related baselines, can scale to large state-action spaces, and are robust to poor advice from advisors. ©2022 AI Access Foundation. All rights reserved.
Pages: 1 - 74
Number of pages: 73
Related papers
50 in total
  • [1] Multi-Agent Advisor Q-Learning
    Subramanian, Sriram Ganapathi
    Taylor, Matthew E.
    Larson, Kate
    Crowley, Mark
    PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023, : 6884 - 6889
  • [2] Multi-Agent Advisor Q-Learning
    Subramanian, Sriram Ganapathi
    Taylor, Matthew E.
    Larson, Kate
    Crowley, Mark
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2022, 74 : 1 - 74
  • [3] Q-learning in Multi-Agent Cooperation
    Hwang, Kao-Shing
    Chen, Yu-Jen
    Lin, Tzung-Feng
    2008 IEEE WORKSHOP ON ADVANCED ROBOTICS AND ITS SOCIAL IMPACTS, 2008, : 239 - 244
  • [4] Continuous Q-Learning for Multi-Agent Cooperation
    Hwang, Kao-Shing
    Jiang, Wei-Cheng
    Lin, Yu-Hong
    Lai, Li-Hsin
    CYBERNETICS AND SYSTEMS, 2012, 43 (03) : 227 - 256
  • [5] Untangling Braids with Multi-Agent Q-Learning
    Khan, Abdullah
    Vernitski, Alexei
    Lisitsa, Alexei
    2021 23RD INTERNATIONAL SYMPOSIUM ON SYMBOLIC AND NUMERIC ALGORITHMS FOR SCIENTIFIC COMPUTING (SYNASC 2021), 2021, : 135 - 139
  • [6] Q-learning with FCMAC in multi-agent cooperation
    Hwang, Kao-Shing
    Chen, Yu-Jen
    Lin, Tzung-Feng
    ADVANCES IN NEURAL NETWORKS - ISNN 2006, PT 1, 2006, 3971 : 599 - 606
  • [7] A novel multi-agent Q-learning algorithm in cooperative multi-agent system
    Ou, HT
    Zhang, WD
    Zhang, WY
    Xu, XM
    PROCEEDINGS OF THE 3RD WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION, VOLS 1-5, 2000, : 272 - 276
  • [8] Pricing in agent economies using multi-agent Q-learning
    Tesauro, G
    Kephart, JO
    AUTONOMOUS AGENTS AND MULTI-AGENT SYSTEMS, 2002, 5 (03) : 289 - 304
  • [9] Pricing in Agent Economies Using Multi-Agent Q-Learning
    Gerald Tesauro
    Jeffrey O. Kephart
    Autonomous Agents and Multi-Agent Systems, 2002, 5 : 289 - 304
  • [10] Multi-Agent Reinforcement Learning - An Exploration Using Q-Learning
    Graham, Caoimhin
    Bell, David
    Luo, Zhihui
    RESEARCH AND DEVELOPMENT IN INTELLIGENT SYSTEMS XXVI: INCORPORATING APPLICATIONS AND INNOVATIONS IN INTELLIGENT SYSTEMS XVII, 2010, : 293 - 298