Decentralized multi-agent reinforcement learning based on best-response policies

Cited by: 2
Authors
Gabler, Volker [1 ]
Wollherr, Dirk [1 ]
Affiliations
[1] Tech Univ Munich, TUM Sch Computat Informat & Technol, Chair Automat Control Engn, Munich, Germany
Funding
European Union's Horizon 2020;
关键词
multi-agent reinforcement learning; game theory; deep learning; artificial intelligence; actor-critic algorithm; multi-agent; Stackelberg; decentralized learning schemes; reinforcement learning; LEVEL;
DOI
10.3389/frobt.2024.1229026
CLC number
TP24 [Robotics];
Subject classification codes
080202; 1405;
Abstract
Introduction: Multi-agent systems form an interdisciplinary research field concerned with multiple decision-making individuals interacting with a usually partially observable environment. Given the recent advances in single-agent reinforcement learning (RL), multi-agent RL (MARL) has gained tremendous interest in recent years. Most research studies apply a fully centralized learning scheme to ease the transfer from the single-agent domain to multi-agent systems.
Methods: In contrast, we claim that a decentralized learning scheme is preferable for real-world applications, as it allows deploying a learning algorithm on an individual robot rather than on a complete fleet of robots. This article therefore outlines a novel actor-critic (AC) approach tailored to cooperative MARL problems in sparsely rewarded domains. Our approach decouples the MARL problem into a set of distributed agents that model the other agents as responsive entities. In particular, we propose using two separate critics per agent to distinguish between the joint task reward and the agent-based costs commonly applied in multi-robot planning. On the one hand, the agent-based critic drives down agent-specific costs; on the other hand, each agent optimizes the joint team reward via the joint task critic. Since this critic still depends on the joint action of all agents, we outline two suitable behavior models based on Stackelberg games: a game against nature and a dyadic game against each agent. Under these behavior models, our algorithm allows fully decentralized training and execution.
Results and Discussion: We evaluate the presented method with both behavior models in a sparsely rewarded simulated multi-agent environment. Our approach already outperforms state-of-the-art learners; we conclude by outlining possible extensions of the algorithm that future research may build upon.
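As a reading aid, the per-agent two-critic actor update described in the abstract can be illustrated in code. The following is a minimal, hypothetical Python/PyTorch sketch, not the authors' implementation: all module names, network shapes, and the way the other agents' actions enter the update (here in the "game against nature" style, treating them as an exogenous sample from some behavior model) are assumptions made purely for illustration.

import torch
import torch.nn as nn

class Agent(nn.Module):
    """Illustrative sketch (assumption, not the paper's code): one agent with
    a local policy plus two critics, one for the joint task reward and one
    for the agent-specific costs."""

    def __init__(self, obs_dim, act_dim, joint_act_dim, hidden=64):
        super().__init__()
        # Policy: maps the agent's local observation to its own action.
        self.actor = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh())
        # Critic 1: joint task reward, conditioned on the joint action.
        self.task_critic = nn.Sequential(
            nn.Linear(obs_dim + joint_act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))
        # Critic 2: agent-specific costs, conditioned on own action only.
        self.cost_critic = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def actor_loss(self, obs, others_actions):
        # "Game against nature": the other agents' actions are treated as an
        # exogenous sample supplied by a behavior model (an assumption here);
        # others_actions must have dimension joint_act_dim - act_dim.
        own_action = self.actor(obs)
        joint_action = torch.cat([own_action, others_actions], dim=-1)
        q_task = self.task_critic(torch.cat([obs, joint_action], dim=-1))
        q_cost = self.cost_critic(torch.cat([obs, own_action], dim=-1))
        # Maximize the joint task value while minimizing agent-specific cost.
        return -(q_task - q_cost).mean()

In such a scheme, each agent would run an otherwise standard actor-critic loop on its own experience using this loss, which is what allows training and execution to remain fully decentralized.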
Pages: 16