Multi-agent reinforcement learning as a rehearsal for decentralized planning

Cited by: 238
Authors
Kraemer, Landon [1]
Banerjee, Bikramjit [1]
Affiliation
[1] University of Southern Mississippi, School of Computing, Hattiesburg, MS 39406, USA
Funding
US National Science Foundation
Keywords
Multi-agent reinforcement learning; Decentralized planning
DOI
10.1016/j.neucom.2016.01.031
CLC number
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Decentralized partially observable Markov decision processes (Dec-POMDPs) are a powerful tool for modeling multi-agent planning and decision-making under uncertainty. Prevalent Dec-POMDP solution techniques require centralized computation given full knowledge of the underlying model. Multi-agent reinforcement learning (MARL) based approaches have recently been proposed for distributed solution of Dec-POMDPs without full prior knowledge of the model, but these methods assume that conditions during learning and policy execution are identical. In some practical scenarios this may not be the case. We propose a novel MARL approach in which agents are allowed to rehearse with information that will not be available during policy execution. The key is for the agents to learn policies that do not explicitly rely on these rehearsal features. We also establish a weak convergence result for our algorithm, RLaR, demonstrating that RLaR converges in probability when certain conditions are met. We show experimentally that incorporating rehearsal features can enhance the learning rate compared to non-rehearsal-based learners, and demonstrate fast, (near-)optimal performance on many existing benchmark Dec-POMDP problems. We also compare RLaR against an existing approximate Dec-POMDP solver which, like RLaR, does not assume a priori knowledge of the model. While RLaR's policy representation is not as scalable, we show that RLaR produces higher-quality policies for most problems and horizons studied. © 2016 Elsevier B.V. All rights reserved.
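To make the rehearsal idea in the abstract concrete, below is a minimal illustrative sketch in Python; it is not the authors' actual RLaR algorithm, and all class and method names are hypothetical. It shows a tabular Q-learner whose value table is keyed on the true state (the rehearsal feature, visible only during learning) plus the agent's local action-observation history, and whose execution-time policy marginalizes the state out using empirical visitation counts, so that the executed policy depends on the history alone.

import random
from collections import defaultdict

class RehearsalQLearner:
    """Tabular Q-learner that sees the true state only during learning (a sketch)."""

    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)      # Q[(state, history, action)], rehearsal-augmented
        self.visits = defaultdict(int)   # N[(history, state)] visitation counts

    def explore(self, state, history):
        # Epsilon-greedy action selection over the rehearsal-augmented Q-values.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, history, a)])

    def update(self, state, history, action, reward, next_state, next_history):
        # One-step Q-learning backup; the true state is available here as a
        # rehearsal feature, but will not be available at execution time.
        best_next = max(self.q[(next_state, next_history, a)] for a in self.actions)
        key = (state, history, action)
        self.q[key] += self.alpha * (reward + self.gamma * best_next - self.q[key])
        self.visits[(history, state)] += 1

    def execution_policy(self, history):
        # Marginalize the rehearsal feature out: weight Q-values by the
        # empirical estimate of P(state | history), so the chosen action
        # depends only on the local action-observation history.
        states = [s for (h, s) in self.visits if h == history]
        total = sum(self.visits[(history, s)] for s in states) or 1
        def expected_q(a):
            return sum(self.visits[(history, s)] / total * self.q[(s, history, a)]
                       for s in states)
        return max(self.actions, key=expected_q)

In this sketch a history is assumed to be a hashable tuple of past actions and observations. The algorithm in the paper is richer and comes with a convergence-in-probability guarantee under certain conditions; the sketch conveys only the separation between what the agent may condition on during learning and what the executed policy may use.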
Pages: 82-94 (13 pages)