Distributed reinforcement learning for sequential decision making

Times cited: 0

Authors:

Rogova, G [1]

Scott, P [1]

Lolett, C [1]

Affiliation:

[1] Encompass Consulting, Ctr Multisource Informat Fus, Honeoye Falls, NY USA

Source:

PROCEEDINGS OF THE FIFTH INTERNATIONAL CONFERENCE ON INFORMATION FUSION, VOL II | 2002

Keywords:

distributed systems; reinforcement learning; neural network; evidence theory; pignistic likelihood ratios test; profit sharing strategy;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial Intelligence Theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The paper addresses the problem of reinforcement learning in a homogeneous, non-communicating multi-agent system for sequential decision making. We introduce a particular reinforcement learning model composed of evidential reinforcement neural networks representing agents, a fusion center, and a decision maker. The fusion center combines the beliefs that the agents generate for each hypothesis under consideration and produces pignistic probabilities of those hypotheses. These pignistic probabilities are used by the decision maker in a sequential pignistic probability ratio test to choose one of two actions: "defer decision" or "decide hypothesis k". The test is shaped to encourage early decisions and incorporates a finite decision deadline. Upon each decision, a non-binary reinforcement signal is computed by the environment and is then fed back to the agents, which use it to learn an optimizing belief function. The learning algorithm adapts the "profit sharing strategy" to the sequential decision making setting.
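The fusion and decision steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state the combination rule or the shape of the test, so the use of Dempster's rule, the geometrically shrinking threshold, and the names `combine_dempster`, `pignistic`, `sequential_decision`, `threshold`, `decay` and `deadline` are all assumptions made for illustration. Only the pignistic transform itself follows its standard definition.

```python
from itertools import product


def combine_dempster(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule.

    Assumption: the abstract does not say which combination rule is used.
    """
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass assigned to the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: masses cannot be combined")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


def pignistic(mass, frame):
    """Pignistic transform: split each focal element's mass equally among its members."""
    betp = {h: 0.0 for h in frame}
    for focal, w in mass.items():
        for h in focal:
            betp[h] += w / len(focal)
    return betp


def sequential_decision(mass_stream, frame, threshold=3.0, decay=0.8, deadline=5):
    """Return (hypothesis, step) from a stream of per-step lists of agent masses.

    Hypothetical test shape: decide when the ratio of the two largest pignistic
    probabilities reaches a threshold that shrinks at every step, and force a
    decision at the deadline. Otherwise the action is "defer decision".
    """
    betp, t = None, 0
    for t, agent_masses in enumerate(mass_stream, start=1):
        fused = agent_masses[0]
        for m in agent_masses[1:]:
            fused = combine_dempster(fused, m)  # fusion center
        betp = pignistic(fused, frame)
        ranked = sorted(betp, key=betp.get, reverse=True)
        ratio = betp[ranked[0]] / max(betp[ranked[1]], 1e-12)
        if ratio >= threshold * decay ** (t - 1) or t >= deadline:
            return ranked[0], t  # "decide hypothesis k"
    if betp is None:
        raise ValueError("empty stream: nothing to decide on")
    return max(betp, key=betp.get), t  # stream ended before the deadline


if __name__ == "__main__":
    frame = ("h1", "h2")
    theta = frozenset(frame)
    stream = [
        [{theta: 1.0}],                           # no evidence yet: defer
        [{frozenset({"h1"}): 0.8, theta: 0.2}],   # strong support for h1
    ]
    print(sequential_decision(stream, frame))
```

In this sketch the reinforcement signal and the profit-sharing update of the agents' networks are left out, since the abstract gives no detail on how they are computed.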

References

Pages: 1263-1268

Page count: 6

12 entries in total

[1] A SEQUENTIAL PROCEDURE FOR MULTIHYPOTHESIS TESTING
BAUM, CW
VEERAVALLI, VV
[J]. IEEE TRANSACTIONS ON INFORMATION THEORY, 1994, 40 (06) : 1994 - 2007
[2] DISTRIBUTED PROBLEM-SOLVING TECHNIQUES - A SURVEY
DECKER, KS
[J]. IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS, 1987, 17 (05) : 729 - 740
[3] Multihypothesis sequential probability ratio tests - Part I: Asymptotic optimality
Dragalin, VP
Tartakovsky, AG
Veeravalli, VV
[J]. IEEE TRANSACTIONS ON INFORMATION THEORY, 1999, 45 (07) : 2448 - 2461
[4] Fu K. S., 1968, SEQUENTIAL METHODS P, V240, P241
[5] MENON R, 1996, IEEE INT C IM PROC S
[6] ROGOVA G, 2001, P FUSION 2001 4 C MU
[7] Rogova GL, 1998, FUSION'98: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON MULTISOURCE-MULTISENSOR INFORMATION FUSION, VOLS 1 AND 2, P191
[8] SEN S, 1995, P IJCAI 95 WORKSH, P219
[9] SHAW MJ, 1991, P 24 ANN HAW INT C S, V4, P13
[10] SIAN SS, 1991, ADAPTATION BASED COO, P257
