A layered approach to learning coordination knowledge in multiagent environments

Cited by: 13

Authors:

Erus, Guray

Polat, Faruk

Affiliations:

[1] Univ Paris 05, Lab SIP CRIP5, F-75006 Paris, France

[2] Middle E Tech Univ, TR-06531 Ankara, Turkey

Source:

APPLIED INTELLIGENCE | 2007 / Vol. 27 / No. 03

Keywords:

reinforcement learning; hierarchical reinforcement learning; multiagent learning

DOI:

10.1007/s10489-006-0034-y

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Multiagent learning involves the acquisition of cooperative behavior among intelligent agents in order to satisfy joint goals. Reinforcement Learning (RL) is a promising unsupervised machine learning technique inspired by earlier studies in animal learning. In this paper, we propose a new RL technique, the Two Level Reinforcement Learning with Communication (2LRL) method, to provide cooperative action selection in a multiagent environment. In 2LRL, learning takes place at two hierarchical levels: at the first level, agents learn to select their target; at the second level, they select the action directed toward that target. The agents communicate their perceptions to their neighbors and use the communicated information in their decision-making. We applied the 2LRL method in a hunter-prey environment and observed satisfactory cooperative behavior.
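
The two-level idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the state representation, the reward signal, or the update rule, so the sketch assumes plain tabular Q-learning at both levels, a single reward shared by both levels, and perceptions represented as sets of sightings. The names `TwoLevelAgent`, `merge_perceptions`, `select`, and `update` are hypothetical.

```python
import random
from collections import defaultdict


class TwoLevelAgent:
    """Hypothetical two-level tabular Q-learner (illustrative only)."""

    def __init__(self, targets, actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.targets = list(targets)
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)
        # Level 1 table: (state, target) -> value
        self.q_target = defaultdict(float)
        # Level 2 table: ((state, target), action) -> value
        self.q_action = defaultdict(float)

    def merge_perceptions(self, own, neighbor_msgs):
        """Communication step (assumed form): union of own and neighbors' sightings."""
        merged = set(own)
        for msg in neighbor_msgs:
            merged |= set(msg)
        return frozenset(merged)  # hashable, so it can serve as a table key

    def _epsilon_greedy(self, table, key, options):
        # Explore with probability epsilon, otherwise pick the highest-valued option.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(options)
        return max(options, key=lambda o: table[(key, o)])

    def select(self, state):
        """Level 1 picks a target; level 2 picks an action directed at that target."""
        target = self._epsilon_greedy(self.q_target, state, self.targets)
        action = self._epsilon_greedy(self.q_action, (state, target), self.actions)
        return target, action

    def update(self, state, target, action, reward, next_state):
        """Standard Q-learning update applied to each level (an assumption)."""
        # Level 1: value of choosing this target in this state.
        best_t = max(self.q_target[(next_state, t)] for t in self.targets)
        k1 = (state, target)
        self.q_target[k1] += self.alpha * (reward + self.gamma * best_t - self.q_target[k1])
        # Level 2: value of this action given the chosen target.
        best_a = max(self.q_action[((next_state, target), a)] for a in self.actions)
        k2 = ((state, target), action)
        self.q_action[k2] += self.alpha * (reward + self.gamma * best_a - self.q_action[k2])
```

In a hunter-prey setting, each hunter would call `merge_perceptions` with its neighbors' messages to build its state, call `select` to obtain a target and a move, and call `update` once the environment returns a reward. Splitting the decision this way keeps each table smaller than a single table over joint target-action choices, which is one plausible motivation for a layered design.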


Pages: 249-267

Number of pages: 19
