A hybrid agent architecture integrating desire, intention and reinforcement learning

Cited by: 17

Authors:

Tan, Ah-Hwee [1]

Ong, Yew-Soon [1]

Tapanuj, Akejariyawong [1]

Affiliation:

[1] Nanyang Technol Univ, Sch Comp Engn, Singapore 639798, Singapore

Source:

EXPERT SYSTEMS WITH APPLICATIONS | 2011 / Vol. 38 / Issue 07

Funding:

National Research Foundation, Singapore;

Keywords:

BDI architecture; Reinforcement learning; Plan learning; Self-organizing neural networks; Minefield navigation; COGNITION;

DOI:

10.1016/j.eswa.2011.01.045

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper presents a hybrid agent architecture that integrates the behaviours of BDI agents, specifically desire and intention, with a neural network based reinforcement learner known as Temporal Difference-Fusion Architecture for Learning and COgNition (TD-FALCON). With the explicit maintenance of goals, the agent performs reinforcement learning with the awareness of its objectives instead of relying on external reinforcement signals. More importantly, the intention module equips the hybrid architecture with deliberative planning capabilities, enabling the agent to purposefully maintain an agenda of actions to perform and reducing the need for constantly sensing the environment. Through reinforcement learning, plans can also be learned and evaluated without the rigidity of user-defined plans as used in traditional BDI systems. For intention and reinforcement learning to work cooperatively, two strategies are presented for combining the intention module and the reactive learning module for decision making in a real time environment. Our case study based on a minefield navigation domain investigates how the desire and intention modules may cooperatively enhance the capability of a pure reinforcement learner. The empirical results show that the hybrid architecture is able to learn plans efficiently and tap both intentional and reactive action execution to yield a robust performance. (C) 2011 Elsevier Ltd. All rights reserved.
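The abstract describes an agent that executes a committed agenda of actions (its intention) while a reinforcement learner supplies reactive choices, with rewards derived from the agent's own goals. The following is a minimal Python sketch of one possible arbitration, under loudly stated assumptions: a tabular Q-table stands in for TD-FALCON's self-organizing neural network; every name (`HybridAgent`, `adopt_plan`, `drop_intention`, `act`, `learn`, `internal_reward`) is hypothetical; the bounded update Q ← Q + α·δ·(1 − Q) and the distance-based goal reward are assumptions, not details taken from this record; and the paper's two actual combination strategies are not reproduced here.

```python
# Minimal sketch (hypothetical names throughout): an intention agenda
# arbitrated against a reactive reinforcement learner. A tabular Q-table
# stands in for TD-FALCON's self-organizing neural network (assumption).
import random
from collections import deque


def internal_reward(position, goal, size):
    """Hypothetical goal-derived reward on a size x size grid: 1.0 at the
    goal, falling linearly with Manhattan distance to 0.0 at the far corner."""
    dist = abs(position[0] - goal[0]) + abs(position[1] - goal[1])
    return 1.0 - dist / (2 * (size - 1))


class HybridAgent:
    def __init__(self, actions, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.alpha = alpha          # learning rate
        self.gamma = gamma          # discount factor
        self.epsilon = epsilon      # exploration rate for reactive choices
        self.q = {}                 # (state, action) -> estimated value
        self.agenda = deque()       # committed actions still to execute
        self.rng = random.Random(seed)

    def value(self, state, action):
        # Assumption: unseen (state, action) pairs start at a neutral 0.5.
        return self.q.get((state, action), 0.5)

    def adopt_plan(self, plan):
        """Commit to a sequence of actions (an intention)."""
        self.agenda = deque(plan)

    def drop_intention(self):
        """Abandon the current plan, e.g. when it is no longer valid."""
        self.agenda.clear()

    def reactive_choice(self, state):
        """Epsilon-greedy choice from the learned values."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.value(state, a))

    def act(self, state):
        """Execute the next intended action if the agenda is non-empty;
        otherwise fall back to the reactive learner."""
        if self.agenda:
            return self.agenda.popleft(), "intention"
        return self.reactive_choice(state), "reactive"

    def learn(self, state, action, reward, next_state, done):
        """Q-learning update scaled by (1 - q), an assumed bounded variant
        that shrinks updates as the estimate approaches 1."""
        q = self.value(state, action)
        if done:
            target = reward
        else:
            target = reward + self.gamma * max(
                self.value(next_state, a) for a in self.actions)
        td_error = target - q
        self.q[(state, action)] = q + self.alpha * td_error * (1.0 - q)


if __name__ == "__main__":
    agent = HybridAgent(["N", "S", "E", "W"], epsilon=0.0)
    agent.adopt_plan(["E", "E"])
    print(agent.act((0, 0)))  # ('E', 'intention')
    print(agent.act((1, 0)))  # ('E', 'intention')
    print(agent.act((2, 0)))  # ('N', 'reactive'): agenda empty, values tied
```

Keeping the agenda separate from the value table lets a committed plan run without querying the learner at every step, which loosely mirrors the abstract's point about reducing the need for constant sensing. When a plan should be interrupted in favour of a reactive choice is exactly the policy question that the paper's two combination strategies address; this sketch simply finishes the agenda first.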


Pages: 8477-8487

Number of pages: 11

References: 31

