A novel modular Q-learning architecture to improve performance under incomplete learning in a grid soccer game

Cited by: 7
Authors
Araghi, Sahar [1]
Khosravi, Abbas [1]
Johnstone, Michael [1]
Creighton, Douglas [1]
Affiliations
[1] Deakin Univ, Ctr Intelligent Syst Res, Geelong, Vic 3217, Australia
Keywords
Multi-agent systems; Machine learning; Modular reinforcement learning; Q-learning; Reinforcement; Behavior
DOI
10.1016/j.engappai.2013.05.003
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Multi-agent reinforcement learning methods suffer from several deficiencies that are rooted in the large state space of multi-agent environments. This paper tackles two of these deficiencies: a slow learning rate and low-quality decision-making in the early stages of learning. The proposed methods are applied in a grid-world soccer game. In the proposed approach, modular reinforcement learning is applied to reduce the state space of the learning agents from exponential to linear in the number of agents. The modular model proposed here includes two new modules, a partial-module and a single-module, which are effective for increasing the speed of learning in a soccer game. We also apply instance-based learning concepts to choose proper actions in states that were not experienced adequately during learning. The key idea is to use neighbouring states that were explored sufficiently during the learning phase. Experimental results in a grid-soccer game environment show that the proposed methods produce a higher average reward than the same modular structure without them. (C) 2013 Elsevier Ltd. All rights reserved.
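The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the partial-module or single-module, so the code assumes a simple pairwise decomposition for the table-size comparison, treats a state as an `(x, y)` grid cell, and uses a hypothetical visit threshold `min_visits` to decide when a state counts as "not experienced adequately". All class, function, and parameter names are illustrative.

```python
from collections import defaultdict

ACTIONS = ["up", "down", "left", "right", "stay"]  # assumed action set


def joint_table_size(cells, agents):
    """States in one monolithic table: every agent can be in any cell."""
    return cells ** agents


def modular_table_size(cells, agents):
    """Assumed pairwise decomposition: one (self, other) module per other agent."""
    return (agents - 1) * cells ** 2


class NeighbourFallbackQ:
    """Tabular Q-learning that borrows values from well-explored neighbours."""

    def __init__(self, min_visits=5):
        self.q = defaultdict(float)     # (state, action) -> estimated value
        self.visits = defaultdict(int)  # state -> number of updates seen
        self.min_visits = min_visits    # hypothetical "explored enough" threshold

    def update(self, state, action, reward, next_state, alpha=0.1, gamma=0.9):
        # Standard one-step Q-learning update.
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        key = (state, action)
        self.q[key] += alpha * (reward + gamma * best_next - self.q[key])
        self.visits[state] += 1

    def neighbours(self, state):
        # Assumption: neighbours are the four adjacent grid cells.
        x, y = state
        return [(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))]

    def action_values(self, state):
        own = {a: self.q[(state, a)] for a in ACTIONS}
        if self.visits[state] >= self.min_visits:
            return own
        known = [s for s in self.neighbours(state)
                 if self.visits[s] >= self.min_visits]
        if not known:
            return own  # nothing better available, fall back to own estimates
        # Average the Q-values of sufficiently explored neighbouring states.
        return {a: sum(self.q[(s, a)] for s in known) / len(known)
                for a in ACTIONS}

    def greedy_action(self, state):
        values = self.action_values(state)
        return max(values, key=values.get)


agent = NeighbourFallbackQ(min_visits=2)
for _ in range(2):
    agent.update((1, 0), "right", reward=1.0, next_state=(2, 0))
chosen = agent.greedy_action((0, 0))  # (0, 0) itself was never updated
```

Under these assumptions, a 20-cell grid with 4 agents needs 160000 entries in a joint table but only 1200 across the pairwise modules, and the unvisited state `(0, 0)` borrows its action preference from the explored neighbour `(1, 0)`.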
Pages: 2164-2171
Page count: 8