Multi-agent adaptive dynamic programming

Cited by: 0
Authors
Mukhopadhyay, S [1]
Varghese, J [1]
Affiliation
[1] Indiana Univ Purdue Univ, Dept Comp & Informat Sci, Indianapolis, IN 46202 USA
Source
MICAI 2000: ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS | 2000 / Vol. 1793
Keywords
adaptive dynamic programming; Markov decision process; reinforcement learning; multiple learning agents; knowledge combining;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Dynamic programming offers an exact, general solution method for completely known sequential decision problems, formulated as Markov Decision Processes (MDPs), with a finite number of states. Recently, there has been a great amount of interest in the adaptive version of the problem, where the task to be solved is not completely known a priori. In such a case, an agent has to acquire the necessary knowledge through learning, while simultaneously solving the optimal control or decision problem. A large variety of algorithms, variously known as Adaptive Dynamic Programming (ADP) or Reinforcement Learning (RL), has been proposed in the literature. However, almost invariably such algorithms suffer from slow convergence in terms of the number of experiments needed. In this paper we investigate how the learning speed can be considerably improved by exploiting and combining knowledge accumulated by multiple agents. These agents operate in the same task environment but follow possibly different trajectories. We discuss methods of combining the knowledge structures associated with the multiple agents and different strategies (with varying overheads) for knowledge communication between agents. Results of simulation experiments are also presented to indicate that combining multiple learning agents is a promising direction to improve learning speed. The method also performs significantly better than some of the fastest MDP learning algorithms, such as prioritized sweeping.
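The abstract does not specify which knowledge structures the agents keep or how they are combined. As a rough illustration only, the sketch below assumes each agent stores transition counts and reward sums for a finite MDP, that combining means summing those statistics, and that the pooled model is solved by plain value iteration. Every name here (`AgentModel`, `combine`, `value_iteration`) is hypothetical and is not taken from the paper.

```python
from collections import defaultdict


class AgentModel:
    """Empirical MDP statistics kept by one agent (assumed structure)."""

    def __init__(self):
        self.counts = defaultdict(int)        # (s, a, s_next) -> visit count
        self.reward_sum = defaultdict(float)  # (s, a) -> summed reward

    def observe(self, s, a, r, s_next):
        self.counts[(s, a, s_next)] += 1
        self.reward_sum[(s, a)] += r


def combine(models):
    """Assumed combination rule: sum counts and reward sums across agents."""
    pooled = AgentModel()
    for m in models:
        for key, n in m.counts.items():
            pooled.counts[key] += n
        for key, r in m.reward_sum.items():
            pooled.reward_sum[key] += r
    return pooled


def value_iteration(model, states, actions, gamma=0.9, sweeps=200):
    """Value iteration on the maximum-likelihood MDP implied by the counts."""
    visits = defaultdict(int)
    for (s, a, _), n in model.counts.items():
        visits[(s, a)] += n
    V = {s: 0.0 for s in states}
    for _ in range(sweeps):
        new_V = {}
        for s in states:
            q_values = []
            for a in actions:
                n_sa = visits.get((s, a), 0)
                if n_sa == 0:
                    continue  # untried pair: no estimate available
                r_hat = model.reward_sum.get((s, a), 0.0) / n_sa
                exp_next = sum(
                    model.counts.get((s, a, s2), 0) / n_sa * V[s2]
                    for s2 in states
                )
                q_values.append(r_hat + gamma * exp_next)
            new_V[s] = max(q_values) if q_values else 0.0
        V = new_V
    return V


# Two agents that followed different trajectories in the same toy task.
agent_a, agent_b = AgentModel(), AgentModel()
agent_a.observe(0, "go", 0.0, 1)   # agent A only ever saw state 0
agent_b.observe(1, "go", 1.0, 1)   # agent B only ever saw state 1

states, actions = [0, 1], ["go"]
V_alone = value_iteration(agent_a, states, actions, gamma=0.5)
V_pooled = value_iteration(combine([agent_a, agent_b]), states, actions, gamma=0.5)
```

In this toy case agent A alone cannot value state 0 above zero, because it has no data on state 1; with the pooled statistics the values approach 1 for state 0 and 2 for state 1. Summing counts is only one possible rule, and it ignores the communication overheads the abstract mentions.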
Pages: 574-585
Page count: 12