Collaborative Q(λ) reinforcement learning algorithm: A promising robot learning framework

Cited by: 0
Authors
Kartoun, U [1 ]
Stern, H [1 ]
Edan, Y [1 ]
Feied, C [1 ]
Handler, J [1 ]
Smith, M [1 ]
Gillam, M [1 ]
Affiliation
[1] Ben Gurion Univ Negev, Dept Ind Engn & Management, IL-84105 Beer Sheva, Israel
Source
PROCEEDINGS OF THE SIXTH IASTED INTERNATIONAL CONFERENCE ON ROBOTICS AND APPLICATIONS | 2005
Keywords
robot simulation; reinforcement learning; navigation
DOI
Not available
Chinese Library Classification (CLC) number
TP24 [Robotics];
Discipline classification code
080202 ; 1405 ;
Abstract
This paper presents the design and implementation of a new reinforcement learning (RL) based algorithm. The proposed algorithm, CQ(λ) (collaborative Q(λ)), allows several learning agents to acquire knowledge from each other. Acquiring knowledge learnt by one agent via collaboration with another accelerates the entire learning system, so learning can be utilized more efficiently. With collaborative learning algorithms, a learning task can be solved significantly faster than if performed by a single agent only; that is, the number of learning episodes required to solve the task is reduced. The proposed algorithm was shown to accelerate learning in a robotic navigation problem. The CQ(λ) algorithm was applied to autonomous mobile robot navigation, where several robot agents serve as learning processes. Robots learned to navigate an 11 x 11 world containing obstacles and boundaries, choosing the optimum path to reach a target. Simulated experiments based on 50 learning episodes showed that using two robots, compared with the Q(λ) algorithm [1, 2], gave an average improvement of 17.02% in the number of learning steps required to reach definite optimality and an average improvement of 32.98% for convergence to near optimality.
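The abstract does not spell out how the agents exchange knowledge, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes tabular Q(λ) learners (Watkins's variant, with traces cut on exploratory moves) and an assumed sharing rule in which, after each episode, every agent adopts the Q-table of the agent that needed the fewest steps. The start cell, target cell, obstacle layout, rewards, and all hyperparameters are hypothetical.

```python
import random

SIZE = 11                                      # 11 x 11 world, as in the abstract
START, GOAL = (0, 0), (10, 10)                 # hypothetical start and target cells
OBSTACLES = {(3, 3), (3, 4), (4, 3), (6, 7), (7, 6), (8, 2)}  # hypothetical layout
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right


def step(state, action):
    """Move one cell; hitting a boundary or obstacle leaves the robot in place."""
    r, c = state[0] + action[0], state[1] + action[1]
    if not (0 <= r < SIZE and 0 <= c < SIZE) or (r, c) in OBSTACLES:
        return state, -1.0, False              # assumed collision penalty
    if (r, c) == GOAL:
        return (r, c), 10.0, True              # assumed target reward
    return (r, c), -0.1, False                 # assumed per-step cost


def new_q():
    return {(r, c): [0.0] * len(ACTIONS) for r in range(SIZE) for c in range(SIZE)}


def greedy(q, state, rng):
    best = max(q[state])
    return rng.choice([a for a, v in enumerate(q[state]) if v == best])


def run_episode(q, rng, alpha=0.1, gamma=0.95, lam=0.8, eps=0.1, max_steps=1000):
    """Run one Q(lambda) episode in place on q; return the number of steps taken."""
    traces = {}
    state = START
    for t in range(1, max_steps + 1):
        if rng.random() < eps:
            action = rng.randrange(len(ACTIONS))
        else:
            action = greedy(q, state, rng)
        if q[state][action] < max(q[state]):
            traces.clear()                     # exploratory move: cut the traces
        nxt, reward, done = step(state, ACTIONS[action])
        target = reward if done else reward + gamma * max(q[nxt])
        delta = target - q[state][action]
        traces[(state, action)] = traces.get((state, action), 0.0) + 1.0
        for (s, a), e in list(traces.items()):
            q[s][a] += alpha * delta * e       # credit recently visited pairs
            traces[(s, a)] = gamma * lam * e   # decay the eligibility trace
        if done:
            return t
        state = nxt
    return max_steps


def train(n_agents=2, episodes=50, seed=0):
    """Train several learners; returns (tables, best step count per episode)."""
    rngs = [random.Random(seed + i) for i in range(n_agents)]
    tables = [new_q() for _ in range(n_agents)]
    history = []
    for _ in range(episodes):
        steps = [run_episode(q, rng) for q, rng in zip(tables, rngs)]
        best = min(range(n_agents), key=steps.__getitem__)
        # Assumed sharing rule: all agents copy the best agent's Q-table.
        tables = [{s: list(v) for s, v in tables[best].items()}
                  for _ in range(n_agents)]
        history.append(min(steps))
    return tables, history
```

Calling `train(n_agents=2, episodes=50)` mirrors the two-robot, 50-episode setting mentioned in the abstract, and `train(n_agents=1, episodes=50)` gives a single-agent baseline for comparison. The numbers this toy setup produces are not comparable to the reported 17.02% and 32.98% improvements.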
Pages: 13-19
Number of pages: 7
Related papers
29 in total
[1] [Anonymous], P IEEE RSJ INT C INT
[2] Bellman R., 1965, DYNAMIC PROGRAMMING, V81
[3] Bhanu B, 2001, IEEE INT CONF ROBOT, P491, DOI 10.1109/ROBOT.2001.932598
[4] BROADBENT R, 2005, P 2005 IEEE INT C RO
[5] Dahmani Y., 2005, Journal of Computer Sciences, V1, P28, DOI 10.3844/jcssp.2005.28.30
[6] EDAN Y, 2004, C ADV INT TECHN APPL
[7] Glorennec P.Y., 2000, P EUR S INT TECHN ES, P14
[8] Guo, MZ; Liu, Y; Malec, J. A new Q-learning algorithm based on the Metropolis criterion. IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2004, 34 (05): 2140-2143
[9] HOWARD A, 1999, THESIS U MELBOURNE D
[10] Kaelbling, LP; Littman, ML; Moore, AW. Reinforcement learning: A survey. JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 1996, 4: 237-285