Generalized Model Learning for Reinforcement Learning on a Humanoid Robot

Cited by: 44
Authors
Hester, Todd [1]
Quinlan, Michael [1]
Stone, Peter [1]
Affiliations
[1] Univ Texas Austin, Dept Comp Sci, Austin, TX 78712 USA
Source
2010 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA) | 2010
Funding
National Science Foundation (USA)
DOI
10.1109/ROBOT.2010.5509181
CLC Number
TP [Automation technology, computer technology]
Subject Classification Code
0812
Abstract
Reinforcement learning (RL) algorithms have long been promising methods for enabling an autonomous robot to improve its behavior on sequential decision-making tasks. The obvious enticement is that the robot should be able to improve its own behavior without the need for detailed step-by-step programming. However, for RL to reach its full potential, the algorithms must be sample efficient: they must learn competent behavior from very few real-world trials. From this perspective, model-based methods, which use experiential data more efficiently than model-free approaches, are appealing. But they often require exhaustive exploration to learn an accurate model of the domain. In this paper, we present an algorithm, Reinforcement Learning with Decision Trees (RL-DT), that uses decision trees to learn the model by generalizing the relative effect of actions across states. The agent explores the environment until it believes it has a reasonable policy. The combination of the learning approach with the targeted exploration policy enables fast learning of the model. We compare RL-DT against standard model-free and model-based learning methods, and demonstrate its effectiveness on an Aldebaran Nao humanoid robot scoring goals in a penalty kick scenario.
Pages: 2369-2374
Number of pages: 6
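
Illustrative sketch (not from the paper): the abstract describes RL-DT as learning a decision-tree model of the relative effects of actions, then planning over that learned model. The minimal Python sketch below conveys that core idea on an assumed toy chain MDP; the domain, reward, use of scikit-learn's DecisionTreeClassifier, and all parameters are our own illustrative assumptions, not the paper's implementation.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

N_STATES, ACTIONS, GOAL, GAMMA = 10, [-1, +1], 9, 0.95

# Experience from an assumed toy chain MDP: action -1/+1 moves left/right,
# clipped at the boundaries. Features are (state, action); the label is the
# *relative* state change, which generalizes across states (the RL-DT idea).
X, y = [], []
rng = np.random.default_rng(0)
for _ in range(200):
    s = int(rng.integers(N_STATES))
    a = int(rng.choice(ACTIONS))
    s2 = int(np.clip(s + a, 0, N_STATES - 1))
    X.append([s, a])
    y.append(s2 - s)

# One decision tree models the transition effects for all (state, action) pairs.
tree = DecisionTreeClassifier().fit(X, y)

# Plan over the learned model with value iteration (deterministic for simplicity;
# the reward of 1.0 for reaching the goal state is an illustrative assumption).
V = np.zeros(N_STATES)
for _ in range(100):
    for s in range(N_STATES):
        q = []
        for a in ACTIONS:
            ds = int(tree.predict([[s, a]])[0])   # tree-predicted relative effect
            s2 = int(np.clip(s + ds, 0, N_STATES - 1))
            r = 1.0 if s2 == GOAL else 0.0
            q.append(r + GAMMA * V[s2])
        V[s] = max(q)

print(np.round(V, 2))  # values should increase toward the goal state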