Exploration in Relational Domains for Model-based Reinforcement Learning

Cited by: 0
Authors
Lang, Tobias [1 ]
Toussaint, Marc [1 ]
Kersting, Kristian [2 ]
Affiliations
[1] Free Univ Berlin, Machine Learning & Robot Grp, D-14195 Berlin, Germany
[2] Fraunhofer Inst Intelligent Anal & Informat Syst, Knowledge Discovery Dept, D-53754 St Augustin, Germany
Keywords
reinforcement learning; statistical relational learning; exploration; relational transition models; robotics;
DOI
Not available
Chinese Library Classification
TP [Automation technology; computer technology]
Subject Classification Code
0812
Abstract
A fundamental problem in reinforcement learning is balancing exploration and exploitation. We address this problem in the context of model-based reinforcement learning in large stochastic relational domains by developing relational extensions of the concepts of the E-3 and R-MAX algorithms. Efficient exploration in exponentially large state spaces needs to exploit the generalization of the learned model: what in a propositional setting would be considered a novel situation and worth exploration may in the relational setting be a well-known context in which exploitation is promising. To address this, we introduce relational count functions, which generalize the classical notion of state and action visitation counts. We provide guarantees on the exploration efficiency of our framework using count functions, under the assumption of a relational KWIK learner and a near-optimal planner. We propose a concrete exploration algorithm which integrates a practically efficient probabilistic rule learner and a relational planner (for which, however, no such guarantees exist) and employs the contexts of learned relational rules as features to model the novelty of states and actions. Our results in noisy 3D simulated robot manipulation problems and in domains of the international planning competition demonstrate that our approach is more effective than existing propositional and factored exploration techniques.
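As a rough illustration of the idea behind relational count functions, the sketch below counts experiences per rule context rather than per ground state-action pair, so an R-MAX-style "known" test generalizes over all ground situations covered by the same learned rule. This is only a minimal sketch under assumed interfaces: the covering_rule callback, the known_threshold parameter, and the toy context in the usage example are illustrative assumptions, not the authors' actual implementation.

```python
# Minimal sketch (not the authors' implementation) of a relational count function:
# experiences are counted per learned rule context instead of per ground
# (state, action) pair, so the R-MAX-style "known" test generalizes over all
# ground situations covered by the same rule.

from collections import defaultdict


class RelationalCountFunction:
    """Counts experiences by rule context rather than by ground (state, action)."""

    def __init__(self, covering_rule, known_threshold=5):
        # covering_rule(state, action) -> hashable rule context, or None if no
        # learned rule covers this ground action in this state (assumed interface).
        self.covering_rule = covering_rule
        self.known_threshold = known_threshold
        self.counts = defaultdict(int)

    def update(self, state, action):
        """Record one observed transition for the context of the covering rule."""
        context = self.covering_rule(state, action)
        if context is not None:
            self.counts[context] += 1

    def count(self, state, action):
        """Generalized visitation count, shared by all ground instances of the context."""
        context = self.covering_rule(state, action)
        return self.counts[context] if context is not None else 0

    def is_known(self, state, action):
        """R-MAX-style test: unknown pairs receive optimistic rewards during planning."""
        return self.count(state, action) >= self.known_threshold


if __name__ == "__main__":
    # Toy covering function: the "context" is just the action schema name, so the
    # count generalizes over which objects the action is applied to.
    toy_rule = lambda state, action: action.split("(")[0]
    counter = RelationalCountFunction(toy_rule, known_threshold=2)
    counter.update(frozenset({"on(a, b)"}), "grab(a)")
    counter.update(frozenset({"on(c, d)"}), "grab(c)")
    # grab(e) has never been tried, but its context (the grab schema) is known.
    print(counter.is_known(frozenset({"on(e, f)"}), "grab(e)"))  # True
```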
Pages: 3725-3768
Number of pages: 44
References (59 in total)
[1] [Anonymous]. Introduction to Statistical Relational Learning. 2007.
[2] [Anonymous]. Proceedings of the 21st International Conference on Machine Learning.
[3] Bengio Y. Proceedings of the 26th Annual International Conference on Machine Learning, 2009, p. 41. DOI: 10.1145/1553374.1553380.
[4] Bilgic M. Proceedings of the International Conference on Machine Learning (ICML), 2010.
[5] Blockeel H. Artificial Intelligence, 1998, 101:185.
[6] Boutilier C. Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI), 2001, p. 690.
[7] Brafman R.I., Tennenholtz M. R-MAX - a general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 2003, 3(2):213-231.
[8] Christensen H., 2009, INTERNET ROBOTICS RO.
[9] Cohn D.A., Ghahramani Z., Jordan M.I. Active learning with statistical models. Journal of Artificial Intelligence Research, 1996, 4:129-145.
[10] Croonenborghs T. 20th International Joint Conference on Artificial Intelligence, 2007, p. 726.