V-MIN: Efficient Reinforcement Learning through Demonstrations and Relaxed Reward Demands

Cited by: 0

Authors:

Martinez, David [1]

Alenya, Guillem [1]

Torras, Carme [1]

Affiliations:

[1] UPC, CSIC, Inst Robot & Informat Ind, C Llorens & Artigas 4-6, Barcelona 08028, Spain

Source:

PROCEEDINGS OF THE TWENTY-NINTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE | 2015

Keywords:

MODELS;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Reinforcement learning (RL) is a common paradigm for learning tasks in robotics. However, a lot of exploration is usually required, making RL too slow for high-level tasks. We present V-MIN, an algorithm that integrates teacher demonstrations with RL to learn complex tasks faster. The algorithm combines active demonstration requests and autonomous exploration to find policies yielding rewards higher than a given threshold V-min. This threshold sets the degree of quality with which the robot is expected to complete the task, thus allowing the user to either opt for very good policies that require many learning experiences, or to be more permissive with sub-optimal policies that are easier to learn. The threshold can also be increased online to force the system to improve its policies until the desired behavior is obtained. Furthermore, the algorithm generalizes previously learned knowledge, adapting well to changes. The performance of V-MIN has been validated through experimentation, including domains from the International Planning Competition. Our approach achieves the desired behavior where previous algorithms failed.
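
The role of the threshold described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `Estimate` type, the `next_step` function, and the three step labels are hypothetical names introduced only for illustration, and the rule shown (compare the planner's expected value against V-min, and ask the teacher when it falls short) is an assumption based on the abstract's wording.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    """Hypothetical planner output for the current state (assumed, not from the paper)."""
    value: float    # expected return of the best plan under the learned model
    explores: bool  # True if that plan relies on actions that are still poorly known


def next_step(estimate: Estimate, v_min: float) -> str:
    """Pick the next kind of step, assuming a simple threshold rule on V-min."""
    if estimate.value >= v_min:
        # The plan is expected to meet the quality threshold: act autonomously.
        return "explore" if estimate.explores else "exploit"
    # No plan is expected to reach V-min: ask the teacher for a demonstration.
    return "request_demonstration"


# The same estimate leads to a demonstration request once the threshold is raised.
plan = Estimate(value=0.9, explores=False)
print(next_step(plan, v_min=0.8))
print(next_step(plan, v_min=0.95))
```

Under this assumed rule, raising `v_min` online turns a previously acceptable plan into a demonstration request, which is one way to read the abstract's statement that increasing the threshold forces the system to improve its policies.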

Pages: 2857-2863

Page count: 7
