Efficient and Safe Exploration in Deterministic Markov Decision Processes with Unknown Transition Models

Cited by: 9
Authors
Biyik, Erdem [1]
Margoliash, Jonathan [2, 3]
Alimo, Shahrouz Ryan [2]
Sadigh, Dorsa [1, 4]
Affiliations
[1] Stanford Univ, Elect Engn, Stanford, CA 94305 USA
[2] CALTECH, Jet Prop Lab, Pasadena, CA 91109 USA
[3] UC San Diego Jacobs Sch Engn, La Jolla, CA 92093 USA
[4] Stanford Univ, Comp Sci, Stanford, CA 94305 USA
Source
2019 AMERICAN CONTROL CONFERENCE (ACC) | 2019
Funding
National Aeronautics and Space Administration (NASA);
Keywords
STATE;
DOI
10.23919/acc.2019.8815276
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
We propose a safe exploration algorithm for deterministic Markov Decision Processes with unknown transition models. Our algorithm guarantees safety by leveraging Lipschitz-continuity to ensure that no unsafe states are visited during exploration. Unlike many other existing techniques, the provided safety guarantee is deterministic. Our algorithm is optimized to reduce the number of actions needed for exploring the safe space. We demonstrate the performance of our algorithm in comparison with baseline methods in simulation on navigation tasks.
Pages: 1792-1799
Page count: 8
Related papers
25 in total
[1]  
Abbeel P., 2005, P 22 INT C MACHINE L, P1
[2]   Autonomous Helicopter Aerobatics through Apprenticeship Learning [J].
Abbeel, Pieter ;
Coates, Adam ;
Ng, Andrew Y. .
INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH, 2010, 29 (13) :1608-1639
[3]  
Akametalu AK, 2014, IEEE DECIS CONTR P, P1424, DOI 10.1109/CDC.2014.7039601
[4]  
Alimo Shahrouz Ryan, 2017, 2017 IEEE 56th Annual Conference on Decision and Control (CDC), P2531, DOI 10.1109/CDC.2017.8264025
[5]   Control Barrier Function Based Quadratic Programs for Safety Critical Systems [J].
Ames, Aaron D. ;
Xu, Xiangru ;
Grizzle, Jessy W. ;
Tabuada, Paulo .
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2017, 62 (08) :3861-3876
[6]   A survey of robot learning from demonstration [J].
Argall, Brenna D. ;
Chernova, Sonia ;
Veloso, Manuela ;
Browning, Brett .
ROBOTICS AND AUTONOMOUS SYSTEMS, 2009, 57 (05) :469-483
[7]   Provably safe and robust learning-based model predictive control [J].
Aswani, Anil ;
Gonzalez, Humberto ;
Sastry, S. Shankar ;
Tomlin, Claire .
AUTOMATICA, 2013, 49 (05) :1216-1226
[8]  
Berkenkamp F., 2016, arXiv
[9]   Q-learning for risk-sensitive control [J].
Borkar, VS .
MATHEMATICS OF OPERATIONS RESEARCH, 2002, 27 (02) :294-311
[10]   Risk-sensitive and minimax control of discrete-time, finite-state Markov decision processes [J].
Coraluppi, SP ;
Marcus, SI .
AUTOMATICA, 1999, 35 (02) :301-309