Safe Exploration Method for Reinforcement Learning Under Existence of Disturbance

Cited by: 2
Authors
Okawa, Yoshihiro [1 ]
Sasaki, Tomotake [1 ]
Yanami, Hitoshi [1 ]
Namerikawa, Toru [2 ]
Affiliations
[1] Fujitsu Ltd, Artificial Intelligence Lab, Kawasaki, Kanagawa, Japan
[2] Keio Univ, Dept Syst Design Engn, Yokohama, Kanagawa, Japan
Source
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2022, PT IV | 2023 / Vol. 13716
Keywords
Reinforcement learning; Safe exploration; Chance constraint;
DOI
10.1007/978-3-031-26412-2_9
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent rapid developments in reinforcement learning algorithms have opened up novel possibilities in many fields. However, because these algorithms learn by exploring, we must take risk into consideration when applying them to safety-critical problems, especially in real environments. In this study, we deal with a safe exploration problem in reinforcement learning under the existence of disturbance. We define safety during learning as satisfaction of constraint conditions explicitly defined in terms of the state, and propose a safe exploration method that uses partial prior knowledge of the controlled object and the disturbance. The proposed method assures satisfaction of the explicit state constraints with a pre-specified probability even if the controlled object is exposed to a stochastic disturbance following a normal distribution. As theoretical results, we introduce sufficient conditions for constructing conservative inputs, which contain no exploring aspect and are used in the proposed method, and we prove that safety in the above sense is guaranteed by the proposed method. Furthermore, we illustrate the validity and effectiveness of the proposed method through numerical simulations of an inverted pendulum and a four-bar parallel link robot manipulator.
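The kind of chance-constrained safety check described in the abstract can be illustrated with a minimal sketch. The linear dynamics, the half-space state constraint, and the function below are illustrative assumptions, not the paper's actual model or conservative-input construction: for x' = Ax + Bu + w with w ~ N(0, Σ_w), a constraint cᵀx ≤ d holds at the next step with probability at least 1 − δ when the nominal next state clears the bound tightened by the Gaussian quantile of the disturbance along c.

```python
import numpy as np
from statistics import NormalDist

def is_input_safe(A, B, Sigma_w, x, u, c, d, delta):
    """Check whether input u keeps the next state in the half-space
    c^T x' <= d with probability >= 1 - delta, for the assumed linear
    dynamics x' = A x + B u + w, w ~ N(0, Sigma_w).

    Illustrative sketch only; the paper's system model and sufficient
    conditions for conservative inputs may differ.
    """
    mean_next = A @ x + B @ u                 # nominal (disturbance-free) next state
    var_along_c = float(c @ Sigma_w @ c)      # variance of c^T w
    z = NormalDist().inv_cdf(1.0 - delta)     # standard normal quantile
    # Constraint tightening: require the nominal state to clear the bound
    # by z standard deviations of the disturbance projected onto c.
    return float(c @ mean_next) + z * np.sqrt(var_along_c) <= d
```

In a scheme of this flavor, an exploratory action that fails this check would be replaced by a conservative input known to satisfy the tightened constraint, so the state constraint holds with the pre-specified probability throughout learning.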
Pages: 132-147
Page count: 16
Cited References
22 in total
[1] Achiam J, 2017, PR MACH LEARN RES, V70
[2] Ames AD, 2019, 2019 18TH EUROPEAN CONTROL CONFERENCE (ECC), P3420, DOI 10.23919/ECC.2019.8796030
[3] [Anonymous], 2009, Convex optimization
[4] Berkenkamp F, 2017, ADV NEUR IN, V30
[5] Biyik E, Margoliash J, Alimo SR, Sadigh D, Efficient and Safe Exploration in Deterministic Markov Decision Processes with Unknown Transition Models, 2019 AMERICAN CONTROL CONFERENCE (ACC), 2019, P1792-1799
[6] Brockman G, 2016, arXiv, DOI arXiv:1606.01540
[7] Cheng R, 2019, AAAI CONF ARTIF INTE, P3387
[8] Chow Y, 2019, arXiv, DOI arXiv:1901.10031
[9] Fan DD, 2020, IEEE INT CONF ROBOT, P4093, DOI 10.1109/ICRA40945.2020.9196709
[10] García J, 2015, J MACH LEARN RES, V16, P1437