Safe Q-Learning Method Based on Constrained Markov Decision Processes

Times cited: 19

Authors:

Ge, Yangyang [1]

Zhu, Fei [1,2]

Lin, Xinghong [1]

Liu, Quan [1]

Affiliations:

[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou 215006, Peoples R China

[2] Soochow Univ, Prov Key Lab Comp Informat Proc Technol, Suzhou 215006, Peoples R China

Source:

IEEE ACCESS | 2019 / Vol. 7

Funding:

National Natural Science Foundation of China;

Keywords:

Constrained Markov decision processes; safe reinforcement learning; Q-learning; constraint; Lagrange multiplier; REINFORCEMENT; OPTIMIZATION; ALGORITHM;

DOI:

10.1109/ACCESS.2019.2952651

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline code:

0812;

Abstract:

The application of reinforcement learning in industrial settings has made agent safety a research hotspot. Traditional methods address the safety problem mainly by altering the agent's objective function or its exploration process. These methods, however, can hardly prevent the agent from falling into dangerous states, because most of them ignore the damage caused by unsafe states; as a result, most solutions are unsatisfactory. To solve this problem, we propose a safe Q-learning method based on constrained Markov decision processes. Safety constraints are added to the model as prerequisites, which improves the standard Q-learning algorithm so that the proposed algorithm seeks the optimal solution while ensuring that the safety premise is satisfied. While searching for the solution in the form of the optimal state-action value, the agent's feasible space is restricted to a safe space: constraints added to the action space filter the feasible space, which guarantees safety. Because traditional solution methods tend to obtain locally optimal solutions and are therefore not applicable to the safe Q-learning model, we use the Lagrange multiplier method, after linearizing the constraint functions, to solve for the optimal action that can be performed in the current state. This not only improves the efficiency and accuracy of the algorithm but also guarantees that the global optimal solution is obtained. The experiments verify the effectiveness of the algorithm.
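As a rough illustration of the action-space filtering described in the abstract, the sketch below runs tabular Q-learning in which the exploratory choice, the greedy choice, and the bootstrap maximum are all restricted to actions satisfying a safety constraint c(s, a) <= d, with d = 0. It is not the authors' algorithm: the 3x5 grid, the unsafe cells, the assumption that the transition model and constraint function are known in advance, and every name (`step`, `safety_cost`, `feasible_actions`, `safe_q_learning`, `greedy_path`) are hypothetical. The paper's Lagrange-multiplier step over linearized constraints is not reproduced; here the constraint is simply applied as a hard filter.

```python
import random

# Hypothetical 3x5 grid: start bottom-left, goal bottom-right, and the
# bottom-row cells between them are unsafe (a small "cliff" layout).
ROWS, COLS = 3, 5
START, GOAL = (2, 0), (2, 4)
UNSAFE = {(2, 1), (2, 2), (2, 3)}
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action):
    """Deterministic transition; moves off the grid leave the agent in place."""
    dr, dc = ACTIONS[action]
    r = min(max(state[0] + dr, 0), ROWS - 1)
    c = min(max(state[1] + dc, 0), COLS - 1)
    nxt = (r, c)
    reward = 10.0 if nxt == GOAL else -1.0
    return nxt, reward, nxt == GOAL


def safety_cost(state, action):
    """Assumed constraint function: 1 if the action enters an unsafe cell, else 0."""
    nxt, _, _ = step(state, action)
    return 1.0 if nxt in UNSAFE else 0.0


def feasible_actions(state, threshold=0.0):
    """Filter the action space down to actions with safety_cost <= threshold."""
    return [a for a in ACTIONS if safety_cost(state, a) <= threshold]


def safe_q_learning(episodes=500, alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {}  # (state, action) -> estimated value, default 0.0
    for _ in range(episodes):
        state, done = START, False
        while not done:
            allowed = feasible_actions(state)
            if rng.random() < epsilon:
                action = rng.choice(allowed)  # explore, but only among safe actions
            else:
                action = max(allowed, key=lambda a: q.get((state, a), 0.0))
            nxt, reward, done = step(state, action)
            # Bootstrap only over feasible actions in the next state.
            best_next = 0.0 if done else max(
                q.get((nxt, a), 0.0) for a in feasible_actions(nxt))
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            state = nxt
    return q


def greedy_path(q, max_steps=20):
    """Follow the greedy safe policy from START and record the visited cells."""
    state, path = START, [START]
    for _ in range(max_steps):
        if state == GOAL:
            break
        action = max(feasible_actions(state), key=lambda a: q.get((state, a), 0.0))
        state, _, _ = step(state, action)
        path.append(state)
    return path
```

Running `greedy_path(safe_q_learning())` should give a path that leaves the start cell upward, follows the middle row, and steps down into the goal without touching the unsafe cells. Because the filter is applied both when acting and when bootstrapping, no learned value relies on entering an unsafe cell; the price is that the constraint and transition model must be known, which this sketch simply assumes.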


Pages: 165007-165017

Number of pages: 11
