Safe Q-Learning Method Based on Constrained Markov Decision Processes

Cited by: 19
Authors
Ge, Yangyang [1]
Zhu, Fei [1,2]
Lin, Xinghong [1]
Liu, Quan [1]
Affiliations
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou 215006, Peoples R China
[2] Soochow Univ, Prov Key Lab Comp Informat Proc Technol, Suzhou 215006, Peoples R China
Source
IEEE ACCESS | 2019, Vol. 7
Funding
National Natural Science Foundation of China
Keywords
Constrained Markov decision processes; safe reinforcement learning; Q-learning; constraint; Lagrange multiplier; reinforcement; optimization; algorithm
DOI
10.1109/ACCESS.2019.2952651
Chinese Library Classification (CLC)
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
The application of reinforcement learning in industrial settings has made agent safety an active research topic. Traditional methods address the safety problem mainly by altering the objective function or the agent's exploration process; however, they can rarely prevent the agent from entering dangerous states because most of them ignore the damage that unsafe states cause, so the resulting solutions are often unsatisfactory. To address this problem, we propose a safe Q-learning method based on constrained Markov decision processes that adds safety constraints to the model as prerequisites and extends the standard Q-learning algorithm so that it seeks the optimal solution while the safety premise remains satisfied. When searching for the solution in the form of the optimal state-action value, the feasible space of the agent is restricted to a safe subspace by filtering the action space with the added constraints, which guarantees safety. Because traditional solution methods are not applicable to the safe Q-learning model and tend to yield only locally optimal solutions, we use the Lagrange multiplier method, with linearized constraint functions, to compute the optimal action that can be performed in the current state; this not only improves the efficiency and accuracy of the algorithm but also guarantees a globally optimal solution. Experiments verify the effectiveness of the algorithm.
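The abstract describes two ingredients: filtering the action space with safety constraints, and solving for the constrained optimum with a Lagrange multiplier. A minimal sketch of how such a scheme could be wired together is given below; the toy environment step function, the cost threshold d, the learning rates, and the dual-ascent update are illustrative assumptions, not the paper's actual algorithm.

import numpy as np

# Sketch of constrained (safe) tabular Q-learning with a Lagrange multiplier.
# Environment, threshold `d`, and step sizes are illustrative assumptions.
n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))   # reward value estimates
C = np.zeros((n_states, n_actions))   # safety-cost value estimates
lam = 0.0                             # Lagrange multiplier (dual variable)
alpha, gamma, eps, d = 0.1, 0.95, 0.1, 0.2

def step(state, action):
    """Hypothetical environment: returns (next_state, reward, safety_cost)."""
    rng = np.random.default_rng(state * n_actions + action)
    return int(rng.integers(n_states)), float(rng.normal()), float(rng.random() < 0.1)

def safe_actions(state):
    """Filter the action space: keep actions whose estimated cost satisfies the constraint."""
    mask = C[state] <= d
    return np.flatnonzero(mask) if mask.any() else np.arange(n_actions)

for episode in range(200):
    s = 0
    for t in range(50):
        acts = safe_actions(s)
        if np.random.rand() < eps:
            a = int(np.random.choice(acts))                               # explore within the safe set
        else:
            a = int(acts[np.argmax(Q[s, acts] - lam * C[s, acts])])       # greedy w.r.t. Lagrangian value
        s2, r, c = step(s, a)
        acts2 = safe_actions(s2)
        a2 = int(acts2[np.argmax(Q[s2, acts2] - lam * C[s2, acts2])])
        Q[s, a] += alpha * (r + gamma * Q[s2, a2] - Q[s, a])              # standard Q-learning target
        C[s, a] += alpha * (c + gamma * C[s2, a2] - C[s, a])              # cost value learned analogously
        lam = max(0.0, lam + 1e-3 * (c - d))                              # dual ascent on the constraint
        s = s2

In this sketch, actions whose estimated cost exceeds the threshold are masked out before selection, and the multiplier grows whenever the observed cost violates the constraint, which loosely mirrors the constrained formulation summarized in the abstract.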
Pages: 165007-165017 (11 pages)