Safe Q-Learning Method Based on Constrained Markov Decision Processes

Cited by: 19
Authors
Ge, Yangyang [1]
Zhu, Fei [1,2]
Lin, Xinghong [1]
Liu, Quan [1]
Affiliations
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou 215006, Peoples R China
[2] Soochow Univ, Prov Key Lab Comp Informat Proc Technol, Suzhou 215006, Peoples R China
Source
IEEE ACCESS | 2019, Vol. 7
Funding
National Natural Science Foundation of China
Keywords
Constrained Markov decision processes; safe reinforcement learning; Q-learning; constraint; Lagrange multiplier; reinforcement; optimization; algorithm
DOI
10.1109/ACCESS.2019.2952651
Chinese Library Classification (CLC)
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
The application of reinforcement learning in industrial settings has made agent safety an active research topic. Traditional methods address the safety problem mainly by altering the objective function or the agent's exploration process; however, they can rarely prevent the agent from entering dangerous states because most of them ignore the damage that unsafe states cause, so the resulting solutions are often unsatisfactory. To address this problem, we propose a safe Q-learning method based on constrained Markov decision processes that adds safety constraints to the model as prerequisites and extends the standard Q-learning algorithm so that it seeks the optimal solution while the safety premise remains satisfied. When searching for the solution in the form of the optimal state-action value, the feasible space of the agent is restricted to a safe subspace by filtering the action space with the added constraints, which guarantees safety. Because traditional solution methods are not applicable to the safe Q-learning model and tend to yield only locally optimal solutions, we use the Lagrange multiplier method, with linearized constraint functions, to compute the optimal action that can be performed in the current state; this not only improves the efficiency and accuracy of the algorithm but also guarantees a globally optimal solution. Experiments verify the effectiveness of the algorithm.
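The abstract describes two ingredients: filtering the action space with safety constraints, and solving for the constrained optimum with a Lagrange multiplier. A minimal sketch of how such a scheme could be wired together is given below; the toy environment step function, the cost threshold d, the learning rates, and the dual-ascent update are illustrative assumptions, not the paper's actual algorithm.

import numpy as np

# Sketch of constrained (safe) tabular Q-learning with a Lagrange multiplier.
# Environment, threshold `d`, and step sizes are illustrative assumptions.
n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))   # reward value estimates
C = np.zeros((n_states, n_actions))   # safety-cost value estimates
lam = 0.0                             # Lagrange multiplier (dual variable)
alpha, gamma, eps, d = 0.1, 0.95, 0.1, 0.2

def step(state, action):
    """Hypothetical environment: returns (next_state, reward, safety_cost)."""
    rng = np.random.default_rng(state * n_actions + action)
    return int(rng.integers(n_states)), float(rng.normal()), float(rng.random() < 0.1)

def safe_actions(state):
    """Filter the action space: keep actions whose estimated cost satisfies the constraint."""
    mask = C[state] <= d
    return np.flatnonzero(mask) if mask.any() else np.arange(n_actions)

for episode in range(200):
    s = 0
    for t in range(50):
        acts = safe_actions(s)
        if np.random.rand() < eps:
            a = int(np.random.choice(acts))                               # explore within the safe set
        else:
            a = int(acts[np.argmax(Q[s, acts] - lam * C[s, acts])])       # greedy w.r.t. Lagrangian value
        s2, r, c = step(s, a)
        acts2 = safe_actions(s2)
        a2 = int(acts2[np.argmax(Q[s2, acts2] - lam * C[s2, acts2])])
        Q[s, a] += alpha * (r + gamma * Q[s2, a2] - Q[s, a])              # standard Q-learning target
        C[s, a] += alpha * (c + gamma * C[s2, a2] - C[s, a])              # cost value learned analogously
        lam = max(0.0, lam + 1e-3 * (c - d))                              # dual ascent on the constraint
        s = s2

In this sketch, actions whose estimated cost exceeds the threshold are masked out before selection, and the multiplier grows whenever the observed cost violates the constraint, which loosely mirrors the constrained formulation summarized in the abstract.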
Pages: 165007-165017 (11 pages)