Applying Quantitative Model Checking to Analyze Safety in Reinforcement Learning

Cited by: 1
Authors
Kwon, Ryeonggu [1 ]
Kwon, Gihwon [1 ]
Park, Sohee [2 ]
Chang, Jiyoung [2 ]
Jo, Suhee [2 ]
Affiliations
[1] Kyonggi Univ, Dept Comp Sci, Suwon 16227, Gyeonggi Do, South Korea
[2] Kyonggi Univ, Dept Software Safety & Cyber Secur, Suwon 16227, Gyeonggi Do, South Korea
Keywords
Quantitative model checking; reinforcement learning; safety constraint; non-functional requirement;
DOI
10.1109/ACCESS.2024.3358408
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Reinforcement learning (RL) is increasingly being applied in safety-centric applications. However, many studies focus on generating an optimal policy that achieves the maximum reward. While maximizing reward is beneficial, safety-centric applications must also consider safety constraints and non-functional requirements to avoid dangerous situations. For example, a food delivery robot in a restaurant should use RL not only to find an optimal policy that responds to all customer requests by maximizing reward, but also to satisfy safety constraints such as collision avoidance and non-functional requirements such as battery saving. In this paper, we use quantitative model checking to investigate how well learning models generated through RL fulfill safety constraints and non-functional requirements. Targeting restaurant delivery robots, we experimented with the various time steps and learning rates required for RL. The functional requirement of these robots is to process all customer order requests; the non-functional requirements are the number of steps and the battery consumption needed to complete the task; and the safety constraints are the number of collisions and the probability of collision. These experiments yielded three important findings. First, a learning model that obtains the maximum reward may achieve non-functional requirements and safety constraints only to a low degree. Second, satisfying safety constraints may lower the degree to which non-functional requirements are achieved. Third, even when the maximum reward is not obtained, sacrificing non-functional requirements can maximize the achievement of safety constraints. These results show that learning models generated through RL can trade off reward to satisfy safety constraints. In conclusion, our work can contribute to selecting suitable hyperparameters and an optimal learning model during RL.
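The abstract does not specify tooling, but the analysis it describes amounts to checking probabilistic properties of a learned policy, for instance a PCTL-style query such as P=? [ F "collision" ] in a probabilistic model checker such as PRISM. As a rough, hypothetical illustration only (not the authors' method, which uses quantitative model checking rather than sampling), the Python sketch below estimates the same three kinds of quantities from Monte Carlo rollouts; the Gym-style environment interface and the "collision"/"battery_used" info fields are assumptions.

```python
# Hypothetical sketch, NOT the authors' method: the paper applies quantitative
# model checking, whereas this merely estimates the same quantities from
# sampled rollouts. `env` is assumed to follow the classic Gym-style interface
# (reset() -> obs; step(a) -> obs, reward, done, info), and `info` is assumed
# to expose "collision" and "battery_used" fields.

def evaluate_policy(policy, env, episodes=1000):
    """Estimate the safety constraint (collision probability) and the
    non-functional requirements (steps and battery use per task)."""
    collisions = 0
    total_steps = 0
    total_battery = 0.0
    for _ in range(episodes):
        obs = env.reset()
        done = False
        collided = False
        while not done:
            obs, _reward, done, info = env.step(policy(obs))
            total_steps += 1
            total_battery += info.get("battery_used", 0.0)
            collided = collided or info.get("collision", False)
        collisions += int(collided)
    return {
        "collision_prob": collisions / episodes,   # safety constraint
        "mean_steps": total_steps / episodes,      # non-functional requirement
        "mean_battery": total_battery / episodes,  # non-functional requirement
    }
```

Running such an evaluation for each (time step, learning rate) pair would surface the trade-offs the abstract reports, e.g. a model with the highest mean reward but an unacceptably high collision probability.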
Pages: 18957-18971
Page count: 15