Explainable data-driven Q-learning control for a class of discrete-time linear autonomous systems

Cited by: 0
Authors
Perrusquia, Adolfo [1 ]
Zou, Mengbang [1 ]
Guo, Weisi [1 ]
Affiliation
[1] Cranfield Univ, Sch Aerosp Transport & Mfg, Bedford MK43 0AL, England
Keywords
Q-learning; State-transition function; Explainable Q-learning (XQL); Control policy; REINFORCEMENT; IDENTIFICATION;
DOI
10.1016/j.ins.2024.121283
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Explaining what a reinforcement learning (RL) control agent learns plays a crucial role in the safety-critical control domain. Most state-of-the-art approaches focus on imitation learning methods that uncover the hidden reward function of a given control policy. However, these approaches do not uncover what the RL agent actually learns from the agent-environment interaction. The policy learned by the RL agent depends on how well the state-transition mapping is inferred from the data; a wrongly inferred state-transition mapping implies that the RL agent is not learning properly, which can compromise the safety of the surrounding environment and of the agent itself. In this paper, we aim to uncover the elements learned by data-driven RL control agents in a special class of discrete-time linear autonomous systems. The approach adds a new explainable dimension to data-driven control methods to increase their trustworthiness and safe deployment. We focus on the classical data-driven Q-learning algorithm and propose an explainable Q-learning (XQL) algorithm that can be further extended to other data-driven RL control agents. Simulation experiments under different scenarios, using several discrete-time models of autonomous platforms, demonstrate the effectiveness of the proposed approach.
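The classical data-driven Q-learning scheme the abstract refers to can be sketched, for a generic discrete-time linear-quadratic regulation problem, as least-squares policy iteration on a quadratic Q-function learned purely from state-input data. The plant matrices, gains, and hyper-parameters below are illustrative assumptions for the sketch, not values taken from the paper:

```python
import numpy as np

# Illustrative discrete-time linear plant x_{k+1} = A x_k + B u_k
# (matrices are assumptions for this sketch, not from the paper).
A = np.array([[0.95, 0.10],
              [0.00, 0.90]])
B = np.array([[0.0],
              [0.1]])
Qc, Rc = np.eye(2), np.eye(1)        # quadratic stage-cost weights
n, m = 2, 1

def phi(x, u):
    """Quadratic basis: upper-triangular entries of z z^T with z = [x; u],
    off-diagonal terms weighted by 2 so that theta @ phi = z^T H z."""
    z = np.concatenate([x, u])
    i, j = np.triu_indices(n + m)
    return np.where(i == j, 1.0, 2.0) * np.outer(z, z)[i, j]

rng = np.random.default_rng(0)
K = np.zeros((m, n))                 # initial admissible gain (A is stable)
for _ in range(20):                  # Q-learning policy iteration
    Phi, y = [], []
    for _ in range(10):              # short exploratory rollouts
        x = rng.standard_normal(n)
        for _ in range(20):
            u = -K @ x + 0.3 * rng.standard_normal(m)  # probing noise
            xn = A @ x + B @ u
            r = x @ Qc @ x + u @ Rc @ u                # stage cost
            # Bellman identity for the current policy: Q(x,u) - Q(xn, -K xn) = r
            Phi.append(phi(x, u) - phi(xn, -K @ xn))
            y.append(r)
            x = xn
    # Policy evaluation: fit the Q-function kernel H from data only
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
    H = np.zeros((n + m, n + m))
    H[np.triu_indices(n + m)] = theta
    H = H + np.triu(H, 1).T          # rebuild symmetric kernel
    # Policy improvement: u = -K x minimises z^T H z over u
    K = np.linalg.solve(H[n:, n:], H[n:, :n])

# Sanity check against the LQR gain from the discrete Riccati equation
P = np.eye(n)
for _ in range(500):
    S = Rc + B.T @ P @ B
    P = Qc + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(S, B.T @ P @ A)
K_star = np.linalg.solve(Rc + B.T @ P @ B, B.T @ P @ A)
print(np.max(np.abs(K - K_star)))   # learned gain should closely match LQR
```

Because the system and cost are deterministic, the temporal-difference regression is exact for the current policy's kernel, so the loop reduces to model-free policy iteration; no model of (A, B) is ever used inside the learning loop, which is exactly the setting where the paper asks what the agent has implicitly learned about the state-transition mapping.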
Pages: 15