Explainable data-driven Q-learning control for a class of discrete-time linear autonomous systems

Cited: 0
Authors
Perrusquia, Adolfo [1 ]
Zou, Mengbang [1 ]
Guo, Weisi [1 ]
Affiliations
[1] Cranfield Univ, Sch Aerosp Transport & Mfg, Bedford MK43 0AL, England
Keywords
Q-learning; State-transition function; Explainable Q-learning (XQL); Control policy; REINFORCEMENT; IDENTIFICATION;
D O I
10.1016/j.ins.2024.121283
Chinese Library Classification
TP [automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Explaining what a reinforcement learning (RL) control agent learns plays a crucial role in the safety-critical control domain. Most state-of-the-art approaches focus on imitation learning methods that uncover the hidden reward function of a given control policy. However, these approaches do not uncover what the RL agent actually learns from the agent-environment interaction. The policy learned by the RL agent depends on how well the state-transition mapping is inferred from the data. A wrongly inferred state-transition mapping implies that the RL agent is not learning properly, which can compromise the safety of the surrounding environment and of the agent itself. In this paper, we aim to uncover the elements learned by data-driven RL control agents in a special class of discrete-time linear autonomous systems. The approach adds a new explainable dimension to data-driven control approaches to increase their trustworthiness and safe deployment. We focus on the classical data-driven Q-learning algorithm and propose an explainable Q-learning (XQL) algorithm that can be further extended to other data-driven RL control agents. Simulation experiments are conducted to assess the effectiveness of the proposed approach under different scenarios using several discrete-time models of autonomous platforms.
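The abstract builds on the classical data-driven Q-learning algorithm for discrete-time linear systems. As a rough illustration of that class of algorithm only (not the paper's XQL method, whose details are not in this record), the following is a minimal least-squares Q-value iteration sketch for a discounted linear-quadratic problem; the system matrices, cost weights, discount factor, and data-collection scheme are all made-up example values.

```python
import numpy as np

# Illustrative (made-up) discrete-time linear system x_{k+1} = A x_k + B u_k
# with quadratic stage cost x^T Qc x + u^T Rc u and discount factor gamma.
n, m = 2, 1
A = np.array([[1.0, 0.1],
              [0.0, 0.9]])
B = np.array([[0.0],
              [0.1]])
Qc = np.eye(n)
Rc = np.eye(m)
gamma = 0.95

def features(x, u):
    # Quadratic features: Q(x, u) = z^T H z with z = [x; u].
    z = np.concatenate([x, u])
    return np.outer(z, z).flatten()

rng = np.random.default_rng(0)

# Collect transitions with exploratory (persistently exciting) inputs.
data = []
x = np.array([1.0, -1.0])
for _ in range(200):
    u = rng.normal(size=m)
    x_next = A @ x + B @ u
    cost = x @ Qc @ x + u @ Rc @ u
    data.append((x, u, cost, x_next))
    x = x_next if np.linalg.norm(x_next) < 10 else rng.normal(size=n)

# Data-driven Q-value iteration: fit the quadratic parameter H to the
# Bellman targets by least squares, using only the recorded transitions.
H = np.eye(n + m)
for _ in range(100):
    K = np.linalg.solve(H[n:, n:], H[n:, :n])    # greedy gain: u = -K x
    Phi, y = [], []
    for (xk, uk, ck, xk1) in data:
        u_next = -K @ xk1                         # greedy action at next state
        Phi.append(features(xk, uk))
        y.append(ck + gamma * features(xk1, u_next) @ H.flatten())
    h, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
    H = h.reshape(n + m, n + m)
    H = 0.5 * (H + H.T)                           # enforce symmetry

K = np.linalg.solve(H[n:, n:], H[n:, :n])         # learned feedback gain
```

Because the dynamics here are known, the learned gain can be sanity-checked against the model-based discounted Riccati recursion, which the data-driven iteration reproduces when the data are sufficiently rich.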
Pages: 15
Related papers
50 records in total
  • [21] Optimal trajectory tracking for uncertain linear discrete-time systems using time-varying Q-learning
    Geiger, Maxwell
    Narayanan, Vignesh
    Jagannathan, Sarangapani
    INTERNATIONAL JOURNAL OF ADAPTIVE CONTROL AND SIGNAL PROCESSING, 2024, 38 (07) : 2340 - 2368
  • [22] An iterative Q-learning scheme for the global stabilization of discrete-time linear systems subject to actuator saturation
    Rizvi, Syed Ali Asad
    Lin, Zongli
    INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL, 2019, 29 (09) : 2660 - 2672
  • [23] A DISCRETE-TIME SWITCHING SYSTEM ANALYSIS OF Q-LEARNING
    Lee, Donghwan
    Hu, Jianghai
    He, Niao
    SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2023, 61 (03) : 1861 - 1880
  • [24] Inverse Reinforcement Q-Learning Through Expert Imitation for Discrete-Time Systems
    Xue, Wenqian
    Lian, Bosen
    Fan, Jialu
    Kolaric, Patrik
    Chai, Tianyou
    Lewis, Frank L.
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (05) : 2386 - 2399
  • [25] Continuous deep Q-learning with a simulator for stabilization of uncertain discrete-time systems
    Ikemoto, Junya
    Ushio, Toshimitsu
    IEICE NONLINEAR THEORY AND ITS APPLICATIONS, 2021, 12 (04): : 738 - 757
  • [26] Optimal tracking control for discrete-time modal persistent dwell time switched systems based on Q-learning
    Zhang, Xuewen
    Wang, Yun
    Xia, Jianwei
    Li, Feng
    Shen, Hao
    OPTIMAL CONTROL APPLICATIONS & METHODS, 2023, 44 (06) : 3327 - 3341
  • [27] Off-Policy Interleaved Q-Learning: Optimal Control for Affine Nonlinear Discrete-Time Systems
    Li, Jinna
    Chai, Tianyou
    Lewis, Frank L.
    Ding, Zhengtao
    Jiang, Yi
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2019, 30 (05) : 1308 - 1320
  • [28] Kernel methods for the approximation of discrete-time linear autonomous and control systems
    Hamzi, Boumediene
    Colonius, Fritz
    SN APPLIED SCIENCES, 2019, 1 (07):
  • [30] Experience replay-based output feedback Q-learning scheme for optimal output tracking control of discrete-time linear systems
    Rizvi, Syed Ali Asad
    Lin, Zongli
    INTERNATIONAL JOURNAL OF ADAPTIVE CONTROL AND SIGNAL PROCESSING, 2019, 33 (12) : 1825 - 1842