Explainable data-driven Q-learning control for a class of discrete-time linear autonomous systems

Cited: 0
Authors
Perrusquia, Adolfo [1 ]
Zou, Mengbang [1 ]
Guo, Weisi [1 ]
Affiliations
[1] Cranfield Univ, Sch Aerosp Transport & Mfg, Bedford MK43 0AL, England
Keywords
Q-learning; State-transition function; Explainable Q-learning (XQL); Control policy; REINFORCEMENT; IDENTIFICATION;
D O I
10.1016/j.ins.2024.121283
Chinese Library Classification
TP [automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Explaining what a reinforcement learning (RL) control agent learns plays a crucial role in the safety-critical control domain. Most state-of-the-art approaches focus on imitation learning methods that uncover the hidden reward function of a given control policy. However, these approaches do not uncover what the RL agent actually learns from the agent-environment interaction. The policy learned by the RL agent depends on how well the state-transition mapping is inferred from the data. A wrongly inferred state-transition mapping implies that the RL agent is not learning properly, which can compromise the safety of the surrounding environment and of the agent itself. In this paper, we aim to uncover the elements learned by data-driven RL control agents in a special class of discrete-time linear autonomous systems. The approach adds a new explainable dimension to data-driven control approaches to increase their trustworthiness and safe deployment. We focus on the classical data-driven Q-learning algorithm and propose an explainable Q-learning (XQL) algorithm that can be further extended to other data-driven RL control agents. Simulation experiments are conducted to assess the effectiveness of the proposed approach under different scenarios using several discrete-time models of autonomous platforms.
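The abstract builds on the classical data-driven Q-learning algorithm for discrete-time linear systems. As a rough illustration of that class of algorithm only (not the paper's XQL method, whose details are not in this record), the following is a minimal least-squares Q-value iteration sketch for a discounted linear-quadratic problem; the system matrices, cost weights, discount factor, and data-collection scheme are all made-up example values.

```python
import numpy as np

# Illustrative (made-up) discrete-time linear system x_{k+1} = A x_k + B u_k
# with quadratic stage cost x^T Qc x + u^T Rc u and discount factor gamma.
n, m = 2, 1
A = np.array([[1.0, 0.1],
              [0.0, 0.9]])
B = np.array([[0.0],
              [0.1]])
Qc = np.eye(n)
Rc = np.eye(m)
gamma = 0.95

def features(x, u):
    # Quadratic features: Q(x, u) = z^T H z with z = [x; u].
    z = np.concatenate([x, u])
    return np.outer(z, z).flatten()

rng = np.random.default_rng(0)

# Collect transitions with exploratory (persistently exciting) inputs.
data = []
x = np.array([1.0, -1.0])
for _ in range(200):
    u = rng.normal(size=m)
    x_next = A @ x + B @ u
    cost = x @ Qc @ x + u @ Rc @ u
    data.append((x, u, cost, x_next))
    x = x_next if np.linalg.norm(x_next) < 10 else rng.normal(size=n)

# Data-driven Q-value iteration: fit the quadratic parameter H to the
# Bellman targets by least squares, using only the recorded transitions.
H = np.eye(n + m)
for _ in range(100):
    K = np.linalg.solve(H[n:, n:], H[n:, :n])    # greedy gain: u = -K x
    Phi, y = [], []
    for (xk, uk, ck, xk1) in data:
        u_next = -K @ xk1                         # greedy action at next state
        Phi.append(features(xk, uk))
        y.append(ck + gamma * features(xk1, u_next) @ H.flatten())
    h, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
    H = h.reshape(n + m, n + m)
    H = 0.5 * (H + H.T)                           # enforce symmetry

K = np.linalg.solve(H[n:, n:], H[n:, :n])         # learned feedback gain
```

Because the dynamics here are known, the learned gain can be sanity-checked against the model-based discounted Riccati recursion, which the data-driven iteration reproduces when the data are sufficiently rich.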
Pages: 15
Related papers
50 records in total
  • [21] Optimal trajectory tracking for uncertain linear discrete-time systems using time-varying Q-learning
    Geiger, Maxwell
    Narayanan, Vignesh
    Jagannathan, Sarangapani
    INTERNATIONAL JOURNAL OF ADAPTIVE CONTROL AND SIGNAL PROCESSING, 2024, 38 (07) : 2340 - 2368
  • [22] An iterative Q-learning scheme for the global stabilization of discrete-time linear systems subject to actuator saturation
    Rizvi, Syed Ali Asad
    Lin, Zongli
    INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL, 2019, 29 (09) : 2660 - 2672
  • [23] A DISCRETE-TIME SWITCHING SYSTEM ANALYSIS OF Q-LEARNING
    Lee, Donghwan
    Hu, Jianghai
    He, Niao
    SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2023, 61 (03) : 1861 - 1880
  • [24] Inverse Reinforcement Q-Learning Through Expert Imitation for Discrete-Time Systems
    Xue, Wenqian
    Lian, Bosen
    Fan, Jialu
    Kolaric, Patrik
    Chai, Tianyou
    Lewis, Frank L.
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (05) : 2386 - 2399
  • [25] Continuous deep Q-learning with a simulator for stabilization of uncertain discrete-time systems
    Ikemoto, Junya
    Ushio, Toshimitsu
    IEICE NONLINEAR THEORY AND ITS APPLICATIONS, 2021, 12 (04): : 738 - 757
  • [26] Optimal tracking control for discrete-time modal persistent dwell time switched systems based on Q-learning
    Zhang, Xuewen
    Wang, Yun
    Xia, Jianwei
    Li, Feng
    Shen, Hao
    OPTIMAL CONTROL APPLICATIONS & METHODS, 2023, 44 (06) : 3327 - 3341
  • [27] Off-Policy Interleaved Q-Learning: Optimal Control for Affine Nonlinear Discrete-Time Systems
    Li, Jinna
    Chai, Tianyou
    Lewis, Frank L.
    Ding, Zhengtao
    Jiang, Yi
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2019, 30 (05) : 1308 - 1320
  • [28] Kernel methods for the approximation of discrete-time linear autonomous and control systems
    Hamzi, Boumediene
    Colonius, Fritz
    SN APPLIED SCIENCES, 2019, 1 (07):
  • [30] Experience replay-based output feedback Q-learning scheme for optimal output tracking control of discrete-time linear systems
    Rizvi, Syed Ali Asad
    Lin, Zongli
    INTERNATIONAL JOURNAL OF ADAPTIVE CONTROL AND SIGNAL PROCESSING, 2019, 33 (12) : 1825 - 1842