On the effect of probing noise in optimal control LQR via Q-learning using adaptive filtering algorithms

Cited by: 2
Authors
Lopez Yanez, Williams Jesus [1]
de Souza, Francisco das Chagas [1]
Affiliation
[1] Federal University of Maranhao (UFMA), Department of Electrical Engineering, Adaptive Systems and Signal Processing Laboratory (LSAPS), BR-65080805 Sao Luis, Maranhao, Brazil
Keywords
Optimal control; Linear quadratic regulator; Q-learning; Probing noise; Adaptive algorithm; Reinforcement; Convergence
DOI
10.1016/j.ejcon.2022.100633
CLC Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Probing noise is essential for satisfying the persistence-of-excitation condition required to solve Bellman equations in reinforcement-learning-based optimal control. However, the level of probing noise affects both the system state and the control input. The Bellman equation can be solved with classical adaptive filtering algorithms, namely the normalized least-mean-square (NLMS) and recursive least-squares (RLS) algorithms; taking this kind of solution into account, this paper analyzes the effect of probing noise on the state and input of the dynamic system for the linear quadratic regulator (LQR) problem solved by Q-learning policy iteration. The analysis yields closed-form expressions for the autocovariance matrices of the system state and the input, showing that their norms are proportional to the variance of the probing noise during the learning process. Numerical experiments illustrate the performance of the NLMS- and RLS-based Q-learning policy iteration methods for different probing-noise variances. (c) 2022 European Control Association. Published by Elsevier Ltd. All rights reserved.
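The abstract does not give algorithmic details, but the following minimal Python sketch illustrates the kind of procedure it describes: Q-learning policy iteration for a discrete-time LQR, where the quadratic Q-function kernel is estimated by RLS from data excited by probing noise. The plant matrices, initial gain, noise level sigma, and iteration counts below are illustrative assumptions, not values from the paper.

import numpy as np

def quad_basis(z):
    # Quadratic basis phi(z) = [z_i * z_j for i <= j] (upper-triangular monomials).
    n = len(z)
    return np.array([z[i] * z[j] for i in range(n) for j in range(i, n)])

def theta_to_H(theta, n):
    # Rebuild the symmetric Q-function kernel H from the RLS weight vector.
    H = np.zeros((n, n))
    k = 0
    for i in range(n):
        for j in range(i, n):
            H[i, j] = theta[k] if i == j else theta[k] / 2.0
            H[j, i] = H[i, j]
            k += 1
    return H

# Hypothetical plant (discretized double integrator) -- not taken from the paper.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Qc, Rc = np.eye(2), np.eye(1)
K = np.array([[1.0, 1.7]])                 # assumed initial stabilizing gain
n, m = A.shape[0], B.shape[1]
p = (n + m) * (n + m + 1) // 2             # number of Q-kernel parameters

rng = np.random.default_rng(0)
sigma = 0.1                                # probing-noise standard deviation

for it in range(10):                       # policy iteration
    theta, P = np.zeros(p), 1e3 * np.eye(p)   # RLS state for this evaluation
    x = rng.standard_normal(n)
    for k in range(400):                   # policy evaluation from excited data
        u = -K @ x + sigma * rng.standard_normal(m)   # control plus probing noise
        x_next = A @ x + B @ u
        cost = x @ Qc @ x + u @ Rc @ u
        # Bellman-equation regressor: phi(x_k, u_k) - phi(x_{k+1}, -K x_{k+1})
        phi = quad_basis(np.concatenate([x, u])) \
            - quad_basis(np.concatenate([x_next, -K @ x_next]))
        # Standard RLS update (unit forgetting factor) on the Bellman residual
        g = P @ phi / (1.0 + phi @ P @ phi)
        theta = theta + g * (cost - phi @ theta)
        P = P - np.outer(g, phi @ P)
        x = x_next
    H = theta_to_H(theta, n + m)
    K = np.linalg.solve(H[n:, n:], H[n:, :n])  # policy improvement: K = Huu^-1 Hux

print("learned gain K =", K)

Replacing the RLS update with an NLMS step, theta += (mu / (eps + phi @ phi)) * phi * (cost - phi @ theta), gives the other estimator the abstract mentions; in both cases sigma is the quantity whose effect on the state and input autocovariances the paper quantifies.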
Pages: 12