Experience replay-based output feedback Q-learning scheme for optimal output tracking control of discrete-time linear systems

Cited by: 7
Authors
Rizvi, Syed Ali Asad [1 ]
Lin, Zongli [1 ]
Affiliations
[1] University of Virginia, Charles L. Brown Department of Electrical and Computer Engineering, Charlottesville, VA 22904, USA
Keywords
discounting factor; optimal tracking; output feedback; Q-learning; adaptive optimal control; trajectory tracking; design; leader; MRAC
DOI
10.1002/acs.2981
CLC classification code
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
This paper addresses the adaptive optimal tracking control problem for discrete-time linear systems with unknown dynamics using output feedback. A Q-learning-based optimal adaptive control scheme is presented to learn the feedback and feedforward parameters of the optimal tracking control law. The optimal feedback parameters are learned using the proposed output feedback Q-learning Bellman equation, whereas the optimal feedforward parameters are estimated using an adaptive algorithm that guarantees convergence of the tracking error to zero. The proposed method is not affected by the exploration noise bias problem and does not require a discounting factor, relieving two bottlenecks that hindered stability guarantees and optimal asymptotic tracking in past works. Furthermore, the scheme employs the experience replay technique for data-driven learning, which is data efficient and relaxes the persistence of excitation requirement in learning the feedback parameters. It is shown that the learned feedback parameters converge to the optimal solution of the Riccati equation and the feedforward parameters converge to the solution of the Sylvester equation. Simulation studies on two practical systems demonstrate the effectiveness of the proposed scheme.
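To make the Bellman-equation least-squares idea behind such schemes concrete, the following is a minimal sketch of experience-replay Q-learning for the simpler discrete-time LQR problem with full state feedback. It is not the paper's output feedback tracking scheme; the system matrices A, B, the cost weights, and all tuning values are hypothetical choices for illustration. Note that the Bellman target uses the policy action at the next state rather than the noisy behavior action, which is why the exploration noise does not bias the estimate.

```python
# Minimal sketch (NOT the paper's method): experience-replay Q-learning for
# discrete-time LQR with state feedback. All matrices below are hypothetical.
import numpy as np

A = np.array([[0.9, 0.1], [0.0, 0.8]])   # hypothetical Schur-stable dynamics
B = np.array([[0.0], [0.1]])
Qc = np.eye(2)                            # stage cost x'Qc x + u'R u
R = np.array([[1.0]])
n, m = B.shape

rng = np.random.default_rng(0)
K = np.zeros((m, n))                      # initial stabilizing policy gain

for it in range(10):                      # policy iteration loop
    # Collect a replay buffer under the current policy plus exploration noise.
    buffer, x = [], rng.standard_normal(n)
    for k in range(200):
        u = -K @ x + 0.5 * rng.standard_normal(m)   # exploratory behavior action
        x_next = A @ x + B @ u
        r = x @ Qc @ x + u @ R @ u
        buffer.append((x.copy(), u.copy(), r, x_next.copy()))
        x = x_next

    # Policy evaluation: Q(x,u) = z' H z with z = [x; u]. The Bellman equation
    # z_k' H z_k = r_k + z_{k+1}' H z_{k+1} uses the *policy* action -K x_{k+1},
    # so the exploration noise in u_k does not bias the least-squares estimate.
    Phi, y = [], []
    for (xk, uk, rk, xk1) in buffer:
        zk = np.concatenate([xk, uk])
        zk1 = np.concatenate([xk1, -K @ xk1])
        Phi.append(np.kron(zk, zk) - np.kron(zk1, zk1))
        y.append(rk)
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
    H = theta.reshape(n + m, n + m)
    H = 0.5 * (H + H.T)                   # symmetrize the recovered kernel

    # Policy improvement: u = -(H_uu)^{-1} H_ux x.
    Huu, Hux = H[n:, n:], H[n:, :n]
    K = np.linalg.solve(Huu, Hux)

print("learned gain K:", K)
```

Under these assumptions, the learned gain converges to the LQR gain obtained from the discrete-time algebraic Riccati equation; reusing the stored transitions in each least-squares fit is what the experience replay step buys in data efficiency.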
Pages: 1825-1842
Number of pages: 18