Online TD(λ) for discrete-time Markov jump linear systems

Cited by: 0
Authors
Beirigo, R. L. [1 ]
Todorov, M. G. [1 ]
Barreto, A. M. S. [1 ,2 ]
Affiliations
[1] Natl Lab Sci Comp LNCC MCTIC, Av Getulio Vargas 333, BR-25651070 Petropolis, RJ, Brazil
[2] Google DeepMind, London, England
Source
2018 IEEE CONFERENCE ON DECISION AND CONTROL (CDC) | 2018
Keywords
Markov jump linear systems; reinforcement learning; adaptive control; robotics; STABILIZATION; STABILITY;
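DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract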
This paper proposes a new approach for the optimal quadratic control of discrete-time Markov jump linear systems (MJLS), inspired by the temporal difference (TD) concepts of reinforcement learning. The method is online, in the sense that it simultaneously applies and refines the currently available controller, and it is transition-model-free, because it needs no explicit knowledge of the Markov chain transition probabilities, provided the chain can be sampled or simulated. The strategy builds upon a previously proposed offline method and, we hope, will pave the way for developing and adapting reinforcement learning techniques for MJLS. The method is experimentally evaluated on Samuelson's macroeconomic model and on the control of a faulty robotic manipulator arm, performing favorably when compared with its offline predecessor.
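To make the "online" and "transition-model-free" ideas concrete, the following is a minimal sketch of TD(λ) policy evaluation on a toy scalar MJLS. It is not the authors' algorithm: the two-mode system, the gains `A`, the matrix `P`, the cost weight `Q`, the step size, and the function names are all hypothetical choices made for illustration. It assumes a fixed stabilizing controller already folded into `A`, and it models the cost-to-go as `w[i] * x**2` in mode `i`. The learner only samples mode jumps and never reads `P` directly; `exact_weights` uses `P` solely to provide a model-based reference.

```python
import random

# Hypothetical scalar MJLS (illustrative numbers, not taken from the paper):
# x[k+1] = A[theta[k]] * x[k], with stage cost Q * x[k]**2.
A = [0.5, 0.8]                    # assumed closed-loop gain in each mode
P = [[0.9, 0.1], [0.3, 0.7]]      # transition probabilities of the mode chain
Q = 1.0


def exact_weights(iters=500):
    """Model-based reference: iterate p_i = Q + A_i^2 * sum_j P_ij * p_j."""
    p = [0.0, 0.0]
    for _ in range(iters):
        p = [Q + A[i] ** 2 * sum(P[i][j] * p[j] for j in range(2))
             for i in range(2)]
    return p


def td_lambda(episodes=5000, steps=25, alpha=0.02, lam=0.5, seed=0):
    """Online TD(lambda) estimate of w, with V(x, i) modeled as w[i] * x**2."""
    rng = random.Random(seed)
    w = [0.0, 0.0]
    for _ in range(episodes):
        x, mode = 1.0, rng.randrange(2)
        z = [0.0, 0.0]                        # eligibility traces, one per mode
        for _ in range(steps):
            # The learner only observes a sampled jump of the chain.
            nxt = 0 if rng.random() < P[mode][0] else 1
            x_next = A[mode] * x
            # Undiscounted TD error: cost + V(next) - V(current).
            delta = Q * x * x + w[nxt] * x_next ** 2 - w[mode] * x * x
            z = [lam * zi for zi in z]
            z[mode] += x * x                  # gradient of V w.r.t. w[mode]
            w = [wi + alpha * delta * zi for wi, zi in zip(w, z)]
            x, mode = x_next, nxt
    return w
```

Comparing `td_lambda()` with `exact_weights()` gives a rough check that the sampled updates approach the model-based solution; with a constant step size the agreement is only approximate.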
Pages: 2229-2234
Number of pages: 6