A Lyapunov approach for stable reinforcement learning

Cited by: 0
Author
Julio B. Clempner
Affiliations
[1] Escuela Superior de Física y Matemáticas (School of Physics and Mathematics),
[2] Instituto Politécnico Nacional (National Polytechnic Institute)
Source
Computational and Applied Mathematics | 2022, Vol. 41
Keywords
Reinforcement learning; Lyapunov; Architecture; Average cost; Markov chains; Optimization; 37M25; 46N10; 65C40; 60J20
DOI
Not available
Abstract
Our strategy is based on a novel reinforcement-learning (RL) Lyapunov methodology. We propose a method for constructing Lyapunov-like functions using a feed-forward Markov decision process. These functions are essential for ensuring the stability of a behavior policy throughout the learning process. We show that the cost sequence corresponding to the best policy is frequently non-monotonic, so convergence cannot be guaranteed from it directly. For any Markov-ergodic process, our technique generates a Lyapunov-like function in one-to-one correspondence with the current cost function, yielding monotonically non-increasing behavior along the trajectories under realization of the optimal strategy. We show that the system's dynamics and trajectories converge, and we explain how to apply the Lyapunov method to solve RL problems. We test the proposed approach to demonstrate its efficacy.
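The core idea of the abstract — a non-monotonic cost sequence paired with a Lyapunov-like function that decreases monotonically along trajectories — can be illustrated with a minimal sketch. This is not the paper's actual construction (which is built from the Markov decision process itself); it simply shows one elementary way to dominate a non-monotonic cost sequence c_t with a monotone non-increasing envelope, here the tail maximum V_t = max over k ≥ t of c_k. The function name `lyapunov_envelope` is hypothetical.

```python
def lyapunov_envelope(costs):
    """Tail-maximum envelope of a finite cost sequence.

    Returns V with V[t] = max(costs[t:]), so V[t] >= costs[t] and
    V is monotonically non-increasing, even when costs is not.
    """
    env = []
    running_max = float("-inf")
    # Sweep backwards so each entry is the max over the tail of the sequence.
    for c in reversed(costs):
        running_max = max(running_max, c)
        env.append(running_max)
    return list(reversed(env))

costs = [5.0, 3.0, 4.0, 2.0, 2.5, 1.0]   # non-monotonic cost sequence
V = lyapunov_envelope(costs)             # [5.0, 4.0, 4.0, 2.5, 2.5, 1.0]
```

The envelope dominates the costs everywhere and never increases, which is the monotonicity property the paper establishes (by a different, MDP-based construction) for its Lyapunov-like functions along optimal trajectories.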