Reinforcement Learning for Stochastic Max-Plus Linear Systems
Cited by: 0
Authors:
Subramanian, Vignesh [1]
Farhadi, Farzaneh [2]
Soudjani, Sadegh [3]
Affiliations:
[1] Georgia Inst Technol, Atlanta, GA 30332 USA
[2] Newcastle Univ, Sch Engn, Newcastle Upon Tyne, Tyne & Wear, England
[3] Newcastle Univ, Sch Comp, Newcastle Upon Tyne, Tyne & Wear, England
Source:
2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL, CDC | 2023
Funding:
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords:
REACHABILITY ANALYSIS;
DOI:
10.1109/CDC49753.2023.10384207
CLC Classification Number:
TP [Automation Technology, Computer Technology];
Discipline Classification Code:
0812;
Abstract:
This paper studies the design of control policies for discrete event systems under uncertainty. We capture the timing of events using the framework of max-plus-linear systems, in which the time between consecutive events depends on random delays with unknown distributions. Our policy synthesis is performed with respect to a cost function and can be extended directly to satisfy safety specifications on the timing of events. The main novelty of our approach is to translate the system evolution into a Markov decision process (MDP) with an uncountable state space and to formulate a stochastic optimisation problem over the evolution of the MDP. To tackle the unknown distributions of the uncertainties (and thus the unknown transition probabilities of the MDP), we employ model-free reinforcement learning to perform the optimisation and find control policies for the system. Our implementation results on a 9-dimensional model of a railway network show the superiority of our learning approach compared with the stochastic model predictive control approach.
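The paper itself provides no code, but the two ingredients the abstract names can be illustrated in a short Python sketch: a stochastic max-plus-linear recursion x(k+1)_i = max( max_j (A(k)_ij + x(k)_j), u(k)_i ), where the max-plus product replaces sums with maxima and products with additions, and a model-free tabular Q-learning loop acting on a discretised lateness state. Everything specific below is an assumption for illustration: the 2-dimensional plant, the exponential delay model, the timetable period, and the lateness cost are hypothetical and are not the paper's 9-dimensional railway model or its exact cost function.

import numpy as np

rng = np.random.default_rng(0)

def mp_times(A, x):
    # Max-plus matrix-vector product: y_i = max_j (A[i, j] + x[j]).
    return np.max(A + x[None, :], axis=1)

# Hypothetical 2-dimensional stochastic MPL plant (illustrative values).
A_NOM = np.array([[2.0, 5.0],   # nominal processing/travel times
                  [3.0, 3.0]])
PERIOD = 6.0                    # assumed timetable period

def mpl_step(x, u):
    # x(k+1)_i = max( max_j (A(k)_ij + x_j), u_i ): an event starts no
    # earlier than its scheduled dispatch time u_i; A(k) carries random
    # delays whose distribution the learner never observes explicitly.
    A = A_NOM + rng.exponential(0.5, size=A_NOM.shape)
    return np.maximum(mp_times(A, x), u)

# Model-free tabular Q-learning on a discretised lateness state.
N_STATES = 12
ACTIONS = np.array([0.0, 0.5, 1.0])      # dispatch-time offsets
Q = np.zeros((N_STATES, len(ACTIONS)))
alpha, gamma, eps = 0.1, 0.95, 0.1

def lateness_state(x, k):
    # Discretise the worst deviation from the periodic timetable k * PERIOD.
    d = max(0.0, float(np.max(x - k * PERIOD)))
    return min(N_STATES - 1, int(d))

for episode in range(2000):
    x, k = np.zeros(2), 0
    s = lateness_state(x, k)
    for _ in range(30):
        # epsilon-greedy action selection over dispatch offsets
        a = rng.integers(len(ACTIONS)) if rng.random() < eps else int(np.argmax(Q[s]))
        u = (k + 1) * PERIOD + ACTIONS[a]          # next scheduled dispatch
        x, k = mpl_step(x, u * np.ones(2)), k + 1
        s_next = lateness_state(x, k)
        reward = -max(0.0, float(np.max(x - k * PERIOD)))  # penalise lateness
        Q[s, a] += alpha * (reward + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

Note that the paper's MDP has an uncountable state space; the coarse lateness discretisation above only keeps the tabular sketch short, and a practical implementation would replace the Q-table with function approximation.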