Deriving Explicit Control Policies for Markov Decision Processes Using Symbolic Regression

Cited by: 2
Authors
Hristov, A. [1]
Bosman, J. W. [1]
Bhulai, S. [2]
van der Mei, R. D. [1]
Affiliations
[1] Ctr Math & Comp Sci, Stochast Grp, Amsterdam, Netherlands
[2] Vrije Univ Amsterdam, Dept Math, Amsterdam, Netherlands
Source
PROCEEDINGS OF THE 13TH EAI INTERNATIONAL CONFERENCE ON PERFORMANCE EVALUATION METHODOLOGIES AND TOOLS (VALUETOOLS 2020) | 2020
Keywords
Markov Decision Processes; Genetic program; Symbolic regression; Threshold-type policy; Optimal control; Closed-form approximation;
DOI
10.1145/3388831.3388840
CLC number
TP31 [Computer software];
Subject classification codes
081202; 0835;
Abstract
In this paper, we introduce a novel approach to optimizing the control of systems that can be modeled as Markov decision processes (MDPs) with a threshold-based optimal policy. Our method is based on a specific type of genetic program known as symbolic regression (SR). We show how the performance of this program can be greatly improved by exploiting the structure of the underlying MDP framework in which it is applied. The proposed method has two main advantages: (1) it results in near-optimal decision policies, and (2) in contrast to other algorithms, it generates closed-form approximations. Obtaining an explicit expression for the decision policy makes sensitivity analysis possible and allows instant recalculation of the threshold function for any change in the parameters. We emphasize that the introduced technique is highly general and applicable to any MDP that has a threshold-based policy. Extensive experimentation demonstrates the usefulness of the method.
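The abstract describes fitting a closed-form threshold function by symbolic regression over MDP solutions. The toy sketch below is not the authors' genetic program; it only illustrates the general idea under stated assumptions: hypothetical (load, optimal-threshold) samples such as an MDP solver might produce, a small hand-picked grammar of candidate closed forms in place of evolved expression trees, and a crude coefficient grid search in place of GP crossover and mutation.

```python
import math

# Stand-in data: (load, optimal threshold) pairs such as value iteration on
# an MDP might produce; the "true" relation 2*sqrt(rho) + 1 is chosen purely
# for illustration.
samples = [(rho, 2 * math.sqrt(rho) + 1) for rho in (0.1, 0.3, 0.5, 0.7, 0.9)]

# Candidate closed-form templates, mimicking the primitive set a genetic
# program would combine when searching for an explicit policy expression.
candidates = {
    "a*x + b":       lambda x, a, b: a * x + b,
    "a*sqrt(x) + b": lambda x, a, b: a * math.sqrt(x) + b,
    "a*log(x) + b":  lambda x, a, b: a * math.log(x) + b,
    "a/x + b":       lambda x, a, b: a / x + b,
}

def fit(f):
    """Crude coefficient grid search standing in for GP coefficient tuning."""
    best_err, best_coeffs = float("inf"), None
    for ai in range(-50, 51):
        for bi in range(-50, 51):
            a, b = ai / 10, bi / 10
            err = sum((f(x, a, b) - y) ** 2 for x, y in samples)
            if err < best_err:
                best_err, best_coeffs = err, (a, b)
    return best_err, best_coeffs

# Pick the template with the lowest squared error; the recovered explicit
# threshold function can then be inspected and re-evaluated analytically,
# which is the advantage of a closed-form policy over a lookup table.
best_expr = min(candidates, key=lambda name: fit(candidates[name])[0])
print(best_expr, fit(candidates[best_expr])[1])  # → a*sqrt(x) + b (2.0, 1.0)
```

Because the result is an explicit expression rather than a table of thresholds, changing a model parameter only requires re-evaluating (or refitting) the formula, which is the sensitivity-analysis benefit the abstract highlights.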
Pages: 41-47
Page count: 7