A kernel-based approximate dynamic programming approach: Theory and application

Cited by: 4
Authors
Forootani, Ali [1 ]
Iervolino, Raffaele [2 ]
Tipaldi, Massimo [3 ]
Baccari, Silvio [4 ]
Affiliations
[1] Maynooth Univ, Hamilton Inst, Maynooth W23F2K8, Kildare, Ireland
[2] Univ Naples Federico II, Dept Elect Engn & Informat Technol, I-80125 Naples, Italy
[3] Polytech Univ Bari, Dept Elect & Informat Engn, I-70126 Bari, Italy
[4] Univ Campania Luigi Vanvitelli, Dept Math & Phys, I-81100 Caserta, Italy
Keywords
Dynamic Programming; Markov Decision Process; Approximate Dynamic Programming; Kernel function; Support Vector Machine; Sensor scheduling;
DOI
10.1016/j.automatica.2024.111517
Chinese Library Classification (CLC)
TP [automation technology; computer technology]
Discipline code
0812
Abstract
This article proposes a novel kernel-based Dynamic Programming (DP) approximation method to tackle the typical curse of dimensionality of stochastic DP problems over a finite time horizon. The method combines kernel functions with Support Vector Machine (SVM) regression to determine an approximate cost function over the entire state space of the underlying Markov Decision Process (MDP), by leveraging the cost functions computed for a set of selected representative states. Kernel functions are used to define the so-called kernel matrix, while the parameter vector of the kernel-based cost function approximation is computed by moving backwards in time from the terminal condition and applying SVM regression at each stage. In this way, the difficulty of selecting a proper set of features is also addressed. The proposed method is then extended to the infinite time horizon case. To show the effectiveness of the proposed approach, the resulting Recursive Residual Approximate Dynamic Programming (RR-ADP) algorithm is applied to sensor scheduling design in multi-process remote state estimation problems. (c) 2024 Elsevier Ltd. All rights reserved.
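The abstract describes a backward-in-time recursion that fits a kernel expansion of the cost function at each stage using only a subset of representative states. The following is a minimal numerical sketch of that idea, not the authors' RR-ADP implementation: regularized kernel least squares stands in for SVM regression, an RBF kernel is assumed, and all function and variable names are illustrative.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Gaussian (RBF) kernel on scalar state features: k(x, y) = exp(-gamma * (x - y)^2)
    return np.exp(-gamma * (X[:, None] - Y[None, :]) ** 2)

def kernel_adp(states, rep_idx, P, C, T, gamma=1.0, lam=1e-6):
    """Backward kernel-based cost approximation over a finite horizon T.

    states: (n,) scalar features of the n MDP states
    rep_idx: indices of the representative states used for regression
    P: (A, n, n) transition probabilities, one row-stochastic matrix per action
    C: (A, n) stage costs per action and state
    Returns the approximate stage-0 cost for all n states.
    """
    s_rep = states[rep_idx]
    K = rbf_kernel(s_rep, s_rep, gamma)        # kernel matrix over representatives
    K_all = rbf_kernel(states, s_rep, gamma)   # evaluates the expansion everywhere
    V = np.zeros(len(states))                  # terminal condition V_T = 0
    for _ in range(T):
        # Bellman backup, then keep targets only at the representative states
        Q = C + np.einsum('aij,j->ai', P, V)   # (A, n) action values
        y = Q.min(axis=0)[rep_idx]
        # regularized kernel regression in place of SVM regression
        alpha = np.linalg.solve(K + lam * np.eye(len(rep_idx)), y)
        V = K_all @ alpha                      # approximate cost over the whole space
    return V
```

With the representative set equal to the full state space and a tiny regularization weight, the recursion reproduces exact backward DP; the savings claimed in the paper come from choosing far fewer representatives than states.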
Pages: 9
Related papers
22 items
[1] Albert, R.; Barabási, A.-L. Statistical mechanics of complex networks. Reviews of Modern Physics, 2002, 74(1): 47-97.
[2] Bertsekas, D. P. Reinforcement Learning and Optimal Control. 1st ed., 2019.
[3] Bertsekas, D. P. Temporal difference methods for general projected equations. IEEE Transactions on Automatic Control, 2011, 56(9): 2128-2139.
[4] Bertsekas, D. P. Feature-based aggregation and deep reinforcement learning: A survey and some new implementations. IEEE/CAA Journal of Automatica Sinica, 2019, 6(1): 1-31.
[5] Bertsekas, D. P. Dynamic Programming and Optimal Control, Vol. 1. 2017.
[6] Bhat, N. Stochastic Systems, 2023, 25: 1.
[7] Bishop, C. Neural Networks for Pattern Recognition. 1995.
[8] Desai, V. V.; Farias, V. F.; Moallemi, C. C. Approximate dynamic programming via a smoothed linear program. Operations Research, 2012, 60(3): 655-674.
[9] Dietterich, T. G. Advances in Neural Information Processing Systems, 2002, 14: 1491.
[10] Forootani, A.; Iervolino, R.; Tipaldi, M.; Dey, S. Transmission scheduling for multi-process multi-sensor remote estimation via approximate dynamic programming. Automatica, 2022, 136.