A UNIFIED FRAMEWORK FOR LINEAR FUNCTION APPROXIMATION OF VALUE FUNCTIONS IN STOCHASTIC CONTROL

Cited by: 0
Authors
Sanchez-Fernandez, Matilde [1 ]
Valcarcel, Sergio [2 ]
Zazo, Santiago [2 ]
Affiliations
[1] Univ Carlos III Madrid, Signal Theory & Commun Dept, Av La Univ 30, Leganes 28911, Spain
[2] Univ Politecn Madrid, Signals Syst & Radiocommun Dept, E-28040 Madrid, Spain
Source
2013 PROCEEDINGS OF THE 21ST EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2013
Keywords
Approximate dynamic programming; Linear value function approximation; Mean squared Bellman error; Mean squared projected Bellman error; Reinforcement learning
DOI
Not available
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline Classification Codes
0808; 0809
Abstract
This paper contributes a unified formulation that merges previous analyses of predicting the performance (value function) of a given sequence of actions (policy) when an agent operates a Markov decision process with a large state space. When the states are represented by features and the value function is linearly approximated, our analysis reveals a new relationship between two common cost functions used to obtain the optimal approximation. In addition, this analysis allows us to propose an efficient adaptive algorithm that provides an unbiased linear estimate. The performance of the proposed algorithm is illustrated by simulation, showing competitive results compared with state-of-the-art solutions.
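For context, the two cost functions named in the keywords have standard definitions in the linear value function approximation literature. The following is a sketch in that conventional notation; the symbols below are the standard ones and are an assumption about the paper's notation, not taken from it:

    \mathrm{MSBE}(\theta)  = \left\lVert \Phi\theta - T^{\pi}\Phi\theta \right\rVert_{D}^{2}
    \mathrm{MSPBE}(\theta) = \left\lVert \Phi\theta - \Pi\, T^{\pi}\Phi\theta \right\rVert_{D}^{2},
    \qquad \Pi = \Phi \left( \Phi^{\top} D \Phi \right)^{-1} \Phi^{\top} D

Here \Phi stacks the feature vectors of the states, \theta is the weight vector, T^{\pi} is the Bellman operator of the evaluated policy, D is the diagonal matrix of the stationary state distribution, and \Pi is the projection onto the span of the features under the D-weighted norm. In this standard setting the MSPBE fixed point coincides with the temporal-difference (TD) solution, which in general differs from the MSBE minimizer whenever T^{\pi}\Phi\theta leaves the feature span.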
Pages: 5