A UNIFIED FRAMEWORK FOR LINEAR FUNCTION APPROXIMATION OF VALUE FUNCTIONS IN STOCHASTIC CONTROL

Cited by: 0
Authors
Sanchez-Fernandez, Matilde [1 ]
Valcarcel, Sergio [2 ]
Zazo, Santiago [2 ]
Affiliations
[1] Univ Carlos III Madrid, Signal Theory & Commun Dept, Av La Univ 30, Leganes 28911, Spain
[2] Univ Politecn Madrid, Signals Syst & Radiocommun Dept, E-28040 Madrid, Spain
Source
2013 PROCEEDINGS OF THE 21ST EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2013
Keywords
Approximate dynamic programming; Linear value function approximation; Mean squared Bellman error; Mean squared projected Bellman error; Reinforcement learning
DOI
Not available
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline Classification Codes
0808; 0809
Abstract
This paper contributes a unified formulation that merges previous analyses of predicting the performance (value function) of a given sequence of actions (policy) when an agent operates a Markov decision process with a large state space. When the states are represented by features and the value function is linearly approximated, our analysis reveals a new relationship between two common cost functions used to obtain the optimal approximation. In addition, this analysis allows us to propose an efficient adaptive algorithm that provides an unbiased linear estimate. The performance of the proposed algorithm is illustrated by simulation, showing competitive results compared with state-of-the-art solutions.
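For context, the two cost functions named in the keywords have standard definitions in the linear value function approximation literature. The following is a sketch in that conventional notation; the symbols below are the standard ones and are an assumption about the paper's notation, not taken from it:

    \mathrm{MSBE}(\theta)  = \left\lVert \Phi\theta - T^{\pi}\Phi\theta \right\rVert_{D}^{2}
    \mathrm{MSPBE}(\theta) = \left\lVert \Phi\theta - \Pi\, T^{\pi}\Phi\theta \right\rVert_{D}^{2},
    \qquad \Pi = \Phi \left( \Phi^{\top} D \Phi \right)^{-1} \Phi^{\top} D

Here \Phi stacks the feature vectors of the states, \theta is the weight vector, T^{\pi} is the Bellman operator of the evaluated policy, D is the diagonal matrix of the stationary state distribution, and \Pi is the projection onto the span of the features under the D-weighted norm. In this standard setting the MSPBE fixed point coincides with the temporal-difference (TD) solution, which in general differs from the MSBE minimizer whenever T^{\pi}\Phi\theta leaves the feature span.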
Pages: 5