A UNIFIED FRAMEWORK FOR LINEAR FUNCTION APPROXIMATION OF VALUE FUNCTIONS IN STOCHASTIC CONTROL

Times Cited: 0
Authors
Sanchez-Fernandez, Matilde [1 ]
Valcarcel, Sergio [2 ]
Zazo, Santiago [2 ]
Affiliations
[1] Univ Carlos III Madrid, Signal Theory & Commun Dept, Av La Univ 30, Leganes 28911, Spain
[2] Univ Politecn Madrid, Signals Syst & Radiocommun Dept, E-28040 Madrid, Spain
Source
2013 PROCEEDINGS OF THE 21ST EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO) | 2013
Keywords
Approximate dynamic programming; Linear value function approximation; Mean squared Bellman Error; Mean squared projected Bellman Error; Reinforcement Learning;
DOI
Not available
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Subject Classification Codes
0808; 0809;
Abstract
This paper contributes a unified formulation that merges previous analyses of predicting the performance (value function) of a given sequence of actions (policy) when an agent operates a Markov decision process with a large state space. When the states are represented by features and the value function is linearly approximated, our analysis reveals a new relationship between two common cost functions used to obtain the optimal approximation. In addition, this analysis allows us to propose an efficient adaptive algorithm that provides an unbiased linear estimate. The performance of the proposed algorithm is illustrated by simulation, showing competitive results compared with state-of-the-art solutions.
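As background for the abstract, a minimal sketch of the two cost functions named in the keywords, written in the standard notation of the approximate dynamic programming literature; the symbols theta, phi, Phi, D, gamma, T and Pi are assumptions of this sketch and are not taken from the paper itself:

  V_\theta(s) = \theta^\top \phi(s)
  (T V)(s) = \mathbb{E}\!\left[\, r + \gamma\, V(s') \mid s \,\right]
  \mathrm{MSBE}(\theta)  = \left\| V_\theta - T V_\theta \right\|_D^2
  \mathrm{MSPBE}(\theta) = \left\| V_\theta - \Pi\, T V_\theta \right\|_D^2, \qquad \Pi = \Phi \left( \Phi^\top D \Phi \right)^{-1} \Phi^\top D

Here D is the diagonal matrix of stationary state probabilities and Phi stacks the feature vectors phi(s) as rows. Minimizing the MSPBE is the objective behind gradient-TD methods, while the MSBE penalizes the full Bellman residual; the paper's stated contribution concerns the relationship between these two criteria.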
Pages: 5