A tutorial on value function approximation for stochastic and dynamic transportation

Cited by: 0
Authors
Heinold, Arne [1 ]
Affiliations
[1] Univ Kiel, Sch Econ & Business, Kiel, Germany
Source
4OR-A QUARTERLY JOURNAL OF OPERATIONS RESEARCH, 2024, Vol. 22, No. 1
Keywords
Tutorial; Markov decision process; Approximate dynamic programming; Value function approximation; Reinforcement learning; Optimization
DOI
10.1007/s10288-023-00539-3
Chinese Library Classification
C93 [Management Science]; O22 [Operations Research]
Discipline codes
070105; 12; 1201; 1202; 120202
Abstract
This paper provides an introductory tutorial on Value Function Approximation (VFA), a solution class from Approximate Dynamic Programming. VFA describes a heuristic way of solving sequential decision processes such as a Markov Decision Process. Real-world problems in supply chain management (and beyond) containing dynamic and stochastic elements can be modeled as such processes, but large-scale instances are intractable to solve to optimality by enumeration due to the curses of dimensionality. VFA can be a suitable method for these cases, and this tutorial is designed to ease its use in research, practice, and education. To this end, the tutorial describes VFA in the context of stochastic and dynamic transportation and makes three main contributions. First, it gives a concise theoretical overview of VFA's fundamental concepts, outlines a generic VFA algorithm, and briefly discusses advanced topics in VFA. Second, the VFA algorithm is applied to the taxicab problem, an easy-to-understand transportation planning task. Detailed step-by-step results are presented for a small-scale instance, allowing readers to gain an intuition for VFA's main principles. Third, larger instances are solved by enhancing the basic VFA algorithm, demonstrating its general capability to handle more complex problems. The experiments are run on artificial instances, and the respective Python scripts are part of an electronic appendix. Overall, the tutorial provides the knowledge necessary to apply VFA to a wide range of stochastic and dynamic settings and addresses researchers, lecturers, tutors, students, and practitioners alike.
Pages: 145-173 (29 pages)
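The abstract's central idea is that VFA replaces exhaustive enumeration of the state space with value estimates learned from simulated transitions. As a rough illustration only (a toy random-walk MDP with a simple temporal-difference-style update; the function name `vfa_td0` and the instance are ours, not the tutorial's taxicab problem or its actual algorithm), a minimal sketch might look like:

```python
import random

def vfa_td0(n_states=5, episodes=2000, alpha=0.1, gamma=1.0, seed=0):
    """Estimate state values of a toy random walk from sampled transitions.

    States 0 and n_states-1 are terminal; a reward of 1 is earned only on
    reaching the right end. Instead of enumerating all trajectories, the
    value estimate of each visited state is nudged toward the sampled
    one-step target r + gamma * V[s'] (the core VFA update idea).
    """
    rng = random.Random(seed)
    V = [0.0] * n_states                 # value estimates; terminals stay 0
    for _ in range(episodes):
        s = n_states // 2                # start each episode in the middle
        while 0 < s < n_states - 1:
            s_next = s + rng.choice((-1, 1))
            r = 1.0 if s_next == n_states - 1 else 0.0
            target = r + gamma * V[s_next]        # sampled one-step target
            V[s] += alpha * (target - V[s])       # move estimate toward it
            s = s_next
    return V

if __name__ == "__main__":
    # Interior estimates should roughly order themselves toward the
    # true values 0.25, 0.5, 0.75 of this random walk.
    print([round(v, 2) for v in vfa_td0()])
```

The same loop structure carries over to larger problems: only the simulator, the reward, and the representation of `V` (e.g., a parametric approximation instead of a table) change.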