A tutorial on value function approximation for stochastic and dynamic transportation

Cited by: 0

Authors:

Heinold, Arne [1]

Affiliations:

[1] Univ Kiel, Sch Econ & Business, Kiel, Germany

Source:

4OR-A QUARTERLY JOURNAL OF OPERATIONS RESEARCH | 2024 / Vol. 22 / No. 01

Keywords:

Tutorial; Markov decision process; Approximate dynamic programming; Value function approximation; Reinforcement learning; OPTIMIZATION;

DOI:

10.1007/s10288-023-00539-3

Chinese Library Classification:

C93 [Management Science]; O22 [Operations Research];

Discipline classification codes:

070105 ; 12 ; 1201 ; 1202 ; 120202 ;

Abstract:

This paper provides an introductory tutorial on Value Function Approximation (VFA), a solution class from Approximate Dynamic Programming. VFA is a heuristic approach to solving sequential decision processes such as Markov Decision Processes. Real-world problems in supply chain management (and beyond) that contain dynamic and stochastic elements can be modeled as such processes, but large-scale instances cannot be solved to optimality by enumeration because of the curses of dimensionality. VFA can be a suitable method for these cases, and this tutorial is designed to ease its use in research, practice, and education. To this end, the tutorial describes VFA in the context of stochastic and dynamic transportation and makes three main contributions. First, it gives a concise theoretical overview of VFA's fundamental concepts, outlines a generic VFA algorithm, and briefly discusses advanced topics of VFA. Second, the VFA algorithm is applied to the taxicab problem, an easy-to-understand transportation planning task. Detailed step-by-step results are presented for a small-scale instance, allowing readers to gain an intuition about VFA's main principles. Third, larger instances are solved by enhancing the basic VFA algorithm, demonstrating its general capability to handle more complex problems. The experiments use artificial instances, and the respective Python scripts are part of an electronic appendix. Overall, the tutorial provides the necessary knowledge to apply VFA to a wide range of stochastic and dynamic settings and is addressed to researchers, lecturers, tutors, students, and practitioners alike.
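To give a feel for the kind of algorithm the abstract refers to, the following is a minimal sketch of a lookup-table VFA on a toy taxi-style problem. It is not the paper's algorithm, its taxicab instance, or its electronic appendix: the zone layout, `MEAN_FARE`, `MOVE_COST`, `HORIZON`, the noise model, and the function names `best_decision` and `train_vfa` are all assumptions made up for illustration. The sketch simulates trajectories forward in time, picks decisions greedily (with some random exploration) against the current value estimates, and smooths each sampled reward-to-go into a table.

```python
import random

# All numbers below are made-up illustration values, not data from the paper.
MEAN_FARE = [1.0, 2.0, 4.0]   # hypothetical expected fare earned in each zone
MOVE_COST = 0.5               # hypothetical cost per zone of distance travelled
HORIZON = 5                   # number of decision periods


def best_decision(values, t, zone):
    """Greedy decision: travel cost now plus approximated value afterwards."""
    def score(target):
        return -MOVE_COST * abs(target - zone) + values[t][target]
    return max(range(len(MEAN_FARE)), key=score)


def train_vfa(iterations=5000, alpha=0.05, epsilon=0.2, seed=0):
    """Lookup-table VFA. values[t][z] estimates the reward-to-go after
    deciding in period t to serve zone z (a post-decision state)."""
    rng = random.Random(seed)
    n = len(MEAN_FARE)
    values = [[0.0] * n for _ in range(HORIZON)]
    for _ in range(iterations):
        zone = rng.randrange(n)  # random start zone for this sample path
        for t in range(HORIZON):
            # Epsilon-greedy: explore occasionally, otherwise trust the table.
            if rng.random() < epsilon:
                target = rng.randrange(n)
            else:
                target = best_decision(values, t, zone)
            # Exogenous information: the fare is only known after deciding.
            fare = MEAN_FARE[target] * rng.uniform(0.8, 1.2)
            # Sampled reward-to-go: fare plus the best estimated continuation.
            if t + 1 < HORIZON:
                nxt = best_decision(values, t + 1, target)
                future = -MOVE_COST * abs(nxt - target) + values[t + 1][nxt]
            else:
                future = 0.0
            observed = fare + future
            # Smooth the new observation into the old estimate.
            values[t][target] = (1 - alpha) * values[t][target] + alpha * observed
            zone = target
    return values


values = train_vfa()
for start in range(len(MEAN_FARE)):
    print("start zone", start, "-> first move to zone", best_decision(values, 0, start))
```

The table is indexed by the state reached after a decision but before the random fare is revealed, so choosing a decision needs no expectation over fares. A constant stepsize `alpha` is used here only for brevity; other stepsize rules are possible.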

Pages: 145-173

Number of pages: 29

50 items in total

[1] A tutorial on value function approximation for stochastic and dynamic transportation
Arne Heinold
4OR, 2024, 22 : 145 - 173
[2] Primal-Dual Value Function Approximation for Stochastic Dynamic Intermodal Transportation with Eco-Labels
Heinold, Arne
Meisel, Frank
Ulmer, Marlin W.
TRANSPORTATION SCIENCE, 2023, 57 (06) : 1452 - 1472
[3] Adaptive value function approximation for continuous-state stochastic dynamic programming
Fan, Huiyuan
Tarun, Prashant K.
Chen, Victoria C. P.
COMPUTERS & OPERATIONS RESEARCH, 2013, 40 (04) : 1076 - 1084
[4] Controlled approximation of the value function in stochastic dynamic programming for multi-reservoir systems
Zéphyr L.
Lang P.
Lamond B.F.
Computational Management Science, 2015, 12 (4) : 539 - 557
[5] A UNIFIED FRAMEWORK FOR LINEAR FUNCTION APPROXIMATION OF VALUE FUNCTIONS IN STOCHASTIC CONTROL
Sanchez-Fernandez, Matilde
Valcarcel, Sergio
Zazo, Santiago
2013 PROCEEDINGS OF THE 21ST EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2013,
[6] Geodesic Gaussian kernels for value function approximation
Sugiyama, Masashi
Hachiya, Hirotaka
Towell, Christopher
Vijayakumar, Sethu
AUTONOMOUS ROBOTS, 2008, 25 (03) : 287 - 304
[7] Geodesic Gaussian kernels for value function approximation
Masashi Sugiyama
Hirotaka Hachiya
Christopher Towell
Sethu Vijayakumar
Autonomous Robots, 2008, 25 : 287 - 304
[8] Optimized ensemble value function approximation for dynamic programming
Cervellera, Cristiano
EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2023, 309 (02) : 719 - 730
[9] Dynamic Spectrum Anti-Jamming With Reinforcement Learning Based on Value Function Approximation
Zhu, Xinyu
Huang, Yang
Wang, Shaoyu
Wu, Qihui
Ge, Xiaohu
Liu, Yuan
Gao, Zhen
IEEE WIRELESS COMMUNICATIONS LETTERS, 2023, 12 (02) : 386 - 390
[10] Meso-parametric value function approximation for dynamic customer acceptances in delivery routing
Ulmer, Marlin W.
Thomas, Barrett W.
EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2020, 285 (01) : 183 - 195