Approximate dynamic programming for the military inventory routing problem

Cited by: 0
Authors
Rebekah S. McKenna
Matthew J. Robbins
Brian J. Lunday
Ian M. McCormack
Affiliation
[1] Air Force Institute of Technology, Department of Operational Sciences
Source
Annals of Operations Research | 2020 / Vol. 288
Keywords
Inventory routing problem; Markov decision processes; Approximate dynamic programming; Least squares temporal differences; Military
DOI
Not available
Abstract
The United States Army can benefit from using cargo unmanned aerial vehicles (CUAVs) to perform resupply operations in combat environments, reducing its reliance on manned (ground and aerial) resupply, which puts personnel at risk. We formulate a Markov decision process (MDP) model of an inventory routing problem (IRP) with vehicle loss and direct delivery, which we label the military IRP (MILIRP). The objective of the MILIRP is to determine CUAV dispatching and routing policies for the resupply of geographically dispersed units operating in an austere combat environment. The large size of the problem instance motivating this research renders exact dynamic programming algorithms computationally intractable, so we use approximate dynamic programming (ADP) methods to attain improved policies (relative to a benchmark policy) via an approximate policy iteration algorithmic strategy that uses least squares temporal differences (LSTD) for policy evaluation. We examine a representative problem instance motivated by resupply operations experienced by the United States Army in Afghanistan, both to demonstrate the applicability of our MDP model and to examine the efficacy of our proposed ADP solution methodology. A designed computational experiment enables the examination of selected problem features and algorithmic features vis-à-vis the quality of solutions attained by our ADP policies. Results indicate that a 4-crew, 8-CUAV unit is able to resupply 57% of the demand from an 800-person organization over a 3-month time horizon when using the ADP policy, a notable improvement over the 18% attained using a benchmark policy. Such results inform the development of procedures governing the design, development, and utilization of CUAV assets for the resupply of dispersed ground combat forces.
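The solution strategy named in the abstract (approximate policy iteration, with least squares temporal differences for policy evaluation) can be illustrated on a far smaller problem. The following is a minimal sketch, not the authors' implementation. It assumes a hypothetical single-location inventory problem with demand uniform on {0, 1, 2}, illustrative holding, shortage, and dispatch costs, quadratic basis functions, and uniformly sampled states; vehicle loss, routing, and the MILIRP state space are omitted. Each iteration fits value-function weights with LSTD(0) under the current policy, then improves the policy greedily by a one-step lookahead over the known demand distribution. All names (`step`, `lstd`, `greedy`, `approximate_policy_iteration`) and parameter values are hypothetical.

```python
import random

# Hypothetical toy problem (NOT the MILIRP): one location, inventory 0..S_MAX,
# action = units dispatched this period, demand uniform on {0, 1, 2}.
S_MAX = 6
DEMANDS = (0, 1, 2)                      # equally likely (assumed)
GAMMA = 0.9                              # discount factor (assumed)
HOLD, SHORT, DISPATCH = 1.0, 10.0, 2.0   # illustrative unit costs (assumed)


def step(s, a, d):
    """Dispatch a units, then demand d arrives; return (reward, next_state)."""
    stock = s + a
    unmet = max(d - stock, 0)
    nxt = max(stock - d, 0)
    cost = HOLD * nxt + SHORT * unmet + (DISPATCH if a > 0 else 0.0)
    return -cost, nxt


def features(s):
    """Quadratic basis phi(s) = [1, x, x^2] with x = s / S_MAX."""
    x = s / S_MAX
    return [1.0, x, x * x]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def lstd(transitions, gamma, ridge=1e-6):
    """LSTD(0): solve (sum phi (phi - gamma phi')^T + ridge I) theta = sum phi r."""
    k = len(transitions[0][0])
    A = [[ridge if i == j else 0.0 for j in range(k)] for i in range(k)]
    b = [0.0] * k
    for phi, r, phi_next in transitions:
        for i in range(k):
            b[i] += phi[i] * r
            for j in range(k):
                A[i][j] += phi[i] * (phi[j] - gamma * phi_next[j])
    return solve(A, b)


def value(theta, s):
    return sum(t * f for t, f in zip(theta, features(s)))


def q_value(theta, s, a):
    """Expected one-step reward plus discounted approximate value of s'."""
    total = 0.0
    for d in DEMANDS:
        r, nxt = step(s, a, d)
        total += r + GAMMA * value(theta, nxt)
    return total / len(DEMANDS)


def greedy(theta):
    """Policy improvement: best feasible dispatch (s + a <= S_MAX) per state."""
    return [max(range(S_MAX - s + 1), key=lambda a: q_value(theta, s, a))
            for s in range(S_MAX + 1)]


def approximate_policy_iteration(iterations=5, samples=2000, seed=0):
    rng = random.Random(seed)
    policy = [0] * (S_MAX + 1)           # arbitrary start: never dispatch
    theta = [0.0] * 3
    for _ in range(iterations):
        # Policy evaluation: sample states uniformly, act with the current
        # policy, simulate demand, and fit theta by LSTD on the transitions.
        data = []
        for _ in range(samples):
            s = rng.randint(0, S_MAX)
            r, nxt = step(s, policy[s], rng.choice(DEMANDS))
            data.append((features(s), r, features(nxt)))
        theta = lstd(data, GAMMA)
        policy = greedy(theta)           # policy improvement
    return theta, policy
```

Calling `theta, policy = approximate_policy_iteration()` returns the fitted weights and a dispatch quantity for each inventory level. The small ridge term keeps the LSTD system solvable if the sampled features are nearly collinear, and the never-dispatch starting policy is an arbitrary illustrative choice rather than the benchmark policy used in the paper.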
Pages: 391-416
Number of pages: 26