Simultaneous task and energy planning using deep reinforcement learning

Cited by: 14
Authors
Wang, Di [1 ]
Hu, Mengqi [1 ]
Weir, Jeffery D. [2 ]
Affiliations
[1] Univ Illinois, Dept Mech & Ind Engn, Chicago, IL 60607 USA
[2] Air Force Inst Technol, Dept Operat Sci, Wright Patterson AFB, OH 45433 USA
Funding
National Science Foundation (USA);
Keywords
Simultaneous task and energy planning; Neural combinatorial optimization; Deep reinforcement learning; End-to-end learning; Sequence-to-sequence decision; VEHICLE-ROUTING PROBLEM; ALGORITHM; ENVIRONMENT; NETWORK;
DOI
10.1016/j.ins.2022.06.015
Chinese Library Classification (CLC)
TP [Automation technology, computer technology];
Discipline code
0812;
Abstract
To improve the energy awareness of unmanned autonomous vehicles, it is critical to co-optimize task planning and energy scheduling. To the best of our knowledge, most existing task planning algorithms either ignore energy constraints or make energy scheduling decisions based on simple rules. To bridge these research gaps, we propose a combinatorial optimization model for the simultaneous task and energy planning (STEP) problem. In this paper, we propose three variants of the STEP problem: (i) the vehicle can visit stationary charging stations at various locations multiple times; (ii) the vehicle can efficiently coordinate with mobile charging stations to achieve zero waiting time for recharging; and (iii) the vehicle can maximally harvest solar energy by accounting for the time variance of solar irradiance. In addition, to obtain fast and reliable solutions to STEP problems, we propose a neural combinatorial optimizer using a deep reinforcement learning algorithm with a proposed link information filter. Near-optimal solutions can be obtained very quickly, without solving the problem from scratch, when the environment changes. Our simulation results demonstrate that (i) the proposed neural optimizer finds solutions close to the optimum and outperforms exact and heuristic algorithms in terms of computational cost; and (ii) the end-to-end learning model (directly mapping perceptions to control) outperforms the traditional learning model (mapping from perception to prediction to control). (c) 2022 Elsevier Inc. All rights reserved.
Pages: 931-946
Page count: 16