Simultaneous task and energy planning using deep reinforcement learning

Cited by: 14

Authors:

Wang, Di [1]

Hu, Mengqi [1]

Weir, Jeffery D. [2]

Affiliations:

[1] Univ Illinois, Dept Mech & Ind Engn, Chicago, IL 60607 USA

[2] Air Force Inst Technol, Dept Operat Sci, Wright Patterson AFB, OH 45433 USA

Source:

INFORMATION SCIENCES | 2022 / Vol. 607

Funding:

National Science Foundation (USA);

Keywords:

Simultaneous task and energy planning; Neural combinatorial optimization; Deep reinforcement learning; End-to-end learning; Sequence-to-sequence decision; VEHICLE-ROUTING PROBLEM; ALGORITHM; ENVIRONMENT; NETWORK;

DOI:

10.1016/j.ins.2022.06.015

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

To improve energy awareness of unmanned autonomous vehicles, it is critical to co-optimize task planning and energy scheduling. To the best of our knowledge, most of the existing task planning algorithms either ignore energy constraints or make energy scheduling decisions based on simple rules. To bridge these research gaps, we propose a combinatorial optimization model for the simultaneous task and energy planning (STEP) problem. In this paper, we propose three variants of STEP problems: (i) the vehicle can visit stationary charging stations multiple times at various locations; (ii) the vehicle can efficiently coordinate with mobile charging stations to achieve zero waiting time for recharging; and (iii) the vehicle can maximally harvest solar energy by considering time variance in solar irradiance. In addition, to obtain fast and reliable solutions to STEP problems, we propose a neural combinatorial optimizer using the deep reinforcement learning algorithm with a proposed link information filter. Near-optimal solutions can be obtained very fast without solving the problem from scratch when environments change. Our simulation results demonstrate that (i) our proposed neural optimizer can find solutions close to the optimum and outperform the exact and heuristic algorithms in terms of computational cost; and (ii) the end-to-end learning (directly mapping from perceptions to control) model outperforms the traditional learning (mapping from perception to prediction to control) model. (c) 2022 Elsevier Inc. All rights reserved.

Citation

Pages: 931-946

Number of pages: 16

