Solving Dynamic Traveling Salesman Problems With Deep Reinforcement Learning

Times cited: 98

Authors:

Zhang, Zizhen [1,2]

Liu, Hong [3]

Zhou, MengChu [4,5]

Wang, Jiahai [1,2]

Affiliations:

[1] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou 510275, Peoples R China

[2] Sun Yat Sen Univ, Guangdong Key Lab Big Data Anal & Proc, Guangzhou 510275, Peoples R China

[3] City Univ Hong Kong, Sch Data Sci, Hong Kong, Peoples R China

[4] New Jersey Inst Technol, Helen & John C Hartmann Dept Elect & Comp Engn, Newark, NJ 07102 USA

[5] St Petersburg State Marine Tech Univ, Dept Cyber Phys Syst, St Petersburg 198262, Russia

Source:

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS | 2023 / Vol. 34 / No. 4

Funding:

U.S. National Science Foundation;

Keywords:

Routing; Traveling salesman problems; Planning; Real-time systems; Decision making; Computational modeling; Reinforcement learning; Attention model; deep reinforcement learning (DRL); dynamic traveling salesman problem (DTSP); machine learning; policy gradient; ANT COLONY OPTIMIZATION; ROUTING PROBLEM; TIME WINDOWS; ALGORITHM; SEARCH;

DOI:

10.1109/TNNLS.2021.3105905

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

A traveling salesman problem (TSP) is a well-known NP-complete problem. The traditional TSP presumes that the locations of customers and the travel times between customers are fixed and constant. In real-life cases, however, traffic conditions and customer requests may change over time. To find the most economical route, decisions can be made repeatedly, at each time point when the salesman completes service at a customer. This gives rise to a dynamic version of the traveling salesman problem (DTSP), which takes into account real-time traffic and customer-request information. DTSP can be extended to a dynamic pickup and delivery problem (DPDP). In this article, we improve the attention model so that it can perceive environmental changes. A deep reinforcement learning algorithm is proposed to solve DTSP and DPDP instances with up to 40 customers in 100 locations. Experiments show that our method captures the dynamic changes and produces highly satisfactory solutions within a very short time. Compared with other baseline approaches, improvements of more than 5% are observed in many cases.
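To make the dynamic setting concrete, the Python sketch below shows the decision process the abstract describes. The route is not fixed in advance. Instead, each time the salesman finishes serving a customer, a policy chooses the next customer using the travel times and requests known at that moment. The congestion model in `travel_time`, the release-time list of new requests, and the names `simulate_dtsp` and `greedy_policy` are illustrative assumptions, not the paper's formulation. The greedy rule is only a simple baseline standing in for the authors' attention-based DRL policy, which would plug in through the same `policy` argument.

```python
import math
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]


def travel_time(a: Point, b: Point, t: float) -> float:
    """Hypothetical time-dependent travel time: Euclidean distance scaled by
    a congestion factor of 1.5 for departures in the window 10 <= t < 20."""
    congestion = 1.5 if 10.0 <= t < 20.0 else 1.0
    return math.hypot(a[0] - b[0], a[1] - b[1]) * congestion


def greedy_policy(pos: Point, pending: Dict[int, Point], t: float) -> int:
    """Baseline policy (assumption): pick the pending customer that is quickest
    to reach under the traffic conditions observed at time t."""
    return min(pending, key=lambda c: travel_time(pos, pending[c], t))


def simulate_dtsp(
    depot: Point,
    initial: Dict[int, Point],
    arrivals: List[Tuple[float, int, Point]],
    policy: Callable[[Point, Dict[int, Point], float], int],
    service_time: float = 1.0,
) -> Tuple[List[int], float]:
    """Roll out a policy on a DTSP instance; return (visit order, finish time)."""
    t, pos = 0.0, depot
    pending = dict(initial)       # known, unserved customers: id -> location
    future = sorted(arrivals)     # not yet released: (release_time, id, location)
    route: List[int] = []
    while pending or future:
        # Reveal every request released by the current time.
        while future and future[0][0] <= t:
            _, cid, loc = future.pop(0)
            pending[cid] = loc
        if not pending:
            t = future[0][0]      # idle until the next request appears
            continue
        # Decision epoch: the policy sees the current position, requests, and time.
        nxt = policy(pos, pending, t)
        # Simplification: a leg's travel time is fixed by conditions at departure.
        t += travel_time(pos, pending[nxt], t) + service_time
        pos = pending.pop(nxt)
        route.append(nxt)
    t += travel_time(pos, depot, t)   # return to the depot
    return route, t


if __name__ == "__main__":
    route, total = simulate_dtsp(
        depot=(0.0, 0.0),
        initial={1: (3.0, 4.0), 2: (6.0, 8.0)},
        arrivals=[(5.0, 3, (0.0, 5.0))],  # customer 3 requests service at t = 5
        policy=greedy_policy,
    )
    print(route, round(total, 3))  # [1, 3, 2] 31.225
```

At the second decision epoch (t = 6), the newly released customer 3 is closer than customer 2, so the policy diverts to it; a tour planned once at t = 0 could not have included it. The final leg to customer 2 then departs inside the assumed congestion window and costs 1.5 times its distance.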


Pages: 2119-2132

Page count: 14

Number of references: 52

