Solving Dynamic Traveling Salesman Problems With Deep Reinforcement Learning

被引:98
作者
Zhang, Zizhen [1 ,2 ]
Liu, Hong [3 ]
Zhou, MengChu [4 ,5 ]
Wang, Jiahai [1 ,2 ]
机构
[1] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou 510275, Peoples R China
[2] Sun Yat Sen Univ, Guangdong Key Lab Big Data Anal & Proc, Guangzhou 510275, Peoples R China
[3] City Univ Hong Kong, Sch Data Sci, Hong Kong, Peoples R China
[4] New Jersey Inst Technol, Helen & John C Hartmann Dept Elect & Comp Engn, Newark, NJ 07102 USA
[5] St Petersburg State Marine Tech Univ, Dept Cyber Phys Syst, St Petersburg 198262, Russia
基金
美国国家科学基金会;
关键词
Routing; Traveling salesman problems; Planning; Real-time systems; Decision making; Computational modeling; Reinforcement learning; Attention model; deep reinforcement learning (DRL); dynamic traveling salesman problem (DTSP); machine learning; policy gradient; ANT COLONY OPTIMIZATION; ROUTING PROBLEM; TIME WINDOWS; ALGORITHM; SEARCH;
D O I
10.1109/TNNLS.2021.3105905
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
A traveling salesman problem (TSP) is a well-known NP-complete problem. Traditional TSP presumes that the locations of customers and the traveling time among customers are fixed and constant. In real-life cases, however, the traffic conditions and customer requests may change over time. To find the most economic route, the decisions can be made constantly upon the time-point when the salesman completes his service of each customer. This brings in a dynamic version of the traveling salesman problem (DTSP), which takes into account the information of real-time traffic and customer requests. DTSP can be extended to a dynamic pickup and delivery problem (DPDP). In this article, we ameliorate the attention model to make it possible to perceive environmental changes. A deep reinforcement learning algorithm is proposed to solve DTSP and DPDP instances with a size of up to 40 customers in 100 locations. Experiments show that our method can capture the dynamic changes and produce a highly satisfactory solution within a very short time. Compared with other baseline approaches, more than 5% improvements can be observed in many cases.
引用
收藏
页码:2119 / 2132
页数:14
相关论文
共 52 条