Reinforcement Learning-Based Nonautoregressive Solver for Traveling Salesman Problems

被引:2
作者
Xiao, Yubin [1 ]
Wang, Di [2 ]
Li, Boyang [3 ]
Chen, Huanhuan [4 ]
Pang, Wei [5 ]
Wu, Xuan [1 ]
Li, Hao [6 ]
Xu, Dong [7 ]
Liang, Yanchun [8 ]
Zhou, You [1 ]
机构
[1] Jilin Univ, Coll Comp Sci & Technol, Key Lab Symbol Computat & Knowledge Engn, Minist Educ, Changchun 130012, Peoples R China
[2] Nanyang Technol Univ, Joint NTU UBC Res Ctr Excellence Act Living Elderl, Singapore, Singapore
[3] Nanyang Technol Univ, Coll Comp & Data Sci, Singapore 639798, Singapore
[4] Univ Sci & Technol China, Sch Comp Sci, Hefei 230026, Peoples R China
[5] Heriot Watt Univ, Sch Math & Comp Sci, Edinburgh EH14 4AS, Scotland
[6] Natl Univ Def Technol, Coll Comp, Changsha 410073, Hunan, Peoples R China
[7] Univ Missouri, Bond Life Sci Ctr, Dept Elect Engn & Comp Sci, Columbia, MO 65211 USA
[8] Zhuhai Coll Sci & Technol, Sch Comp Sci, Zhuhai 519000, Peoples R China
基金
新加坡国家研究基金会;
关键词
Training; Artificial neural networks; Computational modeling; Decoding; Optimization; Inference algorithms; Traveling salesman problems; Computer architecture; Reinforcement learning; Deep learning; Graph neural network (NN); nonautoregressive decoding; reinforcement learning (RL); traveling salesman problem (TSP); VEHICLE-ROUTING PROBLEMS; ALGORITHM;
D O I
10.1109/TNNLS.2024.3483231
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
The traveling salesman problem (TSP) is a well-known combinatorial optimization problem (COP) with broad real-world applications. Recently, neural networks (NNs) have gained popularity in this research area because as shown in the literature, they provide strong heuristic solutions to TSPs. Compared to autoregressive neural approaches, nonautoregressive (NAR) networks exploit the inference parallelism to elevate inference speed but suffer from comparatively low solution quality. In this article, we propose a novel NAR model named, which incorporates a specially designed architecture and an enhanced reinforcement learning (RL) strategy. To the best of our knowledge, is the first TSP solver that successfully combines RL and NAR networks. The key lies in the incorporation of NAR network output decoding into the training process. efficiently represents TSP-encoded information as rewards and seamlessly integrates it into RL strategies, while maintaining consistent TSP sequence constraints during both training and testing phases. Experimental results on both synthetic and real-world TSPs demonstrate that outperforms five state-of-the-art (SOTA) models in terms of solution quality, inference speed, and generalization to unseen scenarios.
引用
收藏
页码:13402 / 13416
页数:15
相关论文
共 64 条
[1]   Implementing the Dantzig-Fulkerson-Johnson algorithm for large traveling salesman problems [J].
Applegate, D ;
Bixby, R ;
Chvátal, V ;
Cook, W .
MATHEMATICAL PROGRAMMING, 2003, 97 (1-2) :91-153
[2]  
Applegate D. L., 2007, TRAVELING SALESMAN P
[3]  
Bello I., 2016, INT C LEARN REPR
[4]   Machine learning for combinatorial optimization: A methodological tour d'horizon [J].
Bengio, Yoshua ;
Lodi, Andrea ;
Prouvost, Antoine .
EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2021, 290 (02) :405-421
[5]  
Bresson X., 2021, ARXIV
[6]  
Charikar M, 2001, SIAM PROC S, P642
[7]   Ship routing and scheduling: Status and perspectives [J].
Christiansen, M ;
Fagerholt, K ;
Ronen, D .
TRANSPORTATION SCIENCE, 2004, 38 (01) :1-18
[8]  
da Costa P., 2021, SN Computer Science, V2, P388, DOI DOI 10.1007/S42979-021-00779-2
[9]  
Dai HJ, 2017, ADV NEUR IN, V30
[10]   Traveling Salesman Problem with neighborhoods on a sphere in reflectance transformation imaging scenarios [J].
Deckerova, Jindriska ;
Faigl, Jan ;
Kratky, Vit .
EXPERT SYSTEMS WITH APPLICATIONS, 2022, 198