Learning Heuristics for the TSP by Policy Gradient

被引:231
作者
Deudon, Michel [1 ]
Cournut, Pierre [1 ]
Lacoste, Alexandre [2 ]
Adulyasak, Yossiri [3 ]
Rousseau, Louis-Martin [4 ]
机构
[1] Polytech France, Palaiseau, France
[2] Element AI, Montreal, PQ, Canada
[3] HEC Montreal, Montreal, PQ, Canada
[4] Polytech Montreal, Montreal, PQ, Canada
来源
INTEGRATION OF CONSTRAINT PROGRAMMING, ARTIFICIAL INTELLIGENCE, AND OPERATIONS RESEARCH, CPAIOR 2018 | 2018年 / 10848卷
关键词
Combinatorial optimization; Traveling salesman; Policy gradient; Neural networks; Reinforcement learning; GAME; GO;
D O I
10.1007/978-3-319-93031-2_12
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
The aim of the study is to provide interesting insights on how efficient machine learning algorithms could be adapted to solve combinatorial optimization problems in conjunction with existing heuristic procedures. More specifically, we extend the neural combinatorial optimization framework to solve the traveling salesman problem (TSP). In this framework, the city coordinates are used as inputs and the neural network is trained using reinforcement learning to predict a distribution over city permutations. Our proposed framework differs from the one in [1] since we do not make use of the Long Short-Term Memory (LSTM) architecture and we opted to design our own critic to compute a baseline for the tour length which results in more efficient learning. More importantly, we further enhance the solution approach with the well-known 2-opt heuristic. The results show that the performance of the proposed framework alone is generally as good as high performance heuristics (ORTools). When the framework is equipped with a simple 2-opt procedure, it could outperform such heuristics and achieve close to optimal results on 2D Euclidean graphs. This demonstrates that our approach based on machine learning techniques could learn good heuristics which, once being enhanced with a simple local search, yield promising results.
引用
收藏
页码:170 / 181
页数:12
相关论文
共 25 条
[1]  
[Anonymous], 1992, MEMBRANE HDB, DOI DOI 10.1007/978
[2]  
Applegate D., 2006, Concorde TSP Solver
[3]  
Bahdanau D, 2016, Arxiv, DOI arXiv:1409.0473
[4]  
Bello Irwan, 2017, WORKSHOP TRACK P
[5]   Improved filtering for weighted circuit constraints [J].
Benchimol, Pascal ;
van Hoeve, Willem-Jan ;
Regin, Jean-Charles ;
Rousseau, Louis-Martin ;
Rueher, Michel .
CONSTRAINTS, 2012, 17 (03) :205-233
[6]  
Bergman D, 2016, ARTIF INTELL-FOUND, P205, DOI 10.1007/978-3-319-42849-9_11
[7]  
Chan W, 2016, INT CONF ACOUST SPEE, P4960, DOI 10.1109/ICASSP.2016.7472621
[8]  
Christofides N, 1976, Report No.: 388
[9]   DASH: Dynamic Approach for Switching Heuristics [J].
Di Liberto, Giovanni ;
Kadioglu, Serdar ;
Leo, Kevin ;
Malitsky, Yuri .
EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2016, 248 (03) :943-953
[10]   Video Captioning With Attention-Based LSTM and Semantic Consistency [J].
Gao, Lianli ;
Guo, Zhao ;
Zhang, Hanwang ;
Xu, Xing ;
Shen, Heng Tao .
IEEE TRANSACTIONS ON MULTIMEDIA, 2017, 19 (09) :2045-2055