共 20 条
Contextual bandits learning-based branch-and-price-and-cut algorithm for the two-dimensional vector packing problem with conflicts and time windows
被引:0
|作者:
Chen, Yanru
[2
]
Gao, Mujin
[1
]
Zhang, Zongcheng
[3
]
Li, Junheng
[1
]
Wahab, M. I. M.
[4
]
Jiang, Yangsheng
[3
]
机构:
[1] Southwest Jiaotong Univ, Sch Econ & Management, 111,1st Sect,Northern 2nd Ring Rd, Chengdu 610031, Sichuan, Peoples R China
[2] Southwest Jiaotong Univ, Key Lab Serv Sci & Innovat Sichuan Prov, 111,1st Sect,Northern 2nd Ring Rd, Chengdu 610031, Sichuan, Peoples R China
[3] Southwest Jiaotong Univ, Sch Transportat & Logist, 111,1st Sect,Northern 2nd Ring Rd, Chengdu 610031, Sichuan, Peoples R China
[4] Toronto Metropolitan Univ, Dept Mech Ind & Mechatron Engn, 350 Victoria St, Toronto, ON M5B 2K3, Canada
关键词:
Two-dimensional vector packing problem;
Conflict graph;
Time window;
Branch-and-price-and-cut;
Contextual bandits;
VEHICLE-ROUTING PROBLEM;
DIMENSIONAL BIN-PACKING;
SEARCH;
COLONY;
D O I:
10.1016/j.tre.2024.103866
中图分类号:
F [经济];
学科分类号:
02 ;
摘要:
A two-dimensional vector packing problem with conflicts and time windows (2DVPPCTW) is investigated in this study. It consists of packing items into the minimum number of bins, and items are characterized by different weights, volumes, and time windows. Items also have conflicts and cannot be packed in the same bin. We formulate the 2DVPPCTW as an integer programming model and reformulate it to the master problem and the subproblem based on the Danzig-Wolfe decomposition. An exact algorithm, contextual bandits learning-based branch-andprice-and-cut algorithm (CBL-BPC), is proposed for the 2DVPPCTW with reinforcement learning technique. In particular, we provide a CBL framework for the subproblem, which usually poses considerable computational challenges. Five heuristic algorithms, namely, adaptive large neighborhood search (ALNS), ant colony optimization heuristic (ACO), heuristic dynamic programming (DP), a combination of ALNS and heuristic DP, and a combination of ACO and heuristic DP, are developed as bandit arms in the CBL framework. The CBL framework adaptively chooses one of five heuristics algorithms to solve the subproblem by learning from previous experiences. An exact dynamic programming algorithm is invoked to guarantee optimality once the CBL fails to find a better solution to the subproblem. Rounded capacity inequalities and accelerating strategies are introduced to accelerate the solution. An extensive computational study shows that the CBL-BPC can solve all 800 instances optimally within a reasonable time frame and is highly competitive with state-of-the-art exact and heuristics methods.
引用
收藏
页数:41
相关论文