Online 3D Bin Packing with Constrained Deep Reinforcement Learning

被引：0

作者：

Zhao, Hang ^{[1
]}

She, Qijin ^{[1
]}

Zhu, Chenyang ^{[1
]}

Yang, Yin ^{[2
]}

Xu, Kai ^{[1
]}

机构：

[1] Natl Univ Def Technol, Zunyi, Guizhou, Peoples R China

[2] Clemson Univ, Clemson, SC 29631 USA

来源：

THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2021年 / 35卷

关键词：

HEURISTICS; SEARCH;

D O I：

暂无

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

We solve a challenging yet practically useful variant of 3D Bin Packing Problem (3D-BPP). In our problem, the agent has limited information about the items to be packed into a single bin, and an item must be packed immediately after its arrival without buffering or readjusting. The item's placement also subjects to the constraints of order dependence and physical stability. We formulate this online 3D-BPP as a constrained Markov decision process (CMDP). To solve the problem, we propose an effective and easy-to-implement constrained deep reinforcement learning (DRL) method under the actor-critic framework. In particular, we introduce a prediction-and-projection scheme: The agent first predicts a feasibility mask for the placement actions as an auxiliary task and then uses the mask to modulate the action probabilities output by the actor during training. Such supervision and projection facilitate the agent to learn feasible policies very efficiently. Our method can be easily extended to handle lookahead items, multi-bin packing, and item re-orienting. We have conducted extensive evaluation showing that the learned policy significantly outperforms the state-of-the-art methods. A preliminary user study even suggests that our method might attain a human-level performance.

引用

页码：741 / 749

页数：9

共 51 条

[21] Haarnoja T., 2018, ARXIV PREPRINT ARXIV
[22] Hifi M., 2010, Electron. Notes Discrete Math., V36, P993
[23] Hu H., 2017, Solving a new 3d bin packing problem with deep reinforcement learning method
[24] Johnson D. S., 1974, SIAM Journal on Computing, V3, P299, DOI 10.1137/0203025
[25] MATHEMATICAL-METHODS OF ORGANIZING AND PLANNING PRODUCTION
KANTOROVICH, LV
[J]. MANAGEMENT SCIENCE, 1960, 6 (04) : 366 - 422
[26] Karabulut K, 2004, LECT NOTES COMPUT SC, V3261, P441
[27] Karmarkar N., 1982, 23rd Annual Symposium on Foundations of Computer Science, P312, DOI 10.1109/SFCS.1982.61
[28] Korte B., 2012, KOMBINATORISCHE OPTI, P499
[29] AN EXACT ALGORITHM FOR THE DUAL BIN PACKING PROBLEM
LABBE, M
LAPORTE, G
MARTELLO, S
[J]. OPERATIONS RESEARCH LETTERS, 1995, 17 (01) : 9 - 18
[30] Laterre A., 2018, ARXIV PREPRINT ARXIV

← 1 2 3 4 5 6 →