Learning practically feasible policies for online 3D bin packing

被引:0
作者
Hang Zhao
Chenyang Zhu
Xin Xu
Hui Huang
Kai Xu
机构
[1] National University of Defense Technology,School of Computer Science
[2] National University of Defense Technology,College of Intelligence Science and Technology
[3] Shenzhen University,College of Computer Science and Software Engineering
来源
Science China Information Sciences | 2022年 / 65卷
关键词
bin packing problem; online 3D-BPP; reinforcement learning;
D O I
暂无
中图分类号
学科分类号
摘要
We tackle the online 3D bin packing problem (3D-BPP), a challenging yet practically useful variant of the classical bin packing problem. In this problem, the items are delivered to the agent without informing the full sequence information. The agent must directly pack these items into the target bin stably without changing their arrival order, and no further adjustment is permitted. Online 3D-BPP can be naturally formulated as a Markov decision process (MDP). We adopt deep reinforcement learning, in particular, the on-policy actor-critic framework, to solve this MDP with constrained action space. To learn a practically feasible packing policy, we propose three critical designs. First, we propose an online analysis of packing stability based on a novel stacking tree. It attains a high analysis accuracy while reducing the computational complexity from O(N2) to O(N log N), making it especially suited for reinforcement learning training. Second, we propose a decoupled packing policy learning for different dimensions of placement which enables high-resolution spatial discretization and hence high packing precision. Third, we introduce a reward function that dictates the robot to place items in a far-to-near order and therefore simplifies the collision avoidance in movement planning of the robotic arm. Furthermore, we provide a comprehensive discussion on several key implemental issues. The extensive evaluation demonstrates that our learned policy outperforms the state-of-the-art methods significantly and is practically usable for real-world applications.
引用
收藏
相关论文
共 42 条
[1]  
Martello S(2000)The three-dimensional bin packing problem Oper Res 48 256-267
[2]  
Pisinger D(2008)Extreme point-based heuristics for three-dimensional bin packing Informs J Comput 20 368-384
[3]  
Vigo D(1960)Mathematical methods of organizing and planning production Manage Sci 6 366-422
[4]  
Crainic T G(2003)Guided local search for the three-dimensional bin-packing problem Informs J Comput 15 267-283
[5]  
Perboli G(2003)A greedy search for the three-dimensional bin packing problem: the packing static stability case Int Trans Oper Res 10 141-153
[6]  
Tadei R(1999)Approximation algorithms for the oriented two-dimensional bin packing problem Eur J Oper Res 112 158-166
[7]  
Kantorovich L V(2009)TS2PACK: a two-level tabu search for the three-dimensional bin packing problem Eur J Oper Res 195 744-760
[8]  
Faroe O(2020)Smart pack: online autonomous object-packing system using RGB-D sensor data Sensors 20 4448-6
[9]  
Pisinger D(2007)Velocity-based shock propagation for multibody dynamics animation ACM Trans Graph 26 12-533
[10]  
Zachariasen M(2012)Automated constraint placement to maintain pile shape ACM Trans Graph 31 1-15