YOLOPose: Transformer-Based Multi-object 6D Pose Estimation Using Keypoint Regression

Cited by: 15
Authors
Amini, Arash [1]
Periyasamy, Arul Selvam [1]
Behnke, Sven [1]
Affiliations
[1] Univ Bonn, Autonomous Intelligent Syst, Bonn, Germany
Source
INTELLIGENT AUTONOMOUS SYSTEMS 17, IAS-17 | 2023 / Vol. 577
Keywords
Object pose estimation; Scene understanding; Vision transformer; Object detection
DOI
10.1007/978-3-031-22216-0_27
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
6D object pose estimation is a crucial prerequisite for autonomous robot manipulation applications. The state-of-the-art models for pose estimation are convolutional neural network (CNN)-based. Lately, Transformers, an architecture originally proposed for natural language processing, have been achieving state-of-the-art results in many computer vision tasks as well. Equipped with the multi-head self-attention mechanism, Transformers enable simple single-stage end-to-end architectures for learning object detection and 6D object pose estimation jointly. In this work, we propose YOLOPose (short for You Only Look Once Pose estimation), a Transformer-based multi-object 6D pose estimation method based on keypoint regression. In contrast to the standard approach of predicting keypoints in an image via heatmaps, we directly regress the keypoints. Additionally, we employ a learnable orientation estimation module to predict the orientation from the keypoints. Along with a separate translation estimation module, our model is end-to-end differentiable. Our method is suitable for real-time applications and achieves results comparable to state-of-the-art methods.
Pages: 392-406
Page count: 15