YOLOPose: Transformer-Based Multi-object 6D Pose Estimation Using Keypoint Regression

Times cited: 15

Authors:

Amini, Arash [1]

Periyasamy, Arul Selvam [1]

Behnke, Sven [1]

Affiliation:

[1] University of Bonn, Autonomous Intelligent Systems, Bonn, Germany

Source:

INTELLIGENT AUTONOMOUS SYSTEMS 17, IAS-17 | 2023 / Vol. 577

Keywords:

Object pose estimation; Scene understanding; Vision transformer; Object detection;

DOI:

10.1007/978-3-031-22216-0_27

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

6D object pose estimation is a crucial prerequisite for autonomous robot manipulation applications. State-of-the-art pose estimation models are based on convolutional neural networks (CNNs). Lately, Transformers, an architecture originally proposed for natural language processing, have been achieving state-of-the-art results in many computer vision tasks as well. Equipped with the multi-head self-attention mechanism, Transformers enable simple single-stage end-to-end architectures for learning object detection and 6D object pose estimation jointly. In this work, we propose YOLOPose (short for You Only Look Once Pose estimation), a Transformer-based multi-object 6D pose estimation method based on keypoint regression. Instead of predicting keypoints with the standard heatmaps, we regress the keypoint coordinates directly. Additionally, we employ a learnable orientation estimation module to predict the orientation from the keypoints. Combined with a separate translation estimation module, this makes our model end-to-end differentiable. Our method is suitable for real-time applications and achieves results comparable to state-of-the-art methods.
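The abstract states that keypoints are regressed directly and that a learnable module predicts orientation from them, keeping the model end-to-end differentiable. A minimal sketch of how such a module can always output a valid rotation: a layer maps the flattened 2D keypoints to an unconstrained 6-vector, which Gram-Schmidt orthonormalization turns into a rotation matrix (the 6D rotation representation of Zhou et al., CVPR 2019). The abstract does not say which rotation parameterization YOLOPose uses, so the 6D choice, the single untrained linear layer `OrientationHead`, its random weights, the keypoint count, and the pinhole back-projection in `translation_from_center_depth` are all hypothetical illustrations, not the authors' implementation.

```python
import math
import random
from typing import List, Sequence, Tuple

Vec = List[float]
Mat = List[List[float]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross(a: Sequence[float], b: Sequence[float]) -> Vec:
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def normalize(v: Sequence[float]) -> Vec:
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def rotation_from_6d(r6: Sequence[float]) -> Mat:
    """Gram-Schmidt: unconstrained 6-vector -> 3x3 rotation matrix with columns b1, b2, b3."""
    a1, a2 = r6[:3], r6[3:]
    b1 = normalize(a1)
    # Remove the b1 component from a2 so that b2 is orthogonal to b1.
    p = dot(b1, a2)
    b2 = normalize([y - p * x for x, y in zip(b1, a2)])
    # The third axis follows from the right-hand rule, so det(R) = +1.
    b3 = cross(b1, b2)
    return [[b1[i], b2[i], b3[i]] for i in range(3)]


class OrientationHead:
    """Hypothetical, untrained linear layer: flattened 2D keypoints -> 6D rotation vector."""

    def __init__(self, num_keypoints: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.in_dim = 2 * num_keypoints
        self.weights = [[rng.gauss(0.0, 0.1) for _ in range(self.in_dim)] for _ in range(6)]
        # Bias chosen so that an all-zero input maps to the identity rotation.
        self.bias = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0]

    def __call__(self, keypoints: Sequence[Tuple[float, float]]) -> Mat:
        flat = [c for kp in keypoints for c in kp]
        if len(flat) != self.in_dim:
            raise ValueError(f"expected {self.in_dim // 2} keypoints, got {len(keypoints)}")
        r6 = [dot(w, flat) + b for w, b in zip(self.weights, self.bias)]
        return rotation_from_6d(r6)


def translation_from_center_depth(u: float, v: float, z: float,
                                  fx: float, fy: float, cx: float, cy: float) -> Vec:
    """Hypothetical translation head output: back-project a 2D center (u, v) at depth z (pinhole model)."""
    return [(u - cx) * z / fx, (v - cy) * z / fy, z]


if __name__ == "__main__":
    # Eight directly regressed keypoints in normalized image coordinates (made-up values).
    kps = [(0.1 * i, 0.05 * i) for i in range(8)]
    R = OrientationHead(num_keypoints=8)(kps)
    t = translation_from_center_depth(370.0, 240.0, 2.0, 500.0, 500.0, 320.0, 240.0)
    print(R)
    print(t)
```

Every step (a linear map, normalization, a projection, and a cross product) is differentiable away from the degenerate case of a zero or parallel input, so a pose loss on `R` could propagate gradients back to the regressed keypoints. Zhou et al. argue that this 6D representation, unlike 3- and 4-dimensional parameterizations such as Euler angles and quaternions, is continuous, which tends to make it easier to regress.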

Pages: 392-406

Page count: 15

Number of references: 31
