T6D-Direct: Transformers for Multi-object 6D Pose Direct Regression

Cited by: 12

Authors:

Amini, Arash [1]

Periyasamy, Arul Selvam [1]

Behnke, Sven [1]

Affiliations:

[1] Univ Bonn, Autonomous Intelligent Syst, Bonn, Germany

Source:

PATTERN RECOGNITION, DAGM GCPR 2021 | 2021 / Volume 13024

Keywords:

Pose estimation; Transformer; Self-attention

DOI:

10.1007/978-3-030-92659-5_34

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

6D pose estimation is the task of predicting the translation and orientation of objects in a given input image, which is a crucial prerequisite for many robotics and augmented reality applications. Lately, the Transformer network architecture, equipped with a multi-head self-attention mechanism, has begun to achieve state-of-the-art results in many computer vision tasks. DETR, a Transformer-based model, formulated object detection as a set prediction problem and achieved impressive results without standard components such as region-of-interest pooling, non-maximum suppression, and bounding box proposals. In this work, we propose T6D-Direct, a real-time, single-stage direct method with a Transformer-based architecture built on DETR to perform direct multi-object 6D pose estimation. We evaluate the performance of our method on the YCB-Video dataset. Our method achieves the fastest inference time, and its pose estimation accuracy is comparable to that of state-of-the-art methods.


Pages: 530-544

Page count: 15
