T6D-Direct: Transformers for Multi-object 6D Pose Direct Regression

Times cited: 12
Authors
Amini, Arash [1]
Periyasamy, Arul Selvam [1]
Behnke, Sven [1]
Affiliation
[1] Univ Bonn, Autonomous Intelligent Syst, Bonn, Germany
Source
PATTERN RECOGNITION, DAGM GCPR 2021 | 2021 / Vol. 13024
Keywords
Pose estimation; Transformer; Self-attention
DOI
10.1007/978-3-030-92659-5_34
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
6D pose estimation is the task of predicting the translation and orientation of objects in a given input image, which is a crucial prerequisite for many robotics and augmented reality applications. Lately, the Transformer network architecture, equipped with a multi-head self-attention mechanism, has been achieving state-of-the-art results in many computer vision tasks. DETR, a Transformer-based model, formulated object detection as a set prediction problem and achieved impressive results without standard components such as region of interest pooling, non-maximum suppression, and bounding box proposals. In this work, we propose T6D-Direct, a real-time single-stage direct method with a Transformer-based architecture built on DETR to perform direct multi-object 6D pose estimation. We evaluate the performance of our method on the YCB-Video dataset. Our method achieves the fastest inference time, and its pose estimation accuracy is comparable to state-of-the-art methods.
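The key idea the abstract inherits from DETR is treating detection as set prediction: a fixed number of query slots each predict one object, and during training each annotated object is paired with exactly one slot. The sketch below illustrates that one-to-one matching step under stated assumptions. DETR solves it with the Hungarian algorithm; here an exhaustive permutation search is swapped in because it reaches the same optimum for a handful of objects and needs only the standard library. The cost terms (ground-truth class probability plus a weighted L1 translation error), the weight `w_t`, the object names, and the functions `pair_cost` and `match` are hypothetical and are not T6D-Direct's actual matching cost or code; orientation terms are left out for brevity.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Prediction = Tuple[Dict[str, float], Vec3]  # (class probabilities, translation)
GroundTruth = Tuple[str, Vec3]              # (class label, translation)


def pair_cost(pred: Prediction, gt: GroundTruth, w_t: float = 1.0) -> float:
    """Hypothetical matching cost: negative probability of the ground-truth
    class plus a weighted L1 distance between the translations."""
    probs, pred_t = pred
    gt_class, gt_t = gt
    l1 = sum(abs(p - g) for p, g in zip(pred_t, gt_t))
    return -probs.get(gt_class, 0.0) + w_t * l1


def match(preds: Sequence[Prediction],
          gts: Sequence[GroundTruth]) -> List[Tuple[int, int]]:
    """Return (prediction index, ground-truth index) pairs of a minimum-cost
    one-to-one matching. Exhaustive search stands in for the Hungarian
    algorithm: same optimum, but only practical for a few objects."""
    if len(gts) > len(preds):
        raise ValueError("need at least as many query slots as objects")
    best_cost, best = float("inf"), []
    # slots[g] is the prediction slot assigned to ground-truth object g.
    for slots in permutations(range(len(preds)), len(gts)):
        cost = sum(pair_cost(preds[s], gts[g]) for g, s in enumerate(slots))
        if cost < best_cost:
            best_cost, best = cost, [(s, g) for g, s in enumerate(slots)]
    return best


# Usage: three query slots, two annotated objects (toy values).
preds = [
    ({"mug": 0.90, "can": 0.05}, (0.10, 0.00, 0.80)),
    ({"mug": 0.10, "can": 0.80}, (-0.20, 0.05, 0.90)),
    ({"mug": 0.05, "can": 0.05}, (0.00, 0.00, 0.00)),
]
gts = [("can", (-0.21, 0.05, 0.92)), ("mug", (0.11, 0.00, 0.79))]
pairs = match(preds, gts)
unmatched = sorted(set(range(len(preds))) - {p for p, _ in pairs})
print(pairs)      # [(1, 0), (0, 1)]
print(unmatched)  # [2]
```

Slot 2 stays unmatched and, as in DETR, would be trained toward a "no object" class. Because every object is claimed by exactly one slot, duplicate predictions are penalized during training, which is why no non-maximum suppression step is needed at inference.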
Pages: 530-544
Number of pages: 15
Related papers (37 in total)
[1] Carion, Nicolas; Massa, Francisco; Synnaeve, Gabriel; Usunier, Nicolas; Kirillov, Alexander; Zagoruyko, Sergey. End-to-End Object Detection with Transformers. COMPUTER VISION - ECCV 2020, PT I, 2020, 12346: 213-229.
[2] Deng, J. PROC CVPR IEEE, 2009: 248. DOI 10.1109/CVPRW.2009.5206848.
[3] Dosovitskiy, A. INT C LEARNING REPRE (ICLR), 2021. DOI 10.48550/ARXIV.2010.11929.
[4] Girshick, Ross. Fast R-CNN. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015: 1440-1448.
[5] He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian. Deep Residual Learning for Image Recognition. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016: 770-778.
[6] Hinterstoisser, S. ACCV, 2013, 7724: 548. DOI 10.1007/978-3-642-37331-2_42.
[7] Hodan, Tomas; Sundermeyer, Martin; Drost, Bertram; Labbe, Yann; Brachmann, Eric; Michel, Frank; Rother, Carsten; Matas, Jiri. BOP Challenge 2020 on 6D Object Localization. COMPUTER VISION - ECCV 2020 WORKSHOPS, PT II, 2020, 12536: 577-594.
[8] Hosang, Jan; Benenson, Rodrigo; Schiele, Bernt. Learning non-maximum suppression. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 6469-6477.
[9] Hu, Yinlin; Fua, Pascal; Wang, Wei; Salzmann, Mathieu. Single-Stage 6D Object Pose Estimation. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020: 2927-2936.
[10] Hu, Yinlin; Hugonot, Joachim; Fua, Pascal; Salzmann, Mathieu. Segmentation-driven 6D Object Pose Estimation. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019: 3380-3389.